Landscape and flux for quantifying global stability and dynamics of game theory

Game theory has been widely applied to many research areas including economics, biology and social sciences. However, it is still challenging to quantify the global stability and global dynamics of the game theory. We developed a landscape and flux framework to quantify the global stability and global dynamics of the game theory. As an example, we investigated a model of three-strategy game: a special replicator mutator game termed as the repeated Prison Dilemma model. In this model, one stable state, two stable states and limit cycle can emerge under different parameters. The repeated Prisoner’s Dilemma system has Hopf bifurcation from one stable state to limit cycle state, and then to another one stable state or two stable states, and vice versa. We quantified the global stability of the repeated Prisoner’s Dilemma system and identified optimal kinetic paths between the basins of attractor. The optimal paths are irreversible due to the non-zero flux. We also quantified the interplay between Peace and War.


Introduction
Game theory is the study of conflict and cooperative strategic decision making between intelligent rational decision-makers. Game theory has widely been recognized to be important and useful in many fields such as economics, political science, psychology, computer science, biology etc. The game dynamics can usually converge to stable point attractors [1,2]. However, a more complex dynamics can emerge as stable oscillations. The cyclical oscillations have been explored in the game-theoretical models of price dispersion [3]. The cyclical oscillation has also been found in the side-blotched lizard with three color morphs in male signalling. They played three different alternative reproductive strategies(Uta stansburiana) [4]. Even though numerous studies have been explored over the past decades with great advances in this field, there are still several unsolved problems. But understanding the underlying the global dynamics and global stability of the game theory is still one of the greatest challenges at present.
The evolutionary stability was first introduced and formulated by Maynard Smith and Price in 1973 [5,6]. They first applied the game theoretical ideas to evolutionary biology and population dynamics. This is the birth of evolutionary game theory which studies the behaviors of the large populations [2]. Evolutionary population game is a general framework for exploring the strategic interactions among large populations of agents. The agents play pure strategies with random matching. The significant applications of the game theory are on modeling and analyzing varieties of human and animal behaviors around us. The game theory systems involve different interactions among the agents. The strategic interactions can lead to complex dynamics. The dynamics of biological structure and population behaviors often have high stability and robustness. Thus, one of the central problems is how to explore the global stability and robustness of the evolutionary game theory in a population.
There have been many studies on the stability of game theory [2,6,7]. However, most of the investigations are focused on the local stability. The purpose of local stability analogs is to uncover whether a system can restore to equilibrium under a small disturbance. An evolutionary stable strategy (ESS) is a strategy which is resistent to the invasions by a few mutants playing a different strategy in a population [6]. The system can move far from its ESS equilibrium since it is under continuous small perturbations from the mutations and the events by chance [7]. Thus, the ESS can not guarantee the global stability of the system. A Nash equilibrium (NE) is a result of a non-cooperative game [6]. Each player is assumed to know the other players' equilibrium strategy and the players gain nothing by altering only their own strategy. The NE can be stable or unstable. It is very similar to an ESS [6]. The evolutionary stable strategy (ESS) and Nash equilibrium (NE) are insufficient conditions of dynamic stability since they are only local criterions under small fuctuations [7]. It is often not clear whether the system will reach equilibrium from an arbitrary initial state or whether the system can be switched from one locally stable state to another. These dynamical issues depend on the global structure of the system. Furthermore, the link between the global characterization and the dynamics of the game theory systems is often lacking understanding the global stability of the game theory is thus still challenging at present.
Deterministic population dynamics can only describe the average dynamics of the system. Both external and intrinsic fluctuations are unavoidable [8]. The environmental fluctuations can influence the behaviors of population. The intrinsic fluctuations originated from mutations or random errors in implementations, can not be neglected in finite population. They may play an essential role in the dynamics of the system. The stochastic evolutionary game dynamics was first studied by Foster and Young in 1990 [7]. They defined the stochastic stability and the stochastically stable set which is a set of stochastically stable equilibrium states. It is assumed that the fluctuations approach to zero slowly in the long run(so called long-run equilibria) [7,9]. The stable state sets can be obtained by the potential theory [7]. However, the general approach for exploring the global stability of the game theory systems are still absent.
The researchers have also explored the game theory system with the method of Lyapunov function which can be used to study the global stability [7,9]. Certain analytical Lyapunov functions were found for some highly simplified game theory models [2]. However, it is still challenging to find the Lyapunov function for the general game theory with complex dynamics. In this study, we will provide a general approach to investigate the Lyapunov function. We will also develop a general framework for exploring the robustness and the global stability of the game theory systems.
In addition to the dynamics with simple convergent stable states, exploring the mechanism of the non-convergent behavior is even more important for understanding the nature of the dynamics for evolutionary game theory. This is because certain more complicated behaviors such as oscillations and chaos often emerge in real biological interactions [10]. The most well known model of evolutionary dynamics is the replicator model. The simplest replicator dynamics of three-strategy games can give arise to certain behaviors: sinks, sources and saddles or heteroclinic cycles for Rock-Paper-Scissors(RPS) game [10,11]. However, the replicator dynamics can not provide a stable limit cycle behavior [10,12]. On the other hand, Lyapunov stable equilibria of the replicator dynamics are Nash equilibria and ESS which are asymptotically stable [13,14]. Mutation effects can be included in order to promote the chances that players change from one strategy to another spontaneously. The selection and mutation model has been explored in population genetics for decades [6]. The replicator-mutator dynamics plays a key role in evolutionary theory [6,[13][14][15][16].
In this study, we will develop a non-equilibrium landscape and flux framework to quantify the global stability and robustness of evolutionary game theory. Conventional stability analysis of game theory (Nash equilibrium and ESS) only provieds a static view and local description. We will give a dynamical view and global quantification of the stability. We found the flux is an additional force as a signature and quantitative measure of the degree of detailed balance breaking or nonequilibriumness, which is not present in determining the global dynamics of the conventional game theory. Both landscape and flux determines the game dynamics. The landscape topography and kinetic transitions as well as optimal kinetic paths from one basin to another (local stable strategy) can provide the global quantification of the strategy switching process and functional stability. This is also not present in the current game theory. The global stability can also be systematically studied with the landscape and flux approach via Lyapunov functions. It is worth noticing that the Lyapunov functions are only found for special cases (one dimensional case) in the current game theory [7]. In present evolutional game theory, the driving forces have never been decomposed. The landscape and flux theory provides a framework to quantify each component of the driving forces and describe the global evolutionary game dynamics. We also explored non-equilibrium thermodynamics which is not covered in current game or evolutionary game theory.
The prisoner's dilemma is a famous example in game theory [2,6]. Two players might not cooperate, even though more profit can be earned if they cooperate. For example, the two gang members are arrested into a prison. They are not allowed to communicate with each other. The prosecutors lack sufficient evidence to convict them of a crime. Both Gang member A and gang member B will be in prison have 2 years if each betrays the other. Gang member A will be free and gang member B will be in prison for 3 years (and vice versa) if gang member A betrays member B but gang member B remains silent. Gang member A and gang member B will only be in prison for 1 year if both keep silent. This shows that betraying gains a greater reward than cooperation. Therefore, the two gang members tend to betray each other to gain more profits from their own consideration [2,6]. In reality, human beings often choose a cooperative strategy as keeping silent to avoid losing more profits. This can lead to a win-win situation. However, since both game players make decisions for the goals of gaining more profits, the win-win scenario for the best profit may not be realized in real life. In fact, this is common in public resource development and utilization, the market competition, and the environmental issues [2,6]. Therefore, when everyone is trying to maximize his or her own benefits, the profits gained from both sides are impaired. This is also a common situation in microeconomics when everyone maximizes his or her own benefit. This phenomenon challenges the conventional thinking. It raises some interesting questions such as how to avoid the prisoner's dilemma, cooperate, abide by the agreement and so on. Thus, personal best interest maximization is not necessarily the best strategy in an interactive world [2,6]! We use a representative Prisoner's Dilemma model-the repeated Prisoner's Dilemma model [13,[17][18][19] as an example to illustrate our general theory. There are three interactional strategies in this model: always cooperate simplified by ALLC; always defect simplified by ALLD; and tit-for-tat simplified by TFT. Fig 1 shows the schematics of repeated Prisoner's Dilemma model. ALLD players are the first winners with random initial strategies in the population. Then small numbers of ALLC players will invade and replace strategy ALLD. The consideration of mutation effect can lead the repeated Prisoner's Dilemma to a stable limit cycle state rather than a ALLD dominant state, even though ALLD is the only strict Nash equilibrium [6]. There existed sustained oscillations among ALLD, ALLC and TFT strategies in the repeated Prisoner's Dilemma model. We developed a landscape and flux landscape theory [20][21][22][23][24][25][26] to explore the global behavior and the dynamics of the repeated Prisoner's Dilemma game system. We quantified the population landscape related to the steady state probability distribution. It can determine the global behavior of the system. The landscape has the attractor basins for multi-stability games and Mexican hat for limit cycle oscillations games. We also found the intrinsic landscape with a Lyapunov function feature of the repeated Prisoner's Dilemma game model. It can be used to quantify the global stability of the system. The non-equilibrium evolutionary game theory dynamics for repeated Prisoner's Dilemma model is found to be determined by both the landscape and the curl flux. The curl steady state probability flux can result in the break down of the detailed balance and drive the stable periodical oscillation flow along the limit cycle ring [20]. We also explored the stability and robustness of the repeated Prisoner's Dilemma game against the mutation rate and the pay-off matrix. The optimal kinetic pathways between basins are quantified by the path integral method and irreversible due to the non-zero flux.

The landscape and flux quantification for game theory
The landscape and flux theory as well as the non-equilibrium thermodynamics for the general dynamical systems have been explored in several different fields [20][21][22][23][24][25][26][27]. They can be used to address the issues of global stability, function, robustness for dynamical systems. Here, we will apply the landscape and flux theory to quantify the global stability and robustness of the game theory.
We consider a large finite population of players who play a finite set of pure strategies with random matching. Each player chooses a pure strategy from the strategy set S = 1, 2, 3, . . ., n [1,2]. The aggregated behaviors of these players are described by a population state x, with x i representing the proportion of players choosing pure strategy S i . And x i represents the frequency of strategy S i . Due to the intrinsic and the extrinsic fluctuations [8], the deterministic dynamics described by a set of ordinary differential equations are supplemented with the additional fluctuation force. Then the stochastic dynamics emerges [28,29]: dx = F(x)dt + g Á d W, where x is the state vector representing the strategy density in game dynamics), F(x) is the driving force, W as a vector coupled through the matrix g represents an independent Wiener process. The evolution of the stochastic dynamics is thus more appropriately described by the probability evolution. The probability distribution P(x, t) in time can be obtained by solving the corresponding Fokker-Planck diffusion evolution equation [28,29]: We set DD = (1/2)(g Á g T ), where D is a constant describing the scale of the fluctuations and D represents the anisotropy diffusion matrix of the fluctuations. The Fokker-Plank diffusion equation describes the game dynamics as a conservation law of probability. The change in local probability is equal to the net flux in or out. The game process can be treated as a binomial sampling process: N players play the games with different strategies from a large population of players, just like n alleles in a diploid population of constant size N [24]. So we can set D ij = x i (δ ij − x j ) coming from the sampling nature of the game which is widely used in evolutionary population dynamics [24].
The matrix D ij has some special features [24]. The first is so that P n i¼1 ðr Á DÞ i ¼ 0. The second is its inverse matrix is known to have the feature [24]: where F n ¼ À P nÀ 1 i¼1 F i . The steady state probability distribution P ss can be derived from the long time limit of the Fokker-Planck equation @P/@t = 0. The steady state probability flux is defined as J ss = FP ss − Dr Á (DP ss ). The steady state flux is divergent free from Eq 1 and therefore a rotational curl. The population landscape is defined as U = − lnP ss . Then, the deterministic driving force F can be decomposed as: F = −DD Á rU + J ss /P ss + Dr Á D. The flux J ss = 0 denotes the equilibrium with detailed balance since net flux is zero while the net flux J ss 6 ¼ 0 measures the degrees of detailed balance breaking and therefore the degree of one-equilibriumness. Therefore, the dynamics of the equilibrium game system is determined only by the population landscape gradient. The dynamics of the non-equilibrium system is determined by both the potential landscape and non-zero flux. [20] The intrinsic landscape at the zero fluctuation limit has the the feature of the Lyapunov function [24,[30][31][32] and can be used to quantify the global stability and function of the game theory systems. The intrinsic landscape ϕ 0 follows the Hamilton-Jacobi equation as below [23,24,33]: and As seen, the intrinsic landscape ϕ 0 is a Lyapunov function monotonically decreasing along a deterministic path for the game dynamics [24,33]. The intrinsic landscape ϕ 0 quantifies the global stability for general dynamical systems including game dynamics either with or without detailed balance and can be solved by the level set method [34]. The steady state probability distribution P ss and the population landscape U have the relationship of P ss (N) = exp(−U)/Z. Z is the partition function Z = R exp(−U)dN. The entropy of the non-equilibrium game system is given as [24,[35][36][37][38] The energy can be defined as: Thus, the nonequilibrium free energy F is: The nonequilibrium free energy as the combination of energy and entropy reflects the first law of non-equilibrium thermodynamics. The free energy decreases in time monotonically while reaching its minimum value, F ¼ À DlnZ [24,[35][36][37][38]. This reflects the second law of nonequilibrium thermodynamics. The free energy as a Lypunov functional can be used to explore the global stability of the stochastic non-equilibrium systems with finite fluctuations. Game theory systems are often non-equilibrium open systems. They often exchange energy and information from the environments. This leads to dissipation. The evolution of the entropy in time can be separated into two terms [20,24,39,40] is entropy production rate which is positive or zero, and is heat dissipation rate which represents the entropy flow rate to the non-equilibrium system from the environments. It can be either positive or negative. The effective force is shown as F 0 = F − Dr Á D. _ S can be interpreted as the entropy change of the non-equilibrium system, and the _ S t can be interpreted as the total entropy change of the system and environments. _ S t is always non-negative according with the thermodynamic second law. The entropy production rate and heat dissipation rate are equal in steady state [20,24,39,40]. Thus, the entropy production rate is another global physical characterization of a nonequilibrium system.

The repeated Prisoner's Dilemma game theory model with mutations
We set A as the payoff matrix [6,[13][14][15][16]. The scalar A i (x) represents the payoff to strategy S i when the population state is represented by x. Since the sum of all frequencies is equal to 1: ∑ i x i = 1, the system becomes n − 1 dimensional. The average fitness (pay-off) " f is obtained as " f ¼ xAx by the players of the population [1,2,6]. The fitness denotes the individual's evolutionary success [9]. In the game theory, the payoff of the game is the fitness. The fitness to strategy i becomes f i = (Ax) i [1,2]. In this study, we use simple three-strategy game which can be reduced to two dimensional dynamics.
The players in standard Prisoner's Dilemma model play either cooperate strategy or defect strategy simultaneously [41]. The players earn their payoff depending on their choices of the strategies. The mutually aided cooperators will acquire the reward R when the cooperators are encountered. The defectors will obtain a punishment Pu when the defectors are encountered. A cooperator acquires a sucker payoff S and a defector acquires a temptation payoff T when they encountered [18,42]. The payoff matrix of the Prisoner's Dilemma model should satisfy the relationship T > R > Pu > S. Mutual cooperative strategy is better than mutual defective strategy, so the reward R should be greater than the punishment Pu. Defectors gain more temptation T than the reward R that cooperators gain if the partner cooperates. The defectors will lose less punishment Pu than the lost sucker S of cooperators if the partner defects.
We explored a repeated Prisoner's Dilemma game model with mutation using the replicator dynamics. Replicator dynamics was introduced by Taylor and Jonker (1978) [43], which is the best-known dynamics in the models of biological evolution. The mean dynamic evolutionary equation is the replicator dynamics shown as below [1,2] In the presence of mutations, ALLD and ALLC players may mutate to strategy TFT. Therefore, more players choose TFT strategy, and help ALLC players to win the game. Then the ALLD players will emerge again to obtain more profit [6]. This dynamics can be used to quantify the peace and war oscillation [6].
We define the probability that agents with strategy S i mutating to strategy S j as q ij , which satisfies ∑ j q ij = 1. Thus, mutation matrix is Q = [q ij ] [6,13,17,19]. The elements of the mutation matrix Q are defined in terms of a mutation parameter μ satisfying 0 μ 1. The mutation μ denotes the error probability in the process of replication. μ = 0 denotes no mutation with perfect replication while μ = 1 denotes the mutation completely. The replicator-mutator dynamics describes the dynamics of the population distribution x as a result of replication driven by fitness f and mutation driven by Q [6,15,16]: This equation is the quasi-species equation proposed by Manfred Eigen and Peter Schuster [6,15,16]. We set a uniform probability of mutation from one strategy to another strategy with q ii = 1 − 2μ which shows the players with the same strategy can get profits 1 − 2μ for each player, and q ij = μ which shows the player chooses strategy i will get profits μ when he encounters the player with strategy j. Then the matrix Q can be shown as follows [6,15,16]: Let x 1 , x 2 , x 3 represent the fractions of the population choosing the strategies ALLD, TFT, ALLC, respectively. We substitute Q into Eq 8, then the replicator-mutator dynamics are shown as the following simplified equations [18,42]: We considered the players playing with infinite number of rounds. So in these limits, the average payoff matrix with the cost for the strategies ALLD, TFT, ALLC are shown as [17,18]: The mutually aided cooperators will acquire the reward R when the cooperators are encountered. The defectors will obtain a punishment Pu when the defectors are encountered. A cooperator acquires a sucker payoff S and a defector acquires a temptation payoff T when they encountered [18,42]. And we set there is a small complexity cost c to playing strategy TFT [18,42].
TFT strategy is conditional while ALLD and ALLC strategies are unconditional. Thus, the payoff value of strategy TFT may have a small complexity cost [17,44].
The parameters are set as: T = 5,R = 3,Pu = 1, S = 0 [6,18,45]. μ is the average mutation rate between each two of the three strategies. c is a complexity cost for playing TFT. Larger c represents less players who play the strategy TFT. Fig 2(A) shows the phase diagram which shows a S shape for the repeated Prisoner's Dilemma with the constant parameter μ = 0.006 and changing parameter c. There are five regions in this phase diagram. When the cost for TFT c is smaller, the game theory system has only one stable state, which denotes the Peace state in the peace and war game shown in the left Region I. As c increases, the stable state becomes a limit cycle in Region II. Then, as c increases further, another new stable state (can be viewed as War state) and an unstable saddle state emerge beside the limit cycle. This is a saddle-node bifurcation shown in Region III. As cost c increases furthermore, the limit cycle diminishes and becomes an unstable state, along with a stable state (War state) in Region IV. As cost c increases even further, there is only one stable state (War state) in the right Region I.

Phase diagram, Hopf bifurcations and the global stability of evolutionary game dynamics upon TFT cost changes
It is interesting to note that the two-phase dynamics of p53 in the DNA damage response was intensively explored by Zhang et al [46]. They revealed that a sequential predominance of distinct feedback loops can lead the system to multiple-phase dynamical behaviors [46]. This shows another similar behavior as our game theory model where the variation of the parameters can also lead the system to multiple-phase dynamical behaviors. It is clear that the environmental changes and internal variations can lead the system to phase transitions. Fig 2C shows the entropy production rate (EPR) versus cost c. We can see EPR has a bell shape in Region II and Region III under the limit cycle behavior of the system. It indicates that the limit cycle costs more energy to maintain its coherent oscillation.
By solving the Fokker-Planck diffusion equation, we obtain the steady distribution of the probability. Thus, the population landscape of the game system can be obtained as: U = −lnP SS . We solved the Hamilton-Jacobi equation to obtain the intrinsic landscape ϕ 0 by the level set method [34]. . We notice that the oscillations between Peace and War can be explained as: when Peace state sustains for a long time, the population increases and the resources relatively reduce. In order to survive, the populations fight for the resources, to get better livings. After a war, the populations do not engage in the production, livelihood, and fall into a long-term state of tension. Then the Peace state emerges again, the population regrows. The oscillations will thus circulate. −rU is the negative gradient of the population landscape and −rϕ 0 is the negative gradient of the intrinsic landscape. −rU at the top row and −rϕ 0 at the bottom row are represented by black arrows. J ss /P ss is the steady state flux divided by steady state probability and V = J ss / P ss )| D!0 is the intrinsic flux velocity. The flux velocity satisfies V Á rϕ 0 = 0 [24]. This shows that the intrinsic flux velocity is perpendicular to the gradient of the non-equilibrium intrinsic potential ϕ 0 in the zero-fluctuation limit [24]. J ss /P ss at the top row and V at the bottom row are represented by purple arrows in Fig 3. We can see that the flux with purple arrows are nearly but not exactly perpendicular to the negative gradient of U with the black arrows around the basins or the closed ring valley shown on the top row. The region with higher population potential has some disordered oriented arrows due to the lower probability and limit of the computational accuracy. The flux velocities with purple arrows are exactly perpendicular to the negative gradient of ϕ 0 with black arrows at the bottom row. It is due to V Á rϕ 0 = 0. The landscape's gradient force rU or the rϕ 0 attracts the system down to the basin or the closed ring valley, while the flux drives the periodical oscillation flow or spiral descent to the basin. It is necessary to characterize this non-equilibrium repeated Prisoner's Dilemma with both landscape and flux.
We show the 2 dimensional population landscape U for increasing parameter cost c with constant μ = 0.006 in Fig 4. Fig 4(A) shows that the population landscape U has a stable basin near the middle of TFT axis with the cost parameter c = 0.1. It means that most players choose strategies of ALLC and TFT when the cost parameter for TFT is small. The fluxes represented by the purple arrows rotate anticlockwise around this stable state. This state can be viewed as "Peace" state in peace and war game. As the cost parameter for TFT increases to c = 0.2, a limit cycle emerges and replaces the stable state in Fig 4(B). The population landscape has a blue ring valley along the deterministic trajectory. More players choose TFT and ALLC in the Peace state. As more players mutate to ALLC, a small number of ALLD players emerge. This leads to a state with more ALLD players. As the ALLD players become more, the profit obtained from the game becomes less. Some ALLD players convert their strategy to TFT. This makes a circle in the state space of strategy probability. Notice that the ring valley is not homogeneous in landscape depth. There is a deeper area on the left side of the limit cycle, which is still close to TFT axis. This indicates that the Peace state with deeper depth is more stable than other states. The system will stay in Peace state much longer than any other state. Fig 4(C) shows the ring valley of the oscillation expands its amplitude in the strategy-frequency space. A stable state War emerges at the right corner of the triangle, which is close to ALLD ! 1. The stable War state is the one where most of players choose the ALLD strategy. We can see that the limit cycle and the stable state coexist in the strategy-frequency space. It indicates that the system is sometimes in the limit cycle and sometimes in the stable state. The game system can switch between the Peace and War attractor basins under the fluctuations and the mutations. When the cost for TFT increases to c = 0.24 and c = 0.25 shown in Fig 4(D) and 4(E), the ring valley becomes shallower and shallower while the basin of the stable War state becomes deeper and deeper. As c increases further, the profits obtained from the TFT strategy decreases in the whole game, more players give up TFT strategy and choose ALLD strategy to earn more, which leads to more stable and deeper War state basin. When the cost for TFT increases to c = 0.35 shown in Fig 4(F), the oscillation ring valley disappears while the War state survives and becomes deeper and more stable. Fig 4 also shows the changes in both direction and the scale of each flux at the area with higher probability (purple arrows) when the cost c increases. We can see that the fluxes have a anticlockwise rotational nature along the limit cycle. The flux is the driving force for the stable oscillation in game theory.
We quantify the landscape topography and show barrier heights versus cost c with mutation parameter μ = 0.006 in Fig 5(A). We first set U o as the value of population landscape U at the maximum point at the center island of the limit cycle. U s is the value of population landscape U at the saddle point between the limit cycle valley and the stable War state basin. U p is the minimum value of population landscape U along the limit cycle near y axis, which is the Peace state. U w is the minimum value of population landscape U at the stable War state. We set the barrier height for the oscillation ring valley as ΔU Limit = U o − U p , the barrier height between the saddle point and the oscillation as ΔU sp = U s − U p and the barrier height between the saddle point and the War stable state as ΔU sw = U s − U w . We can see as the cost c increases, barrier height ΔU sw increases, barrier height ΔU sp decreases first then increases, barrier height ΔU Limit increases first then decreases. It indicates that the oscillation itself relative to the maximum point in the center of limit cycle becomes more stable first then becomes less stable. It has a turning point during the process of c increasing. The War state becomes more robust, and the barrier height from oscillation to War state ΔU sp becomes less than that of ΔU sw with larger cost c value. This implies that the War attractor state becomes more preferred than that of the oscillation, as the cost increases further. There are four regions in this phase diagram. When the mutation rate μ is small, the system has a stable state (can be viewed as War state) in Region IV. As μ increases, a stable state War, an unstable saddle state and the limit cycle coexist in Region III. As the mutation rate μ increases further, the stable state and saddle point disappear after the saddle-node bifurcation. There is only limit cycle left in Region II. As the mutation rate μ keeps on increasing, the limit cycle becomes a stable mixed state (with moderate probability of more than 20% of the players with ALLD strategy) in Region I. The mixed state is the combination of these three strategies. We can see entropy production rate characterizing the heat dissipation EPR shown in Fig 2(D) has a bell shape as the mutation

Phase diagram, state switching, landscapes and fluxes upon mutations
U o is the value of population potential landscape U at the maximum point on the center island of the limit cycle. U s is the value of population potential landscape U at the saddle point between the limit cycle valley and the stable state basin War. U p is the minimum value of the population potential landscape U along the limit cycle near the y axis, which is the Peace state. U w is the minimum value of population potential landscape U at the stable state War. rate μ increases when the cost c is moderate. This is because that the state of oscillation costs more energy in the strategy probability state space and one stable state costs less energy.
We can also see this process of transition in Fig 6 which shows that the population landscape U and the flux change with the increasing mutation rate μ at constant cost c = 0.22. Fig  5(B) shows that ΔU sp increases and ΔU sw , ΔU Limit decrease as mutation rate μ increases. This shows that when mutation rate is small, the limit cycle ring valley is very stable relative to its oscillation center, but has less probability relative to the War state since the War state is much deeper. It indicates that more players choose strategy ALLD, and the players do not like to mutate to the other two strategies. This leads to more stable War state. As μ increases, the limit cycle ring valley becomes less stable relative to its oscillation center, but becomes more stable relative to the War state. Eventually a much more stable state Peace emerges. The state War becomes shallower and less stable, and finally diminishes as μ increases.   T is the temptation payoff earned by the defector while the cooperator gets sucker payoff S. There are three regions in this phase diagram. When the temptation T is small, the game theory system has two stable states which are denoted as Peace and War in the peace and war game shown in Region left V. As the temptation T increases, a limit cycle emerges with the coexistence of War state in Region III. As the temptation T increases further, the limit cycle diminishes and a stable mixed state emerges, along with a stable state (War state) at the right of Region V. We can see EPR shown in Fig 7(D) also has a bell shape as the temptation T increases. This also implies that limit cycle state cost more energy to maintain its oscillation.

Phase diagram, state switching, landscapes and fluxes upon temptation payoff
We show that the population landscapes U and the flux change with the increasing temptation T of payoff matrix (which denotes a defector acquires a temptation payoff T when they encounter cooperators) in Fig 8. The game system has two stable states when temptation is very small as T = 3. Note that the relationship T > R should hold in this model. The game system covers a large area of state space with a Peace state and a War state. We can see that the basin of Peace state is very stable and deeper while the War state is much shallower and less stable. This illustrates that majority of players choose the cooperation ALLC or TFT strategy. It leads to a more stable Peace state, rather than War state with defection strategy ALLD, when the temptation is less. Defection can not earn more profits. As temptation increases, the Peace state is unstable and becomes a limit cycle state. The limit cycle state adopts the mixed strategy TFT and ALLD. The limit cycle valley representing the Peace state becomes shallower while the War state becomes more stable shown in Fig 8(B), 8(C) and 8(D). When the temptation for this game increases even further, more and more players choose defection strategy ALLD to earn more profits rather than the strategy ALLC. It reflects that more temptation from the defection strategy can lead to more stable War state. The element R of payoff matrix denotes the reward that cooperators will acquire from the mutual aid when the cooperators encounter. The value of reward R should satisfy R > (T + S)/2. If this relationship is not satisfied, the agreement of alternate cooperation and defection will earn more payoff than that of pure cooperation in a repeated Prisoner's Dilemma game [6]. There are three regions in this phase diagram. When the reward R is small, a limit cycle and the stable War state coexist in Region III. As the reward R increases, the War state disappears after the saddle-node bifurcation and the limit cycle is left in Region II. As the reward R increases further, the limit cycle diminishes and a stable Peace state emerges in Region I. Fig 7(E) also shows EPR versus reward R, which implies that the limit cycle state costs more energy than that of one stable state. Fig 9 shows the population landscape U with increasing cooperation R. The system has one deep stable War basin state and a shallower limit cycle ring valley for small parameter R = 2.5, since the reward for cooperation strategy is small. The majority players choose the defection strategy leading to the War state shown in Fig 9(A). When the reward increases to R = 3, a limit cycle valley becomes deeper and stable while the War state becomes shallower and less stable as shown in Fig 9(B). It shows that more and more players prefer the strategy ALLC rather than strategy ALLD since more reward can be obtained from the cooperation. As reward R increases further, the limit cycle valley becomes deeper, and the War state vanishes shown in Fig 9(C). As reward R increases even further, the limit cycle ring valley shrinks into a stable Peace state shown in Fig 9(D). The majority players choose cooperation strategy. It is Landscape and flux for quantifying global stability and dynamics of game theory because more reward from the cooperation strategy can lead to more stable Peace state. It turns out that more reward from cooperation leads to Peace with win-win outcome. Landscape and flux for quantifying global stability and dynamics of game theory saddle-node bifurcation in Region IV. As the punishment Pu increases further, a limit cycle emerges and coexists with the stable state War in Region III; as the punishment Pu increases even further, the limit cycle diminishes and a stable Peace state coexists with a stable War state in Region V. As the punishment Pu approaches to 3, the stable Peace state diminishes after the saddle-node bifurcation, and the stable War state is left in Region I on the right . Fig 7(F) also shows EPR versus punishment Pu, which shows that the system with limit cycle dissipates more energy. Fig 10 shows the population landscape U with increasing element Pu of payoff matrix. When the punishment Pu is small representing that the defector can earn less from the game and almost all players choose strategy ALLD shown in Fig 10(A). ALLD is the only strict Nash solution and the only evolutionary stable strategy. And when the punishment (Pu = 0.3) is small, the value of the first element in the left column of the payoff matrix for ALLD is Pu while the second of that for TFT is Pu − c = 0.08. Thus the fitness for ALLD is much higher than that of TFT, and that of ALLC (The sucker's payoff S = 0). As punishment Pu increases, more and more players give up ALLD since the profits of TFT players are catching up with that of ALLD players when they both encounter the ALLD players. As the punishment Pu increases to Pu = 0.5, a shallow limit cycle valley emerges accompanied with the stable War state shown in Fig 10(B). When the punishment Pu increases further, the limit cycle valley shrinks its size but becomes deeper and more stable while the War state becomes shallower and less stable. This is shown in Fig 10(C), 10(D) and 10(E). Then the limit cycle ring valley shrinks to a stable Peace state where more players choose TFT strategy as shown in Fig 10(F), 10(G) and 10(H). This shows that as the punishment Pu is increased further, the Peace state becomes more stable first and then loses its stability while the War state becomes more stable. It demonstrates that the punishment Pu has an optimal value to lead to a more stable Peace state. At last, when the value of the punishment Pu approaches to the value of reward R, the stable Peace state diminishes and the stable War state is left in Fig 10(I). This shows that when the punishment Pu and the reward R are almost the same, ALLD is dominant than TFT since ALLD is the only strict Nash equilibrium solution and the evolutionary stable state. These results show that the strategy ALLD can be a dominant strategy and has an advantage for selection.

Phase diagram, state switching, landscapes and fluxes upon defector punishment
We explored the free energy versus the parameters of the repeated Prisoner's Dilemma game model in We can see this five free energy profiles have some similarity that each has the opposite tendency with that of the corresponding EPR. These free energies link to the different phases and phase transitions. The first derivative of the free energies are discontinuous at the transition points from a stable state to a limit cycle oscillation state and vice versa [24]. This implies that non-equilibrium thermodynamic phase transition has certain similarities as that of the equilibrium thermodynamic phase transition. Free energy profiles can manifest the phase transitions and can be used to explore the global stability and robustness of the game system.

Kinetic speed and optimal paths of switching between the two stable states
We also explored the kinetic optimal paths of the repeated Prisoner's Dilemma game model. We can obtain the relative probabilities of each path by the quantification of the path weights with path integrals. Path integral weights can be calculated by the action, which is analogous to the classical mechanical systems. The dominant optimal paths with the largest weights can be viewed as the major pathways. The path integral for the probability of (x final , t) starting at initial condition (x initial , 0) is given as [21,24]: The above probability describes the chance of starting at the state x initial at initial time and ending at the state x final at the final time. The probability is the result of the sum of the weights from all possible paths. A(x) is the action for each path. Each weight is exponentially related to the action which has two contributions, one from the stochastic equation of motion for the dynamics and the other is from the variable transformation from the stochastic force to the system variable. Not all the paths contribute equally to the weight. Due to the exponential nature, the optimal path is exponentially larger in weight than the suboptimal ones. Therefore, we can identify the optimal path with the most probable weight.
We studied the repeated Prisoner's Dilemma game model with the parameters μ = 0.006, c = 0.22, T = 5, R = 3, Pu = 2.4, S = 0.0, which has two stable state Peace and War. Fig 11 shows the optimal paths on the population landscape U with different diffusion coefficient D. We can see there are two stable states: War and Peace on the population landscapes. The purple lines represent the optimal paths from the War state to Peace state. The black lines represent the optimal paths from the Peace state to War state. The white arrows represent the steady state probability fluxes which guide the optimal paths apart from the steepest descent path from the landscape. Therefore, the optimal path from War state to Peace state and the optimal path from Peace state to War state are apart from each other. Under more fluctuations (bigger diffusion coefficient D shown in Fig 11(B)), the two optimal paths are further apart from each other due to larger probability fluxes than those in Fig 11(A). We can see the purple lines and black lines are irreversible in both two sub figures. The optimal paths are deviated from the naively expected steepest descent paths based on the potential landscape. These lines are apart from each other due to the non-zero flux. We can clearly see the fluxes have spiral shapes which show the dynamic feature of non-equilibrium system.
We show the population landscape U and the optimal paths with different parameter temptation T at the constant parameters μ = 0.006, c = 0.22, R = 3, Pu = 2.4, S = 0.0, D = 5 × 10 −4 in Fig 12. The purple lines represent the optimal paths from the War state to Peace state. The black lines represent the optimal paths from the Peace state to War state. The white arrows represent the steady state probability fluxes. The optimal paths are deviated from the naively expected steepest descent optimal paths based on the potential landscape, and they are irreversible. The two optimal paths become more closer from each other as T increases. Fig 13(A) shows the barrier heights versus temptation T. We set ΔU sp = U saddle − U Peace as the barrier height between state Peace and the saddle point and ΔU sw = U saddle − U War as the barrier height between state War and the saddle point, where U saddle represents the value of landscape U at the saddle point between state Peace and War, U Peace represents the minimum value of landscape U at state Peace, U War represents the minimum value of landscape U at state War. We can see that the barrier height ΔU sw increases while the barrier height ΔU sp decreases. It shows the Peace state loses its stability as the War state becomes more stable as the temptation T increases. It denotes that the temptation guides more players to choose strategy ALLD. The path weight represents the probability of each route. The path probability can be obtained by action A(x) between Peace and War. We labeled A wp as the action of the dominant optimal path from War state to Peace state, and A pw as the action of the dominant optimal path from Peace state to War state. Fig 13(B) showed the logarithm of the probability of the Peace to War optimal path divided that of War to Peace optimal path increases as the temptation T increases. This shows that the optimal path from Peace state to War state has more weight or chance than that of War state to Peace state due to the losing stability of Peace state which lowers the barrier from Peace state to War state. This originates from the increasing temptation which guide more players to choose strategy ALLD.
We show the population landscape U and the optimal paths with different parameter reward R at the constant parameters μ = 0.006, c = 0.22, T = 5, Pu = 2.4, S = 0.0 in Fig 14. The optimal paths do not follow the steepest descent paths based on the potential landscape. The two optimal paths depart far away from each other as R increases. Fig 13(E) shows the barrier height ΔU sw decreases while the barrier height ΔU sp increases as reward R increases. Landscape and flux for quantifying global stability and dynamics of game theory showed the logarithm probability of Peace to War optimal path divided that of War to Peace optimal path decreases as reward R decreases. This shows that as the reward increases, more and more players choose strategy ALLC which makes the Peace state more stable and the War state less stable. Thus, the optimal path from state War to state Peace has more probability or weight (chance) than the opposite path.
We show the population landscape U and the optimal paths with different parameter punishment Pu at the constant parameters μ = 0.006, c = 0.22, T = 5, R = 3, S = 0.0 in Fig 15. The two optimal paths do not follow the steepest descent optimal paths based on the potential landscape and come closer from each other as Pu increases. Fig 13(I) shows that the barrier height ΔU sw increases while the barrier height ΔU sp increases first and then decreases as punishment Pu increases. Fig 13(J) showed that the logarithm of Peace state to War state optimal path probability divided that of War state to Peace state optimal path decreases first and then increases as punishment Pu increases. This shows that as the punishment increases, more players choose strategy ALLC and ALLD which makes the Peace state and War state both become Landscape and flux for quantifying global stability and dynamics of game theory more stable. As the punishment grows bigger, the War state becomes more stable than that of the Peace state. This shows that the punishment Pu has an optimal value to make the Peace state stable.
We explored the escape time for the evolutionary game system. We used the following equation to determine the escape time τ: F Á rτ + D Á r 2 τ = −1 [28]. The escape time can be viewed as the average time spent for a system from one state to another [24]. We can set the mean first passage time (MFPT) τ pw representing the MFPT from Peace state to War state and τ wp representing the MFPT from War state to Peace state. Fig 13(C), 13(G) and 13(K) shows that the logarithm of the MFPT increases as their corresponding barrier heights increase corresponding to Fig 13(A), 13(E) and 13(I). The logarithm of the MFPT and barrier heights have the positive correlation as: τ exp(ΔBa). The average kinetic speeds of the state switching along the corresponding optimal paths can be measured by the 1/τ. As the barrier height becomes higher, the escape time becomes longer and the kinetic speed becomes slower. Therefore, the state is more stable with higher barrier height. It is more difficult to switch from one basin of attraction to another with higher barriers between two stable states. It takes more time for switching and the kinetic speed is slower. The MFPT, kinetic speed and barrier height can provide the measurements for quantifying the stability of game theory systems. They can also give more quantitative information about the dynamics of a game theory system. This can help to uncover the underlying mechanisms of the state transitions in the game theory systems. Fig 13(D), 13(H) and 13(L) shows the entropy production rate versus parameters T, R, Pu among the phase range of the two stable states. EPR decreased as temptation T increases, while Peace state loses its stability and War state becomes more stable shown in Fig 13(D). EPR increased as reward R increases, while War state loses its stability and Peace state becomes more stable shown in Fig 13(H). EPR decreased as punishment Pu increases, while Peace state loses its stability and War state becomes more stable shown in Fig 13(L). We can clearly see that the system with dominant Peace state will cost more dissipation quantified by EPR than that with dominant War state. In order to illustrate this phenomenon, we should explore the linear stability of the repeated Prisoners' Dilemma game model. The eigenvalues of Peace state have negative real parts and two opposite imaginary parts. This indicates that Peace state is a stable focus which oscillates and spirals to its destiny. Stable focus can switch to an unstable focus which represents a limit cycle. The eigenvalues of War state have negative real parts and no imaginary parts. This indicates that War state is a stable node. Since the stable focus costs more dissipation. This shows that keeping the peace will consume more energy.

Discussion and conclusion
Global stability and the underlying mechanism of the dynamics are crucial for understanding the nature of the game theory. Foster and Young presented the analysis of stochastic perturbation of evolutionary game dynamics, defining the stability for a stochastic dynamics [7,9]. It is viewed as a way of capturing the long-run stability of the stochastic evolutionary game dynamics rather than the evolutionary stable strategy and the Nash equilibrium [7,9]. They also introduced the idea of a potential function which can be used to compute the stochastically stable set. However, their method can only obtain the potential function in one dimension often in equilibrium. In reality, the game systems are often more complex and in high dimensions. The evolutionary game dynamics are also in general of non-equilibrium. It is difficult to obtain the analytical potential functions to capture the stability of the evolutionary game systems in higher dimensions, since a pure gradient of potential landscape cannot be directly obtained for general evolutionary game dynamics.
Many researchers tried to explore the stability of the evolutionary game systems [6,7,9,17]. Some chose the simulations of the trajectories under fluctuations [7,17]. The stability of the repeated Prisoner's Dilemma game can be quantified by the Lyapunov function. But the Lyapunov function cannot be found easily. In this work we developed the landscape and flux theory for quantifying the population and intrinsic landscape of the game theory dynamics. The intrinsic landscape ϕ 0 has a Lyapunov feature which can be used to explore the global stability of the game theory systems. We obtained the numerical Lyapunov function ϕ 0 by solving the Hamilton-Jacobi equation and the population potential landscape U from the Fokker-Plank diffusion equation. Thus we can explore the global stability of the game theory system by the intrinsic landscape ϕ 0 and the population potential landscape U. The repeated Prisoner's Dilemma game system is a non-equilibrium system. The underlying non-equilibrium driving dynamics of the game system is determined by both the force from the gradient of the landscape and the force from the steady state probability flux which breaks the detailed balance. This provides a new view to explore the game theory dynamics. In conventional evolutionary game dynamics, the flux is not considered. Here we point out that both the landscape and flux are necessary to study the evolutionary game dynamics.
The barrier height and the entropy production rate can be used to quantify the global stability of the non-equilibrium repeated Prisoner's Dilemma game. The irreversible optimal paths can be evaluated by the path integral method. The optimal paths are not along the gradient of the landscape due to the non-zero flux. We also quantified the mean first passage time which can measure the kinetic speed of the dynamics of switching from one state to another.
We have found that when the cost for TFT which reduces the values of the element of TFT in payoff matrix is small, and thus the values of payoff elements for TFT are large, the game system approaches to Peace state easily. As the cost c increases further, the game system will go to the War state since the profits from the TFT strategy is much less. We have also found that when c is small, high mutation rate will lead to the Peace state to be far from War state.
When c is moderate, high mutation rate will lead to a mixed strategy state which has almost the same probability of these three strategies. This leads the game system far from War state. However, as the cost c is larger, the system will fall into War state either with low mutation rate or high mutation rate.
We have also found that moderate intensity of punishment for defection strategy (moderate value of parameter Pu) decreases the stability of War state. More reward for cooperation strategy (high value of parameter R) prefers the Peace state. More temptation for the defector from the cooperator will prefer War state to earn more profits using defection strategy. Thus choosing a moderate intensity of punishment for defection strategy and increasing the intensity of reward for cooperation strategy will avoid the lasting War state, and favor the long lasting Peace state.
We provided a path integral method to identify and quantify the optimal paths between each two stable state. The optimal paths between Peace states and War state are irreversible due to the non-zero flux which is the characteristic for non-equilibrium system. The probability of each optimal path can also give us the information about stability. More stable state has less probability of the corresponding path to escape from its attraction. More time will be spent to escape from more stable state with higher barrier heights as shown from the behavior of MFPT. Thus the speeds of swithing between stable states become slower. We have also shown that the game system with dominant stable Peace state has more EPR. This shows that keeping peace will cost more energy.
Our method can provide a way to identify and quantify the optimal paths (the process) and the kinetics (speed) with their corresponding barrier heights (global landscape topography) for game theory systems. Quantitative study as this will help us to find ways to elongate peace and prevent war.
Since the stochastic game theory dynamics is more difficult to explore analytically, we developed a potential-flux framework to explore and quantify the stochastic game theory dynamics. The investigations of the global stability are essential for understanding the nature and the underlying mechanisms of the game theory dynamics. We show this in an example of repeated Prisoner's Dilemma game system. This can help the further understanding of the game theory for the real world.