Using the Logit quantal response form as the response function in each step, the original definition of static quantal response equilibrium (QRE) is extended into an iterative evolution process. QREs remain as the fixed points of the dynamic process. However, depending on whether such fixed points are the long-term solutions of the dynamic process, they can be classified into stable (SQREs) and unstable (USQREs) equilibriums. This extension resembles the extension from static Nash equilibriums (NEs) to evolutionary stable solutions in the framework of evolutionary game theory. The relation between SQREs and other solution concepts of games, including NEs and QREs, is discussed. Using experimental data from other published papers, we perform a preliminary comparison between SQREs, NEs, QREs and the observed behavioral outcomes of those experiments. For certain games, we determine that SQREs have better predictive power than QREs and NEs.
Citation: Zhuang Q, Di Z, Wu J (2014) Stability of Mixed-Strategy-Based Iterative Logit Quantal Response Dynamics in Game Theory. PLoS ONE 9(8): e105391. https://doi.org/10.1371/journal.pone.0105391
Editor: Luo-Luo Jiang, Wenzhou University, China
Received: May 14, 2014; Accepted: July 18, 2014; Published: August 26, 2014
Copyright: © 2014 Zhuang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. Please contact J.W. (firstname.lastname@example.org) for data from our own simulation. Experimental data are from the study of “Learning and equilibrium as useful approximations: Accuracy of prediction on randomly selected constant sum games” whose authors may be contacted at email@example.com.
Funding: This paper was supported by the Fundamental Research Funds for the Central Universities, China. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Game theory has become a powerful and popular tool in many sociological studies. Although several studies have questioned the predictive power of the Nash equilibrium (NE), it has been used as a primary game solution since it was first proposed. However, finding such NEs and refining them when multiple NEs exist are not easy tasks. Furthermore, it is of interest to know how, in experiments or real-life observations, one “preferred” NE emerges from all possible strategy profiles, particularly in a population that does not begin with an NE as the initial strategic state. This is the well-known question of learning in games and of convergence towards particular solutions.
To find game solutions with predictive power, in addition to first searching for all NEs and then refining them, dynamic processes have been proposed to describe, mimic or reproduce to a certain extent the strategic thinking processes of game players, in the hope that certain long-term solutions of the dynamic processes will lead to the “preferred” NEs. Well-known examples of such dynamic processes include replicator dynamics, Logit learning and fictitious play. In certain cases, a refined NE fits the experimental data well. We refer to such an NE as the preferred NE. In this case, a proposed evolutionary model is a good theory if it predicts that long-term solutions of the corresponding dynamic process converge to the refined NE. In other cases, no NE can explain the observed behavior in real experiments. In this case, a good theory means that long-term solutions of the proposed dynamic process can explain the observed behavior instead of the NEs. To simplify our terminology, we use the term preferred NE in both cases, that is, for the NE or the long-term solution that is capable of describing experimental or real-life observations. The primary goal of these typical dynamic processes, and thus of all of these theories, is to determine the preferred NE by solving for the long-term solutions of the proposed dynamic processes. For a dynamic process, two central topics should usually be discussed: how well experimental observations can be explained by long-term solutions of the dynamic process, and the relation between the dynamic process's long-term solutions and other solution concepts such as NEs and refined NEs.
In this manuscript, we study properties of a new dynamic process: the iterative Logit quantal response dynamics (ILQRD), which will be defined based on the concept of static Quantal Response Equilibriums (QREs). Our goal in proposing this new dynamic process is solely to capture the preferred NE with long-term stable solutions of ILQRD, which we denote as stable QREs (SQREs).
This manuscript is organized as follows. In this introduction, we first explain our main idea: the evolutionary process. In the next section, we define several notations and the dynamic process. There, we also compare the new dynamic process with other learning models and evolutionary processes in game theory. In the remainder of this manuscript, we discuss the two central topics introduced above for the ILQRD process: how well the long-term solutions fit experimental results and what the relation is between its long-term solutions and other solution concepts. Next, we illustrate the performance of our dynamic process using examples and provide an analytical proof of the major conclusions for the special case of 2×2 symmetrical games. Then, we compare our SQRE with a highly similar solution concept: the quantal response stable solutions (QRSS). After that, using SQRE, we re-analyze some collected experimental results. Finally, we summarize our main conclusions and discuss possible future research.
Notations and Definitions
To present our formulae in a compact form and to remain consistent with the notations and the terminology of statistical ensembles used in statistical physics, in this section, we introduce a matrix-based notation to represent a general N×M game. The key notation that differs from the conventional mathematical forms of game theory is the matrix representation of probability distributions and the payoff matrices of general N-player games. One may proceed directly to Eq. (18) and Eq. (19) and continue from there if learning this new notation presents an obstacle. Most of our expressions can be understood in terms of the conventional mathematics of game theory. However, we believe that they can be understood more conveniently using the new notation. Furthermore, the new notation is readily applicable to quantum games.
A new set of matrix-based notations
Here, we introduce a matrix-based notation for probability distributions such that the probability distribution of the strategic status of all players and the mathematical description of payoffs become matrices, for games with an arbitrary number of players and an arbitrary number of strategies. However, this new notation is not necessary to understand the work in this manuscript. One may skip this section and proceed directly to Eq. (18) and Eq. (19).
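Consider a 2×2 game with the following conventional form of payoff bi-matrix:

$$G = \begin{pmatrix} (G^{1}_{CC},\, G^{2}_{CC}) & (G^{1}_{CD},\, G^{2}_{CD}) \\ (G^{1}_{DC},\, G^{2}_{DC}) & (G^{1}_{DD},\, G^{2}_{DD}) \end{pmatrix} \tag{1}$$

with the convention that the row (column) strategies belong to the first (second) player and the first (second) number of each entry represents the payoff received by the first (second) player. We denote the strategies of the first player as C and D, and those of the second player also as C and D, although one player's strategies can be totally different from the other player's strategies. Usually, mixed strategies, which include pure strategies as special cases, are written as column vectors, for example,

$$\vec{p}^{\,1} = \begin{pmatrix} p_1 \\ 1 - p_1 \end{pmatrix} \tag{2}$$

for player 1 and

$$\vec{p}^{\,2} = \begin{pmatrix} p_2 \\ 1 - p_2 \end{pmatrix} \tag{3}$$

for player 2. The payoff of player $i$ is calculated from the following vector-matrix-vector multiplication,

$$E^{i} = \left(\vec{p}^{\,1}\right)^{T} G^{i}\, \vec{p}^{\,2}, \tag{4}$$

where $G^{i}$ is the matrix of player $i$'s entries in the bi-matrix.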
For 2×2 games, the payoff matrix G indeed appears as a matrix. However, for a general N×M game, the payoff becomes a map from N vectors to a real number, i.e., a cubic tensor for 3-player games and a T(N, 0)-type tensor for N-player games. The payoff is no longer a matrix.
One can confirm that with the same strategy profiles, Eq. (7) and Eq. (4) result in the same payoffs. For the special case of the 2×2 game, in the new formalism, the payoff matrices and the state matrix of all players are of dimension $2^2$. The probability distribution of both players is defined as the direct product of each player's state matrix:

$$\rho = \rho^{1} \otimes \rho^{2}. \tag{8}$$

Every entry of this state matrix corresponds to the probability of the players choosing the strategic combination defined by the position of the entry. For example, the (1, 1) (upper-left) entry of $\rho$ is p1p2, which is the probability that the two players take the strategic combination (C, C). In turn, the corresponding (1, 1) entries of the payoff matrices H1,2 are naturally the payoffs of the two players at (C, C). In this sense, from the general expression

$$E^{i} = \mathrm{Tr}\left(\rho\, H^{i}\right), \tag{9}$$

Hi can be interpreted as a linear map from the set of state matrices $\rho$ to the real numbers.
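The equivalence between the conventional payoff expression and the matrix-based one can be checked numerically. The following is a minimal Python sketch for a 2×2 game. The payoff numbers are hypothetical, and the diagonal forms assumed for the state matrices and for H are one reading of the notation described above, not formulas reproduced from the text.

```python
import numpy as np

# Hypothetical payoffs for illustration only (not taken from the paper).
# Rows: player 1 plays C or D; columns: player 2 plays C or D.
G1 = np.array([[3.0, 0.0],
               [5.0, 1.0]])   # payoffs to player 1
G2 = np.array([[3.0, 5.0],
               [0.0, 1.0]])   # payoffs to player 2


def bilinear_payoff(G, p1, p2):
    """Conventional form: (p1, 1-p1) G (p2, 1-p2)^T."""
    v1 = np.array([p1, 1.0 - p1])
    v2 = np.array([p2, 1.0 - p2])
    return float(v1 @ G @ v2)


def trace_payoff(G, p1, p2):
    """Matrix form: Tr(rho H) with rho the direct product of both states.

    Assumption: each player's state is the diagonal matrix diag(p, 1-p),
    and H is the diagonal matrix holding the payoffs of the strategy
    combinations (C,C), (C,D), (D,C), (D,D) in that order.
    """
    rho1 = np.diag([p1, 1.0 - p1])
    rho2 = np.diag([p2, 1.0 - p2])
    rho = np.kron(rho1, rho2)      # 4x4 joint state; (1,1) entry is p1*p2
    H = np.diag(G.reshape(-1))     # row-major order matches np.kron ordering
    return float(np.trace(rho @ H))


p1, p2 = 0.3, 0.8
e_bilinear = (bilinear_payoff(G1, p1, p2), bilinear_payoff(G2, p1, p2))
e_trace = (trace_payoff(G1, p1, p2), trace_payoff(G2, p1, p2))
print(e_bilinear, e_trace)
```

Both tuples should agree up to rounding, because for independent strategies the trace only re-orders the terms of the bilinear sum.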
Another useful feature of this notation system is that it streamlines the description of correlated strategies. That is, this notation also functions when . For an N×M game, stands for the lth strategy of the ith player, where and . The set of strategies of player i is denoted as . For convenience, we denote the set of probability distributions over as , i.e., and the direct product set of all of these probability distributions as , i.e., . This differs from the set of probability distributions over , which we denote as . This space includes the correlated strategy, whereas is the set of only independent strategies. For example, in this notation, a general possibly correlated equilibrium  can be defined as such that for every player i, ,(10)where is a partial trace, which performs the partial integral/summation over player i's strategy space. For example,(11)and the result is a strategy profile of player 2. This partial trace is the same as the partial summation in deriving partial distribution in probability theory. In our notation, NE is defined as or, equivalently, such that for ,(12)
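The partial trace and the difference between independent and correlated states can be illustrated with a short Python sketch for two players with two strategies each. The function name `partial_trace` and the example states are hypothetical, and the state matrices are assumed to be diagonal matrices of probabilities.

```python
import numpy as np


def partial_trace(rho, traced_player):
    """Trace out one player from a 4x4 two-player state matrix.

    The 4x4 matrix is viewed as a tensor with indices (i1, i2, j1, j2),
    where (i1, j1) belong to player 1 and (i2, j2) to player 2.
    """
    t = rho.reshape(2, 2, 2, 2)
    if traced_player == 1:
        return np.einsum('ijik->jk', t)   # sum over player 1, keep player 2
    if traced_player == 2:
        return np.einsum('ijkj->ik', t)   # sum over player 2, keep player 1
    raise ValueError("traced_player must be 1 or 2")


# Independent strategies: the joint state is a direct product.
rho1 = np.diag([0.3, 0.7])
rho2 = np.diag([0.8, 0.2])
rho_independent = np.kron(rho1, rho2)

# A correlated state: (C, C) and (D, D), each with probability 1/2.
rho_correlated = np.diag([0.5, 0.0, 0.0, 0.5])

marginal2 = partial_trace(rho_independent, 1)   # recovers rho2
marginal1 = partial_trace(rho_independent, 2)   # recovers rho1
corr_marginal2 = partial_trace(rho_correlated, 1)

# A correlated state is not the direct product of its own marginals.
is_product = np.allclose(
    rho_correlated,
    np.kron(partial_trace(rho_correlated, 2), corr_marginal2))
print(marginal1, marginal2, corr_marginal2, is_product)
```

Tracing out a player here is the same operation as summing a joint probability table over that player's strategies, which is why the independent state returns exactly the other player's state matrix.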
Definition of iterative Logit quantal response dynamics (ILQRD) and its stable equilibriums
Using the previously described matrix-based notation, our iterative Logit quantal response dynamics (ILQRD) for a 2-player game is defined by Eq. (13) and Eq. (14), where the reduced payoff matrices are defined by Eq. (15) and Eq. (16).
Using the matrix-based notations, one can straightforwardly extend this ILQRD to general N×M games.
To simplify our notation, we denote the RHS of Eq. (18) by a shorthand that depends on the parameters a, b, c and d, which can be omitted when it is clear what they refer to. Formally, we denote this map as in Eq. (20), and p1(t+1) can be regarded as an intermediate variable. This map is an iterative map from a mixed strategy (p2(t)) to a new mixed strategy (p2(t+1)).
The fixed points of this ILQRD are the same as the static Logit QREs. If such a fixed point is also a long-term solution of the ILQRD, it is referred to as a stable QRE (SQRE). Otherwise, it is referred to as an unstable QRE (USQRE). Next, we focus on the relations among pure-strategy NEs, mixed NEs, QREs, SQREs and experimental data. An SQRE must be a QRE. However, the converse is not necessarily true. This stability test potentially differentiates QREs into SQREs and USQREs. In principle, such differentiation can improve the predictive power of QREs for experimental data, and examining and demonstrating this is the whole point of the present manuscript.
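To make the iteration and the stability test concrete, here is a minimal Python sketch for a 2×2 game. It assumes the standard Logit form, in which a strategy is chosen with probability proportional to exp(β × expected payoff), and it uses the alternating order p2(t) → p1(t+1) → p2(t+1) described above. The payoff numbers are hypothetical, and classifying a fixed point by the slope of the one-dimensional map (magnitude below 1 means stable) is the textbook criterion for iterated maps, offered here as an illustration rather than as the authors' procedure.

```python
import math


def logit_choice(payoff_first, payoff_second, beta):
    """Probability of the first strategy under the Logit quantal response."""
    # Equivalent to exp(b*u1) / (exp(b*u1) + exp(b*u2)), written stably.
    x = beta * (payoff_first - payoff_second)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ilqrd_step(p2, G1, G2, beta):
    """One step p2(t) -> p1(t+1) -> p2(t+1) for a 2x2 game.

    G1[r][c] and G2[r][c] are the payoffs of players 1 and 2 when player 1
    plays row r and player 2 plays column c; p1 and p2 are the
    probabilities of the first strategy.
    """
    u1_first = p2 * G1[0][0] + (1 - p2) * G1[0][1]
    u1_second = p2 * G1[1][0] + (1 - p2) * G1[1][1]
    p1 = logit_choice(u1_first, u1_second, beta)
    u2_first = p1 * G2[0][0] + (1 - p1) * G2[1][0]
    u2_second = p1 * G2[0][1] + (1 - p1) * G2[1][1]
    return p1, logit_choice(u2_first, u2_second, beta)


def iterate(p2, G1, G2, beta, steps=500):
    """Run the map for a fixed number of steps and return (p1, p2)."""
    p1 = None
    for _ in range(steps):
        p1, p2 = ilqrd_step(p2, G1, G2, beta)
    return p1, p2


def slope_at(p2, G1, G2, beta, h=1e-6):
    """Central-difference derivative of the map p2(t) -> p2(t+1)."""
    up = ilqrd_step(p2 + h, G1, G2, beta)[1]
    down = ilqrd_step(p2 - h, G1, G2, beta)[1]
    return (up - down) / (2 * h)


# Hypothetical prisoner's-dilemma payoffs, rescaled so the maximum is 1.
G1 = [[0.6, 0.0], [1.0, 0.2]]
G2 = [[0.6, 1.0], [0.0, 0.2]]
beta = 3.0
p1_star, p2_star = iterate(0.5, G1, G2, beta)
slope = slope_at(p2_star, G1, G2, beta)
is_stable = abs(slope) < 1.0
print(p1_star, p2_star, slope, is_stable)
```

A fixed point found this way is a QRE by construction; whether it also counts as an SQRE depends on the stability check, which is the distinction drawn in the text.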
Difference between our evolutionary process and other learning/imitating models
In ILQRD, a key concept is the use of the quantal response function (as in Eq. (13) and Eq. (14)) to determine a player's strategy profile according to the player's corresponding payoffs. The introduction of the parameter β as a description of bounded rationality in this form of quantal response function is common in theories of learning in games and in the QRE concept. Additionally, this idea may be proposed simply from the viewpoint of statistical physics. The use of the parameter β can be justified to a certain degree based on games with limited information.
In fact, the same expression used in Eq. (13) and Eq. (14) has been used in discussions of Logit QRE, and a highly similar expression has been used in Logit learning, stochastic fictitious play and stochastic reinforcement learning. In these theories, the quantal response function, as in Eq. (13) and Eq. (14), is occasionally referred to as the smoothed best response. However, all of these theories differ from ours in principle, as explained below.
First, we compare our expressions with the QRE. In the QRE, Eq. (21) is a map from all players' strategy profiles to themselves. This equation is a fixed-point equation. Our work differs from the QRE in that a QRE only focuses on fixed points solved from static equations, whereas we use iterations to find the stable fixed points and distinguish them from other, unstable fixed points. Later, we will note that such a difference in stability is essential in applying the QRE to explaining experiments.
The QRE has been compared with experimental observation and generally provides a better fit to the data than the NE. However, the QRE has been criticized as an illusory improvement because it has an additional free parameter when fitting the curve, and one can always improve a fit using an additional parameter. We demonstrate that this free parameter is not completely free. In fact, in certain cases, when β is sufficiently large, the fixed points from the QRE are no longer stable. Therefore, when comparing experimental data with the stable and unstable QREs and the NEs, one can determine whether the QRE or the NE has more predictive power, and the QRE is then not always better than the NE. Distinguishing stable QREs from unstable QREs using iterative dynamics is this manuscript's first contribution. As presented below, we collected experimental data and conducted a preliminary comparison of the theories with experimental observations. After distinguishing SQREs from QREs, we tested SQREs, QREs and NEs against several experiments reported in the literature, which is this manuscript's second contribution. One possible further investigation along this line, which has not been implemented in this work, is to apply our ILQRD to cross-game experiments. For example, using the same players in different games with similar levels of payoffs, we can estimate the parameter β from one game and test it in other games. In principle, this should be even more interesting than simply testing SQREs against experimental results.
In a dynamic QRE, the so-called Logit learning, which is driven by observations during real game-playing processes in which each player chooses only one pure strategy to play in every turn, the same smoothed best-response function is used to mimic the player's response to the opponent's pure strategy, as in Eq. (22).
If the smoothed best-response function is replaced by the true best-response function, this becomes simply the best-response dynamics. Our model differs from this one in that the states of players can be mixed strategies in our model, whereas they can only be pure strategies in Logit learning.
In fact, this smoothed best-response function defines a probability transition matrix between the current strategy profiles of player i and the previous strategy profiles of all of the other players. This transition probability depends not on player i's previous state si(t−1) but on the previous states of all of the other players s−i(t−1). For simplicity, we express this probability as in Eq. (23).
If we let each player take his or her turn in the natural order, we will have a transition matrix between the current and the previous strategy profiles of all of the players. For simplicity, we denote this matrix as in Eq. (24).
The first rule is referred to as alternating updating, whereas the second rule is referred to as simultaneous updating. Regardless of the form assumed, the central task is to determine the invariant probability distribution of the transition matrix M, as in Eq. (27).
To distinguish fixed-point solutions of this transition matrix from a QRE, these long-term solutions are occasionally referred to as end results of Logit response dynamics. Here, we name such solutions quantal response stable solutions (QRSSs). There have been many attempts to solve or characterize such a QRSS (Pss) for a given transition matrix M. Because a QRE and a QRSS use similar formulae, with the former allowing mixed strategies and the latter allowing only pure strategies, in principle, the two should be closely related. However, in this work, we will demonstrate that this is generally not the case: QRSSs differ substantially from QREs and SQREs. Differentiating a QRSS from a QRE and an SQRE is this manuscript's third contribution.
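The invariant distribution of such a transition matrix can be computed directly for a 2×2 game. The sketch below is a hedged illustration in Python: it assumes simultaneous updating, in which each player gives a Logit response to the other player's previous pure strategy, and it uses hypothetical coordination-game payoffs. The function names are illustrative and do not come from the text.

```python
import numpy as np


def softmax(u, beta):
    """Logit choice probabilities for a vector of payoffs u."""
    z = beta * (u - np.max(u))    # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()


def transition_matrix(G1, G2, beta):
    """Column-stochastic matrix M[new, old] under simultaneous updating.

    Pure profiles (s1, s2) are indexed as 2*s1 + s2. G1[r, c] and G2[r, c]
    are the payoffs when player 1 plays r and player 2 plays c.
    """
    M = np.zeros((4, 4))
    for s1 in range(2):
        for s2 in range(2):
            q1 = softmax(G1[:, s2], beta)   # player 1's next move given s2
            q2 = softmax(G2[s1, :], beta)   # player 2's next move given s1
            for n1 in range(2):
                for n2 in range(2):
                    M[2 * n1 + n2, 2 * s1 + s2] = q1[n1] * q2[n2]
    return M


def stationary_distribution(M):
    """Invariant distribution P with M P = P (eigenvalue-1 eigenvector)."""
    vals, vecs = np.linalg.eig(M)
    k = np.argmin(np.abs(vals - 1.0))
    p = np.real(vecs[:, k])
    return p / p.sum()


# Hypothetical coordination-game payoffs, rescaled so the maximum is 1.
G1 = np.array([[0.2, 0.0], [0.0, 1.0]])
G2 = np.array([[0.2, 0.0], [0.0, 1.0]])
M = transition_matrix(G1, G2, beta=3.0)
P_ss = stationary_distribution(M)
print(P_ss)
```

Note that the result is a single probability distribution over the four pure profiles, which is a different kind of object from the pair (p1, p2) produced by the mixed-strategy iteration.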
A continuous-time smoothed best-response dynamic, Eq. (28), written for simplicity for a single-population symmetric game, seems to be quite similar to our model, since its discrete version, after setting the rate of strategy revision to 1, is Eq. (29).
Here, the notation of the cited work is used, in which the state variable refers to the portion of the population in strategic state i, and the smoothed best-response function can take exactly the exponential form of Eq. (18). In this work, we focus on the stability of the fixed points of this discrete dynamic, which we have not seen discussed in the literature; the stability of the continuous-time counterpart has been discussed previously. Furthermore, this stability analysis is linked to the stability of QREs and thus distinguishes unstable QREs from stable QREs, and this link, as far as we know, has not been investigated by others.
Next, we focus on a comparison between ILQRD and fictitious play. In fictitious play, players update their beliefs and choose a pure strategy to play according to certain decision-making rules that relate their strategy choice to their beliefs. Such decision-making rules typically include, for example, the myopic best response and the smoothed best response. The latter uses the same expression as Eq. (21) with only one difference: p−i, the true current strategy profile of the other players, is replaced by player i's belief regarding the strategy profiles of the others, which is usually taken to be the empirical distribution deduced from the entire history of the other players' choices. In Eq. (21), Eq. (13) and Eq. (14), there is no belief and no empirical distribution. When we examine the learning process in real life, it may seem more reasonable to take player beliefs into consideration. However, as we have previously noted, we are more concerned with finding the proper solutions, that is, solutions capable of predicting experimental behavioral outcomes, than with making the entire dynamic process meaningful. Because fictitious play extracts the empirical distribution of the other players' strategy profiles from history, the speed of convergence occasionally becomes a problem. As discussed below, in ILQRD, convergence speed is never an issue.
A similar relation holds between ILQRD and stochastic reinforcement learning. Although the functional forms of the response in the two models are quite similar, our ILQRD relies only on the current state of all players, whereas the reinforcement learning model takes some or all previous actions and payoffs into consideration. In reinforcement learning, every player keeps a record of scores for every potential strategy, whereas in ILQRD, only the previous mixed strategy is used in deciding the current strategy.
Another widely used dynamic process to determine the preferred NE is replicator dynamics. All of the previously mentioned static mappings and dynamics are based on introspective thinking and thus differ from replicator dynamics, where each player plays against a finite or infinite population and individuals learn by simple imitation rather than by individual introspective thinking. In this manuscript, we focus on the effects of introspective thinking of only two players and not on a model of population dynamics.
We should note that the same notion of ILQRD (referred to as Boltzmann iteration) was proposed in a 2004 unpublished working paper by one of the authors in a quantum game context. The idea is not a central topic in that working paper and was not developed any further there.
Additionally, a highly similar dynamic process was proposed as a concept referred to as noisy rational strategies (NRS), as in Eq. (30).
There, the authors focus on the effect of increasing β () and assume so that becomes irrelevant. According to the authors, such an increase in β can be interpreted as the increasing difficulty of performing a greater number of iterations given a player's finite capability . In this sense, what we discuss in this paper bears a greater resemblance to the following:(31)with a constant β, i.e., βl = β. We do not assume that β is increasing or decreasing, or that or . We do not believe that it is more difficult to perform more iterations because all iteration processes are supposed to be fictitious. We will demonstrate that essentially none of the desired features of SQRE relies on details concerning orders of β or increasing or decreasing values of β, instead depending on iteration. The iteration alone suffices to lead us to the preferred NE. Furthermore, we have not found a thorough discussion of the stability of NRS solutions. Thus, this manuscript can be regarded as a further development of the NRS in that it distinguishes stable from unstable solutions and notes that the key component is the iteration and not the order of β or the assumption of the limit of βn approaching .
In sum, the proposed iterative process differs from many other theories in that it is a map from a mixed strategy of all of the players to a new mixed strategy of all of the players. One might have some questions with respect to the interpretation of this process and its comparison with real game playing. However, in physics, it is natural to study the evolution of distribution functions directly instead of the evolution of individual trajectories. Additionally, we do not aim to make the process reflect reality more closely, but only to make the long-term solution more capable of predicting the game outcomes. Next, we discuss certain features of the proposed iterative process and compare its solutions to observed behavioral outcomes of experiments.
Major features of ILQRD, illustrated through examples
In this section, first, we demonstrate by example that the QRE covers all NEs, including pure and mixed NEs. This conclusion is not new and has been implicitly demonstrated in the literature. For a 2×2 game, this statement can be proved by considering Eq. (18) and Eq. (19) in the extreme case of β → ∞. We present a general proof. Second, we demonstrate that once there is a preferred pure NE in a game, our SQRE converges to this focal NE. This phenomenon serves as a natural refinement. Unfortunately, we have not proved this conclusion mathematically. Thus, we illustrate this outcome by examples. Third, we demonstrate that all QREs that correspond to mixed NEs are unstable for large β (β → ∞) but that some of these QREs can be stable for finite β. The third conclusion first questions the predictive power of QREs (when they correspond to mixed NEs) and then redeems the QRE as a possibly applicable solution when bounded rationality is considered. This conclusion also distinguishes stable QREs from unstable QREs, which enables examination of the applicability of QREs to the explanation of experimental observations or real-life phenomena. That is, in principle, it is no longer true that QREs are strictly better than NEs. If the experimental data of a game are located in the region of unstable QREs, the QRE is not a practical solution concept for the game because unstable solutions are not reachable by following the evolution. As far as we know, the last two conclusions are new: first, that SQREs converge to preferred NEs when there are such NEs and thus represent a natural refinement of NEs, and second, that QREs that correspond to mixed NEs become unstable for large enough β. This also implies that mixed NEs are not directly applicable, since they are in a sense always unstable in the limit of large β (also according to best-response dynamics), unless these mixed NEs are close to SQREs. We believe this also improves our understanding of mixed NEs.
Games with two pure NEs and a mixed NE: Coordination Game and Hawk-Dove Game
It can be demonstrated that for games with a dominant strategy, such as the prisoner's dilemma, both one of the QREs and the SQRE converge toward the dominant strategy in the large β limit. However, this case is trivial. We demonstrate the behavior of our ILQRD starting from the more interesting coordination game, which does not have any dominant strategies. The payoff matrices of the coordination game are given in Eq. (32).
It is known that this game has two pure-strategy NEs and a mixed NE. They are (p1, p2) = (0, 0), (1, 1) and (0.83, 0.83), which are denoted as PNE00, PNE11 and MNE, respectively. Conventionally, the preferred NE of this game, PNE00, can be found to be the focal NE through refinement. Evolutionary stability analysis indicates that the mixed NE is unstable. That is, when the initial state lies below (above) the MNE, the population converges to PNE00 (PNE11). Because the β we introduced has no absolute meaning, in all of the manuscript's remaining calculations, we normalize each player's payoffs by their own maximum. For the payoffs provided in Eq. (32), the maximums are 5 and 5 for the first and second player, respectively.
Fig. 1 shows iterative mappings of this coordination game for a range of values of β. Each curve except the red curve (the identity line p1 = p1, i.e., output equal to input) is the iterative mapping for a given value of β. The long arrow indicates the shift of the iterative-mapping curve when β increases. As an example, we also illustrate the first two steps of the iterative process for a specific value of β = 3.1. Usually, it takes fewer than 20 iterations to find the SQREs with reasonable accuracy starting from any initial value of p1. We can observe that for small β, there is only one QRE, which corresponds to PNE00 (denoted as QRE00). For large β, there are multiple QREs: QRE00, a QRE that corresponds to PNE11 (denoted as QRE11) and a QRE that corresponds to MNE (QREMNE). However, not all of these QREs are equally good. One can observe that for this game QRE00 and QRE11 are always stable, whereas QREMNE is always unstable. Here, stability means that if the initial guess is not exactly at the QRE, one iteration step will drive the value of p1 closer to the QRE. Those QREs that are stable in this sense are referred to as SQREs. To simplify our terminology, we will refer to the QREs that correspond to pure (mixed) NEs in the limit of β → ∞ as pure (mixed) QREs, although all QREs for all finite β are in fact mixed strategies. The same naming scheme and the same notation are used for SQREs, for instance, pure SQRE00, pure SQRE11 and mixed SQREMNE.
For a given value of β, the intersections of the corresponding curve and the identity line p1 = p1 are the QREs. For this game, there is only one QRE for small values of β, whereas there are three QREs (QRE00, QRE11 and QREMNE) for larger β. Here, QRE00 and QRE11 are always stable, whereas QREMNE, which is close to p1 = 0.83, is always unstable. As an example, we illustrate the first two steps of the iterative process for a specific value of β = 3.1.
In principle, all of the information on the QREs and SQREs of this game is included in Fig. 1. However, to better illustrate the stability of QREs, in Fig. 2, we plot the dependence on β of the stability of the QREs of the coordination game. From the lower left section, where QRE00 overlaps with SQRE00, we can observe that for small β (β<βc, which here is approximately 22), there is only one QRE, and it is an SQRE. For all initial values of p1, this SQRE is the only long-term state. When β>βc, there are three QREs: QRE00, QRE11 and QREMNE. However, QREMNE is unstable. For initial values of p1 less than the unstable QRE (the green line, QREMNE), the iteration results in SQRE00 (the filled circles). Otherwise, SQRE11 (the empty circles) becomes the iteration's long-term solution. The corresponding p2 (not shown in the figure) can be straightforwardly calculated using Eq. (19). Note from the upper right section that the region between SQRE11 (the empty circles) and QREMNE (the green line) is narrow compared with the space between SQRE00 (the filled circles) and QREMNE (the green line). This outcome indicates that for a wide range of initial values of p1, the iteration converges toward SQRE00, which corresponds to PNE00, the preferred NE in this game. This picture, particularly the right half of Fig. 2, is highly similar to results from evolutionary stability analysis. However, the behavior at small β, where the SQRE is always the QRE that corresponds to PNE00 no matter what the initial value of p1 is, is a unique result of our ILQRD. We believe that this unique SQRE for small β can be regarded as a refinement of NEs.
When β<βc, which here is approximately 22, there is only one QRE (QRE00), and it is an SQRE. When β>βc, there are three QREs. However, QREMNE is unstable. For initial values of p1 less than the p1 of QREMNE (the green line), the iteration results in QRE00 (the pink square; thus, SQRE00 in this case). Otherwise, QRE11 becomes the long-term solution of the iteration, i.e., SQRE11 (the gold circle). The corresponding p2 (not shown) can be straightforwardly calculated.
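The dependence of the long-term state on β and on the initial value of p1 can be reproduced qualitatively with a short Python sketch. The payoffs below are an assumption: a symmetric coordination game with payoffs 1 and 5 on the diagonal and 0 elsewhere, chosen because it has pure NEs at p = 0 and p = 1, a mixed NE at p = 5/6 ≈ 0.83 and a maximum payoff of 5. It is not claimed to be the matrix of Eq. (32), and the function names are illustrative.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# ASSUMED symmetric coordination game, not taken from Eq. (32).
# A[r][c] is the payoff for playing r against c, divided by the maximum
# payoff 5. Strategy 0 is played with probability p, so p = 0 is the
# equilibrium with payoff 5 and the mixed NE sits at p = 5/6.
A = [[1.0 / 5.0, 0.0],
     [0.0, 5.0 / 5.0]]


def response(p_other, beta):
    """Logit response: probability of strategy 0 given the opponent's p."""
    u0 = p_other * A[0][0] + (1.0 - p_other) * A[0][1]
    u1 = p_other * A[1][0] + (1.0 - p_other) * A[1][1]
    return sigmoid(beta * (u0 - u1))


def composite(p1, beta):
    """One full iteration p1(t) -> p2 -> p1(t+1)."""
    return response(response(p1, beta), beta)


def long_term_p1(p1, beta, steps=200):
    """Iterate the composite map and return the final p1."""
    for _ in range(steps):
        p1 = composite(p1, beta)
    return p1


def count_qres(beta, grid=20000):
    """Count crossings of the composite map with the identity line."""
    count = 0
    prev = composite(0.0, beta) - 0.0
    for k in range(1, grid + 1):
        p = k / grid
        cur = composite(p, beta) - p
        if prev * cur < 0:
            count += 1
        prev = cur
    return count


small_beta, large_beta = 5.0, 40.0
n_small = count_qres(small_beta)             # number of QREs at small beta
n_large = count_qres(large_beta)             # number of QREs at large beta
low_basin = long_term_p1(0.5, large_beta)    # start below the mixed QRE
high_basin = long_term_p1(0.95, large_beta)  # start above the mixed QRE
print(n_small, n_large, low_basin, high_basin)
```

Under these assumptions, the crossing count changes from one to three as β grows, and the two starting points end up near opposite ends of the interval, which mirrors the picture described for Fig. 2.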
In this game, we observe that QREs cover all NEs and that for a wide range of initial values of p1, SQRE00 is the SQRE, and it corresponds to the preferred NE (p1 = 0). In particular, when β<βc, the preferred NE is the only SQRE for all initial values of p1. Even for large β, the green line (QREMNE) is closer to the empty circles (QRE11) than to the filled circles (QRE00). Thus, for a wide range of initial values of p1, SQRE00 will be the long-term solution of the iterative process. In this sense, ILQRD and its SQRE represent a natural refinement for QREs and NEs. Additionally, we note that for this game, overall, our prediction is somewhat similar to that of evolutionary stability analysis. Next, we discuss another game, in which our prediction differs more significantly from the results of evolutionary stability analysis than it does in this coordination game.
Now, we consider the hawk-dove game, which according to evolutionary game analysis  has one evolutionary stable mixed NE and two evolutionary unstable pure NEs. Its payoff matrices are defined as follows:(33)
Based on a calculation of iterative mappings similar to that shown in Fig. 1, the QREs and SQREs of the hawk-dove game for various values of β are plotted in Fig. 3. We observe that for small β, there is only one QRE(QREMNE) and it is a SQRE. This SQRE is the long-term solution of the ILQRD for all of the initial values of p1. For large β, there are three QREs: QRE01, QRE10 and QREMNE. However, in this case, QREMNE is unstable. The long-term solution of the ILQRD depends on the initial values of p1. When the initial values is above p1 of QREMNE (the green line), QRE10 is the SQRE. Otherwise, QRE01 is the SQRE. This figure provides more information than the QREs and NEs by distinguishing stable QREs from unstable ones. Additionally, this figure differs from the results of the evolutionary stability analysis, which demonstrates that the mixed NE of this game is evolutionary stable. Our results suggest that it is stable only when β<βc, which is approximately 10 for this game. Otherwise, the outcome of this game will prefer to be QRE01 or QRE10. This outcome differs from the results of evolutionary game analysis of the same game. For small β, the QREMNE is the only SQRE, whereas for large β, the mixed QRE becomes unstable: depending on the initial value of p1, different SQREs can be reached.
When β<βc, which is approximately 10 for this game, the long-term solution of the ILQRD of the hawk-dove game is SQREMNE for an arbitrary initial value of p1. When β>βc, there are three QREs: QRE01, QRE10 and QREMNE, of which QREMNE is unstable. In this case, the SQRE depends on the initial value of p1: when it is above the p1 of QREMNE (the green line), the SQRE is SQRE10 (the pink square); otherwise, it is SQRE01 (the gold circle).
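The dependence of the long-term solution on β and on the initial condition can be illustrated with a minimal sketch. It assumes that each ILQRD step is the simultaneous Logit response of each player to the opponent's current mixed strategy, and it uses hypothetical hawk-dove payoffs (not the matrices of Eq. (33)). For these assumed payoffs the mixed NE is p1 = p2 = 0.5 and the mixed fixed point loses stability at β = 2, not at the value quoted above. The names `ilqrd`, `A` and `B` are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical hawk-dove payoffs (an assumption, not Eq. (33)).
# A[i][j]: payoff to player 1 when player 1 plays i and player 2 plays j
# (0 = hawk, 1 = dove); B[i][j]: payoff to player 2 for the same profile.
A = [[-1.0, 2.0], [0.0, 1.0]]
B = [[-1.0, 0.0], [2.0, 1.0]]

def ilqrd(A, B, beta, p1, p2, steps=500):
    """Iterate the assumed simultaneous Logit response map.

    p1 and p2 are the probabilities that players 1 and 2 play strategy 0.
    """
    for _ in range(steps):
        # Expected payoff advantage of strategy 0 over strategy 1.
        d1 = (A[0][0] - A[1][0]) * p2 + (A[0][1] - A[1][1]) * (1 - p2)
        d2 = (B[0][0] - B[0][1]) * p1 + (B[1][0] - B[1][1]) * (1 - p1)
        p1, p2 = sigmoid(beta * d1), sigmoid(beta * d2)
    return p1, p2

# Small beta: every start converges to the mixed fixed point.
low = ilqrd(A, B, beta=1.0, p1=0.9, p2=0.2)
# Large beta: the outcome depends on the side of the mixed point we start on.
high_a = ilqrd(A, B, beta=5.0, p1=0.6, p2=0.4)
high_b = ilqrd(A, B, beta=5.0, p1=0.4, p2=0.6)
print(low, high_a, high_b)
```

In this sketch the initial conditions for large β are chosen asymmetrically (p1 above and p2 below the mixed point, or the reverse); how the initial value of p2 is chosen in the original simulations is not specified here.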
Games with a unique mixed NE: Tennis Game
The third example is the tennis game, which has one mixed NE (p1, p2) = (0.7, 0.6) but no pure NEs. The payoff matrices are given as follows: (34)
Both the QRE and SQRE are shown in Fig. 4. We find that for this game there is always one and only one QRE for a given value of β and that this QRE converges toward the mixed NE (0.7, 0.6) in the limit of β→∞. Therefore, this QRE is QREMNE. However, the SQRE follows this QREMNE only when β is small enough (β≤βc, which for this game is approximately 3.7). When β>βc, this QREMNE becomes unstable; thus, there is no longer any SQRE. This game displays a substantial difference between the SQRE and the mixed NE. Such games can be used to test the applicability of the SQRE relative to the NE, as shown in the following section.
From (a), we observe that there is only one QRE (QREMNE: the green line) for each given value of β. There is one SQRE (SQREMNE: the green circle) when β<βc, which is approximately 3.7 for this game, and there is no SQRE for β>βc. (b) shows p1 and p2 as functions of β. The green curve of triangles represents the values of (p1, p2) after 1000 iterations of such divergent dynamics, i.e., the unstable run-away points. We observe again that when β>βc, QREMNE is not reached by the iteration and is therefore not an SQRE, although it converges toward the NE.
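The loss of stability of a unique mixed QRE can be sketched in the same spirit. The example below assumes the simultaneous Logit update and uses winning probabilities 0.5, 0.8, 0.9 and 0.2 for player 1, with player 2 receiving the complement. These values reproduce the mixed NE (0.7, 0.6), but they are an assumption and need not coincide with Eq. (34) or give the same βc. The helper `last_step` is hypothetical: it reports how far the state still moves after many iterations, so a value near zero indicates convergence and a large value indicates run-away dynamics.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed payoffs: A[i][j] is player 1's winning probability when player 1
# plays i and player 2 plays j; player 2 receives 1 - A[i][j].
A = [[0.5, 0.8], [0.9, 0.2]]

def step(p1, p2, beta):
    """One simultaneous Logit response step for both players."""
    d1 = (A[0][0] - A[1][0]) * p2 + (A[0][1] - A[1][1]) * (1 - p2)
    # Player 2's advantage of column 0 over column 1, using payoffs 1 - A.
    d2 = (A[0][1] - A[0][0]) * p1 + (A[1][1] - A[1][0]) * (1 - p1)
    return sigmoid(beta * d1), sigmoid(beta * d2)

def last_step(beta, p1=0.5, p2=0.5, steps=1000, tail=4):
    """Largest single-step change in (p1, p2) after a long burn-in."""
    for _ in range(steps):
        p1, p2 = step(p1, p2, beta)
    largest = 0.0
    for _ in range(tail):
        q1, q2 = step(p1, p2, beta)
        largest = max(largest, abs(q1 - p1), abs(q2 - p2))
        p1, p2 = q1, q2
    return largest

print(last_step(1.0))   # the iteration settles on the mixed QRE
print(last_step(10.0))  # the iteration keeps moving: no stable QRE
```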
From these examples, we observed the following: first, QREs exist for all of the games discussed above, and QREs cover all of the NEs in the limit of β→∞; second, for games with a preferred NE, the SQRE can be used as a refinement of the NEs; and finally, mixed QREs become unstable for large enough β values; thus, the SQRE can be regarded as a refinement of the QREs (and therefore the NEs). These observations reflect the three main features of our ILQRD. As we pointed out earlier, the first feature was implicitly demonstrated in . However, the latter two features are new. For certain games, the distance between the SQRE and the mixed NE is large. Thus, experimental results on such games are suitable for testing the applicability of the SQRE relative to the NE. Before we proceed to a comparison of our theoretical prediction with experimental results, we prove the first and the third features of our ILQRD. Unfortunately, we cannot prove the second feature because at present we do not know the necessary and sufficient condition for a game to have a preferred NE. Instead, we simply demonstrate that for the coordination game and the hawk-dove game, for small β, the SQRE corresponds to a certain refinement: SQRE00 for the former game and SQREMNE for the latter.
Proof of main conclusions on 2×2 symmetric games
In this section, while considering a symmetric game for simplicity, we wish to prove the previously described three features using examples.
First, there are five possible NEs: the pure NEs (0, 0), (1, 0), (0, 1), (1, 1) and a mixed NE, depending on the values of a, b, c and d. Let us first demonstrate that the QREs cover all of the NEs under proper conditional relations among a, b, c and d. That is, Eq. (35) has five possible solutions: QRE00(β), QRE10(β), QRE01(β), QRE11(β) and QREMNE(β), which correspond to the previously mentioned five NEs in the limit of β→∞. In terms of these notations, we wish to demonstrate that (36)
We will prove the first and the last cases; the extensions to the other cases are trivial.
These three conditions mean that f(0; β) is always greater than 0 but approaches 0 as β increases. Additionally, when β is sufficiently large, f(p; β) increases near p = 0, but more slowly than p1. This statement is equivalent to stating that f(0; β)−0>0, and there is a such that when β is sufficiently large. Thus, there must be a p*(β) such that f(p*; β)−p* = 0. In the limit of β→∞, p*(β) = 0. The situation for p2 can be analysed similarly.
If we can prove this point, then, first, p* is a fixed point in the limit of β→∞. Second, this fixed point is not a maximum or minimum of the function f(p; β)−p. The latter means that the curve f(p; β)−p passes across 0 when β is sufficiently large and 0 is not an extremum.
For p(t) that is close enough to p* such that 0<p(t)<1, the limit of the RHS of Eq. (42) when β→∞ becomes exactly p*.
Up to this point, we have demonstrated that the QREs of Eq. (35) cover all of the possible pure strategy NEs and the mixed NEs. Next, we discuss their stability. We wish to demonstrate that any pure QREs, if they exist, are always SQREs and that mixed QREs are SQREs for small β but unstable for large β. Additionally, we attempt to define βc, the critical value of β.
Stable solutions are a subset of fixed points. For the iterative mapping defined in Eq. (35), we use linear stability analysis, according to which a solution of Eq. (35) is stable if the absolute value of the Jacobian (simply a derivative in this case) of the right-hand side is less than 1, i.e., (43) where p*(β) is defined as in (44)
For pure QREs, for example, QRE00, p*(β) decreases to 0 exponentially as β→∞. Therefore, . Thus, Eq. (43) is always satisfied. Other pure QREs can similarly be shown to be stable. Thus, all of them are SQREs.
Therefore, mixed QREs are SQREs in the limit of β→0. Now, consider the case of . Because is a finite number, there is always a βc such that when β>βc, Eq. (43) is no longer valid. βc, which is the critical value of β, is defined as follows: (46)
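To make the origin of βc concrete, the following worked sketch assumes that the symmetric map of Eq. (35) has the one-dimensional Logit form written below, where Δ(p) denotes the expected payoff advantage of the first strategy against an opponent who plays it with probability p. The explicit form of f, the symbol Δ, and the labelling of the payoffs (a and b for the first strategy against the opponent's first and second strategy, c and d for the second strategy) are assumptions made for illustration; only the stability criterion itself is taken from the text.

```latex
% Assumed Logit map for a symmetric 2x2 game
f(p;\beta) = \frac{1}{1 + e^{-\beta \Delta(p)}}, \qquad
\Delta(p) = (a - c)\,p + (b - d)\,(1 - p).

% Derivative of the map: for a logistic function, f' = \beta \Delta' f (1 - f)
\frac{\partial f}{\partial p}
  = \beta\,\Delta'(p)\, f(p;\beta)\,\bigl[1 - f(p;\beta)\bigr],
\qquad \Delta'(p) = a - c - b + d.

% At a fixed point p^* = f(p^*;\beta), linear stability requires
\beta\,\lvert a - c - b + d \rvert\; p^{*}(\beta)\,\bigl[1 - p^{*}(\beta)\bigr] < 1 .
```

Under this assumption, a pure QRE has p*(1−p*) decaying exponentially as β grows, so the criterion holds for all β, whereas at a mixed QRE p*(1−p*) stays finite, so the left-hand side grows roughly linearly with β and reaches 1 at a finite βc.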
For a given game with fixed values of a, b, c and d, the value of βc can be obtained numerically by solving Eq. (44) and Eq. (46) together. In Fig. 5, we plot all βc found in numerical simulation of ILQRD (denoted as ) against the βc solved from Eq. (44) and Eq. (46) (denoted as ). We found that the two values agree well with each other.
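Solving the fixed-point condition and the marginal-stability condition together can be sketched with two nested bisections. The code assumes a one-dimensional Logit map f(p; β) = 1/(1 + exp(−β[(a−c)p + (b−d)(1−p)])) for a symmetric game, which may differ from the exact form of Eq. (35), and it assumes a−c−b+d < 0 so that the fixed point is unique. The payoff values and the function names are hypothetical; for the payoffs chosen here the mixed fixed point is p* = 0.5 and the slope is β/2, so the numerical result can be checked by hand.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fixed_point(beta, a, b, c, d, iters=60):
    """Solve p = f(p; beta) by bisection.

    Assumes a - c - b + d < 0, so f(p) - p is strictly decreasing on [0, 1].
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = sigmoid(beta * ((a - c) * mid + (b - d) * (1 - mid)))
        if f_mid - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def slope(beta, a, b, c, d):
    """|df/dp| at the fixed point, using f' = beta * Delta' * f * (1 - f)."""
    p = fixed_point(beta, a, b, c, d)
    return abs(beta * (a - c - b + d) * p * (1 - p))

def critical_beta(a, b, c, d, lo=1e-6, hi=100.0, iters=200):
    """Bisection on beta for slope = 1.

    Assumes the slope is below 1 at `lo`, above 1 at `hi`, and crosses once.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid, a, b, c, d) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical hawk-dove payoffs: a = -1, b = 2, c = 0, d = 1.
beta_c = critical_beta(-1.0, 2.0, 0.0, 1.0)
print(beta_c)
```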
For asymmetric games, a similar βc can be derived. For simplicity, let us define . Then, the critical value of β is determined by (47) (48) (49) For all the examples used in the previous section, and also for the payoff matrices of the experiments discussed in the next section, we plot the values of βc solved numerically from these equations against the corresponding values found from simulations. The two sets of values are in very good agreement.
We have demonstrated that for symmetric 2×2 games, our ILQRD has the following characteristics: (1) the QREs cover all of the pure and mixed NEs, and (2) pure QREs are SQREs, whereas mixed QREs are SQREs for small β but unstable for sufficiently large β. These conclusions cover all the major features that we demonstrated in the preceding section using specific examples. A general proof for general 2×2 games should be straightforward, although it would involve more tedious algebra.
From this general proof, we have observed that the QRE exists for all values of β, whereas the SQRE does not. Intuitively, when β increases, the QREs approach NEs because our iteration process moves closer to the best response dynamics. However, QREs might lose their stability when β is sufficiently large. The difference between the SQRE and the NE depends on the competition between these two effects: approaching the NE and losing stability. This difference provides a means of examining the applicability of the QRE. The QRE has been criticized because its fixed points, which are what we refer to as QREs, can always outperform the mixed NE as a result of the free parameter β. We agree with this statement. Therefore, the fact that experimental data are closer to the QREs than to the mixed NEs does not imply that the QRE is a better solution concept than the NE. However, this criticism does not hold for our SQRE, which loses its stability for larger β, indicating that our SQRE cannot always outperform mixed NEs. By assessing whether the experimental data points are closer to our SQRE than to the mixed NEs, we can compare the SQRE solution concept with the NE.
Difference between QRSS and SQRE
QRSS, which is occasionally referred to as Logit response dynamics, starts from an arbitrary strategy profile for each player and then uses the transition matrix defined in Eq. (22) to evolve the strategic states of all of the players into the invariant distribution defined in Eq. (27). Its stationary states have been discussed by several researchers. However, there is no full study on the necessary and sufficient conditions for the convergence of the QRSS to NE. In this section, we demonstrate that although Eq. (22) appears similar to Eq. (21), Pss as defined in Eq. (27) differs substantially from . As discussed below, the difference is that Pss is a distribution in which is the set of all of the possible distribution functions on S, whereas is a member of , which is the set of independent distribution functions. That is, Pss includes possibly correlated strategies, whereas describes only purely non-cooperative strategies.
Now, we demonstrate this statement using one example: a 2×2 symmetric game with payoff matrices: (50)
This game is a potential game. It has two strict NEs and a proper mixed NE. According to  and , the corresponding Logit response dynamics has an invariant distribution of strategy profiles that correspond to the potential maximizer and the proper mixed NE. Here, we show that the invariant distribution in fact does not correspond to the proper mixed NE. It involves correlations between players. First, we calculate the probability transition matrix M and then obtain the Pss of this game according to Eq. (27). From the Pss, we find the reduced strategy profile P1 for player 1 and P2 for player 2: (51)
The correlation index is zero if and only if the joint distribution Pss is a product of two independent probability distributions. In Fig. 6, we plot the correlation index C calculated for the QRSS. We find that the correlation index of the QRSS is always non-zero except in cases of extremely small β values. The SQRE, by contrast, is uncorrelated by construction: its joint distribution is defined as a product of independent distributions, so its correlation index remains 0.
The correlation index of the QRSS is always non-zero except, again, in cases of extremely small β.
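How an invariant distribution over strategy profiles can carry correlations is illustrated by the following minimal sketch. It rests on several assumptions that may differ from the paper's definitions: the Markov chain uses one-at-a-time (asynchronous) Logit revision, which may not be the transition matrix of Eq. (22); the coordination game is hypothetical and is not Eq. (50); and the correlation index is taken to be the total absolute deviation of the joint distribution from the product of its marginals. For an exact potential game under this revision rule, the stationary distribution is known to be proportional to exp(βΦ), where Φ is the potential, which gives a check on the computation.

```python
import math
from itertools import product

# Hypothetical coordination (potential) game, not Eq. (50):
# both players receive 2 at (0, 0), 1 at (1, 1) and 0 otherwise.
U1 = [[2.0, 0.0], [0.0, 1.0]]  # U1[s1][s2]: payoff of player 1
U2 = [[2.0, 0.0], [0.0, 1.0]]  # U2[s1][s2]: payoff of player 2

STATES = list(product((0, 1), repeat=2))

def transition_matrix(beta):
    """Assumed dynamics: one randomly chosen player revises by Logit response."""
    M = {s: {t: 0.0 for t in STATES} for s in STATES}
    for s1, s2 in STATES:
        z1 = sum(math.exp(beta * U1[x][s2]) for x in (0, 1))
        z2 = sum(math.exp(beta * U2[s1][x]) for x in (0, 1))
        for x in (0, 1):
            M[(s1, s2)][(x, s2)] += 0.5 * math.exp(beta * U1[x][s2]) / z1
            M[(s1, s2)][(s1, x)] += 0.5 * math.exp(beta * U2[s1][x]) / z2
    return M

def stationary(beta, iters=5000):
    """Invariant distribution over profiles, found by power iteration."""
    M = transition_matrix(beta)
    pi = {s: 0.25 for s in STATES}
    for _ in range(iters):
        pi = {t: sum(pi[s] * M[s][t] for s in STATES) for t in STATES}
    return pi

def correlation_index(pi):
    """Illustrative index: deviation of pi from the product of its marginals."""
    P1 = [pi[(0, 0)] + pi[(0, 1)], pi[(1, 0)] + pi[(1, 1)]]
    P2 = [pi[(0, 0)] + pi[(1, 0)], pi[(0, 1)] + pi[(1, 1)]]
    return sum(abs(pi[(a, b)] - P1[a] * P2[b]) for a, b in STATES)

pi = stationary(1.0)
print(pi, correlation_index(pi))
```

In this sketch the index vanishes at β = 0, where the invariant distribution is uniform, and is clearly positive at β = 1, which mirrors the qualitative behaviour described for the QRSS.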
In this paper, we have not discussed the applicability of correlated strategies in non-cooperative games and will not address it. However, we have noted that the invariant distribution of the QRSS, or Logit response dynamics, generally results in correlated strategies and that the QRSS differs from the QRE and the SQRE.
Experimental results re-analysed using ILQRD
As previously explained, a key difference between this manuscript and other studies on the QRE is that beyond certain values of β, the QRE becomes unstable. Therefore, even when the experimental data are better fit by the QRE, the applicability of the QRE is questionable if the data are near the unstable region. In this section, we examine how close the experimental data are to the SQRE. We focus on experiments that involve 2×2 games with a unique mixed NE and require the games to have an SQRE relatively distant from that mixed NE, i.e., games such as the one in Fig. 4. Erev et al. conducted experiments on forty 2×2 constant sum games. Each pair of players played one game 500 times. Ten of these games were each played by nine pairs of subjects, whereas the other thirty games were each played by one pair of subjects. Here, we use only experimental data from the ten games played by nine subject pairs. The payoff matrices of the ten games are shown in Table 1, which is reproduced from Table 1 in . The sixth column shows the NE of each game. Each player was asked to choose between A and B. The payoff entry AB presents player 1's winning probability (×100) when player 1 chooses A and the opponent chooses B, and so on. The payoff for each win was 4 cents. All ten games were played by fixed pairs for 500 trials. Table 2 shows the proportion of A choices in the 500 trials by each player. We would have preferred data from 500 pairs of independent players to data from games repeated 500 times by the same pair of players. However, first, we did not find such data, and second, for constant-sum games, it is believed that repetition produces no surprising result. By contrast, for social dilemma games such as the prisoner's dilemma, the experimental behavioral outcome changes completely when the game is repeated even a finite number of times.
According to the payoff matrices in Table 1, we obtain the SQRE and the QRE of the ten games and compare the experimental data points with our SQRE. There are five games (games 2, 3, 6, 7, 9) for which the experimental data points are in the area of the SQRE. We show one of them in Fig. 7 as an example. The experimental data from three games (games 1, 5, 8) are in the area of unstable solutions but remain closer to our SQRE than to the NE (Fig. 8). As shown in Fig. 9, the experimental data points of the remaining two games (games 4, 10) are closer to the NE than to the SQRE. From these simple and limited comparisons, we conclude that the SQRE fits the observed behavior in real experiments better than the NE (eight games versus two in favor of the former in this limited comparison). However, further examination of the relation between the experimental data and our SQRE is required to arrive at a more definitive answer. We do not yet have any qualitative or quantitative criteria for identifying games whose expected experimental behavior is close to the SQRE. This question will be examined in future investigations.
Five games out of the ten games exhibit a similar behavior. In the inset is the payoff matrix of this game.
Three games out of the ten games exhibit a similar behavior. In the inset is the payoff matrix of this game.
Conclusion and Discussion
In this manuscript, using the Logit quantal response function form (the Boltzmann distribution in statistical physics) to link the choice of strategy to the corresponding payoff in every step, we construct an iterative Logit quantal response dynamic process. Thus, this process can be regarded as a dynamic version of the Logit quantal response equilibrium. Importantly, our dynamic process differs from the so-called Logit response dynamics, which generally results in correlated strategies, even for non-cooperative games.
It has been shown in  that the QRE exists for all values of β, a measure of the players' payoff sensitivity, and converges toward NEs when β→∞. It has also been demonstrated on some examples, and taken for granted by some researchers, that in fitting experimental data the QRE is generally better than the NE because the value of β can be freely adjusted to improve the fit. In our manuscript, we demonstrate that this is not the case: when stability is taken into consideration, the QRE is, in principle, no longer always better than the NE. Based on the dynamic process, stable and unstable QREs are distinguished. We find the following: (1) For games with a single focal pure NE, there is always one stable QRE that converges toward the preferred NE when β→∞. (2) For games without any focal pure NEs but with one unique proper mixed NE, when the payoff sensitivity β is sufficiently large (β>βc), the QREs lose their stability. For certain games, the QREs are already close to their corresponding NEs before they lose their stability. Therefore, the difference between stable QREs and NEs is small. For other games, the difference between stable QREs and NEs is substantially more pronounced.
The latter case could be used to assess the applicability of the QRE to experiments and real-life observations. We then compared the stable and unstable QREs with experimental data. We found that for certain games (5 games in our preliminary tests) the experimental observations lie within the regions of the stable QRE, that for other games (3 games) the experimental data are located in the unstable regions but remain closer to stable QREs than to the mixed NEs, and that for the remaining games (2 games) the experimental results are closer to the mixed NEs than to the stable QREs. We also believe that linking mixed NEs to mixed SQREs improves our understanding of the applicability of mixed NEs.
We have not identified any qualitative or quantitative criteria with which to classify games from this perspective. Further experimental and theoretical investigations are required to reach such a conclusion. In the section "Proof of main conclusions on 2×2 symmetric games", we only present a proof of the main observed features of our dynamic process for symmetric 2×2 games. In the future, a general discussion of these features and their proof for N×M games should be undertaken. Cross-game experiments with a fixed group of players, in which the value of β is estimated and the outcomes are compared against our SQREs, would also be valuable for testing the SQRE concept further.
The authors thank Zhijian Wang and Bin Xu for sending us their paper on game theory and the principle of maximum entropy  and for sharing experimental data from their research. It was in their study  that we found the additional experimental results of .
Conceived and designed the experiments: JW ZD. Analyzed the data: QZ JW. Contributed to the writing of the manuscript: QZ JW.
- 1. Holt CA, Roth AE (2004) The Nash equilibrium: A perspective. Proceedings of the National Academy of Sciences of the United States of America 101: 3999–4002.
- 2. Ochs J (1995) Games with unique, mixed strategy equilibria: An experimental study. Games and Economic Behavior 10: 202–217.
- 3. Nash JF (1950) Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America 36: 48–49.
- 4. Nash JF (1951) Non-cooperative games. Annals of Mathematics 54: 286–295.
- 5. Harsanyi J, Selten R (1988) A General Theory of Equilibrium Selection in Games. MIT Press.
- 6. Samuelson L (1998) Evolutionary Games and Equilibrium Selection. MIT Press.
- 7. Fudenberg D, Levine D (1998) The theory of learning in games. MIT Press series on economic learning and social evolution. The MIT Press.
- 8. Fudenberg D, Levine DK (2009) Learning and equilibrium. Annual Review of Economics 1: 385–419.
- 9. Myerson RB (1978) Refinements of the Nash equilibrium concept. International Journal of Game Theory 7: 73–80.
- 10. Sabourian H, Juang WT (2007) Evolutionary game theory: Why equilibrium and which equilibrium. In: Bold S, Löwe B, Räsch T, van Benthem J, editors, Foundations of the Formal Sciences V: Infinite Games, College Publications.
- 11. Weibull J (1997) Evolutionary Game Theory. MIT Press.
- 12. Hofbauer J, Weibull JW (1996) Evolutionary selection against dominated strategies. Journal of Economic Theory 71: 558–573.
- 13. Borgers T, Sarin R (1997) Learning through reinforcement and replicator dynamics. Journal of Economic Theory 77: 1–14.
- 14. Alos-Ferrer C, Netzer N (2010) The logit-response dynamics. Games and Economic Behavior 68: 413–427.
- 15. Brown GW (1951) Iterative solutions of games by fictitious play. In: Koopmans TC, editor, Activity Analysis of Production and Allocation, New York: Wiley, chapter XXIV. pp. 374–376.
- 16. Berger U (2005) Fictitious play in 2×n games. Journal of Economic Theory 120: 139–154.
- 17. Goeree JK, Holt CA (2001) Ten little treasures of game theory and ten intuitive contradictions. American Economic Review 91: 1402–1422.
- 18. Wu J (2004) A new mathematical representation of game theory, I. eprint arXiv:quant-ph/0404159.
- 19. Aumann RJ (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55: 1–18.
- 20. Chen HC, Friedman JW, Thisse JF (1997) Boundedly rational Nash equilibrium: A probabilistic choice approach. Games and Economic Behavior 18: 32–54.
- 21. McKelvey RD, Palfrey TR (1995) Quantal response equilibria for normal-form games. Games and Economic Behavior 10: 6–38.
- 22. Wolpert DH (2006) Information theory - the bridge connecting bounded rational game theory and statistical physics. In: Braha D, Minai AA, Bar-Yam Y, editors, Complex Engineered Systems, New York: Springer, chapter 12. pp. 262–290.
- 23. Haile PA, Hortacsu A, Kosenok G (2008) On the empirical content of quantal response equilibrium. American Economic Review 98: 180–200.
- 24. Baron R, Durieu J, Haller H, Solal P (2002) A note on control costs and logit rules for strategic games. Journal of Evolutionary Economics 12: 563–575.
- 25. Hopkins E (2002) Two competing models of how people learn in games. Econometrica 70: 2141–2166.
- 26. McKelvey RD, Palfrey TR (1996) A statistical theory of equilibrium in games. Japanese Economic Review 47: 186–209.
- 27. Konno T (2011) The exact solution of spatial logit response games. SSRN eLibrary.
- 28. Hommes CH, Ochea MI (2012) Multiple equilibria and limit cycles in evolutionary games with logit dynamics. Games and Economic Behavior 74: 434–441.
- 29. Fudenberg D, Kreps DM (1993) Learning mixed equilibria. Games and Economic Behavior 5: 320–367.
- 30. Jordan JS (1993) Three problems in learning mixed-strategy Nash equilibria. Games and Economic Behavior 5: 368–386.
- 31. Krishna V, Sjostrom T (1998) On the convergence of fictitious play. Mathematics of Operations Research 23: 479–511.
- 32. Goeree JK, Holt CA (2004) A model of noisy introspection. Games and Economic Behavior 46: 365–382.
- 33. Dixit A, Skeath S, Reiley D (2009) Games of Strategy. W. W. Norton & Company.
- 34. Strogatz S (1994) Nonlinear Dynamics And Chaos: With Applications To Physics, Biology, Chemistry, And Engineering. Studies in Nonlinearity. Westview Press.
- 35. Erev I, Roth AE, Slonim RL, Barron G (2007) Learning and equilibrium as useful approximations: Accuracy of prediction on randomly selected constant sum games. Economic Theory 33: 29–51.
- 36. Morgan J, Sefton M (2002) An experimental investigation of unprofitable games. Games and Economic Behavior 40: 123–146.
- 37. McKelvey RD, Palfrey TR (1998) Quantal response equilibria for extensive form games. Experimental Economics 1: 9–41.
- 38. Xu B, Zhang H, Wang Z, Zhang J (2012) Test the principle of maximum entropy in constant sum game: Evidence in experimental economics. Physics Letters A 376: 1318–1322.