Stability of Mixed-Strategy-Based Iterative Logit Quantal Response Dynamics in Game Theory

Using the Logit quantal response form as the response function in each step, the original definition of the static quantal response equilibrium (QRE) is extended into an iterative evolution process. QREs remain the fixed points of the dynamic process. However, depending on whether such fixed points are also long-term solutions of the dynamic process, they can be classified into stable (SQREs) and unstable (USQREs) equilibria. This extension resembles the extension from static Nash equilibria (NEs) to evolutionarily stable solutions in the framework of evolutionary game theory. The relation between SQREs and other solution concepts of games, including NEs and QREs, is discussed. Using experimental data from other published papers, we perform a preliminary comparison between SQREs, NEs, QREs and the observed behavioral outcomes of those experiments. For certain games, we find that SQREs have better predictive power than QREs and NEs.


Introduction
Game theory has become a powerful and popular tool in many sociological studies. Although several studies have questioned the predictive power of the Nash equilibrium (NE) [1,2], it has been used as a primary game solution since its initial proposition [3,4]. However, finding such NEs and refining them when multiple NEs exist are not easy tasks [5,6]. Furthermore, people are interested in knowing how, in experiments or real-life observations, one ''preferred'' NE emerges from all possible strategy profiles, particularly in a population that does not begin with an NE as the initial strategic state. This phenomenon is the well-known question of learning in games and of convergence towards particular solutions [7,8].
To find game solutions with predictive power, in addition to first searching for all NEs and then refining them [9], dynamic processes have been proposed to describe, mimic or reproduce to a certain extent the strategic thinking processes of game players, in the hope that certain long-term solutions of the dynamic processes will lead to the ''preferred'' NEs [6,10]. Well-known examples of such dynamic processes include replicator dynamics [11][12][13], Logit learning [14], and fictitious play [15,16]. In certain cases, a refined NE fits the experimental data well. We refer to such an NE as the preferred NE. In this case, a proposed evolutionary model is a good theory if the model predicts that long-term solutions of the corresponding dynamic processes converge to the refined NE. Alternatively, in other cases, no NE can explain the observed behavior in real experiments. In this case, a good theory means that long-term solutions of the proposed dynamic processes, rather than the NEs, can explain the observed behavior [17]. To simplify our terminology, in both cases we call the solution that is capable of describing experimental or real-life observations the preferred NE. The primary goal of these typical dynamic processes, and thus of all of these theories, is to determine the preferred NE by solving for the long-term solutions of the proposed dynamic processes. For a dynamic process, two central topics should usually be discussed: how well experimental observations can be explained by long-term solutions of the dynamic process, and the relation between the dynamic process's long-term solutions and other solution concepts such as NEs and refined NEs.
In this manuscript, we study properties of a new dynamic process: the iterative Logit quantal response dynamics (ILQRD), which will be defined based on the concept of static Quantal Response Equilibria (QREs). Our goal in proposing this new dynamic process is solely to capture the preferred NE with long-term stable solutions of the ILQRD, which we denote as stable QREs (SQREs).
This manuscript is organized as follows. In this introduction, we first explain our main idea: the evolutionary process. In the next section, we define several notations and the dynamic process. There, we also compare the new dynamic process with other learning models and evolutionary processes in game theory. In the remaining parts of this manuscript, we discuss the two previously introduced central topics for this ILQRD process: how the long-term solutions fit experimental results and how they relate to other solution concepts. Next, we illustrate the performance of our dynamic process using examples and provide an analytical proof of the major conclusions for the special case of 2×2 symmetric games. Then, we compare our SQRE with a highly similar solution concept: the quantal response stable solutions (QRSS). After that, using the SQRE, we re-analyze some collected experimental results. Finally, we summarize our main conclusions and discuss possible future research.

Notations and Definitions
To present our formulae in a compact form and to remain consistent with the notations and the terminology of statistical ensembles used in statistical physics, in this section we introduce a matrix-based notation to represent a general N×M game. The key notation that differs from the conventional mathematical forms of game theory is the matrix representation of probability distributions and the payoff matrices of general N-player games. One may proceed directly to Eq. (18) and Eq. (19) and continue from there if learning this new notation presents an obstacle. Most of our expressions can be understood in terms of the conventional mathematics of game theory. However, we believe that they can be understood more conveniently using the new notation. Furthermore, the new notation is readily applicable to quantum games [18].

A new set of matrix-based notations
Here, we introduce a matrix-based notation for probability distributions such that the probability distribution of the strategic status of all players and the mathematical description of payoffs for games with an arbitrary number of players and an arbitrary number of strategies become matrices. However, this new notation is not necessary for understanding the work in this manuscript. One may skip this section and proceed directly to Eq. (18) and Eq. (19).
Consider a 2×2 game with the following conventional form of payoff bi-matrix: with the convention that the row (column) strategies belong to the first (second) player and that the first (second) number of every entry represents the payoff received by the first (second) player. We denote both the first and the second player's strategies C, D (although in general one player's strategies can be entirely different from the other player's). Usually, mixed strategies, which include pure strategies as special cases, are written as column vectors: for example, for player 1 and for player 2. The payoff is calculated from the following vector-matrix-vector multiplication: For 2×2 games, the payoff matrix G indeed appears as a matrix. However, for a general N×M game, it becomes a map from N vectors to R, i.e., a cubic tensor for 3-player games and a T(N, 0)-type tensor for N-player games. The payoff structure is then no longer a matrix.
To unify the notation of all N×M games, we introduce an equivalent new set of notations as follows. We write payoff matrices and mixed strategies as matrices: and Then we calculate the payoff from the following trace operation: One can confirm that, for the same strategy profiles, Eq. (7) and Eq. (4) result in the same payoffs. For the special case of the 2×2 game, in the new formalism, the payoff matrices and the matrices of the strategy state of all players are of dimension 2². The probability distribution of both players is defined as a direct product of each player's state matrix: Every entry of this state matrix corresponds to the probability of all of the players choosing the strategic combination defined by the position of the entry. For example, the (1, 1) (upper-left) entry of ρ is p₁p₂ and means that with this probability the two players take the strategic combination (C, C). In turn, the corresponding entries of the payoff matrices H¹ and H², their (1, 1) entries, are naturally (a, a′). In this sense, from the general expression, Hⁱ can be interpreted as a linear map from the set {(ρ¹, ρ²)} to the real numbers R.
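As a concrete check of this trace form, one can realize the 2×2 case with diagonal state matrices ρⁱ = diag(pᵢ, 1−pᵢ) and a diagonal payoff matrix whose entries follow the (C, C), (C, D), (D, C), (D, D) layout described above. The following sketch (the function names and this diagonal realization are our own illustrative assumptions, not notation from the text) verifies that the trace expression reproduces the conventional vector-matrix-vector payoff:

```python
def kron(A, B):
    """Kronecker (direct) product of two matrices given as nested lists."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def trace_payoff(H, rho):
    """tr(H rho): trace of the matrix product of H and rho."""
    n = len(H)
    return sum(sum(H[i][k] * rho[k][i] for k in range(n)) for i in range(n))

# player 1's payoff entries a, b, c, d (hypothetical values) and mixed strategies
a, b, c, d = 3.0, 0.0, 5.0, 1.0
p1, p2 = 0.4, 0.7

# conventional bilinear form: (p1, 1-p1) G (p2, 1-p2)^T
G = [[a, b], [c, d]]
E_conventional = sum(v1 * G[i][j] * v2
                     for i, v1 in enumerate([p1, 1 - p1])
                     for j, v2 in enumerate([p2, 1 - p2]))

# matrix realization: rho = rho1 (x) rho2 with diagonal state matrices,
# H1 diagonal with entries (a, b, c, d) in the same strategy ordering
rho1 = [[p1, 0.0], [0.0, 1 - p1]]
rho2 = [[p2, 0.0], [0.0, 1 - p2]]
rho = kron(rho1, rho2)
H1 = [[0.0] * 4 for _ in range(4)]
for idx, val in enumerate([a, b, c, d]):
    H1[idx][idx] = val
E_trace = trace_payoff(H1, rho)
```

Both expressions sum the same four products of probabilities and payoffs, so they agree entry by entry.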
Another useful feature of this notation system is that it streamlines the description of correlated strategies. That is, this notation also functions when ρ ≠ ρ¹⊗ρ². For an N×M game, sⁱₗ stands for the lth strategy of the ith player, where i ∈ [1, N] and l ∈ [1, M]. The set of strategies of player i is denoted as Sⁱ. For convenience, we denote the set of probability distributions over Sⁱ as Δⁱ, i.e., Δⁱ = {ρⁱ}, and the direct product set of all of these probability distributions as Δ, i.e., Δ = Δ¹⊗Δ². This Δ differs from the set of probability distributions over S = S¹⊗S², which we denote as Δ(S). The latter space includes correlated strategies, whereas Δ is the set of only independent strategies. For example, in this notation, a general, possibly correlated equilibrium [19] can be defined as ρ¹²ce ∈ Δ(S) such that for every player i, where trᵢ(ρ¹²ce) is a partial trace, which performs the partial integral/summation over player i's strategy space. For example, and the result is a strategy profile of player 2. This partial trace is the same as the partial summation used to derive a marginal distribution in probability theory. In our notation, an NE is defined as ρ¹ne⊗ρ²ne or, equivalently, ρ¹²ne ∈ Δ such that for ∀i, ∀ρⁱ,

Definition of iterative Logit quantal response dynamics (ILQRD) and its stable equilibria

Using the previously described matrix-based notation, for a 2-player game our iterative Logit quantal response dynamics (ILQRD) is defined as follows: where the reduced payoff matrix is defined as The normalization constant Zⁱ_R is defined as Using the matrix-based notations, one can straightforwardly extend this ILQRD to general N×M games.
Using the usual notation of probability distributions, for the 2×2 game defined in Eq. (1), Eq. (2) and Eq. (3), we can rewrite the iteration process explicitly as follows: To simplify our notation, we denote the RHS of Eq. (18) as g(p; β | a, b, c, d), where a, b, c and d can be omitted when it is clear what these parameters refer to. Formally, we denote this map as and p₁(t+1) can be regarded as an intermediate variable. This map is an iterative map from a mixed strategy (p₂(t)) to a new mixed strategy (p₂(t+1)).
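One full iteration of this map can be sketched in a few lines of code. The sketch below is our own illustration, not the authors' implementation: the function names are ours, and the alternating order, with player 1 responding to p₂(t) and player 2 then responding to the intermediate p₁(t+1), follows the intermediate-variable reading above.

```python
import math

def logit_response(u_first, u_second, beta):
    """Logit quantal response: probability of the first strategy, given the
    expected payoffs of the two strategies and the rationality parameter beta."""
    z = math.exp(beta * u_first) + math.exp(beta * u_second)  # normalization Z
    return math.exp(beta * u_first) / z

def ilqrd_step(p1, p2, payoff1, payoff2, beta):
    """One ILQRD iteration for a 2x2 game. payoff1[i][j] (payoff2[i][j]) is
    player 1's (2's) payoff when player 1 plays strategy i and player 2 plays j.
    Player 1 responds to p2(t); player 2 responds to the intermediate p1(t+1)."""
    # expected payoffs of player 1's two strategies against p2
    u1_first = payoff1[0][0] * p2 + payoff1[0][1] * (1 - p2)
    u1_second = payoff1[1][0] * p2 + payoff1[1][1] * (1 - p2)
    p1_new = logit_response(u1_first, u1_second, beta)
    # player 2 responds to the intermediate mixed strategy p1(t+1)
    u2_first = payoff2[0][0] * p1_new + payoff2[1][0] * (1 - p1_new)
    u2_second = payoff2[0][1] * p1_new + payoff2[1][1] * (1 - p1_new)
    p2_new = logit_response(u2_first, u2_second, beta)
    return p1_new, p2_new
```

Iterating `ilqrd_step` from any initial mixed strategy traces the dynamics; its fixed points are the Logit QREs.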
The fixed points of this ILQRD are the same as the static Logit QREs, and they are denoted as If a fixed point of those QREs is also a long-term solution of the ILQRD, this fixed point is referred to as a stable QRE (SQRE) and denoted as Otherwise, it is referred to as an unstable QRE (USQRE). Next, we focus on the relations among pure-strategy NEs, mixed NEs, QREs, SQREs and experimental data. A SQRE must be a QRE. However, the converse is not necessarily true. This stability test potentially differentiates QREs into SQREs and USQREs. In principle, such differentiation can improve the predictive power of QREs for experimental data, and examining/demonstrating this is the whole point of the present manuscript.
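For symmetric 2×2 games the composed map reduces to one dimension, and the SQRE/USQRE classification can be carried out numerically: locate the fixed points and test whether the magnitude of the map's slope there is below one. This is our own numerical sketch (the payoff values in the usage below are hypothetical), not code from the paper:

```python
import math

def logit_map(p, beta, a, b, c, d):
    """Symmetric-game logit response map: probability of the first strategy
    next step, given that the opponent plays it with probability p."""
    diff = (a - c) * p + (b - d) * (1 - p)  # u_first - u_second against p
    return 1.0 / (1.0 + math.exp(-beta * diff))

def classify_qres(beta, a, b, c, d, n=2000, eps=1e-4):
    """Locate fixed points of the map by a sign-change scan plus bisection,
    and label each as SQRE (|slope| < 1) or USQRE (|slope| >= 1)."""
    g = lambda p: logit_map(p, beta, a, b, c, d) - p
    points = []
    for k in range(n):
        lo, hi = k / n, (k + 1) / n
        if g(lo) * g(hi) <= 0.0:
            for _ in range(60):  # bisection refines the bracketed root
                mid = (lo + hi) / 2.0
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            p_star = (lo + hi) / 2.0
            if points and abs(p_star - points[-1][0]) < 1.0 / n:
                continue  # same root detected from an adjacent cell
            slope = (logit_map(p_star + eps, beta, a, b, c, d)
                     - logit_map(p_star - eps, beta, a, b, c, d)) / (2 * eps)
            points.append((p_star, "SQRE" if abs(slope) < 1.0 else "USQRE"))
    return points
```

For a coordination-type payoff such as (a, b, c, d) = (5, 0, 3, 1), the scan finds a single stable fixed point at small β and three fixed points (stable, unstable, stable) at large β.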
Difference between our evolutionary process and other learning/imitating models

In ILQRD, a key concept is the use of the quantal response function (as in Eq. (13) and Eq. (14)) to determine a player's strategy profile according to the player's corresponding payoffs. The introduction of the parameter β as a description of bounded rationality in this form of quantal response function is common in theories of learning in games [20] and in the QRE concept [21]. Additionally, this idea may be motivated simply from the viewpoint of statistical physics [18]. The use of the parameter β can also be justified to a certain degree based on games with limited information [22,23].
In fact, the same expression used in Eq. (13) and Eq. (14) has been used in discussions of the Logit QRE, and a highly similar expression has been used in Logit learning [14,24], stochastic fictitious play [15] and stochastic reinforcement learning [25]. In these theories, the quantal response function, as in Eq. (13) and Eq. (14), is occasionally referred to as the smoothed best response. However, all of these theories differ from ours in principle, as explained below.
First, we compare our expressions with the QRE. In the QRE, the defining map is from all players' strategy profile p = ⊗ᵢpⁱ to itself. This equation is a fixed-point equation. Our work differs from the QRE in that the QRE focuses only on fixed points solved from static equations, whereas we use iterations to find the stable fixed points and distinguish them from the other, unstable fixed points. Later, we will see that such a difference in stability is essential when applying the QRE to explain experiments. The QRE has been compared with experimental observations and generally provides a better fit to the data than the NE [26]. However, the QRE has been criticized as an illusory improvement, because there is an additional free parameter in the QRE when fitting the curve, and one can always improve a fit using an additional parameter. We demonstrate that this free parameter is not completely free. In fact, in certain cases, when β is sufficiently large, the fixed points of the QRE are no longer stable. Therefore, when comparing experimental data with the stable/unstable QREs and the NEs, one can determine whether the QRE or the NE has more predictive power, and the QRE is then not always better than the NE. Distinguishing stable QREs from unstable QREs using iterative dynamics is this manuscript's first contribution. As presented below, we collected experimental data and conducted a preliminary comparison of the theories with experimental observations. After distinguishing SQREs from QREs, we tested SQREs, QREs and NEs against several experiments reported in the literature, which is this manuscript's second contribution. One possible further investigation along this line, which has not been implemented in this work, is applying our ILQRD to cross-game experiments. For example, using the same players in different games with similar levels of payoffs, we can estimate the parameter β from one game and test it in the others. In principle, this should be even more interesting than simply testing SQREs against experimental results.
In a dynamic QRE [24], the so-called Logit learning [14], which is driven by observations during real game-playing processes, in which each player chooses only one pure strategy to play in every turn, the same smoothed best-response function is used to mimic a player's response to the opponent's pure strategy, as follows: If the smoothed best-response function is replaced by the exact best-response function, then this becomes simply the best-response dynamics. Our model differs from this in that the states of players can be mixed strategies in our model, whereas they are only pure strategies in this model.
In fact, this smoothed best-response function defines a probability transition matrix between the current strategy profile of player i and the previous strategy profiles of all of the other players. This transition probability depends not on player i's previous state sⁱ(t−1) but on the previous states of all of the other players s⁻ⁱ(t−1). For simplicity, we express this probability as follows: If we let each player take his or her turn in the natural order, we obtain a transition matrix between the current and the previous strategy profiles of all of the players. For simplicity, we denote this matrix as In the above formula, taking N = 2 as an example, the matrix may be written according to one of the two following rules: The first rule is referred to as alternating updating, whereas the second is referred to as simultaneous updating. Regardless of the form assumed, the central task is to determine the invariant probability distribution of the transition matrix M: To distinguish fixed-point solutions of this transition matrix from a QRE, these long-term solutions are occasionally referred to as end results of Logit response dynamics [14]. Here, we name such solutions quantal response stable solutions (QRSSs). There have been many attempts to solve [27] or characterize [14] such a QRSS (P_ss) for a given transition matrix M. Because a QRE and a QRSS use similar formulae, with one allowing mixed strategies while the other allows only pure strategies, in principle the two should be closely related. However, in this work, we will demonstrate that this is generally not the case: QRSSs differ substantially from QREs and SQREs. Differentiating a QRSS from a QRE and a SQRE is this manuscript's third contribution.
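The invariant distribution of such a pure-strategy transition matrix can be computed directly. The sketch below (our own illustration, with hypothetical coordination payoffs in the usage; the simultaneous-updating rule is assumed) builds the 4-state chain over pure-strategy profiles of a 2×2 game and finds its stationary distribution by power iteration:

```python
import math

def logit_probs(payoffs, beta):
    """Logit choice probabilities over a list of pure-strategy payoffs."""
    weights = [math.exp(beta * u) for u in payoffs]
    z = sum(weights)
    return [w / z for w in weights]

def qrss_stationary(payoff1, payoff2, beta, iters=500):
    """Stationary distribution over pure-strategy profiles (i, j) under
    simultaneous Logit updating: each player responds to the other's
    previous pure strategy. A sketch of the QRSS transition matrix M."""
    states = [(i, j) for i in range(2) for j in range(2)]
    def step_probs(i, j):
        p_row = logit_probs([payoff1[0][j], payoff1[1][j]], beta)  # P(i' | j)
        p_col = logit_probs([payoff2[i][0], payoff2[i][1]], beta)  # P(j' | i)
        return {(a, b): p_row[a] * p_col[b] for a in range(2) for b in range(2)}
    dist = {s: 0.25 for s in states}          # start from the uniform distribution
    for _ in range(iters):                    # power iteration: dist <- dist * M
        new = {s: 0.0 for s in states}
        for (i, j), mass in dist.items():
            for s, q in step_probs(i, j).items():
                new[s] += mass * q
        dist = new
    return dist
```

For a coordination game, the resulting distribution spreads mass over all four pure profiles while concentrating on one of them, which is one way to see how a QRSS (a distribution over pure profiles) can differ from a QRE or SQRE (a single mixed-strategy point).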
A continuous-time smoothed best-response dynamic [7,28], for simplicity using a single-population symmetric game as an example, seems quite similar to our model, since its discrete version is, after taking the rate of revising strategies to be 1, Here the notation of [7] is used, in which hᵢ refers to the fraction of the population in strategic state i, and the smoothed best-response function BR can be exactly of the exponential form, such as the one in Eq. (18). To this end, we note that in this work we focus on the stability of fixed points of this discrete dynamic, which we have not seen in the literature; [28] discussed the stability of the time-continuous counterpart. Furthermore, this stability analysis is linked to the stability of QREs and thus in turn distinguishes unstable QREs from stable QREs, and this link, as far as we know, has not been investigated by others.
Next, we focus on a comparison between ILQRD and fictitious play [15]. In fictitious play, players update their beliefs and choose a pure strategy to play according to certain decision-making rules that relate their strategy choice to their beliefs. Such decision-making rules typically include, for example, the best-response myopic strategy [15] and the smoothed best response [29]. The latter uses the same expression as Eq. (21), with only one difference: p⁻ⁱ, the true current strategy profile of the others, is replaced by player i's belief regarding the strategy profiles of the others, which is usually taken to be the empirical distribution deduced from the entire history of the other players' choices. In Eq. (21), Eq. (13) and Eq. (14), there is no belief and no empirical distribution. When we examine the learning process in real life, it may seem more reasonable to take player beliefs into consideration. However, as we have previously noted, we are substantially more concerned with finding the proper solutions, that is, solutions capable of predicting experimental behavioral outcomes, than with making the entire dynamic process realistic. Because fictitious play extracts the empirical distribution of the other players' strategy profiles from the history, the speed of convergence occasionally becomes a problem [30,31]. As discussed below, in ILQRD, convergence speed is never an issue.
A similar relation holds between ILQRD and stochastic reinforcement learning [25]. Although the functional forms of the response in the two models are quite similar, our ILQRD relies only on the current state of all players, whereas the reinforcement learning model takes some or all previous actions and payoffs into consideration. In reinforcement learning, a record of scores for every potential strategy is kept by every player, while in the ILQRD, only the previous mixed strategy is used in deciding the current strategy.
Another widely used dynamic process for determining the preferred NE is replicator dynamics [12,13]. All of the previously mentioned static mappings and dynamics are based on introspective thinking and thus differ from replicator dynamics, where each player plays against a finite or infinite population and individuals learn by simple imitation rather than by individual introspective thinking. In this manuscript, we focus on the effects of introspective thinking of the individual players, not on population dynamics.
We should note that the same notion of ILQRD (referred to as Boltzmann iteration) was proposed in a 2004 unpublished working paper [18] by one of the authors in a quantum game context. The idea is not a central topic in that working paper and was not developed any further there.
Additionally, a highly similar dynamic process was proposed in [32], as a concept referred to as noisy rational strategies (NRS): There, the authors focus on the effect of increasing β (β₀ ≤ β₁ ≤ … ≤ βₙ) and assume β∞ = ∞, so that ρ(0) becomes irrelevant. According to the authors, such an increase in β can be interpreted as the increasing difficulty of performing a greater number of iterations given a player's finite capability [32]. In this sense, what we discuss in this paper bears a greater resemblance to the following: with a constant β, i.e., βₗ = β. We do not assume that β is increasing or decreasing, or that β∞ = ∞ or β₁ = ∞. We do not believe that it is more difficult to perform more iterations, because all iteration processes are supposed to be fictitious. We will demonstrate that essentially none of the desired features of the SQRE relies on details concerning the ordering of β or increasing or decreasing values of β; they depend instead on the iteration itself. The iteration alone suffices to lead us to the preferred NE. Furthermore, we have not found a thorough discussion of the stability of NRS solutions. Thus, this manuscript can be regarded as a further development of the NRS in that it distinguishes stable from unstable solutions and notes that the key component is the iteration, not the ordering of β or the assumption of the limit of βₙ approaching ∞.
In sum, the proposed iterative process differs from many other theories in that it is a map from a mixed strategy of all of the players to a new mixed strategy of all of the players. One might have questions about the interpretation of this process and its comparison with real game playing. However, in physics, it is natural to study the evolution of distribution functions directly instead of the evolution of individual trajectories. Additionally, we do not aim at making the process reflect reality more closely, but only at making the long-term solution more capable of predicting game outcomes. Next, we discuss certain features of the proposed iterative process and compare its solutions to observed behavioral outcomes of experiments.

Major features of ILQRD, illustrated through examples
In this section, we first demonstrate by example that the QREs cover all NEs, including pure and mixed NEs. This conclusion is not new and has been implicitly demonstrated in [26]. For a 2×2 game, this statement can be proved by considering Eq. (18) and Eq. (19) in the extreme case of β → ∞. We present a general proof. Second, we demonstrate that once there is a preferred pure NE in a game, our SQRE converges to this focal NE. This phenomenon serves as a natural refinement. Unfortunately, we have not proved this conclusion mathematically; thus, we illustrate it by examples. Third, we demonstrate that all QREs that correspond to mixed NEs are unstable in the case of large β (β → ∞) but that some of these QREs can be stable for finite β. The third conclusion first questions the predictive power of QREs (when they correspond to mixed NEs) and then redeems the QRE as a possibly applicable solution when bounded rationality is considered. This conclusion also distinguishes stable QREs from unstable QREs, which enables examination of the applicability of QREs to the explanation of experimental observations or real-life phenomena. That is, in principle, it is no longer true that QREs are strictly better than NEs. If the experimental data of a game are located in the region of unstable QREs, the QRE is not a practical solution concept for the game, because unstable solutions are not reachable through the evolution. As far as we know, the last two conclusions (first, that SQREs converge to preferred NEs when such NEs exist, so that the SQRE represents a natural refinement of NEs; and second, that QREs corresponding to mixed NEs become unstable for large enough β) are new. This also implies that mixed NEs are not directly applicable, since they are in a sense always unstable in the limit of large β (also according to best-response dynamics), unless these mixed NEs are close to SQREs. We believe this also improves our understanding of mixed NEs.

Games with two pure NEs and a mixed NE: Coordination Game and Hawk-Dove Game
It can be demonstrated that for games with a dominant strategy, such as the prisoner's dilemma, one of the QREs and the SQRE converge toward the dominant strategy in the large-β limit. However, this case is trivial. We therefore demonstrate the behavior of our ILQRD starting from the more interesting coordination game, which does not have any dominant strategies. The payoff matrices of the coordination game are as follows: It is known that there are two pure-strategy NEs and a mixed NE. They are (p₁, p₂) = (0, 0), (1, 1), (0.83, 0.83), denoted respectively as PNE₀₀, PNE₁₁ and MNE. Conventionally, the preferred NE of this game, PNE₀₀, can be found to be the focal NE through refinement [5,6]. Evolutionary stability analysis [11] indicates that the mixed NE is unstable. That is, when p₁(0) < p₁c = 0.83 (p₁(0) > p₁c), the population converges to PNE₀₀ (PNE₁₁). Because the β we introduced has no absolute meaning, in all of the manuscript's remaining calculations, we normalize each player's payoffs by their own maximum. For the payoffs provided in Eq. (32), the maximums are 5 and 5 for the first and second player, respectively. Fig. 1 shows iterative mappings of this coordination game for a range of values of β. Each curve except the red curve (which is the diagonal p₁ = p₁) is a curve of the iterative mapping for a given value of β. The long arrow labelled β↑ indicates the shift of the curve of the iterative mapping when β increases. As an example, we also illustrate the first two steps of the iterative process for a specific value of β = 3.1. Usually fewer than 20 iterations suffice to find the SQREs with reasonable accuracy starting from any initial value of p₁. We can observe that for small β, there is only one QRE, which corresponds to PNE₀₀ (denoted QRE₀₀). For large β, there are multiple QREs: QRE₀₀, a QRE that corresponds to PNE₁₁ (denoted QRE₁₁) and a QRE that corresponds to the MNE (QRE_MNE).
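The emergence of multiple QREs beyond a critical β can be located numerically. The sketch below counts fixed points of the symmetric one-dimensional map for a hypothetical coordination payoff (a, b, c, d) = (1, 0, 0.6, 0.2), i.e., payoffs already normalized by the maximum as described in the text; these values and the function names are illustrative assumptions, not the exact payoffs of Eq. (32):

```python
import math

def n_fixed_points(beta, a, b, c, d, grid=4000):
    """Count sign changes of f(p) - p on a grid, i.e., the number of QREs of
    the symmetric logit map at this beta (sketch; assumes simple crossings)."""
    def g(p):
        diff = (a - c) * p + (b - d) * (1 - p)
        return 1.0 / (1.0 + math.exp(-beta * diff)) - p
    count, prev = 0, g(0.0)
    for k in range(1, grid + 1):
        cur = g(k / grid)
        if prev * cur < 0:
            count += 1
        prev = cur
    return count

def critical_beta(a, b, c, d, lo=0.1, hi=60.0):
    """Bisect for the smallest beta at which three QREs are detected."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if n_fixed_points(mid, a, b, c, d) >= 3:
            hi = mid
        else:
            lo = mid
    return hi
```

Below the returned threshold there is a single QRE (necessarily stable, since the map is then a contraction); above it, a stable-unstable-stable triple appears, which is the qualitative picture shown in Fig. 1.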
However, not all of these QREs are equally good. One can observe that for this game QRE₀₀ and QRE₁₁ are always stable, whereas QRE_MNE is always unstable. Here, stability means that if the initial guess is not exactly at the QRE, one iteration step will drive the value of p₁ closer to the QRE. Those QREs that are stable in this sense are referred to as SQREs. To simplify our terminology, we will refer to the QREs that correspond to pure (mixed) NEs in the limit of β → ∞ as pure (mixed) QREs, although all QREs at finite β are in fact mixed strategies. The same naming scheme and notation are used for SQREs, for instance, pure SQRE₀₀, pure SQRE₁₁ and mixed SQRE_MNE.
In principle, all of the information on the QREs and SQREs of this game is included in Fig. 1. However, to better illustrate the stability of the QREs, in Fig. 2, we plot the dependence on β of the stability of the QREs of the coordination game. From the lower left section, where QRE₀₀ overlaps with SQRE₀₀, we can observe that for small β (β < β_c, which here is approximately 22), there is only one QRE, and it is a SQRE. For all initial values of p₁, this SQRE is the only long-term state. When β > β_c, there are three QREs: QRE₀₀, QRE₁₁ and QRE_MNE. However, QRE_MNE is unstable. For initial values p₁(0) less than the unstable QRE (the green line, QRE_MNE), the iteration results in SQRE₀₀ (the filled circles). Otherwise, SQRE₁₁ (the empty circles) becomes the iteration's long-term solution. The corresponding p₂ (not shown in the figure) can be straightforwardly calculated using Eq. (19). Note from the upper right section that the region between SQRE₁₁ (the empty circles) and QRE_MNE (the green line) is narrow compared with the space between SQRE₀₀ (the filled circles) and QRE_MNE (the green line). This outcome indicates that for a wide range of initial values of p₁, the SQRE of this game converges toward SQRE₀₀, which is also PNE₀₀, the preferred NE in this game. This picture, particularly the right half of Fig. 2, is highly similar to the results of evolutionary stability analysis. However, the behavior at small β, where no matter what the initial value of p₁ is the SQRE is always the QRE that corresponds to PNE₀₀, is a unique result of our ILQRD. We believe that this unique SQRE at small β can be regarded as a refinement of NEs.
In this game, we observe that the QREs cover all NEs and that for a wide range of initial values of p₁, SQRE₀₀ is the SQRE, and it corresponds to the preferred NE (p₁ = 0). In particular, when β < β_c, the preferred NE is the only SQRE for all values of p₁(0). It is intuitive to expect that even for large β the green line (QRE_MNE) is closer to the empty circles (QRE₁₁) than to the filled circles (QRE₀₀). Thus, for a wide range of p₁(0), SQRE₀₀ will be the long-term solution of the iterative process. In this sense, ILQRD and its SQRE represent a natural refinement of QREs and NEs. Additionally, we note that for this game, overall, our prediction is somewhat similar to that of evolutionary stability analysis. Next, we discuss another game, in which our prediction differs more significantly from the results of evolutionary stability analysis than it does in this coordination game. Now, we consider the hawk-dove game, which according to evolutionary game analysis [11] has one evolutionarily stable mixed NE and two evolutionarily unstable pure NEs. Its payoff matrices are defined as follows: Based on a calculation of iterative mappings similar to that shown in Fig. 1, the QREs and SQREs of the hawk-dove game for various values of β are plotted in Fig. 3. We observe that for small β, there is only one QRE (QRE_MNE), and it is a SQRE. This SQRE is the long-term solution of the ILQRD for all initial values of p₁. For large β, there are three QREs: QRE₀₁, QRE₁₀ and QRE_MNE. However, in this case, QRE_MNE is unstable. The long-term solution of the ILQRD depends on the initial value of p₁. When the initial value is above the p₁ of QRE_MNE (the green line), QRE₁₀ is the SQRE. Otherwise, QRE₀₁ is the SQRE. This figure provides more information than the QREs and NEs by distinguishing stable QREs from unstable ones.
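This hawk-dove behavior can be reproduced with a minimal sketch. Because the payoff entries of the game are given in the (not reproduced) equation above, we instead use the standard hawk-dove payoffs with V = 2 and C = 4, i.e., [[-1, 2], [0, 1]] with rows (Hawk, Dove), whose mixed NE is (0.5, 0.5); these values and the function names are our illustrative assumptions:

```python
import math

def smoothed_response(u_first, u_second, beta):
    """Logit probability of the first strategy (Hawk), given expected payoffs."""
    return 1.0 / (1.0 + math.exp(-beta * (u_first - u_second)))

def hawk_dove_step(p1, p2, beta):
    """One alternating ILQRD step for a hypothetical hawk-dove game with
    payoffs [[-1, 2], [0, 1]] (rows: Hawk/Dove); the game is symmetric."""
    pay = [[-1.0, 2.0], [0.0, 1.0]]
    u_h = pay[0][0] * p2 + pay[0][1] * (1 - p2)
    u_d = pay[1][0] * p2 + pay[1][1] * (1 - p2)
    p1 = smoothed_response(u_h, u_d, beta)
    # player 2 (same payoffs by symmetry) responds to the updated p1
    u_h = pay[0][0] * p1 + pay[0][1] * (1 - p1)
    u_d = pay[1][0] * p1 + pay[1][1] * (1 - p1)
    p2 = smoothed_response(u_h, u_d, beta)
    return p1, p2
```

At small β the iteration settles on the mixed profile (0.5, 0.5) from any start, while at large β it converges to one of the two asymmetric (Hawk, Dove)-type profiles depending on the starting point, mirroring the behavior described above for Fig. 3.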
Additionally, this figure differs from the results of the evolutionary stability analysis, which holds that the mixed NE of this game is evolutionarily stable. Our results suggest that it is stable only when β < β_c, which is approximately 10 for this game. Otherwise, the outcome of this game will tend toward QRE₀₁ or QRE₁₀. This differs from the results of evolutionary game analysis of the same game. For small β, QRE_MNE is the only SQRE, whereas for large β, the mixed QRE becomes unstable: depending on the initial value of p₁, different SQREs can be reached.

Games with a unique mixed NE: Tennis Game
The third example is the tennis game [33], which has one mixed NE, (p₁, p₂) = (0.7, 0.6), but no pure NEs. The payoff matrices are given as follows: Both the QREs and SQREs are shown in Fig. 4. We find that for this game there is always one and only one QRE for a given value of β and that this QRE converges toward the mixed NE (0.7, 0.6) in the limit of β → ∞. Therefore, this QRE is QRE_MNE. However, the SQRE follows this QRE_MNE only when β is small enough (β ≤ β_c, which for this game is approximately 3.7). When β > β_c, this QRE_MNE becomes unstable; thus, there is no longer any SQRE. This game displays a substantial difference between the SQRE and the mixed NE. Such games can be used to test the applicability of the SQRE relative to the NE, as shown in the following section.
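Because the tennis game's payoff entries are not reproduced here, the following sketch uses hypothetical payoffs constructed to have the same structure: a unique mixed NE at (0.7, 0.6) and no pure NEs. It illustrates the claim that the unique QRE is reachable by iteration only below a critical β; all names and payoff values are our own assumptions:

```python
import math

def logit(u1, u2, beta):
    """Logit probability of the first strategy given two expected payoffs."""
    return 1.0 / (1.0 + math.exp(-beta * (u1 - u2)))

def tennis_like_step(p1, p2, beta):
    """Alternating ILQRD step for a hypothetical game whose unique mixed NE
    is (0.7, 0.6), mimicking the structure of the tennis game."""
    A = [[2.0, 0.0], [0.0, 3.0]]          # player 1's payoffs (assumed)
    B = [[0.0, 2.0], [14.0 / 3.0, 0.0]]   # player 2's payoffs (assumed)
    u_c = A[0][0] * p2 + A[0][1] * (1 - p2)
    u_d = A[1][0] * p2 + A[1][1] * (1 - p2)
    p1 = logit(u_c, u_d, beta)
    u_c = B[0][0] * p1 + B[1][0] * (1 - p1)
    u_d = B[0][1] * p1 + B[1][1] * (1 - p1)
    p2 = logit(u_c, u_d, beta)
    return p1, p2

def converges(beta, p1=0.5, p2=0.5, iters=2000, tol=1e-9):
    """Iterate and report whether successive steps settle to a fixed point."""
    for _ in range(iters):
        q1, q2 = tennis_like_step(p1, p2, beta)
        if abs(q1 - p1) < tol and abs(q2 - p2) < tol:
            return True
        p1, p2 = q1, q2
    return False
```

At small β the composed map is a contraction and the iteration settles on the unique QRE; at large β the same fixed point still exists but the orbit cycles around it without converging, so no SQRE remains.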
From these examples, we observe the following: first, QREs exist for all of the games discussed above, and the QREs cover all of the NEs in the limit β → ∞; second, for games with a preferred NE, the SQRE can be used as a refinement of the NEs; and finally, mixed QREs become unstable for sufficiently large β, so the SQRE can be regarded as a refinement of the QREs (and therefore of the NEs). These observations reflect the three main features of our ILQRD. As we pointed out earlier, the first feature was implicitly demonstrated in [26]; the latter two are new. For certain games, the distance between the SQRE and the mixed NE is large, so experimental results on such games are suitable for testing the applicability of the SQRE relative to the NE. Before we proceed to a comparison of our theoretical predictions with experimental results, we prove the first and third features of our ILQRD. We cannot prove the second feature because at present we do not know a necessary and sufficient condition for a game to have a preferred NE. Instead, we simply demonstrate that for the coordination game and the hawk-dove game with small β, the SQRE corresponds to a known refinement: SQRE_00 for the former [5,6] and SQRE_MNE for the latter [11].

Proof of main conclusions for 2×2 symmetric games
In this section, considering a symmetric game for simplicity, we prove the three features illustrated previously by examples.
Let a′ = a, b′ = b, c′ = c and d′ = d in Eq. (13) and Eq. (14). If we merge the two equations and focus only on p₁, the iteration becomes p₁(t+1) = f(p₁(t); β), whose fixed points correspond, in the limit β → ∞, to the previously mentioned five NEs. In terms of this notation, we wish to demonstrate that these fixed points exist and converge to the corresponding NEs. We prove the first and last cases; the extensions to the other cases are trivial.
It is straightforward to demonstrate that when b − d < 0 and c − a > 0, f(0; β) is always greater than 0 but approaches 0 as β increases. Additionally, when β is sufficiently large, f(p; β) increases near p = 0, but more slowly than p itself. Equivalently, f(0; β) − 0 > 0, and there is a p̃ such that f(p̃; β) − p̃ < 0 when β is sufficiently large. Thus, there must be a p*(β) such that f(p*; β) − p* = 0, and in the limit β → ∞, p*(β) = 0. The situation for p₂ can be analysed similarly.
If we can prove this point, then, first, p* is a fixed point in the limit β → ∞. Second, this fixed point is not a maximum or minimum of the function f(p; β) − p; that is, the curve f(p; β) − p passes across 0 when β is sufficiently large, and 0 is not an extremum.
Because we are working with symmetric games, the iteration function f(p; β) can be regarded as the composite mapping f = g ∘ g of the one-step Logit response g(p; β) = exp(β[ap + b(1 − p)]) / (exp(β[ap + b(1 − p)]) + exp(β[cp + d(1 − p)])), where ap + b(1 − p) and cp + d(1 − p) are the expected payoffs of strategies 1 and 2 against an opponent who plays strategy 1 with probability p. In terms of this function, the required limits follow easily. Here, we discuss the first part in greater detail; the remainder is straightforward. The fixed point of the mapping g is defined equivalently by p* = g(p*; β). For p(t) close enough to p* that 0 < p(t) < 1, the limit of the RHS of Eq. (42) as β → ∞ is exactly p*.
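The fixed-point structure described above can be checked numerically by counting sign changes of f(p; β) − p with f = g ∘ g on a fine grid. The sketch below assumes the same illustrative hawk-dove payoffs as before (mixed QRE at p = 1/2, analytic β_c = 2); the function names are ours, not from the paper.

```python
import math

def g(p, payoff, beta):
    """One-step Logit response to an opponent playing strategy 1 w.p. p."""
    (a, b), (c, d) = payoff
    u1 = a * p + b * (1 - p)
    u2 = c * p + d * (1 - p)
    return 1.0 / (1.0 + math.exp(-beta * (u1 - u2)))

def count_fixed_points(payoff, beta, grid=20001):
    """Count strict sign changes of f(p) - p with f = g∘g on a grid.
    The odd grid size keeps p = 1/2 (an exact fixed point for these
    payoffs) from landing on a sample point, so every root shows up
    as a strict sign change."""
    def h(p):
        return g(g(p, payoff, beta), payoff, beta) - p
    crossings = 0
    prev = h(0.0)
    for i in range(1, grid + 1):
        cur = h(i / grid)
        if prev * cur < 0:
            crossings += 1
        prev = cur
    return crossings

HD = ((-1.0, 2.0), (0.0, 1.0))      # illustrative hawk-dove payoffs, beta_c = 2
print(count_fixed_points(HD, 1.0))  # below beta_c: only the mixed QRE
print(count_fixed_points(HD, 4.0))  # above beta_c: mixed QRE plus QRE_01, QRE_10
```

For β below the critical value the scan finds a single fixed point (the mixed QRE); above it, two additional fixed points appear, corresponding to the asymmetric QRE_01 and QRE_10 pair.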
Up to this point, we have demonstrated that the QREs of Eq. (35) cover all of the possible pure-strategy NEs and the mixed NEs. Next, we discuss their stability. We wish to demonstrate that pure QREs, when they exist, are always SQREs, whereas mixed QREs are SQREs only for small β and become unstable for large β. Additionally, we define β_c, the critical value of β.
Stable solutions are a subset of fixed points. For the iterative mapping defined in Eq. (35), we use linear stability analysis [34], according to which a solution of Eq. (35) is stable if the Jacobian of the right-hand side (simply a derivative in this case) has magnitude less than 1 at the fixed point, i.e.,

|∂f/∂p| evaluated at p = p*(β) is less than 1, (43)

where p*(β) is a fixed point of Eq. (35). For pure QREs, for example QRE_00, p*(β) decreases to 0 exponentially and ∂f/∂p vanishes there in the large-β limit, so pure QREs are stable. For mixed QREs, ∂f/∂p → 0 as β → 0; therefore, mixed QREs are SQREs in the limit β → 0. Now consider the case β → ∞. Because 0 < p*(β) < 1 is a finite number, there is always a β_c such that when β > β_c, Eq. (43) is no longer valid. The critical value β_c is defined by the equality case of Eq. (43); for a given game with fixed values of a, b, c and d, β_c can be solved numerically using Eq. (44) and Eq. (46) together. In Fig. 5, we plot the β_c found in numerical simulations of the ILQRD (denoted β_c^N) against the β_c solved from Eq. (44) and Eq. (46) (denoted β_c^T). The two values agree well.
For asymmetric games, a similar β_c can be derived. For all the examples used in the previous section, as well as the payoff matrices of the experiments discussed in the next section, we plot the β_c numerically solved from these equations against the corresponding values found from simulations; the two are in very good agreement. We have thus demonstrated that for symmetric 2×2 games, our ILQRD has the following characteristics: (1) the QREs cover all of the pure and mixed NEs, and (2) pure QREs are SQREs, whereas mixed QREs are SQREs for small β but unstable for sufficiently large β. These conclusions cover all the major features demonstrated in the preceding section using specific examples. A general proof for general 2×2 games should be straightforward, although it would involve more tedious algebra.
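For asymmetric games, the linear stability analysis amounts to computing the 2×2 Jacobian of the joint one-step map at the mixed QRE and locating the β at which its spectral radius crosses 1. The sketch below illustrates this with matching pennies, chosen purely for illustration (it is not one of the paper's experimental matrices); for the ±1 normalization used here, the Jacobian at the mixed QRE has eigenvalues ±iβ, so the analytic threshold is β_c = 1. All helper names are ours.

```python
import cmath
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

def joint_map(p1, p2, A, B, beta):
    """One ILQRD step for a general (possibly asymmetric) 2x2 game.
    A and B are the row and column player's payoff matrices."""
    d1 = (A[0][0] - A[1][0]) * p2 + (A[0][1] - A[1][1]) * (1 - p2)
    d2 = (B[0][0] - B[0][1]) * p1 + (B[1][0] - B[1][1]) * (1 - p1)
    return sigma(beta * d1), sigma(beta * d2)

def mixed_qre(A, B, beta, steps=3000):
    """Locate the mixed QRE by heavily damped iteration, which stays
    contracting even where the undamped dynamics is unstable."""
    alpha = 1.0 / (1.0 + beta * beta)   # damping factor (our choice)
    p1 = p2 = 0.5
    for _ in range(steps):
        q1, q2 = joint_map(p1, p2, A, B, beta)
        p1 += alpha * (q1 - p1)
        p2 += alpha * (q2 - p2)
    return p1, p2

def spectral_radius(A, B, beta, h=1e-6):
    """Spectral radius of the (finite-difference) Jacobian of the
    undamped joint map at the mixed QRE."""
    p1, p2 = mixed_qre(A, B, beta)
    f1, f2 = joint_map(p1, p2, A, B, beta)
    f1p, f2p = joint_map(p1 + h, p2, A, B, beta)
    f1q, f2q = joint_map(p1, p2 + h, A, B, beta)
    j11, j21 = (f1p - f1) / h, (f2p - f2) / h
    j12, j22 = (f1q - f1) / h, (f2q - f2) / h
    tr, det = j11 + j22, j11 * j22 - j12 * j21
    disc = cmath.sqrt(tr * tr - 4 * det)    # eigenvalues may be complex
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

def beta_c(A, B, lo=0.05, hi=5.0, iters=50):
    """Bisect on beta for spectral radius = 1, the stability threshold."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if spectral_radius(A, B, mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Matching pennies: a zero-sum game with a unique mixed NE at (0.5, 0.5).
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-1.0, 1.0], [1.0, -1.0]]
print(beta_c(A, B))
```

The same routine applies unchanged to any 2×2 payoff pair with an interior mixed QRE; only the bisection bracket may need widening.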
From this general proof, we observe that the QRE exists for all values of β, but the SQRE need not. Intuitively, as β increases, the QREs approach NEs because the iteration moves closer to best-response dynamics; however, QREs might lose their stability when β is sufficiently large. The difference between the SQRE and the NE depends on the competition between these two effects: approaching the NE and losing stability. This difference provides a means of examining the applicability of the QRE. The QRE has been criticized because its fixed points, which are what we refer to as QREs, can always fit data at least as well as the mixed NE as a result of the free parameter β. We agree with this statement; hence, the fact that experimental data are closer to the QREs than to the mixed NEs does not by itself imply that the QRE is a better solution concept than the NE. However, this criticism does not apply to our SQRE, which loses its stability at larger β, so the SQRE cannot always outperform mixed NEs. By assessing whether the experimental data points are closer to our SQRE than to the mixed NEs, we can compare the SQRE solution concept with the NE.

Difference between QRSS and SQRE
QRSS, which is occasionally referred to as Logit response dynamics [14,27], starts from an arbitrary strategy profile for each player and then uses the transition matrix defined in Eq. (22) to evolve the strategic states of all players toward the invariant distribution defined in Eq. (27). Its stationary states have been discussed by several researchers, but there is no complete study of necessary and sufficient conditions for convergence of the QRSS to an NE [14,24,27]. In this section, we demonstrate that although Eq. (22) appears similar to Eq. (21), P^ss as defined in Eq. (27) differs substantially from ρ^∞(β). The difference is that P^ss is a distribution in Δ(S), the set of all possible distribution functions on the set S of strategy profiles, whereas ρ^∞(β) is a member of Δ, the set of independent (product) distributions. That is, P^ss may include correlated strategies, whereas ρ^∞(β) describes only purely non-cooperative, independent strategies. We demonstrate this statement using one example: a 2×2 symmetric game with the payoff matrices of [14]. This game is a potential game [14]; it has two strict NEs and a proper mixed NE. According to [14] and [27], the corresponding Logit response dynamics has an invariant distribution of strategy profiles that corresponds to the potential maximizer and the proper mixed NE. Here, we show that the invariant distribution in fact does not correspond to the proper mixed NE: it involves correlations between the players. First, we calculate the probability transition matrix M and then obtain the P^ss of this game according to Eq. (27). From P^ss, we find the reduced (marginal) strategy profiles P₁ for player 1 and P₂ for player 2. Then, we define the correlation index as C = Σ_{l,m} |P^ss(s¹_l, s²_m) − P₁(s¹_l)P₂(s²_m)|. The correlation index is zero if and only if the joint distribution P^ss is a product of two independent probability distributions. In Fig. 6, we plot the correlation index C calculated for the QRSS. We find that the correlation index of the QRSS is always non-zero except for extremely small β. In contrast, the SQRE is uncorrelated by construction: its joint distribution is defined as the product of the individual mixed strategies, so its correlation index is identically 0.
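The correlation index computation can be sketched as follows. Since the payoff matrix of [14] is not reproduced here, the example assumes an illustrative common-interest potential game, and the helper names are ours. Each period one randomly chosen player revises with Logit probabilities; iterating the resulting Markov chain over the four pure profiles yields the stationary joint distribution, from which C is computed against the product of its marginals.

```python
import math

# Illustrative common-interest (potential) game: both players receive the
# same payoff, with two strict NEs (A,A), (B,B) and a proper mixed NE.
U = [[2.0, 0.0], [0.0, 1.0]]

def logit_choice(opp, beta):
    """Logit probability of choosing A given the opponent's pure strategy."""
    uA, uB = U[0][opp], U[1][opp]
    eA, eB = math.exp(beta * uA), math.exp(beta * uB)
    return eA / (eA + eB)

def stationary(beta, sweeps=20000):
    """Stationary distribution over the four pure profiles (s1, s2) under
    Logit response dynamics: each period one random player revises."""
    P = {(s1, s2): 0.25 for s1 in (0, 1) for s2 in (0, 1)}
    for _ in range(sweeps):
        Q = {k: 0.0 for k in P}
        for (s1, s2), w in P.items():
            pa1 = logit_choice(s2, beta)    # player 1 revises (prob 1/2)
            pa2 = logit_choice(s1, beta)    # player 2 revises (prob 1/2)
            Q[(0, s2)] += 0.5 * w * pa1
            Q[(1, s2)] += 0.5 * w * (1 - pa1)
            Q[(s1, 0)] += 0.5 * w * pa2
            Q[(s1, 1)] += 0.5 * w * (1 - pa2)
        P = Q
    return P

def correlation_index(P):
    """C = sum over profiles of |P(s1,s2) - P1(s1)*P2(s2)|; zero iff the
    joint distribution factorizes into independent marginals."""
    P1 = {s: P[(s, 0)] + P[(s, 1)] for s in (0, 1)}
    P2 = {s: P[(0, s)] + P[(1, s)] for s in (0, 1)}
    return sum(abs(P[(s1, s2)] - P1[s1] * P2[s2])
               for s1 in (0, 1) for s2 in (0, 1))

P = stationary(beta=1.0)
print(correlation_index(P))   # strictly positive: the QRSS is correlated
```

Any product distribution, such as an SQRE profile, gives C = 0 under this definition, whereas the stationary distribution above concentrates extra mass on the diagonal profiles and yields a clearly positive C.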
In this paper, we do not address the applicability of correlated strategies in non-cooperative games [19]. We simply note that the invariant distribution of the QRSS, or Logit response dynamics, generally involves correlated strategies, and thus the QRSS differs from both the QRE and the SQRE.

Experimental results re-analysed using ILQRD

As previously explained, a key difference between this manuscript and other studies of the QRE is that beyond a certain value of β, the QRE becomes unstable. Therefore, even when the experimental data are better fit by the QRE, if the data lie near the unstable region, the applicability of the QRE is questionable. In this section, we examine how close the experimental data are to the SQRE. We focus on experiments involving 2×2 games with a unique mixed NE and require the games to have an SQRE relatively distant from that mixed NE, i.e., games such as the one in Fig. 4.

Erev et al. [35] conducted forty 2×2 constant-sum games, each played 500 times by a pair of players. Among these, ten games were each played by nine pairs of subjects, whereas the other thirty were each played by one pair. Here, we use only the experimental data from the ten games played by nine subject pairs. The payoff matrices of the ten games are shown in Table 1, reproduced from Table 1 in [35]; the sixth column shows the NE of each game. Each player was asked to choose between A and B. The payoff entry AB gives player 1's winning probability (×100) when player 1 chooses A and the opponent chooses B, and so on; the payoff for each win was 4 cents. All ten games were played by fixed pairs for 500 trials. Table 2 shows the proportion of A choices in the 500 trials by each player. We would have preferred data from 500 pairs of independent players to data from games repeated 500 times by the same pair of players.
However, first, we did not find such data, and second, for constant-sum games it is believed that repetition produces no surprising results. This is unlike social dilemma games, such as the prisoner's dilemma, in which repeating the game even a finite number of times completely changes the experimental behavioral outcome.
According to the payoff matrices in Table 1, we obtain the SQRE and the QRE of the ten games and compare the experimental data points with our SQRE. For five games (games 2, 3, 6, 7, 9), the experimental data points lie in the region of the SQRE; we show one of them in Fig. 7 as an example. The experimental data from three games (games 1, 5, 8) lie in the region of unstable solutions but remain closer to our SQRE than to the NE (Fig. 8). As shown in Fig. 9, the data points of the remaining two games (games 4, 10) are closer to the NE than to the SQRE.

Figure 8. NE, QRE, SQRE and experimental data of game 8: the average strategy profile is in the region of unstable solutions but remains closer to the SQRE than to the NE. Three of the ten games exhibit similar behavior. The inset shows the payoff matrix of this game. doi:10.1371/journal.pone.0105391.g008

From these simple and limited comparisons, we conclude that the SQRE fits the observed behavior in real experiments better than the NE (5+3 : 2 in favor of the SQRE in this limited comparison). However, further examination of the relation between the experimental data and our SQRE is required for a more definitive answer. We do not yet have qualitative or quantitative criteria identifying games whose expected experimental behavior is close to the SQRE; this question will be examined in future investigations.

Conclusion and Discussion
In this manuscript, using the Logit quantal response function (the Boltzmann distribution of statistical physics) to link the choice of strategy to the corresponding payoff at every step, we construct an iterative Logit quantal response dynamic process. The manuscript can thus be regarded as a dynamic version of the Logit quantal response equilibrium. Importantly, our dynamic process differs from the so-called Logit response dynamics, which generally results in correlated equilibria, even for non-cooperative games.
It has been shown in [21] that the QRE exists for all values of β, a measure of the players' payoff sensitivity, and converges toward NEs as β → ∞. It has also been demonstrated on some examples, and taken for granted by some researchers, that in fitting experimental data the QRE is generally better than the NE because the value of β is free to vary to improve the fit [23,36,37]. In this manuscript, we demonstrate that this is not the case: when stability is taken into consideration, the QRE is, in principle, no longer always better than the NE. Based on the dynamic process, stable and unstable QREs are distinguished. We find the following: (1) For games with a single focal pure NE, there is always one stable QRE that converges toward the preferred NE as β → ∞. (2) For games without any focal pure NE but with a unique proper mixed NE, when the payoff sensitivity is sufficiently large (β > β_c), the mixed QRE loses its stability. For certain games, the QREs are already close to their corresponding NEs before they lose stability, so the difference between stable QREs and NEs is small; for other games, the difference is substantially more pronounced.
The latter case could be used to assess the applicability of the QRE to experiments and real-life observations. We then compared the stable and unstable QREs with experimental data. We found that for certain games (5 games in our preliminary tests) the experimental observations lie within the regions of the stable QRE; that in other games (3 games) the experimental data lie in the unstable regions but remain closer to stable QREs than to the mixed NEs; and that for the remaining games (2 games) the experimental results are closer to the mixed NEs than to the stable QREs. We also believe that linking mixed NEs to mixed SQREs improves our understanding of the applicability of mixed NEs.

Figure 9. NE, QRE, SQRE and experimental data of game 10: the average strategy profile is closer to the NE than to the SQRE. Two of the ten games exhibit similar behavior. The inset shows the payoff matrix of this game. doi:10.1371/journal.pone.0105391.g009
We have not yet identified qualitative or quantitative criteria with which to classify games from this perspective; further experimental and theoretical investigation is required. In the proof section, we presented a proof of the main observed features of our dynamic process only for symmetric 2×2 games. In the future, a general discussion and proof of these features for N×M games should be undertaken, and cross-game experiments, compared against our SQREs with the value of β estimated for a fixed group of players, would be valuable for putting the concept of the SQRE to further tests.