## Abstract

In game theory, there are two social interpretations of rewards (payoffs) for decision-making strategies: (1) the interpretation based on the utility criterion derived from expected utility theory and (2) the interpretation based on the quantitative criterion (amount of gain) derived from validity in the empirical context. A dynamic decision theory has recently been developed in which dynamic utility is a conditional (state) variable that is a function of the current wealth of a decision maker. We applied dynamic utility to the equal division in dove-dove contests in the hawk-dove game. Our results indicate that under the utility criterion, the half-share of utility becomes proportional to a player’s current wealth. Our results are consistent with studies of the sense of fairness in animals, which indicate that the quantitative criterion has greater validity than the utility criterion. We also find that traditional analyses of repeated games must be reevaluated.

**Citation:** Ito H, Katsumata Y, Hasegawa E, Yoshimura J (2016) What Is True Halving in the Payoff Matrix of Game Theory? PLoS ONE 11(8): e0159670. https://doi.org/10.1371/journal.pone.0159670

**Editor:** Cheng-Yi Xia, Tianjin University of Technology, CHINA

**Received:** March 3, 2016; **Accepted:** July 5, 2016; **Published:** August 3, 2016

**Copyright:** © 2016 Ito et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the paper.

**Funding:** This study was supported by a Japan Prize Foundation award to HI, grants-in-aid from the Japan Society for Promotion of Science (JSPS) for JSPS fellows to HI (no. 14J02983), and grants-in-aid from the JSPS to JY (nos. 22255004, 22370010, 26257405 and 15H04420). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Recently, game theory has been studied extensively to investigate the reasons why cooperation evolved in human and animal society, e.g., networks and other spatial structures, heterogeneity in behavior or rationality, and punishments [1–13]. These developments introduce some complexities in game theory itself such that the game becomes more realistic. Instead, in this work, we focus on the nature of decision making in game theory. Note that decision theory and game theory are optimization theories of individual behaviors that have been systematized together. Decision theory is adopted when individuals face environmental uncertainty, whereas game theory applies when the behavior of other individuals is uncertain [14–15]. For these reasons, game theory is a special extension of decision theory.

The core of decision theory is expected utility theory, which maximizes the expected value (mean) of utility [14, 16]. Because of the relationship described above, the axioms of expected utility theory also serve as the basic axioms of game theory. Here, utility is a measure of the satisfaction obtained when a given amount of wealth (resource) is consumed. The utility of a given level of wealth is thus highly subjective and individualized; moreover, utility is a dimensionless value without a unit, unlike monetary currencies (dollars and yen) and prison sentences (years). We introduce dynamic utility [17–18], a new concept of utility developed in decision theory, into game theory. Specifically, we consider half-sharing among doves in the hawk-dove game in terms of dynamic utility.

Game theory involves the study of individuals’ behavior in relation to other individuals in a population. The payoff matrix is the set of benefits for players against opponents. Therefore, the elements of the payoff matrix are understood as the values of utility. Thus, the social interpretations of strategies are understood to be based on the utility criterion [16, 19–20]. However, this utility interpretation frequently conflicts with the natural understanding of animal and human behaviors. Meanwhile, some economists place the amount of money (dollar value) directly in the payoff matrix [21–30], because the utility criterion does not specify the amount of money (reward), whereas players want to know how many dollars they can obtain as a reward for victory. The main interest of game theory is in how people behave in a real-money game. To contrast real-money rewards with the utility criterion, we call the former the quantitative criterion. Recently, several experimental studies have shown that animals behave according to the quantitative criterion when they are offered rewards after enduring suffering [31–34]. For example, if the same effort is required, animals complain strongly if the same amount of reward is not offered. Hence, animals are said to exhibit a sense of fairness, where fair division is defined as a half share of the resources, e.g., in an ultimatum game [31–32, 34]. These observations seem to indicate that the quantitative criterion is more valid than the utility criterion.

Many techniques have thus adopted the quantitative criterion for payoff elements [21–29], leaving two alternatives for interpreting the elements of the payoff matrix in game theory: the utility criterion (dimensionless) and the quantitative criterion (with units, e.g., dollars).

Dynamic programming (DP) involves a numerical algorithm that solves for the optimal choice in sequential decision making. DP underlies the first true dynamic optimization model that was developed solely by Richard Bellman [35]. Later, stochastic control (theory) was developed but found to be mathematically equivalent to DP with additional complexity [21]. DP has been extensively applied to animal decision-making activity in behavioral ecology and has helped yield numerical solutions to various dynamic problems [36]. Despite these great achievements, theoretical understanding of the mechanisms of behavior is not provided by the numerical solutions yielded by DP.

Recently, a theory of dynamic utility optimization (DU) has been developed using the “Principle of Optimality,” the core principle of Bellman’s DP, which involves dynamic decision making under risk and uncertainty in which the growth rates of individual wealth are random variables that follow a simple stochastic process [17–18]. Because DU optimizes Markov chains (stochastic processes) as a form of sequential decision making, it maximizes the geometric mean of multiplicative growth rates.

Here, we explain the theoretical rationale of the dynamic utility model. The dynamic utility function is derived as follows [17–18]. Let time *t* = 0, …, *T* (final time) and *w*_{t}, the wealth at *t*, be the non-negative state variables of a decision maker (independent, identically distributed random variables). Let *r*_{t} (>0) denote the multiplicative growth rate of the wealth at *t*, such that *w*_{t+1} = *r*_{t}*w*_{t}. Then, the wealth at *t*, *w*_{t}, is expressed as

$$w_t = w_0 \prod_{i=0}^{t-1} r_i \tag{1}$$

We assume that the growth rates *r*_{t} (*t* = 0, …, *T*) are independent identically distributed random variables that represent a stochastic process. The decision maker can optimize this stochastic process by choosing the best option at every time point in Eq (1). We thus maximize the final wealth at *T*, *w*_{T}, such that

$$\max w_T = \max \left\{ w_0 \prod_{t=0}^{T-1} r_t \right\} \tag{2}$$

This maximization of the final wealth *w*_{T} (Eq (2)) is equivalent to maximization of the geometric mean of the growth rates such that

$$\max \left( \prod_{t=0}^{T-1} r_t \right)^{1/T} \tag{3}$$

Taking the logarithm of Eq (3), we obtain

$$\max \frac{1}{T} \sum_{t=0}^{T-1} \log r_t \tag{4}$$
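
The step from final wealth to the mean log growth rate can be checked numerically. The following is a minimal sketch, not the authors' code: the two options `risky` and `safe`, their growth rates, and the function names are hypothetical values chosen for illustration. It verifies the identity log(*w*_{T}/*w*_{0}) = *T* × (mean of log *r*_{t}) and shows that the option with the higher arithmetic mean growth rate can still yield the lower final wealth.

```python
import math

def final_wealth(w0, rates):
    """Final wealth: w_T = w_0 times the product of the growth rates."""
    w = w0
    for r in rates:
        w *= r
    return w

def mean_log_growth(rates):
    """(1/T) * sum of log r_t, i.e. the log of the geometric mean growth rate."""
    return sum(math.log(r) for r in rates) / len(rates)

# Hypothetical options (assumed values, not from the paper).
risky = [2.0, 0.4] * 10   # arithmetic mean 1.2, geometric mean sqrt(0.8) < 1
safe = [1.05] * 20        # arithmetic mean 1.05, geometric mean 1.05

w0 = 10.0
for name, rates in (("risky", risky), ("safe", safe)):
    T = len(rates)
    w_T = final_wealth(w0, rates)
    # Identity linking final wealth to the mean log growth rate.
    assert math.isclose(math.log(w_T / w0), T * mean_log_growth(rates))
    print(name, round(w_T, 4), round(mean_log_growth(rates), 4))
```

Ranking the two options by mean log growth rate gives the same order as ranking them by final wealth, which is the sense in which the two maximizations coincide.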

Eq (4) is rewritten in the form of utility theory in economics and operations research. We simply define the utility function *u*(*r*) as

$$u(r) = \log r \tag{5}$$

and we maximize the expected utility E{*u*} [37]. Note that *w*_{t+1} = *r*_{t}*w*_{t}. Therefore, we obtain

$$r_t = \frac{w_{t+1}}{w_t} = \frac{w_t + g_t}{w_t},$$

where *g*_{t} (= *w*_{t+1} – *w*_{t}) is the gain at time *t*. Therefore, at any time *t*, we obtain

$$r = \frac{g + w}{w} \tag{6}$$

where *g* and *w* are the current gain and the current wealth, respectively. The growth utility formula (Eq (5)) is then further rewritten in the form of *g* (decision variable) given *w* (state variable) such that

$$u(g; w) = \log \frac{g + w}{w} \tag{7}$$

and we maximize the expected utility E{*u*(*g*;*w*)}, which indicates that the current wealth is the state variable for maximization of the final wealth. Therefore, this function *u*(*g*;*w*) (Eq (7)) violates the so-called independence axiom of the axiomatic system of utility theory [16]. Thus, the principle of optimality developed by Richard Bellman [35] contradicts traditional expected utility theory [16].

Thus, DU yields the following optimization principle:

$$\max \mathrm{E}\{u(g; w)\} = \max \mathrm{E}\left\{ \log \frac{g + w}{w} \right\} \tag{8}$$

Thus, the derived dynamic utility is in the form of a logarithmic function (Eq (8)). Note that the value of *g* satisfies –*w* < *g*. This analytical solution for DU demonstrates that the utility function depends on the current gain/loss (the decision variable) and the current wealth status (state variable) at the time of decision making. In the present study, we demonstrate that the traditional application of expected utility theory and game theory in behavioral studies is valid only as a static model.
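
As a concrete illustration, the dynamic utility *u*(*g*;*w*) = log{(*g* + *w*)/*w*} can be written as a small function. This is a minimal sketch: the function name `dynamic_utility` and the error handling are assumptions made here, and strictly positive wealth is assumed so that the ratio is defined.

```python
import math

def dynamic_utility(g, w):
    """u(g; w) = log((g + w) / w), defined for w > 0 and g > -w."""
    if w <= 0:
        # Assumption: wealth must be strictly positive for the ratio to exist.
        raise ValueError("current wealth w must be positive")
    if g <= -w:
        raise ValueError("gain g must satisfy g > -w")
    return math.log((g + w) / w)

# The same one-dollar gain yields less utility as current wealth increases.
for w in (5.0, 10.0, 100.0):
    print(w, dynamic_utility(1.0, w))
```

A zero gain always has zero utility, gains have positive utility, and losses have negative utility that falls without bound as *g* approaches –*w*.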

By combining the maximization of future wealth and the avoidance of bankruptcy (the two optimization criteria), we obtain the following: (9) where (10)

Here, *c* is a constant. In the present study, the dynamic utility function (DUF) *u*(*g*;*w*) is applied to game theory; notably, the DUF avoids the arbitrariness of utility because it is derived mathematically from DP. Using numerical analyses of some examples, we demonstrate that there are serious contradictions in the social interpretation of strategy when the utility criterion is used. These results indicate that the traditional interpretation of game theory is a static optimization model that is valid only when all the players have equal current wealth. It cannot be applied to any game in which the current wealth of players varies over time; thus, the utility criterion cannot be applied to repeated games in which the current status of a player changes over time. In contrast, the quantitative criterion does not invoke a sense of unfairness in dividing the reward for games. Thus, we suggest that the quantitative criterion is more adaptable to the social interpretation of strategy in game theory than the utility criterion.

## Model and Results

As an example of game theory, we consider the hawk-dove game (V: victory reward; and C: fighting cost), in which the sense of fairness appears when both players adopt the dove strategy (D, D) in their payoff matrix (Fig 1A) [14]. Here, the “dove” player against a dove opponent gains V/2 in every contest.

Fig 1. (a) Payoff matrix of the hawk-dove game, where V and C are the victory reward and fighting cost, respectively. (b, c, d) The halving outcomes of victory rewards (V) by two players adopting the dove strategy in which the current wealth of player 1 (rich dove; RD) and player 2 (poor dove; PD), *w*_{1} and *w*_{2}, are *w*_{1} = 10 and *w*_{2} = 5. (b) The utility criterion in which V = 2 (utility) is divided by half, such that *u*_{1} = *u*_{2} = 1. The amount gained by each player is proportional to the player’s current wealth, such that *g*_{i} = *w*_{i}(*e* – 1). The total amount, G, of victory reward, V, varies based on the sum of the current wealth of both players, such that G = (*w*_{1} + *w*_{2})(*e* – 1). (c) The utility criterion in which the amount of reward G is set constant (G = 2 dollars). The gains of the players depend on the proportion of players’ current wealth, such that *g*_{1} = {*w*_{1}/(*w*_{1} + *w*_{2})}G. The utilities of the two players are equal, but the amounts of gains differ based on the ratio of current wealth, as in (b). (d) The quantitative criterion in which V = G = 2 dollars. The utility of players depends on the current wealth of players, such that *u*_{i} = log {(*g*_{i} + *w*_{i})/*w*_{i}}.

We apply the analytically derived DUF *u*(*g*;*w*) = log{(*g* + *w*)/*w*} to the case of fair division of victory reward (V/2) in (D, D). Numerically, we set *w*_{1} = 10 and *w*_{2} = 5 (unit: dollar). We compare the values of V/2 between the utility and quantitative criteria.

In the case of the utility criterion, both players acquire *u* = V/2 (unit: utility). The utility of each player *i* (*i* = 1, 2) should satisfy the following relationship between the current gain *g*_{i} and current wealth *w*_{i}:

$$u_i = \log \frac{g_i + w_i}{w_i} = \frac{V}{2} \tag{11}$$

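Eq (11) results in the following serious flaws. The face value of money that each player obtains depends on the relative wealth of the player, such that

$$g_i = w_i \left( e^{V/2} - 1 \right), \quad \text{so that} \quad \frac{g_1}{g_2} = \frac{w_1}{w_2} \tag{12}$$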

Therefore, halving the reward depends on the relative amount of the players’ current wealth *w*_{1}/*w*_{2}. For example, if we set V = 2 (utility), such that *u*_{1} = *u*_{2} = 1 (Fig 1B; case 1 in Table 1), then we obtain the following (unit: dollar):

$$g_1 = 10(e - 1) \approx 17.18, \qquad g_2 = 5(e - 1) \approx 8.59 \tag{13}$$

Thus, the richer the player, the greater the share that he or she should obtain in the equal-utility division (Fig 1B).
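
The case 1 numbers can be reproduced by inverting *u* = log{(*g* + *w*)/*w*} for the gain. The sketch below is illustrative only; the helper name `gain_for_utility` is an assumption introduced here, while the values of V, *w*_{1}, and *w*_{2} are those used in the text.

```python
import math

def gain_for_utility(u, w):
    """Invert u = log((g + w) / w) for the gain: g = w * (exp(u) - 1)."""
    return w * (math.exp(u) - 1.0)

V = 2.0                 # victory reward in utility units
w1, w2 = 10.0, 5.0      # current wealth of the rich and poor dove (dollars)

g1 = gain_for_utility(V / 2, w1)
g2 = gain_for_utility(V / 2, w2)
G = g1 + g2             # total dollars needed to give each player u = 1

print(round(g1, 2), round(g2, 2), round(G, 2))  # 17.18 8.59 25.77
```

The ratio *g*_{1}/*g*_{2} equals *w*_{1}/*w*_{2} = 2, and the total G equals (*w*_{1} + *w*_{2})(*e* – 1), so it grows with the combined wealth of the two players.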

In case 1 (Table 1; Fig 1B), we also face the problem of the total amount G (= *g*_{1} + *g*_{2}) of competitive resources. From Eq (11), the total resource G becomes proportional to the sum of the current wealth of both players. Therefore, G is large in games between rich players but small in those between poor players, which leads to the following logical inconsistency. In nature, and even in a society, competition occurs for existing resources, which indicates that the game rewards should be set equal to a constant prior to the beginning of a game. However, in case 1, the total reward G cannot be determined until the players of the game are determined. Furthermore, G may increase indefinitely in some repeated games when the sum of the current players’ wealth increases indefinitely. Thus, G should be a constant that is unaffected by players’ current wealth. We can avoid this problem of case 1 as follows. By setting a constant G in the utility criterion (case 2 in Table 1; Fig 1C), we can satisfy *u*_{1} = *u*_{2}. Then, player 1 receives *g*_{1} = {*w*_{1}/(*w*_{1} + *w*_{2})}G as a reward. However, even in this case, the reward of a player becomes proportional to the current wealth of the player (as with case 1). Therefore, quantitative fairness is also not satisfied in this case.
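
Case 2 can be checked in the same way. The sketch below, with the hypothetical helper `split_equal_utility`, divides a fixed reward G in proportion to current wealth and confirms that the two utilities then coincide; G = 2 dollars, *w*_{1} = 10, and *w*_{2} = 5 follow the text.

```python
import math

def split_equal_utility(G, w1, w2):
    """Split a fixed reward G in proportion to wealth, which equalizes utility."""
    g1 = G * w1 / (w1 + w2)
    g2 = G * w2 / (w1 + w2)
    return g1, g2

G = 2.0                 # fixed total reward (dollars)
w1, w2 = 10.0, 5.0      # current wealth of the rich and poor dove (dollars)

g1, g2 = split_equal_utility(G, w1, w2)
u1 = math.log((g1 + w1) / w1)
u2 = math.log((g2 + w2) / w2)

print(round(g1, 3), round(g2, 3), round(u1, 4), round(u2, 4))
```

Both growth rates equal 1 + G/(*w*_{1} + *w*_{2}), which is why the utilities match even though the rich dove receives twice as many dollars as the poor dove.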

In contrast, if we apply the quantitative criterion (Fig 1D; case 3), then each player obtains the same amount of money, such that *g* = V/2 (unit: dollar). In this case, the utility of each player *i* (*i* = 1, 2) depends on their current wealth, *w*_{i}:

$$u_i = \log \frac{V/2 + w_i}{w_i} \tag{14}$$

Thus, equal division of money results in a difference in the utility values of the players unless their current wealth is identical. For example, if we set V = 2 (unit: dollar), then *u*_{1} ≅ 0.095 and *u*_{2} ≅ 0.182 (Fig 1D).
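
The case 3 utilities follow directly from *u*_{i} = log{(V/2 + *w*_{i})/*w*_{i}}. A minimal sketch using the values from the text (the function name `utility_of_equal_split` is an assumption made here):

```python
import math

def utility_of_equal_split(V, w):
    """Utility of receiving V/2 dollars at current wealth w."""
    return math.log((V / 2 + w) / w)

V = 2.0                 # victory reward in dollars
w1, w2 = 10.0, 5.0      # current wealth of the rich and poor dove (dollars)

u1 = utility_of_equal_split(V, w1)
u2 = utility_of_equal_split(V, w2)

print(round(u1, 3), round(u2, 3))  # 0.095 0.182
```

The poor dove gains nearly twice the utility of the rich dove from the same dollar, so equality in money and equality in utility cannot both hold unless *w*_{1} = *w*_{2}.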

All three cases are summarized in Table 1.

Now, we compare the quantitative differences between case 1 and case 3 (Table 1). The observed discrepancies in terms of both the utility (case 1) and quantitative (case 3) criteria increase with the difference in current wealth between the players (Fig 2). In the utility criterion (Fig 2A), the difference in the current gain *Δg* (= *g*_{1} – *g*_{2}) depends linearly on the difference in current wealth *Δw* (= *w*_{1} – *w*_{2}) (Fig 2B and 2C). In the quantitative criterion (Fig 2D), the difference in utility *Δu* (= *u*_{1} – *u*_{2}) increases with the ratio of the multiplicative growth rates *Δr* (= *r*_{1}/*r*_{2}) of the two players, where *r* = (*g* + *w*)/*w* (Fig 2E and 2F). Note that if the current wealth of all players is equal, then we can preserve equality in both *g* and *u*, such that *g*_{1} = *g*_{2} and *u*_{1} = *u*_{2} simultaneously.
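
Both dependencies follow in one line each from the definitions used for case 1 and case 3. The short derivation below is a sketch written for this passage, using the signed differences *Δg*, *Δw*, and *Δu* and the ratio *Δr* as defined in the text; no new quantities are introduced.

```latex
\begin{align}
% Case 1 (utility criterion): each player receives g_i = w_i (e^{V/2} - 1)
\Delta g &= g_1 - g_2 = \left(e^{V/2} - 1\right)(w_1 - w_2)
          = \left(e^{V/2} - 1\right)\Delta w \\
% Case 3 (quantitative criterion): u_i = \log r_i with r_i = (V/2 + w_i)/w_i
\Delta u &= u_1 - u_2 = \log r_1 - \log r_2
          = \log\frac{r_1}{r_2} = \log \Delta r
\end{align}
```

With V = 2, the slope of *Δg* against *Δw* is *e* – 1 ≈ 1.72, and *Δu* vanishes exactly when *Δr* = 1, that is, when *w*_{1} = *w*_{2}.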

Fig 2. (a, b, c) The relationship between the gain and current wealth under the utility criterion (case 1, when *u*_{1} = *u*_{2} = 1). (d, e, f) The relationship between the utility and current wealth under the quantitative criterion (case 3). (a) The gain versus utility for both players, such that *g*_{i} = *w*_{i}{exp(*u*_{i}) – 1}. *Δg* indicates the difference in *g* between the two players. (b) The difference in gain *Δg* (= |*g*_{1} – *g*_{2}|) versus the difference in current wealth *Δw* (= |*w*_{1} – *w*_{2}|). (c) Phase plane of *Δg* against *w*_{1} and *w*_{2}. The dashed line indicates *Δg* = 0. (d) The utility versus both players’ gain, such that *u*_{i} = log{(*g*_{i} + *w*_{i})/*w*_{i}}. *Δu* indicates the difference in *u* between the two players. (e) The difference in utility *Δu* (= |*u*_{1} – *u*_{2}|) versus the ratio of growth rates *Δr* (= *r*_{1}/*r*_{2}). (f) Phase plane of *Δu* versus *w*_{1} and *w*_{2}. The dashed line indicates *Δu* = 0.

## Discussion

The current results demonstrate the drastic difference between the utility criterion and the quantitative criterion. Under the utility criterion, equal division means that the rich should obtain more than the poor. In contrast, under the quantitative criterion, both the rich and poor obtain the same amount of money (dollars), but their utilities differ. Recent studies have shown that animals (e.g., human adults [32], human children [33], and chimpanzees [34]) express a sense of fairness only when the reward is divided quantitatively in half. These studies suggest that these animals use the quantitative criterion in equal division.

This discrepancy in social interpretations has remained unresolved for more than a century because no utility function can be derived unambiguously. From empirical studies of preferences in humans, utility (considered as a perceptual or psychophysical quantity) is known to correlate with the logarithm of the input (the stimulus quantity); this relationship is known as the Weber-Fechner Law [38–39]. Recent studies have also demonstrated that utility depends on the current wealth of an individual decision maker [40], as in the analytically derived utility function applied in the current game. However, in traditional decision theory, the only method of estimating the utility function is to compare the preferences between two choices. Therefore, as Poincaré noted a century ago, we cannot even derive an approximate utility function mathematically [41]. The arbitrariness of utility thus cannot be avoided in any proposed utility function. To avoid this problem, utility functions are treated as a black box without referring to the resources actually gained (rewards). Because of this problem, the resource dividends in traditional utility theory that are based on the utility criterion cannot be translated into actual amounts of resources.

The current results highlight a serious problem in repeated games. In a strict sense, traditional analyses of game theory are valid only when all players’ current wealth is equal. However, in any type of repeated game, players’ current wealth inevitably varies after each game. Thus, the utility function of a player varies across sequential decisions, as long as his/her current wealth varies over time. Furthermore, the utility criterion is not applicable if the current wealth of players varies significantly. Therefore, traditional analyses are applicable only to the case of one-time decisions (games) in which the wealth of all players is equal. Note that the traditional equilibrium analyses become invalid even when the quantitative criterion is adopted. Thus, traditional analyses of game theory should be reevaluated in terms of DU. We should note that the utility criterion in game theory is still valuable because we have no alternative for estimating dynamic games. The traditional utility and game theory should be used as guidance for determining the exact (true) dynamic games. The definition of Nash equilibrium is still valid, and we expect that its analytical solution is also approximately true as long as the current wealth of players does not differ substantially.
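
The drift of utility in a repeated game can be illustrated with a toy calculation. The sketch below is a hypothetical scenario built for illustration, not an analysis from the paper: a single dove receives the same dollar gain in every round, and the function `utilities_over_rounds` (an assumed name) records the dynamic utility of that gain as wealth accumulates.

```python
import math

def utilities_over_rounds(w0, gain, rounds):
    """Utility of a fixed dollar gain in each round as wealth accumulates."""
    w, out = w0, []
    for _ in range(rounds):
        out.append(math.log((gain + w) / w))  # u(g; w) at the current wealth
        w += gain                             # wealth grows after each round
    return out

# Assumed values: initial wealth 5 dollars, gain of 1 dollar per round.
us = utilities_over_rounds(w0=5.0, gain=1.0, rounds=5)
print([round(u, 3) for u in us])
```

The utility of an identical payoff falls from round to round, so a payoff matrix written in utility units does not stay fixed over a repeated game even when the dollar rewards do.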

## Author Contributions

**Conceived and designed the experiments:** HI YK EH JY. **Performed the experiments:** HI YK EH JY. **Analyzed the data:** HI YK EH JY. **Contributed reagents/materials/analysis tools:** HI YK EH JY. **Wrote the paper:** HI YK EH JY.

## References

- 1. Wang Z, Wang L, Szolnoki A, Perc M. Evolutionary games on multilayer networks: A colloquium. European Physical Journal B. 2015; 88: 124.
- 2. Huang K, Wang T, Cheng Y, Zheng X. Effect of heterogeneous investments on the evolution of cooperation in spatial public goods game. PLOS ONE. 2015; 10(3): e0120317. pmid:25781345
- 3. Perc M, Wang Z. Heterogeneous aspirations promote cooperation in the prisoner’s dilemma game. PLOS ONE. 2010; 5(12): e15117. pmid:21151898
- 4. Huang K, Zheng X, Yang Y, Wang T. Behavioral evolution in evacuation crowd based on heterogeneous rationality of small groups. Applied Mathematics and Computation. 2015; 266: 501–506.
- 5. Szolnoki A, Perc M. Reward and cooperation in the spatial public goods game. Europhysics Letters. 2010; 92(3): 38003.
- 6. Wang Z, Xia CY, Meloni S, Zhou CS, Moreno Y. Impact of social punishment on cooperative behavior in complex network. Scientific reports. 2013; 3: 3055. pmid:24162105
- 7. Perc M, Gómez-Gardeñes J, Szolnoki A, Floría LM, Moreno Y. Evolutionary dynamics of group interactions on structured populations: a review. Journal of The Royal Society Interface. 2013: 10(80): 20120997.
- 8. Ito H, Yoshimura J. Social penalty promotes cooperation in a cooperative society. Scientific reports. 2015; 5: 12797. pmid:26238521
- 9. Wang Z, Szolnoki A, Perc M. Rewarding evolutionary fitness with links between populations promotes cooperation. Journal of Theoretical Biology. 2014; 349: 50–56. pmid:24508726
- 10. Zhu CJ, Sun SW, Wang L, Ding S, Wang J, Xia CY. Promotion of cooperation due to diversity of players in the spatial public goods game with increasing neighborhood size. Physica A. 2014; 406: 145–154.
- 11. Chen MH, Wang L, Sun SW, Wang J, Xia CY. Evolution of cooperation in the spatial public goods game with adaptive reputation assortment. Physics Letters A. 2016; 380: 40–47.
- 12. Xia CY, Meloni S, Moreno Y. Effects of environment knowledge on agglomeration and cooperation in spatial public goods games. Advances in Complex Systems. 2012; 15: 1250056.
- 13. Xia CY, Miao Q, Wang J, Ding S. Evolution of cooperation in the traveler’s dilemma game on two coupled lattices. Applied Mathematics and Computation. 2014; 246: 389–398.
- 14. Maynard Smith J. Evolution and the Theory of Games. Cambridge: Cambridge University Press; 1982.
- 15. Nowak MA. Evolutionary Dynamics: Exploring the equations of life. Cambridge: Harvard University Press; 2006.
- 16. von Neumann J, Morgenstern O. The Theory of Games and Economic Behavior. 2nd ed. Princeton: Princeton University Press; 1947.
- 17. Yoshimura J, Ito H, Miller DG III, Tainaka K. Dynamic decision-making in uncertain environments I. The principle of dynamic utility. J. Ethol. 2013; 31: 101–105.
- 18. Yoshimura J, Ito H, Miller DG III, Tainaka K. Dynamic decision-making in uncertain environments II. Allais paradox in human behavior. J. Ethol. 2013; 31: 107–113.
- 19. Luce RD, Raiffa H. Games and Decisions: Introduction and critical survey. Hoboken, New York, Chichester, Brisbane, Toronto and Singapore: John Wiley & Sons; 1957.
- 20. Rasmusen E. Games and Information: An Introduction to Game Theory. 4th ed. London: Blackwell Publishers; 2006.
- 21. Clark CW. Mathematical Bioeconomics: The Mathematics of Conservation. 3rd ed. Hoboken: John Wiley & Sons; 2010.
- 22. Stiglitz JE, Walsh CE. Economics. 3rd ed. New York: W. W. Norton & Co; 2002.
- 23. Margolis H. Selfishness, Altruism, and Rationality: A Theory of Social Choice. Chicago: University of Chicago Press; 1984.
- 24. Savage LJ. The Foundations of Statistics. New York: Dover Publications; 1972.
- 25. Borch KH. The Economics of Uncertainty. Princeton: Princeton University Press; 1972.
- 26. Jeffrey RC. The Logic of Decision. 2nd ed. Chicago: University of Chicago Press; 1983.
- 27. Mankiw NG. Principles of Economics. 3rd ed. South-Western: Thomson Learning; 2004.
- 28. Littlejohn SW. Theories of Human Communication. Columbus: Charles E. Merrill Publishing Co & A Bell Howell Co; 1978.
- 29. Hirshleifer J, Glazer A, Hirshleifer D. Price Theory and Applications: Decisions, Markets, and Information. Cambridge: Cambridge University Press; 2005.
- 30. Basu K. The Traveler's Dilemma. Scientific American. 2007; 296: 90–95.
- 31. Fehr E, Schmidt KM. A theory of fairness, competition, and cooperation. Q. J. Econ. 1999; 114: 817–868.
- 32. Sanfey AG, Rilling JK, Aronson JA, Nystrom LE, Cohen JD. The Neural Basis of Economic Decision-Making in the Ultimatum Game. Science. 2003; 300: 1755–1758. pmid:12805551
- 33. Hamann K, Warneken F, Greenberg JR, Tomasello M. Collaboration encourages equal sharing in children but not in chimpanzees. Nature. 2011; 476: 328–331. pmid:21775985
- 34. Proctor D, Williamson RA, Waal FBM, Brosnan SF. Chimpanzees play the ultimatum game. Proc. Natl. Acad. Sci. USA. 2012; 110: 2070–2075.
- 35. Bellman RE. Dynamic Programming. Princeton: Princeton University Press; 1957.
- 36. Houston A, Clark C, Mcnamara J, Mangel M. Dynamic models in behavioural and evolutionary ecology. Nature. 1988; 332: 29–34.
- 37. Yoshimura J, Clark CW. Individual adaptations in stochastic environments. Evolutionary Ecology. 1991; 5: 173–192.
- 38. Weber EH. Zusätze zur Lehre vom Baue und von den Verrichtungen der Geschlechtsorgane. Leipzig, SN: Weidmann’sche Buchhandlung; 1846.
- 39. Fechner GT. Elemente der Psychophysik. Leipzig, SN: Breitkopf; 1860.
- 40. Tricomi E, Rangel A, Camerer CF, O’Doherty JP. Neural evidence for inequality-averse social preference. Nature. 2010; 463: 1089–1091. pmid:20182511
- 41. Walras L. Economique et Mécanique. Bull. Soc. vaud. Sci. nat. 1909; 45: 313–325.