Mistakes can stabilise the dynamics of rock-paper-scissors games

A game of rock-paper-scissors is an interesting example of an interaction where none of the pure strategies strictly dominates all others, leading to a cyclic pattern. In this work, we consider an unstable version of rock-paper-scissors dynamics and allow individuals to make behavioural mistakes during the strategy execution. We show that such an assumption can break the cyclic relationship, leading to a stable equilibrium in which only one strategy survives. We consider two cases: completely random mistakes, when individuals have no bias towards any strategy, and a general form of mistakes. Then, we determine conditions for a strategy to dominate all other strategies. However, given that individuals who adopt a dominating strategy are still prone to behavioural mistakes, the observed behaviour may still include strategies that would otherwise be extinct. That is, behavioural mistakes in strategy execution stabilise evolutionary dynamics, leading to an evolutionarily stable and, potentially, mixed co-existence equilibrium.


The question frequently arising in ecology is: Under which conditions does a particular type of species survive? This question is also relevant in the context of understanding a wide range of environmental, social, genetic and other conditions potentially influencing evolutionary trajectories. Evolutionary game theory, a branch of game theory and ecological sciences, aims to answer that question (1)(2)(3)(4)(5). One of [...] mistakes resulting in stochastic payoffs of all involved individuals, altering overall population's fitness.

Figure 1: A schematic representation of behavioural mistakes in a rock-paper-scissors game. (A) Rock-paper-scissors dynamics with pure strategies is described by a fitness matrix such that the cyclic relationship between the three strategies is promoted. (B) The effect of execution errors on the example of one interaction: here individual 1 has chosen strategy paper and individual 2 has chosen strategy rock. Without mistakes, individual 1 would win this instance of the contest. However, a mistake in the execution leads to mixed strategies being played for both individuals, resulting in different possible outcomes of the interaction. Hence, the outcome of the game is no longer deterministic but stochastic and depends on the probability distribution of mistakes.
Here, we consider the following scenario. Imagine that each randomly chosen individual finds itself in a pairwise interaction with another randomly chosen individual. Both of them choose a strategy to [...] consider changing environments where adaptation becomes particularly important and depends strongly on the interplay between behavioural patterns and fitness.

In low-dimensional games this interplay can be captured and analysed in detail. Unfortunately, it becomes challenging as the dimensionality of a game grows, where even small perturbations may impact an evolutionary outcome. However, under a natural assumption that behavioural mistakes are completely random, we can describe game behaviour for general n dimensions. We show that in such settings, strategies (or behavioural types) leverage their fitness advantage. This in turn might lead to only one strategy dominating. Further, we assume that mistakes do not have to be completely random. We consider a symmetric case of an unstable rock-paper-scissors game where no choice of strategies yields a fitness advantage. Such games lead to a heteroclinic orbit where none of the strategies dominate. We show that in such a case behavioural mistakes bring asymmetry to the game, breaking the cyclic relationship and potentially leading to dominance of one of the strategies. That is, the structure of execution errors may technically imply the existence of an evolutionarily stable interior point.

In this paper we focus on the rock-paper-scissors dynamics. Hence, we shall mostly work with the general form of R, with rows and columns ordered as Rock, Paper, Scissors, given by [...] where $a_i, b_i \in \mathbb{R}_+$ (4).

In classic games, there is an underlying assumption that players are able to execute the chosen actions perfectly. We assume that actions selected by players may not coincide with the executed actions. Such behavioural stochasticity results in executing unintended strategies and is captured in matrix Q(λ) from [...]

In (14) the authors called Q(λ) the incompetence matrix with elements $q_{ij}(\lambda)$. However, in the biological context considered here the name plasticity matrix is more appropriate. This stochastic matrix is constructed from the set of all probabilities of player 1 executing action j given that she selects action i. When λ = 1, Q(1) = I and no mistakes are observed in the population. Hence, the population is behaviourally homogeneous and all interactions are deterministic. However, if λ < 1, then with probabilities $q_{ij}(\lambda)$ an individual chooses to play strategy i but plays strategy j instead. We say that in such a case the population is Q(λ)-heterogeneous and the outcomes of the interactions are now stochastic. We shall call λ the strength of behavioural plasticity. In the limit as λ → 0, the matrix Q(0) is equal to S, which is defined as a limiting distribution of behavioural mistakes. Such a matrix in the case of a three-strategy matrix game has the form [...] and is also a stochastic matrix. Every i-th row of this matrix defines a mixed profile of each strategy i.

We define the expected incompetent reward matrix as a perturbation of the fitness matrix by plasticity (or incompetence), namely [...] where D(R(λ)) is a matrix with each column j consisting of the diagonal elements of R(λ), inducing $r_{jj}(\lambda) = 0$, j = 1, 2, 3 (32). In our further analysis, we will focus on the equilibrium analysis of the games with the fitness matrix R(λ), and explore possible transitions caused by λ changing values in [...]
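The construction described above can be sketched numerically. The displayed formulas are not available here, so the sketch assumes a linear interpolation Q(λ) = (1 − λ)S + λI, which agrees with the stated limits Q(1) = I and Q(0) = S, and assumes the perturbed fitness matrix is the product Q(λ) R Q(λ)^T. The matrices `R` and `S` and all function names are hypothetical illustrations, not values taken from the paper.

```python
# Sketch only. Assumptions: Q(lam) = (1 - lam) * S + lam * I and
# R(lam) = Q(lam) R Q(lam)^T. R and S below are hypothetical.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def plasticity_matrix(S, lam):
    """Assumed form Q(lam) = (1 - lam) * S + lam * I."""
    n = len(S)
    return [[(1 - lam) * S[i][j] + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def perturbed_fitness(R, S, lam):
    """Assumed form R(lam) = Q(lam) R Q(lam)^T."""
    Q = plasticity_matrix(S, lam)
    return matmul(matmul(Q, R), transpose(Q))

def canonical_form(M):
    """Subtract the diagonal entry m_jj from every entry of column j,
    so that the result has a zero diagonal."""
    n = len(M)
    return [[M[i][j] - M[j][j] for j in range(n)] for i in range(n)]

# Hypothetical unstable RPS fitness matrix (negative determinant).
R = [[0.0, -2.0, 1.0],
     [1.0, 0.0, -2.0],
     [-2.0, 1.0, 0.0]]
# Hypothetical limiting distribution of mistakes; each row sums to 1.
S = [[1/3, 2/3, 0.0],
     [0.5, 0.25, 0.25],
     [0.2, 0.3, 0.5]]

Q_half = plasticity_matrix(S, 0.5)
R_half = canonical_form(perturbed_fitness(R, S, 0.5))
```

Under these assumptions every Q(λ) stays row-stochastic, because it is a convex combination of two stochastic matrices, and subtracting column constants in `canonical_form` leaves the replicator dynamics unchanged.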

In the evolutionary sense, behavioural mistakes lead to perturbations in fitness that populations obtain [...] errors. Hence, the population dynamics now depends on the degree of plasticity, that is competency of individuals, according to replicator equations (33) defined as

$$\dot{x}_i = x_i \left( f_i(x, \lambda) - \phi(x, \lambda) \right), \quad i = 1, 2, 3,$$

where the fitness of i-th strategy is given by

$$f_i(x, \lambda) = e_i^T R(\lambda) x,$$

where $e_i^T$ is the i-th transposed unit basis vector. The mean fitness of the entire population is defined as

$$\phi(x, \lambda) = x^T R(\lambda) x.$$

Interpretation of λ

The model proposed here was first referred to as a "game with incompetence of players" (14; 31). That is, the matrix Q consisted of probabilities of players' mistakes, when they intended to execute strategy [...]

This concept was next considered in the evolutionary settings as a modelling approach to adaptation to a new environment (15). First, it was assumed that a population is immersed into a new environment, which can happen either due to migration of animals or changing environmental conditions. It is assumed that there are n behavioural types or strategies available to individuals. Then, new conditions might increase stress levels and force individuals' behaviour to deviate from the one in the old environment.

Such deviations are then captured in the matrix S. As time passes, animals learn and adapt to their new environmental conditions, which is then reflected in the parameter λ. In such settings, one can also assume some form of learning dynamics, λ(t) (34).

Another possible way to think about this model is to apply it at a genetic level (35). That is, we would [...]

Since the stable equilibrium is a strict Nash equilibrium, it is an evolutionarily stable strategy (ESS) (4). However, for any given λ and strategy choice, x(λ), the observed stochastic behaviour of organisms, [...]

Here, a stable fixed point is denoted by a red circle and an unstable fixed point is denoted by a white circle. We use the Wolfram Mathematica project (36) to produce these phase planes.
Hence, at λ = 0 the game possesses a stable pure equilibrium x(0) = (1, 0, 0) that corresponds to the [...]

The assumption that individuals make mistakes completely at random is somewhat limiting.
Note that the determinant of R is negative, which implies that this game possesses an interior fixed point [...] ([...], 1/3, 1/6). Then, the matrix S is given by [...] at λ ≈ 0.287, a paper strategy becomes stable (Fig 3 panel D). Further, the interior fixed point vanishes at λ = 1/4, leaving unstable fixed points on the rock-scissors and paper-scissors edges (Fig 3 panel E), which is followed by a rock strategy becoming stable at λ ≈ 0.209 (Fig 3 panel F). This interval of all three strategies being stable is rather short, as at λ = 1/5 paper loses its stability (Fig 3 panel G). However, these are not the only transformations occurring: at λ = 1/7 the interior equilibrium emerges again (Fig 3 panel H). While the interior equilibrium exists again, scissors lose their stability too at λ = 1/13, and at λ = 0 rock is the only stable equilibrium. Note that for λ = 0 (Fig 3 panel J), the stable observed pure equilibrium will again be a mixed strategy due to the execution errors, given by $\tilde{y}(0) = s_1 = (1/3, 2/3, 0)$.
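The closing remark, that a pure rock equilibrium is observed as the mixed profile s_1 at λ = 0, can be illustrated with a short sketch. It assumes the observed profile is y = x Q(λ) with Q(λ) = (1 − λ)S + λI. Only the first row of `S` is quoted from the text; the other two rows and the function name are hypothetical.

```python
# Sketch only. Assumptions: observed behaviour y = x Q(lam) and
# Q(lam) = (1 - lam) * S + lam * I.

def observed_profile(x, S, lam):
    """Frequencies of executed strategies when intended strategies have
    frequencies x and mistakes follow the assumed Q(lam)."""
    n = len(x)
    return [sum(x[i] * ((1 - lam) * S[i][j] + (lam if i == j else 0.0))
                for i in range(n)) for j in range(n)]

S = [[1/3, 2/3, 0.0],     # s_1 as quoted in the text
     [0.25, 0.5, 0.25],   # hypothetical second row
     [0.2, 0.3, 0.5]]     # hypothetical third row

rock = [1.0, 0.0, 0.0]                        # everyone intends to play rock
y_max_plastic = observed_profile(rock, S, 0.0)  # maximum plasticity
y_no_mistakes = observed_profile(rock, S, 1.0)  # no mistakes
```

At λ = 0 the observed profile of a pure rock population reduces to the first row of S, and at λ = 1 it coincides with the intended pure strategy.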

In the following analysis we shall examine possible transitions in unstable RPS games. We aim to define conditions under which we can secure existence of a stable equilibrium.
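Such transitions can also be explored numerically by integrating the replicator equations for a given fitness matrix. The sketch below is a minimal illustration under stated assumptions: it uses the standard replicator form with fitness f = M x and mean fitness φ = x · f, a classical fourth-order Runge-Kutta step, and a hypothetical unstable RPS matrix `M`. In the setting of this paper, `M` would be replaced by the perturbed matrix for a chosen λ.

```python
# Sketch only. M is a hypothetical unstable RPS matrix; function names,
# step size and initial condition are illustrative choices.

def replicator_rhs(x, M):
    """Right-hand side x_i * (f_i - phi) with f = M x and phi = x . f."""
    n = len(x)
    f = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    phi = sum(x[i] * f[i] for i in range(n))
    return [x[i] * (f[i] - phi) for i in range(n)]

def rk4_step(x, M, dt):
    """One classical Runge-Kutta step of the replicator equations."""
    def shifted(k, h):
        return [xi + h * ki for xi, ki in zip(x, k)]
    k1 = replicator_rhs(x, M)
    k2 = replicator_rhs(shifted(k1, dt / 2), M)
    k3 = replicator_rhs(shifted(k2, dt / 2), M)
    k4 = replicator_rhs(shifted(k3, dt), M)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def simulate(x0, M, dt=0.01, steps=2000):
    """Integrate from x0 and return the final state."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, M, dt)
    return x

M = [[0.0, -2.0, 1.0],
     [1.0, 0.0, -2.0],
     [-2.0, 1.0, 0.0]]

centre = [1/3, 1/3, 1/3]                 # interior fixed point of this M
x_end = simulate([0.5, 0.3, 0.2], M)     # trajectory started off-centre
```

Because the right-hand side sums to zero on the simplex, the frequencies keep summing to one along the trajectory, which gives a quick sanity check on any modification of this sketch.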

Games with completely random plasticity

Let us first consider the case when behavioural mistakes are completely random. Such settings can be interpreted as either a form of phenotypic plasticity or just noise in the interactions. Then, the matrix S is such that any strategy obtains the same probability of mistakes, that is, $s_{ij} = 1/3$, ∀i, j = 1, 2, 3. For such a game, the canonical fitness matrix simplifies to [...] where J is a matrix of ones, R is the fitness matrix of the original game and λ ∈ (0, 1]. In a game with λ ∈ (0, 1], if no strategy has an overall fitness advantage (that is, R is a row-sum-constant matrix), then uniform execution errors will not affect the resulting equilibrium (see Supplementary Information, Proposition 1).

Result 1. Let x be an interior equilibrium of R. If the limiting distribution of mistakes, S, is a uniform matrix, that is, $s_{ij} = 1/3$, ∀i, j = 1, 2, 3, and R is a row-sum-constant matrix, then x is an interior equilibrium for the game R(λ) for any λ ∈ (0, 1].
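Result 1 can be checked numerically for a particular game. The sketch below assumes Q(λ) = λI + (1 − λ)J/3 for uniform mistakes and a perturbed fitness matrix Q(λ) R Q(λ)^T; both forms are assumptions of this illustration, as are the matrix `R` (hypothetical, with every row summing to −1) and the function names. A point is an interior equilibrium when all strategies earn the same fitness there, so the sketch measures the spread between the largest and smallest fitness.

```python
# Sketch only. Assumptions: uniform S (all entries 1/3), so that
# Q(lam) = lam * I + (1 - lam) / 3 * J, and R(lam) = Q(lam) R Q(lam)^T.

def perturbed_fitness(R, lam):
    """Assumed perturbed fitness matrix under uniform mistakes."""
    n = len(R)
    Q = [[(1 - lam) / n + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    QR = [[sum(Q[i][k] * R[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # Entry (i, j) of (Q R) Q^T uses row j of Q as column j of Q^T.
    return [[sum(QR[i][k] * Q[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def fitness_spread(M, x):
    """Difference between the largest and smallest strategy fitness at x."""
    n = len(x)
    f = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    return max(f) - min(f)

# Hypothetical row-sum-constant RPS matrix: every row sums to -1.
R = [[0.0, -3.0, 2.0],
     [1.0, 0.0, -2.0],
     [-2.0, 1.0, 0.0]]
x = [1/3, 1/3, 1/3]   # interior equilibrium of R, since R x is constant

spreads = {lam: fitness_spread(perturbed_fitness(R, lam), x)
           for lam in (1.0, 0.75, 0.5, 0.25, 0.05)}
```

Under these assumptions the spread stays at rounding level for every λ tested, consistent with the statement that uniform mistakes do not move the interior equilibrium of a row-sum-constant game.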

In other words, if in a row-sum-constant game everyone is making mistakes with the same probability [...] the limiting distribution of mistakes, S, being a uniform matrix, that is, $s_{ij} = 1/3$, ∀i, j = 1, 2, 3, we obtain [...] is an interior fixed point for the game R(λ).

Note that the point x(λ) from equation (8) [...]

Note that for a homogeneous population (λ = 1) the interior fixed point, x, can be calculated as [...] where $R_{ji}$ are cofactors of the matrix R. Then, as the rate of execution errors of players increases (λ → 0), the interior point x(λ) might transform and become infeasible as one or two of the components reach 0, that is, $x_i(\lambda) \to 0$ for some i.

Result 3. Let $\lambda^c \in (0, 1)$ be such that $\lambda^c = \min(\lambda^c_{kj}, \lambda^c_{ij})$, where $r_{kj}(\lambda^c_{kj}) = 0$ and $r_{ij}(\lambda^c_{ij}) = 0$, i, j, k [...]

[...] in the context of a cyclic RPS game. Here, behavioural mistakes imply that individuals might execute a strategy different from the intended one. We encode all probabilities of mistakes in a matrix Q(λ) and allow individuals to play either a mixed or pure strategy. The degree of plasticity is captured by the parameter λ varying from 1 (no plasticity) to 0 (maximum plasticity).

We then explore the influence of the limiting distribution of mistakes captured in matrix S on the evolution of social behaviour of species. Depending on the matrix S, different pure strategies might benefit from those mistakes. This matrix captures mistake probabilities for the limiting case of λ = 0.

We analyse the interplay of learning and fitness advantages and define conditions under which strategies can prevail. For example, in the case with completely random mistakes, the most beneficial strategy is the strategy with the highest relative fitness advantage (see Result 2). However, it does not change the outcome of the evolution since in this case it will be a completely mixed interior point.

One can also interpret our model as adaptation to new environmental conditions. Then, it is natural to expect that specific environments require different strategies to be adopted. For instance, in the case of an RPS game with the interior equilibrium (1/3, 1/3, 1/3) and a general form of S, different strategies might become stable depending on their behavioural plasticity as their competence evolves (see Result 3). However, even if the behavioural choice of organisms evolves to a stable pure strategy, their executed strategy (for λ ≠ 1) might differ from their actual type. Instead, we will obtain a vector of mixed strategies given by equation (7). Hence, S can introduce stability in the game which might preserve all three strategies from extinction.

Interestingly, at λ = 0, strategies leverage the advantage they can gain from mistakes under maximum plasticity. For instance, in the case with a general form of limiting probability distribution, stability of a pure strategy is determined by its plastic response to itself (see Result 3). For a strategy to become stable, it must be uninvadable by the other two plastic strategies.
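The uninvadability requirement can be phrased as a strict Nash test on the columns of the perturbed fitness matrix: pure strategy i resists invasion when its payoff against itself exceeds the payoff of each other strategy against it. The sketch below applies this test under the assumptions Q(λ) = (1 − λ)S + λI and a perturbed matrix Q(λ) R Q(λ)^T. The matrices are hypothetical; in `S` only rock players make mistakes, which is an illustrative choice rather than the example analysed in the paper.

```python
# Sketch only. Assumptions: Q(lam) = (1 - lam) * S + lam * I and
# R(lam) = Q(lam) R Q(lam)^T. R and S are hypothetical.

def perturbed_fitness(R, S, lam):
    """Assumed perturbed fitness matrix Q(lam) R Q(lam)^T."""
    n = len(R)
    Q = [[(1 - lam) * S[i][j] + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    QR = [[sum(Q[i][k] * R[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(QR[i][k] * Q[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def strict_nash_pure(M):
    """Indices i whose own payoff M[i][i] strictly exceeds M[j][i]
    for every other strategy j."""
    n = len(M)
    return [i for i in range(n)
            if all(M[i][i] > M[j][i] for j in range(n) if j != i)]

R = [[0.0, -2.0, 1.0],
     [1.0, 0.0, -2.0],
     [-2.0, 1.0, 0.0]]
S = [[1/3, 2/3, 0.0],   # rock is executed as rock or paper
     [0.0, 1.0, 0.0],   # paper is always executed correctly
     [0.0, 0.0, 1.0]]   # scissors is always executed correctly

stable_by_lambda = {lam: strict_nash_pure(perturbed_fitness(R, S, lam))
                    for lam in (1.0, 0.5, 0.0)}
```

Without mistakes no pure strategy passes the test, since each strategy is beaten by another one in a cyclic game. Scanning λ on a finer grid shows where, for a chosen S, a pure strategy first becomes uninvadable.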

Overall, behavioural heterogeneity, captured through the execution noise, might benefit species. The ability of our model to induce a stable equilibrium in the unstable game might help in explaining why such unstable RPS dynamics are not observed in wild communities. That is, plasticity in behaviour might help to stabilise the evolutionary outcome and sometimes enable one of the strategies to become dominant.