Evolution of deterrence with costly reputation information

Deterrence, a defender’s avoidance of a challenger’s attack based on the threat of retaliation, is a basic ingredient of social cooperation in several animal species and is ubiquitous in human societies. Deterrence theory has recognized that deterrence can only be based on credible threats, but since retaliating is costly for the defender, such threats are not credible in one-shot interactions. If interactions are repeated and observable, reputation building has been suggested as a way to sustain credibility and enable the evolution of deterrence. But this explanation ignores both the source and the costs of obtaining information on reputation. Taking these costs into account, we find that even for small information costs successful deterrence is never evolutionarily stable. Here we use game-theoretic modelling and agent-based simulations to resolve this puzzle and to clarify under which conditions deterrence can nevertheless evolve and when it is bound to fail. Paradoxically, rich information on defenders’ past actions leads to a breakdown of deterrence, while with only minimal information deterrence can be highly successful. We argue that reputation-based deterrence sheds light on phenomena such as costly punishment and fairness, and might serve as an explanation for the evolution of informal property rights.


Best response dynamics
We assume that strategy updating is guided by the social learning process known as the best-response dynamics (BR-dynamics) [29-31]. Since the BR-dynamics is defined for games with finitely many pure strategies, we define $S_Q = \{0, q^*, 1, \dots\}$ to be any finite discretization of the defenders' pure strategy space including at least $0$, $1$, and $q^*$. Let $\Delta(S_Q)$ be the space of mixed strategies, i.e. probability distributions over $S_Q$. Denoting the population state by $x(t) \in \Delta(S_1 \times S_Q)$, where $S_1$ is the challengers' pure strategy set, this results in $x(t)$ moving along (possibly non-unique) solutions, called best response paths (BR-paths), of the differential inclusion $\dot{x}(t) \in BR(x(t)) - x(t)$, where $BR(x)$ is the set of (pure or mixed) best responses to the population state $x$. As long as the current best response is unique, the BR-path describes a straight line in the state space pointing to the current pure best response. If a BR-path converges, the limit is a Nash equilibrium.
We assume two separate time-scales here: a slow one for the best-response dynamics of the population state $x$ and a fast one for the adaptation of the defenders' reputations to their stationary values for fixed $x$. This allows us to assume that reputations are instantly equilibrated while the population state moves through the state space.
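For intuition, the following is a minimal Python sketch of a discretized BR-path. The 3-strategy payoff matrix is a hypothetical placeholder, not the deterrence game's payoffs; in our model, each step would additionally recompute the defenders' stationary reputations (the fast time scale) before evaluating best responses.

```python
import numpy as np

def br_path(A, x0, step=0.01, n_steps=2000):
    """Euler discretization of the differential inclusion dx/dt in BR(x) - x."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        payoffs = A @ x              # expected payoff of each pure strategy vs. x
        b = int(np.argmax(payoffs))  # a current pure best response
        e_b = np.zeros_like(x)
        e_b[b] = 1.0
        x += step * (e_b - x)        # move straight toward the best response
        path.append(x.copy())
    return np.array(path)

# Hypothetical 3-strategy payoffs (NOT the deterrence game):
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0]])
final = br_path(A, x0=[0.6, 0.3, 0.1])[-1]
print(final)  # state after n_steps; if the path converges, its limit is a Nash equilibrium
```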
Now we set out to find the long-run behavior of the population state in the symmetrized game under the BR-dynamics. Though the state space $\Delta(S_1 \times S_Q)$ has dimension $3|S_Q| - 1$ (which is at least 8, since $|S_Q| \geq 3$), the task is greatly simplified by the following two steps, which we use to sequentially reduce the state space we have to analyze.
From the long-run behavior of the population state in this reduced game we can then infer the long-run behavior of the population state in the symmetrized game. This allows us to reduce the dimension of the state space we are working in from $3|S_Q| - 1$ to $2|S_Q| - 2$.
Applying Step 2, which we will do henceforth, we have now reduced the dimension of the state space from at least 8 to 3. The remaining task is to solve for the long-run behavior of the BR-dynamics; this is illustrated in Fig. 4 in the main text.
By construction of $P$, defenders switch to yielding if the induced population state is above the horizontal axis, and they switch to fighting if it is below. Induced paths therefore move to the left above the horizontal axis and to the right below it.
If $q_F$ is large, i.e. if most defenders are prepared to fight, challengers switch to always respecting (AllR), since neither taking nor obtaining information pays off; this is the case for $q_F > 1 - c'$. In the rightmost vertical sector, therefore, induced paths point to one of the boundary points of the horizontal axis. If $q_F$ lies in the intermediate range $c' < q_F < 1 - c'$, the discriminating strategy Disc becomes optimal for challengers. In the middle vertical sector, induced paths therefore point to one of the bottom vertices of the rectangle.
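The sector structure amounts to a simple threshold rule. The sketch below assumes the two boundaries lie at $c'$ and $1 - c'$ as above, and uses AllT as an illustrative name for the unconditional taking strategy in the leftmost sector.

```python
def challenger_best_response(q_F, c):
    """Schematic sector logic for the challengers' best response as a function
    of the population fighting probability q_F; c stands for the threshold c'."""
    if q_F < c:
        return "AllT"  # leftmost sector: few fighters, so taking blindly pays
    elif q_F < 1 - c:
        return "Disc"  # middle sector: paying for reputation information pays
    else:
        return "AllR"  # rightmost sector: most defenders fight, so respect

print([challenger_best_response(q, c=0.2) for q in (0.1, 0.5, 0.9)])
# ['AllT', 'Disc', 'AllR']
```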

Analytical results for the fighting-probability reputation assessment scheme
As shown above, for a defender randomizing with fighting probability $q$, the probability of her last action having been F is given by $\rho(q, x) = \frac{q\,(x_T + x_D)}{x_T + q\,x_D}$, where $x_T$ and $x_D$ denote the fractions of takers and discriminators in the challenger population. For all other past actions, the corresponding probability is simply $q$. Thus, her empirical fighting frequency calculated from the last $k$ actions converges to $q$ as $k \to \infty$, provided that, as we assume here, her last switch occurred before those $k$ actions were carried out. In the limiting case, a discriminator is therefore informed of the defender's true fighting probability $q$. The optimal strategy for a discriminating challenger is then to take if $q < q^*$ and to respect if $q > q^*$, i.e. to use the $q^*$-threshold strategy. We denote this strategy by Disc again. Against a taker, yielding ($q = 0$) is the best response for defenders, and against a Disc-challenger it is clearly optimal to play $q^*$ (assuming w.l.o.g. that discriminators respect if indifferent).
No other fighting probability is ever optimal unless there are no takers in the challenger population.
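As a sanity check on the stationary last-action probability stated above, here is a small Monte Carlo sketch. The fractions $x_T$ (takers) and $x_D$ (discriminators, who take only against a Y reputation) are notational assumptions, with the remaining challengers always respecting.

```python
import random

def last_action_F_freq(q, x_T, x_D, rounds=200_000, seed=1):
    """Estimate the stationary probability that the defender's last action was F."""
    rng = random.Random(seed)
    last = "Y"   # defender's current reputation (her last action)
    hits = 0
    for _ in range(rounds):
        u = rng.random()
        # takers always take; discriminators take only if the last action was Y
        takes = u < x_T or (u < x_T + x_D and last == "Y")
        if takes:  # the defender acts: she fights with probability q
            last = "F" if rng.random() < q else "Y"
        hits += last == "F"
    return hits / rounds

q, x_T, x_D = 0.5, 0.3, 0.4
print(last_action_F_freq(q, x_T, x_D))        # simulated frequency, approx. 0.70
print(q * (x_T + x_D) / (x_T + q * x_D))      # closed-form stationary value: 0.70
```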