A Strategic Interaction Model of Punishment Favoring Contagion of Honest Behavior

The punishment effect on social behavior is analyzed within the strategic interaction framework of Cellular Automata and computational Evolutionary Game Theory. A new game, called Social Honesty (SH), is proposed. The SH game is analyzed in spatial configurations. Probabilistic punishment is used as a dishonesty deterrence mechanism. In order to capture the intrinsic uncertainty of social environments, payoffs are described as random variables. New dynamics, with a new relation between punishment probability and punishment severity, are revealed. Punishment probability proves to be more important than punishment severity in guiding convergence towards honesty as predominant behavior. This result is confirmed by empirical evidence and reported experiments. Critical values and transition intervals for punishment probability and severity are identified and analyzed. Clusters of honest or dishonest players emerge spontaneously from the very first rounds of interaction and are determinant for the future dynamics and outcomes.


Introduction
Dynamics of honest and dishonest behavior in social and economic environments are of particular interest because they strongly influence how well a society functions. Honesty and trust are considered the main foundations of social order. Dishonesty often leads to undesired social phenomena that undermine common welfare [1]. Erosion of social norms depletes social capital [2]. Moreover, in the economic realm, dishonest behavior and the associated corruption may generate market failure and poverty [3].
The 2008 crisis illustrates the effects of dishonest behavior. Toxic and risky financial products were used only because they offered huge profits for banks and investment funds. The lack of regulation and control made this possible. When the designers of such speculative schemes are questioned about their deeds, the answer is most often: 'because we can, because the control is loose, because state authorities lack the resources to catch people like me' (see the declarations of Bernard Madoff, operator of the largest financial fraud in U.S. history). It has been shown that people often cheat when the certainty of punishment is low and also because other people do it [4]. Nowadays, due to the wide spread of information technology, such behaviors spread more easily and faster than before [5].
Dishonesty, defined in the context of personal interactions, may be seen as a strategy that increases the benefit of the one using it while causing a loss to the other party. The principle is: 'one man's bribe may be another man's gift' [6]. In other words, dishonesty is a socially defecting choice. An individual may have an incentive to act dishonestly in order to increase his/her payoff (material or of any other kind). This usually leads to lower payoffs for those interacting with him/her. Acting dishonestly carries both advantages and risks. The benefit may be seen as the payoff of crime [7], whereas the costs include material costs, status risks, the probability of being caught, and the prospect of a penalty.
A typical scenario in which social honesty dynamics may be observed is a society where each player (individual, group, firm, etc.) interacts with other players (providing, for instance, services, products, or information). Every interaction is actually a transaction, and explicit or implicit competition is the underlying mechanism. Players may establish an informal (possibly verbal) contract for providing or using a certain service. Honest behavior consists, for instance, in offering a service or product at the expected quality, whereas dishonest behavior means cheating by deliberately providing lower quality, misleading or incomplete information, or fake results, or by causing delays.
The existence of some kind of punishment for dishonest behavior seems to be a necessary condition for promoting, supporting, encouraging, and protecting honesty [8][9][10]. Various natural or spontaneous forms of punishment exist [11], be it a penalty or just critical feedback from another player. A punishment may be applied by a player, a group of players, or a central authority.
Altruistic punishment, for instance, is one of the mechanisms that may enforce cooperation in human societies [12]. Punishment is called 'altruistic' if it is costly for the punisher and is meant to change the punished person's behavior for the benefit of society. An example is telling a queue jumper to stand in line. Negative emotions toward a defector are a possible trigger of altruistic punishment, and experiments with humans confirm this hypothesis [12].
An important issue about punishment is finding an efficient balance between severity and certainty [13]. Empirical evidence and reported experiments reveal that punishment certainty is more important than severity [14]. Moreover, social experience indicates that a low punishment probability is ineffective even when punishment severity is very high [11]. A classical example is the U.S. Prohibition.
The general subjective perception about the risk of being punished is also important. Dishonest behavior is encouraged when the perceived risk of punishment is low [15]. In societies affected by deep systemic corruption, it is very difficult to eradicate dishonest behaviors because bad habits are somehow culturally accepted and considered more or less 'normal'.
Both honest and dishonest behavior may display an epidemic character [16]. A successful yet dishonest person may be taken as a model by others. The notoriety of 'successful' negative models may amplify their negative impact, as other individuals replicate their actions. An individual's propensity toward unethical behavior depends largely on social norms, and prominent figures in one's social group have a larger impact [16].
Social conformism, the need for safety, greed, and the fear of missing out are powerful incentives for imitation [5]. Imitation becomes a convenient heuristic when there is too much information to process [5]. It is also an effective mechanism for spreading a social behavior. How, then, can an honesty epidemic be started and promoted?
We study the conditions that favor (or hinder) the emergence of honest behavior in a social framework. Punishment, in its various forms (penalty or negative feedback), is considered a control mechanism. The main control parameters are the certainty and the severity of punishment, which can act as incentives for honesty contagion. We posit that an appropriate, finely tuned punishment mechanism may start an epidemic of honesty. We assume that community members and/or an authority are able to discover and punish dishonest players with a certain probability. Zero punishment probability may account for a corrupt, dishonest-dominated society. However, public policies may finely tune some parameters that influence players' payoffs in various situations.
Honest/dishonest behavior dynamics under the effect of punishment are analyzed in the strategic interaction framework of Cellular Automata and Evolutionary Game Theory (GT). Strategic interactions are the basic paradigm of GT: a player's payoff depends not only on his/her own action but also on the actions of the other players. We propose a social dilemma game called the Social Honesty (SH) game, in which a player's strategy is either 'honest' (H) or 'dishonest' (D). We study the influence of local social interactions on the spread of a particular behavior, and numerical simulations explore the contagion dynamics of honest and dishonest strategies in the population.

Approach
Internal mechanisms that trigger human behavior are too complex to be accurately described and captured by sound (mathematical or computational) models. Social interactions add even more complexity. That is why a feasible approach is to focus on individual behavior, which is relatively easy to observe and measure [17]. Accordingly, honest and dishonest actions may be seen as classes of human behavior.
Becker proposes a simple economic model, a rational crime theory [7]. According to this model, an individual decides to commit a crime if the revenue obtained is higher than the price paid for that crime. This model has proven inadequate for many real-world situations [4,16], because human behavior is significantly influenced by non-material aspects such as emotions and beliefs, the perceived risk of punishment, the salience of ethicality, the visibility of another person's unethical behavior, social identity, reputation, and reciprocity.
We base our approach on the fields of Cellular Automata and Evolutionary Game Theory and propose a new social dilemma game, the Social Honesty (SH) game. To capture the uncertainty of the social environment, the SH game model considers probabilistic payoffs, described as random variables.
For convenience, we call an 'H-player' a player using the honest strategy and a 'D-player' a player using the dishonest strategy. We assume that an incentive toward dishonest behavior exists, yet there is also an associated risk: a probabilistic punishment for D-players.
When both players choose the dishonest strategy, only one of them can win, yet punishment may be applied to both of them. Dishonest behavior in one player causes a lower payoff for the honest player with whom he/she interacts. Table 1 depicts the payoff matrix of the Social Honesty game. Within the SH game, when two H-players interact, each player gets a constant positive payoff c (c > 0).
When an H-player interacts with a D-player, the H-player gets zero and the D-player is punished with probability p1.
Let S denote the punishment severity (usually S > 0). If not punished, the D-player gets a payoff equal to a (a > c), which represents the D-player's advantage in an (H,D) interaction (the advantage of being dishonest). Thus, the D-player's payoff may be expressed as a discrete random variable A, which takes the value −S with probability p1 and the value a with probability 1 − p1:

$$A = \begin{cases} -S & \text{with probability } p_1,\\ a & \text{with probability } 1-p_1. \end{cases}$$

In (D,D) interactions, payoffs are assigned to players according to the following rules: (i) each D-player may be punished independently with probability p2; (ii) if no player is punished, one player gets zero and the other one gets b (b > 0), or the opposite, with equal probabilities.
Therefore the two D-players cannot both win b, but both may be punished.
The payoffs of the two D-players may be expressed as discrete random variables B and B′ obeying rules (i) and (ii).

Table 1. SH game normal form: the payoff matrix of the Social Honesty game (entries are Player 1, Player 2 payoffs).

Player 1 / Player 2    Honest (H)    Dishonest (D)
Honest (H)             (c, c)        (0, A)
Dishonest (D)          (A, 0)        (B, B′)
Spatial form of the SH game. A standard N×N lattice model is considered, in which each lattice cell represents a player. In most of our experiments, players are arranged on a regular lattice with joint boundaries of a cellular automaton [30,31] (see also [32,33]). Another set of experiments is based on scale-free networks [34].
The state of a cell is the strategy of the corresponding player (H or D). At each game round a player may act either honestly or dishonestly. Each player strategically interacts with the neighbors by means of the SH game. A player's gain in one round is the sum of the payoffs obtained in each of her interactions of that particular round. Players synchronously update their strategies at each round.
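The round structure just described (every player meets each neighbor, round gains are summed, and strategies change only after all payoffs are known) can be sketched as follows. This is an illustrative sketch, not the original implementation. The game is passed in as a hypothetical function `play(s1, s2)` returning both payoffs, the lattice is assumed to wrap around as a torus, and each neighboring pair is assumed to play exactly one game per round.

```python
from typing import Callable

Grid = list[list[str]]                          # 'H' or 'D' per cell
Play = Callable[[str, str], tuple[float, float]]


def round_payoffs(grid: Grid, offsets: list[tuple[int, int]], play: Play) -> list[list[float]]:
    """Total payoff of every player in one round on an N x N torus.

    `offsets` must be symmetric (contain each direction and its opposite) and
    N must exceed twice the neighborhood radius, so no pair is counted twice.
    """
    n = len(grid)
    total = [[0.0] * n for _ in range(n)]
    # Keep one offset of each +/- pair, so every unordered neighbor pair plays once.
    half = [(di, dj) for di, dj in offsets if (di, dj) > (0, 0)]
    for i in range(n):
        for j in range(n):
            for di, dj in half:
                k, l = (i + di) % n, (j + dj) % n
                pay_ij, pay_kl = play(grid[i][j], grid[k][l])
                total[i][j] += pay_ij
                total[k][l] += pay_kl
    return total
```

Because payoffs for the whole lattice are accumulated before any strategy changes, the subsequent synchronous update sees a consistent snapshot of the round.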
Experiments are based on Moore, von Neumann [35], well-mixed, and scale-free neighborhood topologies [34]. In most of the subsequent experiments we use the Moore neighborhood with radius 1 (the eight cells surrounding a central cell). One experiment is dedicated to comparing different types of neighborhoods.
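For reference, the lattice neighborhoods can be generated as lists of offsets. The sketch below assumes the usual definitions (Chebyshev distance for Moore, Manhattan distance for von Neumann); the function names are hypothetical. With radius 1 they give the eight-cell Moore and the four-cell von Neumann neighborhoods, and the well-mixed case simply lists every other cell.

```python
def moore_offsets(r: int = 1) -> list[tuple[int, int]]:
    """All (di, dj) with max(|di|, |dj|) <= r, excluding the central cell."""
    return [(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)
            if (di, dj) != (0, 0)]


def von_neumann_offsets(r: int = 1) -> list[tuple[int, int]]:
    """All (di, dj) with |di| + |dj| <= r, excluding the central cell."""
    return [(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)
            if (di, dj) != (0, 0) and abs(di) + abs(dj) <= r]


def well_mixed_neighbors(i: int, j: int, n: int) -> list[tuple[int, int]]:
    """In a well-mixed N x N population every other cell is a neighbor."""
    return [(k, l) for k in range(n) for l in range(n) if (k, l) != (i, j)]
```

A radius-r Moore neighborhood has (2r + 1)² − 1 cells (8, 24, 48 for r = 1, 2, 3) and a radius-r von Neumann neighborhood has 2r(r + 1) cells (4, 12 for r = 1, 2).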
Human learning is a complex process based on social and asocial forms of learning [36,37]. Our aim is not to find the best-suited form of learning but to test the most important learning strategies and compare them. Therefore, several strategy update rules are tested. The 'Best' imitation rule is used in most experiments: each player imitates the strategy of the neighbor with the highest payoff in the last round. This update rule, inspired by the 'survival of the fittest' principle, is frequently used in evolutionary games [38]. Probabilistic strategy update rules, based on myopic imitation and on the Fermi function [39,40], are also tested.
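The three update rules can be sketched as below. Two details are assumptions not fixed by the text. First, the focal player is included in the comparison and keeps its own strategy on ties. Second, 'Best Fermi' uses the common Fermi form W = 1/(1 + exp((P_self − P_best)/K)) with a hypothetical noise parameter K = 0.1. The 0.75/0.25 split of 'Best Myopic' follows the values given for Experiment 9. All new strategies are computed from the previous round's grid and payoffs, i.e. synchronously.

```python
import math
import random

Grid = list[list[str]]
Pay = list[list[float]]


def next_strategy(rule: str, i: int, j: int, grid: Grid, pay: Pay,
                  nbrs: list[tuple[int, int]], rng: random.Random, K: float = 0.1) -> str:
    """New strategy of player (i, j) under 'best', 'best_myopic' or 'best_fermi'."""
    best = (i, j)  # focal player included; strict '>' keeps own strategy on ties
    for k, l in nbrs:
        if pay[k][l] > pay[best[0]][best[1]]:
            best = (k, l)
    if rule == "best":
        src = best
    elif rule == "best_myopic":  # best neighbor w.p. 0.75, random neighbor w.p. 0.25
        src = best if rng.random() < 0.75 else rng.choice(nbrs)
    elif rule == "best_fermi":  # imitate best w.p. W, otherwise keep own strategy
        diff = pay[i][j] - pay[best[0]][best[1]]  # <= 0, so exp() cannot overflow
        w = 1.0 / (1.0 + math.exp(diff / K))
        src = best if rng.random() < w else (i, j)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return grid[src[0]][src[1]]


def synchronous_update(grid: Grid, pay: Pay, offsets: list[tuple[int, int]],
                       rule: str, rng: random.Random) -> Grid:
    """Compute every player's new strategy from the old grid and payoffs (torus)."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            nbrs = [((i + di) % n, (j + dj) % n) for di, dj in offsets]
            new[i][j] = next_strategy(rule, i, j, grid, pay, nbrs, rng)
    return new
```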
Experiments use the following parameter setting of the SH game: a = 3, c = 2, b = 2. In most experiments the same punishment probability is used in both types of interaction, p = p1 = p2; one experiment uses unequal punishment probabilities. Unless stated otherwise, experiments are performed on a 100×100 population. For statistical relevance, results are averaged over 100 runs.
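As a back-of-the-envelope check on these values (not part of the original analysis), the expected payoff of a D-player in a single (H,D) encounter follows directly from the definition of A:

$$\mathbb{E}[A] = (1-p_1)\,a - p_1 S, \qquad a = 3,\; S = 2,\; p_1 = p \;\Rightarrow\; \mathbb{E}[A] = 3(1-p) - 2p = 3 - 5p.$$

Compared with the payoff c = 2 that an H-player earns against an H-neighbor, E[A] > c exactly when p < 0.2. This single-encounter comparison ignores spatial structure, neighborhood composition, and imitation, so it need not coincide with the transition intervals found in the simulations.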

Results
The main findings of our experiments are described in the following:

Experiment 1. Emergence of H and D clusters in the population
In this experiment we start from an initial 100×100 population with 50% D-players, randomly distributed. Punishment severity is S = 2 and punishment probability is p = 0.15. The radius-1 Moore neighborhood and the best-neighbor imitation update rule are used. Fig. 1 and Fig. 2 illustrate the honest/dishonest population dynamics over the first 300 rounds.
It may be observed that the H-player rate drops dramatically in the first 3 rounds. Only a few small H clusters survive. In time, these clusters grow and divide, and their shape and location change dynamically. H and D-player rates become approximately stable after about 150 rounds and remain almost unchanged for 100,000 rounds (60% H-players, 40% D-players), indicating a kind of dynamic equilibrium.
As may be observed in Fig. 2, the formation of H-player clusters seems to be important for resisting D-player invasion. Cluster dynamics indicate that numerous changes occur at the cluster frontiers, and few or no changes in the cluster centers.
Player rates at equilibrium (approximately constant rates) depend on the punishment probability and severity. Fig. 3 depicts the honest/dishonest dynamics in the first 300 game rounds for different punishment probabilities (p = 0.13, 0.15, 0.17, and 0.18) and constant punishment severity (S = 2). In each case, the H-player rate drops very fast and then increases slowly, remaining quasi-stable after 300 rounds. The cluster pattern depends on parameters p and S.
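A compact, self-contained simulation of this experimental setting might look as follows. It is a sketch under stated assumptions, not the authors' code: the (D,D) payoffs use a coin-then-punish reading of rules (i) and (ii), the lattice is a torus, each neighboring pair plays once per round, and the 'Best' rule includes the focal player and keeps its strategy on ties. The names `play` and `simulate` are hypothetical.

```python
import random

MOORE = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
HALF = [d for d in MOORE if d > (0, 0)]  # one offset per unordered neighbor pair


def play(s1, s2, rng, a=3.0, b=2.0, c=2.0, S=2.0, p=0.15):
    """SH payoffs with p1 = p2 = p; (D, D) uses an assumed coin-then-punish reading."""
    if s1 == "H" and s2 == "H":
        return c, c
    if s1 != s2:  # (H, D) or (D, H): the D-player receives a draw of A
        d_pay = -S if rng.random() < p else a
        return (0.0, d_pay) if s1 == "H" else (d_pay, 0.0)
    x, y = (b, 0.0) if rng.random() < 0.5 else (0.0, b)  # who would get b
    x = -S if rng.random() < p else x  # independent punishments
    y = -S if rng.random() < p else y
    return x, y


def simulate(n=100, d_rate=0.5, p=0.15, S=2.0, rounds=300, seed=None):
    """Return the H-player rate before the first round and after each round."""
    rng = random.Random(seed)
    n_d = round(d_rate * n * n)
    cells = ["D"] * n_d + ["H"] * (n * n - n_d)
    rng.shuffle(cells)
    grid = [cells[i * n:(i + 1) * n] for i in range(n)]
    rates = [1.0 - n_d / (n * n)]
    for _ in range(rounds):
        pay = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for di, dj in HALF:
                    k, l = (i + di) % n, (j + dj) % n
                    x, y = play(grid[i][j], grid[k][l], rng, S=S, p=p)
                    pay[i][j] += x
                    pay[k][l] += y
        new = [row[:] for row in grid]  # synchronous update from the old grid
        for i in range(n):
            for j in range(n):
                bi, bj = i, j  # 'Best' rule: focal player included, ties keep own strategy
                for di, dj in MOORE:
                    k, l = (i + di) % n, (j + dj) % n
                    if pay[k][l] > pay[bi][bj]:
                        bi, bj = k, l
                new[i][j] = grid[bi][bj]
        grid = new
        rates.append(sum(row.count("H") for row in grid) / (n * n))
    return rates
```

For example, `simulate(n=100, p=0.15, S=2.0, rounds=300, seed=1)` returns the H-player rate per round for qualitative comparison with Fig. 1. Pure Python is slow at this size but workable, and exact values will differ from the published ones because of the assumptions above.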

Experiment 2. Punishment impact on H and D-player rates in the population
In this experiment the player rates are measured after 500 rounds. Each experiment is run 100 times and the results are averaged. We start from an initial 100×100 population with 50% D-players, randomly distributed. The radius-1 Moore neighborhood and the best-neighbor imitation update rule are used. Transition intervals for p are identified, marking the transition from average D dominance to average H dominance (Fig. 5). To guarantee H-player dominance, the punishment probability should be higher than the upper bound of the p-transition interval. Table 2 illustrates the p-transition intervals for different values of the punishment severity S. Similarly to the p-transition intervals, specific transition intervals for the punishment severity have been found; these S-transition intervals are depicted in Fig. 6.
Figure 2 (caption fragment). After 150 rounds, H-clusters may be found all over the population (57% H-players). Color code: blue, is honest/was honest; red, is dishonest/was dishonest; green, is honest/was dishonest; yellow, is dishonest/was honest. doi:10.1371/journal.pone.0087471.g002
For lower p values, the S-transition intervals become significantly wider and shift to higher values. This means that punishment severity is ineffective when punishment probability is very low. A higher punishment probability makes it possible to reduce punishment severity significantly while keeping the same effect on the D and H population rates.
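Transition intervals can be extracted from an averaged H-rate curve with a small helper. The paper does not state how the interval bounds are defined, so the thresholds below (an averaged H rate of at most 5% for D dominance and at least 95% for H dominance) are hypothetical choices. The same function applies to S-transition intervals if severity values are passed instead of punishment probabilities.

```python
def transition_interval(values, h_rates, lo=0.05, hi=0.95):
    """Return (lower, upper) bounds of the D-to-H dominance transition.

    upper: smallest parameter value whose averaged H rate is >= hi.
    lower: largest value below `upper` whose averaged H rate is <= lo
           (None if the sweep never reaches D dominance below `upper`).
    Returns None if H dominance is never reached. Thresholds are hypothetical.
    """
    pairs = sorted(zip(values, h_rates))
    upper = next((v for v, h in pairs if h >= hi), None)
    if upper is None:
        return None
    below = [v for v, h in pairs if v < upper and h <= lo]
    return (below[-1] if below else None), upper
```

Sorting the (value, rate) pairs first means the sweep results need not be supplied in order.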

Experiment 3. The effect of increasing the advantage of being dishonest
If the D-player's advantage when playing against an H-player is doubled (a is set to 6 instead of 3), the p-transition interval changes significantly: from (0.12, 0.18) to (0.4, 0.47), indicating a large increase in the required punishment probability. Thus, the D-player's advantage in a (D,H) interaction is a sensitive parameter of the model.
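The direction of this shift can be rationalized with a single-encounter comparison used only as a rough heuristic (not the authors' analysis, and with p1 = p). A D-player facing an H-neighbor out-earns the mutual-honesty payoff c in expectation when (1 − p)a − pS > c, that is, when

$$p < p^{*} = \frac{a-c}{a+S}, \qquad p^{*}\big|_{a=3} = \frac{3-2}{3+2} = 0.2, \qquad p^{*}\big|_{a=6} = \frac{6-2}{6+2} = 0.5 .$$

Doubling a raises this heuristic threshold from 0.2 to 0.5, which agrees in direction with the reported shift of the p-transition interval; the spatial dynamics place the observed intervals below the heuristic value in both cases.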

Experiment 4. The effect of unequal punishment probabilities on the p-transition interval
If p2 is set to 0, meaning that a D-player is punished only when playing against an H-player, the p1-transition interval shifts to higher values and becomes slightly narrower. For S = 2 and p2 = 0, the p1-transition interval changes from (0.12, 0.18) to (0.23, 0.26). This indicates that roughly double the punishment probability is necessary when, for some reason, p2 is close to zero. This corresponds to a case in which D-players interacting with each other are difficult to expose. Such a form of cooperation between D-players is clearly unfavorable to the spreading of H-players.
For the rest of the experiments we set p1 = p2 = p.

Experiment 5. The effect of the initial rate of H and D-players on the p-transition interval
Different rates of randomly distributed D and H-players in the initial population are considered. Fig. 7 illustrates the effect of the initial D-player rate on the p-transition interval (S = 2). The p-transition interval changes when the initial rate of D-players changes. The difference is significant for a D rate above 50% (H rate below 50%) and less significant when D-players are a minority: between one D-player and 50% D-players, the p-transition interval does not change much. When the D-player rate is high (> 70%), the p-transition interval becomes noisier.
Figure 4 (caption fragment). The initial population contains 50% randomly distributed D-players. Color code: blue, is honest/was honest; red, is dishonest/was dishonest; green, is honest/was dishonest; yellow, is dishonest/was honest. doi:10.1371/journal.pone.0087471.g004
An explanation may be found if we correlate these results with the observations about cluster dynamics (see Experiment 1, Fig. 1 and 2). When the initial rate of D-players is high, the H cluster formation probability is low. If no H cluster appears in the first rounds, then D-players will spread all over the population.
This suggests that the initial cluster structure is more important than the initial proportion of H and D-players. The importance of the initial cluster structure is investigated in the next experiment.
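The initial states used in Experiment 5 can be generated as sketched below. The function names are hypothetical, and 'randomly distributed' is interpreted here as a random placement of an exact number of D-players. Drawing each cell independently with the given probability would be an equally plausible reading.

```python
import random


def random_grid(n: int, d_rate: float, rng: random.Random) -> list[list[str]]:
    """N x N grid with round(d_rate * N * N) D-players at random positions."""
    n_d = round(d_rate * n * n)
    cells = ["D"] * n_d + ["H"] * (n * n - n_d)
    rng.shuffle(cells)
    return [cells[i * n:(i + 1) * n] for i in range(n)]


def single_player_grid(n: int, strategy: str) -> list[list[str]]:
    """All players use the other strategy except one `strategy` player in the middle."""
    other = "H" if strategy == "D" else "D"
    grid = [[other] * n for _ in range(n)]
    grid[n // 2][n // 2] = strategy
    return grid
```

`single_player_grid(n, "D")` and `single_player_grid(n, "H")` correspond to the 'one D-player in the middle' and 'one H-player in the middle' initial states listed in the Fig. 7 caption.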

Experiment 6. The importance of the initial cluster formation
In this experiment we start from a situation where clusters of D and H-players already exist (in all other experiments we start from H and D-players randomly spread).
Two world states are generated by letting the population evolve: one with 95% and the other with 12% D-players. In both cases clusters already exist (similar to those in Fig. 4). The granularity is measured by counting the strategy changes along each lattice row; the value is averaged and normalized. The granularity is similar in the two cases (about 0.9, compared with about 0.5 for randomly spread players).
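The granularity measure can be sketched as follows. The exact normalization is not given, so this form is an assumption: the number of strategy changes between adjacent cells in each row is divided by its maximum (row length minus one), averaged over rows, and subtracted from 1. This gives about 0.5 for randomly spread players at a 50/50 mix and values near 1 for a few large clusters, consistent with the figures quoted above.

```python
def granularity(grid: list[list[str]]) -> float:
    """1 - mean normalized number of strategy changes along each row (assumed form)."""
    width = len(grid[0])
    per_row = [sum(1 for x, y in zip(row, row[1:]) if x != y) / (width - 1)
               for row in grid]
    return 1.0 - sum(per_row) / len(per_row)
```

A uniform grid scores 1.0, a checkerboard scores 0.0, and a grid split into one H half and one D half (one change per row of 10 cells) scores 1 − 1/9 ≈ 0.89.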
The results are depicted in Fig. 8. It may be observed that the p-transition intervals are very similar for the two initial world states. This indicates that the existence of clusters is much more important than the initial H/D population rate.

Experiment 7. The effect of the population size on the p-transition interval
It may be observed that, for small worlds, the p-transition intervals are wider, shifted to higher values, and noisier. Significant changes appear when the world size is smaller than 50×50.
Since the values depicted in Fig. 9 are averages, the noise indicates a high dispersion of the H-rate dynamics across runs: the dynamics of small worlds are less stable. These results may be explained by initial cluster formation: in a small world, the probability of cluster formation is lower than in a large world. If H clusters do not appear, the system converges rapidly to pure D dominance.
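Because Fig. 9 reports averages, a bimodal outcome (some runs ending in pure D, others in pure H) and a uniformly intermediate outcome can produce the same mean. A minimal sketch for summarizing the final H rates of several runs makes this visible; the function name is hypothetical and the input is assumed to be one final H rate per run.

```python
import statistics


def summarize_runs(final_h_rates: list[float]) -> dict[str, float]:
    """Mean, population standard deviation, and share of runs ending in a pure state."""
    pure = sum(1 for h in final_h_rates if h in (0.0, 1.0))
    return {
        "mean": statistics.fmean(final_h_rates),
        "sd": statistics.pstdev(final_h_rates),
        "pure_share": pure / len(final_h_rates),
    }
```

For instance, runs ending at [0.0, 1.0, 0.0, 1.0] and runs ending at [0.5, 0.5, 0.5, 0.5] both have mean 0.5, but the first set has standard deviation 0.5 and all runs absorbed in a pure state, while the second has no dispersion at all.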
A similarity may be observed between the p-transition interval for a small population (Fig. 9) and the p-transition interval for an initial world state with numerous D-players (Fig. 8). This similarity may be explained by a common cause: the probability of H cluster formation in the first rounds depends on the initial distribution but also on the population size (probability of cluster formation is higher in larger populations).
We notice that higher punishment probability and severity are needed in small worlds (e.g. 3×3 or 20×20) in order to obtain the same effect as in larger worlds (e.g. 50×50 or 200×200).
As already noted, cluster formation is the main driver for spreading honest/dishonest behavior. Fig. 10 depicts the dynamics of worlds of three different sizes.
Very small size worlds tend to converge towards a pure distribution (100% D or 100% H). In medium and large size

Experiment 8. The effect of the neighborhood type on the p-transition interval
In this experiment we analyze the impact of different types of neighborhoods: von Neumann, Moore, well-mixed, and scale-free. Von Neumann and Moore neighborhoods may have different radii. In the 'well-mixed' case, every player is a neighbor of every other player. In a 'scale-free' neighborhood, the connections are no longer related to the original lattice structure; instead, a spatial power-law-based graph is mapped onto the lattice (each lattice node is a vertex of the scale-free graph).
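The exact construction of the 'spatial power-law based graph' is not specified. As one common way to obtain a power-law degree distribution, the sketch below swaps in the Barabási-Albert preferential-attachment model and maps vertex v to lattice cell divmod(v, N). Both the choice of model and the parameter m (edges added per new vertex) are assumptions, and the function names are hypothetical.

```python
import random


def barabasi_albert(num_nodes: int, m: int, rng: random.Random) -> dict[int, set[int]]:
    """Undirected preferential-attachment graph as adjacency sets."""
    adj: dict[int, set[int]] = {v: set() for v in range(num_nodes)}
    for u in range(m + 1):  # start from a complete core of m + 1 vertices
        for v in range(u):
            adj[u].add(v)
            adj[v].add(u)
    pool = [v for v in range(m + 1) for _ in range(m)]  # vertex v appears deg(v) times
    for new in range(m + 1, num_nodes):
        targets: set[int] = set()
        while len(targets) < m:  # degree-proportional choice of m distinct targets
            targets.add(rng.choice(pool))
        for v in targets:
            adj[new].add(v)
            adj[v].add(new)
            pool.append(v)
        pool.extend([new] * m)
    return adj


def lattice_neighbors(adj: dict[int, set[int]], i: int, j: int, n: int) -> list[tuple[int, int]]:
    """Neighbors of lattice cell (i, j) when vertex v is mapped to cell divmod(v, n)."""
    return [divmod(v, n) for v in adj[i * n + j]]
```

Keeping a degree-weighted pool avoids recomputing attachment probabilities at every step; each vertex appears in the pool once per incident edge.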
Results are depicted in Fig. 11. As expected, the p-transition interval is influenced by the neighborhood type. However, this influence is rather minor and a phase transition appears in all cases. In the 'well-mixed' neighborhood network, the p-transition interval is narrower and translated to higher values.
For the 'scale-free' topology, the p-transition interval is wider. The upper bound of this interval is close to the upper bounds of the intervals obtained for von Neumann and Moore neighborhoods. For low punishment probability, the 'scale-free' topology is more favorable to H-players.

Experiment 9. The effect of the strategy update rule on the p-transition interval
In this experiment we analyze the p-transition intervals for different strategy update rules. With the 'Best' rule, players imitate the best player (i.e. the neighbor with the highest payoff). With 'Best Myopic', players imitate the best player with probability 0.75 and a randomly chosen neighbor with probability 0.25. With 'Best Fermi', a player imitates the best player with a probability given by a particular form of the Fermi function proposed in [40], and otherwise keeps its strategy unchanged. Results are depicted in Fig. 12. The strategy update rule influences the p-transition interval by slightly changing its position. With the Moore neighborhood and the 'Best Fermi' update rule, a higher punishment probability is required to obtain the same effect as with the 'Best' rule. With a 'scale-free' topology, the difference between the 'Best' and 'Best Fermi' rules is less significant.
Figure 7. H-player averaged rate (100 runs) after 500 rounds as a function of punishment probability p (S = 2). Different initial states are considered: one D-player in the middle; 10%, 50%, 60%, 70%, 80%, 90%, and 95% randomly positioned D-players; and one H-player in the middle. In all cases, a p-transition interval from D dominance to H dominance appears. The interval is wider and shifted to higher values for higher initial rates of D-players. Significant changes appear only for more than 60% D-players. doi:10.1371/journal.pone.0087471.g007
Figure 8. H-player averaged rate (100 runs) after 500 rounds as a function of punishment probability p (S = 2). Two initial states are considered, one with 95% and the other with 12% D-players, both containing already formed clusters of players. The p-transition intervals are very similar despite the different initial player rates. A granularity measure (0 < gran. < 1) is used to characterize the clusters (a high value indicates few large clusters). doi:10.1371/journal.pone.0087471.g008

Conclusion
A social dilemma model, called the Social Honesty (SH) game, is proposed and investigated. The SH game induces complex dynamics when iterated on a regular grid (cellular automaton) or on networks of various topologies, using different strategy update rules. The emerging dynamics may offer relevant insights into real-world processes such as social behavior dynamics.
Experimental results illustrate how the behavior of a population of interacting individuals may be influenced by setting an adequate punishment severity and applying punishment with a certain probability, even to the point where honesty becomes dominant. Experiments indicate that punishment probability is more important than punishment severity. These results agree with empirical evidence and with studies based on real-world observations [13,14], which supports the validity of the model.
Figure 9. H-player averaged rate (100 runs) after 500 rounds as a function of punishment probability p (S = 2). The initial population contains 50% randomly distributed D-players. The p-transition intervals are depicted for different population sizes (from 3×3 to 250×250). For small populations, p-transition intervals are significantly wider and shifted to higher values, and noise caused by a high dispersion of the H-rate dynamics appears. doi:10.1371/journal.pone.0087471.g009
Figure 10. Dynamics of worlds of different sizes: a 5×5 world, a 25×25 world, and a 100×100 world. Several simulation runs are depicted, each on a separate row. The initial population contains 50% randomly distributed D-players. S = 2, and p is selected in the middle of the p-transition interval for each N×N world. Rounds 0, 1, 2, 3, 4, 5, 10, 50, 150, and 500 of the SH game are captured. Color code: blue, honest/was honest; red, dishonest/was dishonest; green, honest/was dishonest; yellow, dishonest/was honest. doi:10.1371/journal.pone.0087471.g010
Transition intervals for punishment probability and severity have been identified, and they are present in all experiments. Punishment severity proves to be ineffective when punishment probability is very low. A higher punishment probability makes it possible to reduce punishment severity significantly with the same effect on the honest/dishonest population rate.
These results may be related to observations about the U.S. Prohibition period, when punishment was ineffective because, despite its high severity, its probability was very low [11].
When a 'zero tolerance' policy [41] is too costly or difficult to implement in practice, a solution is to find the punishment transition interval in order to fine-tune the control mechanism. The optimal combination of punishment severity and probability depends on the costs involved.
Also, a higher punishment probability proves necessary when the advantage of dishonesty is higher. This result, which agrees with the rational crime model [7], may seem to contradict the conclusions of [4], where neither the amount of money nor the punishment probability influences the temptation to act dishonestly. However, the considerations in [4] concern a single individual, whereas in our model individuals are influenced by their neighbors through imitation.
Another finding is that the survival of the honest strategy depends on the players' ability to form clusters. A similar phenomenon has been observed for the emergence of cooperative behavior in experiments with the Prisoner's Dilemma game [32,33]. We find that the initial proportion of D and H-players is less important than initial cluster formation (groups resist aggression better than isolated individuals). Also, small populations seem to be less predictable and less sensitive to punishment.
Results indicate that the proposed model may describe real-world phenomena with acceptable approximation. New dynamics, with a new relation between punishment probability and punishment severity, are revealed. An epidemic of honesty is possible if model parameters are finely tuned and cluster formation is triggered. Hopefully, policy makers, various groups and organizations, and even law enforcement institutions may use such a model for fine-tuning punishment severity and certainty toward favoring the contagion of honest behavior.
Figure 11. p-transition intervals for different types of neighborhoods: von Neumann (r = 1, 2), Moore (r = 1, 2, 3), well-mixed (everybody is a neighbor of everybody), and scale-free (power-law connections mapped on the lattice). Averaged values over 100 runs are observed after 500 game rounds. doi:10.1371/journal.pone.0087471.g011
Figure 12. p-transition intervals for different types of strategy update rules. 'Best' rule: players imitate the best player (i.e. the neighbor with the highest payoff). 'Best Myopic' rule: players imitate the best player with probability 0.75 and a randomly chosen neighbor with probability 0.25. 'Best Fermi' rule: the best player is imitated with a probability given by a particular form of the Fermi function. Averaged values over 100 runs are observed after 500 game rounds. doi:10.1371/journal.pone.0087471.g012
As future work we intend to enrich our model with some additional features for the player model (identity, memory, etc.), test some other network topologies and new strategy updating rules. Another direction of interest is related to real world validation experiments.