If Cooperation Is Likely Punish Mildly: Insights from Economic Experiments Based on the Snowdrift Game

Punishment may deter antisocial behavior. Yet to punish is costly, and the costs often do not offset the gains that are due to elevated levels of cooperation. However, the effectiveness of punishment depends not only on how costly it is, but also on the circumstances defining the social dilemma. Using the snowdrift game as the basis, we have conducted a series of economic experiments to determine whether severe punishment is more effective than mild punishment. We have observed that severe punishment is not necessarily more effective, even if the cost of punishment is identical in both cases. The benefits of severe punishment become evident only under extremely adverse conditions, when to cooperate is highly improbable in the absence of sanctions. If cooperation is likely, mild punishment is not less effective and leads to higher average payoffs, and is thus the much preferred alternative. The presented results suggest that the positive effects of punishment stem not only from imposed fines, but may also have a psychological background. Small fines can do wonders in motivating us to choose cooperation over defection, but without the paralyzing effect that may be brought about by large fines. The latter should be utilized only when absolutely necessary.


Introduction
Approximately two million years ago some hominids were beginning to evolve larger brains and body size and to mature more slowly than other apes [1]. This likely created serious challenges in rearing offspring that survived. Faced with such evolutionary pressures, members of the genus Homo began helping each other, in particular by provisioning for the young of others regardless of kinship [2]. Today, we are known as the supercooperators [3], and it is beyond doubt that selfless cooperative behavior between unrelated individuals is one of the key pillars of our remarkable evolutionary success. The temptations to defect, however, have been present in the past as they are now, and we are well aware of the fact that defection may lead to the tragedy of the commons [4]. But since we are no longer threatened by other species (in fact, it seems difficult to dispute that the biggest challenges today are of our own making), the primal motive to cooperate is gone. We must rely on our cultural heritage and upbringing as well as between-group conflicts to maintain in-group solidarity [5].
Perhaps not surprisingly, we have come to appreciate actions that may promote cooperation, most notably punishment [6,7], even to the point of institutionalization [8][9][10][11][12][13]. The problem is that punishment is costly, and it is far from clear who should pay. It is tempting to conclude that it is obviously up to the cooperators to track down and punish defectors. Yet cooperators are already at a personal disadvantage compared to defectors, and adding yet another could prove too much to bear in a competitive environment. The emergence of second-order free-riding, i.e., contributing to the common pool but not to sanctioning, therefore seems inevitable [14][15][16][17][18][19], and in fact presents the biggest threat to the success of punishment [20][21][22]. The hope, or rather the assumption, is that in the long run punishment would pay off, so that the additional investments would be offset by increased levels of cooperation. There exists evidence, both theoretical and experimental, in support of such an assumption [23][24][25][26][27][28][29], but there are also studies asserting that costly punishment is maladaptive [30,31], and that it can be challenged by antisocial punishment [32][33][34][35] as well as reward [36]. The stick versus carrot dilemma [37] has recently received ample attention [38][39][40][41][42][43][44][45], and the subject of antisocial punishment has also been contested with loners [46], which were originally studied in [47,48]. It is safe to conclude that there are still many open issues that require further research.
Here we investigate the impact of punishment from a somewhat different perspective, namely how humans react when being subject to punishment. When a defector is punished, she essentially has two options on how to proceed. One is to keep defecting in the hope that the withheld contribution to the common pool will make up for future sanctions, while the second is to cooperate and thus avoid further sanctions. It is a dilemma that is likely to be decided based on the severity of punishment as well as the cost-to-benefit ratio of the game. Punishment can be considered effective if a high fraction of punished defectors chooses to cooperate in the next round. To clarify this, we have conducted economic experiments [49] (recent examples of research are [50][51][52][53][54][55]) based on the snowdrift game played in groups [56], where the two main free parameters were the severity of punishment and the cost-to-benefit ratio. In the realm of the game (see Methods for details), the cost-to-benefit ratio determines just how severe the social dilemma is. Low cost-to-benefit ratios constitute lenient conditions for the evolution of cooperation, while high costs and low benefits favor defection. As we will show, the effectiveness of different punishment regimes depends sensitively on the cost-to-benefit ratio. If costs are low, the application of severe punishment is not more effective than mild punishment, yet it does lead to lower overall payoffs and hence is not recommended. Only if costs are high does severe punishment outperform mild punishment in terms of persuading more defectors to adopt cooperation in the next round. We proceed by presenting the main results in support of these conclusions, first by focusing on the outcome of the game in the absence and subsequently in the presence of punishment.

Results
The impact of punishment on the outcome of the game can be understood well only if the same economic experiments are also carried out without the possibility of this action. We therefore first conduct experiments in the absence of punishment to arrive at a baseline scenario, in particular to estimate the general willingness of players to cooperate at different values of the cost-to-benefit ratio. Subsequently, we will use this as a reference point for the snowdrift game with punishment.

The snowdrift game without punishment
To illustrate the snowdrift game, imagine two drivers that are caught in a blizzard and trapped on either side of a snowdrift [57]. They can either get out and start shoveling (cooperate) or remain in the car (defect). If both cooperate, they have the benefit b of getting home while sharing the labor c. Thus, R = b − c/2, which indicates the Reward for both cooperators. If both defect, they do not get anywhere and hence incur the punishment P = 0. If only one shovels, however, they both get home but the defector avoids the labor cost and gets the Temptation T = b, whereas the cooperator gets the Sucker's payoff S = b − c. The four payoff values in the snowdrift game rank in the order T > R > S > P, and r = c/(2b − c) is the cost-to-benefit ratio. If we fix R = b − c/2 = 1, then 2b − c = 2, so that r = c/(2b − c) = c/2, T = 1 + r and S = 1 − r, and the payoff matrix (entries are the payoffs of the row player) thus becomes:

\[
\begin{array}{c|cc}
 & C & D \\ \hline
C & 1 & 1 - r \\
D & 1 + r & 0
\end{array}
\]

To have a reference point for the actual impact of punishment, as noted, we first study how the frequency of cooperation varies with the cost-to-benefit ratio r. In Treatment I, which included eight sessions, cooperators were not allowed to punish defectors, and six different values of the cost-to-benefit ratio r were set (four sessions for r = 0.2 and r = 0.8, and the other four sessions for the other values of r). In each session 20 undergraduate students were randomly allocated to four groups of five subjects playing snowdrift games in the Computer Lab for Behavior Games [for further information see the Methods and Fig. S1 (Supplementary Information)]. As expected, both the level of cooperation f_c and the average payoff per period decrease with increasing r, as demonstrated in Figs. 1 (a) and (b), respectively. These results clearly signal the conflict between individual and group interests: the best strategy for an individual is to defect if the opponent adopts cooperation. Consequently, the total payoff of the whole group falls gradually with increasing r, which prompts more individuals to choose defection to gain a higher individual payoff. In the end, when the cost-to-benefit ratio r is large, both the individual and the group benefits become minimal. The Kruskal-Wallis test for results presented in Fig. 1 (a [...]. In addition, we present details of the regression analysis for results presented in Fig. 1 in Tables S1, S2, S3 and S4 (Supplementary Information). This statistical analysis clearly indicates that the cost-to-benefit ratio has a statistically significant impact on the cooperation level and the average payoff.
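The normalized payoffs described above (R = 1, T = 1 + r, S = 1 − r, P = 0) can be sketched in a few lines of Python. This is only an illustration, not the software used in the experiments: the function names are hypothetical, and the group payoff is assumed here to be the sum over pairwise games with every other group member.

```python
def snowdrift_payoffs(r):
    """Return (R, S, T, P) for the snowdrift game normalized so that R = 1."""
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    return 1.0, 1.0 - r, 1.0 + r, 0.0


def pair_payoff(me, other, r):
    """Payoff of a player choosing `me` against an opponent choosing `other`."""
    R, S, T, P = snowdrift_payoffs(r)
    if me == "C":
        return R if other == "C" else S
    return T if other == "C" else P


def group_payoffs(strategies, r):
    """Per-player payoffs in one period.

    Assumption (not stated explicitly in the text): each player's payoff is
    the sum of the pairwise payoffs against all other members of the group.
    """
    return [
        sum(pair_payoff(s, o, r) for j, o in enumerate(strategies) if j != i)
        for i, s in enumerate(strategies)
    ]


if __name__ == "__main__":
    # One cooperator among four defectors at r = 0.8.
    print(group_payoffs(["C", "D", "D", "D", "D"], 0.8))
```

For any 0 < r < 1 the ranking T > R > S > P holds, which is what makes defection tempting against a cooperator while mutual defection remains the worst outcome.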
In addition to the average values of strategies we also monitored how often players change strategies at different values of r. We found that defectors adhere to defection as we increase r. Cooperators, on the other hand, do not insist on cooperation at high r. Figures 2 (a) and (b) show the percentage of defectors selecting defection and cooperators selecting cooperation in the next round among all the subjects that changed strategies in the next round. We found that the percentage of defectors choosing to stick with defection increases from 13% at r = 0.1 to 72.75% at r = 0.9, while that of cooperators choosing to cooperate anew declines from 53.25% at r = 0.1 to 6.5% at r = 0.9. Furthermore, we have also monitored the fraction of individuals who always choose to cooperate (ALLC) and those who always choose to defect (ALLD) among all the subjects. Interestingly, these values depend sensitively on the cost-to-benefit ratio. As the dotted lines in Fig. 2 show, the percentage of ALLD increases greatly from 0% at r = 0.1 to 25% at r = 0.9, while the percentage of ALLC drops abruptly from 10% at r = 0.1 to 0% for higher values of r. These observations suggest that the subjects are in general "flexible" in responding to the change of external conditions (here determined by the value of r), and are thus well aware of and concerned for their individual success.
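The ALLC and ALLD fractions mentioned above can be computed from per-subject decision histories along the following lines. This is a minimal sketch; the data layout (one sequence of "C"/"D" choices per subject) and the function name are assumptions made for illustration.

```python
def unconditional_fractions(histories):
    """Return (fraction ALLC, fraction ALLD) among all subjects.

    `histories` is a hypothetical list with one sequence of "C"/"D"
    choices per subject, covering all periods that subject played.
    """
    if not histories:
        raise ValueError("need at least one subject")
    n = len(histories)
    # A subject counts as ALLC (ALLD) only if every recorded choice is C (D).
    allc = sum(1 for h in histories if h and all(a == "C" for a in h))
    alld = sum(1 for h in histories if h and all(a == "D" for a in h))
    return allc / n, alld / n


if __name__ == "__main__":
    sample = ["CCCC", "DDDD", "CDCD", "DDDD"]
    print(unconditional_fractions(sample))
```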

The snowdrift game with punishment
To investigate the effectiveness of punishment when applying different fines, we focus on r = 0.2 and r = 0.8, because these two values of the cost-to-benefit ratio represent typical conditions constituting low and high costs of cooperation, respectively (see Figs. 1 and 2). For the sake of simplicity, the sessions of Treatment II and Treatment III consisted of two stages, namely the game stage and the peer punishment stage. During the first stage, subjects played the snowdrift game with other group members. As in Treatment I, there were four groups containing five subjects each. During the second stage, cooperators were given the chance to punish defectors on a peer-to-peer (individual) basis as follows. If there was at least one cooperator who accepted the cost of punishment, then the payoffs of all defectors in the group were reduced by a fine p, and simultaneously the punisher's profit was also reduced by a single unit, which was the cost of punishment. We should stress that the cost of punishment was held constant across the different values of the fine. Therefore the results we observed primarily focus on the reaction of defectors being punished and not on the dilemma of cooperators whether to punish or not. The latter dilemma, however, still exists because those who cooperate but do not punish can be considered second-order free-riders. Table 1 lists the applied parameter values for Treatment II and Treatment III.
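The punishment stage described above can be sketched as follows. This is an illustrative reading of the rule, not the z-Tree implementation: it assumes that each defector is fined p exactly once whenever at least one cooperator punishes, and that every punishing cooperator pays the unit cost. The function and argument names are hypothetical.

```python
def apply_punishment(payoffs, strategies, punishers, fine, cost=1.0):
    """Return the payoffs after the peer punishment stage.

    payoffs    -- first-stage payoffs, one per group member
    strategies -- "C" or "D" for each group member
    punishers  -- indices of cooperators who accept the cost of punishment
    fine       -- fine p deducted from each defector (2.0 or 4.0 in the text)
    cost       -- cost paid by each punisher (one unit in the text)
    """
    for i in punishers:
        if strategies[i] != "C":
            raise ValueError("only cooperators are allowed to punish")
    result = list(payoffs)
    if not punishers:
        return result  # nobody punished, payoffs are unchanged
    for i, s in enumerate(strategies):
        if s == "D":
            result[i] -= fine  # assumption: fined once, regardless of punisher count
    for i in punishers:
        result[i] -= cost
    return result


if __name__ == "__main__":
    stage_one = [2.0, 2.0, 4.0, 4.0, 4.0]
    print(apply_punishment(stage_one, ["C", "C", "D", "D", "D"], {0}, fine=2.0))
```

In this sketch the cooperator at index 1 keeps the full first-stage payoff while benefiting from the sanction, which is the second-order free-riding situation mentioned in the text.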
When the cost-to-benefit ratio is low (r = 0.2), the application of larger fines does not yield a more favorable outcome than the application of lower fines. Naturally, the chance to punish defectors improves the cooperation level but, as Fig. 3 (a) shows, higher fines do not increase f_c further. On the other hand, the average payoff of group members is reduced, especially so for severe punishment, as illustrated in Fig. 3 (c). Besides the Kruskal-Wallis test we have also calculated the 95% confidence intervals to compare the [...] In agreement with previous observations [30], the futility of overly harsh punishment could be an important message for those who are in a position to establish the means of punishment in our society. Interestingly, the relevance of severe punishment becomes more prominent when the external conditions to cooperate become significantly worse. When the cost-to-benefit ratio is high (r = 0.8), the application of higher fines gradually elevates the cooperation level beyond what could be achieved by mild punishment. As shown in Fig. 3 (b), f_c can be doubled due to severe punishment. The impact on the average payoff is also positive, given that cooperation is virtually absent in the absence of punishment. This is hence very much the outcome one would expect from punishment. It is important to emphasize, however, that low fines still yield an outcome similar to the one we have observed for r = 0.2. Namely, as Fig. 3 (d) clearly illustrates, the usage of p = 2.0 increases f_c, but the average payoff is lower than in the punishment-free case. The corresponding 95% confidence interval is (1.717, 2.154) for the punishment-free case and (1.311, 1.707) for p = 2.0.
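The comparison of the two confidence intervals quoted above can be checked directly. The snippet below is a small sketch with a hypothetical helper name; it only tests whether two intervals overlap, which is a conservative indication of a difference and not a substitute for the Kruskal-Wallis test reported in the text.

```python
def intervals_overlap(a, b):
    """Return True if the closed intervals a = (lo, hi) and b = (lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence intervals for the average payoff at r = 0.8, as quoted in the text.
no_punishment = (1.717, 2.154)
mild_punishment = (1.311, 1.707)  # p = 2.0

if __name__ == "__main__":
    print(intervals_overlap(no_punishment, mild_punishment))
```

The two intervals do not overlap, in line with the statement that the average payoff under p = 2.0 is lower than in the punishment-free case.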
The results presented thus far indicate that the size of the fine should be carefully adjusted to the general conditions that characterize the severity of the social dilemma the players are facing. If the conditions are such that cooperation is likely and viable even in the absence of punishment, then severe sanctioning of defective behavior should be avoided as it leads to lower average payoffs. On the other hand, in strongly defection-prone environments, where cooperators hardly have a chance to survive in the absence of additional regulations, severe punishment appears to be the correct and indeed the only effective means to evoke a change for the better.
To reveal the microscopic details governing the choices during the conducted economic experiments, we have also determined the rate of groups that had different numbers of cooperators during all the considered periods. For example, there were 83 groups with three cooperators in the total of 168 groups for all the periods in the absence of punishment at r = 0.2. Therefore the rate of groups with three cooperators is 83/168 ≈ 0.49. This is the most common formation, as shown in Fig. 4 (a). If punishment was applied at r = 0.2, then the most typical group would contain four cooperators and one defector to form the five group members. Fig. 4 (a) also illustrates that not just the average f_c but also the probability distributions of different groups are very similar when we applied different punishment strengths. This explains why a more severe punishment is less recommended in this case: it has no additional impact on the strategy choice of players, and hence it only contributes to additionally reducing the payoffs of defectors. This may have a negative psychological side effect, as it makes it less likely that such "paralyzed" players will attempt reintegration by means of the less profitable cooperative approach.
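The rate of groups with a given number of cooperators, as used in the example above, can be tallied as in the following sketch. The input format (one list of "C"/"D" choices per group and period) and the function name are assumptions for illustration only.

```python
from collections import Counter


def cooperator_count_rates(group_rounds, group_size=5):
    """Return {n_c: rate} over all observed group-period records.

    `group_rounds` is a hypothetical list in which each entry holds the
    strategies ("C" or "D") chosen by one group in one period.
    """
    if not group_rounds:
        raise ValueError("need at least one group record")
    counts = Counter(sum(1 for s in group if s == "C") for group in group_rounds)
    total = len(group_rounds)
    return {n: counts.get(n, 0) / total for n in range(group_size + 1)}


if __name__ == "__main__":
    # Worked number from the text: 83 of 168 group records had three cooperators.
    print(round(83 / 168, 2))  # 0.49
```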
If cooperation is costly, however, the distribution of strategies within the groups changes significantly, as shown in Fig. 4 (b). Here, the most common group contains only a single cooperator in the absence of punishment. Moreover, there is a significant fraction of groups, about 20% of them, which completely fulfill the makings of the tragedy of the commons as therein everybody chooses to defect. If we apply punishment then the number of cooperators n_c increases, and indeed the maximum of the distribution moves towards higher n_c. In particular, it is at n_c = 2 when p = 2.0 and at n_c = 3 for p = 4.0. In agreement with our previous conclusion, here the application of severe punishment significantly reduces the number of defectors, and this reduction comfortably makes up for the losses in overall payoff that are charged to defectors because of the larger value of p.
To address our principal goal, which concerns the effectiveness of punishment, we have also determined the percentage of defectors who select defection and cooperators who choose cooperation in the next round among all players who change strategy. Figures 5 (a) and (b) show these ratios for low and high costs of cooperation, i.e., r = 0.2 and 0.8, respectively. In comparison with the data obtained without punishment (plotted in Fig. 2), we found that for r = 0.2 there is a slight increase in opting to cooperate and a decrease in opting to defect, but the size of the fine plays a rather insignificant role in mediating this decision. As we have already observed, this changes significantly if a high cost-to-benefit ratio characterizes the snowdrift game. Here the percentage of cooperators staying cooperators increases from 27.0% for p = 2.0 to 49.7% for p = 4.0, and the percentage of defectors deciding to defect again drops from 50.4% for p = 2.0 to 22.4% for p = 4.0. For a deeper insight we have also calculated how the probability of changing strategy depends on the actions of the others in the group (on the number of cooperative opponents) at different values of r, as summarized in Fig. S2 (Supplementary Information). This further strengthens the conclusion that the proper impact of punishment on individual decision making might depend sensitively on other elementary circumstances, such as, in our case, how beneficial it is to defect rather than to cooperate to begin with.
Staying with the high cost regime, we note that it is difficult to distinguish accurately the motivation of a defector to choose cooperation in the next round, because the fluctuations in a person's choice of a different strategy in the next round amount to about 12%, as shown in Fig. S3 (Supplementary Information). We argue that the primary purpose of punishment ought to be to turn defectors into cooperators at the next time of asking. As shown in Fig. S4 (Supplementary Information), the percentage of defectors who choose cooperation in the next round because of being punished in the current round increases from 8% to 11% when the fine is increased. To test this further, we define the effectiveness of punishment E_p as the rate of defectors who choose cooperation in the next round after being punished. As shown in Fig. 6 (a), the effectiveness of punishment using p = 2.0 is equal to or even slightly larger than that of p = 4.0 when the cost-to-benefit ratio is low (r = 0.2). Since here two rather than three setups are tested against statistical relevance, we apply the t-test, which yields P > 0.05. This indicates that, for r = 0.2, there are indeed no statistically relevant differences between the effectiveness of punishment with p = 2.0 and p = 4.0. The difference at r = 0.8, depicted in Fig. 6 (b), is considerably more pronounced. There the higher fine is much more effective in converting defectors to cooperators, and indeed it corroborates the necessity of severe punishment in adverse environments. Here the t-test yields P < 0.0001, clearly confirming statistically relevant differences between the two punishment modes at r = 0.8.
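The definition of E_p given above translates into a short computation over per-round records. The sketch below assumes a hypothetical record format of (strategy in the current round, whether the subject was punished, strategy in the next round); it illustrates the definition only and does not reproduce the statistical tests.

```python
def punishment_effectiveness(records):
    """Return E_p, the rate of punished defectors who cooperate in the next round.

    `records` is a hypothetical iterable of tuples
    (strategy_now, was_punished, strategy_next) with strategies "C" or "D".
    Returns None if no defector was punished.
    """
    next_choices = [
        nxt for now, punished, nxt in records if now == "D" and punished
    ]
    if not next_choices:
        return None
    return sum(1 for nxt in next_choices if nxt == "C") / len(next_choices)


if __name__ == "__main__":
    sample = [
        ("D", True, "C"),
        ("D", True, "D"),
        ("D", False, "C"),  # defected but was not punished: not counted
        ("C", False, "C"),  # cooperators are not counted
    ]
    print(punishment_effectiveness(sample))
```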

Discussion
We have conducted economic experiments centered around the snowdrift game played in groups of five, with the aim of determining the effectiveness of severe and mild punishment in persuading defectors to choose cooperation in the next round of the game. With the assumption that the propensity of the environment itself to promote or deter cooperation likely plays an important role, we have tested the impact of severe and mild punishment under a cooperation-prone and under a defection-prone setup of the snowdrift game. We have observed that the benefits of severe punishment emerge only under adverse conditions, when to cooperate is highly unlikely in the absence of sanctions. If the conditions are favorable or at least not unfavorable, mild punishment is not less effective. In particular, if cooperation is likely, mild punishment is just as effective as severe punishment in persuading defectors to choose cooperation. But since the fines imposed by severe punishment are higher, the overall welfare is lower than under mild punishment. Severe punishment fails to offset the imposed fines and costs associated with its execution, and so the players would be better off without it. Importantly, this holds even under the lenient assumption that the cost of punishment is independent of the severity of punishment. If the costs scaled with the imposed fines, the effectiveness of severe punishment would be even worse. However, if the conditions for cooperation are unfavorable, then only severe punishment is able to deter the players from defecting, and it is also then that it has a positive impact on the average payoff and is in fact sustainable.
The presented results indicate that it is far from obvious how large the fines should be to elevate the overall welfare, even if the costs do not scale with the imposed penalties. Contrary to what could be assumed, even if we have the means to punish hard, doing so is likely not an optimal decision. It can be a viable one if the conditions are really adverse and unfavorable for the evolution of cooperation. In general, however, mild punishment is not less effective than severe punishment, with the added benefit that the lower imposed fines make it easier for the punished individuals to reintegrate into society. In view of these observations, we conclude that the positive effects of punishment stem not only from the imposed fines, but may also have a psychological background. Small fines work just as well as high fines in motivating us to choose cooperation over defection. Punishing excessively hard seldom has additional benefits, but it does have the potential to disable the punished individual, and it also decreases the overall welfare more than punishing mildly or moderately. Neither of these two side effects is desirable, and thus we conclude that severe punishment should be utilized only when absolutely necessary. It seems that less harm is done by adopting mild punishment and risking a few more persistent defectors than by endorsing severe punishment in the name of total cooperation.

Methods
A total of 320 undergraduate students (45% females, 20.3 years old on average) from Wenzhou University participated in repeated snowdrift games taking place in groups of five at the Computer Lab of Behavior Games. Students that participated did so by answering a public call that was issued by the Computer Lab of Behavior Games of Wenzhou University. The ethics committee of Wenzhou University approved the public call and the experiments. All the participants provided their written informed consent to participate in the study. Prior to participation, they also learned the rules of the game and subsequently demonstrated their understanding in a short test.
The 20 subjects in a session were allocated anonymously to four groups consisting of five subjects each by means of the z-Tree software [58]. Subsequently, subjects played the snowdrift game with all the members in the same group. Since participants were freshmen and sophomore students with different major fields, coming from different departments, they were unlikely to know each other. In addition, subjects were not allowed to participate in more than a single session of the experiment. A total of sixteen sessions were conducted from May to December 2012. Three different treatments were conducted, namely experiments without punishment for different cost-to-benefit ratios r (Treatment I), experiments with punishment with penalty p = 2.0 for r = 0.2 and r = 0.8 (Treatment II), and experiments with punishment with penalty p = 4.0 for r = 0.2 and r = 0.8 (Treatment III). Each subject played 25 periods during approximately 60 min, and earned 64.8 RMB (the Chinese unit of currency) on average, which amounts to approximately 10.3 US dollars.
At the beginning of each experiment, subjects read written instructions that explained the payoff matrix and the rules of the game. To avoid misunderstanding the instructions, subjects were asked to calculate their own payoff for several examples, and they had to arrive at the correct numbers in order to be allowed participation. After the experiment started, participants marked their decisions on a computer screen using the experimental software z-Tree. In every period, subjects were informed of their own decision and their monetary payoff on the computer screen. Cooperators were allowed to punish defectors during the punishment stage of Treatment II and Treatment III. Each subject's final score was summed over all periods, and subjects earned an income proportional to their final score (1 RMB for each score point).

Figure 3. Panels (a) and (c) show the frequency of cooperation and the average payoff per period in the absence of punishment, for punishment with p = 2.0, and for punishment with p = 4.0, as obtained when the cost-to-benefit ratio is r = 0.2 (low). Panels (b) and (d) show the frequency of cooperation and the average payoff per period in the absence of punishment, for punishment with p = 2.0, and for punishment with p = 4.0, as obtained when the cost-to-benefit ratio is r = 0.8 (high). Only if cooperation is very unlikely in the absence of sanctions does severe punishment reveal its advantages. The whiskers in panels (a) and (b) show the 95% confidence intervals for the frequency of cooperation, while the whiskers in panels (c) and (d) show the same for the average payoffs.

Figure 1. Higher cost-to-benefit ratios in the snowdrift game lead to lower levels of cooperation. Depicted are results of an economic experiment, as obtained in the absence of punishment. Panels (a) and (b) show the frequency of cooperation f_c and the average payoff per period in dependence on the cost-to-benefit ratio r, respectively. The whiskers in panels (a) and (b) show the 95% confidence intervals for the frequency of cooperation and for the average payoffs, respectively.

Figure 4. Distribution of strategies within groups depends not just on the severity of punishment, but also on the severity of the social dilemma. Panels (a) and (b) depict the rate of groups having n_c cooperators, as obtained for Treatment I (without punishment), Treatment II (punishment with p = 2.0), and Treatment III (punishment with p = 4.0), at r = 0.2 and r = 0.8, respectively. Only if r = 0.8 is severe punishment more effective. If cooperation is likely (r = 0.2), mild punishment is at least as effective in sustaining highly cooperative groups.

Figure 5. If to cooperate is a very difficult proposition, severe punishment is more likely to divert from defection and perpetuate cooperation than mild punishment. Depicted are the statistics on when cooperators continue to cooperate (red) and defectors continue to defect (green) under mild (p = 2.0) and severe (p = 4.0) punishment. In panel (a), for r = 0.2, mild punishment is just as effective as severe punishment in maintaining the strategy choices. In panel (b), for r = 0.8, mild punishment is less effective. If punishment is severe, cooperators are more likely to continue cooperating (red), while defectors are less likely to continue defecting (green) than if punishment is mild.

Figure 6. If cooperation is likely, the effectiveness of mild punishment is just as high as the effectiveness of severe punishment. Panels (a) and (b) present results obtained for r = 0.2 and r = 0.8, respectively. It can be observed that for r = 0.2, when the likelihood of cooperation is high even in the absence of sanctioning (see Fig. 1), mild punishment is just as effective as severe punishment. Conversely, for r = 0.8 severe punishment leads to a higher percentage of defectors that after being punished choose to cooperate (E_p) than mild punishment. The whiskers in panels (a) and (b) show the 95% confidence intervals for the effectiveness of punishment E_p.

Table 1. Game parameters employed during Treatment II and Treatment III in the snowdrift game with punishment. Treatment II corresponds to mild punishment because the fines (p) for defection are low, while Treatment III corresponds to severe punishment because of the application of comparatively high penalties.