Strike one hundred to educate one: Measuring the efficacy of collective sanctions experimentally

In this paper, we test whether sanctions applied to an entire group on account of the free-riding of one of its members can promote group cooperation. To measure the efficiency of such collective sanctions, we conducted a lab experiment based on a standard public good game. The results show that, overall, collective sanctions are ineffective. Moreover, when subjects are able to punish their peers, the level of cooperation is lower in the regime of collective sanctions than under individual sanctions. Both outcomes can be explained by a general disapproval of the collective responsibility for an individual fault: in the post-experimental survey, an absolute majority evaluated such regimes as unfair. While collective sanctions are not an effective means for boosting group compliance, there are nevertheless two insights to be gained here. First, there are differences across genders: under collective sanctions, men’s level of compliance is significantly higher than under individual sanctions, while the opposite is true for women. Second, there were intriguing differences in outcomes between the different regime types. Under collective sanctions, a person who is caught tends to comply in the future, at least in the short term. By contrast, under individual sanctions, an individual wrongdoer decreases his or her level of compliance in the next period.

In accordance with the PLOS ONE submission guidelines, I also changed "female" or "male" to "woman" or "man" wherever it was used as a noun in the text.

Reviewer #1. Major issues:
=====
1) In line 175, the experiment consisted of 15 periods. Here, do the game participants know when the game will end?
Thanks! The question of how much the participants knew was also raised by Reviewer #2. Yes, and that is why we observe an end-game effect. Based on both your and Reviewer #2's comments, I described what was known to the participants before the game in lines 197-205.

=====
2) The explanation of Figure 1 is unclear, including two "Yes" and "No".
Thank you for pointing this out. Figure 1 has been redone to include all the relevant information.
=====
3) In lines 314-323, the descriptions of the results presented in Figure 2 are inaccurate. "Without peer sanctions, cooperation began to decline after the 5th or 6th round to contributions of 5 or 6 tokens out of 20. With peer sanctions, the average contributions remained relatively stable at about half of the endowment (10-12 tokens) until the 15th (and the last) round" From periods 6-11, I can still find that the contribution level is above 6. The description should be more accurate.
Thanks again for noticing this. I slightly redid Figure 2 (without changing its content) to make the difference between treatments more visible. I also corrected the text based on your comments (lines 352-357).
=====
4) In the individual sanctions regime, the contributions would be checked. Will this check generate an observation cost?
I agree: I mention the 'informational dimension' of collective sanctions but do not use this factor later in the experimental design, so the check did not generate an observation cost. This was done mostly for the sake of simplicity and, based on your comment, I explain this choice in lines 249-255 of the 'Experimental design' section.
=====
5) In the process of experiment, the individuals participating in the game have great heterogeneity, such as education level, culture, major, age. Why does the author only explore the influence of gender on the results? Are other variables controlled?
That is a very important point, which was not covered in the original version of the text. The gender effect was estimated while controlling for income and age, but since the difference between genders was not the initial focus of this study, more rigorous controls are certainly needed. I mention this when I describe the limitations of this study in the 'Discussion' section (lines 498-515).
=====
6) In the section "Theoretical arguments for collective sanctions", the author should clarify the difference between the collective sanctions mentioned in this manuscript and the costly punishment in previous works, such as Emergence of social punishment and cooperation through prior commitments. In AAAI, pp. 2494-2500.
Yes, I totally agree. I added a paragraph on the current state of the art on sanctions in social dilemmas to the 'Theoretical arguments' section (lines 60-73), including, among others, the paper you refer to.

Reviewer #1. Minor issues
(1) It is better to use a declarative sentence instead of an interrogative sentence in the title of the manuscript.
That is an excellent suggestion. The title was changed from "Strike one hundred to educate one: can collective sanctions be efficient?" to "Strike one hundred to educate one: measuring the efficacy of collective sanctions experimentally".

=====
(Capraro & Barcelo, 2015). This seems relevant and should probably be discussed.
Thank you! I decided to elaborate on this in the 'Hypotheses' section (lines 146-158), where I provide a compact review of the literature on the effect of group size on cooperation, including the papers suggested above.

========
- Table 2. Please eliminate the word "tab" from the description of the table.
This error is corrected in the revised version, thank you for noticing this!
========
- Table 3. I think you want to say "lower bound" and not "lower boundary". Moreover, you have to tell the confidence interval. In general, lower bound does not make any sense in this context.
Thank you for noticing that. This was corrected in the revised version: I replaced this information with 95% confidence intervals.
========
- Figure 1. What does "intsanction" mean? Note that figures should be as self-explanatory as possible, to help the reader to understand the key point of the paper without necessarily read all the details.
Yes, I entirely agree; this was also noted by Reviewer #1. Based on your and the other reviewer's suggestions, I re-formatted Figure 1 entirely to include all the relevant information.
This lack of clarity is now corrected (line 350).
========
- Line 332. The trimodal distribution was already observed by Capraro, Jordan and Rand (2014). Please discuss the relationship between your paper and theirs. Note that Capraro et al. observed a trimodal distribution in a standard PGG (and argue that participants follow a "give half heuristic"). In any case, the fact that they observe a trimodal distribution in the standard PGG implies that your interpretation that this trimodal distribution is due to the threshold is probably wrong.
Thank you for this comment. It is my fault that I overlooked this paper (Capraro, Jordan and Rand, 2014), which is of course extremely important if I want to draw any conclusions from an observed trimodal distribution. I included a reference to this paper and described the resulting limitations of my analysis in lines 369-376.

=========
- Gender differences. You should discuss the relationship between your result and those of Rand (2017) and Balliet et al, who found gender differences in cooperation in the standard PGG.
In the new version of the 'Discussion' section (lines 499-515), I provide a quick overview of current findings on gender differences in social dilemmas, including standard PGGs. Among other papers, I include the paper by Rand and the meta-review by Balliet et al.

=========
- The discussion should be largely rewritten. One of the goals of the discussion section is to compare the current work with previous works. The current discussion has only one reference, so it dramatically fails to make this comparison. In general, I think that this paper largely fails in relating its results with previous work. Another goal of the discussion is to list limitations of the work. The current discussion does not list any limitation. But every experimental work has limitations!
Thank you for this comment! Following your suggestion, I rewrote the entire 'Discussion' section. It now consists of roughly two parts. At the beginning of the section, I refer to other similar studies in this field and compare my design with theirs. In the second part, I briefly delineate the most important limitations.