Group Size Effect on Cooperation in One-Shot Social Dilemmas II: Curvilinear Effect

In a world in which many pressing global issues require large scale cooperation, understanding the group size effect on cooperative behavior is a topic of central importance. Yet, the nature of this effect remains largely unknown, with lab experiments reporting that it is either positive or negative or null, and field experiments suggesting that it is instead curvilinear. Here we shed light on this apparent contradiction by considering a novel class of public goods games inspired by the realistic scenario in which the natural output limits of the public good imply that the benefit of cooperation increases fast for early contributions and then decelerates. We report on a large lab experiment providing evidence that, in this case, group size has a curvilinear effect on cooperation, according to which intermediate-size groups cooperate more than smaller groups and more than larger groups. In doing so, our findings help fill the gap between lab experiments and field experiments and suggest concrete ways to promote large scale cooperation among people.


Introduction
Cooperation played a fundamental role in the early evolution of our societies [1,2] and continues to play a major role today. From the individual level, where we cooperate with our romantic partner, friends, and co-workers in order to handle our individual problems, up to the global level, where countries cooperate with other countries in order to handle global problems, our entire life is based on cooperation.
Since the resolution of many pressing global issues, such as global climate change and depletion of natural resources, requires cooperation among many actors, one of the most relevant questions about cooperation regards the effect of the size of the group on cooperative behavior. Indeed, since the influential work by Olson [33], scholars have recognized that the size of a group can have an effect on cooperative decision-making. However, the nature of this effect remains one of the most mysterious areas in the literature, with some scholars arguing that it is negative [33][34][35][36][37][38][39][40], others that it is positive [41][42][43][44][45][46][47], and yet others that it is ambiguous [48][49][50][51] or non-significant [52][53][54]. Interestingly, the majority of field experiments seem to agree on yet another possibility, that is, that group size has a curvilinear effect on cooperative behavior, according to which intermediate-size groups cooperate more than smaller groups and more than larger groups [55][56][57][58][59]. The emergence of a curvilinear effect of the group size on cooperation in real life situations is also supported by data concerning academic research, which in fact support the hypothesis that research quality of a research group is optimized for medium-sized groups [60][61][62].
Here we aim at shedding light on this debate, by providing evidence that a single parameter can be responsible for all the different and apparently contradictory effects that have been reported in the literature. Specifically, we show that the effect of the size of the group on cooperative decision-making depends critically on a parameter taking into account different ways in which the notion of cooperation itself can be defined when there are more than two agents.
Indeed, while in the case of only two agents a cooperator can be simply defined as a person willing to pay a cost c to give a greater benefit b to the other person [6], the same definition, when transferred to situations with more than two agents, is subject to multiple interpretations. If cooperation, from the point of view of the cooperator, means paying a cost c to create a benefit b, what does it mean from the point of view of the other players? Does b get earned by each of the other players, does it get shared among all other players, or neither? In other words, what is the marginal return for cooperation?
Of course, there is no general answer and, in fact, previous studies have considered different possibilities. For instance, in the standard Public Goods game it is assumed that b gets earned by each player (including the cooperator); instead, in the N-person Prisoner's dilemma (as defined in [63]) it is assumed that b gets shared among all players; the Volunteer's dilemma [64] and its variants using critical mass [65] lie somewhere in between: one or more cooperators are needed to generate a benefit that gets earned by each player, but, after the critical mass is reached, new cooperators do not generate any more benefit; finally, it has been pointed out [66,67] that a number of realistic situations can be characterized by a marginal return which increases linearly for early contributions and then decelerates, reflecting the natural decrease of marginal returns that occurs when output limits are approached.
In order to take into account this variety of possibilities, we consider a class of social dilemmas parametrized by a function β = β(Γ, N) describing the marginal return for cooperation when Γ people cooperate in a group of size N. More precisely, our general Public Goods game is the N-person game in which N people have to simultaneously decide whether to cooperate (C) or defect (D). In the presence of a total of Γ cooperators, the payoff of a cooperator is defined as β(Γ, N) − c (where c > 0 represents the cost of cooperation) and the payoff of a defector is defined as β(Γ, N). In order to have a social dilemma (i.e., a tension between individual benefit and the benefit of the group as a whole) we require that:
• Full cooperation pays more than full defection, that is, β(N, N) − c > β(0, N), for all N;
• Defecting is individually optimal, regardless of the number of cooperators, that is, for all Γ < N, β(Γ + 1, N) − c < β(Γ, N).
The aim of this paper is to provide further evidence that the function β might be responsible for the confusion in the literature about the group size effect on cooperation. In particular, we focus on the situation, inspired by realistic scenarios, in which the natural output limits of the public good imply that β(Γ, N) increases fast for small Γ's and then stabilizes.
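As a minimal sketch of the general Public Goods game defined above, the following Python snippet encodes the two payoffs and checks the two social-dilemma requirements for a given β. The function names are illustrative, the second requirement is read here as β(Γ, N) > β(Γ + 1, N) − c (a defector must not gain by switching to cooperation), and the example β with b = 5 and c = 10 is a hypothetical choice, not a parameterization taken from the study.

```python
from typing import Callable

# beta(gamma, n): marginal return when gamma people cooperate in a group of size n
Beta = Callable[[int, int], float]


def payoff(cooperates: bool, gamma: int, n: int, beta: Beta, c: float) -> float:
    """Payoff of one player when gamma players in total cooperate in a group of size n."""
    return beta(gamma, n) - c if cooperates else beta(gamma, n)


def is_social_dilemma(beta: Beta, c: float, n: int) -> bool:
    """Check both requirements for a single group size n."""
    full_coop_beats_full_defect = beta(n, n) - c > beta(0, n)
    # A defector facing gamma cooperators must not gain by switching,
    # which would raise the number of cooperators to gamma + 1.
    defect_is_optimal = all(beta(g, n) > beta(g + 1, n) - c for g in range(n))
    return full_coop_beats_full_defect and defect_is_optimal


# Hypothetical example: each cooperator adds 5 to everyone's payoff at a cost of 10.
def linear_beta(gamma: int, n: int) -> float:
    return 5.0 * gamma
```

With these hypothetical values, a group of 3 satisfies both requirements, whereas a group of 2 does not, because full cooperation then pays exactly as much as full defection.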
Indeed, in our previous work [63], we have shown that the size of the group has a positive effect on cooperation in the standard Public Goods game and has a negative effect on cooperation in the N-person Prisoner's dilemma. A reinterpretation of these results is that, if β(N, N) increases linearly with N (standard Public Goods game), then the size of the group has a positive effect on cooperation; and, if β(N, N) is constant with N (N-person Prisoner's dilemma), then the size of the group has a negative effect on cooperation. This reinterpretation suggests that, in the more realistic situations in which the benefit for full cooperation increases fast for early contributions and then decelerates once the output limits of the public good are approached, we may observe a curvilinear effect of the group size, according to which intermediate-size groups cooperate more than smaller groups and more than larger groups.
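To make the three cases concrete, the sketch below tabulates the benefit for full cooperation β(N, N) under each shape. It assumes, purely for illustration, a per-cooperator benefit b = 5 and an output limit reached at N0 = 10 cooperators; the function name and the string labels are hypothetical.

```python
def full_cooperation_benefit(shape: str, n: int, b: float = 5.0, n0: int = 10) -> float:
    """Return beta(N, N) for a group of size n under one of three shapes.

    b and n0 are illustrative assumptions, not parameters from the study.
    """
    if shape == "linear":                # b is earned by each player
        return b * n
    if shape == "constant":              # b is shared among all players
        return b
    if shape == "linear-then-constant":  # output limit reached at n0 cooperators
        return b * min(n, n0)
    raise ValueError(f"unknown shape: {shape}")


for n in (5, 10, 20):
    print(n, [full_cooperation_benefit(s, n)
              for s in ("linear", "constant", "linear-then-constant")])
```

Up to N0 the third shape coincides with the linear one, and beyond N0 it no longer grows with N, as in the constant case.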
To test this hypothesis, we have conducted a lab experiment using a general public goods game with a piecewise function β, which increases linearly up to a certain number of cooperators, after which it remains constant. While it is likely that realistic scenarios would be better described by a smoother function, this is a good approximation of all those situations in which the natural output limits of a public good imply that the increase in the marginal return for cooperation tends to zero as the number of contributors grows very large. The upside of choosing a piecewise function β is that, in this way, we could present the instructions of the experiment in a very simple way, thus minimizing random noise due to participants not understanding the decision problem at hand (see Method).
Our results indeed support the hypothesis of a curvilinear effect of the size of the group on cooperative decision-making. Taken together with our previous work [63], our findings thus (i) shed light on the confusion regarding the group size effect on cooperation, by pointing out that different values of a single parameter might give rise to qualitatively different group size effects, including positive, negative, and even curvilinear; and (ii) help fill the gap between lab experiments and field experiments. Indeed, while lab experiments use either the standard Public Goods game or the N-person Prisoner's dilemma, real public goods games are mostly characterized by a marginal return of cooperation that increases fast for early contributions and then approaches a constant function as the number of cooperators grows very large, and our results provide evidence that these three situations give rise to three different group size effects.

Method
We have recruited participants through the online labour market Amazon Mechanical Turk (AMT) [68][69][70]. After entering their TurkID, participants were directed to the following instruction screen.
Welcome to this HIT. This HIT will take about 5 minutes and you will earn 20c for participating. This HIT consists of a decision problem followed by a few demographic questions. You can earn an additional bonus depending on the decisions that you and the participants in your cohort will make. We will tell you the exact number of participants in your cohort later. Each one of you will have to decide to join either Group A or Group B. Your bonus depends on the group you decide to join and on the size of the two groups, A and B, as follows:
• If the size of Group A is 0 (that is, everybody chooses to join Group B), then everybody gets 10c
• If the size of Group A is 1, then the person in Group A gets 5c and each person in Group B gets 15c
• If the size of Group A is 2, then each person in Group A gets 10c and each person in Group B gets 20c
• If the size of Group A is 3, then each person in Group A gets 15c and each person in Group B gets 25c
• If the size of Group A is 4, then each person in Group A gets 20c and each person in Group B gets 30c
• And so on, up to 10: If the size of Group A is 10, then each person in Group A gets 50c and each person in Group B gets 60c
• However, if the size of Group A is larger than 10, then, independently of the size of the two groups, each person in group A will still get 50c and each person in group B will still get 60c.
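The bonus rule on this screen can be summarized in closed form. The sketch below is inferred from the listed cases rather than taken from the study's materials: it reads joining Group A as cooperating, with a marginal return of 10 + 5·min(size of Group A, 10) cents and a cost of 10 cents. The function and constant names are hypothetical.

```python
CAP = 10   # Group A members beyond this number add nothing to the bonuses
COST = 10  # cents given up by joining Group A rather than Group B


def group_b_bonus(size_a: int) -> int:
    """Bonus in cents of a Group B member when Group A has size_a members."""
    return 10 + 5 * min(size_a, CAP)


def group_a_bonus(size_a: int) -> int:
    """Bonus in cents of a Group A member (Group A must be non-empty)."""
    if size_a < 1:
        raise ValueError("Group A is empty")
    return group_b_bonus(size_a) - COST


# Reproduce the cases listed on the instruction screen, plus one beyond the cap.
for size_a in (1, 2, 3, 4, 10, 11):
    print(size_a, group_a_bonus(size_a), group_b_bonus(size_a))
```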
After reading the instructions, participants were randomly assigned to one of 12 conditions, differing only in the size of the cohort (N = 3, 5, 10, 15, 20, 25, 30, 40, 50, 60, 80, 100). For instance, the decision screen for the participants in the condition where the size of the cohort is 3 was: You are part of a cohort of 3 participants. Which group do you want to join?
By using appropriate buttons, participants could select either Group A or Group B. We opted for not asking any comprehension questions. We made this choice for two reasons. First, with the current design, it is impossible to ask general comprehension questions such as "what is the strategy that benefits the group as a whole", since this strategy depends on the strategy played by the other players. Second, we did not want to ask particular questions about the payoff structure since this may anchor the participants' reasoning on the examples presented. Of course, a downside of our choice is that we could not avoid random noise. However, as discussed in the Results section, random noise cannot be responsible for our findings. In fact, our results would have been even cleaner without random noise, since the initial increase of cooperation and its subsequent decline would have been more pronounced (see Results section for more details).
After making their decisions, participants were asked to fill in a standard demographic questionnaire (in which we asked for their age, gender, and level of education), after which they received the "survey code" needed to claim their payment. After collecting all the results, bonuses were computed and paid on top of the participation fee, which was $0.20. In case the number of participants in a particular condition was not divisible by the size of the cohort (it is virtually impossible, in AMT experiments, to decide the exact number of participants playing a particular condition), in order to compute the bonus of the remaining people we formed an additional cohort where these people were grouped with a random choice of people for whom the bonus had already been computed. Additionally, we note that only 98 subjects participated in the condition with N = 100. This does not generate deception in the computation of the bonuses since the payoff structure of the game does not depend on N (as long as N > 10). As a consequence of these observations, no deception was used in our experiment.
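The cohort-formation step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the random shuffling of participants, the function name, and the handling of a condition with fewer participants than the cohort size (as in the N = 100 condition) are all hypothetical choices.

```python
import random


def form_cohorts(participants, n, rng):
    """Split participants into cohorts of size n, padding the last one if needed.

    Returns (cohorts, new_in_last), where new_in_last is the number of members
    of the final cohort whose bonus has not been computed yet (0 if none).
    """
    pool = list(participants)
    rng.shuffle(pool)  # assumption: cohorts are formed at random
    full = len(pool) - len(pool) % n
    cohorts = [pool[i:i + n] for i in range(0, full, n)]
    leftover = pool[full:]
    if leftover:
        # Fill the extra cohort with random people whose bonus is already settled.
        # If there are too few of them, the cohort simply stays smaller than n.
        paid = pool[:full]
        fillers = rng.sample(paid, min(len(paid), n - len(leftover)))
        cohorts.append(leftover + fillers)
    return cohorts, len(leftover)
```

Only the first `new_in_last` members of the final cohort would receive a bonus from that cohort; the fillers keep the bonus computed in their original cohort.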
According to the Dutch legislation, this is a non-WMO study, that is (i) it does not involve medical research and (ii) participants are not asked to follow rules of behavior. See http://www.ccmo.nl/attachments/files/wmo-engelse-vertaling-29-7-2013-afkomstig-van-vws.pdf, Section 1, Article 1b, for an English translation of the Medical Research Act. Thus (see http://www.ccmo.nl/en/non-wmo-research) the only legislations which apply are the Agreement on Medical Treatment Act, from the Dutch Civil Code (Book 7, title 7, section 5), and the Personal Data Protection Act (a link to which can be found in the previous webpage). The current study conforms to both. In particular, anonymity was preserved because AMT "requesters" (i.e., the experimenters) have access only to the so-called TurkID of a participant, an anonymous ID that AMT assigns to a subject when he or she registers to AMT. Additionally, as demographic questions we only asked for age, gender, and level of education.

Results
A total of 1,195 distinct subjects located in the US participated in our experiment. "Distinct subjects" means that, in case two or more subjects were characterized by either the same TurkID or the same IP address, we kept only the first decision made by the corresponding participant and eliminated the rest. Such multiple identities are usually a minor problem in AMT experiments (only 2% of the participants in the current dataset). Participants were distributed across conditions as follows: [...] group and the rate of cooperation is not quadratic: while the initial increase of cooperation is relatively fast, the subsequent decrease of cooperation seems extremely slow. This is confirmed by a linear regression predicting the rate of cooperation as a function of N and N², which shows that neither the coefficient of N nor that of N² is significant (p = 0.4692, p = 0.2003, resp.). For this reason we use a more flexible econometric model than the quadratic model, consisting of two linear regressions, one with a positive slope (for small N's) and the other one with a negative slope (for large N's). As a switching point, we use N = 15, the group size at which cooperation reached its maximum. Doing so, we find that both the initial increase of cooperation and its subsequent decline are highly significant (from N = 3 to N = 15: coeff = 0.0187553, p = 0.00042; from N = 15 to N = 100: coeff = −0.00177618, p = 0.00390).
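The two-regression model can be sketched as below. This is a minimal pure-Python illustration that returns only the two slopes. It does not compute p-values, it is not the authors' analysis code, and it makes no assumption about whether the inputs are individual decisions or per-condition averages. The function names are hypothetical, and the switching point is included in both segments, matching the ranges "from N = 3 to N = 15" and "from N = 15 to N = 100".

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def two_segment_slopes(sizes, rates, switch):
    """Fit one line to the points with size <= switch and one to those with size >= switch."""
    left = [(n, r) for n, r in zip(sizes, rates) if n <= switch]
    right = [(n, r) for n, r in zip(sizes, rates) if n >= switch]
    return ols_slope(*zip(*left)), ols_slope(*zip(*right))
```

Calling `two_segment_slopes(sizes, rates, 15)` on the experimental data would return the pair of slopes corresponding to the two coefficients reported above.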
We conclude by observing that not only can random noise not explain our results but, without random noise, the effect would have been even stronger. Indeed, first we observe that there is no a priori reason to expect that random noise would interact with any condition, and so we can assume that it is randomly distributed across conditions. Then we observe that, by subtracting a binary distribution with average 0.5 from a binary distribution with average μ > 0.5, one would obtain a distribution with average μ₀ > μ. Similarly, by subtracting a binary distribution with average 0.5 from a binary distribution with average μ < 0.5, one would obtain a distribution with average μ₀ < μ. Thus, if the μ's are the averages that we have found (containing random noise) and the μ₀'s are the true averages (without random noise), the previous inequalities allow us to conclude that the initial increase of cooperation and its following decrease would have been stronger in the absence of random noise.
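One way to make this argument explicit is the following worked equation. It rests on an added assumption that is not stated in the original: in every condition the same fraction ε ∈ (0, 1) of participants choose at random (cooperating with probability 1/2), while the remaining participants cooperate at the true rate μ_0. The observed rate μ is then a mixture of the two:

```latex
\mu = (1-\varepsilon)\,\mu_0 + \frac{\varepsilon}{2}
\quad\Longrightarrow\quad
\mu_0 - \frac{1}{2} = \frac{\mu - \tfrac{1}{2}}{1-\varepsilon},
\qquad
\mu_0(N) - \mu_0(N') = \frac{\mu(N) - \mu(N')}{1-\varepsilon}.
```

Since 0 < 1 − ε < 1, the true rate lies farther from 1/2 than the observed one, on the same side, and any difference in cooperation between two group sizes N and N' is magnified by the factor 1/(1 − ε) once the noise is removed.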

Discussion
Here we have reported on a lab experiment providing evidence that the size of a group can have a curvilinear effect on cooperation in one-shot social dilemmas, with intermediate-size groups cooperating more than smaller groups and more than larger groups. Combining the current results with those of a previously published study of ours [63], we can conclude that group size can have qualitatively different effects on cooperation, ranging from positive to negative and curvilinear, depending on the particular decision problem at hand. Interestingly, our findings suggest that different group size effects might be ultimately due to different values of a single parameter, the number β(N, N), describing the benefit for full cooperation. If β(N, N) is constant in N, then group size has a negative effect on cooperation; if β(N, N) increases linearly with N, then group size has a positive effect on cooperation; in between, all sorts of things may a priori happen. In particular, in the realistic situation in which β(N, N) is a piecewise function that increases linearly with N up to a certain N₀ and then remains constant, group size has a curvilinear effect, according to which intermediate-size groups cooperate more than smaller groups and more than larger groups. See Table 1.
Table 1. Summary of the different group size effects on cooperation depending on how the benefit for full cooperation varies as a function of the group size.
shape of β(N, N) | group size effect on cooperation | paper
linear | positive | Barcelo & Capraro (2015)
constant | negative | Barcelo & Capraro (2015)
linear-then-constant | curvilinear | this paper
doi:10.1371/journal.pone.0131419.t001
To the best of our knowledge, ours is the first study reporting a curvilinear effect of the group size on cooperation in an experiment conducted in the ideal setting of a lab, in which confounding factors are minimized. Previous studies reporting a qualitatively similar effect [55][56][57][58] used field experiments, in which it is difficult to isolate the effect of the group size from possibly confounding effects. In our case, the only possibly confounding factor is random noise due to a proportion of people who may not have understood the rules of the decision problem. As we have shown, our results cannot be driven by random noise and, in fact, the curvilinear effect would have been even stronger without random noise. Moreover, our experimental design was inspired by an attempt to mimic all those real public goods games in which the natural output limits of the public good imply that the increase of the marginal return for cooperation tends to zero as the number of cooperators grows. Our results might therefore explain the apparent contradiction that field experiments tend to converge on a curvilinear effect of the group size, while lab experiments tend to converge on either of the two linear effects. Our contribution is also conceptual, since we have provided evidence that a single parameter might be responsible for different group size effects: the parameter β(N, N), describing the way the benefit for full cooperation varies as a function of the size of the group. Of course, we do not claim that this is the only ultimate explanation of why different group size effects have been reported in experimental studies. In particular, in real-life situations, which are typically repeated and in which communication among players is allowed, other factors, such as within-group enforcement, may favor the emergence of a curvilinear effect of the group size on cooperation, as highlighted in [58].
If anything, our results provide evidence that the curvilinear effect on cooperation goes beyond contingent factors and can be found also in the ideal setting of a lab experiment using one-shot anonymous games. We believe that this is a relevant contribution in light of possible applications of our work. Indeed, the difference between β(N, N) and the total cost of full cooperation cN can be interpreted as the incentive that an institution needs to pay to the contributors in order to make them cooperate. Since institutions are interested in minimizing their costs and, at the same time, maximizing the number of cooperators, it is crucial to understand what is the "lowest" β such that the resulting effect of the group size on cooperation is positive. This seems to be a non-trivial question. For instance, does β(Γ, N) = (Γ/N) log₂(N + 1) give rise to a positive effect or is it still curvilinear or even negative? The technical difficulty here is that it is hard to design an experiment to test people's behavior in these situations, since one cannot expect that an average person would understand the rules of the game when presented using a logarithmic function.
In terms of economic models, our results are consistent with utilitarian models such as the Charness & Rabin model [71] and the novel cooperative equilibrium model [10,63,72]. Both these models indeed predict that, in our experiment, cooperation initially (i.e., for N ≤ 10) increases with N (see [63] for the details), and then starts decreasing. This behavioral transition follows from the simple observation that free riding when there are more than 10 cooperators costs zero to each of the other players and benefits the free-rider. Thus, cooperation in larger groups is not supported by utilitarian models, which then predict a decrease in cooperative behavior whose speed depends on the particular parameters of the model, such as the extent to which people care about the group payoff versus their individual payoff, and people's beliefs about the behavior of the other players. Thus our results add to the growing body of literature showing that utilitarian models are qualitatively good descriptors of cooperative behavior in social dilemmas.
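The observation about free riding can be checked directly against the bonus scheme of the experiment. The sketch below assumes the closed form 10 + 5·min(Γ, 10) cents for the marginal return and a cost of 10 cents, both inferred from the instruction screen in the Method section; the function names are hypothetical.

```python
def beta(gamma: int) -> int:
    """Linear-then-constant return in cents (inferred from the instruction screen)."""
    return 10 + 5 * min(gamma, 10)


def cost_of_free_riding_to_others(gamma: int) -> int:
    """Payoff lost by each other player when one of gamma cooperators defects."""
    return beta(gamma) - beta(gamma - 1)


def gain_from_free_riding(gamma: int, cost: int = 10) -> int:
    """Payoff gained by the player who switches from cooperating to defecting."""
    return beta(gamma - 1) - (beta(gamma) - cost)


for gamma in (5, 10, 11, 20):
    print(gamma, cost_of_free_riding_to_others(gamma), gain_from_free_riding(gamma))
```

With more than 10 cooperators, a defection leaves everyone else's bonus unchanged while the defector gains the full 10 cents; with 10 or fewer cooperators, each other player loses 5 cents and the defector gains 5 cents.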
However, we note that while theoretical models predict that the rate of cooperation should start decreasing at N = 10, our results show that the rate of cooperation for N = 15 is marginally significantly higher than the rate of cooperation for N = 10 (Rank sum, p = 0.0588). Although ours is a between-subjects experiment, this finding suggests that there may be a proportion of subjects who would defect for N = 10 and cooperate for N = 15. This is not easy to explain: why should a subject cooperate with N = 15 and defect with N = 10? One possibility is that there is a proportion of "inverse conditional cooperators", who cooperate only if a small percentage of people cooperate: if these subjects believe that the rate of cooperation decreases quickly after N = 10, they would be more motivated to cooperate for N = 15 than for N = 10. Another possibility, of course, is that this discrepancy is just a false positive. In any case, our experiment is unfortunately not powerful enough to detect the reason for this discrepancy between theoretical predictions and experimental results, and thus we leave this interesting question for future research.