Stimulating Contributions to Public Goods through Information Feedback: Some Experimental Results

In traditional public good experiments, participants receive an endowment from the experimenter that can be invested in a public good or kept in a private account. In this paper we present an experimental environment in which participants can invest time over five days to contribute to a public good. Participants contribute to a linear public good by logging into a web application and performing virtual actions. We compared four treatments that differed in group size and in the information provided about the (relative) performance of other groups. We find that information feedback about the performance of other groups has a small positive effect once we control for various attributes of the groups. Moreover, we find a significant effect of the contributions of others in the group on the previous day on the number of points earned in the current day. Our results confirm that people participate more when other participants in their group participate more, and that they are influenced by information about the relative performance of other groups.


Introduction
There is a substantial understanding of the conditions that lead to successful governance of the commons by small groups such as communities [1]. Studies in small-scale communities and in controlled experiments [2] show that the strength of groups in overcoming collective action problems lies in whether participants can communicate, whether they have input in the creation of the rules, whether there is group homogeneity, and whether institutional arrangements are monitored and enforced. In small-scale communities, participants face relatively low costs in deriving the information needed to determine the trustworthiness of others. Small communities are also characterized by low costs for monitoring others' behavior and for holding face-to-face meetings. Such low-cost monitoring and face-to-face meetings are generally not possible at a large scale. A key open question is how to scale up the insights that lead to success at the community level to larger-scale collective action problems. In traditional public good experiments, contributions typically start at around 50% of the endowment, which then declines over subsequent rounds [18]. In recent work, researchers have run web-based experiments using Amazon's Mechanical Turk [19][20][21]. In these experiments participants must all log in at the same time to participate for an hour, similar to traditional one-hour experiments run in economics labs.
We are developing a mobile app to perform experiments on the role of social information in physical actions that improve sustainability outcomes. With this mobile app, physical actions can be verified via photos or location information. A reliable implementation of the verification of physical actions is not yet finished, and therefore we have participants perform virtual actions to test initial hypotheses. Hence the reward for the participants depends only on their own efforts and the efforts of their group members to log in to the web-based environment at the right time. We used student subjects at our university campus for logistical reasons, as we aim to continue these experiments with verification of physical actions. For such verification we would initially work at our university campuses. For example, we will partner with local initiatives that could facilitate the verification process (such as scanning a barcode when a participant returns batteries). We recognize that our design has limitations because it uses student participants and virtual actions. Therefore the results will not be representative of the general population. But a study like this is a needed step on the pathway towards a more comprehensive design.
Although we are developing specialized experiment infrastructure, we also recognize the value of existing websites that promote large-scale behavioral change. We are also studying collective action in online communities to complement our controlled experiments [22]. This online community research has shown that there are many challenges in capturing and retaining participant attention, and that successful communities often have a portfolio of tasks to entice active and sustained participation.
In this paper, we report on the first of a series of experiments with our infrastructure. In traditional one-hour experiments, participants receive an endowment that they can invest in the public good or keep for themselves. In our experiment, participants do not receive an initial endowment. Instead, they must invest a small amount of their time repeatedly throughout a period of five days to contribute to the public good by logging into the website on their laptop or smartphone and clicking on the virtual activities that are available to them at that time. In this sense, a participant's time is their natural endowment, and participation in the experiment competes with the participant's other activities. Since participants have to take time each day to make a contribution to the public good, they have to remain motivated to participate.
Our experimental goal for this paper is to replicate a finding from a traditional public good experiment. [23] found that if groups receive information comparing their performance to that of other groups (information that does not affect their material rewards), the comparative information leads to an initial increase in contributions. Over the long term, the benefit from comparative information disappears. In our experiment we test the effects of information feedback on the level of participation.
We now discuss the experimental design, followed by the results. The paper closes with a discussion of the possibilities of web-based experiments for testing collective action in groups and social networks.

Experimental Design
The experiment is based on a linear public good problem [17]. The theoretical formulation is as follows: the monetary reward is linear in the contributions of all N individuals in the group. As mentioned earlier, participants do not receive a monetary endowment, but invest their time. Participants may invest an amount of time equal to x minutes in the public good, and receive y dollars depending on the points collected on average by all group members. The value of time may not be the same for every participant. Hence there will be natural heterogeneity among the participants in weighing the time commitment against the expected rewards. Note that in traditional experiments there is also heterogeneity with respect to how participants value a dollar from the experiment. Since the experiment is run over several days, we expect that participation will be adjusted over time based on the expected rewards.
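To make the payoff structure concrete, the following minimal sketch computes per-person earnings in a linear public good of this kind. The function and its exact form are our illustration of the structure described above, not the authors' implementation; only the 2-cents-per-point rate and the 1250-point maximum are taken from the experiment described later in the paper.

```python
def earnings(group_points, rate=0.02):
    """Per-person payoff in a linear public good: every group member
    earns `rate` dollars per point of the group's *average* score,
    so one member's effort benefits all members equally.
    (Illustrative sketch; `earnings` is a hypothetical name.)"""
    average = sum(group_points) / len(group_points)
    return rate * average

# A group of 5 in which everyone reaches the 1250-point maximum:
print(earnings([1250] * 5))          # 25.0 dollars per person
# Free riders receive the same payoff as contributors:
print(earnings([1250, 0, 0, 0, 0]))  # 5.0 dollars per person
```

The second call shows the social dilemma: a member who contributes nothing still earns the same amount as every other group member, so the individually rational strategy is to free ride.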
Our experiment is framed as a carbon footprint reduction game in which participants can perform virtual actions representing sustainable alternatives to common activities during a 5-day period. These sustainable alternatives are only available at certain time intervals throughout the day, coinciding loosely with when those activities would be available in real life (e.g., carpooling is available between 8-10 AM and 4-6 PM, local Phoenix time). Participants log in to a website where they can view an update of their group's progress (Fig 1) and the activities that are currently available (Fig 2). Each activity performed (by clicking on a button on the website) generates points for the group if the participant clicks the button while the activity is available. This version of the experiment only requires participants to click on the actions; they do not have to actually perform those actions in real life. In essence, this public good game tests whether enough participants within a group will log in to the website at the right time to click on the Perform button for an available activity, contributing their time to the public good.
The reason we frame this public good experiment as a carbon footprint reduction game is to make it less abstract and more compelling for participants. We also found in tests with earlier versions of the experimental environment that the website design needed to be engaging and clear in order to maintain participant interest. This also meant that we allowed participants to leave chat messages for other group members and to like the actions of other members of the group. This may allow for group coordination and team building.
Participants in laboratory experiments make decisions for a limited amount of time and are monitored for the duration of the experiment. Hence in laboratory experiments we expect participants to pay attention to the experiment even if the task is abstract. In our web-based experiment, participation competes with other tasks students may engage in. Hence we had to make the experiment less abstract. The final design was the result of the involvement of undergraduate student web developers and feedback from pretests in two large undergraduate courses. When Amazon Mechanical Turk is used, the commitment is somewhat lower than in laboratory experiments, but 90% of the participants still remain in the experiment until the end [24]. In our design, the experiment naturally competes with the various other activities a participant has going on in their daily life. Since participants still receive daily emails, we consider all participants to be in the experiment for the duration of five days. The experiment was implemented using the vcweb framework (http://vcweb.asu.edu/).

Fig 2 shows a screenshot of some of the actions the participants can take. If an action is still available, the button is green and the action can be performed by clicking on it. If the action is no longer available, it is shown as a red button with the text "expired". At the start of the experiment, registered participants are sent an email with their username and password to join the experiment. Once logged in, they can view the currently available activities (Table 1), select an activity to perform, and earn points for their group. Earnings are based on the accumulated points per person over the week. The reward is 2 cents per point, which leads to maximum earnings of 25 dollars. The amount of money earned so far is shown directly on the participant's home dashboard.
We developed four treatments based on [23], which showed that providing information about group performance compared to other groups temporarily increases the level of cooperation. Group comparison was implemented by adding a leaderboard on the front page of the experiment website (see Fig 1). We also sent a nightly email to each participant that summarized their group's results for the day. The text of the nightly email is shown in Fig 3. We considered four treatments (Table 2). The reason for these four treatments is to test the effect of group size and the effect of including a leaderboard that shows group performance relative to other groups. We test leaderboards both when group earnings are independent of each other and when the earnings of the groups depend on each other. The basic two treatments are groups of 5 with and without a leaderboard (5-LB and 5-NLB). In 5-LB there are 20 groups of 5 in the experiment at the same time. Hence the participants can see how their group is performing compared to 19 other groups. In treatment 5-NLB there are also 20 groups in the experiment at the same time, but they do not receive information about the performance of the 19 other groups. These two treatments allow us to test the effect of leaderboards for small groups, similar to [23]. We performed different sessions, leading to 60 groups in treatment 5-LB and 40 groups in treatment 5-NLB.
We also wanted to test the effect of group size and performed experiments with groups of size 20 without exchanging information on relative performance with other groups (20-NLB). Based on the classic work on collective action, we would expect smaller groups to perform better than bigger groups [25]. Finally, we included a treatment with groups of 20 that are subdivided into 4 groups of 5 (4x5-LB). The payoff depends on the performance of the group of 20, but the subgroups of 5 see how they perform compared to the other 3 subgroups during the experiment. We call it 4x5-LB since the subgroups of 5 see their subgroup performance compared to the other 3 groups of 5. If the use of leaderboards has a positive effect, this could be used to increase cooperation in public good games with a larger group size. This is what we are able to test by comparing 4x5-LB with 20-NLB.
We now state the three hypotheses we test. These hypotheses focus on the effect of the treatments on the performance of the group over the 5-day duration of the experiment. The hypotheses for this experiment are therefore:

H1. (5-NLB > 20-NLB) The average performance of groups of 5 is higher than that of groups of 20. This hypothesis is based on the seminal work of Mancur Olson [25], who argued that cooperation in public goods is higher in small groups than in big groups.
H2. (5-LB > 5-NLB) Providing participants with information on their relative performance compared to other groups leads to higher group performance than when this information is not provided. [23] found support for H2 in their study. This hypothesis is also based on various studies that show the effect of descriptive norms (e.g., [15,16]).
H3. (4x5-LB > 20-NLB) When groups of 20 are split into four subgroups with a leaderboard, we expect higher performance than in groups of 20 without subgroups.
Based on the arguments for H2, it would be beneficial to include group comparison. To reach an overarching goal for a large group, one can therefore create subgroups and allow for group comparison in order to increase performance. Hence, to increase the level of cooperation in a large group (20 persons in this experiment), we expect that information on the relative performance of subgroups has a positive effect.

Results
The experimental protocol was approved by the Institutional Review Board of Arizona State University (IRB protocol # 1302008874), and the experiments were run in the Spring semesters of 2014 and 2015 and the Fall semester of 2014. 900 participants were recruited from a database of potential participants for behavioral experiments among undergraduates at Arizona State University. The participants signed up the week before the experiment and were informed that they would receive instructions for the web-based experiment on a Sunday evening. The participants were randomly assigned to groups and treatments. The experiment began on Monday at midnight and ended after 5 full days, on Saturday at midnight. Participants were informed about the length of the experiment when they were invited to participate.

Table 3 provides the basic results of the experiments. The maximum score a group could attain in the experiment was 1250 points, and we found that all treatments averaged around 500 points. Groups of 5 without information about their relative performance had the lowest scores on average. When we apply the one-tailed Mann-Whitney test to the data, we find that the results over the whole week are not significantly different from each other at a p-value of 0.05. Since 463.66 (5-NLB) is not larger than 532.27 (20-NLB), hypothesis 1 is rejected (Z = -1.52; p-value = 0.0643), meaning that we do not observe that smaller groups perform better. Although 516.21 (5-LB) > 463.66 (5-NLB) with p-value = 0.0901 (Z = -1.34), the difference is not statistically significant at p < 0.05 and hypothesis 2 is rejected. This means that there is no significant effect of the leaderboard. Since 524.65 (4x5-LB) is not larger than 532.27 (20-NLB), we have to reject hypothesis 3 as well (p-value = 0.4247; Z = -0.19). This means that the leaderboard has no positive effect on the performance of large groups.
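As an illustration of the test used above, the following minimal sketch computes a normal-approximation Z-score for a one-tailed Mann-Whitney comparison of two samples of group scores. The function and the synthetic scores are our own illustration; statistical packages additionally apply a tie correction, and this is not the authors' analysis script.

```python
import math

def mann_whitney_z(x, y):
    """Normal-approximation Z-score for the Mann-Whitney U test.
    U counts, over all pairs, how often an observation from x exceeds
    one from y (ties count as 0.5).  Sketch without tie correction."""
    m, n = len(x), len(y)
    u = sum(1.0 if a > b else (0.5 if a == b else 0.0) for a in x for b in y)
    return (u - m * n / 2) / math.sqrt(m * n * (m + n + 1) / 12)

# Two hypothetical samples of weekly group scores:
z = mann_whitney_z([410, 455, 470, 505], [520, 530, 545, 560])
print(round(z, 2))  # -2.31: the first sample ranks uniformly lower
```

A negative Z indicates the first sample tends to rank below the second; the one-tailed p-value then follows from the standard normal distribution.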
Now that we have found that the treatments themselves do not lead to statistically significant differences, we look at the data in more detail using multi-level regression analysis. Table 3 shows the average number of points earned per person per day in the four treatments. The treatments show the same pattern: increasing performance until Thursday (Day 4), followed by a drop on Friday (Day 5). The points earned do not differ significantly (based on Mann-Whitney tests using a p-value of 0.1), except for Day 4, when treatment 5-NLB is significantly lower than the other treatments. However, groups of 5 without social information seem to peak on Wednesday. The experiments were performed during different semesters, and in each semester we find the same pattern. The drop on Friday might be caused by the different priorities of student participants at a large state university.

Fig 4 shows the distribution of points among the individuals in the four different treatments. The points lie between 0 and 1250, and we rank the students from the highest to the lowest number of points earned over the 5 days. Since three treatments have 200 participants and one treatment (5-LB) has 300 participants, we rescaled the observations for the 200-participant treatments to compare them with the 300-participant treatment. Fig 4 demonstrates clearly that the distributions are very similar across the treatments. About 10 percent of the participants do not earn any noticeable number of points, while in each treatment there are also about 10% who earn 1000 points or more. Note that all participants opted in to an online experiment that would last 5 days.
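The rescaling of the 200-participant distributions onto a common axis can be sketched as a simple quantile mapping over rank positions. The function below is our plausible reconstruction of such a rescaling (a nearest-rank variant), not necessarily the exact procedure used to produce Fig 4.

```python
def rescale_ranks(sorted_points, target_n):
    """Map a rank-ordered list of scores onto `target_n` equally
    spaced rank positions so distributions with different numbers of
    participants can be overlaid in one plot (nearest-rank variant;
    linear interpolation would work as well)."""
    n = len(sorted_points)
    return [sorted_points[min(n - 1, i * n // target_n)]
            for i in range(target_n)]

# 4 ranked scores stretched onto 6 rank positions:
print(rescale_ranks([1200, 800, 300, 0], 6))
# [1200, 1200, 800, 300, 300, 0]
```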
A total of 673 likes were given during the experiments. In groups of 20, participants give more likes per person, since there are more other participants whose actions they can like. We tested potential effects that explain the behavior of individuals during the experiments. In Fig 4 there was no significant difference between treatments at the individual and group level. But what is the effect of the communication and the posting of likes? The nightly emails that participants received included the individual's score, the group's average score, and the number of chat messages in the group. We performed a multi-level mixed-effects linear regression using the individual-level data (Table 4). In the first model (Model 1) we only include treatment dummies and the day of the week. We do not find significant effects of the independent variables. In the second model (Model 2), we exclude Day 5 (Friday), and now we find a positive effect of time, but no treatment effects. Model 3 includes Day 5 (Friday) but not Day 1 (Monday), since it uses the information participants received in their nightly emails, which is not available for the first day. We include the number of points the individual earned the previous day, as well as the average contribution of others in the group, the number of chat messages, and the number of likes the others posted. We find that the total number of points earned during the previous day is a strong predictor of the number of points earned in the current day. The points earned on average by others on the previous day have a negative impact, while the number of chat messages has a positive effect. In Model 4 we include a dummy variable for Day 5 (Friday), since we observe a sharp reduction in performance that might be caused by events outside the experiment (it being a Friday on a college campus). We also include dummies for whether groups with leaderboards are ranked in the top 25% or the bottom 25%. We now find a positive effect of the actions of others on the previous day.
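The lagged group-level predictor (the average contribution of others on the previous day) can be illustrated with a short leave-one-out computation. The participant names below are hypothetical, and the snippet is only a sketch of how such a predictor is built, not the authors' analysis code.

```python
def others_average(day_points, member):
    """Average points earned the previous day by everyone in the
    group *except* `member`: the leave-one-out mean used as a
    lagged predictor in the regression."""
    others = [p for m, p in day_points.items() if m != member]
    return sum(others) / len(others)

# Hypothetical points earned yesterday by a group of 5:
yesterday = {"p1": 40, "p2": 10, "p3": 0, "p4": 30, "p5": 20}
print(others_average(yesterday, "p1"))  # mean of 10, 0, 30, 20
```

For each participant and day after the first, this value (together with the participant's own lagged score, chat messages, and likes) enters the mixed-effects regression as a fixed effect, with groups as the random-effect level.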
This means that if others scored more points on the previous day, the participant increases their score in the current day. Note again that participants receive nightly emails with the performance of the group, which might stimulate them to increase their participation. We do not see an effect of chat messages or likes, of treatment, or of whether groups were ranked high or low. Finally, instead of individual treatments we control for the size of the group that shares the public good (20 for 20-NLB and 4x5-LB) and a dummy indicating whether there was a leaderboard (= 1) or not (= 0). Now we find a positive significant effect of the leaderboard. The leaderboard is predicted to increase performance by 5 points per person per day, an increase of around 5%.
In sum, we still do not find specific treatment effects when we control for the days of the week and the information participants receive. However, the use of a leaderboard itself leads to a small increase (5%) in performance. We do find that more participation by others on the previous day stimulates the actions of participants, which may indicate conditional cooperation; that is, participants cooperate if others do too.

Discussion
This paper presented the first results of a new experimental environment in which participants invest time in a public good over a period of days. We find major inequality in the amount of participation among the participants, even though they signed up for the experiment just days before and received a reminder digest email every evening. When participants have to decide to invest their time to contribute to the public good, this investment of time faces competition with alternative activities. This is not the case when subjects participate in an experiment in the laboratory. Using other online platforms like Amazon's Mechanical Turk to conduct group experiments for a brief period of time (e.g., one hour) also introduces a limited amount of competition with alternative activities, as shown by the fact that 10% of participants drop out during the experiment [24].
Adding a leaderboard to the experiment had a small positive effect when we control for group attributes such as the number of chat messages and likes as well as group size. Thus we can replicate the effect observed by [23] that intergroup competition, without monetary incentives to win, increases the level of cooperation. We also find an effect of the number of points earned on average by other group members on the previous day on the actions of individuals in the current day. This result confirms the finding of many other public good experiments that many participants are conditionally cooperative [26]. Conditional cooperation means that individuals cooperate if they expect others to cooperate. Thus participants are influenced by information about participation in their own group, and by the relative performance of their group compared to other groups.
The effect of the individual treatments is not significant. This might be caused by a limitation of our experiment, namely that our participants are not known to each other, while recent studies show the importance of the strength of peer influence [27]. Nevertheless, the conditional cooperation effect is in line with other online experiments. Some studies, such as [15] and [16], find statistically significant effects among hundreds of thousands or millions of participants where the absolute effect is very small. In those cases, information about the voting or energy use of peers affects the decisions of individuals, but the group sharing the public good is technically hundreds of millions of people (voting affects a nation, and energy use affects global climate change). In our controlled web-based experiment we could test more variations and found no significant effects among the individual treatments. A possible reason for the lack of significant effects in our experiment is the lack of social context experienced by the participants (they interact with fellow students, but not with their own social network, and without their own group identity) [27]. Only by combining treatments and controlling for group attributes could we replicate the small effect from the laboratory experiments of [23]. A hypothesis is that social context is critical for online collaboration over a number of days, where people have to come back to check updates. This suggests that, for social influence to be effective, the information must be socially embedded.
To conclude, we find that the actions of other group members have a positive effect, and we find a positive effect of information on relative group performance. For future work, when we include physical actions, we expect that it is vital to grow social groups from existing social networks, instead of assigning participants to arbitrary groups. It is important to facilitate social network traffic that conveys information about the activity of others and about performance relative to other groups. People are willing to contribute when there is evidence that others are also contributing.
Supporting Information

S1 File. Data from the Experiment. (XLSX)

Table 4. A multilevel mixed-effects linear regression. The regression is performed on the number of points that individuals collected during each day. We distinguish five models, as discussed in the main text. The independent variables are the points participants collected the previous day, group-level information from the previous day (the number of points per person, the number of chat messages, and the number of likes), and dummies for the treatment participants were in. We controlled for group effects by performing a multi-level analysis in which we indicated the group each participant was in. The χ2 was not significant, which means that there was no significant group effect on the error terms. For each variable of the regression we provide the estimated value, the standard deviation (between brackets), and the 95% confidence interval. The *, **, or *** next to the standard deviation indicates a p-value smaller than 0.1, 0.05, or 0.01, respectively. doi:10.1371/journal.pone.0159537.t004