How Much Is the Whole Really More than the Sum of Its Parts? 1 ⊞ 1 = 2.5: Superlinear Productivity in Collective Group Actions

In a variety of open source software projects, we document a superlinear growth of production intensity ($R \sim c^{\beta}$) as a function of the number of active developers $c$, with a median value of the exponent $\beta \approx 4/3$ and large dispersions of $\beta$, ranging from slightly less than 1 to above 2. For a typical project in this class, doubling the group size typically multiplies the output by a factor $2^{\beta} \approx 2.5$, explaining the title. This superlinear law is found to hold for group sizes ranging from 5 to a few hundred developers. We propose two classes of mechanisms, interaction-based and large deviation, along with a cascade model of productive activity, which unifies them. In this common framework, superlinear productivity requires that the involved social groups function at or close to criticality, or in a “superradiance” mode, in the sense of the appearance of a cooperative process and order involving a collective mode of developers defined by the build-up of correlation between the contributions of developers. In addition, we report the first empirical test of the renormalization of the exponent of the distribution of the sizes of first generation events into the renormalized exponent of the distribution of clusters resulting from the cascade of triggering over all generations in a critical branching process in the non-mean-field regime. Finally, we document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.


Introduction
Since at least Aristotle, the adage in the title has permeated human thinking, with prominent influence in psychology (Gestalt theory [1]), biology (brain functions [2], ecological networks [3]), physics (spontaneous symmetry breaking [4] and the ''more is different'' concept [5]), economics [6,7] among a wealth of other examples. Prominent among other developments are the fields of complexity science, synergetics and complex adaptive system theory, which strive to understand natural and social systems in terms of a systemic or holistic approach, where the above adage is translated into the scientific concept of emergence that results from repetitive interactions between simple constituting elements in extended out-of-equilibrium adaptive systems. Dealing with groups such as firms and production units, management science also strives to understand when and how a group can be more than the sum of individuals, and to design ways to improve team performance [8][9][10][11], through the mechanism of complementarity in organization [12,13] and innovations [14]. Because most activities in our modern environment require coordination and collaborative actions within groups of widely varying sizes, it is the fundamental aspiration of any manager, be it in the public or private sector, to find the gears that could enhance productivity.
Notwithstanding the importance of groups in human culture and civilization since ancient times, we still have a limited understanding of the mechanisms at the origin of group productivity. Moreover, we do not really understand the conditions under which the whole is more than the sum of its parts, and how to quantify its productivity with respect to its different constituents. The bottlenecks hindering progress include the difficulty of quantifying productivity as well as the obstacles to controlled experiments that allow for clean conclusions. Indeed, most human groups and systems are entangled in their functioning and objectives, and are rarely amenable to systematic and continuous observations suitable for rigorous scientific analyses.
To address these problems, we use a source of data in which group cooperation is ubiquitous and can be quantified in great detail, namely the dynamics of production intensity during the development of open source software (OSS) projects. Because OSS development is essentially collective, iterative, and cumulative, and because the overhead costs of interactions are small thanks to the cheap electronic support mediating exchanges between developers, OSS projects are particularly well suited to the study of potential increases of productivity through interaction and cooperation between contributing developers.
The next section presents the main empirical evidence of the superlinear production intensity law found for open source software projects. We then present two classes of mechanisms at the origin of superlinear production intensity, which are unified in the cascade model of productive activity. Empirical data tests are found to support the model. We then compare and attempt to reconcile present findings for OSS and the superlinear law previously reported for cities. A discussion section develops the broader implications of our results, and the conclusion section summarises our main results.

Quantification of productivity in open source software projects
We have analyzed the production of 164 open source software projects, with sizes ranging from 5 to 1678 contributors. Figure 1 shows the complementary cumulative distribution of project sizes in our sample, quantified by the number of developers involved in each project [all source data (Archive S1) and relevant statistics (Table S1), detailed per project, are available in Supporting Information]. The distribution is an approximate power law $P_>(S) := \Pr(\text{size} > S) \sim 1/S^{\alpha}$ with exponent $\alpha \approx 1.4$, which reflects a large heterogeneity of project sizes, with few projects attracting many developers and a multitude of projects with just a few developers. The simplest generic mechanism for such a power law distribution of human group sizes is proportional growth coupled with birth and death [15,16], as verified empirically in OSS package reuse [17], in group [18] and in product [19] dynamics.
A first idea would be to quantify the total production of each software project (for instance proxied by the number of lines of code, commits or packages) and to search for a relationship with the total number of developers involved over the whole project. This is misleading because the total output results from a complex interplay between the time-varying number of involved developers and the intermittent duration and intensity of their contributions. In the extreme limit, a single developer working over a lifetime may produce as much as tens or even hundreds of developers over a few months. The large variability of developer numbers and contributions as a function of time for each project is illustrated by Figure 2, which shows the intermittent dynamics of active contributors as well as their productive activity as a function of time (in logarithmic scales).
To capture more faithfully the effect of cooperation on contributions, we propose to focus on short-term production and group sizes. For each project, we partition its lifetime into time windows of a fixed size that we shift over the whole project duration. We then quantify the production in each window and study its relation to the number of active developers during that same time window. As proxies for the production of developers, we could use either lines of code (LOCs) or commits. LOCs are a straightforward metric but suffer from the criticism that real production and quality are not in general proportional to the number of code lines. Indeed, excellent contributions are in general characterized by efficient and elegant coding associated with conciseness. Among software developers, it is well recognized that the number of LOCs contributed is not a predictor of quality.
However, in open collaboration, each innovation step can be seen as a commit uploaded and compounded on an online repository, which keeps track of all changes over time. Each commit reflects the contributor's commitment to expose to the community her proposed solution to an open problem. Commits are the elementary units that get peer-reviewed, tested and eventually integrated in the project knowledge base. Thus, they are a direct measure of the iterative productive process at work in peer production. All commit activities are parsimoniously indexed and timestamped on the project repository.
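The windowing step described above can be illustrated with a minimal sketch. It assumes a hypothetical input format, a list of (day, author) pairs with days counted from the start of the project, and it uses non-overlapping windows for simplicity; the function name `window_counts` and both choices are assumptions made for illustration, not the procedure actually used for the analysis.

```python
from collections import defaultdict

def window_counts(commits, window_days=5):
    """Return {window_index: (n_commits, n_active_contributors)}.

    `commits` is an iterable of (day, author) pairs, where `day` is the
    number of days since the start of the project (hypothetical format).
    """
    n_commits = defaultdict(int)
    authors = defaultdict(set)
    for day, author in commits:
        w = int(day // window_days)   # index of the non-overlapping window
        n_commits[w] += 1             # production R in that window
        authors[w].add(author)        # distinct developers active in it
    return {w: (n_commits[w], len(authors[w])) for w in sorted(n_commits)}

# Toy commit log: four commits in the first 5-day window, one in the second.
log = [(0, "ann"), (1, "bob"), (1, "ann"), (4, "ann"), (7, "cat")]
print(window_counts(log))  # {0: (4, 2), 1: (1, 1)}
```

Each window then yields one (production, active contributors) pair, which is the raw material for studying how production varies with group size.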
Notwithstanding these arguments in favor of using commits as metrics of production, it is useful to test for a possible relation between LOCs and commits. Figure 3 documents a robust scaling relationship $\text{LOCs} \sim (\text{Commits})^{\delta}$, with exponents $\delta \gtrsim 1$ for most of the projects. These findings bolster our confidence in the robustness of the results reported below, which should not be sensitive to the specific choice of the metric for production.
Figure 4 demonstrates the typical superlinear relationship

$$R \sim c^{\beta}, \qquad (1)$$

where the production $R$ is defined as the total number of commits measured per 5-day time window for the Apache Web Server (http://httpd.apache.org/) and $c$ is the number of active contributors in the same 5-day time window. Contrary to the naive expectation that the production $R$ should be proportional to the number $c$ of developers, Figure 4 documents a superlinear relationship with exponent $\beta \approx 1.5 \pm 0.1$, significantly larger than the value 1 describing a simple proportionality $R \propto c$. Over all OSS projects studied, the estimated statistical average is $\hat{\beta} \approx 4/3$. Since $2^{4/3} \approx 2.5$, this explains the title of this paper. For many projects, $\beta$ is larger than $4/3$, such as the Apache Web Server project shown in Figure 4, for which $2^{1.5} \approx 2.8$. These results are robust with respect to the length of the time windows (from 1 day to 10 days).
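As a rough illustration of how an exponent such as the one in relation (1) can be estimated, the sketch below fits a straight line to log R versus log c by ordinary least squares and converts the exponent into the factor gained when the group size doubles. The least-squares fit and the function names are assumptions of this sketch; the text does not specify the fitting procedure used for Figure 4.

```python
import math

def fit_exponent(c_values, r_values):
    """Least-squares fit of log R = log A + beta * log c.

    Returns (beta, A). All inputs must be strictly positive.
    """
    xs = [math.log(c) for c in c_values]
    ys = [math.log(r) for r in r_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return beta, math.exp(my - beta * mx)

def doubling_factor(beta):
    """Factor by which R is multiplied when c doubles, if R ~ c**beta."""
    return 2.0 ** beta

# Synthetic, noise-free data following R = 3 * c**1.5 (not real project data).
cs = [1, 2, 4, 8, 16]
rs = [3 * c ** 1.5 for c in cs]
beta, prefactor = fit_exponent(cs, rs)
print(round(beta, 3), round(prefactor, 3))   # 1.5 3.0
print(round(doubling_factor(4 / 3), 2))      # 2.52
```

On real windows, those with zero commits or zero contributors would have to be excluded before taking logarithms, and the scatter of the points would call for an error estimate on the fitted exponent.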

Mechanisms for superlinear production
We consider two classes of mechanisms for superlinear production.

Interaction-based mechanism for superlinear production
There is a variety of channels by which contributors commit more solutions to problems when the community is more active. The peer-review process is more likely to occur when more contributors are active; there are incentives to share early with the community to avoid redundant work; and some problems require collective intelligence to increase their chance of being solved [20], because they require tight coordination among different technical parts of the code [21]. A priori, the number of active developers is an extensive variable, that is, it is additive for independent non-interacting systems. When interactions between developers occur, the observed increasing return of productive activity implies that the change $dR/dc$ of productivity upon the addition of a developer is not a constant but itself grows with the number of active contributors (as $\sim c^{\beta - 1}$ with $\beta > 1$). There is thus a remarkable increase of productive activity, not only as the sum of increased individual commits, but also as a result of interactions among active contributors.

Interactions leading to a phase transition
In standard models of interaction, linearity between the observable and the external driving field, as well as the number of elements in the system, is the rule ($\beta = 1$), except at or close to a critical phase transition point. As an illustration, consider the average magnetisation $m(T)$ per spin as a function of the temperature $T$ in a system undergoing a paramagnetic-ferromagnetic phase transition at the critical temperature $T_c$. The standard relation $m(T) = \chi(T) H$ relates linearly the average magnetisation $m$ to the external intensive magnetic field $H$ via the susceptibility $\chi$. Introducing the spatial spin-spin correlation length $\xi$ of the system, it is known that the susceptibility diverges as a power of the correlation length as $T \to T_c$,

$$\chi \sim \xi^{\gamma/\nu}, \qquad (2)$$

where $\gamma$ and $\nu$ are two critical exponents related by the hyperscaling relation $\gamma/\nu = 2 - \eta = d(\delta - 1)/(\delta + 1)$, where $d$ is the space dimension. Exactly at $T = T_c$, the linear relationship between $m$ and $H$, with $\chi$ given by (2), is replaced by the nonlinear relation

$$m \sim H^{1/\delta}, \qquad (3)$$

defining the exponent $\delta > 1$. This means that the collective behaviour of the spins at criticality induces a nonlinear response of the magnetisation $m$ for very small external magnetic fields $H$ (indeed, $H^{1/\delta} \gg H$ for $H \to 0$ and $\delta > 1$). The values of the exponents are $\gamma = 1$, $\nu = 1/2$, $\eta = 0$, $\delta = 3$ in the mean-field regime, which holds at the upper critical dimension $d = 4$. The relationship (3) looks superficially similar to (1) when compared with the standard linear relation $m(T) = \chi(T) H$, but here the magnetic field is an intensive quantity while relation (1) describes the production intensity as a function of the number of group members, which is an extensive quantity. Actually, a relation similar to (1) can be derived by introducing the finiteness of the spin system and using the theory of finite-size scaling [22]. For a system of finite linear size $L$ and thus finite volume $V = L^d$, the theory of finite-size critical phenomena implies that relation (2) is replaced by

$$\chi \sim L^{\gamma/\nu} = V^{\gamma/(d\nu)}, \qquad (4)$$

obtained simply by replacing $\xi$ by $L$.
In words, the unique relevant length, which is the correlation length $\xi$ for an infinite system at criticality, becomes the system size. With $m = \chi H$, this yields $m \sim V^{\gamma/(d\nu)} H$. Since $m := M/V$ is the magnetisation per spin, we obtain that the total magnetisation $M$ of the system with a total number $V$ of spins is given by

$$M \sim V^{1 + \gamma/(d\nu)} H, \qquad (5)$$

that is, it becomes superlinear at or close to criticality, similarly to expression (1). This type of superlinear relationship (5) holds more generally in various models of interacting elements at or close to criticality [23][24][25][26]. The meaning of criticality is that, on average, one action triggers one follow-up action, ensuring that the dynamics remains delicately poised between growth and decay, or between order and disorder. Therefore, an explanation of superlinear productivity by the interaction-based mechanism requires elucidating under which circumstances open source projects operate close to or at criticality. The study of the dynamics of book sales [27,28] and YouTube video views [29] has shown evidence of these critical triggering effects in large social networks.
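As a worked illustration of the finite-size argument, the exponent of the superlinear relation can be evaluated with the mean-field values quoted above ($\gamma = 1$, $\nu = 1/2$, $d = 4$). This is only an arithmetic check of the spin-system analogy under those values; nothing here implies that developer groups are described by mean-field spins.

```latex
\begin{align*}
\chi &\sim L^{\gamma/\nu} = V^{\gamma/(d\nu)}, \\
M = V m = V \chi H &\sim V^{\,1+\gamma/(d\nu)}\, H, \\
\left.\frac{\gamma}{d\nu}\right|_{\gamma=1,\ \nu=1/2,\ d=4}
  &= \frac{1}{4 \cdot \tfrac{1}{2}} = \frac{1}{2}
  \quad\Longrightarrow\quad M \sim V^{3/2} H .
\end{align*}
```

With these values, the total response grows as the power 3/2 of the number of elements, instead of linearly.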
Open source projects and their online communication platforms, coupled with the code repository, serve a similar social network role, yet at much smaller scales [30,31]. Since the above analyses as well as those presented here benefit from survival bias (in other words, the analyses are performed on top performers among a much larger database), the existence of criticality in these systems can be interpreted as the signature of a degree of success quantified by significant activity. Specifically, considering a large universe of projects, those that are of interest in the sense of exhibiting significant dynamics in volume and quality are those for which the conditions are met to be close to criticality.

Figure 2 (caption, partially recovered). [...] (1) (light grey for $\beta < 1$, grey for $1 \le \beta < 2$ and dark grey for $\beta \ge 2$) for time windows of 250 days. Blank areas show time windows for which $\beta$ could not be fitted, mainly because the numbers of active contributors (resp. commits) were strongly varying over these periods. In other words, it is possible that superlinear production was occurring in these periods but we could not determine it. doi:10.1371/journal.pone.0103023.g002 (PLOS ONE | www.plosone.org)

Interactions leading to superradiance-like phenomena
The superlinear dependence of the production intensity as a function of the number of group members has a rather direct analog in the phenomenon of superradiance [32,33], a coherent effect in many-body systems of $N$ excited emitters that interact with a common light field. In the limit where the wavelength of the light is much greater than the separation of the emitters, the emitters interact with the light in a collective and coherent fashion. Rather than radiating independently with a total intensity proportional to $N$, as would be expected for independent emitters, in the most favorable case of perfect coherence the total radiation scales as $N^2$, similarly to the mean-field prediction $\beta = 2$ obtained from expression (13) when the exponent $\gamma$ of the tail distribution of first generation contributions per developer is larger than or equal to 2. For more realistic experimental situations, the exponent is smaller than 2, for instance equal to $4/3$ when the initial fluctuating light field is small [34], or equal to $5/3$ for $N$ two-level atoms placed within an isotropic photonic band-gap material (but it can reach the value 3 for anisotropic 3D band gaps) [35]. In physics, the superradiance effect results from the existence of correlations and interactions between emitters, similarly to the interactions between group members of OSS projects. The interactions and resulting correlations between emitters are mediated by the radiated light, similarly to the correlations between developers via the production of commits. The superradiant emission is a cooperative process involving a collective mode of all the atoms of the sample. In this collective mode, an ''order'' appears in the system, which can be defined by the build-up of correlation between the dipoles belonging to different atoms. This correlation is quite reminiscent of the spin-spin correlation appearing for example in a ferromagnetic sample [33].
There is in fact a hidden phase transition in which the role of the diverging correlation length is played by the light wavelength, which has to be much larger than the inter-emitter distances.
Moreover, the smaller value of the exponent $\beta$ for large groups and for cities, as documented below, has a straightforward interpretation in the superradiance analogy. Indeed, the maximal number of correlated emitters is limited by the correlation, or coherence, volume. When the number of emitters exceeds the maximal number of those that effectively interact, the superlinear exponent decreases. This is due to the fact that, for larger numbers of emitters, the system separates into clusters or subgroups that radiate practically independently. In physics, this effect is termed filamentation. The same effect is argued to happen for the studied case of production intensity, as is discussed in the section below entitled ''Reconciling present findings and superlinear production in large cities''.

Large deviation mechanism for superlinear production
The second class of mechanisms builds on the evidence of large deviations in the statistics of the production activity $R$ over the whole population of contributors and over the whole life of the project. Figure 5 shows the complementary cumulative distribution $P^{\rm tot}_>(r) := \Pr(R > r)$ of all contributions per developer over a long period for the Apache Web Server project. One can observe an approximate power law tail dependence

$$P^{\rm tot}_>(r) \sim 1/r^{\mu}, \qquad (6)$$

with $\mu \approx 0.92$. Within the epidemic framework presented in the next section, $P^{\rm tot}_>(r)$ will be shown to be equivalent to the statistics of the cluster sizes of contributions following critical cascades [36] (see expression (12)), i.e., when the dynamics of triggering of activity is close to or at the critical point of a branching process. This result, shown here for the Apache Web Server project, is representative of the distributions found in other collaborative projects.
In the presence of such power law statistics of contributions characterized by an exponent $\mu < 1$, we show below that the sum of contributions over all developers is controlled by extreme contributors. The contributions made by these exceptional members of the group are also responsible for the observed superlinear behavior given by (1). This mechanism is reminiscent of the improved group performance that results from the presence of one or a few outperforming individuals [37]. In this case, the largest contributor provides a finite fraction of the whole production over a given time period. This largest contribution (i.e. the ''large deviation'') grows superlinearly with the group size [38,39]. In this situation, the increasing productive activity results from a large heterogeneity of activity per individual: the more contributors $c$ during a production period, the more likely it is to find an extremely large contribution.
Specifically, starting from expression (6) for the complementary cumulative distribution $P^{\rm tot}_>(r)$, we denote by $p(r) \sim 1/r^{1+\mu}$ the corresponding probability density function, obtained by differentiating $P^{\rm tot}_>(r)$. Let us call $\{R_1, R_2, \ldots, R_{c-1}, R_c\}$ the total numbers of commits contributed respectively by the developers $1, 2, \ldots, c-1, c$, and $R_{\max}(c)$ the largest among them. A good estimate of $R_{\max}(c)$ is obtained by the condition that the probability $\int_{R_{\max}(c)}^{+\infty} p(r)\,dr$ to find a developer with a total contribution equal to or larger than $R_{\max}(c)$, times the number $c$ of active developers, is equal to 1, i.e., by the definition of $R_{\max}(c)$, there should typically be only one developer with such a number of commits. This yields

$$R_{\max}(c) \sim c^{1/\mu}. \qquad (7)$$

An estimate of the typical total number of commits $R_1 + R_2 + \cdots + R_c$ contributed by the $c$ developers can then be obtained as [38,39]

$$R_1 + R_2 + \cdots + R_c \approx c \int^{R_{\max}(c)} r\, p(r)\, dr \sim c^{1/\mu}. \qquad (8)$$

We stress that the scaling $\sim c^{1/\mu}$ only holds for $\mu < 1$ and is replaced by $\sim c$, i.e., linearity, for $\mu > 1$. The upper bound in the integral in (8) reflects that the random variables $\{R_1, R_2, \ldots, R_{c-1}, R_c\}$ are not larger than $R_{\max}(c)$, by definition of the latter. According to equation (8), the typical total production (number of commits) by $c$ developers is proportional to $c^{1/\mu}$ when their contributions are wildly distributed, with a power law distribution with exponent $\mu < 1$. According to this large deviation mechanism, the superlinear exponent $\beta$ is equal to $1/\mu$.
Prediction of the large deviation mechanism:

$$\beta = 1/\mu, \quad \text{for } \mu < 1. \qquad (9)$$

Within this large deviation mechanism, explaining the superlinear productive activity ($\beta > 1$) reduces to explaining the heavy-tailed distribution of commits $R$ per contributor over a large period of time, i.e., amounts to deriving the power law distribution (6) with $\mu < 1$. For this, the next section proposes a generic model.
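The large deviation argument can be checked numerically with a small Monte Carlo sketch. It draws independent contributions from a pure Pareto law with tail exponent mu (an idealization chosen here, not the empirical distribution), takes the median of the total over many trials as the "typical" total, and estimates how that total scales with the number c of contributors. All function names, sample sizes and the seed are assumptions of this illustration.

```python
import math
import random
import statistics

def median_total(c, mu, trials, rng):
    """Median over `trials` of the sum of c independent Pareto(mu) draws."""
    totals = [sum(rng.paretovariate(mu) for _ in range(c)) for _ in range(trials)]
    return statistics.median(totals)

def scaling_exponent(mu, sizes=(10, 100, 1000), trials=200, seed=1):
    """Log-log slope of the typical total contribution versus group size c."""
    rng = random.Random(seed)
    xs = [math.log(c) for c in sizes]
    ys = [math.log(median_total(c, mu, trials, rng)) for c in sizes]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Heavy tail (mu < 1): slope expected near 1/mu = 2, i.e. superlinear.
print(scaling_exponent(0.5))
# Finite mean (mu > 1): slope expected near 1, i.e. linear.
print(scaling_exponent(3.0))
```

The median is used rather than the mean because, for a tail exponent below 1, the mean of the total does not exist and sample averages are dominated by rare, extremely large draws.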

Cascading model of productive activity
Both the interaction-based and the large deviation mechanisms can be captured together by a generic cascade process, which is well described by the self-excited Hawkes conditional Poisson process [40]. The Hawkes process typically models well a variety of social dynamics involving complex human interactions, such as online viral meme propagation [29], gangs and crime in large American cities [41], cyber crime [42] and financial contagion [43][44][45]. The Hawkes process is defined by the intensity $I(t)$ of events (commits) given by

$$I(t) = \lambda(t) + \sum_{i \,|\, t_i < t} f_i\, \phi(t - t_i), \qquad (10)$$

where $\{t_i, i = 1, 2, \ldots\}$ are the timestamps of past commits, $\lambda(t)$ is the spontaneous exogenous rate of commits, $f_i$ is the fertility of commit $i$, which quantifies the number of commits (of first generation) that it can potentially trigger directly, and $\phi(t - t_i)$ is the memory kernel, whose integral is normalized to 1 and which weights how much past commit activities influence future ones. The function $\phi$ typically reflects how tasks are prioritized and performed by individuals according to a rational economy where time is a non-storable resource [46]. Expression (10) expresses that the number of commits contributed between time $t$ and $t + dt$ results from two sources: (i) an exogenous source $\lambda(t)\,dt$ representing the spontaneous commits not related to previous commits; (ii) an endogenous term represented by the sum over all commits that were made prior to $t$, and which are susceptible to trigger future commits. An obvious triggering mechanism is debugging: a past commit may attract the attention of a developer who fixes a bug and thus improves the code. Another mechanism by which a previous commit may trigger a future commit is when the former enables new functionalities and relationships that open novel options for the developers. The Hawkes model is the simplest conditional Poisson process that combines both exogeneity and endogeneity.
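To make the two sources in the Hawkes intensity concrete, the following minimal sketch evaluates the intensity at a given time from a list of past commits. It assumes a constant exogenous rate and an exponential memory kernel normalized to 1; both are simplifications introduced here for illustration (the text allows a time-dependent exogenous rate and does not prescribe an exponential kernel), and the function name is hypothetical.

```python
import math

def hawkes_intensity(t, events, base_rate, decay=1.0):
    """Intensity at time t: base_rate + sum over past commits of f_i * phi(t - t_i).

    `events` is a list of (t_i, f_i) pairs (timestamp, fertility).
    The kernel phi(s) = decay * exp(-decay * s) is an assumption of this
    sketch; it integrates to 1 over s >= 0.
    """
    triggered = sum(
        f * decay * math.exp(-decay * (t - t_i))
        for t_i, f in events
        if t_i < t                      # only commits made strictly before t
    )
    return base_rate + triggered

# One past commit at t = 1 with fertility 0.5; the commit at t = 3 lies in
# the future of t = 2 and therefore does not contribute.
past = [(1.0, 0.5), (3.0, 2.0)]
print(hawkes_intensity(2.0, past, base_rate=0.2))   # 0.2 + 0.5 * exp(-1)
```

The first term is the exogenous contribution and the sum is the endogenous one; with no past commits the intensity reduces to the exogenous rate.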
The class of Hawkes models can be mapped onto the general class of branching processes [47]. The statistical average fertility $\langle f_i \rangle$ defines the branching ratio $n$, which is the key parameter. For $n < 1$, $n = 1$ and $n > 1$, the process is respectively sub-critical, critical and super-critical [48,49]. In the sub-critical regime ($n < 1$), the average activity tends to die out exponentially fast and the exogenous source term $\lambda(t)$ controls the overall dynamics. At criticality ($n = 1$), on average one commit is triggered in direct lineage by a previous commit, corresponding to a marginal sustainability of the process with infinitesimal exogenous inputs. The super-critical regime ($n > 1$) is characterised by an explosive activity that can occur with finite probability. The results derived below are thus fundamentally associated with the existence of a critical phase transition determined by the control variable $n$. The nature of the critical phase transition for this Hawkes model with a distribution of fertilities has been described in Refs. [36,50,51]. Interpreting a cluster or connected cascade in a given branching process of triggered contributions as the burst of production in a group of developers, the distribution of contributions is thus mapped onto that of triggered cluster sizes [36].
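The role of the branching ratio can be illustrated with a simple branching-process simulation in which each commit triggers a Poisson-distributed number of first-generation commits with mean n. The Poisson offspring law is an assumption of this sketch (the model discussed here considers heavy-tailed fertilities), as are the function names and the safety cap on cluster size. In the sub-critical regime, the average size of a cascade started by one commit is the standard branching-process result 1/(1 - n), which the simulation can be compared against.

```python
import math
import random

def poisson(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method (small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def cluster_size(n, rng, cap=100_000):
    """Total number of commits in one cascade started by a single commit."""
    size = active = 1
    while active and size < cap:        # cap guards against runaway cascades
        children = sum(poisson(n, rng) for _ in range(active))
        size += children                # add the next generation
        active = children               # only the newest generation triggers
    return size

def mean_cluster_size(n, clusters=20_000, seed=0):
    """Monte Carlo estimate of the average cascade size for branching ratio n."""
    rng = random.Random(seed)
    return sum(cluster_size(n, rng) for _ in range(clusters)) / clusters

# Sub-critical example: the estimate should be close to 1 / (1 - 0.5) = 2.
print(mean_cluster_size(0.5))
```

As n approaches 1 the average cascade size diverges, which is the signature of the critical point; the cap then starts to truncate the largest cascades in the simulation.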
Let us define the complementary cumulative distribution $P^{\rm 1st}_>(r)$ of contributions (number of commits) per developer directly triggered by a given past commit, which can be called first-generation daughter commits generated by a mother commit. Consider the case where $P^{\rm 1st}_>(r)$ is also a power law

$$P^{\rm 1st}_>(r) \sim 1/r^{\gamma}. \qquad (11)$$

Close to or at criticality, the distribution of cluster sizes, which is equivalent to the distribution of productive activity $P^{\rm tot}_>(r)$ given by (6), has an exponent $\mu = 1/2$ [52], under the condition that the distribution $P^{\rm 1st}_>(r)$ of contribution sizes triggered directly by previous contributions (so-called first-generation cascades) decays sufficiently fast, i.e., with $\gamma \ge 2$. The result $\mu = 1/2$ also holds for any distribution $P^{\rm 1st}_>(r)$ decaying asymptotically faster than a power law [36]. When $1 < \gamma < 2$, the mean-field exponent $\mu = 1/2$ is changed into [36]

$$\mu = 1/\gamma. \qquad (12)$$

Together with (9), the superlinear exponent $\beta$ is predicted to be

$$\beta = 1/\mu = \gamma, \quad \text{for } 1 \le \gamma \le 2, \qquad (13)$$

that is, equal to the exponent $\gamma$ of the tail distribution of first generation contributions per developer. For $\gamma > 2$, $\mu = 1/2$ and therefore $\beta = 2$. An analytical derivation of the prediction (13) using the Hawkes process (10), which rigorously anchors the large deviation argument of the previous section, is given by Saichev and Sornette [53]. Figure 6 synthesizes the relation between superlinear productive activity, (critical) cascades, the distribution of first-generation triggering and the total distribution of activity per contributor over a sufficiently long period.
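The chain of predictions from the first-generation tail exponent to the cluster-size exponent and the superlinear exponent can be written as a small helper. This is only a restatement of the stated relations (the cluster exponent equals the inverse of the first-generation exponent between 1 and 2, and takes the mean-field value 1/2 beyond 2, so that the superlinear exponent saturates at 2); the function name is hypothetical.

```python
def predicted_exponents(gamma):
    """Return (mu, beta) predicted from the first-generation exponent gamma.

    mu   : exponent of the distribution of total contributions (cluster sizes)
    beta : superlinear exponent of production versus group size, beta = 1 / mu
    """
    if gamma < 1:
        raise ValueError("the prediction is stated only for gamma >= 1")
    # Renormalized exponent for 1 <= gamma <= 2, mean-field value beyond 2.
    mu = 1.0 / gamma if gamma <= 2 else 0.5
    return mu, 1.0 / mu

# Value of gamma reported for the Apache Web Server project in the text.
mu, beta = predicted_exponents(1.28)
print(round(mu, 3), round(beta, 2))   # 0.781 1.28
print(predicted_exponents(3.0))       # (0.5, 2.0)
```

For the Apache value of the first-generation exponent, the predicted cluster exponent is about 0.78, which can be compared with the empirical estimate quoted in the empirical tests.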

Empirical tests
We now turn to empirical tests of this theory. For each 250-day period and for each project in our database (Archive S1), we have calibrated the power law tails of two distributions:
1. the distribution of the total number $r$ of commits per contributor over the 250 days, which is taken as a proxy for $P^{\rm tot}_>(r)$, with exponent $\mu$;
2. the distribution of the number of commits per developer per 5-day time bin, which is assumed to be a reasonable proxy for the distribution $P^{\rm 1st}_>(r)$ of the first generation production, characterized by the exponent $\gamma$.
For each OSS project, we have used the discrete maximum likelihood estimator (MLE) with a p-value threshold $p > 0.1$, obtained by bootstrapping, and a Kolmogorov-Smirnov distance $KS < 0.15$ to select the ranges over which the calibration is performed [54] (see Table S1 for detailed results of each OSS project analyzed). Figure 5 shows the result for the Apache Web Server project. The fitting procedure qualifies the existence of a power law tail for the two empirical distributions, with estimated exponents respectively equal to $\mu = 0.92 \pm 0.1$ and $\gamma \approx 1.28 \pm 0.1$. These values with their error bars are compatible with the prediction (12), $\mu = 1/\gamma$, resulting from the cascades of triggering [36]. This result is typical of the other investigated OSS projects, as shown in Figure 7, albeit with a considerable variability. This is expected since the projects are likely to be characterized by many more dimensions than the production and cascading effects considered here. Figure 7 presents $\beta$ as a function of $\gamma$ (panel A) and $1/\mu$ as a function of $\gamma$ (panel B) for all the OSS projects in our database. According to the cascading model of productive activity presented in the previous section, we should have $\beta = \gamma = 1/\mu$, according to (13). Indeed, one can see that $\beta$, $\gamma$, and $1/\mu$ are clustered around $\approx 4/3$. Almost half of the considered periods (184 out of a total of 390) fitted over all projects belong to the regime where $1 < \beta < 2$ and $1 < \gamma < 2$ (panel A), and forty percent (86 out of 213) are such that $1 < 1/\mu < 2$ (panel B), as predicted by the theory.
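For readers who want a feel for tail-exponent estimation, here is a simplified stand-in: the continuous-variable maximum likelihood (Hill) estimator of the exponent of a complementary cumulative distribution decaying as a power law above a threshold. It is not the discrete MLE with bootstrapped p-values and Kolmogorov-Smirnov range selection described above; the threshold `x_min` is taken as given, and the function name and synthetic data are assumptions of this sketch.

```python
import math
import random

def hill_exponent(samples, x_min):
    """Continuous MLE of alpha in Pr(X > x) ~ x**(-alpha) for x >= x_min.

    Returns (alpha_hat, standard_error), with the usual asymptotic
    standard error alpha_hat / sqrt(number of tail samples).
    """
    tail = [x for x in samples if x >= x_min]
    n = len(tail)
    alpha_hat = n / sum(math.log(x / x_min) for x in tail)
    return alpha_hat, alpha_hat / math.sqrt(n)

# Synthetic Pareto sample with a known exponent (not project data).
rng = random.Random(42)
data = [rng.paretovariate(1.28) for _ in range(20_000)]
alpha_hat, err = hill_exponent(data, x_min=1.0)
print(alpha_hat, err)
```

On commit counts, which are integers and often small, the discrete estimator is the more appropriate choice; the continuous formula above becomes a reasonable approximation only when the threshold is large.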
Let us first focus on the relationship between $1/\mu$ and $\gamma$ shown in panel B of Figure 7. Note that the number of estimates of the exponent $\mu$ is significantly smaller than that of $\gamma$, simply because we obtain one data point per 250-day period for $\mu$, compared with one data point per 5-day time bin for $\gamma$. The shaded square represents the domain over which the theory applies (86 out of 213 data points). To test quantitatively the relation $1/\mu = \gamma$, we used a Gaussian bivariate distribution model. The dotted ellipses show the first three standard deviation equi-levels around the barycenter $1/\mu \approx \gamma \approx 4/3$, and the black line represents the principal axis of the bi-Gaussian model. We also performed a principal component analysis (PCA). The red dotted lines show the two main directions of the variance obtained with the PCA. Both methods support a positive correlation between $1/\mu$ and $\gamma$, with slope $\approx 1.02$ with the bi-Gaussian approach and $\approx 1.47$ with PCA. To our knowledge, this may be the first empirical test ever of the renormalization of the exponent $\gamma$ of first generation events into the renormalized exponent $\mu = 1/\gamma$ due to the cascade of triggering over all generations in a critical branching process [36,52].
The evidence for the relationship between $\beta$ and $\gamma$ is presented in panel A of Figure 7. First, one can observe a prevalence of the large-deviation critical interaction regime, as the grey square area delimited by $1 \le \gamma, \beta \le 2$ is very densely populated (184 out of 390). Second, as already pointed out, the barycenter of the cloud of data points is at $\beta \approx \gamma \approx 4/3$, as expected from theory. However, we find limited support for a clear linear relation between $\beta$ and $\gamma$.
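The slope of a principal axis, as used above to summarize a two-dimensional cloud of exponents, can be computed directly from the sample covariance matrix. The sketch below returns the slope of the first principal component of a set of (x, y) points using the closed-form orientation angle of a 2x2 covariance matrix. It is a generic illustration with a hypothetical function name; the exact preprocessing behind the reported bi-Gaussian and PCA slopes (for example, any standardization of the variables) is not specified in the text and is not reproduced here.

```python
import math

def principal_axis_slope(points):
    """Slope of the first principal component of a 2-D point cloud."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.tan(theta)

# Points lying exactly on y = 2x: the principal axis has slope 2.
print(round(principal_axis_slope([(0, 0), (1, 2), (2, 4)]), 6))   # 2.0
```

Unlike an ordinary regression of y on x, this slope treats both variables symmetrically, which is appropriate when both exponents carry estimation errors.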
The bi-Gaussian model analysis provides the three dotted ellipses showing the first three standard deviations away from the barycenter. The black line representing the main axis of the bi-Gaussian model suggests a negative correlation between $\beta$ and $\gamma$. Using a PCA analysis, we find a positive relationship on the second principal component, with slope $\approx 1.24$. These results suggest that very productive projects and periods within projects, characterized by a large superlinear exponent $\beta$, are likely to be due to more complex interactions between the developers and their mutual triggering than assumed by the simple theory developed above. In particular, differentiation between same-developer commit triggering and inter-developer commit triggering seems necessary, along the lines of Refs. [19,55].

Figure 6. Relationship between superlinear productive bursts, cascading dynamics, and heavy-tailed distributions of 1st generation and cumulative contributions. (A) (light blue) Triggering mechanism generating the clusters of size with renormalized exponent $\mu = 1/\gamma$ from the distribution of first generation ''daughter events'' with exponent $\gamma$. For the sake of simplicity, we represented one cluster of activity per contributor, but triggering can occur between contributors provided that the probability of triggering remains the same between all contributors. (B) (light green) shows how the triggering mechanism generates superlinear productive activity A as a function of the number of active contributors $c$. doi:10.1371/journal.pone.0103023.g006

Reconciling present findings and superlinear production in large cities
Figure 8 reveals that the clouds of superlinear production exponent $\beta$ exhibit an interesting regularity as a function of the total number of contributors $N$ of an OSS project. The intuition motivating this investigation is the following. While a minimum critical mass of contributors is needed to foster productive bursts, large projects suffer from coordination costs, which may offset the increasing return of productive activity. Figure 8 (panel A) shows indeed that the superlinear exponent $\beta$ decreases on average with the size of the projects. Panel B demonstrates that, for projects of up to 33 contributors, the number of 250-day periods with $\beta > 1$ (superlinear regime) increases as a function of the total number $N$ of developers, approximately according to
$$\text{(ratio of time windows with } \beta > 1\text{)} \sim 1.37 \log_{10} N. \qquad (14)$$
For $N > 33$, a different regime occurs, characterized by a much smaller ratio of the time periods with superlinear productivity ($\beta > 1$). Taken together, the two panels of Figure 8 support the view that superlinear productivity is the appanage of relatively small projects with no more than 30-40 developers in total, while larger groups face the difficult challenge of creating and maintaining productive bursts. Unfortunately, the data are too scattered to allow us to draw a firm conclusion on the value(s) that $\beta$ converges towards for large project sizes.
There may be a link between our results and a previous study reporting the phenomenon of superlinearity on a completely different class of objects, namely cities. Data from 360 US metropolitan areas have shown that wages, number of patents, GDP and intensity of crime scale superlinearly with population size [production $\sim$ (population)$^{\beta_c}$] with a city exponent $\beta_c \approx 1.15$ [56,57]. The value of $\beta_c$ larger than 1 reflects the fact that productivity per capita increases by about 11% with each doubling in population [58], since $2^{0.15} \approx 1.11$. While qualitatively in line with these results, the superlinearity found in our OSS data is significantly stronger ($\beta \approx 4/3$ on average, with large variations and some projects being characterised by much larger values of $\beta$) for the smaller projects with no more than 30-40 developers. We note that our results apply to a completely different range of group sizes compared with the results for cities, which involve populations of tens of thousands to tens of millions of inhabitants. The underlying mechanisms are perhaps different [59]. For cities, the superlinear scaling in urban productivity demonstrates the importance of cities as centers of enhanced interactions, leading to generation and exchange of knowledge and exploitation of innovations [58]. For the OSS projects, many other factors come into play, such as the role of diversity and complementarity, which describes the fact that doing more of one thing increases the return to doing more of another. Other possible mechanisms include synergies, economies of scale, coordination and leadership, role model and entrainment effects, motivations, friendship and other psychological factors. However, Figure 8 suggests that these mechanisms dampen out as the project size becomes very large, possibly leaving only those still active at the level of city sizes.
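The doubling arithmetic behind these exponents can be checked with a short sketch. It assumes only the scaling form production $\sim$ (size)$^{\text{exponent}}$ and the two exponent values quoted in the text; the function names are illustrative and do not come from the study.

```python
def doubling_factor(exponent: float) -> float:
    """Factor multiplying total output when the group size doubles,
    assuming output scales as size ** exponent."""
    return 2.0 ** exponent


def per_capita_gain(exponent: float) -> float:
    """Relative gain in output per member when the group size doubles."""
    return 2.0 ** (exponent - 1.0) - 1.0


oss_exponent = 4.0 / 3.0   # typical OSS value quoted in the text
city_exponent = 1.15       # city value quoted in the text

print(f"OSS output multiplier on doubling:  {doubling_factor(oss_exponent):.2f}")
print(f"City per-capita gain on doubling:   {per_capita_gain(city_exponent):.1%}")
print(f"OSS per-capita gain on doubling:    {per_capita_gain(oss_exponent):.1%}")
```

With the OSS exponent the output multiplier is close to 2.5, and with the city exponent the per-capita gain is close to 11%, matching the figures quoted in the text.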
Expanding on the remark on the different sizes involved in our OSS database compared with cities, we present a simple mechanism and theoretical argument that may explain the smaller value of the superlinear exponent for cities, deriving it from our results obtained for small group sizes. The key idea is that the population of a city can be partitioned into many groups of persons interacting closely within a group and loosely or not at all across groups. Groups can be firms, or departments within firms, clubs, and other organisations through which people interact. We assume that, within each group, the superlinear production law (1) holds with the exponent $\beta \approx 4/3$ found in our OSS database.
The second ingredient is that group sizes $g$ are widely distributed, roughly as Zipf's law [15],
$$p(g) \sim \frac{1}{g^{1+\zeta}}, \qquad (15)$$
where $p(g)$ is the probability density function of the group sizes $g$, and $\zeta = 1$ if Zipf's law holds exactly, while in general $\zeta$ can deviate from 1 for a variety of reasons [16]. Let us assume that a city of total population $N$ is constituted of $n$ groups, with memberships of $N_1, N_2, \ldots, N_n$ individuals respectively. The total production of the city is then, according to (1),
$$R(N) = \sum_{i=1}^{n} N_i^{\beta}, \qquad (16)$$
assuming for the moment and for simplicity that $\beta$ is independent of group sizes. $R(N)$ in expression (16) can be estimated as [38,39]
$$R(N) \sim n \int^{g_{\max}(n)} g^{\beta}\, p(g)\, dg, \qquad (17)$$
where $g_{\max}(n)$ is the largest group size among the $n$ groups, which can be estimated by
$$n \int_{g_{\max}(n)}^{\infty} p(g)\, dg \sim 1 \;\Longrightarrow\; g_{\max}(n) \sim n^{1/\zeta}. \qquad (18)$$
By conservation and assuming for simplicity no strong overlap between the groups, we have approximately
$$N = \sum_{i=1}^{n} N_i \sim n \int^{g_{\max}(n)} g\, p(g)\, dg. \qquad (19)$$
This leads to $n \sim N$ for $\zeta > 1$ and $n \sim N^{\zeta}$ for $\zeta < 1$. In words, a relatively thin tail of the group size distribution ($\zeta > 1$) is associated with a number of groups scaling proportionally to the total city population $N$. In contrast, for a heavy-tailed distribution ($\zeta < 1$), the number of groups scales sublinearly with $N$, as the few largest groups account for a finite fraction of the total population. Substituting into expression (17), this yields $R(N) \sim N^{\beta_c}$, with the city-level exponent $\beta_c$ obeying three possible regimes.
1. $\zeta \le 1$ implies $\beta_c = \beta$: the same superlinear production exponent governs the production of the whole city as a function of its population as governs the production of each independent group. The mechanism is clear: for $\zeta < 1$, a few single largest groups dominate the $n$-partition and account for the majority of the city population. The same scaling holds essentially because the city is almost controlled by a single group and we have assumed the same exponent $\beta$ for all groups. The empirical evidence suggests that this case does not apply.
2. $1 < \zeta < \beta$ implies $\beta_c = \beta/\zeta$. In this regime, there are still very large groups that contribute to the superlinearity, but their relative number is much smaller than for $\zeta \le 1$. The values $\beta = 4/3$ and $\beta_c = 1.15$ can be reconciled with $\zeta = \beta/\beta_c \approx 1.16$. This exponent is, within error bounds, roughly compatible with the value found for firms in the US, close to 1.25 [60].
3. $1 < \beta < \zeta$ implies $\beta_c = 1$, which corresponds to a linear growth of production of the city with its population. In this regime, the overall city production is controlled by the many small groups constituting the city and there are no scale effects other than a proportionality with the number of small groups.
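The three regimes above can be summarised as a piecewise function. The sketch below is only an illustration of the scaling argument, assuming a group-level exponent larger than 1 and a single Zipf-like exponent for the group size distribution; the names `city_exponent` and `implied_zeta` are hypothetical and not taken from the study.

```python
def city_exponent(beta: float, zeta: float) -> float:
    """City-level production exponent predicted by the partition argument.

    beta: superlinear exponent assumed to hold within each group (beta > 1).
    zeta: tail exponent of the group size distribution, p(g) ~ 1/g**(1 + zeta).
    """
    if beta <= 1.0:
        raise ValueError("the argument assumes a superlinear group exponent")
    if zeta <= 1.0:
        return beta          # regime 1: a few largest groups dominate
    if zeta < beta:
        return beta / zeta   # regime 2: large groups still matter
    return 1.0               # regime 3: many small groups, linear growth


def implied_zeta(beta: float, beta_city: float) -> float:
    """Group size exponent that reconciles beta with beta_city in regime 2."""
    return beta / beta_city


beta = 4.0 / 3.0
zeta = implied_zeta(beta, 1.15)
print(f"implied zeta = {zeta:.2f}")
print(f"city exponent recovered = {city_exponent(beta, zeta):.2f}")
```

For the values quoted in the text, the implied group size exponent is about 1.16, which falls inside regime 2 as expected.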
While this argument is quite naive, it demonstrates the importance of the interplay between the partition of cities into groups, the productivity of such groups and the size distribution of these groups. A similar story is likely to be relevant in large OSS projects, groups and firms, which for a variety of reasons ranging from cognitive limitations [61] to efficiency maximization [62] are found to organize in subgroups, often in a hierarchical way [61].

Discussion
In the early days of the industrial revolution, Adam Smith noted how successive efficiency gains in the means of communication had helped reach unprecedented pools of resources and had unlocked some limitations of the labor market through improved division of labor [63]. The telegraph, the telephone and, more recently, the Internet have further extended the possibilities for knowledge production and for labor organizations on the model of collective action [64]. Nowadays, unrelated people spontaneously team up across the world in open collaboration projects and join forces to create knowledge in the form of software, natural language [65] and mathematics [66], as well as to produce tangible goods [67]. These organizations rely primarily on the principles of peer-production [68]: (i) task self-selection, (ii) peer review and (iii) iterative improvement, at odds with traditional market and firm production organizations [69]. Expertise can be pulled from a broader community in a timely and appropriate manner towards efficient problem resolution. The present understanding of group performance in social psychology goes in the same direction: experiments involving small groups performing coordination tasks [8,70], problem solving [37] and innovation [14] support the hypothesis that larger groups perform better because more diverse cognitive abilities can be pooled. Group productive activity can also be more than the sum of its parts if members develop social sensitivity toward one another [20]. However, the marginal gain of having more individuals in a group decreases rapidly and becomes negligible beyond five individuals [37,71,72]. Similarly, as projects attract larger communities, more coordination is required through social norms and formal governance structures [21], which may in turn reduce the positive effects of peer-production [73].

Conclusion
In this paper, we have shown that productive bursts, associated with increasing return of activity, result from the mechanism of critical triggering of commits among contributors. Specifically, we have shown that production intensity, or production per unit time, grows superlinearly as a function of the number of participants in a group. Practically, we have found a superlinear relationship $R \sim c^{\beta}$ with $\beta > 1$ between the total number $R$ of commits measured per $n$-day time window for different OSS projects and the number $c$ of active contributors in the same $n$-day time window. We have found that these results are robust with respect to the length $n$ of the time windows, i.e. when varying $n$ from 1 day to 10 days.
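As an illustration of how a scaling exponent of this kind can be estimated, the following minimal sketch fits the slope of log(commits) against log(active contributors) by ordinary least squares. It is not the estimation procedure of the study, which is not described in this section: the data are synthetic, and the prefactor, noise level and function name `fit_scaling_exponent` are assumptions made for the example.

```python
import math
import random


def fit_scaling_exponent(contributors, commits):
    """Ordinary least-squares slope of log(commits) versus log(contributors).

    Windows with zero contributors or zero commits are skipped because
    their logarithm is undefined.
    """
    pairs = [(math.log(c), math.log(r))
             for c, r in zip(contributors, commits) if c > 0 and r > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


# Synthetic time windows (assumed): commits = 2 * contributors**(4/3) * noise.
rng = random.Random(0)
true_beta = 4.0 / 3.0
contributors = [rng.randint(1, 50) for _ in range(2000)]
commits = [2.0 * c ** true_beta * rng.lognormvariate(0.0, 0.3)
           for c in contributors]

beta_hat = fit_scaling_exponent(contributors, commits)
print(f"estimated exponent: {beta_hat:.2f}")
```

On real commit data, the same function would be applied to the pairs (active contributors, commits) counted in each $n$-day window of a project, keeping in mind that least squares on logarithms is only one of several possible estimators.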
Such critical triggering may operate according to two co-existing mechanisms: interactions and large deviations. These mechanisms have been subjected to falsification tests in three independent ways: (i) documenting the superlinear relationship between productive activity $R$ and the number of active contributors $c$, characterized by the scaling exponent $1 < \beta < 2$; (ii) measuring the power law tail distribution of first generation cascades with exponent $1 < \gamma < 2$ and checking that it explains the superlinear productivity exponent $\beta$; and (iii) measuring the power law tail distribution of production cluster sizes with exponent $\mu$ and verifying that it is approximately equal to $1/\gamma$, where $\gamma$ is the exponent of the distribution of contributions per developer at short times.
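Tests (ii) and (iii) rest on measuring power law tail exponents. The sketch below shows one standard way to do so, the Hill estimator, applied to a synthetic Pareto sample. It is offered as an illustration under stated assumptions, not as the method used in the study: the exponent is taken to describe the complementary cumulative distribution, $P(X > x) \sim x^{-\gamma}$, and the sample size, the number `k` of order statistics and the function name are choices made for the example.

```python
import math
import random


def hill_tail_exponent(samples, k):
    """Hill estimate of the tail exponent from the k largest samples.

    Assumes P(X > x) ~ x**(-exponent) in the tail and 0 < k < len(samples).
    """
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k + 1)-th largest value
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    return k / log_excess


# Synthetic first-generation sizes with an assumed tail exponent of 4/3.
rng = random.Random(1)
true_gamma = 4.0 / 3.0
first_generation = [rng.paretovariate(true_gamma) for _ in range(20000)]

gamma_hat = hill_tail_exponent(first_generation, k=2000)
mu_predicted = 1.0 / gamma_hat  # renormalized cluster exponent, mu = 1/gamma
print(f"estimated gamma: {gamma_hat:.2f}, predicted mu: {mu_predicted:.2f}")
```

Comparing `mu_predicted` with an exponent measured directly on cluster sizes would mirror test (iii), although the choice of `k` affects the estimate and should be examined on real data.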
We have found that superlinear productive activity holds for a broad range of project sizes and types, with a slight decrease of the average scaling exponent $\beta$ with the total number of contributors $N$. The frequency of occurrence of productive bursts in projects has been found to be very large for $N \le 33$ compared with larger projects. The results suggest that size and threshold effects have an influence on the ability to trigger and maintain critical triggering of individual contributions. Indeed, contributions must create enough reaction opportunities to trigger on average as many follow-up contributions. Pervasive communication systems (social networks), physical proximity (e.g. cities), or even personal dedication to the project surely help increase opportunities for a contribution to trigger a follow-up action. On the other hand, large and complex structures with overwhelming communication loads or inadequate governance structure can inhibit the ripe circulation and reuse of knowledge for the sake of further cumulative innovation. The large deviation mechanism provides another take-away lesson: open collaboration does not imply equal work between contributors. On the contrary, productive bursts are the hallmark of engagement by a minority of individuals, with intense interactions and short-lived contributions of far above average sizes. Whether these large deviation contributions pull engagement by others or, on the contrary, are pushed by the community remains an open question to be elucidated.

Supporting Information
Table S1. Table containing summary statistics (comma-separated file), $\langle\beta\rangle$, $\langle\gamma\rangle$, and $\langle\mu\rangle$, for each project analyzed in this study. (CSV)
Archive S1. Compressed archive of Python NumPy arrays containing the time series of all commits, including timestamp, user, file modified, for each open source software project analyzed in this study. (ZIP)