Fat-Tailed Fluctuations in the Size of Organizations: The Role of Social Influence

Organizational growth processes have consistently been shown to exhibit a fatter-than-Gaussian growth-rate distribution in a variety of settings. Long periods of relatively small changes are interrupted by sudden changes at all size scales. Extreme events of this kind can have important consequences for the development of biological and socio-economic systems. Existing models do not derive this aggregated pattern from agent actions at the micro level. We develop an agent-based simulation model on a social network. We take our departure from a model by Schwarzkopf et al. on a scale-free network. We reproduce the fat-tailed pattern out of internal dynamics alone, and also find that it is robust with respect to network topology. Thus, the social network and the local interactions are a prerequisite for generating the pattern, but not the network topology itself. We further extend the model with a parameter that weights the relative fraction of an individual's neighbours belonging to a given organization, representing a contextual aspect of social influence. In the lower limit of this parameter, the fraction is irrelevant and choice of organization is random. In the upper limit of the parameter, the largest fraction quickly dominates, leading to a winner-takes-all situation. We recover the real pattern as an intermediate case between these two extremes.


Introduction
The social world is populated by organizations. The organizational landscape is changing all the time, with organizations being created, restructured and dissolved. In this dynamical environment, potentially several thousands of agents interact following different motivations, both at the individual level and at the collective level. The coevolution of various institutional settings, private and public, as well as the existence of actors and activities at different levels of aggregation, makes research on organizational dynamics a complex subject matter.
We study organizational growth processes, that is to say, the time evolution of company size for a system of organizations. These processes exhibit statistical regularities, despite their complexity [1]. One main striking regularity concerns the nature of the probability distribution for the growth rate, i.e. how fast the size changes in time. This distribution has two features. First, it follows a fat-tailed pattern, meaning that organizational size changes very little most of the time, but dramatically every once in a while, leading to rare (yet possible) booms and catastrophic crashes. The second feature is that fluctuations (i.e. the variance) in growth rates are less severe for larger organizations, so that the variance in growth is not uniform across all size scales.
Our theoretical approach to the problem is the theory of social mechanisms in the analytical sociology framework [32]. In this tradition, models about social phenomena contribute to our theoretical understanding if they make clear the micro-level mechanisms that bring about a certain macro-level outcome, in this case the non-Gaussian growth-rate pattern. It is individuals that, through their actions, bring about macro-level outcomes [33]. The network of individual contacts is the setting in which the actors can influence each other, with activities in economic life being embedded in social networks [34]. Our proposed model has thus a focus on individuals in organizations as the relevant actors, and the network connecting them is of fundamental importance. This approach makes explicit the interplay between individuality and social influence, and shows how it can lead to an unexpected macro outcome.
Within this context, there are at least two possible reasons contributing to the features we observe in the growth-rate pattern. The fat tails could be caused by asymmetries in the structure of the social network that make certain actors more salient or popular than others. Their involvement in the process of influencing group membership would generate these occasional large growth fluctuations. Along this line, the fat-tailed pattern would be a result of the underlying fat-tailed nature of the network structure. Concerning fluctuations being smaller for larger companies, there could be a rich-get-richer phenomenon at play, by which large organizations become larger by a positive feedback process, and thus their size fluctuates less, while small organizations are more sensitive to perturbations caused by member entrance and exit. This is the firm diversification argument [4,9].
Several models have been proposed over a range of approaches (see overview below), but there is no general consensus about a dominant mechanism to account for the emergence of fat-tailed growth-rate distributions. Additionally, the majority of models in the literature do not focus on the network of agents that are members of an organization, but are rather aggregated models based on economic considerations or on various types of stochastic processes. For this reason, there is a need for models that describe the mechanism by which social actions at the micro level generate the pattern at the macro-level.
We address these two possible reasons by modelling a social network where agents are subject to social influence when it comes to deciding on organizational membership. We build and simulate a first model based on [26], the SAF model. This first model represents the localized, network-dependent aspect of social influence. Therefore, our first modelling aim is to implement the SAF model and simulate it on different network topologies, in order to explore whether a fat-tailed network structure is necessary for observing the non-Gaussian growth pattern. Secondly, we add a context-dependent aspect of social influence. We implement this in the extended SAF model with an influence parameter that weights an individual's membership choice by contextual influence. We propose an alternative to the diversification argument as a possible explanation of the dependence of fluctuations on organization size, in terms of a combination of network-dependent and context-dependent social influence.

Overview of existing models
In this subsection, we describe existing models. None of the existing models, except for the one we take our departure from, incorporate micro-level mechanisms grounded on individuals and their interactions. We classify models into three categories: economic, physical and stochastic. We shall use 'group' and 'organization' as synonyms from here on.
Economic models account for a large fraction of the models of growth processes in organizations. We find economic models for firm sizes as far back as the mid-twentieth century. Simon [35][36][37] developed the concept of growth opportunities; Sutton later elaborated on this concept. Lucas [38] considers the distribution of managerial talent; more recent models also use this notion [39]. Jovanovic introduced a model for firm learning [40] through an evolutionary-like process. In more recent times, Amaral et al. built a model on the concept of optimal size [1,3]. A study by Bottazzi [9] used the concept of market diversification. Finally, Dosi studied the relation of growth with innovation and production efficiency [14].
Another model category is physical models, which combine physical and socio-economic concepts. We name three of them: microcanonical models [41,42], models using Bose-Einstein statistics [12,13] derived from an urn-and-ball scheme, and percolation models [43] (see [44] for a much deeper review).
Within the third category of stochastic models is one of the oldest contributions to the literature, namely the Gibrat model (1931) [45] (see also the reviews in [46,47]). Many of the classical models assume Gibrat's Law, or some more sophisticated version of it [16,18,19,37,48,49,50]. Therefore, we describe it in some more detail. The model is based on the following assumptions:

1. Law of Proportionate Effect or Gibrat's Law: The relative growth rate of a company is independent of its size, i.e.:

$$S_{t+\Delta t} = S_t \, (1 + \varepsilon_t),$$

with $\Delta t$ the time period between measurements and $\varepsilon_t$ an uncorrelated random noise, usually taken to be normally distributed with mean $\mathrm{E}[\varepsilon] = 0$ and standard deviation $\sigma_\varepsilon \ll 1$.

In order to measure growth, the central variable we look at is the growth rate, defined as $r(t,\Delta t) \equiv \log_{10}(S_t / S_{t-\Delta t})$. The choice of $\Delta t$ (typically one year) is conditioned by the sampling frequency in the data sets. We use here $\Delta t = 1$ year. Naming $S(t_i) \equiv S_i$, we write a rate in general as

$$r_1 \equiv \log_{10} \frac{S_1}{S_0},$$

and call $S_0$ the initial size (this term is not to be confused with the size at the initial time step, $S_{\mathrm{ini}}$; it is rather the size from which the growth rate is computed). We should also note that the statistical distributions depend on $\Delta t$ [1].

The size distribution is typically approximated by a log-normal. On the other hand, the growth rate in the Gibrat model follows a random-walk-like dynamics, and its distribution is normal. However, the literature agrees that a good way to describe at least the body of the growth-rate distribution for empirical data is through a Laplace (or "tent-shaped") distribution [1,2,4,12]

$$p(r_1 \mid S_0) = \frac{1}{\sqrt{2}\,\sigma_1(S_0)} \exp\left(-\frac{\sqrt{2}\,\lvert r_1 - \bar{r}_1(S_0)\rvert}{\sigma_1(S_0)}\right),$$

where $\bar{r}_1(S_0)$ is the mean value of the growth rates in the bin, and $\sigma_1(S_0)$ its standard deviation. This means that the Gibrat pattern is qualitatively very different from what one observes in real data.
In the Gibrat model, the growth-rate distributions for all initial-size bins are alike (due to Gibrat's Law), contrary to real observations, in particular regarding the decay of the fluctuations $\sigma_1(S_0)$ as $S_0$ increases. It is reported in the literature that a power law of the form $\sigma_1(S_0) \sim S_0^{-\beta}$ provides a good description for this decay [1]. Moreover, the Laplace tails are "fatter" than Gaussian, i.e. extreme values have higher probability. This implies that large growth rates (both positive and negative) are more likely in reality than in the Gibrat model. In other words, organizational size changes very little most of the time, but it can occasionally also change dramatically.
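The contrast between the Gibrat model and the Laplace description can be illustrated with a minimal sketch. It assumes the multiplicative update $S_{t+\Delta t} = S_t(1 + \varepsilon_t)$ with Gaussian noise; the function names, the noise level `sigma_eps` and the sample sizes are illustrative choices, not values taken from the study.

```python
import math
import random
import statistics

def gibrat_growth_rates(initial_size, n_firms, sigma_eps=0.05, seed=0):
    """Return r_1 = log10(S_1 / S_0) for n_firms firms that all start at initial_size."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_firms):
        eps = rng.gauss(0.0, sigma_eps)        # uncorrelated Gaussian noise, sigma_eps << 1
        s1 = initial_size * (1.0 + eps)        # assumed Gibrat update: S_1 = S_0 (1 + eps)
        rates.append(math.log10(s1 / initial_size))
    return rates

def laplace_pdf(r, mean, sigma):
    """Laplace density written in terms of its mean and standard deviation."""
    return math.exp(-math.sqrt(2.0) * abs(r - mean) / sigma) / (math.sqrt(2.0) * sigma)

def gaussian_pdf(r, mean, sigma):
    """Gaussian density with the same mean and standard deviation."""
    return math.exp(-0.5 * ((r - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Under Gibrat's Law the spread of growth rates does not depend on the initial size.
small = gibrat_growth_rates(10.0, 20000, seed=1)
large = gibrat_growth_rates(10000.0, 20000, seed=2)
print("std small firms:", statistics.pstdev(small))
print("std large firms:", statistics.pstdev(large))

# Four standard deviations out, the Laplace density exceeds the Gaussian one.
print("Laplace / Gaussian at 4 sigma:", laplace_pdf(4.0, 0.0, 1.0) / gaussian_pdf(4.0, 0.0, 1.0))
```

The two printed standard deviations should be close to each other, which is exactly the size-independence that empirical data contradict.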
Additionally, there is a subcategory of models called subunit models, in which the size of a company is constructed as the sum of the contributions of internal subunits, e.g. different divisions. One well-known model is by Amaral et al. [4]. A variation of this model is the transactional model [51]. Another variation represents groups as classes composed of subunits [22,52]. In the hierarchical tree model [1,3] organizational hierarchy comes into play explicitly.
A final model in this category is based on additive replication in its general form. We call it the general SAF model, after Schwarzkopf, Axtell and Farmer, who first proposed it [26]. A specific case of this model is the basis for ours (see SAF model below). At each time step, each member of an organization is replaced by $x$ new ones, with $x$ drawn from a replication distribution $p(x)$. There is a competition rule: the new element is either taken from another group with probability $j$, or created from scratch with probability $1 - j$. The model is implemented on a social network, where vertices are individuals and edges are acquaintance relationships. The general SAF model is the only model among the reviewed literature that makes an explicit reference to a social network. This model follows our approach when it comes to designing a micro simulation model that generates a macro-level pattern through a defined mechanism.
The range of approaches from different disciplines shows the interdisciplinary nature of the phenomenon and the potential application of meaningful generative models to different scientific fields.

SAF model
The dynamics by which people become members of an organization has many different aspects [53]. One of them is influence through contacts in social networks, which we call contact influence. The setting for the SAF model consists then of a contact network. See the illustration in Fig. 1. There are $N$ vertices representing individuals, and the arcs between them are links of contact influence. We follow [26] here. We work in the strong-competition limit ($j = 0$): each agent added to a group must be taken from another group. Consequently, the total number of agents $N$ is constant. The number of groups $G$ is fixed as well.
The only interaction we consider among agents is social influence, through edges in their contact networks. A network arc $i \to j$ means that agent $i$ is influenced by agent $j$. This simplification leaves out interactions coming from the formal (or informal) hierarchical structure of the organization. The underlying structure in the model is rather the social network of individual contacts, which we assume static over the time span of the problem.
The model variables are the sizes $S^{(a)}(t)$ of each organization $a$ at time $t$, with $a \in \{1, 2, \dots, G\}$. The size of an organization at a certain time is the number of individuals in that group at that time. Previous research has shown that other size definitions, for instance in terms of sales, produce similar statistics (see e.g. Ref. [1]).
Regarding the time evolution, at each time step $t$ a vertex is picked at random. The probability for vertex $i$ to switch to group $a$, which we call the switching probability, is computed as

$$P(i \to a) = \frac{\tilde{k}_{i,a}}{\sum_{a'=1}^{G} \tilde{k}_{i,a'}},$$

where $\tilde{k}_{i,a}$ is the degree of vertex $i$ in group $a$, counting its own group. That is to say, of the vertices that influence vertex $i$ (counting itself), how many belong to group $a$. This rule conditions the decision to switch group on the group membership of network neighbours, and allows for the possibility of staying in the current organization.

We impose an extra rule stating that no group can die out permanently. This prevents groups from hitting the absorbing state at size zero, and thereby keeps the system in equilibrium. We implement the rule as follows: at every Monte Carlo step a check is performed; if a group has zero size, a random vertex is switched to the empty group. A non-equilibrium version of the model is also possible. It would lead to different dynamics, with all system realizations going to a final one-group absorbing state. We tested this version as well, and found that the fat-tailed pattern is qualitatively reproduced. The implications are different, though. For instance, simulation time in the non-equilibrium version of the model could be translated more directly into some function of real time, while for an equilibrium model the association is not so direct.
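The update rule just described can be sketched in a few lines. This is an illustrative reading, not the authors' code: the data layout (`influencers[i]` lists the vertices that influence `i`), the function names, the choice to run the empty-group check after every single update, and the restriction of the refill to vertices whose group has more than one member are all assumptions made here.

```python
import random
from collections import Counter

def switching_probabilities(vertex, influencers, group_of):
    """P(vertex -> a): share of the vertex's influencers, plus itself, that belong to group a."""
    counts = Counter(group_of[j] for j in influencers[vertex])
    counts[group_of[vertex]] += 1                  # the vertex's own group counts once
    total = sum(counts.values())
    return {a: k / total for a, k in counts.items()}

def saf_step(influencers, group_of, n_groups, rng):
    """One update: pick a random vertex, let it switch, then refill any empty group."""
    n = len(group_of)
    i = rng.randrange(n)
    probs = switching_probabilities(i, influencers, group_of)
    targets = list(probs)
    group_of[i] = rng.choices(targets, weights=[probs[a] for a in targets])[0]
    sizes = Counter(group_of)
    for a in range(n_groups):
        if sizes[a] == 0:
            # Assumption: donors come from groups with more than one member,
            # so refilling one group never empties another.
            donors = [v for v in range(n) if sizes[group_of[v]] > 1]
            v = rng.choice(donors)
            sizes[group_of[v]] -= 1
            group_of[v] = a
            sizes[a] += 1

# Usage on a hypothetical ring network with N = 30 agents and G = 3 groups.
rng = random.Random(42)
n_agents, n_groups = 30, 3
influencers = [[(v - 1) % n_agents, (v + 1) % n_agents] for v in range(n_agents)]
group_of = [v % n_groups for v in range(n_agents)]
for _ in range(1000):
    saf_step(influencers, group_of, n_groups, rng)
print(sorted(Counter(group_of).items()))           # group sizes; they always sum to N
```

Because agents only move between groups, the total number of agents stays constant, as in the strong-competition limit described above.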
One advantage of this model is that it has a clear sociological interpretation: the more close acquaintances a person has in a certain group, the more likely it is that the person will choose to become a member of that group. The decision is mediated by a contact network, which puts an emphasis on the relational component of membership choice.

Extended SAF model
So far we have proposed a model where an individual is more influenced to join a group the more acquaintances she has in that group at the time. Influence is exerted via the individual's contact neighbourhood. But social influence can be broader than that. Several models for social influence have been proposed, for example regarding culture [54], opinion formation [55], information sharing in groups [56], etc. Specifically, there is a contextual aspect of social influence. Different settings can entail different pressures towards homogeneity of opinions or membership [57]. We add one parameter to our model that represents the degree of this kind of social influence, which we call contextual influence.
The key element we want to incorporate is that influence in a certain setting is not a property of individual agents, but rather a property that affects all the members in the mentioned context. We assume, for simplicity, a uniform contextual influence. We model it through a parameter of contextual influence $d$, with $0 \le d < \infty$. In the limit $d \to 0$, the person does not feel any pressure to align herself with the neighbourhood. On the contrary, in the limit $d \to \infty$ the person acts solely based on the majority opinion in her surrounding neighbourhood.
We now define the extended SAF model. The assumptions, parameters, and variables listed before are still valid. However, the time evolution is now governed by the following switching probability:

$$P(i \to a) = \frac{\tilde{k}_{i,a}^{\,d}}{\sum_{a'=1}^{G} \tilde{k}_{i,a'}^{\,d}}.$$

Setting $d = 1$ recovers the first model. In the situation of low contextual influence ($d \to 0$) a vertex can change its state at random, not being influenced by the groups of the contact vertices, while still retaining information on the possible groups she can choose to switch to. The system configuration thus tends to a random one. This can be thought of as analogous to a high-temperature (disordered) situation in a physical system. In the situation of high contextual influence ($d \to \infty$) the vertex looks highly upon her contacts. Configurations where the vertex is not aligned with the majority of her neighbours become less and less likely, and the system tends to polarize itself in domains. This is analogous to a low-temperature (ordered) situation, the difference being that a physical unit does not have global information about the total number of groups in the system.
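The effect of the exponent can be checked numerically with a small sketch. It assumes that the switching probability is proportional to the extended degree raised to the power $d$; the function name and the example neighbourhood are hypothetical. Groups with no members in the neighbourhood are given zero weight here, which is a further assumption: treating $0^0$ as 1 would instead make every one of the $G$ groups available at $d = 0$.

```python
import math

def extended_switching_probabilities(k_tilde, d):
    """Return P(a) proportional to k_tilde[a] ** d for groups present in the neighbourhood.

    k_tilde maps each group to the vertex's extended degree in that group.
    Weights are computed relative to the largest degree to avoid overflow for large d.
    """
    present = {a: k for a, k in k_tilde.items() if k > 0}
    k_max = max(present.values())
    weights = {a: math.exp(d * math.log(k / k_max)) for a, k in present.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical neighbourhood of nine vertices (eight contacts plus the vertex itself).
k_tilde = {0: 1, 1: 2, 2: 6}
for d in (0, 1, 10):
    probs = extended_switching_probabilities(k_tilde, d)
    print(d, {a: round(p, 4) for a, p in probs.items()})
```

With $d = 0$ the three groups are equally likely, with $d = 1$ the probabilities are the plain fractions, and with $d = 10$ almost all of the probability goes to the majority group.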

SAF model
We implement the SAF model on an Erdős-Rényi (ER) undirected network, by Monte Carlo simulation. The size distribution fits a log-normal distribution. The growth-rate distribution for the basic case is shown in Fig. 2A. All initial-size bins fit a Laplace distribution. The decay of the variance with increasing $S_0$ is also verified. Additionally, Fig. 2B plots the so-called scaled distributions (used in e.g. [2,4,28]). Given the Laplace function for the growth-rate distribution, these are obtained by rescaling the growth rate as $\sqrt{2}\,[r_1 - \bar{r}_1(S_0)]/\sigma_1(S_0)$ and the density as $\sqrt{2}\,\sigma_1(S_0)\,p(r_1 \mid S_0)$. Under this rescaling, the distributions for the different initial-size bins should collapse onto a single curve close to the Laplace distribution, as the figure shows.
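As a rough sketch of how such a fit and rescaling could be carried out for one initial-size bin, the code below uses the standard maximum-likelihood estimators of the Laplace distribution (sample median for the location, mean absolute deviation from the median for the scale) and the usual rescaling by $\sqrt{2}/\sigma_1$. The function names and the synthetic data are illustrative assumptions. Note that the MLE location is the median, whereas the text defines $\bar{r}_1(S_0)$ as the bin mean.

```python
import math
import random
import statistics

def fit_laplace_mle(rates):
    """Maximum-likelihood Laplace fit; returns (location, standard deviation)."""
    loc = statistics.median(rates)
    b = sum(abs(r - loc) for r in rates) / len(rates)   # MLE of the Laplace scale
    return loc, math.sqrt(2.0) * b                      # standard deviation = sqrt(2) * scale

def rescale(rates, loc, sigma):
    """Scaled growth rates sqrt(2) * (r - loc) / sigma."""
    return [math.sqrt(2.0) * (r - loc) / sigma for r in rates]

def scaled_laplace(r_scaled):
    """Curve onto which every bin collapses if it is exactly Laplace: exp(-|r|)."""
    return math.exp(-abs(r_scaled))

# Synthetic Laplace growth rates for one bin (a difference of two exponentials).
rng = random.Random(0)
scale = 0.1
rates = [rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale) for _ in range(20000)]

loc, sigma = fit_laplace_mle(rates)
scaled = rescale(rates, loc, sigma)
print("fitted location:", loc)
print("fitted sigma:", sigma, "expected about", math.sqrt(2.0) * scale)
```

Applying the same two steps to every initial-size bin puts all bins on a common scale, which is what makes the collapse visible in a single plot.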
Looking at the simulated growth-rate distributions, the upper tail tends in general to underestimate the corresponding Laplace curve, while the lower tail follows it more closely. We interpret this as a consequence of working with constant $N$. In our model, membership growth in one organization comes at the expense of membership decline in the rest of the groups. This is reflected in less frequent positive growth rates. The fit is still good because the Laplace is a highly peaked distribution, concentrating much of the mass around zero, so the main deviations represent a small fraction of the total deviation. The fact that real systems exhibit fatter tails on both sides could be due to many factors, including the fact that growth-rate distributions could be a superposition of the distributions for different values of $N$.
We then implement the simulation with different network properties, in order to see the impact of network structure on the observables.
As a first change, we change the network from undirected to directed. We do so, keeping the mean degree constant, which implies that the number of influence arcs (incoming and outgoing) to a vertex remains on average unchanged. The reciprocity of each individual edge is lost, but the situation is still balanced on average, because the arc distribution along the network is random. That is to say, each agent on average influences and is influenced by the same number of alters. The comparison is illustrated by Fig. 3A-B. We can observe that the pattern is similar, both in terms of variance and of the ranges of initial-size bins.
Next, we change the network degree distribution from ER to scale-free (SF), again keeping the mean degree constant. The plots are shown in Fig. 3C-D. The undirected case is qualitatively similar, while the difference comes with the scale-free directed case. In the latter the distributions have larger variance in all initial-size bins and, more importantly, the highest initial-size bin comprises a much larger size range. The SF degree distribution appears to induce this behaviour, and our interpretation of this is as follows. In an SF network, by definition, there will be a few "popular" vertices, and a lot of vertices with low popularity. As long as the influence is symmetric, the popular vertices drag their neighbours to their groups, but after a while the equilibrium condition turns the tables: the high degree of a few vertices just makes the process faster at certain moments. Imposing a directed network breaks the symmetry. In the directed case, some popular vertices are highly influential (they receive a lot of arcs, so a change of group on their part affects many other vertices) while other popular vertices are highly susceptible (they radiate many arcs, but a group change on their part is not as influential). We understand that this asymmetry manifests itself in the dynamics, causing the growth-rate distributions to be broader, and producing many peaks of high group size, reflected in the higher $S_0$ range.
The fact that both ER and SF networks are able to generate the Laplace pattern can be interpreted in the light of the time-evolution rule of the model in Eq. (4). In effect, it is a rule implementing some kind of preferential attachment, with the probability to switch group becoming greater as the vertex degree increases.

Extended SAF model
Using the extended SAF model, we now test different contextual influences. The analysis is done on a square lattice to facilitate the visualization of domains of vertices belonging to the same organization. The lattice has periodic boundary conditions, and we use the Moore neighbourhood ($q = 8$ nearest neighbours). We do not lose generality by using a square lattice as an example topology. The growth-rate statistics are qualitatively the same for all studied topologies. In particular, for $d > 1$, the emergence of clusters is recovered for ER and SF as well. Thus, the regular clustering of the square lattice is not necessary to get the statistical pattern. It is rather the coefficient $d$ that drives the dynamics.

Figure caption (fragment): With $S_0$ the size at one year, and $S_1$ the size after one year, we define the growth rate $r_1 \equiv \log_{10}(S_1/S_0)$. Here we plot the conditional PDF $p(r_1 \mid S_0)$ to have a growth rate $r_1$ given an initial size $S_0$, in log-scale. The data are binned by initial-size ranges, and shown by organization type. We also plot a fit by MLE to the Laplace distribution.
In Fig. 4, we plot the spatial distribution of groups across the lattice, as well as the growth-rate distribution, for three values of $d$. The first situation, $d = 0$, corresponds to low contextual influence: the system has no clear domains, and the organization assignment tends to a uniformly random one. This is reflected in the growth-rate distribution with a pattern similar to the one encountered in the Gibrat model. The second situation, $d = 1$, recovers the original SAF model. The third situation, $d = 10$, corresponds to high contextual influence. There are clear domains where a few organizations absorb the majority of agents. This is reflected in the growth-rate distribution as a collapse of all distributions onto highly peaked curves close to $r_1 = 0$.
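A toy version of this lattice experiment can be sketched as follows. It assumes a switching probability proportional to the extended degree raised to the power $d$, restricts the choice to groups present in the neighbourhood, and leaves out the rule that keeps groups from dying out; the lattice size, number of groups, number of sweeps and the `alignment` measure (fraction of neighbour pairs that share a group) are illustrative choices made here.

```python
import math
import random
from collections import Counter

def moore_neighbours(x, y, side):
    """The 8 Moore neighbours of (x, y) on a side x side lattice with periodic boundaries."""
    return [((x + dx) % side, (y + dy) % side)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def simulate(side, n_groups, d, sweeps, seed):
    """Run the extended update rule from a random start; return the final group map."""
    rng = random.Random(seed)
    group = {(x, y): rng.randrange(n_groups) for x in range(side) for y in range(side)}
    for _ in range(sweeps * side * side):
        x, y = rng.randrange(side), rng.randrange(side)
        counts = Counter(group[nb] for nb in moore_neighbours(x, y, side))
        counts[group[(x, y)]] += 1                  # extended degree counts the vertex itself
        k_max = max(counts.values())
        targets = list(counts)
        # Weight proportional to (extended degree) ** d, computed relative to k_max.
        weights = [math.exp(d * math.log(counts[a] / k_max)) for a in targets]
        group[(x, y)] = rng.choices(targets, weights=weights)[0]
    return group

def alignment(group, side):
    """Fraction of (vertex, neighbour) pairs that belong to the same group."""
    same = total = 0
    for (x, y), a in group.items():
        for nb in moore_neighbours(x, y, side):
            same += group[nb] == a
            total += 1
    return same / total

for d in (0, 1, 10):
    final = simulate(side=20, n_groups=4, d=d, sweeps=50, seed=1)
    print("d =", d, "alignment =", round(alignment(final, 20), 2))
```

A higher alignment value indicates larger single-group domains, so comparing the printed values gives a crude numerical counterpart to the lattice snapshots.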

Discussion
In our study we show how individual agents, having local information on membership alternatives and interacting with simple local rules through their social network, can generate fat-tailed macro patterns of organizational growth. In doing so, we have not assumed any institutional constraint or external perturbation. Rather, it is the internal dynamics of the interaction that brings the distribution about. Individual agents are subject to contact influence in their localized network neighbourhoods, but the aggregation of their individual membership decisions brings about unexpected macro-level outcomes. Sometimes, as in the case of large values of growth rates, the consequences at the macro level are quite extreme. This result is relevant for the design of policies and regulations, which are usually much more grounded in traditional approaches. Non-Gaussian patterns tend to challenge our worldview of how a lot of processes typically work.
The fat-tailed growth-rate pattern shows up in Erdős-Rényi, scale-free and square-lattice regular networks, both in terms of the Laplace distribution and in the decrease of the variance with $S_0$. This suggests that the mechanism driven by influence-based rules is more relevant to the pattern's qualitative replication than the details of network topology. Returning to the aims stated in the Introduction, we find that the scale-free character of the network is not a necessary condition to get a Laplace-like growth-rate distribution.
Looking at the results from the extended SAF model, the parameter region around $d = 1$ provides a mechanism able to replicate the features of the system's growth process. When $d \to 0$, the system has no clear group clusters, and the organization assignment tends to a random one. On the contrary, when $d > 1$, there are clear domains where a few organizations absorb the majority of agents. The intermediate situation best describes the real system's behaviour, and is modelled as a combination of contact and contextual influence. While the high-$d$ situation produces stable rich-get-richer dynamics, to the extent that single-group clusters do not break up, in the intermediate situation the rich-get-richer dynamics is no longer stable. The situation around $d = 1$ is also the only one where fluctuations are different for different initial sizes. This addresses our second aim, so that our model does not resort to the diversification argument to generate the decreasing variance with initial size.
The sociological interpretation of our results is that the transition zone where the real system exists is an intermediate situation, dominated by neither totally random behaviour nor totally compliant behaviour. It is possible to interpret this from the point of view of the information an agent needs in order to act. One way to implement the SAF model ($d = 1$) is to think that, at a given time step, an agent chooses a random link amongst her neighbours, and switches to that contact's group. Such an implementation means that, on average, each organization $a$ will be picked with a probability proportional to the corresponding extended vertex degree $\tilde{k}_{i,a}$ of that agent. Each agent needs to know only the group membership of the contact she last encounters, making this situation a reasonable model of a dynamics where people successively meet contacts without any further information. The high-$d$ situation demands more information, since the agent should know the membership of all her contacts at a given time to be able to determine which is the majority membership. On the side of low contextual influence, choosing at random requires knowing at least how many groups there are. So the intermediate case offers the agent a localized decision rule with minimal information requirements. We therefore arrive at a realistic model without invoking any argument of the real system being self-organized around a critical region. The parameter $d$ can thus be reinterpreted as a way to weight different choice strategies, and tuning it around $d = 1$ recovers a case of bounded rationality [58].
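The equivalence between copying a randomly chosen contact and the $d = 1$ switching probability can be checked empirically with a short sketch. The names and the example neighbourhood are hypothetical, and including the agent herself among the options is an assumption made here so that the frequencies match the extended degree, which counts the agent's own group.

```python
import random
from collections import Counter

def copy_random_contact(vertex, influencers, group_of, rng):
    """Pick one option uniformly among the vertex's contacts plus herself; return its group."""
    chosen = rng.choice(influencers[vertex] + [vertex])
    return group_of[chosen]

# Vertex 0 has six contacts: two in group 0, one in group 1, three in group 2.
influencers = {0: [1, 2, 3, 4, 5, 6]}
group_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2, 5: 2, 6: 2}

rng = random.Random(7)
n_draws = 30000
picks = Counter(copy_random_contact(0, influencers, group_of, rng) for _ in range(n_draws))

# Extended degrees are 3, 1 and 3, so the expected frequencies are 3/7, 1/7 and 3/7.
for a in sorted(picks):
    print("group", a, "empirical frequency", round(picks[a] / n_draws, 3))
```

The agent only ever looks up one membership per draw, which is the minimal-information property the argument relies on.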
Therefore, we have two interpretations for $d$: the degree of contextual influence, and the tuning of membership choice strategy. The former is external to the individual, while the latter is internal. Both have an impact on the agent's behaviour, and both produce the aggregate pattern we observe empirically when tuned around the value for the SAF model. This suggests a duality between agent and social context, where the two views are consistent with the statistics we observe, and compatible with each other. We believe this perspective exemplifies how to model one of the core issues in sociology, i.e., the interplay between individuality and social influence.
Further research should try to identify quantitatively how the statistical properties of the growth-rate distribution respond to systematic variations in both model and network parameters. For instance, in our explorations we found that the typical size of a group, given by $N/G$, seems to affect the distribution's variance. A significant model extension to consider would be to allow the system size $N$ to change in time. This variation would have to be implemented with care in this network approach, because the properties of the growing network should be monitored dynamically throughout the simulation. Other interesting extensions could be to incorporate community and hierarchical structure. The possibility to belong to more than one organization is another important point.
Another discussion concerns extensions to the parameter $d$. In this study we have introduced it as a parameter quantifying the effect of an agent's social context. Contextual influence enters as an exponent that weights the probability to switch group. In our formulation, the degree of contextual influence is uniform for all agents and constant in simulation time. There are relevant extensions to consider. For instance, one could assume that different types of organizations have particularities as to their social settings, and model this with a parameter that depends on organizational type. These different parameters can then be related to the growth-rate statistics. Additionally, this framework of analysis should be quite dependent on the size range of the organizations under study, i.e. small voluntary-oriented organizations with local range have a setting where the social networks may dominate the dynamics, while large formal organizations have other structural elements in place so that a direct application of our model would not be advisable.
Finally, a better understanding of organizational growth processes could be applicable to other processes producing similar statistical features, from bird populations to financial and economic systems. This being said, one should still be careful in signaling the apparent universal presence of these common features as evidence of the systems belonging to the same class. On that line, it is reported that the exponent $\beta$ of the variance power-law relation, despite its value being similar for different systems, may not be universal. However, it is likely that different growth processes share similarities in terms of the underlying mechanisms driving them.

Figure caption (fragment): Group and growth-rate distributions for the extended SAF model. We show the distributions on the right side, and on the left side snapshots of the last time step. As the degree of contextual influence $d$ increases, we observe how domains gradually appear. The higher the contextual influence, the more likely it is that a vertex will align herself with her neighbours. (A) $d = 0$. Low contextual influence, random behaviour similar to the Gibrat model pattern.