Emergence of Shared Intentionality Is Coupled to the Advance of Cumulative Culture

There is evidence that the sharing of intentions was an important factor in the evolution of humans’ unique cognitive abilities. Here, for the first time, we formally model the coevolution of jointly intentional behavior and cumulative culture, showing that rapid techno-cultural advance goes hand in hand with the emergence of the ability to participate in jointly intentional behavior. Conversely, in the absence of opportunities for significant techno-cultural improvement, the ability to undertake jointly intentional behavior is selected against. Thus, we provide a unified mechanism for the suppression or emergence of shared intentions and collaborative behavior in humans, as well as a potential cause of inter-species diversity in the prevalence of such behavior.


Philosophy
It has been argued in the philosophical literature that the intentions behind collective acts can be distinct from an aggregation of individual intentions (Bratman, 1992;Searle, 1990). This is shared/collective intentionality, the idea that "we will do X" is distinct from "I will do X [because she is also doing X]". Collective action can occur through either mode of reasoning. Consider the following two player game. It is a coordination game in which the two players (me and you) only achieve payoffs if they coordinate on the same action. Given coordination on the same action, the players would rather that this action be action A.
        A   B
    A   2   0
    B   0   1

When it comes to describing actions in such a game, an expression such as (W1) "We are performing action B" can be considered equivalent to an expression such as (I1) "I am performing action B and you are performing action B". However, when it comes to describing intentions, there can be no such equivalence. Consider the expressions (W2) "We intend to do B" and (I2) "I intend to do B and I think that you intend to do B". It can be argued that, given that the players know the payoffs of the game, (W2) does not make sense because the intended action profile is Pareto inefficient: both players could do better if they instead intended to do A. Moreover, such an alternative intention would be self-enforcing in that, given a shared intention to do A, neither of the players need worry about a subsequent deviation by a cheating partner. However, such an argument does not apply to (I2). If I truly believe that you intend to do B, then it is optimal for me to do B (and vice versa), regardless of the Pareto inefficiency of the action profile (B,B). To phrase this another way, in the context of optimization, individual intentions give rise to individual optimality constraints, whereas shared intentions give rise to joint optimality constraints. In the latter case, the object which is chosen optimally is a vector of the actions of multiple players.
There has been debate amongst philosophers as to whether shared intentions can always be reduced to individual intentions plus beliefs about the intentions of others (Bratman, 1992; Butterfill, 2012; Gilbert, 1990; Gold and Sugden, 2007; Searle, 1990; Tuomela and Miller, 1988; Velleman, 1997). The current paper shows that the behaviour implied by the sharing of intentions, that is, the joint optimization of action choice, can evolve in the absence of hierarchical beliefs (I think that you think that I think...) and other complex modes of reasoning.
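The contrast between individual and joint optimality constraints can be checked mechanically for the coordination game above. The following is a minimal sketch rather than part of the model: the function names are hypothetical, and it assumes the common payoffs of 2 for coordinating on A, 1 for coordinating on B and 0 otherwise.

```python
from itertools import product

ACTIONS = ("A", "B")
# Common payoffs of the coordination game: both players receive the same amount.
PAYOFF = {("A", "A"): 2, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 1}


def individually_optimal(profile):
    """True if neither player gains by changing only their own action."""
    me, you = profile
    ok_me = all(PAYOFF[(me, you)] >= PAYOFF[(alt, you)] for alt in ACTIONS)
    ok_you = all(PAYOFF[(me, you)] >= PAYOFF[(me, alt)] for alt in ACTIONS)
    return ok_me and ok_you


def jointly_optimal(profile):
    """True if no other action pair gives the players a higher common payoff."""
    return all(PAYOFF[profile] >= PAYOFF[other]
               for other in product(ACTIONS, repeat=2))


profiles = list(product(ACTIONS, repeat=2))
individual = [p for p in profiles if individually_optimal(p)]
joint = [p for p in profiles if jointly_optimal(p)]
print(individual)  # profiles consistent with individual intentions
print(joint)       # profiles consistent with a shared intention
```

Under these assumptions both (A,A) and (B,B) satisfy the individual constraints, whereas only (A,A) satisfies the joint constraint, which mirrors the argument that (I2) is coherent while (W2) is not.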

Psychology
The current study is further motivated by recent work in developmental psychology. Experiments with children suggest that the collaborative urge in humans is a primal one, which develops early in infancy, prior to much of our aptitude for rational inference, and certainly prior to our ability to articulate complex hierarchical beliefs such as are required by traditional game theory (see Tomasello and Rakoczy, 2003 for a summary, as well as Tomasello et al., 2005 and the critical responses contained therein). Moreover, this collaborative urge is considerably weaker in non-human great apes (Tomasello and Carpenter, 2007; Tomasello and Herrmann, 2010; Wobber et al., 2014). To summarize in the words of Tomasello (2014):

...humans are able to coordinate with others, in a way that other primates seemingly are not, to form a "we" that acts as a kind of plural agent to create everything from a collaborative hunting party to a cultural institution. (1)

This accumulated evidence has provided support for the hypothesis that the human ability to collaborate led to the development of our unique cognitive skill set, rather than causality running only in the opposite direction. In short, human collaborative activity provided a niche in which sophisticated modes of reasoning could evolve. This is known as the shared intentionality hypothesis (Call, 2009) or the Vygotskian intelligence hypothesis (Moll and Tomasello, 2007; Tomasello, 2014; Vygotsky, 1980). The principal philosophical work to which Tomasello and his coauthors make reference when discussing the sharing of intentions is the work of Bratman (1992). However, although Bratman does not argue that shared intentions can be wholly described by individual intentions plus beliefs, he does make use of the notion of common knowledge (I know that you know that I know...ad infinitum) in his discussion of shared cooperative activity.
Of course, common knowledge requires hierarchical thought and thus a considerable degree of sophistication in reasoning. This raises a potential problem of circularity: if collaboration requires sophisticated reasoning, how could collaboration have arisen prior to such reasoning? Our work addresses this problem with a very clear answer. Consistent with the philosophical analysis discussed in Section 1.1 and with the psychological evidence summarized in statement (1), we model the sharing of intentions as collective agency: individuals who can share intentions will sometimes come together and adjust their actions to their mutual benefit. That is, they engage in joint strategic choice.
(i) We show that joint goal oriented behaviour, as implied by shared intentions, could evolve prior to sophisticated reasoning (our agents are myopic optimizers). This is plausible: developmental studies of children indicate that they can undertake intentional action at earlier ages than they can understand beliefs (Baron-Cohen, 1994;Call and Tomasello, 1999;Carpenter et al., 1998a,b;Wellman and Bartsch, 1994;Wellman et al., 2001).
(ii) Moreover, we show conditions under which such joint behaviour would not evolve. That is, the ability to collaborate is not always an unambiguous good. Sometimes the short term benefits to collaboration can work to the long term detriment of a society. In a model of multi-level selection this can work against the evolution of a collaborative disposition.
The above points show that the ability to collaborate and share intentions could be selected for or against differently, depending on geographical location, ecological conditions, climate variability and species. The last of these is particularly important, as any story that explains how humans could become the collaborative species we are today should also explain why this is less so in other great apes.

Game theory
There exists a large literature in cooperative game theory on the behaviour of coalitions. For a survey the reader is referred to Peleg and Sudholter (2003). There is a smaller but established literature at the intersection of noncooperative and cooperative game theory (see, for example, Ambrus, 2009; Aumann, 1959; Bacharach, 2006; Bernheim et al., 1987; Konishi and Ray, 2003; Luo and Yang, 2009). There is also a large and established literature in evolutionary game theory that seeks to explain altruism ('cooperation'), where altruism manifests itself through the actions that players choose (see Eshel and Cavalli-Sforza, 1982; Nowak, 2006; Wilson and Dugatkin, 1997, and references contained therein). The sharing of intentions and collaboration is, however, different from altruism. Rather than manifesting themselves through the identity of chosen actions, as altruism does, shared intentions manifest themselves through how actions are chosen. Specifically, when actions are chosen, collaborating players optimize over joint strategy choice. This does not imply any kind of concern for the well-being of others, merely an efficient (in the short term) way of attaining higher payoffs in conjunction with other players. Work on the incorporation of such collaborative, coalitional behaviour into evolutionary dynamics forms a relatively new and rapidly growing literature (Newton, 2012a,b; Sawa, 2014), although considerable work has previously been done in the context of matching, which in effect concerns coalitions of size two (Diamantoudi et al., 2004; Jackson and Watts, 2002; Klaus et al., 2010; Newton and Sawa, 2015; Roth and Vande Vate, 1990). The result, used in the current study, that strategic updating by coalitions can both slow (the conservative effect) and hasten (the reforming effect) the convergence of a population to an efficient 'new' action, was first analyzed in Newton and Angus (2013, 2015). The cited work contains an in-depth analysis of such effects, showing that they are supported on a wide variety of networks beyond the scale-free networks that are used in the current study.

Mutualism vs. altruism
The current study concerns the evolution of the sharing of intentions and collaboration by individuals to their mutual benefit. That is, the sharing of intentions is a mutualistic activity and not an altruistic one, although an individual who shares his intentions could in theory have altruistic motives. Models of multi-level selection are by now standard in the literature on the evolution of altruism (See Bowles, 2006;Traulsen and Nowak, 2006). A recent survey and critique of this literature can be found in Rusch (2014). We use multi-level selection to model the evolution, and non-evolution, of the mutualistic behaviour that is collaborative action choice in a game. To model this without confounding issues of altruism we use a simple coordination game as the underlying game.
Entries are interaction-payoffs of an individual whose strategy is given by the row when interacting with an individual whose strategy is given by the column.
The altruism literature focuses on explaining efficient coordination in prisoner's-dilemma-type situations. The question of whether efficient collaboration would always evolve in situations in which individuals' interests are perfectly aligned has, to the best of our knowledge, not been addressed. West et al. (2007) distinguish four types of behaviour: selfishness, that is, actions that benefit oneself but harm others (+, −); mutualism (+, +); altruism (−, +); and spite (−, −). Rusch (2014) notes that selfishness and mutualism are often taken for granted, with research focusing on the evolution of altruism and spite. The current study indicates that this neglect of the evolution of mutualism should be reconsidered, and that pairwise and small-group mutualistic interaction (coalitional updating) can sometimes work to the detriment of the welfare of the larger society (the deme). It is thought that many instances of collaboration amongst hunter-gatherers are mutualistic. Examples include hunting for whales and other large animals (Alvard, 2001; Alvard and Nolin, 2002) and fishing expeditions (Sosis et al., 1998). See Smith (2003) for a survey. It has been subsequently noted by Bird et al. (2012) that group hunting may not always be mutualistic for all members of a hunting party as, for example, sharing norms may disproportionately reduce the take of the best hunters. However, despite the usefulness of modern anthropological evidence, our model does not concern modern hunter-gatherers, and the authors find it hard to conceive of a species developing sophisticated egalitarian cultural institutions prior to the ability to collaborate in simple mutualistic situations.
2 Multi-level selection

2.1 Strategy updating
Strategies are updated by single individuals but also by pairs of individuals who can share their intentions. A pair of players can only share intentions if both players in the pair are SI types and they are neighbours on the interaction graph. Each period within a generation, either one individual or one pair of individuals is randomly selected to update their strategy.
The algorithm by which the set of updating individuals is selected is as follows. Firstly, a single individual is selected uniformly at random from all of the individuals in the deme. Let this individual be labeled i. If individual i is an N type, then he will update his strategy on his own, and we return the set of updating individuals {i}. If individual i is an SI type, then a maximum coalition size is chosen uniformly at random from {1, 2}. If the chosen maximum size is 1, then return the set of updating individuals {i}. If it is 2, then uniformly at random choose a set of players from the set of all feasible coalitions and return this set as the set of updating individuals. For example, if i has three neighbors who are SI types, j_1, j_2 and j_3, then there are four feasible coalitions: {i}, {i, j_1}, {i, j_2}, {i, j_3}, any of which may be returned as the set of updating individuals.
Note that due to combinatoric considerations, when there are many SI types in a deme, the number of feasible coalitions of a given size grows rapidly in that size. Therefore, to avoid a strong bias in favour of larger coalitions, we select the maximum coalition size before randomizing over possible coalitions.
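The selection procedure for the benchmark case of pairs can be sketched as follows. This is an illustrative reading of the description above, not the code used for the simulations; the function names and the dictionary-based representation of types and neighbours are assumptions.

```python
import random


def feasible_pairs(i, types, neighbours):
    """Feasible coalitions for an SI type i: {i} alone, or i with one SI neighbour."""
    return [{i}] + [{i, j} for j in sorted(neighbours[i]) if types[j] == "SI"]


def select_updating_set(types, neighbours, rng=random):
    """Return the set of individuals chosen to update in one period."""
    i = rng.choice(sorted(types))      # uniform draw over the deme
    if types[i] == "N":
        return {i}                     # N types always update alone
    max_size = rng.choice([1, 2])      # maximum coalition size is drawn first
    if max_size == 1:
        return {i}
    return rng.choice(feasible_pairs(i, types, neighbours))


# Example from the text: i has three SI neighbours (plus one N neighbour, m).
types = {"i": "SI", "j1": "SI", "j2": "SI", "j3": "SI", "m": "N"}
neighbours = {"i": ["j1", "j2", "j3", "m"], "j1": ["i"], "j2": ["i"],
              "j3": ["i"], "m": ["i"]}
example_coalitions = feasible_pairs("i", types, neighbours)
print(len(example_coalitions))  # number of feasible coalitions for i
```

Drawing the maximum size before the coalition means that an SI type updates alone at least half of the time, however many SI neighbours he has.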
When a single individual (i.e. not a pair) is selected, he plays a better response to the strategies of the other individuals. That is, holding the strategies of all other individuals fixed, he chooses some strategy ('old' or 'new') that gives him a payoff at least as high as his current payoff. If both strategies give a payoff at least as high as the current payoff, then he chooses uniformly at random.
When a pair of neighbouring SI types is chosen to update their strategies, they can share intentions and collaborate in choosing identical strategies (both choose 'old' or both choose 'new'), but will only do so if by doing so they obtain payoffs at least as high as their current payoffs, holding the strategies of all other individuals fixed. If no such opportunity for successful collaboration exists, both individuals remain playing the same strategy as before. If there are multiple opportunities for successful collaboration; that is, both individuals switching to 'old' and both individuals switching to 'new' would weakly increase the payoffs of both individuals in the pair, then either of these possibilities occurs with probability one half. In summary, the pair plays a coalitional better response (Newton, 2012b;Newton and Angus, 2015).
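The two updating rules can be sketched in code. The sketch below is illustrative only: the function names are hypothetical, and it assumes that an interaction pays 1 when both parties play 'old', alpha when both play 'new' and 0 otherwise, with an individual's payoff being the average over his neighbours.

```python
import random


def payoff(i, strategy, neighbours, alpha):
    """Average interaction payoff of individual i under the given profile."""
    own = strategy[i]
    matches = sum(1 for j in neighbours[i] if strategy[j] == own)
    per_match = alpha if own == "new" else 1.0
    return per_match * matches / len(neighbours[i])


def better_response(i, strategy, neighbours, alpha, rng=random):
    """Strategy chosen by a lone updater: any option at least as good as now."""
    current = payoff(i, strategy, neighbours, alpha)
    options = []
    for s in ("old", "new"):
        trial = dict(strategy)
        trial[i] = s
        if payoff(i, trial, neighbours, alpha) >= current:
            options.append(s)
    return rng.choice(options)  # the current strategy always qualifies


def coalitional_better_response(pair, strategy, neighbours, alpha, rng=random):
    """Profile after a pair jointly picks an identical strategy, if any helps."""
    current = {m: payoff(m, strategy, neighbours, alpha) for m in pair}
    options = []
    for s in ("old", "new"):
        trial = dict(strategy)
        for m in pair:
            trial[m] = s
        if all(payoff(m, trial, neighbours, alpha) >= current[m] for m in pair):
            options.append(s)
    result = dict(strategy)
    if options:                 # otherwise both keep their strategies
        chosen = rng.choice(options)
        for m in pair:
            result[m] = chosen
    return result


# Small hypothetical example: a and b are neighbours and currently miscoordinated.
neighbours = {"a": ["b", "c", "e"], "b": ["a", "d"], "c": ["a"], "d": ["b"], "e": ["a"]}
profile = {"a": "old", "b": "new", "c": "old", "d": "old", "e": "old"}
after = coalitional_better_response(("a", "b"), profile, neighbours, alpha=1.2)
```

In this example jointly choosing 'new' would lower the payoff of a, so the only acceptable joint move is for both to play 'old'.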
In our comparison treatments we sometimes include the possibility of updating coalitions of players that contain more than two players; that is, k > 2. For these treatments, every member of a coalition chosen to update must be an SI type. Furthermore, coalitions chosen to update must be connected in the following way: the subgraph induced by restricting the interaction graph to the vertices corresponding to the individuals in the given coalition must be connected. The interpretation of this is that individuals who share their intentions as part of a coalition must have some form of communication and interaction in order for them to do so. Note that for k = 2, this description of feasible coalitions reduces to our previous definition of feasible coalitions as neighbouring pairs on the interaction graph. The process of selecting an updating coalition is as above, but with the maximum coalition size for the update being selected uniformly at random from {1, ..., k} instead of from {1, 2}. Coalitions of size k > 2 play coalitional better responses in a similar manner to coalitions of size k = 2 as described above (replacing 'both individuals switching' with 'all individuals in the coalition switching').
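For k > 2 the set of feasible coalitions containing a given individual can be enumerated by growing connected sets one neighbour at a time. The following sketch is one possible way of doing this under the connectivity requirement described above; the function name and data layout are assumptions.

```python
def feasible_coalitions(i, types, neighbours, k):
    """All connected sets of SI types that contain i and have at most k members."""
    start = frozenset({i})
    if types[i] != "SI":
        return [start]
    found = {start}
    frontier = [start]
    while frontier:
        grown = []
        for coalition in frontier:
            if len(coalition) >= k:
                continue
            # Extend by any SI type adjacent to a current member; this keeps
            # the induced subgraph connected.
            for member in coalition:
                for j in neighbours[member]:
                    if types[j] == "SI" and j not in coalition:
                        bigger = coalition | {j}
                        if bigger not in found:
                            found.add(bigger)
                            grown.append(bigger)
        frontier = grown
    return sorted(found, key=lambda c: (len(c), sorted(c)))


# Star example: i has three SI neighbours that are not linked to each other.
types = {"i": "SI", "j1": "SI", "j2": "SI", "j3": "SI"}
neighbours = {"i": ["j1", "j2", "j3"], "j1": ["i"], "j2": ["i"], "j3": ["i"]}
pairs_only = feasible_coalitions("i", types, neighbours, k=2)
up_to_three = feasible_coalitions("i", types, neighbours, k=3)
```

With k = 2 the enumeration reproduces the four coalitions of the earlier example; with k = 3 it also returns the three connected triples centred on i.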
Finally, when any player has the opportunity to update his strategy, we allow him to make a mistake: independently with small probability ε any updating individual switches to a random strategy instead of to their intended strategy (Young, 1993).

Deriving thresholds for example networks
Here we illustrate threshold effects for conservative and reforming effects of SI through a discussion of two example networks, the network of overlapping triangles of Figure A(a) and the square lattice with von Neumann neighborhood of Figure A(b). For the purpose of this example we shall only consider α < 3. Consider the situation illustrated in Figure A(a) in which all individuals are playing strategy 'old' except for the individual labeled a, the individual labeled b, and their mutual neighbor. Note that such a strategy profile can be reached irrespective of whether any of the individuals in the population are SI types. For example, starting from a profile at which every player plays 'old', it might be the case that individual a has an updating opportunity, makes a mistake (with probability ε), and switches to 'new' when his best response is to remain playing 'old'. Individual b could make a similar mistake, switching to 'new', following which the best response of the mutual neighbor of a and b, when called upon to update, would be to switch to 'new'.
Consider the individual labeled a in Figure A(a). Let o_a be the number of his neighbors who play 'old', and n_a be the number of his neighbors who play 'new'. If individual a is selected to update his strategy as an individual (whether he is an N or an SI type), he will gain from switching from 'new' to 'old' whenever αn_a < o_a. Now, at the strategy profile depicted in Figure A(a), o_a = 2 and n_a = 2, so this condition becomes α < 1, which is impossible. Hence a will remain playing 'new'. Now, consider the case that neighboring individuals a and b are both SI types and are selected to update strategies as a pair. They will both gain from switching from 'new' to 'old' if αn_a < o_a + 1 and αn_b < o_b + 1. As o_a = o_b = 2 and n_a = n_b = 2, these conditions both become α < 3/2. So, from the strategy profile depicted in Figure A(a), if α < 3/2, the presence of SI types can cause pairs of individuals within the set of individuals choosing 'new' to switch back to 'old'.
Now consider the individual labeled c in Figure A(a). Let o_c be the number of his neighbors who play 'old', and n_c be the number of his neighbors who play 'new'. If individual c is selected to update his strategy as an individual (whether he is an N or an SI type), he will gain from switching from 'old' to 'new' whenever αn_c > o_c. Now, at the strategy profile depicted in Figure A(a), o_c = 3 and n_c = 1, so this condition becomes α > 3. Hence c will remain playing 'old' unless α is very high. Now, consider the case that neighboring individuals c and d are both SI types and are selected to update strategies as a pair. They will both gain from switching from 'old' to 'new' if α(n_c + 1) > o_c and α(n_d + 1) > o_d. As o_c = o_d = 3 and n_c = n_d = 1, these conditions both become α > 3/2. So, from the strategy profile depicted in Figure A(a), if α > 3/2, the presence of SI types can cause additional pairs of individuals adjacent to the set of individuals choosing 'new' to switch to 'new'. The set of individuals playing the new technology can be expanded by coalitional better responses.
Hence, we see that for the network of overlapping triangles, an important threshold value is α = 3/2. For values of α below this threshold, SI slows the adoption of new technologies. For values of α above this threshold, SI speeds the adoption of new technologies. The argument above can be repeated almost word for word for the square lattice with von Neumann neighborhood with strategy profile and labeling of individuals given by Figure A(b). For precise, analytic results for vanishing mistake rates (ε → 0) we refer the reader to Newton and Angus (2013, 2015). The cited work also illustrates the difficulties in determining analytically the values of thresholds for general, more realistic social networks such as the ones used in the current study.
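The inequalities behind the α = 3/2 threshold can be checked numerically. The sketch below simply encodes the four conditions used in the discussion of the overlapping triangles network; the function names are hypothetical, and o and n denote the numbers of neighbours playing 'old' and 'new' before any switch.

```python
def individual_reverts(alpha, o, n):
    """A lone 'new' player gains from switching to 'old' iff alpha * n < o."""
    return alpha * n < o


def pair_reverts(alpha, o, n):
    """A member of a 'new' pair gains from a joint switch to 'old' iff alpha * n < o + 1."""
    return alpha * n < o + 1


def individual_adopts(alpha, o, n):
    """A lone 'old' player gains from switching to 'new' iff alpha * n > o."""
    return alpha * n > o


def pair_adopts(alpha, o, n):
    """A member of an 'old' pair gains from a joint switch to 'new' iff alpha * (n + 1) > o."""
    return alpha * (n + 1) > o


# Individuals a and b have o = 2, n = 2; individuals c and d have o = 3, n = 1.
for alpha in (1.2, 2.0):
    print(alpha,
          individual_reverts(alpha, 2, 2), pair_reverts(alpha, 2, 2),
          individual_adopts(alpha, 3, 1), pair_adopts(alpha, 3, 1))
```

Below the threshold only the conservative pairwise move is available, and above it only the reforming pairwise move is available, while neither individual move is available for any α between 1 and 3.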

Reproduction and selection within demes
Each period within a generation, the payoffs of each individual in a deme are determined by the payoffs in the game in Table A. For example, consider a period in which a deme is at technology level τ. Consider an individual within that deme who is playing 'new' and has three neighbours playing 'new' and two neighbours playing 'old'. The payoff of this individual will be 3α_τ/5, his average payoff across all of his neighbours. The fitness of the individual in a given generation is the sum of these payoffs across every period in the generation. A fitness vector is constructed in which the fitness of individual i, 1 ≤ i ≤ n, within the deme is the ith entry in the vector. The vector is normalized so that its entries sum to one. Then the value in the ith element of the vector gives the probability with which any given child born into the next generation will be the offspring of individual i. To illustrate, assume we have a normalized fitness vector (p_1, p_2, ..., p_n). Then, any given individual born into the next generation will, independently, be the offspring of individual i with probability p_i. With probability 1 − µ, the offspring of individual i will have the same type, SI or N, as individual i. However, with probability µ any given offspring will undergo a mutation and be the opposite type to her parent. In this manner, all n positions in the deme for the subsequent generation are filled.
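The reproduction step can be sketched as fitness-proportional sampling followed by independent mutation. This is a minimal illustration with a hypothetical function name; it assumes that total fitness in the deme is strictly positive and takes the accumulated fitness values as given.

```python
import random


def next_generation(types, fitness, mu, rng=random):
    """Fill all n positions of the next generation of a deme.

    types   : list of 'SI' / 'N' for the current generation
    fitness : list of accumulated payoffs, one per individual (sum must be > 0)
    mu      : probability that an offspring has the opposite type to its parent
    """
    total = sum(fitness)
    probs = [f / total for f in fitness]   # normalized fitness vector
    parents = rng.choices(range(len(types)), weights=probs, k=len(types))
    children = []
    for p in parents:
        child = types[p]
        if rng.random() < mu:              # mutation flips the type
            child = "N" if child == "SI" else "SI"
        children.append(child)
    return children


# Example with the benchmark mutation rate; fitness values are made up.
offspring = next_generation(["SI", "N", "N", "SI"], [3.0, 1.0, 1.0, 3.0],
                            mu=0.001, rng=random.Random(42))
```

Because each position is filled by an independent draw, a type can disappear from a small deme by chance alone, which is the genetic drift discussed below.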
Note that the replication here is nothing more than a discretized version of the replicator dynamic. Naturally, in the finite setting, there are some differences to the continuous population replicator dynamic. The clearest difference is that even without considering selection or mutation, when starting from any mixed population, the randomness in reproduction, genetic drift, can eliminate either of the types from the population. Genetic drift assists in creating variation in the proportions of types within demes, so that higher level selection between demes can then take place.
α_τ should be understood as the within-deme relative fitness benefit of using technology τ + 1 rather than technology τ. Payoffs and fitness within demes depend directly on α. Unlike the model of Bowles (2006), we do not assume the presence of an egalitarian food sharing norm. Such a norm, if it did exist, would enter our model through weaker within-deme selective pressure. Our simulations show that selective pressure within demes acts in favour of SI (Figure G). This effect would be weaker if egalitarian norms existed.

Deme extinction and survival
Each generation, any given deme, with probability η, faces an invader who is one of the other demes drawn at random. If the incumbent deme has higher technology than the invader, then nothing changes. If the invader has higher technology than the incumbent, then the incumbent deme is eliminated and its place is taken by a replica of the invading deme (same number of SI and N types, same technology level, same payoffs). If the incumbent and the invader have the same technology level, then the incumbent is replaced with probability one half.
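A single contest, and one round of contests across demes, can be sketched as follows. The text does not state whether contests within a generation are resolved sequentially or simultaneously; the sketch draws every invader from the start-of-generation state, which is an assumption, as are the function names and the dictionary representation of a deme.

```python
import copy
import random


def contest(incumbent, invader, rng=random):
    """Return the deme occupying the site after an invasion attempt."""
    if invader["tech"] > incumbent["tech"]:
        return copy.deepcopy(invader)      # incumbent replaced by a replica
    if invader["tech"] == incumbent["tech"] and rng.random() < 0.5:
        return copy.deepcopy(invader)      # tie broken with probability 1/2
    return incumbent                       # incumbent survives


def contest_round(demes, eta, rng=random):
    """Each deme faces a randomly drawn invader with probability eta."""
    result = list(demes)
    for idx in range(len(demes)):
        if rng.random() < eta:
            others = [j for j in range(len(demes)) if j != idx]
            invader = demes[rng.choice(others)]
            result[idx] = contest(demes[idx], invader, rng)
    return result


# Hypothetical demes described by technology level and number of SI types.
low = {"tech": 1, "n_si": 3}
high = {"tech": 3, "n_si": 10}
after_invasion = contest(low, high)
after_defence = contest(high, low)
```

Note that the type composition of a deme plays no role in the comparison; only the technology level is consulted.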
Note that α affects inter-demic contests only indirectly through its effect on technology adoption. It is possible to set up the model differently, so that rather than old and new technologies giving payoffs of 1 and α_τ respectively, they give payoffs of α_τ and α_{τ+1}, with higher values of τ associated with higher values of α_τ. Group fitness can then be made to depend on cumulative fitnesses of individuals in the deme. Test simulations indicated that this approach gives similar results to our chosen approach, but is significantly more computationally demanding.
We examine η = 0.05, 0.10, 0.20, with the middle value being our benchmark value. This value, η = 0.10, corresponds to a deme extinction rate of approximately η/2 = 0.05 per generation. This is less than the benchmark rate of 0.075 used in Bowles (2006). That is, the benchmark rates of 'conflict' used in the current study are lower than those used in the most comparable existing study. Furthermore, our results are robust to lower and higher conflict probabilities. We refer the interested reader to the Supporting Online Material to Bowles (2006) for arguments and references in support of such rates of conflict. Key citations therein include Frayer and Martin (2014); Hill and Hurtado (1996); Keeley (1996); Kelly (2000).
Importantly, unlike Bowles (2006); Choi and Bowles (2007), where a higher prevalence of altruistic types is assumed to lead to greater success in conflict, the type of individuals (SI or N) in the current study has no direct effect on probabilities of deme extinction or duplication. We do not, in the words of Sterelny (2014), 'lump the civic and the military virtues'. This means that, although the deme extinction and replacement events of the current study can be understood as the outcome of conflict, they can also be understood as representing the outcome of bloodless competition over land or resources in which higher technology gives an advantage, or simply higher survival (conversely, extinction) rates for demes with higher (conversely, lower) technology.

Deme extinction: examples
In Figure B the low-α regime is visualised. At the conclusion of generation 20 (Figure B a), four interdeme contests are material: the demes being successfully invaded are indicated by the × symbol, and the new, daughter, demes have been added to the top of the 'stack' and are encircled. In this case, since the new demes and the invaded demes exist at the same 'old' technology level for all contests A, B, C, D, the outcomes of the contests have been decided by an equiprobable random choice. At the conclusion of generation 20 we can also see several demes obtaining higher technology step levels of 2 and 3.
By generation 40 (Figure B b) a larger diversity of deme technology is apparent, with a degree of type-based superiority being established: only high N type fraction demes exist at technology step 4. Two material contests are apparent (A and B), both contests having been decided by technological superiority: both invaded demes sat at technology step 1 whilst the invading demes had advanced to technology step 3. Thus, the daughter demes, replacing the invaded demes, are created at technology step 3, a key part of the dynamics of technology and types in our model.
By generation 80 (Figure B c) almost all demes are dominated by N type individuals. Material interdeme competition between technologically unbalanced demes is still possible (as evidenced by event A) but such events, likely between demes of similar SI or N type fraction, will not result in material changes to the overall population fraction of SI or N types.
Hence, in these three 'frames', we can see the emergence of technologically superior, N type dominated demes, which, over time, via inter-deme contests, ultimately greatly reduce the number of SI types in the population. Any emergence of SI types in a given deme due to drift, will be short-lived, as for α less than the threshold value for these parameters, demes with high numbers of SI types suffer from the conservative effect of shared intentions (k > 1) and will eventually fall behind the technology frontier.
In Figure C an above threshold scenario is visualised, with benchmark conditions and α = 2.2 used. Again, we visualise a single replicate at generations 20, 40 and 80. In this case, generation 20 (Figure C a) reveals a wider range of technology in use across the demes owing to the higher rate of new technology adoption implicit in the higher α value. Nevertheless, already at the conclusion of generation 20 it can be seen that the incidence of demes dominated by N types is low, with higher technology steps more likely to be occupied by demes with mid to high numbers of SI types, leading, over time, to the eradication of demes with high numbers of N types.
By generation 40 (Figure C b), the dominance of SI types is entrenched, with the few remaining demes with high numbers of N types soon to be eradicated. By generation 80 (Figure C c) demes with few SI types are non-existent, with any remaining variation in type due to mutation and genetic drift.

Choice of benchmark parameters
... that of 12 linguistic zones, 9 are listed as having a number of languages and dialects in the range 45-103.
Our choice of n = 32 is similar to that in Bowles (2006) and is an estimate of the number of individuals who are able to breed in any given generation, that is, approximately one third of census size. In addition, our choice is informed by the survey of Hill et al. (2011), who analysed 32 modern hunter-gatherer societies (total 5,067 individuals) and found that mean 'band' size (adult members of a residential unit) was 28.2 (range 5.8 to 81.6).
The parameter d, which determines the number of edges in our scale-free graphs, as discussed in Section 4, was constrained by the computing power at our disposal, although it seems intuitively plausible that most individuals will be predominantly influenced by relatively few others (friends, family, hunting partners).
α is the focus of our treatments and is examined across a range of values that lead to both the evolution and non-evolution of SI.
Our choice of T = 2000 assumes that updating of actions is relatively rare, with the opportunity for some set of individuals to update arising every 3-4 days. We could have used larger values of T. Doing so would exaggerate the technological differences between demes that gain technology quickly and those that gain it slowly, increasing the selective effect at the inter-demic level.
We focus on small values of k because we wish to examine the evolution of collaboration and it is likely that the ability of pairs or small groups to share intentions would have to evolve prior to the ability of large groups to do likewise. Tomasello (2014) regards pairwise sharing of intentions as a special case as it only requires consideration of the first and second person (me and you) and not the third person (him). We agree with this reasoning and choose to make k = 2 our benchmark maximum coalition size.
The benchmark choice of ε = 0.05 was relatively arbitrary and corresponds to a mistake rate of one in twenty.
The mutation rate of µ = 0.001 is high, but still only corresponds to an average of two mutations occurring in the metapopulation every generation. Working with lower mutation rates lengthens the initial waiting time until homogeneous populations become heterogeneous, but does not change the dynamics which occur thereafter, which are the object of interest in the current study.
Finally, the conflict probability η was chosen to be less than previous estimates. That is, we do not require unreasonably strong deme-level selection to get our results. This is further discussed in Section 2.4.

Network type & Characteristics
Social networks play two important roles in the study. First, in each period individuals undertake pairwise productive activities with their neighbours, the product of their labours being determined by the coordination game in Table A. Second, up to k-vertex, connected subgraphs of SI types are able to jointly revise their strategy (see Figure 2 in the main paper). The same network informs both production and strategy revision. Unique social networks are generated afresh each generation for each deme, are always single-component and undirected, and are not altered during the generation.
We use 'scale-free' (SF) networks having an approximately power-law degree distribution (Barabasi and Albert, 1999). That is, the probability P(d) of a vertex having d adjacent neighbours decays as a power law, P(d) ∼ d^(−γ). These networks are known as SF since, for a certain range of γ, the average of the degree distribution does not converge: there is no 'characteristic' or 'expected' degree. In comparison to random networks, where the function P(d) decreases exponentially in d, the SF distribution exhibits so-called 'fat tails': a much higher mass is located at large degrees than is the case with the Gaussian distribution.
SF networks have been discovered in many social, biological and physical systems (for a review, see Barabasi, 2009). Whilst it is impossible to identify the network structure of long-gone civilisations, a recent, detailed study of the Hadza hunter-gatherers of Tanzania, a potentially representative Pleistocene-like culture, has demonstrated remarkable similarities between the social networks of the Hadza and modern social network characteristics (Apicella et al., 2012). For instance, P(d) was found to differ significantly from a random network distribution, with fat-tail phenomena present; ties were found to be strongly reciprocal (e.g. if A nominated B, implying A → B in g, then with high probability B nominated A, implying B → A in g, or simply A ↔ B in g; note that nominations were private); and ties were strongly assortative: high in-degree vertices nominated more social contacts, whilst vertices with high out-degree were more likely to be nominated. Together, these features point to SF networks as being a reasonable analogue.
To build the SF social networks, the Scale-Free Network Generator algorithm (Barabasi and Albert, 1999) was used, as implemented in Matlab by George (2007) and freely available online. Example SF social networks of size n = 32 and approximate average degree 4, 6 and 8 are visualised in Figure D. Common, average characteristics of the social networks used in the study are given in Table C, after measuring each characteristic over 10,000 replicates of the network generator at each average degree level. The last columns of the table include an equivalent ensemble characterisation of 10,000 random graphs (built by 100% random rewiring of a regular lattice) at approx. degree 6.
Comparison of the SF (study) networks (d ≈ 6) to the random graph data reveals the expected higher clustering (0.32 to 0.17), equivalent path length (2.1 to 2.1) and diameter (3.7 to 4.0), and much larger average maximal degree (16.0 to 9.8) in the SF networks relative to the random graphs. It is worth noting that, whilst we believe the choice of SF social networks to be the most appropriate given the considerations above, our previous work (Angus, 2013, 2015) demonstrates that conservative and reforming effects of k > 1 strategic revisions in social networks are observed in a wide range of network types including regular, 'small-world' and random versions of lattice networks, along with five empirical social networks drawn from a range of contemporary sources (n range 22 to 379).
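As an illustration of how such networks can be grown, the following is a minimal Python sketch of preferential attachment in the spirit of Barabasi and Albert (1999). It is not the Matlab generator of George (2007) used in the study; the function name `scale_free_network`, the parameter `m` (edges added per new vertex, giving average degree close to 2m) and the complete-graph seed are assumptions made for this example.

```python
import random

def scale_free_network(n, m, rng):
    """Grow an undirected, single-component graph by preferential attachment.

    n: number of vertices; m: edges attached by each new vertex (assumed parameter).
    Returns a dict mapping each vertex to the set of its neighbours.
    """
    adj = {v: set() for v in range(n)}
    stubs = []  # every vertex appears once per incident edge
    # Seed: a complete graph on the first m + 1 vertices.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    # Each new vertex links to m distinct existing vertices,
    # chosen with probability proportional to their current degree.
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

g = scale_free_network(32, 3, random.Random(1))
mean_degree = sum(len(nb) for nb in g.values()) / len(g)
```

With n = 32 and m = 3 the sketch yields 90 edges, i.e. a mean degree of 5.625, close to the d ≈ 6 benchmark. Because every new vertex attaches to the existing graph, the result is always a single component.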

Coalition formation
Strategy updating is either individual or coalitional. In coalitional updating, a pair (k = 2) or more (k > 2) of SI type individuals who are neighbours in the social network can share intentions and develop a coalitional better response. Within each replicate of our study, each social network g|_{n,d} is likely unique given the vast space of such networks. Hence, for each deme, at generation initialisation and after the social network has been generated, an algorithm is run to build a database of feasible coalitions. The resultant database is stored in a data structure which enables fast recall, since the strategy updating step is called once every period (i.e. 128,000 times per generation) with, for example, around 50% of these calls being coalitional in nature when k = 2.
For k = 2 the coalitional formation algorithm is straightforward: all edges in g between two SI types are viable coalitions and are added to the database.
For k = 3, the coalitions of size 2 are used as seed coalitions: each seed coalition is addressed in turn, adjacent SI type vertices are identified, and each new size 3 coalition is added to the database if not already present.
For larger k, the algorithm proceeds iteratively as for k = 3, building up from size 2 to size 3 and so on, until the full coalitional set for the given k is found.
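The iterative construction described above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' Matlab code; the names `build_coalitions`, `adj` (vertex to neighbour set) and `is_si` (vertex to SI flag), and the choice of a dictionary of sets as the database, are assumptions.

```python
def build_coalitions(adj, is_si, k):
    """Return {size: set of frozensets} of connected, SI-only vertex sets for sizes 2..k."""
    db = {}
    if k < 2:
        return db
    # Size 2: every edge of g joining two SI types is a viable coalition.
    db[2] = {frozenset((u, v))
             for u in adj for v in adj[u]
             if is_si[u] and is_si[v]}
    # Larger sizes: grow each coalition of the previous size by one adjacent SI vertex.
    for size in range(3, k + 1):
        grown = set()
        for seed in db[size - 1]:
            for u in seed:
                for v in adj[u]:
                    if is_si[v] and v not in seed:
                        grown.add(seed | {v})  # the set silently drops duplicates
        db[size] = grown
    return db

# A four-vertex path 0-1-2-3 in which vertex 3 is an N type.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
types = {0: True, 1: True, 2: True, 3: False}
database = build_coalitions(path, types, 3)
```

Storing coalitions as frozensets means the "add if not already present" check is handled by set membership. Every connected vertex set of a given size can be reached from a connected set one vertex smaller, so growing by one adjacent vertex at a time does not miss any feasible coalition.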
Naturally, for low SI fraction demes, the coalition database formation step is very fast. As the SI fraction approaches 1.0 and k ≫ 2, the algorithm can take several seconds, as the number of feasible coalitions becomes very large. However, for small k the algorithm is very efficient: coalition formation up to k = 3 with a 100% SI type, n = 32 SF network, plus 10,000 calls to the library, takes ∼ 3s on a single core machine.
Run-time library lookup proceeds first by an equiprobable choice of coalition size up to k. For example for k = 2, approx. 50% of the time a single individual will be selected for strategy revision, and approx. 50% of the time, a k = 2 coalition will be considered, but only if the starting individual is of SI type. Even in this case, a 2 member coalition may not be feasible because the SI type starting individual has no SI type neighbours in g.
Computationally, an alternative approach to pre-construction of the feasible coalitional set for a given g and k would be to conduct run-time coalition formation, i.e. randomly select some individual, and build a coalition of up to size k including that individual (if SI type, and having SI neighbours). However, simulation testing indicated that pre-defining the universe of feasible coalitions of up to size k for a given g and run-time look-up was around 11 times faster than run-time coalition formation.

Robustness
To explore the robustness of the study's main results to variations in the key parameters, a full-factorial survey of the convergence properties of the model under low (α = 1.2) and high (α = 4.0) rates of technological change was conducted. Four parameters were varied over a wide treatment set, given in Table D, leading to 81 experiments in all (see Table E for details).
In the low α treatments, individuals were randomly assigned to SI or N type with 0.50 probability at initiation, whereas in the high α treatments, experiments were initiated with a full-N type population, echoing the approach of the respective studies reported in the main paper. Each experiment was conducted over 20 unique random seeds (i.e. 1,620 model runs in all).
Since the focus of these experiments was to test whether the dynamics would eradicate, or fully establish, SI types in the population under low, or high, rates of technological change respectively, each replicate was stopped, with the generation number recorded, when the convergence criterion was met. Convergence required more than 75% of the demes (i.e. at least 49 of 64 demes) to each exhibit at least 90% population fraction of N types (low α treatments) or SI types (high α treatments) respectively at the conclusion of the generation. Convergence times (in generations) are given (mean, s.d., and range of 20 replicates) for all 81 treatments across both the low and high α settings in Table E. No replicate at any combination of parameters within the variable ranges specified failed to converge as defined within 2000 generations, with most replicates converging much faster. Figure 4 in the main paper illustrates the mean final SI type population fraction after 500 generations across a range of α values with the system initialised to equiprobable SI or N type, at the benchmark parameter settings.
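The convergence criterion used in the robustness experiments can be expressed compactly. The Python sketch below is illustrative only; the function name `has_converged` and its argument layout (one target-type fraction per deme) are assumptions.

```python
def has_converged(target_fractions, deme_share=0.75, type_share=0.90):
    """Check the stopping rule at the end of a generation.

    target_fractions: for each deme, the population fraction of the target type
    (N types in the low alpha treatments, SI types in the high alpha treatments).
    """
    qualifying = sum(1 for f in target_fractions if f >= type_share)
    # Strictly more than 75% of demes must qualify: at least 49 of 64.
    return qualifying > deme_share * len(target_fractions)

# 49 demes at 95% target type and 15 mixed demes meet the criterion.
example = [0.95] * 49 + [0.50] * 15
converged = has_converged(example)
```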

The long run
In Figure E, we repeat these simulations but allow them to run for a further 1,500 generations (i.e. to generation 2,000) to demonstrate the stability of the N and SI type regime under α of 1.2 and 2.2 respectively, which lie either side of the phase-transition point. To demonstrate the speed of SI type dominance under technology conditions above the transition point, the α = 2.2 experiment (panel b in Figure E) was initiated with the SI type absent from all demes. In Figure F we present long-run simulations of the slowest parameter combination identified in the robustness experiments for k = 2, namely, the {d = 8, η = 0.05, ε = 0.025} experiment (refer Exp 19 in Table E). As with the robustness experiments, initiation saw 50% and 0% SI types in the starting population for the low (α = 1.2) and high (α = 4.0) regimes respectively.
Since very low conflict and strategy updating mistake-rate probabilities in the low α experiment give rise to very mild population selection dynamics, we run 40 replicates at the low α regime and present, in Figure F panel a, the 25th-75th percentile range, along with the median, across all replicates for clarity. We find that even under such weak population selection dynamics as given in this experiment, the N and SI type regimes are nevertheless remarkably stable over the long run within the low and high technology gradient respectively.
Given that the full-factorial robustness exercise demonstrated that all other parameter combinations imply stronger selection dynamics than this treatment (shorter wait-times), it would seem that the predictions of the N and SI type outcomes under low and high values of α respectively are robust to specific choices of the parameters within the wide ranges tested.
Figure F (caption): Dashed vertical lines in each panel give the mean convergence wait time (as reported in Table E). a: The median (black line) and 25th to 75th percentile range (grey area) from 40 replicates is shown; b: The median (black line) and individual replicates (red line, 10 in all).

Placebo Trials
Here we report the results of placebo experiments to complement our main results. Placebo experiments serve two purposes: first, they provide one of the methods of model validation; and second, they provide information on the key drivers of the main results of a given model. We conduct placebo experiments under the benchmark parameter settings, for a below and above threshold α value over 500 generations, shadowing the trials reported in Figure 4 of the main study.
The two key stages of the model we enrol in the placebo experiments are the group and individual selection stages, represented by the deme-deme conflict and reproduction stages respectively (refer Figure 1 of the main paper). Deme-deme conflict was switched 'on' or 'off' simply by entering the deme-deme conflict module or skipping it respectively. Similarly, reproduction was switched from replicator dynamics (RD) as used in the study ('on') to uniform probabilistic selection in the placebo 'off' setting. In Figure G (left) we report the placebo results of a below-threshold treatment (α = 1.6), complementing Figure 4 (Phase I) in the main study. As can be seen in the figure, both placebo trials where deme-deme conflict was skipped obtained no SI or N type dominance across the population, irrespective of the use of uniform probabilistic or RD reproduction (Figure G a and b). By contrast, when the Deme Extinction stage was used as per the main study (c), without RD reproduction, the study results were recovered.
In Figure G (right) we report the placebo results of an above-threshold treatment (α = 2.2), complementing Figure 4 (Phase II) in the main study. Here, it is apparent that SI types can predominate in the metapopulation even without inter-demic selection. However, this dominance is further strengthened by inter-demic selection.
Next, we explore the robustness of the model's main results to two additional manipulations: i) interdemic migration; and ii) less than 100% replacement of members of an invaded deme by replicas of members of a successfully invading deme.

Migration
Amongst studies which explore the emergence of individual behavioural traits and include selection at an inter-demic level, migration between demes is sometimes considered. A recent survey of eight such papers finds that three of them consider migration but five do not (Table S1, Rusch, 2014). Migration of individuals amongst demes is of interest since any process which leads to the accumulation of substantial inter-demic diversity will be tempered or extinguished completely by inter-demic mixing (Bowles, 2006).
Migration is not part of our benchmark model. To explore the influence that migration might have, we add a random migration stage to the benchmark model. This takes place after the reproduction stage and prior to the next generation (refer Fig. 1 of the main paper). A single migration event was implemented as follows: 1. Choose two different demes at random, A and B; 2. Choose at random, individual i from deme A, and individual j from deme B; 3. Move individual i to deme B and individual j to deme A. Note that this only has an effect if i and j are of different types, in which case it is equivalent to changing the type of a chosen individual in each deme.
Since the migration process occurs at the end of each generation, no complications arise from within-generational concerns such as payoffs and social networks. At the beginning of the next generation, no distinction is made between new immigrants and the rest of the individuals in a deme.
We conduct experiments in which (1/2)Mmn migration events occur each generation. M is a new parameter, the migration probability, and m and n are the number of demes and the number of individuals per deme as per the benchmark model. Note that the migration process is necessarily neutral with regard to deme size and, as such, moves two individuals (not one) per migration event. Consequently, to allow for comparison with related studies, we include the 1/2 multiplier when computing the number of migration events per generation.
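The migration stage can be sketched in Python as below. The sketch follows the three-step swap described above and performs (1/2)Mmn events per generation; the function name `migrate`, the list-of-lists representation of demes and the rounding of the event count to the nearest integer are assumptions for illustration.

```python
import random

def migrate(demes, M, rng):
    """Swap randomly chosen individuals between randomly chosen pairs of demes, in place.

    demes: list of m lists, each holding the n type labels of one deme.
    M: migration probability. Returns the number of migration events performed.
    """
    m, n = len(demes), len(demes[0])
    events = round(0.5 * M * m * n)  # rounding is an assumption of this sketch
    for _ in range(events):
        a, b = rng.sample(range(m), 2)             # two different demes
        i, j = rng.randrange(n), rng.randrange(n)  # one individual from each
        demes[a][i], demes[b][j] = demes[b][j], demes[a][i]
    return events

rng = random.Random(0)
demes = [["SI"] * 16 + ["N"] * 16 for _ in range(64)]
n_events = migrate(demes, 0.05, rng)
```

At the benchmark sizes (m = 64, n = 32) and M = 5%, this gives 51 events per generation, moving roughly 102 of the 2,048 individuals. Because each event is a swap, deme sizes and the metapopulation type counts are unchanged.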
In Figs. H & I we present the results of the benchmark model with migration added at low and high values of α. We provide in each figure the benchmark results for comparison. Selection against SI for low α holds for M up to 2.5% and breaks down for M ≥ 5%, which is approximately equivalent to full mixing every twenty generations (Fig. H). For the high α case, selection for SI withstands any level of migration considered (Fig. I).
The level of migration under which selection against SI types is retained (for low α) is comparable to thresholds reported in related work. For example, in the study of García and van den Bergh (2011), where the benchmark scenario has no migration, a migration rate of 5% requires a dramatic increase in the benefit/cost ratio of the studied social dilemma game to ensure that 'altruists' are selected for.

'Brutality' in conflict and replacement
In the benchmark model, if a deme goes extinct it is replaced by a replica of the (usually higher technology) 'invading' deme. This can be thought of as the higher technology deme 'winning' a conflict scenario and effecting a policy of 100% 'brutality' on the loser (resulting in the complete replacement of the losing deme by the winning deme); or, one can think of the higher technology deme undergoing a fissioning process (resulting in the complete 'displacement' of the lower technology deme due to superior rates of productivity and associated fecundity). Either interpretation is appropriate for our study, as the type of individuals only affects inter-demic selection indirectly through technological advance (see Section 2.4 for further discussion). Nevertheless, amongst related studies which consider a 'conflict' interpretation of inter-demic interaction, the fraction of the losing deme which is replaced by the winning deme (the level of 'brutality') is sometimes varied away from 100%.
Of the six relevant studies considered in Table S1 of Rusch (2014), only that of Choi and Bowles (2007) departs from 100% brutality. In the cited work, brutality is determined endogenously as a function of the relative strengths of the conflicting demes, with resultant average levels of brutality as high as 40% reported for some equilibrium states. In transient states, brutality levels of 0% to 100% are employed. However, comparison is not straightforward due to the far more elaborate deme-deme interaction scenario employed in the cited work (see Section 7 of the Supplementary Online Content of the cited paper for details).
In the spirit of Choi and Bowles (2007) we implement a less-than-unity 'brutality' experiment which respects the relative 'strengths' (technology steps) of the demes. Let the degree of replacement of the population of successfully invaded demes be given by β. We introduce parameter a to vary the level of β which applies when a deme is successfully invaded by another deme which has a technology level δ steps higher.
β(a, δ) = 1 − 0.5^{aδ+1}.    (2)
In our benchmark setup, since a successfully invaded deme is replaced by a duplicate of the invading deme, all individual and group characteristics are copied over, including the technology step of the invading deme.
In keeping with this approach, we set a floor of 50% on β, such that the new deme is populated with at least 50% of the individuals from the invading deme, with the remainder of the individuals coming from the invaded deme. This majority of individuals from the invading deme bring with them the invading deme's level of technology. After every successful invasion, β is computed via (2), and a number (βn to the nearest integer) of individuals from the invading deme is chosen to replace the same number of randomly chosen individuals in the invaded deme. In both cases, random selection is conducted without replacement.
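The replacement rule can be illustrated with a short Python sketch. It reads (2) as β(a, δ) = 1 − 0.5^(aδ+1), which gives the stated 50% floor at δ = 0 and approaches 100% as aδ grows. The function names `brutality` and `invade` and the list representation of demes are assumptions for illustration, not the authors' implementation.

```python
import random

def brutality(a, delta):
    """Degree of replacement beta for responsiveness a and technology gap delta."""
    return 1.0 - 0.5 ** (a * delta + 1)

def invade(invader, invaded, a, delta, rng):
    """Return the new composition of an invaded deme after partial replacement."""
    n = len(invaded)
    count = round(brutality(a, delta) * n)  # beta * n to the nearest integer
    incoming = rng.sample(invader, count)   # chosen without replacement
    slots = rng.sample(range(n), count)     # positions to overwrite, without replacement
    new_deme = list(invaded)
    for slot, individual in zip(slots, incoming):
        new_deme[slot] = individual
    return new_deme

rng = random.Random(0)
winner = ["SI"] * 32
loser = ["N"] * 32
after_equal_tech = invade(winner, loser, a=1.0, delta=0, rng=rng)  # beta = 0.5
after_one_step = invade(winner, loser, a=1.0, delta=1, rng=rng)    # beta = 0.75
```

With n = 32, equal technology (δ = 0) replaces 16 individuals regardless of a, while a = 1 and δ = 1 replaces 24.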
We note that this approach to brutality/displacement is inherently stochastic. In particular, early in any given simulation the difference in the share of SI or N types between two demes is likely to be small, meaning that random selection is not at all guaranteed to lead to an increase in the type of individual that is represented in greater numbers by the invading deme. In this sense, whilst our floor of 50% brutality may appear high, the untargeted, stochastic nature of selection implies a much weaker form of transmission from 'winner' to 'loser'. Further note that although we test reduced brutality/displacement, we do not correspondingly increase the probability η of such events occurring. That is, the rate of replacement of individuals by invaders during the Deme Extinction stage drops below its benchmark value of η/2 = 5%.
Example evaluations of (2) are given in Fig. J. The specification implies that the brutality/displacement experienced by an invaded deme will be greater the more it lags behind the technology step of the invading deme. Parameter a enables control over the responsiveness of β to the difference in technology step, δ. Early in a simulation run it will typically be the case that δ = 0 since demes start level in technology. Consequently, the level of displacement is set to the 50/50 floor. As the simulation progresses, technology differences may emerge, leading to δ > 0 and consequently increasing β (when a > 0).
Again, the benchmark model was evaluated at the low- and high-α scenarios with a range of parameter values for a ∈ {0.0, ..., 2.0}. Figures K and L give the results of each scenario respectively. Selection against SI at low α is returned for a range of treatments up to a ≤ 0.25, whilst in the high α scenario there is selection for SI for the full range of a explored.
To understand in more detail the way that (2) applies within the low α regime, we provide, in Fig. M, density plots of realised β for early and late generations across each treatment where a > 0. As expected, early in the simulation (a, top panel) a non-zero fraction of β realisations are at the β = 0.5 or 50/50 brutality/displacement level due to interacting demes having equal technology. However, as the simulation progresses (b, bottom panel), inter-demic differences in technology become the norm, with β densities shifted to higher values. Note that even in the a = 0.5 treatment, where the benchmark result for low α is retained, β values are drawn from a wide support on the unit interval, away from unity, even late in the simulation. Hence, we conclude that the model's main result in this regime is robust to considerable variance in the displacement level applied.
Figure K: Robustness of selection against SI to 'brutality' < 100% at low α (α = 1.6). Results of a: SI population fraction; and b: technology rate over all 64 demes, 5 replicates and generations 451-500 mirror those of Fig. 4 in the main paper at benchmark conditions, but with β ('brutality') varied away from 100% by parameter a as indicated. Below the figure, average values of β are given over either the first or last 50 generations of the simulation run for a > 0. Benchmark results (β = 1.00) are given at left for comparison.

Numerical simulation
The model was implemented in the Matlab programming language and run using one of three versions (R2013b, R2014a or R2014b) as the project developed. The software was developed by the authors specifically for the project, and all visualisations and data analysis were likewise conducted within the Matlab environment. In addition to the main Matlab primitive toolsets, the parallel and statistics toolboxes were used to parallelise computations (see below) and to conduct fitness-biased selection of parents during RD or uniform intra-deme reproduction. Post-processing of figures was achieved with OmniGraffle. All simulations were conducted using the parallel toolbox with 8 to 24 threads on one of two MacPro platforms (2010 or 2014 models). Parallelisation took place at the level of demes, i.e. sending a random selection of deme computations to be run in parallel for one generation. Any calling instance of Matlab was run without a graphical user interface from the Mac OS X Terminal or X11 xterm, typically over ssh.
Random number control was achieved by initiating each replicate with the replicate number as the seed to the Matlab stream method mt19937ar (Mersenne Twister with Mersenne prime 2^19937 − 1). Stream initialisation ensured that any given replicate had identical, and reproducible, initial population conditions (initial individual SI or N type and technology choice, and assignment to a given deme). During generation initialisation, the network generator algorithm updated the stream with a unique, system clock-based seed prior to network generation, to ensure that our results did not depend in a sharp way on a particular sequence of networks being chosen from the vast global network space.
As an indication of run-times, utilising the 2014 MacPro with 12 cores (12 GB RAM, SSD HD), a generation under benchmark conditions (2,000 periods, 64 demes of 32 individuals) took 4.3s, implying that a single replicate of 500 generations (1 × 10^6 periods in all) as used, for example, in Figure 4 of the main paper, took approximately 36 min.