
Exploiting individual differences to bootstrap communication

Abstract

Establishing a communication system is hard because the intended meaning of a signal is unknown to its receiver when first produced, and the signaller also has no idea how that signal will be interpreted. Most theories for the emergence of communication rely on feedback to reinforce behaviours that have led to successful communication in the past. However, providing such feedback requires already being able to communicate the meaning that was intended or interpreted. Therefore these accounts cannot explain how communication can be bootstrapped from non-communicative behaviours. Here we present a model that shows how a communication system, capable of expressing an unbounded number of meanings, can emerge as a result of individual behavioural differences in a large population without any pre-existing means to determine communicative success or strong prior constraints on language structure. The two key cognitive capabilities responsible for this outcome are learning to behave predictably in a given situation, and an alignment of psychological states ahead of signal production that derives from shared intentionality. Since both capabilities can exist independently of communication, our results are compatible with theories in which large flexible socially-learned communication systems like language are the product of a general but well-developed capacity for social cognition.

Author summary

A fundamental question in the field of language evolution is how a communication system can emerge in a population that is initially unable to communicate. One theory is that human communication is the product of social cognition, particularly the ability of partners' minds to be focussed on the same thing, for example a concrete object or a more abstract sensation, during an interaction. We introduce a model that demonstrates that if such capabilities are sufficiently well-developed, an efficient communication system emerges as an inevitable product of agents learning intentional behaviour from each other. This system utilises all available signals, and can distinguish arbitrarily many ambiguous meanings to the maximum extent that constraints allow. These findings contrast with earlier studies, which suggested that some limited communicative feedback (such as hand gestures) might already need to be present to build a larger system. One reason why the possibility of bootstrapping communication from purely non-communicative capabilities was missed may be that the mechanism for its emergence is extremely subtle. It relies both on agents being confronted with considerable ambiguity when they interact, and on small differences in their behaviour, which are necessary for systematicity to grow.

Introduction

Human language is a large and flexible communication system that relates signals to meanings in a consistent manner across societies of up to millions of speakers [1,2]. These socially-learned mappings are established by convention [1,3–5]: in a communicative interaction, the signaller appeals to their experience to choose a signal that is likely to convey their desired intention, and the receiver does the same to draw a plausible interpretation. A fundamental question, relevant both to the acquisition [6–8] and origins [2,9–11] of language, is how agents can agree on which of a large repertoire of signals maps onto one of a potentially infinite number of meanings [12,13] when no convention has previously been established.

Many models of convention formation through social learning rely on feedback, such as pointing, to confirm whether a signaller’s intended meaning was correctly interpreted by the receiver [14–17]. In evolutionary game theoretic terms [18], feedback delivers a reward for success that allows lucky guesses between a pair of agents to be amplified into community-wide conventions. However, such accounts fall foul of a signal redundancy paradox [19]: the ability to provide this feedback presupposes that a convention for the intended or interpreted meaning already exists. For example, if someone rubbing their stomach is sufficient to inform a receiver that the signaller is in an otherwise unobservable state of hunger, using the word ‘hungry’ redundantly duplicates communication that is already possible.

Analyses of innately-specified animal call systems, such as that employed by vervet monkeys to evade different predator types [20], sidestep this difficulty by eschewing any commitment to internalised meanings [3]. Instead, these analyses are couched in terms of actions and reactions performed by signaller and receiver, respectively [21,22]. Nevertheless, predation is a response to pre-existing observable behaviour and communicates a failure to react appropriately to the signaller’s action in a manner that is at least as forceful as pointing. Emergence of communication through natural selection is therefore afflicted by the same signal redundancy paradox. The fundamental question we address in this work is whether there is any way to bootstrap communicative conventions without some form of communication already being in place.

The proposal that we pursue here is statistical learning of signal and meaning co-occurrences [23–25]. By reusing a signal that has, by chance, been produced more often in a given context, a convention might emerge without any overt agreement on its meaning. Returning to the earlier example, if several hours have passed since signaller and receiver last ate, both are likely to feel hungry at the same time, and a receiver may correlate a signaller’s stomach-rubbing gesture or the word ‘hungry’ with their own unobservable sensations independently of any pre-existing means to communicate them. Repeating this gesture when experiencing such sensations in the future might then allow its association with an intentional behaviour to grow. Communication would then be said to exist when such associations are shared across the society.

Although this type of statistical learning has proven powerful for children to learn a set of pre-existing conventions from adults [13,26,27], evidence of its ability to build a communication system from scratch is equivocal. On the one hand, iterated artificial language learning experiments have shown that human participants introduce systematicity into initially random mappings between words and objects, despite never receiving feedback about production errors [28]. Simulations replicate this finding in very small societies [29,30]. However, shared systems fail to get off the ground in larger societies [31]. Even with just two agents, the system that emerges lacks structure: a signal is assigned randomly to each meaning with no regard to utilising the full signal repertoire [32].

Taken together, these findings suggest that large societies of statistical learners might need to be equipped with strong expectations or constraints on language structure [6,33] to build rich and effective communication systems [34]. Examples of such constraints include an expectation that a word is more likely to describe a whole object rather than one of its parts [35], or that a novel term is more likely to describe a novel object [36]. An alternative is that various social cognitive mechanisms of shared intentionality [37] are sufficient to constrain the candidate meanings for a signal. Prominent among these is joint attention [38], whereby interacting agents are aware that they share a common focus of attention, which then provides candidate meanings for any signal that is produced. Neuroimaging studies [39,40] suggest that such correlated mental states can be created ahead of signal production, indicating that if communication emerges through such interactions, it can be bootstrapped from non-communicative capabilities, rather than relying on a pre-existing channel for communicating success or failure.

In pursuing this hypothesis, we further determine how tightly attention needs to be constrained to a small range of candidate meanings for communication to emerge, and how much the attention of interacting agents needs to overlap. To this end, we construct a model, described in Methods below, that allows for agents’ attention to different possible meanings to be carefully controlled. To aid interpretability of the results, we further ensure that the mechanism for learning any associated signalling behaviour is grounded in the solid principles of Bayesian inference [41,42] and information theory [43], as opposed to inventing ad-hoc rules. The final key component of this model is that agents behave predictably in a given situation [7,8,44–46].

By applying a mathematical analysis, we compare how shared communication emerges in three different scenarios, and find that a qualitatively different cultural evolutionary mechanism operates in each case. When joint attention is tightly constrained to a single meaning, we find that signals become conventionalised by neutral evolutionary forces analogous to genetic drift [47]. This explains the lack of structure previously identified in this regime [32]. When attention is entirely unconstrained, and feedback is used to resolve the resulting ambiguity, signals that are most likely to be correctly interpreted are directly selected for. On the other hand, when candidate meanings are weakly constrained, a shared communication system can emerge as long as interacting agents distribute their attention over these candidate meanings in a sufficiently similar way. However, the selection pressure here is indirect, and arises only if there is sufficient variability in individual behaviour across the population. We further find that this mechanism for bootstrapping communication from non-communicative social cognitive capabilities scales to arbitrarily many meanings and signals, no matter how large the society, contrary to what was previously suggested by simulations [31].

Methods

We begin by setting out the principles behind the model that we introduce in this work, along with the details as to how they are implemented.

Distribution of attention

Intentional signalling behaviour evolves over time through repeated interactions between a pair of agents drawn from a society of N agents. In the simulation results shown below, these pairs are drawn uniformly at random (the interaction network is a complete graph), but the analytical results hold for a much wider class of interaction networks (see S1 Appendix).

In an interaction, one agent is designated the signaller (and labelled i) and the other the receiver (j). Both have their attention distributed over a set of M possible meanings. We denote the amount of attention given by the signaller to meaning m as $w^{(i)}_m$ and the amount given by the receiver as $w^{(j)}_m$. See Fig 1. Note that the attention to a given meaning varies between interactions: we have, however, suppressed an explicit dependence on time to keep the notation manageable. These attentional weights are normalised so that they always sum to unity for each agent in every interaction: $\sum_m w^{(i)}_m = \sum_m w^{(j)}_m = 1$.

Fig 1. Attention of signaller and receiver to specific meanings.

In each interaction, signaller i and receiver j distribute their attention over a set of meanings (shapes) with varying weights (indicated by size). Variation between interactions at different times for the same agent is quantified by the certainty $C_i$; variation between signaller and receiver at the same time by the alignment $A_{ij}$.

https://doi.org/10.1371/journal.pcsy.0000078.g001

There are two important statistical properties of the distribution of attention. The first of these, the certainty $C_i$, quantifies how much the attention to specific meanings varies between interactions. For example, in the earlier interaction shown in Fig 1, the signaller is attending more strongly to the duck than in the later one. The weight $w^{(i)}_m$ determines the probability that they select meaning m as the topic of the interaction, that is, the meaning that they wish to communicate. When attention varies strongly between interactions, it must necessarily also vary between meanings within each interaction, as the total amount of attention is fixed. This means we can also interpret the certainty as a measure of how strongly constrained an agent’s beliefs are about the meaning of a signal produced in an interaction. For example, where strong constraints (such as whole-object or mutual exclusivity constraints [35,36]) are operating, attention is likely to be drawn to a small number of meanings that are specific to a given interaction.

In accordance with the above, we define the certainty as

$$C_i = \frac{\sum_m \mathrm{Var}\left[w^{(i)}_m\right]}{\sum_m \mathbb{E}\left[w^{(i)}_m\right]\left(1 - \mathbb{E}\left[w^{(i)}_m\right]\right)} \qquad (1)$$

where the expectation value and variance are taken over all possible interactions involving agent $i$. The numerator specifies how much more likely it is for two topics to be the same if they are sampled with replacement from the same set of attentional weights (i.e., at the same time) than if they are drawn from two independent sets of attentional weights (i.e., at different times). The denominator is the maximum value that the numerator can take (occurring when attention is always focussed on a single meaning). Thus $C_i$ is normalised to the range $[0,1]$, with $C_i = 0$ corresponding to the situation where the attention distribution is static and completely unconstrained, and $C_i = 1$ to where it is maximally variable and tightly constrained. We assume that all agents experience the same overall level of certainty, i.e., $C_i = C$, a common value for all agents $i$.
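As an illustrative check, the certainty can be estimated empirically from many draws of a single agent's attentional weights. The sketch below is ours, not the authors' code, and assumes the variance-ratio reading of Eq (1) described above:

```python
import numpy as np

def certainty(weight_samples):
    """Empirical certainty from repeated draws of one agent's attentional
    weights (rows = interactions, columns = meanings): the summed
    per-meaning variance, divided by its maximum possible value given the
    mean weights (attained when attention is always on a single meaning)."""
    var = weight_samples.var(axis=0).sum()
    mean = weight_samples.mean(axis=0)
    return var / (mean * (1.0 - mean)).sum()
```

A static distribution yields a certainty of 0, while attention that is always focussed on a single (varying) meaning yields 1.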

The proposed mechanism for the emergence of communication without feedback is that the signaller’s behaviour might correlate with the meanings that the receiver is attending to. The second key property therefore is how similar the signaller’s and receiver’s attention distributions are. For example, in Fig 1, although the weights are not exactly the same, we see that both agents attend more to the duck in the first interaction than in the second, while they attend more to the rabbit in the second than the first. Mechanisms of shared intentionality [37], including joint attention [30,38], can generate this alignment of the attentional weights.

We quantify the level of alignment between different agents i and j with the parameter

$$A_{ij} = \frac{\sum_m \mathrm{Cov}\left[w^{(i)}_m, w^{(j)}_m\right]}{\sum_m \mathrm{Var}\left[w^{(i)}_m\right]} \qquad (2)$$

which is normalised so that $0 \le A_{ij} \le 1$. As with the certainty, we assume that the alignment between any pair of agents takes the same value, $A_{ij} = A$ when $i \ne j$. Although not exactly the same as Pearson’s correlation coefficient, its interpretation is similar. First, the numerator equals the denominator when both agents always direct their attention in exactly the same way over the set of meanings (i.e., $w^{(i)}_m = w^{(j)}_m$ for all m, always): this is the maximum alignment that can be achieved given the underlying level of certainty, A = 1. On the other hand, if there is no systematic relationship between the signaller’s and receiver’s attentional weights, A = 0. In this case, there is no way that the signaller’s observable behaviour could vary systematically with the receiver’s attention, and we expect it to be impossible for communication to emerge in the absence of communicative feedback.

In the following, we assume that each agent attends to meaning m with overall weight $\bar{w}_m$ when averaged over many interactions; that is, there is no long-term difference between agents in how they distribute their attention, only in when they attend to each meaning. Our mathematical treatment allows this average weight to vary arbitrarily between meanings, although we restrict to the case of the uniform distribution $\bar{w}_m = 1/M$ in simulations and when analysing the emergence of communication. To realise a specific combination of C and A in simulations, we drew the set of weights for the signaller from a Dirichlet distribution chosen so that the desired variance in (1) was obtained. Then, with probability A we assigned exactly the same set of weights to the receiver; otherwise (with probability 1–A) we sampled an independent set of weights from the same Dirichlet distribution. Note, however, that our mathematical analysis does not rely on this specific construction and, in the three regimes we consider, applies to any distribution of weights with the same statistical properties.
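The sampling procedure just described can be sketched as follows. This is an illustration, not the authors' code: the closed-form concentration $\kappa = (1/C - 1)/M$ is our assumption, valid for a symmetric Dirichlet with uniform mean weights under the variance-ratio definition of certainty given above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_attention(M, C, A, rng):
    """Draw signaller and receiver attentional weights over M meanings with
    target certainty C (0 < C < 1) and alignment A.

    For a symmetric Dirichlet(kappa), sum_m Var[w_m] = (1 - 1/M)/(M*kappa + 1),
    so kappa = (1/C - 1)/M realises certainty C.
    """
    kappa = (1.0 / C - 1.0) / M
    w_sig = rng.dirichlet(np.full(M, kappa))
    if rng.random() < A:                       # with probability A: shared attention
        w_rec = w_sig.copy()
    else:                                      # otherwise: independent draw
        w_rec = rng.dirichlet(np.full(M, kappa))
    return w_sig, w_rec
```

With A = 1 the two agents always receive identical weights; with A = 0 their weights are statistically independent.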

Topic selection and signal production

As noted above, the signaller i selects the topic of the interaction by sampling from their attentional weights $w^{(i)}_m$. They are further equipped with a memory of previous interactions, partitioned according to the meanings that they experienced as receivers by employing the Bayesian inference strategies set out below. This partitioning is illustrated in Fig 2 through the lower row of boxes. Each circle within a box corresponds to a memory of a signal previously interpreted as the corresponding meaning, with more recent memories being stronger (larger circles) than more distant ones. The size-weighted distribution over signals for meaning m is denoted $\phi^{(i)}_{m,s}$, normalised so that $\sum_s \phi^{(i)}_{m,s} = 1$. A central assumption in the model is that agents are motivated to behave predictably [7,8,46]. Concretely, this means that agents conform to their own beliefs about each other’s behaviour, and produce signal s for meaning m with probability $\phi^{(i)}_{m,s}$. Note that although the signal is illustrated in Fig 2 as a verbalisation, it could be any other intentional behaviour, such as a facial expression or a hand gesture.

Fig 2. Topic selection and signal production.

The signaller samples a topic (here, the rabbit) in proportion to the attentional weight (size of the corresponding shape). The lower row of boxes indicates memory of which signal (different coloured circles) was previously interpreted by the signaller as having the corresponding meaning. Only the memories related to the topic are relevant (irrelevant memories are shaded). Memories decay over time, the size of each circle corresponding to the strength of the memory that remains. The signaller samples a signal in proportion to its strength in the memory. Here a blue circle is sampled, which corresponds to the verbalisation ‘oyb’.

https://doi.org/10.1371/journal.pcsy.0000078.g002
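The two sampling steps described above (topic, then signal) can be sketched in Python. The array `phi[m, s]`, holding an agent's signal-production probabilities with rows summing to one, and the function name are our illustrative choices:

```python
import numpy as np

def produce(w_sig, phi, rng):
    """Sample a topic in proportion to the signaller's attentional weights,
    then a signal in proportion to the remembered strength of the
    association between that topic and each signal."""
    topic = rng.choice(len(w_sig), p=w_sig)          # select the topic
    signal = rng.choice(phi.shape[1], p=phi[topic])  # select a signal for it
    return topic, signal
```

When attention is concentrated on a single meaning and the memory for that meaning is deterministic, the outcome is fixed; otherwise both steps are stochastic.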

Signal interpretation

The task faced by the receiver is the inverse to that of the signaller: instead of choosing a signal to represent a meaning sampled from the attentional weights, they must infer the topic given the signal and their own attentional weights. Bayesian inference [41,42] provides a natural mathematical framework for performing such probabilistic reasoning and is thus a widely adopted paradigm in cognitive science.

Here, it manifests as the receiver’s attentional weights $w^{(j)}_m$ constituting a set of prior beliefs about the meaning of the signal they have encountered. When these are weighted by the receiver's estimate $\phi^{(j)}_{m,s}$ of how likely it is that a member of their society would use signal s to convey meaning m (which, as noted above, corresponds also to the probability that they themselves would use this signal for that meaning), we obtain the posterior distribution over interpretations

$$P^{(j)}(m \mid s) = \frac{w^{(j)}_m\, \phi^{(j)}_{m,s}}{\sum_{m'} w^{(j)}_{m'}\, \phi^{(j)}_{m',s}} \qquad (3)$$

See Fig 3. This posterior distribution accounts in a principled way both for the receiver’s expectations as to likely topics of conversation and their experience of signalling behaviour in their society.

Fig 3. Signal interpretation.

The receiver interprets the signal (here, the vocalisation ‘oyb’) by focussing on all memories of that signal’s production in the past (blue circles). Memories of other signals are irrelevant (hence shaded). These memories are combined with the receiver’s attentional weights by scaling their size in proportion to those weights (shown as each box, and its contents, scaled accordingly). The interpreted meaning (here, cat) is sampled according to the resulting combined weight.

https://doi.org/10.1371/journal.pcsy.0000078.g003
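The interpretation step can be sketched as a prior-times-likelihood computation followed by sampling. As before, the array `phi[m, s]` and the function name are our naming:

```python
import numpy as np

def interpret(signal, w_rec, phi, rng):
    """Combine the receiver's attentional weights (prior over meanings)
    with their estimate phi[m, s] of how signals are used (likelihood),
    normalise, and sample an interpretation from the posterior."""
    post = w_rec * phi[:, signal]
    post = post / post.sum()
    return rng.choice(len(w_rec), p=post), post
```

If the agent's memory assigns equal probability to the signal under every meaning, the posterior reduces to the attentional prior, as expected.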

Predicting signalling behaviour with finite memory

At the heart of any model for the cultural evolution of signalling behaviour is a process by which signalling behaviour is learned (see e.g. [17] for a survey). Although details vary considerably, the unifying premise is that when a signal is produced in the context of a given meaning, a memory of this is retained such that the same signal is more likely to be produced in the future when the agent seeks to communicate that same meaning. Here we follow the principled approach of [48], which is also grounded in Bayesian inference. The idea is that, given a sequence of T signals that have been interpreted with meaning m, and a prior distribution over the S available signals, the agent computes the posterior predictive distribution [42] that expresses how likely it is that the next signal with meaning m will be s. This furnishes the distribution over the signaller’s productions, $\phi^{(i)}_{m,s}$, introduced above. A limitation of the formalism of [48] is that it is restricted to the case of batch learning: that is, an agent acts only as a receiver while being exposed to the T signals, after which they mature and act only as a signaller, producing signals from an unchanging distribution. In the present work, we require agents to act as signallers and receivers concurrently, engaging therefore in online learning and with their signalling behaviour evolving over time. Moreover, we also seek to build in a finite-memory constraint, such that older interactions carry less weight than newer ones. This feature is important to avoid the system becoming trapped in suboptimal states [17].

The prior that is adopted by [48] is a Dirichlet distribution over sets of frequency estimates $\hat{\phi}^{(i)}_{m,s}$ for each meaning m and agent $i$. Here we use a hat to distinguish these conjectured frequencies from the posterior predictive distribution $\phi^{(i)}_{m,s}$. We assume that all signals are initially considered interchangeable, which implies a prior of the form

$$P\left(\hat{\phi}^{(i)}_m\right) \propto \prod_s \left(\hat{\phi}^{(i)}_{m,s}\right)^{\alpha - 1} \qquad (4)$$

where we have omitted the normalisation to avoid undue notational clutter. The parameter α determines how much variability an agent expects in signalling behaviour, with small values of α corresponding to an expectation that one signal is very much more likely to be used than any other for a meaning, while larger values of α allow for multiple signals to be used variably. The value of α thus quantifies the strength of a prior constraint on the structure of a nascent communication system.

For batch learning, the likelihood of a sequence of signals being produced is obtained by multiplying the conjectured frequency of each one:

$$P\left(s_1, \ldots, s_T \mid \hat{\phi}^{(i)}_m\right) = \prod_{t=1}^{T} \hat{\phi}^{(i)}_{m,s_t} \qquad (5)$$

The posterior predictive distribution is then obtained by averaging the frequency over the posterior obtained by multiplying the prior and the likelihood (and normalising appropriately) [48].

To incorporate memory loss, we appeal to information theory, specifically, that in an optimal encoding of the conjectured signal distribution, a memory of signal s consumes $\log_2\left(1/\hat{\phi}^{(i)}_{m,s}\right)$ bits of memory [43]. Suppose now that a random fraction λ of the bits in the memory are deleted whenever new information is stored. Then, the string that was stored for a signal s gets converted into a memory with the shorter length $(1-\lambda)\log_2\left(1/\hat{\phi}^{(i)}_{m,s}\right)$, and thus corresponds to a higher frequency $\left(\hat{\phi}^{(i)}_{m,s}\right)^{1-\lambda}$. This corresponds to replacing each frequency $\hat{\phi}^{(i)}_{m,s_t}$ in the likelihood (5) with the modified frequency $\left(\hat{\phi}^{(i)}_{m,s_t}\right)^{\epsilon_t}$ where $\epsilon_t = (1-\lambda)^{T-t}$.

Given these definitions, one finds (by using the properties of the Dirichlet distribution [48]) that after T interactions, the posterior predictive distribution is

$$\phi^{(i)}_{m,s}(T) = \frac{\alpha + \sum_{t=1}^{T} u_t\, n_{s,t}\, (1-\lambda)^{T-t}}{S\alpha + \sum_{t=1}^{T} u_t\, (1-\lambda)^{T-t}} \qquad (6)$$

Here, $n_{s,t} = 1$ if the signal produced in interaction t is s; otherwise it is zero. Meanwhile, $u_t = 1$ if the agent chose to store information about the interaction at time t; otherwise it is zero. By default, a receiver stores information after every interaction. If desired, we can also incorporate communicative feedback into this process by storing new information only when communication is a success: in such a case, we take $u_t = 1$ only if the interpretation m matches the signaller's topic. If it is a failure, memory of previous signals is assumed to decay through the same mechanism as above, but nothing is stored to replace it. The numerator of (6) can be recognised as a set of counts of interactions in which s was the signal produced and m was the meaning subsequently inferred, these counts weighted exponentially by the time that has passed since that interaction. The denominator ensures normalisation. The case considered by [48] corresponds to no memory decay, $\lambda = 0$, in which all the weights in the sum are equal to one.

It is more helpful for both simulation and mathematical analysis to work with the change in the posterior predictive distribution that occurs as a result of the interaction that takes place at time T. By subtracting (6) at two successive time points, we find

$$\Delta\phi^{(i)}_{m,s} = \frac{u_T\left(n_{s,T} - \phi^{(i)}_{m,s}\right) + \lambda S\alpha\left(\frac{1}{S} - \phi^{(i)}_{m,s}\right)}{S\alpha + \sum_{t=1}^{T} u_t\,(1-\lambda)^{T-t}} \qquad (7)$$

When the rate of information loss is small, the sum in the denominator converges to $\bar{u}/\lambda$ as $T \to \infty$, where $\bar{u}$ is the probability that $u_t = 1$, conditioned on m being interpreted. When there is no communicative feedback, $\bar{u} = 1$ always, whereas when feedback is available, its value depends on the state of the system and changes over time.

To interpret the update rule (7), it is helpful to represent it pictorially—see Fig 4. The prior is represented by the circles with thick outlines. Since the prior represents the information available to an agent before they learn from the signallers they interact with, we assume that this cannot be forgotten, and thus these circles retain their initial size at all times. The remaining circles represent memories of signals that have been interpreted with the meaning under consideration. These shrink in size each time that meaning is interpreted. In the absence of feedback, or when feedback is present and the interaction was a success, a new circle of unit size and colour corresponding to the signal that was produced is entered, thus reinforcing the association between meaning and signal. Only when we assume that communicative feedback is already present and the interaction was a failure is nothing entered into memory: this results in a reversion to the prior uniform distribution over signals.

Fig 4. Retention of an interaction in memory.

After the receiver has drawn an inference (here, cat) given the signal they have encountered (here, the vocalisation ‘oyb’), a memory of the interaction is retained. The default is to reinforce the association between the signal and interpreted meaning, operationalised by shrinking the size of memories from earlier interactions, and inserting a unit-sized memory corresponding to the signal (here, a blue circle). When feedback is available, an alternative is to decline to store the memory due to a mismatch between the interpretation and the intended topic. Previous memories shrink, yielding a reversion to the uniform distribution (right), which occurs because prior knowledge corresponds to circles of fixed equal size (heavy outlines) in this representation.

https://doi.org/10.1371/journal.pcsy.0000078.g004
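The decayed-count bookkeeping behind Fig 4 can be sketched directly. This is our illustrative reading: `counts[m, s]` plays the role of the exponentially weighted memories, while the uniform prior pseudo-counts α are kept separate and never decay, matching the fixed-size circles in the figure:

```python
import numpy as np

def update_memory(counts, meaning, signal, lam, stored=True):
    """Finite-memory update: existing counts for the interpreted meaning
    decay by a factor (1 - lam); if the interaction is stored, a unit
    count is added for the produced signal. Declining to store (a failure
    under feedback) decays memories without replacement."""
    counts = counts.copy()
    counts[meaning] *= (1.0 - lam)        # older memories shrink
    if stored:                            # reinforce signal-meaning pair
        counts[meaning, signal] += 1.0
    return counts

def predictive(counts, alpha):
    """Posterior predictive phi[m, s]: decayed counts plus a uniform
    Dirichlet prior of strength alpha, normalised per meaning."""
    num = counts + alpha
    return num / num.sum(axis=1, keepdims=True)
```

Repeated pairing of one signal with one meaning drives the predictive probability towards 1, while meanings never interpreted stay at the uniform prior.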

Similar update rules have been applied in related works, albeit without the Bayesian and information-theoretic underpinnings that we have taken time to emphasise here. For example, a linear combination of new and existing behaviour, subject to linear biases, was advanced in [49] on the basis of simplicity, and yields the same update when the topic is tightly constrained to a single meaning. A benefit of the current approach is that it extends to multiple arbitrarily constrained meanings with no further assumptions. Likewise, [17] presents an urn model similar to that shown in Fig 4, albeit with an ad-hoc deletion rule to model memory loss, rather than reversion to the prior. More generally, the first term in (7) corresponds to a classic reinforcement learning update [50]. In this framework, failing strategies have sometimes been suppressed by applying a negative reinforcement, i.e., inverting the sign of the first term. However, this is problematic, because it can lead to the creation of negative probabilities, which has to be prevented in some way [51]. Here, reversion to the prior has the same effect as suppressing failing strategies, but without such problems arising. We further note that the underlying Bayesian computation is in principle highly computationally demanding, because to implement it fully would require agents to maintain probability distributions over all possible combinations of signal frequencies. This feature sometimes draws criticism of Bayesian inference as a model for human cognition [41]. Here, we see that the algorithm for updating the posterior predictive distribution requires agents only to keep track of association strengths between each signal and meaning and to apply the linear rule (7).

For reference in the following, we summarise the key parameters and variables in the model in Table 1.

Table 1. Parameters and variables in the model of signal emergence.

https://doi.org/10.1371/journal.pcsy.0000078.t001

Communicative gain

In the initial condition, agents use every signal with equal probability for every meaning: $\phi^{(i)}_{m,s} = 1/S$ for all $i$, s and m, because the prior distribution in the Bayesian procedure for estimating signal frequencies is invariant under exchange of signals. Such a state is non-communicative, as the signalling behaviour is identical no matter what the topic. To assess whether communication has emerged as agents repeatedly interact through the sequence of steps described above, we require a measure of communicative gain.

One possibility is the frequency with which the receiver’s inference matches the signaller’s topic. However, these can be the same for reasons incidental to communication. An extreme case of this is where both agents are focussed on the same meaning in every interaction (i.e., C = A = 1). Then, the receiver will always correctly identify the topic even in the non-communicative state.

We therefore use instead the probability $p_s$ of a blind success, that is, the probability that two agents picked uniformly at random (regardless of the actual social network structure) are able to communicate a topic that is also chosen uniformly (regardless of the actual distribution of attentional weights). This measure, defined as

$$p_s = \left\langle \frac{1}{M} \sum_m \sum_s \phi^{(i)}_{m,s}\, \frac{\phi^{(j)}_{m,s}}{\sum_{m'} \phi^{(j)}_{m',s}} \right\rangle_{i \ne j} \qquad (8)$$

therefore assesses the potential of the communication system to resolve ambiguity in even the most challenging situation.

It is helpful when considering simulation results to scale this success probability such that, for a given number of meanings M and signals S, the non-communicative state maps to 0 whilst its maximum value of $\min(1, S/M)$ maps to 1. We call this scaled measure the communicative gain G:

$$G = \frac{p_s - 1/M}{\min(1, S/M) - 1/M} \qquad (9)$$

We note that there are other possible metrics that might better capture the structure and information content of the signalling system that emerges, for example, those based on such information-theoretic measures as Kullback-Leibler divergence or mutual information. Since here we focus mostly on the simplest possible cases (such as all meanings having equal probability), communicative gain is sufficient for our needs.
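The blind-success and gain computations can be sketched for a single pair of agents. This is our reading of the measure (uniform topic, uniform receiver prior); it assumes every signal has non-zero estimated frequency, as guaranteed by a prior strength α > 0, and the maximum value min(1, S/M) used in the rescaling is an assumption:

```python
import numpy as np

def blind_success(phi_i, phi_j):
    """Probability that receiver j (uniform prior over meanings) correctly
    infers a uniformly chosen topic signalled by i. phi[m, s] holds each
    agent's signal-production probabilities, rows summing to one."""
    M = phi_i.shape[0]
    interp = phi_j / phi_j.sum(axis=0, keepdims=True)  # P(m | s) under a uniform prior
    return (phi_i * interp).sum() / M

def gain(p, M, S):
    """Rescale so the non-communicative state (p = 1/M) maps to 0 and the
    best achievable blind success min(1, S/M) maps to 1."""
    p0, pmax = 1.0 / M, min(1.0, S / M)
    return (p - p0) / (pmax - p0)
```

A shared one-to-one signal-meaning mapping gives a gain of 1, while the uniform initial condition gives 0.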

Results

We now determine how and when communication systems emerge under three different assumptions as to how candidate meanings for a signal are constrained in an interaction. First, we examine the case where constraints on a signal are so tight that the topic can be identified by the receiver in every interaction. Then, we turn to the opposite extreme, where there are no such constraints and the topic can be determined only by resorting to feedback. Finally, we consider the case of greatest interest, where constraints are weak but the alignment between agents is sufficiently high that communication can nevertheless emerge in the absence of feedback.

Tight constraints through prior expectations

As noted above, when the certainty C and alignment A both equal 1, the constraints on candidate meanings are so tight that both the signaller’s and receiver’s attention is always focussed on a single common meaning. This could occur for example because some unexpected and dramatic event (like a clap of thunder) prompted the interaction, or through the application of a mutual-exclusivity constraint that has ruled out all other possible meanings for the signal that has been produced [27,36].

Simulations of the model under these conditions tend to exhibit communication systems in which only some meanings have a dominant signal, that is, one that is used with a substantially higher frequency than any other signal across the society. Furthermore, some signals do not dominate any meaning. Fig 5 shows how these dominant signals are distributed in single realisations of the emergence process for various combinations of the prior strength α and the number of meanings M. Such communication systems have a suboptimal communicative gain, because some signals are effectively unused. A similar result was reported by [32], who further observed that the success rate achieved is close to that found if one assumes that each meaning is expressed by a single randomly-assigned signal.

Fig 5. Communication systems that emerge under tight meaning constraints.

Patch shading indicates the frequency with which each signal s is used to convey meaning m, averaged across a society of N = 5 agents. In all cases C = 1, A = 1 and S = 12. Meanings increase from M = 14 (leftmost column) to M = 36 (rightmost column). The prior strength α is larger in the upper row than in the lower row. A single signal is said to dominate a given meaning when its usage frequency exceeds a threshold. Signals have been ordered by the number of meanings for which they dominate, and meanings ordered so that they are adjacent when the same signal dominates. The horizontal line indicates the boundary between those signals that dominate at least one meaning and those that do not dominate any meaning. The vertical line indicates the boundary between meanings that are dominated by a signal and those that are not. We see that more signals dominate as the prior strength is reduced.

https://doi.org/10.1371/journal.pcsy.0000078.g005

We can explain this finding by deriving, from the update rule (7), a stochastic differential equation for an agent's estimate of the frequency with which each signal is used for a given meaning. In S1 Appendix, we show that the resulting equation is

(10)

where the final term is a random function that models fluctuations around the average behaviour given by the first term. The equation relates an individual agent's frequency of a signalling strategy to the corresponding frequency averaged across the society of interacting agents. It is valid when the set of signallers that any receiver interacts with is representative of that wider society, which seems to be true of social networks that have a sufficiently high density of long-range connections (see S1 Appendix).

The significance of this equation is that it can be recognised as describing the dynamics of Wright's island model from population genetics [52]. This is a neutral model [47], in which the first term in square brackets describes passive migration between an island (which here corresponds to a single agent) and a mainland (here, the rest of the society). The second term describes a symmetric mutation process at a rate proportional to λα, that is, the product of the forgetting rate λ and the prior strength α. This mutation originates in the fact that prior expectations, unlike learned signalling behaviour, are never forgotten over time. The stochastic term can be identified with genetic drift.

The long-term behaviour of this population dynamics is well understood [49,53]. At high mutation rates, one typically finds coexistence between the different genotypes, which here corresponds to the non-communicative state in which all signals are used for every meaning. At low mutation rates, one signal is expected to dominate each meaning across the entire society, as a result of fluctuations arising from sampling (i.e., the cultural analogue of genetic drift). However, there are no interactions between different meanings, so each signal that dominates does so independently of the others. This lack of structure, and therewith under-utilisation of the signals, is a direct consequence of the lack of ambiguity—the receiver always knows the topic ahead of signal production, so signalling cannot add any information to what is already available.

Therefore, as the mutation rate λα → 0, we should expect to find the state proposed by [32], in which the probability that D different signals dominate in the state that emerges is distributed as

(11)

We test this prediction against simulations in Fig 6 for different M at S = 12. The good agreement confirms that when meanings are tightly constrained, unstructured associations between signals and meanings arise through neutral cultural evolutionary forces that are analogous to migration, mutation and genetic drift.

Fig 6. Distribution of the number of dominant signals under tight constraints.

Empirical distribution (points) obtained from simulations with A = C = 1, S = 12 and with M = 14 (left), M = 17 (middle) and M = 36 (right). The theoretical distribution is given by Eq (11), and corresponds to the case where a randomly-assigned signal dominates each meaning. We find good agreement with the theoretical prediction.

https://doi.org/10.1371/journal.pcsy.0000078.g006
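The random-assignment picture behind Eq (11) can be checked directly: assign each of M meanings an independent, uniformly random signal and count how many distinct signals are used. Assuming Eq (11) is the standard occupancy distribution implied by this picture, it can be computed exactly via Stirling numbers of the second kind; the sketch below (our own construction, not code from the paper) compares it with a Monte Carlo estimate.

```python
import math
import random
from collections import Counter

def stirling2(n, k):
    """Stirling number of the second kind, via the explicit formula."""
    return sum((-1) ** (k - j) * math.comb(k, j) * j ** n
               for j in range(k + 1)) // math.factorial(k)

def p_dominant(D, M, S):
    """Probability that exactly D distinct signals appear when each of M
    meanings is independently assigned one of S signals at random."""
    return math.comb(S, D) * stirling2(M, D) * math.factorial(D) / S ** M

def simulate(M, S, trials=20000, seed=1):
    """Monte Carlo estimate of the same distribution."""
    rng = random.Random(seed)
    counts = Counter(len({rng.randrange(S) for _ in range(M)})
                     for _ in range(trials))
    return {d: c / trials for d, c in counts.items()}
```

A useful consistency check is that the mean of this distribution equals S(1 − (1 − 1/S)^M), since each signal is left unused with probability (1 − 1/S)^M.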

Weak constraints modulated by feedback

We now turn to the opposite extreme, where agents' attention is maximally unconstrained. This corresponds to a certainty C = 0, implying that all agents distribute their attention across the M meanings in exactly the same way in every interaction. The value of the alignment parameter A is irrelevant in this case. The only way for the resulting ambiguity to be resolved is by allowing communicative feedback into the model. Recall that this is implemented as shown in Fig 4, where reinforcement occurs only after successful interactions, and reversion to the initial uniform distribution over signals occurs on failure. Simulations show that a communicative state that utilises all signals is typically arrived at, as long as the product of the forgetting rate λ and prior strength α is sufficiently small.
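To make the feedback mechanism concrete, the following toy simulation implements the qualitative rules described above: reinforcement of a meaning-signal pair after success, reversion of the topic's signal distribution to uniform after failure, and forgetting toward a weak prior. The count-based update rule, the single shared weight table standing in for a society, and all parameter values are simplifying assumptions for illustration, not the model's Eq (7).

```python
import random

def run_feedback(M=4, S=4, steps=2000, lam=0.01, alpha=0.1, seed=0):
    """Toy feedback dynamics: success reinforces a meaning-signal pair,
    failure reverts the topic's signal weights to uniform, and learned
    weights decay toward a weak prior (forgetting)."""
    rng = random.Random(seed)
    w = [[1.0] * S for _ in range(M)]          # uniform initial condition
    for _ in range(steps):
        topic = rng.randrange(M)               # attention selects a topic
        sig = rng.choices(range(S), weights=w[topic])[0]
        # the receiver interprets the signal using the corresponding column
        col = [w[m][sig] for m in range(M)]
        interp = rng.choices(range(M), weights=col)[0]
        if interp == topic:
            w[topic][sig] += 1.0               # reinforce on success
        else:
            w[topic] = [1.0] * S               # revert to uniform on failure
        # forgetting: decay learned weights toward the prior strength alpha
        w[topic] = [(1 - lam) * x + lam * alpha for x in w[topic]]
    return w
```

The reversion-on-failure step is the ingredient that, as discussed later, suppresses fluctuations toward systematic signal use when the number of meanings is large.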

We can develop a quantitative understanding of the communicative gain that is achieved when this occurs by deriving the stochastic differential equation for an agent's frequency estimate. As previously, the procedure is described in S1 Appendix. This time, we arrive at a replicator-mutator type equation [18]

(12)

Here, the first quantity entering the selection term is the local fitness of signalling strategy s when m is the receiver's interpretation, and the second is the mean fitness over all signals competing to express meaning m. As above, the second term in square brackets corresponds to symmetric mutation, and the stochastic term to genetic drift. This is not a neutral model, but one in which signalling frequencies are driven by selection. Broadly speaking, the signalling strategy with the highest fitness will proliferate, until balanced by the mutation that derives from forgetting.

In the current scenario, we find in S1 Appendix that the fitness is the probability that a randomly-chosen receiver interprets signal s as meaning m, as defined by Eq (3). That is, competition within each local population favours the signal which is most likely to be correctly interpreted. This can be viewed as a form of higher-order reasoning, implemented as an obverter strategy [17,54] or recursively within the rational speech act framework [55]. Here, this is not built into the model, but is an emergent product of an interaction between agents behaving predictably and communicative feedback.

The full high-dimensional system of Eq (12) does not facilitate a straightforward analysis. However, to understand the emergence of communication it is sufficient to consider the special case where the attention distribution is uniform, together with the single-coordinate reduction that is obtained if we restrict the dynamics to a space of symmetrically-structured communication systems, illustrated in Fig 7. These communication systems partition the meanings into S disjoint subsets of equal size M/S, each conveyed by a common signal with the same frequency x. The other signals are each used for those meanings with the common frequency (1 − x)/(S − 1) forced by normalisation. Such systems are optimal in the sense that they distinguish between different meanings to the maximum extent possible (given the number of meanings and signals and the effect of forgetting).

Fig 7. Symmetrically-structured communication system.

In a symmetrically-structured communication system, each signal is used with frequency x for exactly M/S meanings that are not shared with any other signal (shown dark shaded). The remaining signals (light shaded) are used with equal frequency (1 − x)/(S − 1).

https://doi.org/10.1371/journal.pcsy.0000078.g007
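A symmetrically-structured system is easy to construct explicitly. In the sketch below (our own illustration), meanings are grouped into contiguous blocks of size M/S, each block's dedicated signal is used with frequency x, and the remaining probability mass is spread evenly over the other signals, as normalisation requires.

```python
def symmetric_system(M, S, x):
    """Frequency matrix of a symmetrically-structured system: meanings are
    grouped into S blocks of size M/S; meaning m's dedicated signal is used
    with frequency x, the rest with the frequency forced by normalisation."""
    assert M % S == 0, "meanings must divide evenly among signals"
    off = (1 - x) / (S - 1)
    block = M // S
    return [[x if s == m // block else off for s in range(S)]
            for m in range(M)]
```

Setting x = 1/S makes every entry equal, which is the non-communicative state; larger x corresponds to a more informative system.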

In this symmetric case, we find in S1 Appendix that the deterministic part of Eq (12) simplifies to

(13)

where, for the present case of full ambiguity and feedback-driven learning, the parameter Γ takes the value derived in S1 Appendix. (We will obtain the same equation, but with a different value of Γ, when we examine the case without feedback below.)

We see that the non-communicative state, in which every signal is used with equal frequency x = 1/S, is a fixed point of this equation. It is unstable when the mutation rate λα is sufficiently small relative to Γ, which means any fluctuation away from the initial condition will grow, and the system will evolve deterministically towards a different fixed point. The fixed point of interest is the communicative state where

(14)

In Fig 8, we plot the communicative gain (9) that is expected at this fixed point as a function of the prior strength α with the other parameters held fixed, and find that when α is sufficiently small, the system does indeed evolve into this communicative state.

Fig 8. Communicative gain with feedback.

Lines show the communicative gain G expected for different prior strengths α at the communicative fixed point defined by (14) when selection is driven by feedback and interactions are maximally ambiguous. The solid lines correspond to where the non-communicative initial condition is unstable, and the dashed lines to where it is stable. Points are from simulations with C = 0, A = 1, S = 12 signals and up to 48 meanings, in a society with N = 20 agents (left) and N = 40 agents (right).

https://doi.org/10.1371/journal.pcsy.0000078.g008

The situation above this threshold is more subtle. The communicative state continues to exist up to a larger value of the prior strength, and is stable whenever it exists. Thus, there is a region where both the non-communicative and communicative states are stable, shown with the dashed lines in Fig 8. Here it may be possible to reach one state from the other through a sufficiently large fluctuation, and we find that a transition to communication occurs for the lower values of α in this intermediate range. The left and right panels of Fig 8 show the state that is reached at two different society sizes (N = 20 and N = 40 agents, respectively). We see that the region in which a fluctuation allows the non-communicative state to be escaped is slightly smaller in the larger society. We therefore expect that the probability of a fluctuation large enough to reach the communicative state will vanish as the society size increases, and that in the limit of an infinite society, communication would emerge only when the non-communicative state is unstable.

An intriguing consequence of the way that feedback is implemented in this model is that for fixed cognitive parameters (i.e., forgetting rate λ and prior strength α), there is an upper limit on the number of meanings that agents can entertain if communication is to emerge. The origin of this limit is the reversion to the uniform distribution over signals when communicative failure occurs. Since the probability of a failure is high in the non-communicative state when the number of meanings M is large, any fluctuation towards systematic use of one signal over the others for any given meaning is quickly suppressed. This is the case even with just two signals, that is, if agents are faced with the task of dividing a large meaning space into two categories.

Weak constraints modulated by shared intentionality

We now tackle the case where candidate meanings are weakly constrained (i.e., C close to zero) and we do not allow communicative feedback. Now, the only way that communication could emerge is if receivers are able to exploit shared intentionality, i.e., correlations between a signaller's behaviour and how their own attention is distributed. In S1 Appendix, we show that the stochastic differential equation that governs the evolution also takes the form of the replicator-mutator Eq (12), albeit with the very different local fitness

(15)

Here, the new quantity appearing is the frequency with which signal s is produced over all interactions across the society.

From this expression we first observe that if C = 0, all signalling strategies have the same fitness. Thus, in this case, symmetric mutation and genetic drift operate independently for each agent, meaning that no shared communicative behaviour can emerge. This is the expected result, as there is no variation in attention that can be exploited here.

To understand what happens for small nonzero C, we first assume that individual differences between agents can be neglected: that is, each agent's signalling frequencies can be replaced by the society average. When, as previously, we restrict to the space of symmetrically-structured communication systems, an equation of the form (13) results, with a parameter Γ that turns out not to be positive. Recalling that the non-communicative fixed point is unstable only when Γ is sufficiently large and positive, we conclude that there is no possibility of communication emerging spontaneously within this analysis.

This, however, is not what happens in simulations. For example, Fig 9 clearly shows that a nonzero communicative gain is possible for a wide range of parameter combinations. We now show that communication is selected for deterministically in an arbitrarily large society, and is therefore not the result of the type of fluctuation that occurs only in small systems. Rather, it is the consequence of individual differences between agents that are always present. To this end, it is helpful to appeal first to the simplest possible case, where the system comprises two agents whose attention is always fully aligned (A = 1) and there are just two signals available. Further assuming, as in the symmetrically-structured states, that the overall frequency of each signal, averaged across all meanings, remains at 1/2 throughout the evolution, the local fitness (15) simplifies to

Fig 9. Communicative gain in the absence of feedback.

The location of the communicative fixed point (14), and hence the gain in the communicative state, is predicted to depend on the ratio of the mutation rate λα to the threshold value Γ given by (22). Lines show this predicted gain for S = 5 (left) and S = 11 (right), with the solid part of the line indicating where the non-communicative fixed point is unstable. We find that simulation data (points) for various combinations of C (denoted by colours) and A (by marker shape) are in reasonable agreement with these predictions. In all simulations, M = 55.

https://doi.org/10.1371/journal.pcsy.0000078.g009

(16)

Since the frequencies are normalised, the mean local fitness is

(17)

and the fitness difference that appears in the replicator-mutator equation is

(18)

We now focus on the signal s* that is used most frequently for the meaning m* under consideration at the society level. We further identify the agent who is using that signal with a frequency that is at least as high as this society average, and denote them with a + sign; their frequency exceeds the average by a non-negative deviation ε. The deterministic part of the replicator-mutator Eq (12) for this agent in this special case is

(19)

Similarly, for the other agent, who uses signal s* for meaning m* less often, it is

(20)

When the mutation rate is very small, the dynamics are dominated by the selection term (i.e., the one proportional to C). Since the signal s* is by definition in the majority, for small deviations ε from this average the rate at which the + agent's frequency shrinks through negative selection is smaller than the rate at which the other agent's frequency grows through positive selection—see Fig 10. Thus, the combined effect of this competition is for the mean frequency to grow overall, i.e., there is a net positive selection. This selective effect, however, is countered by mutation. When one accounts for both evolutionary forces, one finds that the mean frequency, obtained by summing (19) and (20) and dividing by two, obeys the equation

Fig 10. Interaction between competition and individual variability.

The rate at which the lower of the two agents' signal frequencies grows is given by the upper curve, while the rate for the higher frequency is given by the lower curve. When the mean of these two frequencies exceeds one half, the lower signal frequency increases faster than the higher frequency decreases (shown by the arrows), yielding a net increase for this majority signal.

https://doi.org/10.1371/journal.pcsy.0000078.g010

(21)

In words, this equation states that if the variance in signalling frequencies over different agents in the society is sufficiently large relative to the mutation rate, the signal that is in the majority for the meaning under consideration will tend to grow in frequency. Importantly, the net positive selection is driven by individual differences that arise spontaneously as a result of the randomness inherent in the interactions between agents, and whose magnitude does not depend on the size of the overall society. Therefore, although this deterministic instability is created by a fluctuation effect, it is not the rare type of fluctuation that allows stable fixed points to be escaped in small societies and is suppressed in large ones.
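The arithmetic behind this variance-driven selection can be illustrated with a deliberately simple stand-in for Eqs (19) and (20): suppose each agent's frequency is drawn toward its partner's through a replicator-type term of strength C, while symmetric mutation at rate μ = λα pulls both toward 1/2. Because the replicator factor x(1 − x) is larger for the agent closer to 1/2, the minority-side agent gains faster than the majority-side agent loses, and in this toy the mean drifts upward exactly when 2Cε² > μ. This functional form is our assumption for illustration only; the paper's fitness is given by Eq (16).

```python
def mean_drift(x_plus, x_minus, C, mu):
    """Drift of the mean signal frequency for two agents who each adjust
    toward their partner (replicator term, strength C) under symmetric
    mutation toward 1/2 (rate mu). A toy stand-in for Eqs (19)-(20)."""
    g = lambda x: x * (1 - x)                  # replicator factor
    d_plus = C * g(x_plus) * (x_minus - x_plus) + mu * (1 - 2 * x_plus)
    d_minus = C * g(x_minus) * (x_plus - x_minus) + mu * (1 - 2 * x_minus)
    return 0.5 * (d_plus + d_minus)
```

For this quadratic replicator factor the identity is exact: the drift of the mean equals (2Cε² − μ)(2x̄ − 1), so a majority (x̄ > 1/2) grows precisely when the inter-agent variance ε² exceeds μ/(2C), and decays back toward 1/2 when the agents are identical (ε = 0).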

In S1 Appendix we extend the analysis to the general case of more than two agents and signals, obtaining an expression for the fitness that applies at the society level, rather than the individual level. Using this expression in (12), and restricting again to symmetric communication systems, we obtain Eq (13) with the threshold Γ now given by

(22)

Here, V quantifies variability in signal frequencies for a fixed meaning across the society in the same way that the certainty C quantifies variation in attention weights over interactions, that is, analogously to (1),

(23)

where both the expectation value and variance are taken over agents. In common with C, this parameter lies in the range [0, 1]: it is zero if all agents are identical in their signalling of meaning m, and one if they are maximally different, given the society averages. In S1 Appendix we further provide an estimate of the value of V that applies when we take into account the amplitude of the drift term in the replicator-mutator Eq (12).
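The endpoint properties stated for V suggest a simple estimator: the across-agent variance of a signal's frequency, divided by its maximum possible value given the society mean, namely x̄(1 − x̄). We stress that this normalised variance is an assumed stand-in for the paper's definition (23), chosen because it reproduces the stated behaviour at both extremes.

```python
def variability(freqs):
    """Normalised across-agent variance of a signal's frequency: 0 when all
    agents behave identically, 1 when maximally different given the society
    mean. An assumed stand-in for the paper's definition (23)."""
    n = len(freqs)
    mean = sum(freqs) / n
    var = sum((f - mean) ** 2 for f in freqs) / n
    vmax = mean * (1 - mean)                   # max variance for values in [0, 1]
    return var / vmax if vmax > 0 else 0.0
```

For instance, a society split between agents who always and never use a signal (frequencies 0 and 1) gives the maximal value 1, while identical agents give 0.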

The key point is that if both the alignment A and the variability V are sufficiently large, the condition on Γ for communication to emerge spontaneously can be satisfied. The threshold (22) does not depend explicitly on the number of agents N, meanings M or signals S, and the implicit dependence on M through V can be neglected when M is large (see S1 Appendix). Thus, there is a deterministic instability towards bootstrapping communication in populations of arbitrary size, no matter how many meanings or signals are available, as long as interacting agents' attention covaries sufficiently strongly from one interaction to the next, and differences between individuals are able to grow.

To test these predictions, we note from (14) that the location of the communicative fixed point depends on the number of signals S and the ratio of the mutation rate λα to the threshold Γ. Thus, we should find for different combinations of M, A and C (and hence V, which depends on all of these quantities) that the communicative gain falls along a single curve when plotted as a function of this ratio for fixed S. Simulation data in Fig 9 provide reasonable support for this prediction, with the deviations most likely arising from imprecision in the estimate of V presented in S1 Appendix.

Moreover, we can identify combinations of A and C where communication is expected to arise when other parameters are held fixed. These are compared in Fig 11 with regions where a communicative gain exceeding one half is found in simulation. Strictly speaking, the region determined analytically should coincide with that where any positive gain is achieved. However, the latter region is found to be larger than predicted (although it has the same qualitative shape). We suspect that this is a consequence of finite-size fluctuations, which allow the communicative state to be reached even when the non-communicative state is also stable. The existence of such fluctuations is suggested by Fig 9, where we find points close to the dashed part of the line. Confirming this would require running the simulations in much larger societies; however, this is computationally challenging, as the required run time grows linearly with the society size N, and we would probably need to simulate societies at least two or three orders of magnitude larger to gain such confirmation.

Fig 11. Phase diagram for the emergence of communication.

Shaded regions show combinations of C, A and α that facilitate communication without feedback in the limit of an infinite number of meanings M. Simulations are for the case S = 11, M = 55; points are plotted when the gain G > 0.5.

https://doi.org/10.1371/journal.pcsy.0000078.g011

Discussion

In this work, we constructed a model within which agents can learn signalling behaviour from other members of their society, correlating this with how their attention is distributed over possible meanings, and optionally utilising communicative feedback. This model was grounded in principles of Bayesian inference [41,42] and information theory [43]. An important feature of the model was the ability to control constraints on candidate meanings through the extent to which attentional weights varied between interactions and between interacting agents.

Our main finding is that effective communication can emerge spontaneously in large well-connected societies without any pre-existing ability to communicate success or failure. The necessary ingredients are: (i) that agents learn to predict which signal is used by others to express each given meaning [44,45]; (ii) that they act cooperatively by conforming to these predictions [7,8,46]; and (iii) that they possess sufficient shared intentionality [37] that mental representations about likely topics are well aligned before a signal is produced [39,40]. Once individual differences have opened up between agents, communication, and by extension language, can be bootstrapped from social cognitive capabilities that are not specific to communication or language [2,7].

There is no limit on the number of signals, meanings or agents for bootstrapping to be possible. Agents also do not need to be equipped with strong constraints on the structure of the communication system. Instead, it is sufficient for the probability that signaller and receiver agree on the topic in the absence of a convention to be slightly above chance. Most importantly, a threshold level of alignment must be exceeded, suggesting that species with limited shared intentionality would not be able to bootstrap a large socially-learned communication system. A curious finding is that feedback-driven learning is effective for communication only when the number of meanings is sufficiently small. We do not intend to suggest that feedback plays no role in everyday conversation or language acquisition, just that it might be counterproductive while establishing a communication system de novo.

The inevitability of effective communication under the above conditions was missed in earlier studies for various reasons. First, if topics are highly constrained (large C in the model), signals have little work to do and less effective systems that fail to utilise all signalling behaviours emerge [32]. Second, fluctuations in small societies can be large enough that a communicative state is found even though, at the deterministic level, non-communication is stable. This can be seen in Figs 8 and 9, where communicative gain was achieved in simulations in regions where both communication and non-communication are stable (dashed lines). Such fluctuations are expected to be suppressed in large societies, consistent with earlier simulations [31].

Although minimal by design, our model lends itself to a number of potential applications and generalisations. First, we assumed that our agents initially had no prior exposure to linguistic behaviour, appropriate to an emergence scenario. An alternative would be to bring agents together who have created different signalling systems, thereby modelling a language contact scenario. It would be interesting to see if the model replicates features observed for example in the emergence of new sign languages [56]. It would also be interesting to understand how communication systems emerge and change when the interaction network is not fixed, but is affected by the signals adopted by individual agents. This might, for example, allow for a self-contained model of sociolinguistic effects, consistent with the view of language as a complex adaptive system [4]. Second, for reasons of mathematical tractability, we restricted to the case where all meanings are equally likely to be attended to. It would be worthwhile to generalise to non-uniform meaning distributions, including those where, for example, attending to parts of an object implies also attending to the whole object. One would then be able to determine whether the components we have included in our model are sufficient to reproduce the colexification properties that have been established for natural languages [57].

Finally, we note that a major limitation in the present study of emergence is that the sets of signals and meanings have been assumed fixed and available to agents. In particular, this implies that all agents use all signals interchangeably in the initial condition, whereas a more natural scenario would be one in which meanings and signals are created by agents as they experience the world around them. One way to address this shortcoming might be to exploit aspects that our model has in common with transformer models [58], in which meanings are represented as vectors and attention is distributed over meanings as the model attempts to predict the next signal in a sequence. Our results suggest that populations of transformers that are exposed to correlated views of a complex meaning space and seek to predict each other's output could bootstrap a common classification of that meaning space without relying on a pre-existing set of word embeddings or the assumption of an explicit reward system [16]. As well as providing a means to test the robustness of the mechanism for bootstrapping communication that we have identified here in less idealised scenarios, such investigations might also suggest novel self-supervised multi-agent machine learning algorithms.

Supporting information

S1 Appendix. Mathematical derivations.

Derivation of the stochastic differential equations, Eqs (10) and (12), and the estimated individual variability V.

https://doi.org/10.1371/journal.pcsy.0000078.s001

(PDF)

Acknowledgments

We thank Tim Rogers for comments on the manuscript. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

References

  1. 1. Clark HH. Using language. Cambridge: Cambridge University Press; 1996.
  2. 2. Tomasello M. Origins of human communication. Cambridge, MA: MIT Press; 2008.
  3. 3. Lewis DK. Convention: A philosophical study. Oxford: Blackwell; 2002.
  4. 4. Beckner C, Blythe R, Bybee J, Christiansen MH, Croft W, Ellis NC, et al. Language is a complex adaptive system: position paper. Language Learning. 2009;59(s1):1–26.
  5. 5. Hawkins RXD, Goodman ND, Goldstone RL. The emergence of social norms and conventions. Trends Cogn Sci. 2019;23(2):158–69. pmid:30522867
  6. 6. Bloom P. How children learn the meaning of words. Cambridge, MA: MIT Press; 2000.
  7. 7. Tomasello M. Constructing a language: a usage-based theory of language acquisition. Cambridge, MA: Harvard University Press; 2005.
  8. 8. Chater N, Christiansen MH. Language acquisition as skill learning. Current Opinion in Behavioral Sciences. 2018;21:205–8.
  9. 9. Hurford JR. Biological evolution of the Saussurean sign as a component of the language acquisition device. Lingua. 1989;77(2):187–222.
  10. 10. Fitch WT. The evolution of language. Cambridge: Cambridge University Press; 2010.
  11. 11. Hurford JR. Origins of language: a slim guide. Oxford: Oxford University Press; 2014.
  12. 12. Quine WVO. Word and object. Cambridge, MA: MIT Press; 1960.
  13. 13. Blythe RA, Smith ADM, Smith K. Word learning under infinite uncertainty. Cognition. 2016;151:18–27. pmid:26927884
  14. 14. Steels L. Evolving grounded communication for robots. Trends Cogn Sci. 2003;7(7):308–12. pmid:12860189
  15. 15. Puglisi A, Baronchelli A, Loreto V. Cultural route to the emergence of linguistic categories. Proc Natl Acad Sci U S A. 2008;105(23):7936–40. pmid:18523014
  16. 16. Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016. p. 2252–60.
  17. 17. Spike M, Stadler K, Kirby S, Smith K. Minimal requirements for the emergence of learned signaling. Cogn Sci. 2017;41(3):623–58. pmid:26988073
  18. 18. Nowak MA. Evolutionary dynamics: exploring the equations of life. Cambridge, MA: Harvard University Press; 2006.
  19. 19. Smith ADM. Mutual exclusivity: communicative success despite conceptual divergence. In: Tallerman M, editor. Language origins: perspectives on evolution. Oxford: Oxford University Press; 2005.
  20. 20. Seyfarth RM, Cheney DL, Marler P. Monkey responses to three different alarm calls: evidence of predator classification and semantic communication. Science. 1980;210(4471):801–3. pmid:7433999
  21. 21. Donaldson MC, Lachmann M, Bergstrom CT. The evolution of functionally referential meaning in a structured world. J Theor Biol. 2007;246(2):225–33. pmid:17280687
  22. 22. Scott-Phillips TC, Blythe RA, Gardner A, West SA. How do communication systems emerge?. Proceedings of the Royal Society B. 2012;279:1943–9. https://doi.org/10.1098/rspb.2011.2181
  23. 23. Siskind JM. A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition. 1996;61(1–2):39–91. pmid:8990968
  24. 24. Isbilen ES, Christiansen MH. Statistical learning of language: a meta-analysis into 25 years of research. Cogn Sci. 2022;46(9):e13198. pmid:36121309
  25. 25. Roembke TC, Simonetti ME, Koch I, Philipp AM. What have we learned from 15 years of research on cross-situational word learning? A focused review. Front Psychol. 2023;14:1175272. pmid:37546430
  26. 26. Blythe RA, Smith K, Smith ADM. Learning times for large lexicons through cross-situational learning. Cogn Sci. 2010;34(4):620–42. pmid:21564227
  27. 27. Reisenauer R, Smith K, Blythe RA. Stochastic dynamics of lexicon learning in an uncertain and nonuniform world. Phys Rev Lett. 2013;110(25):258701. pmid:23829764
  28. 28. Kirby S, Cornish H, Smith K. Cumulative cultural evolution in the laboratory: an experimental approach to the origins of structure in human language. Proc Natl Acad Sci U S A. 2008;105(31):10681–6. pmid:18667697
  29. 29. Smith ADM. Establishing communication systems without explicit meaning transmission. In: Kelemen J, Sosık P, editors. Advances in Artificial Life. Springer; 2001.
  30. 30. Kwisthout J, Vogt P, Haselager P, Dijkstra T. Joint attention and language evolution. Connection Science. 2008;20(2–3):155–71.
  31. 31. Vogt P, Coumans H. Investigating social interaction strategies for bootstrapping lexicon development. Journal of Artificial Societies and Social Simulation. 2003;6(1).
  32. 32. Fontanari JF, Cangelosi A. Cross-situational and supervised learning in the emergence of communication. IS. 2011;12(1):119–33.
  33. Saffran JR. Statistical language learning. Curr Dir Psychol Sci. 2003;12(4):110–4.
  34. De Beule J, De Vylder B, Belpaeme T. A cross-situational learning algorithm for damping homonymy in the guessing game. In: Rocha LM, editor. Artificial life X. MIT Press; 2006. p. 466–72.
  35. Macnamara J. Cognitive basis of language learning in infants. Psychol Rev. 1972;79(1):1–13. pmid:5008128
  36. Markman EM, Wachtel GF. Children’s use of mutual exclusivity to constrain the meanings of words. Cogn Psychol. 1988;20(2):121–57. pmid:3365937
  37. Tomasello M, Carpenter M. Shared intentionality. Dev Sci. 2007;10(1):121–5. pmid:17181709
  38. Tomasello M, Farrar MJ. Joint attention and early language. Child Dev. 1986;57(6):1454–63. pmid:3802971
  39. Stolk A, Verhagen L, Toni I. Conceptual alignment: how brains achieve mutual understanding. Trends Cogn Sci. 2016;20(3):180–91. pmid:26792458
  40. Schilbach L, Redcay E. Synchrony across brains. Annu Rev Psychol. 2025;76(1):883–911. pmid:39441884
  41. Griffiths TL, Chater N, Kemp C, Perfors A, Tenenbaum JB. Probabilistic models of cognition: exploring representations and inductive biases. Trends Cogn Sci. 2010;14(8):357–64. pmid:20576465
  42. Gill J. Bayesian methods: a social and behavioral approach. 3rd ed. Boca Raton: CRC Press; 2015.
  43. Cover TM, Thomas JA. Elements of information theory. Hoboken, NJ: Wiley; 2006.
  44. Bar M. Predictions in the brain: using our past to generate a future. Oxford: Oxford University Press; 2011.
  45. Pickering MJ, Garrod S. An integrated theory of language production and comprehension. Behav Brain Sci. 2013;36(4):329–47. pmid:23789620
  46. Contreras Kallens P, Christiansen MH. Distributional semantics: meaning through culture and interaction. Top Cogn Sci. 2025;17(3):739–69. pmid:39587986
  47. Kimura M. The neutral theory of molecular evolution. Cambridge University Press; 1985.
  48. Reali F, Griffiths TL. Words as alleles: connecting language evolution with Bayesian learners to models of genetic drift. Proc Biol Sci. 2010;277(1680):429–36. pmid:19812077
  49. Baxter GJ, Blythe RA, Croft W, McKane AJ. Utterance selection model of language change. Phys Rev E Stat Nonlin Soft Matter Phys. 2006;73(4 Pt 2):046118. pmid:16711889
  50. Roth AE, Erev I. Learning in extensive-form games: experimental data and simple dynamic models in the intermediate term. Games and Economic Behavior. 1995;8(1):164–212.
  51. Bereby-Meyer Y, Erev I. On learning to become a successful loser: a comparison of alternative abstractions of learning processes in the loss domain. J Math Psychol. 1998;42(2–3):266–86. pmid:9710551
  52. Wright S. Evolution in Mendelian populations. Genetics. 1931;16(2):97–159. pmid:17246615
  53. Burden CJ, Griffiths RC. Stationary distribution of a 2-island 2-allele Wright-Fisher diffusion model with slow mutation and migration rates. Theor Popul Biol. 2018;124:70–80. pmid:30308179
  54. Oliphant M, Batali J. Learning and the emergence of coordinated communication. Center for Research on Language Newsletter. 1997;11:1–46.
  55. Goodman ND, Frank MC. Pragmatic language interpretation as probabilistic inference. Trends Cogn Sci. 2016;20(11):818–29. pmid:27692852
  56. Senghas A, Coppola M. Children creating language: how Nicaraguan sign language acquired a spatial grammar. Psychol Sci. 2001;12(4):323–8. pmid:11476100
  57. Futrell R, Hahn M. Linguistic structure from a bottleneck on sequential information processing. arXiv preprint 2024. https://doi.org/10.48550/arXiv.2405.12109
  58. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017. p. 6000–10.
  59. Blythe RA, Fisch C. Emergence of signalling. GitLab repository. 2025. https://git.ecdf.ed.ac.uk/rblythe3/emergence-of-signalling
  60. Blythe RA, Fisch C. Emergence of Signalling simulation data. 2025. https://doi.org/10.7488/ds/8023