Naming a Structured World: A Cultural Route to Duality of Patterning

The lexicons of human languages organize their units at two distinct levels. At a first combinatorial level, meaningless forms (typically referred to as phonemes) are combined into meaningful units (typically referred to as morphemes). Thanks to this, many morphemes can be obtained by relatively simple combinations of a small number of phonemes. At a second compositional level of the lexicon, morphemes are composed into larger lexical units, the meaning of which is related to the individual meanings of the composing morphemes. This duality of patterning is not a necessity for lexicons, and it remains an open question how a population of individuals is able to bootstrap such a structure and what evolutionary advantages its emergence confers. Here we address this question in the framework of a multi-agent model, where a population of individuals plays simple naming games in a conceptual environment modeled as a graph. We demonstrate that errors in communication, together with a blending repair strategy that crucially exploits a shared conceptual representation of the environment, are sufficient conditions for the emergence of duality of patterning, which can thus be explained in a purely cultural way. Compositional lexicons turn out to lead to successful communication faster than purely combinatorial lexicons, suggesting that meaning played a crucial role in the evolution of language.


Supporting Information for
Naming a structured world: a cultural route to duality of patterning by Tria et al.
Rules of the game
Figure S1 illustrates two example games, one failed and one successful.
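The two outcomes described in the caption of Figure S1 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `play_game`, the representation of inventories as dictionaries mapping each object to a set of words, and the `"unhandled"` return value are hypothetical, and the sketch deliberately leaves out the communication errors and the blending repair strategy discussed in the main text.

```python
def play_game(speaker_inv, hearer_inv, topic, word):
    """Update inventories for one game, covering only the two cases of Figure S1.

    speaker_inv, hearer_inv: dict mapping an object to a set of words
    (hypothetical representation chosen for this sketch).
    """
    hearer_words = hearer_inv.setdefault(topic, set())

    # Success: the word is in the Hearer's inventory for the Topic.
    # Both agents drop every competing word for that Topic.
    if word in hearer_words:
        speaker_inv[topic] = {word}
        hearer_inv[topic] = {word}
        return "success"

    # Failure: the word appears in none of the Hearer's inventories.
    # The Hearer adds it to her inventory for the Topic.
    if not any(word in words for words in hearer_inv.values()):
        hearer_words.add(word)
        return "failure"

    # The word is known to the Hearer, but for a different object.
    # Figure S1 does not cover this case, so the sketch leaves it untouched.
    return "unhandled"
```

For instance, if the Hearer has no word `"f1f1"` anywhere, the call returns `"failure"` and `"f1f1"` is added to her inventory for the Topic; a second game with the same word then returns `"success"` and collapses both inventories for that Topic to `{"f1f1"}`.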

Scaling with the size M of the conceptual space
In Figure S2 we report how the observables considered in the main text depend on the number M of objects that compose the conceptual space of the agents. We found that the properties of the emerged lexicon depend only mildly on M. This is important because it allows a quantitative comparison of the model results with the actual human case, for which in general only a rough estimate of the size of the conceptual space is possible.

Scaling with the population size
In Figure S3 we report how the observables considered in the main text depend on the population size N. Again, we found that the properties of the emerged lexicon depend only mildly on N.

Compositionality with other similarity measures
We consider here two possible variants of the MM similarity adopted in the main text: the same pairs similarity (SP) and the all pairs similarity (AP). The SP similarity is defined as follows: given two words $w_1$ and $w_2$, we align them so that either their left ends or their right ends coincide. We then add 1 to the score for each position in which the two words share the same form. Forms shared by the two words in different positions do not contribute to the SP similarity score. The SP measure is the maximum of the left-end and right-end alignment scores. The AP similarity does not require alignment: given two words $w_1$ and $w_2$, a score of 1 is given for each form shared by the two words, irrespective of its position in those words. Figure S4 reports the results for the excess similarity (as defined in the main text) when using the SP (left) and the AP (right) similarity scores.
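The two definitions above can be made concrete with a short sketch. Words are represented here as sequences of elementary forms, which is an assumption of this example, as are the function names `sp_similarity` and `ap_similarity`. The text does not say how repeated forms are counted in the AP score; the sketch assumes a multiset intersection, so a form occurring twice in both words contributes 2.

```python
from collections import Counter


def _aligned_matches(a, b):
    """Count positions where two aligned sequences carry the same form."""
    return sum(1 for x, y in zip(a, b) if x == y)


def sp_similarity(w1, w2):
    """Same pairs similarity: best of the left-end and right-end alignments."""
    left = _aligned_matches(w1, w2)
    # Reversing both words makes their right ends coincide.
    right = _aligned_matches(reversed(w1), reversed(w2))
    return max(left, right)


def ap_similarity(w1, w2):
    """All pairs similarity: shared forms, irrespective of position.

    Assumption of this sketch: repeated forms are counted with multiplicity.
    """
    return sum((Counter(w1) & Counter(w2)).values())
```

With `w1 = ("f27", "f1", "f8")` and `w2 = ("f1", "f8")`, the left-end alignment finds no match while the right-end alignment finds two, so the SP score is 2; the AP score is also 2. For `w1` and `("f8", "f1")` the SP score drops to 1 while the AP score stays at 2, which shows how SP penalizes forms shared in different positions.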

Uncorrelated scale-free networks
We have considered networks with different topologies in order to test the robustness of our simulations with respect to the structure of the underlying network. Each node $i$ of a network is first characterized by its degree $k_i$ (number of links), and a first characterization of the network properties is obtained from the statistical distribution of the nodes' degrees, $P(k)$. In order to quantify the topological correlations in a network, two main quantities are customarily measured. The clustering coefficient $c_i$ of a node $i$ measures the local cohesiveness around this node [1]. It is defined as the ratio of the number of links between the $k_i$ neighbors of $i$ to the maximum number of such links, $k_i(k_i - 1)/2$. The clustering spectrum measures the average clustering coefficient of nodes of degree $k$, according to
$$C(k) = \frac{1}{N_k} \sum_{i:\, k_i = k} c_i ,$$
where $N_k$ is the number of nodes of degree $k$. Moreover, correlations between the degrees of neighboring nodes are conveniently measured by the average nearest neighbors degree of a vertex $i$,
$$k_{nn,i} = \frac{1}{k_i} \sum_{j \in V(i)} k_j ,$$
where $V(i)$ is the set of neighbors of $i$, and by the average degree of the nearest neighbors, $k_{nn}(k)$, for vertices of degree $k$ [2],
$$k_{nn}(k) = \frac{1}{N_k} \sum_{i:\, k_i = k} k_{nn,i} .$$
In the absence of correlations between degrees of neighboring vertices, $k_{nn}(k)$ is a constant. An increasing behavior of $k_{nn}(k)$ corresponds to the fact that vertices with high degree have a larger probability of being connected with large degree vertices (assortative mixing). On the contrary, a decreasing behavior of $k_{nn}(k)$ defines a disassortative mixing, in the sense that high degree vertices have a majority of neighbors with low degree, while the opposite holds for low degree vertices [3].
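As a rough illustration of the quantities defined above, the following sketch computes the local clustering coefficient, the average nearest neighbors degree of a node, and their averages over nodes of equal degree. The graph is assumed to be an undirected adjacency dictionary mapping each node to the set of its neighbors; the function names are hypothetical, and assigning a clustering coefficient of 0 to nodes of degree smaller than 2 is a convention adopted here, not something stated in the text.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(adj, i):
    """c_i: links among the neighbors of i over the maximum k_i (k_i - 1) / 2."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed in this sketch
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def nearest_neighbors_degree(adj, i):
    """k_nn,i: mean degree of the neighbors of i."""
    neighbors = adj[i]
    if not neighbors:
        return 0.0  # isolated node, convention assumed in this sketch
    return sum(len(adj[j]) for j in neighbors) / len(neighbors)


def average_by_degree(adj, node_quantity):
    """Average a per-node quantity over all nodes sharing the same degree k."""
    buckets = defaultdict(list)
    for i in adj:
        buckets[len(adj[i])].append(node_quantity(adj, i))
    return {k: sum(values) / len(values) for k, values in buckets.items()}
```

Calling `average_by_degree(adj, local_clustering)` gives the clustering spectrum and `average_by_degree(adj, nearest_neighbors_degree)` gives the average degree of the nearest neighbors as a function of the degree. A flat result for the latter would indicate the absence of degree correlations.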
In the main text we considered the homogeneous Erdős-Rényi random graph [4,5], in which nodes are linked with a uniform probability $p_{link}$. In this case, a small diameter and a small clustering coefficient are obtained, and the degree distribution is homogeneous and binomial. The specific properties of the graph depend on $p_{link}$. In particular, if M is the number of nodes, for $p_{link} > \log(M)/M$ the graph will almost surely be connected.
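The construction and the connectivity condition can be illustrated with a small sketch. The function names `erdos_renyi` and `is_connected` and the adjacency-dictionary representation are assumptions of this example, not the code used for the simulations.

```python
import math
import random


def erdos_renyi(m, p_link, rng):
    """Link each of the m(m-1)/2 node pairs independently with probability p_link."""
    adj = {i: set() for i in range(m)}
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < p_link:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_connected(adj):
    """Depth-first search from an arbitrary node; connected if every node is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def connectivity_threshold(m):
    """The value log(M)/M above which the graph is almost surely connected."""
    return math.log(m) / m
```

For M = 40 the threshold is about 0.092. The "almost surely" statement is asymptotic in M, so at this size a single realization slightly above the threshold can still turn out disconnected, and checking each sampled graph with `is_connected` is a reasonable precaution.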
We consider here the random scale-free network obtained from the uncorrelated configuration model [6]. It has a broad degree distribution $P(k) \sim k^{-\gamma}$ and is constructed so as to avoid two- and three-vertex correlations, as measured by the average degree of the nearest neighbors $k_{nn}(k)$ and the clustering coefficient of the vertices of degree $k$, respectively. The average degree is finite for γ > 2, and the second moment of the distribution is finite for γ > 3. We consider here two values of the degree distribution exponent, one below and one above the latter threshold: γ = 2.5 and γ = 3.5. We find that none of the considered observables depends on the value of γ and, moreover, that they show the same qualitative (and in most cases quantitative) behaviour as for a homogeneous Erdős-Rényi random graph with the same number M of nodes (see Figure S5).
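A rough sketch of how such a network might be generated is given below. It follows the usual recipe for the uncorrelated configuration model as we understand it: degrees are drawn from a power law restricted to the range between a minimum degree and the square root of the number of nodes, and stubs are then paired at random while rejecting self-loops and multiple links. The function name, the default `k_min = 2`, the `max_failures` cutoff, and the choice to discard stubs that cannot be paired are all assumptions of this example rather than details given in the text.

```python
import math
import random


def uncorrelated_configuration_model(m, gamma, k_min=2, rng=None, max_failures=1000):
    """Sketch of a scale-free graph with P(k) ~ k^(-gamma) and k_min <= k <= sqrt(m)."""
    rng = rng or random.Random()
    k_max = math.isqrt(m)  # degree cutoff at sqrt(m), assumed here
    degrees_allowed = list(range(k_min, k_max + 1))
    weights = [k ** (-gamma) for k in degrees_allowed]
    degrees = rng.choices(degrees_allowed, weights=weights, k=m)

    # One stub per link end: node i appears degrees[i] times.
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    adj = {i: set() for i in range(m)}

    failures = 0
    while len(stubs) >= 2 and failures < max_failures:
        a = stubs.pop(rng.randrange(len(stubs)))
        b = stubs.pop(rng.randrange(len(stubs)))
        if a == b or b in adj[a]:
            # Reject self-loops and multiple links; put the stubs back.
            stubs.extend((a, b))
            failures += 1
            continue
        adj[a].add(b)
        adj[b].add(a)
        failures = 0

    # Any stubs still unpaired at this point are simply discarded (assumption).
    return adj
```

Because leftover stubs are dropped, a few nodes may end up with a degree slightly below the one drawn for them. The cutoff at the square root of the number of nodes limits the largest degree, which for small graphs such as M = 40 leaves only a narrow range of possible degrees.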

Figure S1. Examples of Games. Top. Example of a failed game. In this game the Speaker S selects the Topic and decides to utter the word f1f1. This word is unknown to the Hearer H since f1f1 is not present in any of H's inventories. In this case the game is a failure and H adds the word f1f1 to her inventory for the Topic. Bottom. Example of a successful game. In this game the Speaker S selects the Topic and decides to utter the word f1f1. This word is known to the Hearer H since f1f1 is present in H's inventory for the Topic. In this case the game is a success and both S and H remove from the Topic's inventory all the competing words but f1f1.

Figure S5. Dependence on the graph structure. In this figure we report the results obtained by considering a different structure of the conceptual space. We considered in particular random scale-free networks obtained from the uncorrelated configuration model [6], characterized by a degree distribution $P(k) \sim k^{-\gamma}$. In our case we used γ = 2.5 and γ = 3.5. Top Left. Word length distribution. In the main figure the distribution of word length for different τ and for the two values of the degree distribution exponent γ is reported. The inset reports the average word length as a function of τ/M and for the two values of γ. As observed in the text, the word length distribution (and thus the average word length) does not depend on γ and is closely comparable to that obtained when considering an Erdős-Rényi random graph (compare with fig. 2 in the main text). Top Right. Frequency-rank distribution for elementary forms. In the main figure the frequency-rank distribution for elementary forms is shown, again for different values of the parameter τ and γ = 2.5. In the top inset we show the same distribution fixing τ/M = 1 and for the two values of γ. In the bottom inset the number of distinct elementary forms composing the lexicon is reported, as a function of τ and for the two values of γ. Again the results do not depend on γ and are closely comparable to those obtained when considering an Erdős-Rényi random graph. Bottom Left. Combinatoriality. Combinatoriality C (see the text for definition) for the two values of γ as a function of τ. In the inset the normalized entropy, as defined in the text, is reported, again for the two values of γ as a function of τ. Bottom Right. Excess similarity of words as a function of the distance of the corresponding objects on the graph. The excess MM similarity (see text) for different values of τ and for the two values of γ. As long as a nontrivial structure of the world is preserved, the results do not depend on the actual value of the exponent γ. We here considered a graph with M = 100 nodes in order to have a greater variability. Note that the increase in excess similarity at high distances is an artifact, as the large error bars indicate, of the small number of objects at those distances in the graph. All the above results are averaged over 100 realizations of the process on the same graph with population size N = 10 and number of objects M = 40.