Modeling competitive evolution of multiple languages

Increasing evidence demonstrates that language coexistence has become ubiquitous in many places, that it is essential for supporting language and cultural diversity, and that it is associated with financial and economic benefits. The competitive evolution among multiple languages determines the outcome: coexistence, decline, or extinction. Here, we extend the Abrams-Strogatz model of language competition to multiple languages and validate it by analyzing the behavioral transitions of language usage over recent decades in Singapore and Hong Kong. In each case, we estimate from data the model parameters that measure the utility of each language for its speakers and the strength of two biases: the majority's preference for its own language and the minority's aversion to that language. The values of these two biases decide which language grows fastest in the competition and what the stable state of the system will be. We also study the convergence time of the system to its stable states and discover tipping points with multiple attractors. Moreover, a critical slowdown of convergence to the stable fractions of language users appears near the tipping points and peaks at them, signaling when the system approaches them. Our analysis furthers the understanding of multiple language evolution and of the role of tipping points in behavioral transitions. These insights may help to protect languages from extinction and to retain language and cultural diversity.

Here, we answer the questions raised above by using real-world language evolution data from Singapore [28] and Hong Kong [29,30,31] to find optimal parameters and validate the extended Abrams-Strogatz model [27]. The parameters found in this process are the utilities of the competing languages and the strengths of the majority preference for the most popular language and of the minority aversion to this language; together they quantify how utility and the two biases shape language evolution. Currently, all the modeled languages coexist. First, we investigate the behavioral transitions of the languages under perturbations of each parameter. We find that the language with the highest utility tends to grow fastest and eventually gains the largest fraction of speakers. Moreover, when the majority preference is smaller than a certain critical value, the most popular language may lose its leading position during the evolution process. Likewise, when the minority aversion is not sufficiently strong, languages with small initial fractions of speakers (including the language with the smallest) may gain the largest fraction of speakers. From this analysis, we obtain the complete phase diagram for each community, showing the evolution of each language in each dataset and the relation between transitions and parameters. Second, we analyze the relation between convergence time and the state of the competing languages, and show that the competition is most intense when language dominance switches from one language to another. Finally, we illustrate the individual and combined effects of the two language biases, the majority preference and the minority aversion, by simulating how the languages with the largest initial fractions of speakers are affected.

DATA AND MODELING
To model real-world language competition, we use four datasets: languages used in Singapore (the whole country), in the Chinese community of Singapore, in the Indian community of Singapore, and in Hong Kong. We count as speakers of a language the people who consider it their primary language.
Singapore. In the 1950s, dialects such as Hokkien were the most widely spoken languages in Singapore. In the 1957 census, about 1.8% of people mainly spoke English and about 0.1% mainly spoke Chinese Mandarin. However, following a series of policies implemented from the 1950s to the present, the proportions of speakers of different languages in Singapore changed considerably. By 2010, English and Chinese Mandarin had become the most spoken languages in the country as a whole, with 32.3% and 35.6% of speakers, respectively [28].
Hong Kong. We use language data collected in Hong Kong between 1949 and 2016 [29,30,31]. The number of people who mainly speak English increased over these 67 years, surpassing the number of people who mainly speak Hakka, Hoklo, or Sze Yap. We do not include Cantonese speakers in our dataset, since the number of people who use Cantonese as their common language is much larger than the speaker population in our dataset. We normalize the fractions of all languages in our dataset before fitting them to our model.
We employ the extended Abrams-Strogatz model [27] to test the competition among multiple languages, where x_i is the fraction of the population speaking language i and P_ij represents the transition rate from language j to language i. In the transition rate, β (≥ 0) and α − β (≥ 0) represent the strength of the majority preference and the minority aversion, respectively, s_i > 0 is the utility of language i, and ∑_{i=1}^{n} s_i = 1.
We use numerical simulation to compute the parameters in our model. The difference D between the real and theoretical language fractions is accumulated over all time steps and languages, where i is the time step (we use one year as the time step unit), varying from 1 to n, the total number of time steps, and j is the index of the language, varying from 1 to m, the number of languages. For each pair (i, j), D compares the real value of the fraction of language j users at time step i with the theoretical value of that fraction.
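The dynamics and the fitting error described above can be sketched numerically. Because the equations themselves are not reproduced in this text, the sketch rests on stated assumptions: the transition rate is taken to be P_ij = s_i x_i^β (1 − x_j)^(α−β), a form chosen because it reduces to the two-language Abrams-Strogatz rate s_i x_i^α when only two languages compete; the fractions evolve as dx_i/dt = ∑_j (x_j P_ij − x_i P_ji); D is taken to be a sum of squared deviations. The function names, the forward-Euler step, and the example values are hypothetical.

```python
import numpy as np

def transition_rates(x, s, alpha, beta):
    """Return P with P[i, j] = assumed rate of switching from language j to i.

    Assumes alpha - beta >= 0 and beta >= 0, as stated for the model.
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    pull = s * x ** beta                  # utility and majority preference of target i
    push = (1.0 - x) ** (alpha - beta)    # minority aversion acting on source j
    P = np.outer(pull, push)
    np.fill_diagonal(P, 0.0)              # no self-transitions
    return P

def simulate(x0, s, alpha, beta, n_steps, dt=1.0):
    """Forward-Euler integration; returns an array of shape (n_steps + 1, m)."""
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(n_steps):
        P = transition_rates(x, s, alpha, beta)
        gain = P @ x                      # sum_j x_j P_ij
        loss = x * P.sum(axis=0)          # x_i sum_j P_ji
        x = np.clip(x + dt * (gain - loss), 0.0, 1.0)
        x = x / x.sum()                   # guard against round-off drift
        history.append(x.copy())
    return np.array(history)

def difference(observed, predicted):
    """Assumed form of D: squared deviations summed over time steps and languages."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sum((observed - predicted) ** 2))

# Illustrative values only; they are not fitted parameters from the paper.
trajectory = simulate([0.7, 0.2, 0.1], [0.3, 0.3, 0.4], alpha=1.3, beta=0.7, n_steps=50)
```

With a one-year time step, each row of `trajectory` can be compared directly with the normalized census fractions for the same year through `difference`.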
To fit this model to our language data, we set a range for the majority preference β and the minority aversion α − β. Iterating over this range, for each pair of β and α − β we evaluate possible values of the language utilities s_1, s_2, ..., s_m and keep those that minimize the difference D between the real language fractions and the theoretical language fractions calculated from Eq. 1. We then repeatedly narrow the parameter ranges and repeat the grid search over the language utilities, β, and α − β, increasing the precision of the theoretical model. We stop when the theoretical model is close enough to the real-world language data. Note that the parameter s_i represents state utility in the extended Abrams-Strogatz model, where it acts as a useful or advantageous feature of state i; in our language competition model, we therefore define s_i as the utility of language i.
By fitting the language data to the extended Abrams-Strogatz model, we compare the real evolution processes in the different datasets with the corresponding simulation results in Fig. 1. The model successfully represents the behavior of the data, correctly showing the evolution of each language. In the Singapore whole country dataset, Dialect starts with a fraction of about 0.975 of speakers and is surpassed by Mandarin in 1996 and by English in 1997. In the Chinese community of Singapore dataset, the trend is similar to that of the whole country dataset, with English and Mandarin gradually replacing dialects: Dialect starts with a fraction of about 0.766 of speakers and is surpassed by Mandarin in 1994 and by English in 2001. In the Indian community of Singapore dataset, Tamil starts with the largest fraction of speakers (about 0.613) but continually loses speakers and is eventually exceeded by English in 2003. What the three datasets have in common is the increase in English speakers, which might be caused by the increasing number of English-medium schools [28] in this period. The increasing usage of Mandarin in the whole country and Chinese community datasets might benefit from the "Speak Mandarin Campaign" implemented in Singapore in 1979 [32]. In the Hong Kong dataset, Sze Yap, a Chinese vernacular, has the largest fraction of speakers (about 0.578) at the beginning but is gradually replaced by the remaining three languages and is almost extinct by 1999.
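The repeated narrowing of the grid described above can be sketched as a generic routine. This is a minimal illustration rather than the authors' procedure: the function name, the number of grid points, the number of rounds, and the shrink factor are all assumptions, and the loss here is a stand-in for D. Handling of the constraint that the utilities sum to 1 is not shown.

```python
import itertools
import numpy as np

def refine_grid_search(loss, bounds, points=11, rounds=5, shrink=0.5):
    """Grid search that re-centres and narrows every range after each round.

    loss   : callable taking a tuple of parameter values, returning a float
    bounds : list of (low, high) pairs, one per parameter
    """
    bounds = [(float(lo), float(hi)) for lo, hi in bounds]
    best_params, best_loss = None, np.inf
    for _ in range(rounds):
        axes = [np.linspace(lo, hi, points) for lo, hi in bounds]
        for candidate in itertools.product(*axes):
            value = loss(candidate)
            if value < best_loss:
                best_params, best_loss = candidate, value
        # Narrow each range around the best point found so far,
        # without leaving the range that was just searched.
        narrowed = []
        for (lo, hi), centre in zip(bounds, best_params):
            half = 0.5 * shrink * (hi - lo)
            narrowed.append((max(lo, centre - half), min(hi, centre + half)))
        bounds = narrowed
    return np.array(best_params), best_loss

# Stand-in loss with a known minimum, used only to exercise the routine.
def toy_loss(p):
    return (p[0] - 0.63) ** 2 + (p[1] - 0.36) ** 2

params, value = refine_grid_search(toy_loss, [(0.0, 1.5), (0.0, 1.5)])
```

In the setting of the paper, `loss` would simulate the model for the candidate β, α − β, and utilities and return D against the census fractions. The cost grows as `points` raised to the number of parameters, which is one reason to search a coarse grid first and refine.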

STATE DEFINITION
We will refer to the language with the largest number of speakers as the most popular, but if this language drives the competing languages to extinction, we will refer to it as dominant.
Here we define each state in the evolution of language competition. In our simulations, we define two distinct states: "coexistence state" and "dominance state". "Coexistence state" arises if at least two languages survive, although some languages may still go extinct in this state. In contrast, in "dominance state", one language survives and all others go extinct. The "coexistence state" and "dominance state" are illustrated in Fig. 2, where the left column shows examples of "coexistence state", with each title naming the language with the largest fraction of speakers, while the right column displays examples of "dominance state". Note that both states refer to the system after the fractions of speakers of all languages have converged to their steady states.
As shown in Fig. 2A, during the language competition Dialect's fraction of speakers drops dramatically, while English's fraction of speakers increases. Eventually, none of the competing languages disappears, with English being the most popular. [...] In Fig. 2E, Mandarin has the largest fraction of speakers and the system stays in "coexistence state," while in Fig. 2F, Mandarin drives all other languages to extinction. Having used the Singapore dataset as an example to define the different states, we can now describe the following simulations in a more straightforward way.
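The two states defined in this section can be expressed as a small classification rule applied to the steady-state fractions. This is a sketch under one assumption not stated in the text: a language is treated as extinct when its fraction falls below a small threshold, here the hypothetical parameter `extinct_tol`.

```python
import numpy as np

def classify_state(steady_fractions, extinct_tol=1e-3):
    """Return (state, leader) for a vector of steady-state speaker fractions.

    state  : "dominance" if exactly one language survives, else "coexistence"
    leader : index of the language with the largest fraction
    """
    x = np.asarray(steady_fractions, dtype=float)
    survivors = np.flatnonzero(x > extinct_tol)   # languages that are not extinct
    leader = int(np.argmax(x))
    state = "dominance" if survivors.size == 1 else "coexistence"
    return state, leader

# Illustrative fractions only.
print(classify_state([0.5, 0.3, 0.2]))
print(classify_state([1.0, 0.0, 0.0]))
```

In a "coexistence state" the returned leader is the most popular language; in a "dominance state" it is the dominant one. The rule still reports coexistence when some, but not all, of the other languages are extinct, matching the definition above.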

STATE UTILITY s_i
In the extended Abrams-Strogatz model, s_i represents the utility of language i. Accordingly, in our language datasets, each language i has its own s_i representing its utility. We analyze the relation between s_i and the competition between language i and the other languages when s_i is in the range [0, 0.6]. Since the language utilities sum to 1 by definition, in each simulation increasing the utility of one language decreases the utilities of the other languages in proportion to their current utilities.
In the Singapore whole country dataset (Fig. 3ABC), [...]
In the Chinese community dataset (Fig. 3DEF), we set majority preference β = 0.63 and minority aversion α − β = 0.36. In Fig. 3D, [...]
In the Indian community dataset (Fig. 3GHI), we set majority preference β = 0.21 and minority aversion α − β = 0.82. In Fig. 3G, when the utility of English s_i ∈ [0, 0.32], Tamil, which is the language with the largest utility, is dominant. The system then reaches "coexistence state" as the utility of English increases further. When s_i exceeds 0.44, English becomes dominant. In Fig. 3H, when the utility of Tamil s_j ∈ [0, 0.33], English is dominant. When s_j ∈ [0.34, 0.45], the system is in "coexistence state". When s_j > 0.45, Tamil becomes dominant. In Fig. 3I, English is dominant for the initial range of s_k; its fraction of speakers then decreases significantly, allowing other languages to grow as the system enters "coexistence state." When s_k > 0.43, Malay is dominant.
In the Hong Kong dataset (Fig. 3JKL), we set majority preference β = 0.987 and minority aversion α − β = 0.095. [...]
From these simulations, we find that when the utility s_i of language i is relatively small, one of the other languages, usually the one with the highest utility, tends to be dominant. As s_i increases, the languages may enter "coexistence state," which acts as a transition period before language i becomes dominant. Finally, when s_i is large enough, language i becomes dominant.
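The rule used in these sweeps, where raising one utility lowers the others in proportion to their current values so that the total stays 1, can be written out explicitly. The function name and the example utilities below are hypothetical; only the proportional rescaling itself comes from the text.

```python
import numpy as np

def set_utility(s, i, new_value):
    """Set s[i] to new_value and rescale the other utilities proportionally.

    The remaining budget 1 - new_value is shared among the other languages
    in the same ratios they had before, so the utilities still sum to 1.
    """
    s = np.asarray(s, dtype=float)
    if not 0.0 <= new_value < 1.0:
        raise ValueError("new_value must lie in [0, 1)")
    others_total = 1.0 - s[i]
    if others_total <= 0.0:
        raise ValueError("the other utilities must have a positive total")
    rescaled = s * (1.0 - new_value) / others_total
    rescaled[i] = new_value
    return rescaled

# Illustrative utilities only; they are not fitted values from the paper.
s = np.array([0.40, 0.35, 0.25])
swept = [set_utility(s, 0, value) for value in np.linspace(0.0, 0.6, 7)]
```

Each entry of `swept` is a full utility vector for one point of a sweep over the range [0, 0.6]; the ratio between the second and third utilities is the same in every entry.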

CONVERGENCE TIME
It is notably hard to predict the critical transition from one state to another because the state of the system may show little change before the tipping point [33]. Critical slowdown [34], defined in statistical physics, is an early warning signal with applications in many fields, ranging from the economy [35] to ecology [36]. Here we employ the convergence time as the early warning signal for behavioral transitions in the language competition. Fig. 4 shows the convergence time for different datasets under different language competitions, where the x-axis represents the initial fraction of one language, the left y-axis represents the equilibrium fraction of each language, and the right y-axis represents the convergence time τ. In Fig. 4A, τ reaches a peak when the initial fraction of Dialect increases to 0.56 and Dialect replaces English as the dominant language. Similarly, in Fig. 4B, τ reaches its peak when the system transitions from "dominance state" to "coexistence state". In the Hong Kong dataset, we find a similar outcome. The convergence pattern observed in Fig. 4C is similar to the one seen in Fig. 4A, since the peak of τ occurs when the dominance switches from one language to another. In Fig. 4D, the peak of τ occurs when the system transitions from "coexistence state" to "dominance state," which is exactly opposite to the transition in Fig. 4B, yet the two show similar patterns of convergence time.
From these simulations, the convergence time peaks at a state transition: a switch of the dominant language, or a move from "coexistence state" to "dominance state" or vice versa. Such transitions can be caused by the comparable competing strength of different languages. The convergence time enables us to identify the "tipping point" of the system parameter values, giving the control system enough time to prevent an unwanted transition.
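One way to measure a convergence time τ from a simulated trajectory is sketched below. The text does not state the convergence criterion, so the definition used here is an assumption: τ is the first time step after which every language's fraction stays within a tolerance `tol` of its final simulated value. The trajectories are synthetic relaxations, not model output.

```python
import numpy as np

def convergence_time(trajectory, tol=1e-3):
    """First step after which all fractions stay within tol of the final row.

    trajectory : array of shape (n_steps, m); the last row is taken as the
                 steady state, so the run must be long enough to have settled.
    """
    traj = np.asarray(trajectory, dtype=float)
    deviation = np.abs(traj - traj[-1]).max(axis=1)   # worst language per step
    outside = np.flatnonzero(deviation > tol)         # steps still outside tol
    return 0 if outside.size == 0 else int(outside[-1]) + 1

# Synthetic two-language relaxations toward (0.5, 0.5) at two different rates.
t_fast = np.arange(60)
t_slow = np.arange(200)
fast = np.column_stack([0.5 + 0.5 * 0.5 ** t_fast, 0.5 - 0.5 * 0.5 ** t_fast])
slow = np.column_stack([0.5 + 0.5 * 0.9 ** t_slow, 0.5 - 0.5 * 0.9 ** t_slow])
tau_fast = convergence_time(fast)
tau_slow = convergence_time(slow)
```

The slower relaxation gives the larger τ. Scanning an initial fraction or a parameter and recording τ at each value would then show the kind of peak described above, provided each run is long enough that its last row really is the steady state.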

SENSITIVITY TO MAJORITY PREFERENCE AND MINORITY AVERSION
Here, we focus on how majority preference and minority aversion affect language competition. For each dataset, we set all parameters and the languages' initial fractions of speakers to the values used in the Data and Modeling section, except for the majority preference β and the minority aversion α − β, which we vary.
In Fig. 5A, we have β = 0.726 and α − β ranging from 0 to 1 in steps of 0.01. For α − β ∈ [0, 0.31], Mandarin has the largest fraction of speakers, and all languages are in "coexistence state". For α − β ∈ [0.32, 0.37], Mandarin is dominant, causing the extinction of Dialect and English. When the minority aversion reaches 0.38, Dialect, which is the language with the largest initial fraction in this dataset, becomes dominant. In Fig. 5C, for α − β ∈ [0, 0.4], the system stays in "coexistence state" and English has the largest fraction of speakers. When α − β increases further, English still leads, but once the minority aversion is large enough, the language with the largest initial fraction of speakers (Dialect) becomes dominant. In Fig. 5E, the system starts in "coexistence state" and English, which is the language with the largest utility (s = 0.4), has the largest fraction of speakers for α − β ∈ [0, 0.96]. Then, for α − β ∈ [0.98, 1.02], English is dominant. Tamil, the language with the largest initial fraction of speakers, is dominant when the minority aversion is larger than 1.02. In Fig. 5G, when α − β ∈ [0, 0.07], English, which is the language with the largest utility (s = 0.297), is the most popular, but the system is in "coexistence state". [...] This may be caused by the two languages' similar level of competitiveness, determined by both initial fraction and utility. However, when the minority aversion is high enough (larger than 0.71), Sze Yap, which is the language with the highest initial fraction, becomes dominant.
Figure 4: We gradually increase the initial fraction of one language to observe the relation between the time needed for all languages to reach the steady state of their fractions of speakers and the distance of the initial fraction from the tipping point. (A) When the language dominance switches from one language to another, the time to reach the steady state of each language's fraction of speakers peaks. (B) Similarly to subfigure (A), in the Hong Kong dataset, when the language dominance switches from one language to another, the time to reach the steady state peaks. (C) When the system transitions from 'coexistence state' to 'dominance state', the time to reach the steady state again peaks. (D) Before the system transitions from 'dominance state' to 'coexistence state', the time to reach the steady state peaks.
From these simulations, we conclude that a low minority aversion favors the growth of languages with small initial fractions, and usually the one with the largest utility s among them gains the largest fraction of speakers. In contrast, a high minority aversion favors the growth of the language with the largest initial fraction of speakers.
In Fig. 5B, we set the minority aversion α − β = 0.28 and vary the majority preference β over [0, 1]. Similar to the patterns in Fig. 5A, the languages start in "coexistence state", with Mandarin having the largest fraction of speakers for β ∈ [0, 0.74]. For β ∈ [0.75, 0.8], Mandarin becomes dominant, even though it had the smallest initial fraction of speakers. As the majority preference increases further and reaches 0.82, Dialect, the language with the largest initial fraction, becomes dominant.
In Fig. 5D, with α − β = 0.36, the system stays in "coexistence state", with Mandarin having the largest fraction of speakers, for β ∈ [0, 0.66]. For β ∈ [0.67, 0.83], Mandarin is dominant, but only over this short range, because for β > 0.83 Dialect, the language with the largest initial fraction, is dominant. In Fig. 5F, at the beginning (β ∈ [0, 0.3]), English, which is the language with the largest utility, has the largest fraction of speakers and all languages are in "coexistence state." For β ∈ [0.31, 0.34], English is dominant, but for larger β, Tamil becomes dominant. Similarly, as shown in Fig. 5H, the system transitions from "coexistence state" to "dominance state", and the language with the largest initial fraction becomes dominant when the majority preference is high enough; in this case, for β ∈ [0.76, 0.96] English is dominant, while for β ∈ [0.98, 1.16] Hakka takes over this role. Finally, when β > 1.16, Sze Yap, the language with the largest initial fraction, is dominant.
From these four simulations, we find that when the majority preference is small, a language with a small initial fraction (usually the one with the largest utility) may gain the largest fraction of speakers and even become dominant. However, when the majority preference is high enough, the language with the largest initial fraction of speakers usually becomes dominant, driving the other languages to extinction.

PHASE DIAGRAM
To examine the combined effect of majority preference and minority aversion, we again use the initial fractions of speakers and the parameters from the Data and Modeling section. Here, we only consider the language with the largest fraction of speakers. In Fig. 6A, when majority preference β ∈ [0, 0.98], three different patterns appear. In the first, all three languages (English, Dialect, and Mandarin) coexist and Mandarin is the most popular. In the second, Mandarin is dominant. In the third, Dialect is dominant: when minority aversion α − β is large enough, Dialect, which is the language with the most speakers initially, is dominant. As β increases from 0 to 0.98, Dialect becomes more and more likely to be dominant, because the range of α − β over which it is dominant grows. When β ∈ [0.98, 1.1], two patterns arise: in the first, Mandarin is dominant, while in the second, Dialect is dominant. When β exceeds 1.1, Dialect is dominant.
Fig. 6B shows that for majority preference β ∈ [0, 1], again three different patterns arise. In the first, three languages (English, Tamil, Malay) coexist and Tamil is the most popular. In the second, English is dominant, while in the third, Tamil is dominant. Similar to Fig. 6A, the language with the largest utility (Mandarin in Fig. 6A and English in Fig. 6B) tends to be the most popular when the majority preference and minority aversion are small. With β ∈ [1, 1.02], two patterns are present: in the first, the three languages again coexist and English is the most popular; in the second, English is dominant. A third pattern arises when β > 1.02: Tamil, which is the language with the largest initial fraction of speakers, is dominant.
Fig. 6C shows that when minority aversion α − β ∈ [0, 0.04], four different patterns arise. In the first, the three languages (English, Dialect, Mandarin) again coexist and Mandarin has the most speakers. In the second, Mandarin is dominant; in the third, English is dominant; and in the fourth, Dialect is dominant. In this case, even English can be dominant over a short range of parameters because of its comparably high initial fraction and utility; it has the second largest initial fraction and the second largest utility. With α − β ∈ [0.06, 0.56], the first three patterns from the case of lowest β reappear. Again, Mandarin is the language with the largest utility, and it is the most popular when minority aversion and majority preference are small. When α − β is greater than 0.56, Dialect is dominant.
In Fig. 6D, the first, second, and fourth of the previous patterns reappear. In this dataset, English has the largest utility and has the largest fraction of speakers when the majority preference and minority aversion are small.
When the majority preference and the minority aversion are relatively small, competing languages tend to coexist and the language with the largest utility tends to have the most speakers. Hence, in this regime the two biases affect language competition only weakly, and it is the language utility that plays the essential role. As the majority preference and the minority aversion increase, some languages become dominant. When they increase further, the language with the largest initial fraction is dominant, indicating that sufficiently large majority preference and minority aversion favor the growth of the language with the largest initial fraction of speakers and make it dominant.
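A phase diagram of this kind can be assembled by scanning a grid of β and α − β values and recording, for each pair, which language leads and whether it is dominant. The sketch below is illustrative only and rests on assumptions: the transition rate is taken to be P_ij = s_i x_i^β (1 − x_j)^(α−β), which is not reproduced in this text, the dynamics are iterated with forward-Euler steps for a fixed number of steps, and extinction is judged with the hypothetical threshold `extinct_tol`. All names and numerical values are hypothetical.

```python
import numpy as np

def steady_state(x0, s, beta, aversion, n_steps=3000, dt=1.0):
    """Iterate the assumed dynamics for a fixed number of Euler steps.

    aversion stands for alpha - beta. Near a tipping point convergence is
    slow, so n_steps may need to be much larger there.
    """
    x = np.asarray(x0, dtype=float)
    s = np.asarray(s, dtype=float)
    for _ in range(n_steps):
        P = np.outer(s * x ** beta, (1.0 - x) ** aversion)  # P[i, j]: from j to i
        np.fill_diagonal(P, 0.0)
        x = x + dt * (P @ x - x * P.sum(axis=0))
        x = np.clip(x, 0.0, 1.0)
        x = x / x.sum()
    return x

def phase_diagram(x0, s, betas, aversions, extinct_tol=1e-3, n_steps=3000):
    """Return (leader, dominant), each of shape (len(aversions), len(betas))."""
    leader = np.zeros((len(aversions), len(betas)), dtype=int)
    dominant = np.zeros_like(leader, dtype=bool)
    for r, aversion in enumerate(aversions):
        for c, beta in enumerate(betas):
            x = steady_state(x0, s, beta, aversion, n_steps=n_steps)
            leader[r, c] = np.argmax(x)                       # largest fraction
            dominant[r, c] = np.count_nonzero(x > extinct_tol) == 1
    return leader, dominant

# Illustrative initial fractions and utilities; not fitted values.
x0 = [0.8, 0.1, 0.1]
s = [0.3, 0.3, 0.4]
leader, dominant = phase_diagram(x0, s, np.linspace(0, 1, 3), np.linspace(0, 1, 3), n_steps=300)
```

Under the assumed rate, setting β = 0 and α − β = 0 reduces the dynamics to dx_i/dt = s_i − x_i, so the fractions relax to the utilities and the language with the largest utility leads in a "coexistence state". This is consistent with the observation above that utility decides the outcome when both biases are small.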

DISCUSSION
Here we provide a model that extends the Abrams-Strogatz model in an important direction, to competition among several languages, and we validate it using four real-world language competitions. The model fits the real data well, as shown in the first section, enabling us to analyze in detail the factors affecting language competition. Our contributions can be summarized as follows.
1. We show that language utility affects the competitive evolution of communities using several languages. When the utility of a language is low enough, it might go extinct, but when this utility is high enough, it can be dominant and drive other languages to extinction. However, it is also possible that with the value of this utility in mid-range the system can be in a "coexistence state".
2. The relation between convergence time and state transition of languages shows that convergence time to steady state fractions of language users reaches a peak at the state transition tipping points. Such critical slowdown can be caused by similar competitiveness of the different competing languages. At the tipping points either one dominant language is replaced by another, or the system transitions from a "dominance state" to "coexistence state" or vice versa.
3. We demonstrate the influence that the majority preference and the minority aversion can separately exert on competing languages. When majority preference is small, a language with small initial fraction of speakers (including the language with the smallest initial fraction of speakers) can have the most speakers after the language competition, and even lead to the extinction of other languages. When the majority preference is large enough, the language with the largest initial fraction of speakers will win the language competition, usually driving all other languages to extinction. The simulations with varying the minority aversion yield similar results as described above.
4. We also discuss the influence that the majority preference and the minority aversion together exert on the evolution of competing languages. When both biases are relatively low, a language with a small initial fraction of speakers can gain the most speakers in the steady state and become dominant. When both biases are high enough, the language with the most speakers initially is most likely to be dominant. Moreover, variants of these conditions yield results other than the two reported above, as shown in Fig. 6C, where English is dominant, and in Fig. 6D, where Hakka is dominant.
Our simulations illustrate the results of the language competition under various conditions, providing examples of the impact of the language utility, the majority preference, and the minority aversion on the competition outcomes. However, analytical formulas that quantitatively define the competitive evolution of languages as a function of time and of the model parameters are not yet known. Moreover, our simulations do not consider geographical [37], physiological [38], and other factors. Hence, constructing a conclusive formulation of language competition in the real world requires future research.

DATA ACCESSIBILITY
All data needed to evaluate the conclusions in the paper are present in the paper itself or are available upon request from the authors.

COMPETING INTERESTS
We declare we have no competing interests.