Fig 1.
Language abundance distributions.
Plots (a,c,e,g,i,k) show the histograms of the language abundance distributions together with the fitted Allen-Savage (red line) and lognormal (blue line) distributions. The bins are centered on integers, n, and have borders at n±0.5 (see [7]). Plots (b,d,f,h,j,l) show the observed (red line) and fitted (black line) rank abundance distributions. Languages are ranked from the most abundant, on the left-hand side of the x-axis, to the least abundant, on the right-hand side. The error bars correspond to 95% confidence intervals and were obtained for each country by drawing 200 samples, each with as many points as the country has languages, from a distribution with parameters set to the maximum likelihood estimates. The Solomon Islands, Cameroon and Papua New Guinea provide examples of very good fits (a,b,c,d,e,f). Colombia is a typical example of a country with a dominant language, in this case Spanish, and yields a poor fit (g,h). The fit improves considerably when Spanish is removed (red dashed line in plot g). Indonesia and the Philippines are examples of poor fits, with a large plateau at intermediate language abundance classes (i,j,k,l).
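The resampling procedure behind the error bars can be sketched as a parametric bootstrap. The sketch below is illustrative only: it uses a lognormal as a stand-in for the fitted distribution (the Allen-Savage sampler is not reproduced here), and the function names, the parameter values mu and sigma, and the per-rank percentile rule are assumptions rather than the authors' implementation.

```python
import random

def rank_abundance_ci(sampler, n_languages, n_boot=200, level=0.95, seed=0):
    """Per-rank confidence envelope from a parametric bootstrap.

    sampler(rng) must return one abundance drawn from the fitted
    distribution (hypothetical interface, chosen for this sketch).
    """
    rng = random.Random(seed)
    # Each replicate: draw n_languages abundances, sort most abundant first.
    replicates = [
        sorted((sampler(rng) for _ in range(n_languages)), reverse=True)
        for _ in range(n_boot)
    ]
    # Indices of the lower and upper percentiles among the sorted replicates.
    lo_idx = int(round((1 - level) / 2 * (n_boot - 1)))
    hi_idx = int(round((1 + level) / 2 * (n_boot - 1)))
    lower, upper = [], []
    for rank in range(n_languages):
        values = sorted(rep[rank] for rep in replicates)
        lower.append(values[lo_idx])
        upper.append(values[hi_idx])
    return lower, upper

def lognormal_sampler(rng, mu=8.0, sigma=2.5):
    # Stand-in for the fitted distribution; mu and sigma are made-up values.
    return rng.lognormvariate(mu, sigma)

lower, upper = rank_abundance_ci(lognormal_sampler, n_languages=60)
```

Each replicate contains as many points as the country has languages, so the envelope at a given rank reflects the sampling variability expected for a community of that size.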
Table 1.
List of countries with more than 50 languages, their number of languages, number of individuals, J, and the maximum likelihood estimates of θ = 2Jν, ν (the glossogenesis rate), Ps (the number of individuals in an incipient language, expressed as a fraction of J), and Js = Ps·J.
Countries whose names are in italics correspond to cases of poor fit, as revealed by the rank abundance plots. See (S1 Appendix) for the complete list of countries.
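The two derived columns follow directly from the definitions in the caption: ν = θ/(2J) and Js = Ps·J. A minimal sketch of that arithmetic is given below. The function name is hypothetical, and the value of J is made up for illustration; θ = 56 and Ps = 2.3 × 10⁻⁴ are the values quoted for Cameroon in the caption of Fig 2.

```python
def derived_parameters(theta, p_s, J):
    """Return (nu, J_s) from theta = 2*J*nu and J_s = P_s * J."""
    nu = theta / (2 * J)   # glossogenesis rate
    j_s = p_s * J          # number of individuals in an incipient language
    return nu, j_s

# J below is a hypothetical population size, not a value from Table 1.
nu, j_s = derived_parameters(theta=56, p_s=2.3e-4, J=1.5e7)
```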
Table 2.
Corrected Akaike Information Criterion values (AICc) for the Allen-Savage (AS) and lognormal (logn) distributions, their weights (w) (Burnham and Anderson 2010) and their ratio, wAS/wlogn, for the countries with more than 50 languages.
See (S1 Appendix) for the complete list of countries.
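For reference, the quantities in this table can be computed from the maximized log-likelihoods using the standard definitions: AICc = 2k − 2 ln L + 2k(k+1)/(n − k − 1), and w_i proportional to exp(−Δ_i/2), where Δ_i is the difference from the smallest AICc. The sketch below assumes two fitted parameters per model and takes n to be the number of languages in the country; the log-likelihood values are invented for illustration.

```python
import math

def aicc(log_likelihood, k, n):
    """Corrected Akaike Information Criterion for k parameters and n data points."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def akaike_weights(scores):
    """Akaike weights for a list of AICc scores (the weights sum to 1)."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical log-likelihoods for one country with n = 60 languages.
n_languages = 60
aicc_as = aicc(log_likelihood=-512.3, k=2, n=n_languages)    # Allen-Savage
aicc_logn = aicc(log_likelihood=-515.9, k=2, n=n_languages)  # lognormal
w_as, w_logn = akaike_weights([aicc_as, aicc_logn])
ratio = w_as / w_logn  # evidence ratio in favour of Allen-Savage
```

When both models have the same number of parameters, the small-sample correction cancels and the ratio wAS/wlogn reduces to the likelihood ratio of the two fits.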
Fig 2.
Development of a plateau in the Allen-Savage distribution.
The black bars show the histogram of 268 points sampled randomly from an Allen-Savage distribution with parameters θ = 56 and Ps = 2.3 × 10⁻⁴ (the same parameters as Cameroon). These data points were then multiplied by (Dt+1) = 1.25 and 1.5; the grey bars show the latter (bars for (Dt+1) = 1.25 are not shown). The lines are the fits of Allen-Savage distributions obtained with likelihood methods. Notice the development of a plateau at intermediate abundance classes as (Dt+1) increases. The bins are centered on integers, n, and have borders at n±0.5.
Fig 3.
Truncated language abundance distributions.
(a) Australia (232 languages) and (b) United States (156 languages). Among all the countries studied, these were the only distributions that did not conform to the bell-shaped pattern. These skewed distributions reflect the decreasing sizes and higher extinction rates of low-abundance languages in these countries. The blue curves are the best-fit lognormal distributions.