The Twitter of Babel: Mapping World Languages through Microblogging Platforms

doi:10.1371/journal.pone.0061981

Table 1.

Basic metrics of the data set.

Figure 1.

Multiscale view of the geolocated Twitter signal.

The large volume of geolocated Twitter traffic allows for a high-resolution characterization of human behavior. A) Europe B) Italy C) Lazio region D) Rome. The squares highlight the zoomed areas.

Figure 2.

Ranking of countries by users per capita.

Ranking of countries as per average number of Twitter users over a population of individuals.

Figure 3.

Users and GDP per capita.

Correlation between country level Twitter penetration and GDP/capita. The adjusted R value of the fit is 0.56.

Figure 4.

User Activity.

Probability density of user activity (number of daily tweets N) grouped by country (A) and language (B), and by country while considering English tweets exclusively (C). The curves collapse naturally, without any functional rescaling, indicating a seemingly universal distribution of user activity, independent of cultural background. Countries in panels (A) and (C) are characterized by high Twitter penetration and represent different continents, while the languages in panel (B) are selected from those producing a very strong signal. Dashed lines represent log-normal distributions , with and for (A), and (B), and and (C).
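
For reference, the general form of a log-normal density for the number of daily tweets N is written out here. The symbols \mu and \sigma are the standard location and scale parameters and are an assumption about the notation; the fitted values for each panel are not recoverable from this caption text and are not supplied.

```latex
P(N) = \frac{1}{N \sigma \sqrt{2\pi}}
       \exp\!\left( -\frac{(\ln N - \mu)^2}{2\sigma^2} \right),
\qquad N > 0
```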

Figure 5.

Languages by number of users.

Languages ranked by total number of users. For clarity, only languages with more than users are shown.

Figure 6.

Geographic distribution of languages around the world.

A) Raw Twitter signal. Each color corresponds to a language. Densely populated areas are easily identified, while, as expected, languages are well separated among European countries. B) Dominant language usage. The color of each country indicates the fraction of users adopting the official language in tweets. Gray represents countries without a statistically significant signal.

Figure 7.

Language share of the most active countries.

Language adopted by users coming from Top most active countries, ordered by number of English tweets.

Figure 8.

Language polarization in Belgium and Catalonia, Spain.

In each cell ( resolution) we compute the user-normalized ratio between the two languages being considered in each case. A) Belgium. B) Catalonia. The color bar is labeled according to the relative dominance of the language denoted by blue. In Belgium, English accounts for of the language share.
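
One possible reading of a "user-normalized ratio" per cell can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: each user is assumed to contribute a total weight of one per cell, split across the two languages in proportion to their tweets, and the cell value is the first language's share of that weight. The function names, the `(user_id, lat, lon, lang)` tuple layout, and the `cell_deg` default are all hypothetical; the cell resolution used in the figure is not given in this caption text.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, cell_deg):
    """Map a coordinate to an integer grid-cell index (hypothetical gridding)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def user_normalized_share(tweets, lang_a, lang_b, cell_deg=0.25):
    """Return {cell: share of lang_a relative to lang_a + lang_b}.

    Assumption: every user counts once per cell, regardless of how many
    tweets they post; tweets in other languages are ignored.
    """
    # Tweet counts per (cell, user) for the two languages of interest.
    counts = defaultdict(lambda: [0, 0])
    for user, lat, lon, lang in tweets:
        key = (cell_of(lat, lon, cell_deg), user)
        if lang == lang_a:
            counts[key][0] += 1
        elif lang == lang_b:
            counts[key][1] += 1

    # Each user adds a total weight of 1 to the cell, split by language.
    totals = defaultdict(lambda: [0.0, 0.0])
    for (cell, _user), (a, b) in counts.items():
        n = a + b
        totals[cell][0] += a / n
        totals[cell][1] += b / n

    return {cell: a / (a + b) for cell, (a, b) in totals.items()}
```

Normalizing by user rather than by tweet keeps a handful of very active accounts from dominating a cell, which is presumably the motivation for the normalization mentioned in the caption.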

Figure 9.

Language polarization in Montreal, QC, Canada.

English and French are considered. In each cell () we compute the user-normalized ratio between English and French (excluding all other languages). Blue - English, Yellow - French. The color bar is labeled according to the relative dominance of English to French.

Figure 10.

Language polarization in New York City, NY, USA.

The second language by district or municipality (in the case of New Jersey state) is shown. Blue - Spanish, Light Green - Korean, Fuchsia - Russian, Red - Portuguese, Yellow - Japanese, Pink - Dutch, Grey - Danish, Coral - Indonesian.

Figure 11.

Monthly variations in Language use.

Fraction of minority languages in specific countries as a function of the month. Increases in a specific language share indicate the presence of tourists visiting the country. Peaks are clearly visible during the local summer period.
