Table 1.
Basic metrics of the data set.
Figure 1.
Multiscale view of the geolocated Twitter signal.
The large volume of geolocated Twitter traffic allows for a high-resolution characterization of human behavior. A) Europe. B) Italy. C) Lazio region. D) Rome. The squares mark the area enlarged in the next panel.
Figure 2.
Ranking of countries by users per capita.
Ranking of countries as per average number of Twitter users over a population of individuals.
Figure 3.
Correlation between country-level Twitter penetration and GDP per capita. The adjusted R² value of the fit is 0.56.
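As an illustration of the goodness-of-fit figure quoted in this caption, read here as an adjusted coefficient of determination, the sketch below fits a straight line by ordinary least squares and applies the usual adjustment for one predictor. The caption does not state the functional form or the axes used for the fit, so the linear model, the function name `adjusted_r_squared`, and the toy numbers are assumptions made for illustration only.

```python
def adjusted_r_squared(x, y):
    """Fit y = slope * x + intercept by least squares.

    Returns (slope, intercept, adjusted R^2). Assumes a single predictor
    and at least three points with non-constant x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares give the plain R^2.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot

    # Adjust for the number of predictors (p = 1 for a straight line).
    p = 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return slope, intercept, adj_r2


# Toy data, not taken from the paper.
slope, intercept, adj_r2 = adjusted_r_squared([1, 2, 3, 4], [1, 3, 2, 4])
```

With few countries the adjustment matters: the toy data above has a plain R² of 0.64 but an adjusted value noticeably lower, because the correction penalizes the small sample size.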
Figure 4.
Probability density of user activity (number of daily tweets N) grouped by country (A) and language (B), and by country while considering English tweets exclusively (C). The curves collapse naturally, without any functional rescaling, indicating a seemingly universal distribution of user activity, independent of cultural background. The countries in panels (A) and (C) have high Twitter penetration and represent different continents, while the languages in panel (B) are selected from those producing a very strong signal. Dashed lines represent log-normal distributions, with separate parameter values for (A), (B), and (C).
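For readers unfamiliar with the log-normal form used for the dashed lines, the following sketch evaluates the standard log-normal density and estimates its two parameters from a sample of per-user activity values. The parameter names `mu` and `sigma`, the function names, and the sample values are assumptions for illustration; the fitted values used in the figure are not reproduced here, and the authors' fitting procedure is not described in the caption.

```python
import math


def lognormal_pdf(n, mu, sigma):
    """Standard log-normal density at n > 0 (returns 0.0 for n <= 0)."""
    if n <= 0:
        return 0.0
    z = (math.log(n) - mu) / sigma
    return math.exp(-0.5 * z * z) / (n * sigma * math.sqrt(2 * math.pi))


def fit_lognormal(samples):
    """Estimate (mu, sigma) as the mean and population standard
    deviation of the log-transformed positive samples."""
    logs = [math.log(s) for s in samples if s > 0]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return mu, math.sqrt(var)


# Hypothetical daily tweet rates for a handful of users.
mu, sigma = fit_lognormal([0.5, 1.0, 2.0, 4.0, 8.0])
density_at_two = lognormal_pdf(2.0, mu, sigma)
```

Estimating the parameters from the logarithms of the data is one simple choice; a fit to the binned density would be an equally reasonable alternative.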
Figure 5.
Languages ranked by total number of users. For clarity, only languages with more than users are shown.
Figure 6.
Geographic distribution of languages around the world.
A) Raw Twitter signal. Each color corresponds to a language. Densely populated areas are easily identified, while, as expected, languages are well separated among European countries. B) Dominant language usage. The color of each country indicates the fraction of users adopting the official language in tweets. Gray represents countries without a statistically significant signal.
Figure 7.
Language share of the most active countries.
Languages adopted by users in the most active countries, ordered by number of English tweets.
Figure 8.
Language polarization in Belgium and Catalonia, Spain.
In each cell ( resolution) we compute the user-normalized ratio between the two languages being considered in each case. A) Belgium. B) Catalonia. The color bar is labeled according to the relative dominance of the language denoted by blue. In Belgium, English accounts for
of the language share.
Figure 9.
Language polarization in Montreal, QC, Canada.
English and French are considered. In each cell () we compute the user-normalized ratio between English and French (excluding all other languages). Blue - English, Yellow - French. The color bar is labeled according to the relative dominance of English to French.
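One plausible reading of the per-cell, user-normalized ratio is sketched below: tweets are binned into square latitude/longitude cells, each user is counted once per language per cell regardless of how many tweets they posted, and the share of the first language is computed against the two languages only. The record layout, the function name `language_share_by_cell`, and the cell size `cell_deg` are assumptions; the caption does not define the normalization precisely, and the cell size used in the figure is not reproduced here.

```python
import math
from collections import defaultdict


def language_share_by_cell(tweets, lang_a, lang_b, cell_deg):
    """Return {cell: share of lang_a users among lang_a + lang_b users}.

    tweets is an iterable of (user_id, lat, lon, lang) tuples.
    Tweets in any other language are ignored.
    """
    users = defaultdict(lambda: {lang_a: set(), lang_b: set()})
    for user_id, lat, lon, lang in tweets:
        if lang not in (lang_a, lang_b):
            continue
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        # Sets count each user once per language per cell.
        users[cell][lang].add(user_id)

    shares = {}
    for cell, by_lang in users.items():
        a = len(by_lang[lang_a])
        b = len(by_lang[lang_b])
        shares[cell] = a / (a + b)  # a + b >= 1 for every stored cell
    return shares


# Hypothetical records around Montreal.
sample = [
    ("u1", 45.53, -73.57, "en"),
    ("u1", 45.53, -73.57, "en"),  # same user, counted once
    ("u2", 45.53, -73.57, "fr"),
    ("u3", 45.53, -73.57, "es"),  # excluded language
    ("u4", 45.68, -73.57, "fr"),
]
shares = language_share_by_cell(sample, "en", "fr", cell_deg=0.1)
```

Counting users rather than tweets keeps a few very active accounts from dominating a cell, which is presumably the motivation for normalizing by user.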
Figure 10.
Language polarization in New York City, NY, USA.
The second language by district or municipality (in the case of New Jersey state) is shown. Blue - Spanish, Light Green - Korean, Fuchsia - Russian, Red - Portuguese, Yellow - Japanese, Pink - Dutch, Grey - Danish, Coral - Indonesian.
Figure 11.
Monthly variations in language use.
Fraction of minority languages in specific countries as a function of the month. Increases in a specific language share indicate the presence of tourists visiting the country. Peaks are clearly visible during the local summer period.
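The monthly share plotted here can be sketched as a simple grouped fraction. The example below counts tweets; counting distinct users instead would be an equally valid reading, since the caption does not say which unit is used. The function name `monthly_language_share` and the sample records are hypothetical.

```python
from collections import Counter


def monthly_language_share(tweets, language):
    """Return {month: fraction of that month's tweets written in language}.

    tweets is an iterable of (month, lang) pairs, with month as an
    integer from 1 to 12.
    """
    totals = Counter()
    hits = Counter()
    for month, lang in tweets:
        totals[month] += 1
        if lang == language:
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Hypothetical records for one country: January versus August.
records = [
    (1, "it"), (1, "it"), (1, "en"), (1, "it"),
    (8, "en"), (8, "it"),
]
share = monthly_language_share(records, "en")
```

A rise in the share during the local summer months, as in the toy data, is the pattern the caption attributes to tourism.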