Figure 1.
The overwhelming majority of Spanish tweets are located in Spain and Spanish America but significant contributions arise in certain US states and major Western European and Brazilian cities.
Figure 2.
Geographical distribution of the dominant word for the concepts ‘computer’ (left) and ‘car’ (right).
Map locations are colored according to the most common expression found in the corresponding cell. The area of the circle is proportional to the number of tweets.
Figure 3.
Cumulative variance explained as a function of the number of components.
With components (vertical blue line) we are able to maintain over
of the variance present in the data while significantly reducing the matrix size.
Figure 4.
Characterization of the two superdialects.
A) and silhouette statistics as a function of
. B) Geographical representation of the two clusters,
(red) and
(blue). For visualization purposes we increased the size of each cell. The name of main cities corresponding to superdialect
are shown for clarity. C) Population distribution of the cells corresponding to each cluster.
Figure 5.
Characterization of major cluster β.
Geographical representation of regional dialects. For visualization purposes we increased the size of each cell. Three well separated regions are indicated with dashed lines along with the top 3 dominant words characteristic of that region.