Latent human traits in the language of social media: An open-vocabulary approach

doi:10.1371/journal.pone.0201703

Fig 1.

Word clouds showing the most/least correlated words for each factor (with rotation).

The word clouds for each FA factor (with rotation) were obtained using Differential Language Analysis [11]. The larger the word, the more strongly it correlates with the factor. For all word clouds shown, FDR correction has been applied so that only significant words appear; spatial location within a cloud carries no meaning. Color indicates frequency (grey = low use, blue = moderate use, red = frequent use).
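As an illustration of how FDR correction can restrict a word cloud to significant words, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. The caption does not state which FDR method was used, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, the `alpha` level, and the words and p-values are all assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a p-value is significant
    under the Benjamini-Hochberg procedure at FDR level alpha.

    Assumption: this is one common FDR method, not necessarily the one
    used to produce the figures.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Every p-value ranked at or below max_rank is declared significant.
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


# Hypothetical words with made-up p-values for their correlation
# with one factor.
word_p = {"word_a": 0.001, "word_b": 0.012, "word_c": 0.04, "word_d": 0.30}
keep = benjamini_hochberg(list(word_p.values()), alpha=0.05)
shown = [word for word, is_sig in zip(word_p, keep) if is_sig]
print(shown)
```

Under these assumptions, only the words whose flag is True would be drawn in the cloud; the rest would be dropped before plotting.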

Fig 2.

Correlations between the learned factors and the Big5 factors.

Fig 3.

Individual factor correlations with outcomes.

Note that F4, which captures the use of swear words, correlates negatively with Satisfaction with Life (SWL).

Fig 4.

Questions (left of each factor) and Likes (right of each factor) with the highest (green) and lowest (pink) correlations for each of our five behavioral-linguistic trait factors.

Fig 5.

Word clouds showing the effect of a rotation.

Rotation yields markedly distinct factors. Note the absence of words like “paste this” from multiple factors in the rotated version, in contrast to the unrotated version, where multiple factors are characterized by words like “paste this” and “status update”. The larger the word, the more strongly it correlates with the factor. Color indicates frequency (grey = low use, blue = moderate use, red = frequent use) [11].
