
Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach

Figure 1

The infrastructure of our differential language analysis.

1) Feature extraction. Language-use features include (a) words and phrases: sequences of 1 to 3 words, extracted with an emoticon-aware tokenizer and a collocation filter (24,530 features); and (b) topics: automatically derived groups of words, each representing a single topic, found using Latent Dirichlet Allocation [72], [75] (500 features). 2) Correlational analysis. We compute the correlation (β, the coefficient of an ordinary least squares linear regression) between each language feature and each demographic or psychometric outcome. All relationships presented in this work are significant after Bonferroni correction [76]. 3) Visualization. Graphical representation of the correlational analysis output.
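The word-and-phrase branch of this pipeline (steps 1a and 2) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The emoticon pattern, the normalization of n-gram counts, the `alpha` parameter, and all function names are hypothetical. The collocation filter and the LDA topics are omitted. P-values come from a Fisher z normal approximation rather than an exact test. With a single z-scored predictor and no covariates, the OLS coefficient β equals Pearson's r, which is what `standardized_beta` computes.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately crude "emoticon-aware" pattern: a few emoticons such
# as :) ;-( :d (text is lowercased first), then words, then single punctuation marks.
TOKEN_RE = re.compile(r"[:;=]-?[()dp]|[\w']+|[^\w\s]")


def tokenize(text):
    """Lowercase and split into emoticons, words, and punctuation."""
    return TOKEN_RE.findall(text.lower())


def ngram_frequencies(tokens, max_n=3):
    """Relative frequency of every 1- to max_n-gram in one user's tokens.

    Assumption: counts are normalized by the total number of n-grams of all lengths.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def standardized_beta(x, y):
    """OLS slope of z-scored y on z-scored x; with one predictor this is Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature or outcome carries no signal
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via Fisher's z-transform (normal approximation, needs n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite for |r| close to 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def correlate_features(user_texts, outcomes, alpha=0.05):
    """Correlate every n-gram with one outcome; keep features passing Bonferroni."""
    freqs = [ngram_frequencies(tokenize(t)) for t in user_texts]
    features = sorted(set().union(*freqs))
    if not features:
        return []
    threshold = alpha / len(features)  # Bonferroni: alpha divided by the number of tests
    results = []
    for feature in features:
        x = [fr.get(feature, 0.0) for fr in freqs]  # absent n-gram -> frequency 0
        beta = standardized_beta(x, outcomes)
        p = approx_p_value(beta, len(outcomes))
        if p < threshold:
            results.append((feature, beta, p))
    return sorted(results, key=lambda item: -abs(item[1]))  # strongest effects first


if __name__ == "__main__":
    # Toy data: 20 hypothetical users mention cats (outcome 1), 20 mention dogs (outcome 0).
    texts = ["i love my cat"] * 20 + ["i love my dog"] * 20
    outcome = [1.0] * 20 + [0.0] * 20
    for feature, beta, p in correlate_features(texts, outcome):
        print(f"{feature!r}: beta={beta:+.3f}, p={p:.2e}")
```

In the toy run, n-grams shared by every user ("i love my") are constant and are filtered out. Only the phrases that separate the two groups survive the corrected threshold: positive β for the cat phrases, negative β for the dog phrases.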