Fig 1.
Process for creating datasets.
This flowchart describes the process of creating the cross-sectional and longitudinal datasets.
Table 1.
Counts of unique US users per sample and year.
Table 2.
Example rows of annual token data.
Fig 2.
Token prevalence distributions per year for the longitudinal sample.
Note that the x-axis is on a log scale. A small number of tokens have high prevalence, and a large number have low prevalence. This accords with general expectations of word usage.
Table 3.
The 20 most surprisingly common tokens from the longitudinal sample in 2020.
Fig 3.
Distribution of estimated annual change in prevalence for all unique tokens within the longitudinal sample.
Prevalence is stable (i.e., zero change) for many tokens. The min and max annotations show that the data contain extreme values that fall outside the plotted range. See Tables 4 and 5 for examples.
Table 4.
Top 20 winner tokens in the longitudinal sample.
Table 5.
Top 20 loser tokens in the longitudinal sample.
Fig 4.
Distribution of estimated annual change in prevalence for all unique tokens within the cross-sectional sample.
Prevalence is stable (i.e., zero change) for many tokens. The min and max annotations show that the data contain extreme values that fall outside the plotted range. See Tables 6 and 7 for examples.
Table 6.
Top 20 winner tokens in the cross-sectional sample.
Table 7.
Top 20 loser tokens in the cross-sectional sample.
Fig 5.
The interface for the web tool: Jason J. Jones Identity Trends V1.
The tool is publicly accessible at https://jasonjones.ninja/jason-j-jones-identity-trends-v1/.
Fig 6.
Example results from the web tool.