Can we detect conditioned variation in political speech? two kinds of discussion and types of conversation

Sabina J. Sloman; Daniel M. Oppenheimer; Simon DeDeo

doi:10.1371/journal.pone.0246689

Abstract

Previous work has demonstrated that certain speech patterns vary systematically between sociodemographic groups, so that in some cases the way a person speaks is a valid cue to group membership. Our work addresses whether or not participants use these linguistic cues when assessing a speaker’s likely political identity. We use a database of speeches by U.S. Congressional representatives to isolate words that are statistically diagnostic of a speaker’s party identity. In a series of four studies, we demonstrate that participants’ judgments track variation in word usage between the two parties more often than chance, and that this effect persists even when potentially interfering cues such as the meaning of the word are controlled for. Our results are consistent with a body of literature suggesting that humans’ language-related judgments reflect the statistical distributions of our environment.

Citation: Sloman SJ, Oppenheimer DM, DeDeo S (2021) Can we detect conditioned variation in political speech? two kinds of discussion and types of conversation. PLoS ONE 16(2): e0246689. https://doi.org/10.1371/journal.pone.0246689

Editor: Thomas Holtgraves, Ball State University, UNITED STATES

Received: September 30, 2020; Accepted: January 23, 2021; Published: February 11, 2021

Copyright: © 2021 Sloman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data and analysis code that support the findings of this research are openly available at github.com/sabjoslo/talking-politics/tree/master/experiments/conditional-variation.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

What can you tell about someone who addresses a group of people as “you guys” versus “yinz,” or someone who stresses the vowel sound in the word “aunt” or doesn’t pronounce the “r” in “car” [1]?

Socially conditioned variation refers to systematic and idiosyncratic shifts in the language used by members of a particular group [2]. Speech patterns that exhibit socially conditioned variation can be used to identify members of a group [3–7]. For instance, in Glasgow, how a person pronounces the letter “T” reliably indicates that person’s age [7, 8], and in New York the pronunciation of “r” reveals a number of sociodemographic attributes [5, 9]. While language is a vehicle for explicitly-constructed semantic content, structural and systematic variation in language also conveys information about a speaker’s environment and past experiences.

But do listeners take advantage of this variation as a source of social information? Can we learn—without being explicitly taught—to associate glottal stops with younger speakers [8], longer words with male speakers [6, 10], and the phrase “yinz” with Pittsburghers [11]? Recovery of statistically regular patterns is an important part of language acquisition [12–14], suggesting that such learning may be possible.

Previous work has shown that the relative frequency of linguistic signals can indeed be used to discriminate between members of different demographic groups [6, 15, 16]. More generally, people associate certain “linguistic profiles” with members of different communities and cultural backgrounds—although these profiles can reflect misleading stereotypes as well as systematic variation in speech patterns [17]. Our work examines whether people use socially conditioned variation as a cue to a particular form of social identity: political identity. In particular, we investigate whether or not participants respond to the relative frequency of linguistic signals when categorizing speakers as Democrats or Republicans.

Throughout this paper, we distinguish between the conditioned variation in usage and sense of a word. A word’s sense is its meaning, often operationalized as its dictionary definition [18–20]. Cognitively, we think of a word’s sense as contributing to an inference drawn from the concept conveyed by the word. For example, upon overhearing a politician using a word that conveys a money-related concept, such as “financial” or “monetary,” a listener could make an inference about whether the politician is more likely to be a Democrat or a Republican on the basis of the degree to which they associate each party with the concept of money.

However, even when the concept conveyed is held constant, the listener can make an even more informed guess on the basis of the specific word the speaker chose: Did the listener overhear “financial” or “monetary”? Although the two words convey the same concept and have very similar definitions, according to our data Democrats use the phrase “financial” more frequently, while Republicans use the phrase “monetary” more frequently. In other words, if they overheard the politician say “financial,” the listener should infer that the speaker is more likely to be a Democrat, while if they overheard “monetary,” the listener should infer that the speaker is more likely to be a Republican. In this case, the listener would be relying on the conditioned variation in usage of each word.

The phrase “conditioned variation” generally does not convey the nature of the conditioning variable. For example, variation in the pronunciation of the final consonants of words is often a function of the phonetic features they precede or follow [2]. Following Samara, Smith, Brown, and Wonnacott (2017), we refer to linguistic variation that can be anticipated on the basis of demographic or social characteristics of the speaker as socially conditioned variation [2]. We use the term politically conditioned variation to refer to linguistic variation that can be anticipated on the basis of the speaker’s political identity.

The driving question of our work is: Without relying on their sense of what the two parties stand for, can people use politically conditioned variation to make accurate categorization judgments—in other words, learn to pick up on the political analog of “yinz”?

Politically conditioned variation

While other demographic features, such as race, gender and age, can be quickly and accurately identified on the basis of appearance [21], people have virtually no ability to leverage a target’s appearance to accurately determine their political affiliation [22]. To the extent that people are sensitive to politically conditioned variation, verbal cues like word choice could be more reliable in determining a target’s political affiliation.

Knowing whether people can use linguistic cues to infer a person’s political identity has significant practical relevance. In light of research showing that political ideology is a predictor of behavior [23] and interest [24], a general sensitivity to politically conditioned variation could imply that politicians and partisans emit clues about important aspects of their behavior, beliefs and identity without even realizing it. In addition, information about a person’s political affiliations can affect how they are treated: Balliet, Tybur, Wu, Antonellis, and Van Lange (2018) found that in a social dilemma game, partisans cooperated more with members of their political in-group [25]. Knowledge of our sensitivity to politically conditioned variation could help us better understand the kinds and validity of cues we use to make implicit judgments about others (e.g. contributors to our first impressions [26]), and the mechanisms of group formation and appeal (e.g. the effectiveness of techniques such as “dog-whistle politics;” see the general discussion).

However, the fact that listeners often do not know the political affiliation of a speaker might also make it nearly impossible for them to acquire associations between partisan identity and conditioned variation in speech in the first place. While political identity does correlate with observable demographic characteristics, such features are noisy cues of partisanship, and observers may attribute variation in speech patterns to a more salient social category. Thus, while people have been shown to be sensitive to other sources of socially conditioned variation [2, 6, 15, 16], it remains an open question whether this sensitivity extends to ideological categories.

Detecting signals using NLP

With the advent of readily-accessible, large-scale datasets, many researchers have attempted to isolate linguistic variation conditioned on a variety of social identities [27, 28]. Diermeier, Godbout, Yu, and Kaufmann (2012) [29], Jensen, Naidu, Kaplan, and Wilse-Samson (2012) [30] and Gentzkow, Shapiro, and Taddy (2019) [31] also investigate variations in speech patterns between the two major U.S. political parties. Diermeier et al. (2012) build a support vector machine (SVM) classifier of a speaker’s political ideology and perform post-hoc feature analysis to identify the words that were especially informative in the SVM’s classification decisions [29]. Jensen et al. (2012) measure the partisanship of trigrams (contiguous sequences of three words) as the correlation between the frequency with which a speaker utters a given trigram and the speaker’s political identity [30]. Gentzkow et al. (2019) posit a generative model of speaker phrase choice, and derive a measure of phrase-level partisanship from components of the parameterized model [31].

The aforementioned models have provided crucial convergent evidence that there is reliable and detectable politically conditioned variation in language use. However, the question at hand is whether people are capable of picking up on that signal. To test this, we turn to the work of Preoţiuc-Pietro, Xu, and Ungar (2016) [6] who find that human raters are able to correctly detect sociodemographic characteristic of speakers in 70% of cases. Preoţiuc-Pietro et al. (2016) use a log-odds measure (see the following section) to isolate linguistic variation between speakers of various demographic groups (e.g. age and gender) [6]. Their method provides a roadmap to how best extend investigation into the domain of U.S. political ideology.

Foreshadowing our results, we find sensitivity that is considerably less extreme than the findings of Preoţiuc-Pietro et al. (2016) [6], highlighting that the range of factors in the correspondence of our judgments with statistical variation in word usage is far from completely understood.

Method

We used the congressional-record project [32] to access the transcripts of all proceedings in the U.S. House of Representatives between 2012 and 2017, made publicly available as part of the U.S. Congressional Record. The Congressional Record was also the basis of the results reported in the three studies on conditioned variation in political speech summarized above [29–31]. One advantage of the Congressional Record is that it is systematically formatted, allowing us to more easily label the text by matching speakers with entries in databases of members’ party affiliations. Another advantage the Congressional Record has over other corpora of political speech, such as transcripts of the U.S. presidential debates, is that it is substantially larger and has a roughly equal balance of Republican and Democratic speech.

Our corpus consisted of data from each of the most recent five calendars years. (We began data collection in 2018, meaning that 2017 was the last complete calendar year before our studies were run). A time window that was too short would not have yielded enough data for us to recover reliable statistical indicators of linguistic divergence. On the other hand, a time window that was too large would have obscured recent divergences, as politically conditioned speech patterns drift over long periods of time [30, 31]. We chose a five-year window on the basis of our judgment that this time frame would provide a reasonable sample of data which had the power to detect and isolate contemporary patterns of politically conditioned variation.

We first assembled all words spoken by a member of Congress identified as a Democrat or a Republican. While “Republican” and “Democrat” do not exhaust the set of possible political identities, they overwhelmingly dominate the party affiliations of congressional representatives. (At the time of writing, the U.S. House of Congress is composed of 232 Democrats, 196 Republicans and 1 Independent [33]). We then coerced all text in this initial corpus to lowercase, and excluded words we determined were unlikely to reflect meaningful or generalizable variation in word usage, e.g. common prepositions and the names of other sitting members of Congress. The full list of exclusion criteria is included in S1 Appendix. The corpus we used for analysis contains 13,523,319 instances of 16,218 unique words (6,924,484 instances spoken by Democrats and 6,598,835 instances spoken by Republicans).

Language can be analyzed and perceived at many different scales of analysis, e.g. phonemes, word forms, phrases and sentences [16, 34, 35]. We conducted our analyses at the word level primarily for feasibility: Word boundaries are much easier to detect by natural language processing algorithms than the boundaries of phonemes or phrases. It is also interesting to note that some argue that the word is the appropriate unit of analysis in linguistic change and language learning [16, 34, 36].

Measures of relative frequency.

Adapting the approach of Preoţiuc-Pietro et al. (2016) [6], we calculate exact measures of the relative frequency with which a word was spoken by a Democrat [Republican].

We use the log odds that the word was spoken by a member of a given party, shown in Eq 1, as our measure of the conditioned variation exhibited by a word. The conditional probability terms are calculated directly as the empirical probability that a word w was spoken by a Republican [Democrat] according to our corpus. (1)

For example, if our corpus of Republican speech contained only the words “quick,” “brown” and “brown,” and the corpus of Democratic speech contained only the words “brown,” “fox” and “fox,” and . To avoid the discontinuities that arise when some probabilities are 0, we incorporated L1 smoothing in our measurements, i.e. imputed one “phantom” observation of each word in both the Republican and Democratic distributions.

8,345 of the 16,218 unique words in our corpus had a corresponding logodds_R > 0. We refer to these as Republican words. The remaining 7,873 words had a logodds_R < 0. We refer to these words as Democratic words. The mean logodds_R value was.02 (SE = .01). This was significantly greater than 0 (t₁₆₂₁₇ = 1.96; p = .05), indicating that the distribution of logodds_R was shifted to the right of zero: In the absence of other information, odds were slightly higher that a word was spoken by a Republican. Fig 1 shows the distribution of logodds_R values.

Download:

Fig 1. Distribution of logodds_R.

Distribution of the values of logodds_R, our measure of how much more likely a word is to be said by a Republican than by a Democrat, corresponding to each word in our corpus (see text for details; logodds_D = −logodds_R). Republican words (logodds_R > 0) are red, while Democratic words (logodds_R < 0) are blue. The black line shows the approximate density of the distribution.

https://doi.org/10.1371/journal.pone.0246689.g001

The logodds_R measure also closely resembles an element of a traditional model of behavioral response to perceptual inputs. In signal detection theory, the optimal detection threshold is the likelihood ratio of signal to noise: the probability of the stimulus conditional on a signal being present divided by the probability of the stimulus conditional on no signal being present [37, 38].

Validating logodds_R as a measure of politically conditioned variation.

Implicit in our operationalization of politically conditioned variation is the assumption that the logodds_R measure calculated from speech recorded in the Congressional Record captures differentiating patterns in political speech more generally. While we assume that most of our participants have been exposed to political speech by members of both parties, we do not assume that they are regularly exposed to speech on the floor of the U.S. House of Representatives. In this section, we show the extent to which the direction of the political signal—whether or not the word is more often spoken by a Republican or a Democrat—estimated from the Congressional Record cross-validates to a more public-facing corpus of political speech: the U.S. presidential debates.

We accessed transcripts of all the debates held as part of the 2012 and 2016 presidential election cycles (general and primary, presidential and vice presidential, and main and undercard) from the American Presidency Project [39], and pre-processed them in the same way we pre-processed the raw text from the Congressional Record.

In total, 2,408 words (14.85% of the vocabulary from the Congressional Record corpus) appeared in both the Congressional Record and presidential debates corpora. The correlation between the logodds_R values calculated from the Congressional Record and from the debates is.33 (t₂₄₀₆ = 17.412; p <.01). 1,421 of these words (59.01%; SE = 1.00%) have the same estimated polarity (the direction of the sign of the associated logodds_R value) in the two corpora. An exact one-sided binomial test shows that this is significantly greater than chance (p <.01). Overall, there is systematic variation in the speech of Republicans and Democrats that is present in a variety of contexts.

While the correlation between the logodds_R values in the two corpora is highly significant, it is admittedly moderate. While we cannot completely explain the sources of divergence between the distributions in the two corpora, S1 Fig shows that the more politically conditioned variation a word exhibits, the more likely that measure of politically conditioned variation is to generalize across the two corpora. In other words, the stronger the political signal, the more likely it is to operate in both contexts.

Study 1: Testing alignment of judgments with the direction of politically conditioned variation

Before we can test whether human judgments align with politically conditioned variation when word sense is held constant, we first have to determine whether people’s judgments align with politically conditioned variation at all. In Study 1, we test this basic intuition by presenting participants with words that were statistically more likely to have been said by a Democrat or a Republican, and ask them to make judgments about the most likely party identity of the speakers of those words.

All studies reported in the following sections were approved by the Carnegie Mellon University Institutional Review Board under IRB IDs STUDY2018_00000167 and STUDY2017_00000367. We obtained electronic consent from all participants.

Participants

201 subjects completed Study 1 on MTurk. Our use of MTurk as a recruiting tool was driven by two primary considerations: i) convenience and ii) access to a more representative population than we would achieve with in-person samples (even with a local non-university sample, well below 10% of our population would be Republican given the demographics in the city in which we conducted our research). It is worth noting that scholars have documented disadvantages to using MTurk as a recruitment tool, including the possibility of non-naïvety and low quality responses (see Chandler, Mueller, and Paolacci (2014) [40] for a discussion of these issues), although our use of attention and quality checks should have mitigated that to a large degree (see details below). Moreover, a number of scholars have shown that MTurk can yield reliable data [41, 42] (our studies were completed before the “MTurk Crisis” [43] began affecting data quality). However, we cannot rule out the possibility that non-diligent responders corrupted our sample. To the extent that this is the case, we believe that would only serve to reduce our power by introducing noise, making our results conservative estimates of the population effect.

After excluding 54 participants for failing the attention check (described in the following section), our analyzed sample contains 147 participants, including 61 self-identifying Democrats and 38 self-identifying Republicans. These exclusions do not affect our main results. In this and all subsequent studies, we restricted participant eligibility to those of voting age residing in the U.S. Participants in the analyzed sample had a mean self-reported age of 37.66 (SE = .84), and included 74 men and 72 women (1 participant did not report their gender identity). 81.63% of participants reported having voted in the 2016 presidential election.

Methods

After completing a demographics questionnaire, participants were presented with a list of words and asked to “…estimate how likely it is that the word is spoken either by a Democrat or by a Republican [Republican or by a Democrat]” (the full instructions are included in S2 Appendix).

The words “Democrat” and “Republican” were presented in a random order. Participants rated each word on a 6-point scale, from “I am almost certain the speaker is a Democrat” (which we coded as 1) to “I am almost certain the speaker is a Republican” (which we coded as 6). Each page of the survey contained 20 items. For approximately half of participants the presentation order of response options was reversed.

For Study 1, we wanted to use stimuli that both exhibited significant conditioned variation (had a logodds_R with a large magnitude), and were spoken frequently enough by the associated party that participants were likely to have been exposed to the variation in usage. For each word w, we calculated the partial Kullback-Leibler divergence (PKL), a measure that combines the logodds_R with the word’s probability of occurrence (interested readers can consult Klingenstein, Hitchcock, and DeDeo (2014) [44] for further details). Words with a high PKL_D are both more likely to be spoken by Democrats and are spoken frequently by Democrats, while words with a high PKL_R are both more likely to be spoken by Republicans and are spoken frequently by Republicans. In other words, PKL isolates strong and frequent statistical signals of party identity.

As stimuli for Study 1, we selected the 39 words with the highest PKL_D and the 39 words with the highest PKL_R. (Elements of the pre-processing pipeline have changed slightly since these stimuli were selected. All analyses were run using the versions of the metrics calculated as described in the previous section). Three words were excluded from analysis of Study 1 since in subsequent pre-processing they were considered to be Congressional-specific stopwords according to the criteria identified in S1 Appendix: affordable, trump and obama. The reported analyses include 37 Democratic words and 38 Republican words.

Two of these words were randomly reselected. If in response to either of these two attention-check stimuli a participant gave a rating that differed more than one point from their original rating on that stimulus, we removed them from our analysis. In total, each participant was presented with 78 stimuli chosen on the basis of the partisanship of the stimulus (39 Democratic words and 39 Republican words) and two words included as attention checks. The full list of stimuli used for all studies is included in S2 Appendix.

All analyses reported in this paper were conducted in the programming languages Python or R [45–48]. We relied on several packages for statistical analyses and visualization, including but not limited to SciPy [49], scikit-learn [50] and Plotly [51]. All participant data and code for all of our analyses can be found at https://github.com/sabjoslo/talking-politics.

Results

The mean judgment on the Republican stimuli was 3.81 (SE = .08), just above the indifference point of the scale of 3.5 (participants could express maximum indifference with responses of “I am unsure but think the speaker is a Republican” or “I am unsure but think the speaker is a Democrat,” which we coded as a 4 and a 3, respectively). A one-sided, one-sample t-test led us to reject the null that this value was less than or equal to the indifference point (t₅₅₈₀ = 3.77; p <.01). Standard errors reported and used by the inferential tests in this subsection are clustered at the participant and item level using the method in Arai (2011) [52]. Unless stated otherwise, this is the case for results reported in the main results sections for all studies.

The mean judgment on the Democratic stimuli was 3.30 (SE = .12), just below the indifference point. A one-sided, one-sample t-test led us to reject the null that this value was greater than or equal to the indifference point (t₅₄₃₅ = −1.66; p = .05). A one-sided two-sample t-test of the difference in means also led us to reject the null hypothesis that the mean judgment of the Republican words was equal to the mean judgment of the Democratic words (t₁₁₀₁₅ = 3.50; p <.01).

The results of the inferential tests reported thus far demonstrate that in this experiment, judgments do align with the direction of politically conditioned variation. To understand the relative strength of the effect, we calculated a standardized effect size [53]. Because our data included multiple judgments from each participant, a measure like Cohen’s d would be difficult to interpret. We instead used the method of Rouder, Morey, Speckman, and Province (2012) [54] and demonstrated by Westfall (2016) [55]: We used a mixed modeling framework to calculate an effect size, and standardized this effect size by the standard deviation of the residuals of this model. More specifically, we estimated a linear mixed model of the ratings data: We considered the set of each of the individual ratings to be the endogenous variable. We included a fixed effect on a dummy variable indicating whether the word was Democratic or Republican, and considered the estimated coefficient on this variable as our effect size. In addition, we included random effects corresponding to individual participants and items. (We incorporated both participant-level random intercepts, participant-level random slopes on the fixed effect and item-level random intercepts. Since items do not overlap between the groups indicated by our main independent variable—i.e. the sets of Democratic and Republican words are mutually exclusive—it would have been meaningless to specify item-level differences in the fixed effect). We estimated a standardized effect size of.41, which can be interpreted as the effect of the direction of politically conditioned variation on a listener’s judgment of the speaker’s political identity. (Using a version of the model that includes just the fixed effect—which is equivalent to estimating the traditional Cohen’s d—we calculated a standardized effect size of.36). We therefore consider the effect of the direction of politically conditioned variation to be of moderate strength [53, 56].

Item-level analyses.

While clustering standard errors at the participant- and item-level provides some assurance that our effects are not simply driven by high performance on a single item or an especially discriminating participant, it does not completely rule out these possibilities. In the following two subsections, we describe and analyze sensitivity to the direction of politically conditioned variation at the item- and participant-level, respectively. In both sections, we consider a judgment to be accurate if it is above [below] 3.5 and the item is more likely to be said by a Republican [Democrat]. Item-level accuracies are computed on the basis of the mean of all judgments collapsed across participants.

50 of the 75 words (66.67%) were accurately classified (24, or 64.86%, of the Democratic words, and 26, or 68.42%, of the Republican words). The average item-level accuracy was 58.62% (SE = 2.33%), which was significantly different from chance performance of 50% (t₇₄ = 3.70; p <.01).

We also ran a Mann-Whitney U test, a non-parametric alternative to the t-test. We ranked all 75 words by their associated mean judgment, and tested against the null hypothesis that the Republican words were as likely to be ranked above the Democratic words as to be ranked below. The U statistic was 416 (p <.01), leading us to reject the null hypothesis.

In summary, the item-level analyses reaffirmed our conclusion that participant judgments aligned with the direction of politically conditioned variation more often than chance. Fig 2 shows the correspondence between the logodds_R of each stimulus and its corresponding average participant rating. In general, words with a higher logodds_R—words that are more likely to be spoken by a Republican—tend to be judged as more likely be spoken by a Republican.

Download:

Fig 2. Results of Study 1.

The logodds_R of each word against the average rating given to the word. A rating of 6 corresponds to a judgment of “I am almost certain the speaker is a Republican,” while a rating of 1 corresponds to “I am almost certain the speaker is a Democrat.” Words are colored by logodds_R. Vertical bars mark one standard error around the mean of the ratings. The blue and red lines show the estimated linear trends using only the Democratic items and Republican items, respectively.

https://doi.org/10.1371/journal.pone.0246689.g002

Participant-level analyses.

On average, participants accurately classified 58.62% (SE = .59%) of the items (or about 44 out of 75 words), with most participants (127 out of 147) performing better than chance. This number was significantly different from chance performance (t₁₄₆ = 14.66; p <.01). They classified 57.52% (SE = 1.10%) of the Democratic words correctly, and 59.69% (SE = .91%) of the Republican words correctly. Both of these percentages were significantly higher than chance performance.

We also defined a measure of participant-level discriminability as the Cohen’s d between each participant’s judgments on the Republican stimuli and their judgments on the Democratic stimuli. Recall that higher ratings indicate that the participant judged the speaker as more likely to be a Republican, so a participant with a positive Cohen’s d tended to make judgments in the right direction. The mean individual-level Cohen’s d is.40 (SE = .03), which is significantly higher than 0 (t₁₄₆ = 15.56; p <.01).

Discussion

From Study 1, we concluded that participants’ judgments aligned with the direction of politically conditioned variation, and that this effect held for most participants and most items. However, recall that Study 1 did not control for our main theoretical confound: word sense. Studies 2 and 3 examine whether the alignment between human judgment and politically conditioned variation holds even when word sense is controlled for.

Study 2: Controlling for word sense using cosine similarity

In Study 2 we test whether the sensitivity to politically conditioned variation we observed in Study 1 persists when word sense is held constant. We therefore wanted participants to make choices between pairs of words whose corresponding senses were as close as possible. But how does one measure the closeness of two words’ senses?

Work in linguistics and cognitive science suggests that a word’s context is an important contributor to its meaning: If words A and B appear near sets of words C_A and C_B respectively, A and B have similar senses when C_A and C_B are very similar [57, 58]. Consider the example given in the introduction: While the words “financial” and “monetary” may not be used concurrently (the phrase “financial monetary policy” sounds redundant and quite awkward), both words will likely co-occur with words like “policy,” “markets,” and “banks.” This perspective can be summed up in a famous quote by Firth (1957): “You shall know a word by the company it keeps” [59].

Distributional semantics (DS) models are computational instantiations of this perspective: DS models build representations of word meaning on the basis of information about how the words co-occur in natural language data [60]. One popular DS model is word2vec, which projects each word of its input into a common, lower-dimensional space [61, 62]. Words that are closer together in this space are more semantically similar [63, 64]. One common way to measure this distance is using cosine similarity, or the cosine of the angle between the vector representations of two words. We therefore operationalized the word sense similarity of two words as their cosine similarity.

Participants

175 subjects completed Study 2 on MTurk. After excluding 79 participants for failing our attention check, our analyzed sample contains 96 participants, including 50 self-identifying Democrats and 19 self-identifying Republicans. These exclusions do not affect our main results. Participants in the analyzed sample had a mean self-reported age of 40.68 (SE = 1.12), and included 37 men and 59 women. 80.21% of participants reported having voted in the 2016 presidential election. Participants completed the demographics questionnaire after the main survey.