
Abstract social categories facilitate access to socially skewed words

  • Jennifer Hay ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Visualization, Writing – original draft, Writing – review & editing

    jen.hay@canterbury.ac.nz

    Affiliations Department of Linguistics, University of Canterbury, Christchurch, New Zealand, New Zealand Institute of Language, Brain & Behaviour, University of Canterbury, Christchurch, New Zealand

  • Abby Walker,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of English, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States of America

  • Kauyumari Sanchez,

    Roles Data curation, Formal analysis, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Psychology, Humboldt State University, Arcata, CA, United States of America

  • Kirsty Thompson

    Roles Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Department of Linguistics, University of Canterbury, Christchurch, New Zealand

Abstract

Recent work has shown that listeners process words faster if said by a member of the group that typically uses the word. This paper further explores how the social distributions of words affect lexical access by exploring whether access is facilitated by invoking more abstract social categories. We conduct four experiments, all of which combine an Implicit Association Task with a Lexical Decision Task. Participants sorted real and nonsense words while at the same time sorting older and younger faces (exp. 1), male and female faces (exp. 2), stereotypically male and female objects (exp. 3), and framed and unframed objects, which were always stereotypically male or female (exp. 4). Across the experiments, lexical decision to socially skewed words is facilitated when the socially congruent category is sorted with the same hand. This suggests that the lexicon contains social detail from which individuals make social abstractions that can influence lexical access.

Introduction

The distribution of word usage across different speaker groups is not even. Some groups may talk about certain topics more than others or use different words to talk about the same topics. This leads to certain words being used more by some types of speakers than others. For example, the word children is more likely to be said by women than men [1]. Therefore each word has a unique social distribution in which it is encountered, or–in the words of Bakhtin–“[e]ach word tastes of the context and contexts in which it has lived its socially charged life” [2](p.293). Recent results suggest that speakers and listeners are sensitive to the differing social distributions of individual words. For example, in speech perception, words are accessed faster when produced in experience-congruent voices [3, 4], and in speech production, these words are more likely to be produced with experience-congruent phonetic variants [5, 6].

Such findings have been used to argue for exemplar accounts of language and memory, in which cognitive representations of words consist of accumulated distributions of past experiences (exemplars) of the word, complete with acoustic and contextual detail [7–10]. In such models, a person’s mental representation of the word bead, for example, would be a distribution of remembered exemplars of previously experienced “bead” utterances. Speech production involves selecting an exemplar from this distribution, or the average of many exemplars, to emulate [10], and speech perception involves matching the incoming signal to existing exemplars based on acoustic and contextual similarity [11]. Such models are supported by specificity effects. For example, German et al. [12] found that North American participants trained on the speech of a Glaswegian speaker were best at producing Glaswegian variants in the particular words used at training (see also [13–15], and others). This is difficult to explain without arguing that the representations of the specific words heard in the Glaswegian accent were individually affected by the training experience.

In a similar way, results showing listener and speaker sensitivity to word distributions can also be explained by word-specific, phonetically-detailed memories. If a listener has encountered a word most often from older speakers, then hearing that word in an older voice should facilitate lexical access due to a phonetic match between the voice of the stimuli and the voices that dominate the word’s cognitive representation. Such an account can predict congruency effects between word experience and stimulus voice, as shown in Walker and Hay [3]. Similarly, in production, a word used more often by older speakers will be stored with a higher proportion of ‘older’ productions, and thus be more likely to be produced with a variant that is associated with older speakers, as shown by Hay & Foulkes [5].

Exemplar models contrast with models that posit that words are represented by abstract, phonological entries [16]. For example, in terms of pronunciation, a person’s mental representation of the word bead would be /b/ + /i/ + /d/. Speech production involves articulating these phonemes in order, and speech perception involves normalizing or filtering variation in the signal to arrive at the correct phonological parse. Such models are supported by generalization effects, where someone applies learning or associations beyond what they have directly experienced. For example, in the previously mentioned German et al. [12] paper, while participants produce Glaswegian variants most often with words they had directly experienced in the Glaswegian accent, they critically also showed generalization to words not used in training. Such generalization has been found to occur across allophonic boundaries [17, 18], across speakers [19], and across features [20, 21].

Effective models of language processing then must be able to account for both specificity and generalization, and many researchers argue for hybrid models, where word representations contain both specific, detailed phonetic memories of words (exemplars) and higher-level abstractions (e.g., the phoneme category [i]), both of which serve to impact speech production and perception [12, 22–24]. Thus, for any particular word, we have a phonetically detailed distribution shaped by past experience as well as stored associations with the relevant higher-level categories: the word bead has a detailed distribution which is word-specific, and is also associated with the more abstract categories of /b/, /i/, and /d/. In such accounts, encountering a new instance of “bead” would update the representation of both the specific word and the associated phonemic categories.

The motivation for hybrid models has been the clear impact of both detailed acoustic memories and abstract phonological categories in speech processing; that is, the argument for hybrid models has been based on sounds. But in exemplar models, memories are not simply rich in acoustic detail, but also rich in other sensory and contextual details: physical sensations [25], physical location [26], visual presentation [27–29], and critically for this paper, social information about the speaker (see Foulkes & Docherty [30] for a review). It is therefore likely that people also generalize from statistical associations between a word and particular types of speakers to associations between that word and more abstracted social categories. Indeed, a number of recent models of lexical representation and perception very explicitly assume that our experiences of words are simultaneously interpreted for both linguistic and social meaning–activating, for example, both phonological categories and social categories. For example, Munson [31] provides a schematic of a rich hybrid exemplar theory, in which the incoming signal is simultaneously indexed to multiple levels of representation–both social and phonological. And Sumner et al.’s “dual route approach to speech perception” assumes that “learned acoustic patterns are mapped simultaneously to linguistic representations and to social representations” [32](p.1).

While often assumed, the role of multiple levels of social representation for words has not been very well explored, though there are many reasons to believe such a relationship exists. Just as we organize continuous phonetic information into categorical, phonological units [33, 34], there is evidence for similar, categorical perception of social information [35, 36]. Moreover, attributes of a social group generalize beyond the individuals in a group to objects associated with the group. For example, Lemm, Dabady, & Banaji [37] conducted a priming experiment to test whether images that only connotatively referenced gender (e.g., oven mitts and baseball mitts) could prime FEMALENESS and MALENESS in the same way that words and images with denotative gender (images of faces, words like congresswoman) do. They found that images that connotatively referenced gender did indeed prime notions of gender (though to a lesser degree than denotative gender items), arguing that

…masculinity and femininity can be activated even by stimuli that are not exclusively male or female (e.g., sports cars or frilly lampshades, or job titles such as pilot or nanny). Gender concepts pervade ordinary objects and words; even stimuli that do not have an intrinsic gender can bring gender concepts readily to mind upon the briefest exposure. (p. 236)

Their finding is consistent with other work indicating that stereotypes extend beyond the members of a group to objects or words related to the stereotype [38–40].

Therefore, just as we see generalization from experienced exemplars based on phonemic labels, we should also see evidence of generalization based on social labels. There is already some indirect evidence for such social abstractions at the sound level (as opposed to the word level). Hay & Drager [41] used stuffed toys associated with New Zealand or Australia to prime New Zealand and Australian dialects respectively in a speech perception task. It is unlikely that participants’ experiences of Australian and New Zealand English were always accompanied by stuffed toys, so this effect appears to stem from the shared association of both the toys and certain phonetic features with the labels AUSTRALIAN or NEW ZEALAND. Szakay, Babel, & King [42] demonstrate that if words in New Zealand English are produced with phonetic variants consistent with Māori English, their Māori translation equivalent is primed more than if they were produced with more Pākehā (non-Māori) variants. They argue that this is because the phonetic variant and the Māori language (Te Reo) are both associated with the social category of MĀORI.

In the current study we set out to test whether people generalize from experiences with specific speakers at the word level to social information about those speakers and to social categories more generally. For example, if certain words are experienced more in female voices compared to male voices, this should build a relationship not simply between the acoustic gender of the voice and the word (specificity, see [3]), but more generally between the word and other things that share the label FEMALE (generalization). That is, we should see a relationship between female gendered words and female gendered objects, like a handbag, even though the words and objects did not necessarily co-occur in people’s experiences.

We want to emphasize at this point that terms like “gendered words” in this paper refer to words that are relatively over-represented in corpora of female vs. male speech, not necessarily stereotypically or denotatively gendered words. Our previous work [3] suggests that while there is a correlation between these corpus-based ratios and participants’ conscious awareness of usage biases, the corpus counts predict processing behavior where ratings do not (though see [4]). In fact, usage-gender and denotative or connotative gender can sometimes conflict, as in words like husband, which is used more by women than men, but denotes a man.

We ran four experiments to test the primary hypothesis that participants abstract from speaker-specific associations of a word to category-general associations. All of the experiments use a combination of the Implicit Association Task (IAT) and Lexical Decision Tasks (see 1.1). The first two experiments build on Walker & Hay [3], which examined congruency effects between word age and voice age in a lexical decision task. Our experiments test whether participants show evidence of an implicit association between age-skewed words and OLD/YOUNG (experiment 1), and between gender-skewed words and MALE/FEMALE (experiment 2), using faces in the sorting task. The third experiment takes a more abstract step, again examining whether participants have an implicit association between gendered words and MALE/FEMALE, but using gendered objects instead of faces in the sorting task. The fourth experiment also tests the relationship between gendered objects and words, but without explicitly mentioning gender in the task.

Investigating lexical access using the Implicit Association Task

The Implicit Association Task (IAT) [43] aims to measure the associative strength between different concepts/objects/groups. In the IAT, participants need to pair two different dimensions, and the ease with which they do so is taken as evidence of the associative strength of these dimensions. For example, Greenwald and colleagues used this procedure to test implicit attitudes toward ethnicity. Participants are asked, across separate tasks, to sort names of objects as PLEASANT or UNPLEASANT (e.g. flower, insect), or to sort proper names as BLACK and WHITE (e.g. Latonya, Meredith). The critical portion of the experiment involves completing both tasks within a single block. If participants are faster and more accurate at the task when BLACK and UNPLEASANT are responded to with the same hand than when BLACK and PLEASANT are paired, this is understood to reveal the participants’ implicit, negative attitude or association with black people (the sorting task is easier because the participant implicitly sees BLACK and UNPLEASANT as being similar categories). Across many studies, researchers have used variants of this task to attempt to tease out implicitly held associations, most commonly testing implicitly held attitudes of their participants.

Social psychologists often use words as stimuli to invoke the social concepts under comparison, and linguists have used IATs to measure underlying social prejudices of speakers [4446]; however, little research has directly used the IAT to investigate linguistic processing. An exception is Campbell-Kibler [47, 48], who has used the IAT to investigate sociolinguistic meaning, showing that participants implicitly associate sociophonetic variants with various social dimensions (region, profession).

In the current study, we combine the IAT with a visual Lexical Decision Task. In Lexical Decision Tasks, participants are presented with a word and asked to decide, as quickly as possible, whether the word is a real word or not. Their accuracy and response times are thought to reflect ease of lexical access, and a common finding is that people recognize high frequency words as real words faster than low frequency words [49, 50]. Of most relevance to our study, researchers have shown that access to words that are more often used by older speakers [3, 4] or women (see Section 3.1.1) is facilitated when the word is presented in an older or a female voice respectively.

In our combined IAT and Lexical Decision Task, participants simultaneously sort words into real words or nonsense words, and images/text into categories (in our study, age or gender categories). At a given block in the experiment, participants use the same hand to sort real words and, for example, female faces. Historically, when psychologists have combined these tasks, they have usually been exploring the mechanisms behind IAT tasks [51, 52]. For example, Rothermund & Wentura [53] use a non-word/word sorting task in place of a positive/negative sorting task in an old-young IAT. They show that they get NON-WORD+OLD facilitation akin to the well-attested UNPLEASANT+OLD effect [54], and use this finding to argue that the symmetrical salience of paired categories might be driving facilitation in the IAT, more so than shared semantic or evaluative associations. That is, the reason participants do better with both the NON-WORD+OLD and the UNPLEASANT+OLD pairing is because in both cases, the salient category old is paired with the salient categories of non-words, or negative things. In contrast, real words, positive things, and youth share being non-salient, or unmarked. Greenwald et al. [55] respond that regardless of what exactly is being measured, “the implications for construct validity of IAT measures are the same” (p. 425), though debates continue [56].

Our study differs from these previous studies that have used a Lexical Decision Task as part of the IAT in that we analyze the data like a Lexical Decision Task: we are interested in the differences in responses to different real words (i.e., a within-category difference), which we take to reflect speed of lexical access, and we investigate this by looking at accuracy rates and response times in real-word trials only, as a function of what other category they are paired with. Critically, our aim is not to investigate the associative strength between the categories of REAL WORD and FEMALE, etc., though this is a byproduct of our design. Rather, we want to test the overarching association between individual stimuli (words) and a social category (e.g. OLD/YOUNG or FEMALE/MALE) using primary (faces) and secondary (objects) associations of these social categories.

We predict that participants will be faster at recognizing older/female words when the hand they use to sort older/female faces is the same hand they use to sort real words, reflecting an implicit association between the word and older/female speakers (and vice versa for young or male words). Moreover, we expect that this association will hold even when the sorting task involves gendered objects, not faces (experiment 3) and when we remove any mention of “gender” from the task (experiment 4). Our predictions are based on two assumptions: first, that people have access to the statistical associations between certain words and the populations of speakers who use them [4, 5]; second, that beyond this, they generalize to make associations between words and abstract social categories.

Experiment 1: Older and younger words, older and younger faces

This experiment tests for an implicit association between word age and the categories OLD and YOUNG. In this experiment, participants engaged in a combined IAT and Lexical Decision Task where they sorted words and non-words while also sorting old and young faces. The words came from Walker & Hay [3] and were words that were skewed in their usage across speakers of different ages, such that older speakers were more likely to use some of these words than younger speakers, and vice versa. We predict that participants will be faster at accessing real, older words if they are using the same hand to sort real words and older faces. If we find a significant facilitation effect of face and word congruence, this would support the claim that words used more by older or younger speakers are associated with older and younger people respectively.

Methodology

Stimuli.

The stimuli in this experiment consisted of photographs of older and younger faces, and single, orthographically presented words. The 80 words in this study consisted of 40 real words and 40 non-words, and were a subset of the words used by Walker & Hay [3]. The real words were sourced from two corpora from the Origins of New Zealand English (ONZE) archives [57], a growing repository of recorded and transcribed interviews of native speakers of New Zealand English, housed at the University of Canterbury.

Conceptually, we use this corpus to represent the experienced speech of our participants, a crude but common tactic used to explore frequency effects [3, 10]. It is crude for two reasons. First, experience is individual: no two people will have experienced the same samples of speech in their lifetime, and certainly no one will have been exposed to all and only the speech represented in ONZE. Second, there is ample evidence that not all linguistic experience is attended to or encoded in memory in the same way (see [32] for a review). Our use of a corpus therefore represents a methodological necessity rather than a theoretical claim about what sort of speech is or is not committed to memory. We thus proceed as if the corpus reflects our participants’ experiences, and as if all of these experiences are equally well stored in memory.

To choose age-skewed words, we compared word frequencies in the Intermediate Archives (IA) (> 515,000 words from ~90 speakers born between 1890–1930) with those in the Canterbury Corpus (CC) (> 815,000 words from ~400 speakers born between 1930–1984 at the time we extracted data). The 40 real words were selected by comparing the relative frequency of words in the IA and CC, and noting when words were overrepresented in one corpus relative to the other. Twenty words that skewed old in their usage were selected; these ranged from being around 30:1 (fireworks, confectionery, frighten, idle, mittens, pencils, willow) to 7:1 (fried) times more common in the IA than the CC, and had a frequency range of 10 to 64 ppm (parts per million) in the IA. The 20 young words ranged from being around 22:1 (bitten, chemistry, depressing, impressive, intellectual, physics, spirits) to 5:1 (nicest, environment) times more common in the CC than the IA, and had a frequency range of around 9–121 ppm in the CC. There was no significant difference between the frequency of old words in the IA (mean = 24.72 ppm, std. dev. = 15.75) and the frequency of young words in the CC (mean = 32.28 ppm, std. dev. = 31.94) in a Wilcoxon rank sum test (W = 190.5, p = 0.8075).
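To make the selection criterion concrete, the following is a minimal sketch in R using made-up counts and placeholder frequency vectors rather than the actual ONZE data; the object names are ours, not part of the published materials.

```r
# Hypothetical counts for a single word (not the actual ONZE data)
ia_count <- 15; ia_size <- 515000   # Intermediate Archives
cc_count <- 1;  cc_size <- 815000   # Canterbury Corpus

ia_ppm <- ia_count / ia_size * 1e6  # frequency in parts per million
cc_ppm <- cc_count / cc_size * 1e6
ia_ppm / cc_ppm                     # ratio > 1 means the word skews "old"

# Comparing in-corpus frequencies of the two word sets, as in the
# Wilcoxon rank sum test reported above (placeholder ppm values)
old_ppm   <- c(24, 10, 64, 31, 18, 12, 40)
young_ppm <- c(32, 9, 121, 20, 45, 15, 28)
wilcox.test(old_ppm, young_ppm)
```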

Forty non-words were created from the real words, maintaining stress pattern, syllable structure, and orthographic length, and with legal English phonotactics (see Appendix A for the list of real and non-words). A task completed at the end of the experimental session asked participants to rate the real words on a 6-point scale of age usage, from 1, meaning that the word is “much more likely to be used by younger speakers”, to 6, meaning that the word is “much more likely to be used by older speakers”. Old words received an average rating of 4.04 (sd = .76), and young words received an average rating of 3.66 (sd = .58). A Wilcoxon rank sum test did not find this difference in overt perceptions of word use depending on age to be significant (W = 268.5, p = 0.07), suggesting that listeners are not consciously aware of the skewed usage of these words by older/younger speakers.

The photographs in this experiment depicted young and old faces. To produce the photos, candidate subjects were recruited via personal contacts. Photographs of their faces were taken in front of a plain white background while they directly faced the camera with a neutral facial expression and both eyes open. To avoid recognition of the identity of the face, which may alter the intended results of the experiment [58], novel faces were created using the software Abrosoft FantaMorph [59]. Each original face was morphed with another face of his or her own sex to create a unique face. In total, 20 photographs were created, each depicting a unique face (10 old, 10 young). There were five photos of each sex for each age classification. All photos appeared natural to the authors (examples available at https://github.com/jenniferhay/hayetal-plos2019/supplementaryfigures.pdf).

Participants.

Forty-two participants (32 female, 10 male) were recruited at the University of Canterbury and compensated with a $10 voucher for their time. Participants were aged between 18 and 56 (median year of birth = 1991). All participants were native or near-native speakers of New Zealand English (moved to New Zealand before the age of seven).

Procedure.

Participants individually engaged with the experiment in a quiet room over a single session. The fourth author (a young, female, native speaker of NZE) ran all participants. The experiment was programmed in E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA) and was run on a dual-boot Macintosh computer running the Windows operating system. Participants were instructed that they were taking part in a reaction time experiment and were encouraged to respond as quickly and accurately as possible.

There were seven blocks in the experiment. In the first block, participants sorted the 20 morphed photographs of faces by the age of the face. For half of the subjects, the word “Older” appeared on the top left side of the computer screen and the word “Younger” appeared on the top right side; this was reversed for the other half of subjects. The randomly selected target stimulus (i.e. photo of a face) appeared in the center of the screen. Participants were asked to quickly categorize the photo with the left (q) key if it was an older face or the right (p) key if it was a younger face. For trials in this and all subsequent blocks in which participants sorted on a single dimension, participants were allowed 1500ms to categorize each stimulus before they were prompted to “respond faster”. If the participant responded correctly, the next trial would commence. An incorrect response would prompt a red “X” to appear at the bottom center of the screen, with the next trial commencing only when the participant selected the correct response for the trial.

In the second block, participants sorted the words by whether they were real or nonsense words (i.e., a Lexical Decision Task). For all subjects the label “Real” appeared on the top left side of the computer screen and the word “Not Real” appeared on the top right side. The randomly selected target stimulus (i.e. word or non-word) appeared in the center of the screen. Participants were asked to quickly categorize the word or non-word with the left (q) key if it was a “Real” word or the right (p) key if it was a word that was “Not Real”.

The format for the third and fourth blocks was identical, with the short third block serving as practice for the fourth block. Here, the categories from blocks one and two were combined (i.e., an IAT), so that participants sorted REAL words with their left hand, NOT REAL words with their right hand, and the photos into OLDER or YOUNGER with the hands they had used for the first stage of the experiment. The third block contained four trials, one for each type of stimulus. During all practice trials, participants were given an unlimited amount of time to respond. The fourth block contained 160 randomized trials: the 80 words and four presentations each of the 20 photos. To allow for the increased difficulty of sorting on two dimensions, participants were given 2000ms in this block (and in block 7) before they were prompted to respond faster.

The fifth block was the same as the first block, but with the category labels switched, so that if a participant originally categorized a face as OLDER with their left hand, they now categorized it with their right. The sixth and seventh blocks were identical to blocks three and four, respectively, but with the new age-categorization labels learnt in block five. This experiment, and all subsequent experiments reported in this paper, were reviewed and approved by the Human Ethics Committee of the University of Canterbury.

Analysis.

All data were analyzed using R [60] and the R packages lme4 [61] and languageR [62, 63]. Mixed effects models were fit by hand, using model comparison to select the best-fit model. The dependent variables consisted of accuracy (logistic regression) and log reaction time (linear regression), run in separate analyses. The random intercepts in the analyses were always Subject and Word [64, 65]. The tested fixed effects consisted of Participant Age (a median split: born in 1991 or later = younger participants; born before 1991 = older), Word Age (of the stimulus, as selected via the ONZE corpora: older or younger; see [3]), Pairing (whether the age of the photo and the REAL label were paired with the same hand: OLDER-REAL, YOUNGER-REAL), Handedness (of the participants, right or left), Block, and Trial number (scaled and centred within block). Model fitting began with all listed fixed effects, and all three-way interactions between Trial, Participant Age, Word Age, and Pairing. Non-significant interactions and fixed effects were iteratively dropped from the model, based on ANOVA comparison. In all models, random slopes were included for Word Age and Pairing (and their interaction) on the participant intercept.
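As an illustration only (the data frame and variable names below are our assumptions, not the authors' analysis scripts, and the formula is simplified), the model structure described above corresponds to lme4 calls of roughly this form:

```r
library(lme4)

# Accuracy: logistic mixed model with random intercepts for Subject and Word,
# and by-subject random slopes for Word Age, Pairing, and their interaction
m_acc <- glmer(Correct ~ WordAge * Pairing * ParticipantAge + Block + Trial +
                 (1 + WordAge * Pairing | Subject) + (1 | Word),
               data = dat, family = binomial)

# Reaction times: linear mixed model on log RT, correct trials only
m_rt <- lmer(log(RT) ~ WordAge * Pairing * ParticipantAge + Block + Trial +
               (1 + WordAge * Pairing | Subject) + (1 | Word),
             data = subset(dat, Correct == 1))

# Iteratively drop non-significant terms, comparing nested models
m_rt2 <- update(m_rt, . ~ . - WordAge:Pairing:ParticipantAge)
anova(m_rt2, m_rt)
```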

Reaction time cut-offs were used to filter the data. Responses faster than 250ms were removed, as were responses more than two standard deviations above the participant’s mean. In experiment 1, 6.7% of the data were removed with this method.
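A minimal sketch of this trimming step in base R, assuming a data frame dat with reaction times in milliseconds and a Subject column (the column names are illustrative):

```r
# Remove implausibly fast responses
dat <- dat[dat$RT >= 250, ]

# Remove responses more than two standard deviations above each participant's mean
upper <- ave(dat$RT, dat$Subject, FUN = function(x) mean(x) + 2 * sd(x))
dat <- dat[dat$RT <= upper, ]
```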

Results

Accuracy.

The overall accuracy rate was 94.1%. The best fit logistic regression model for word accuracy contained a main effect of Block (with the later block having lower accuracy), and an interaction between Word Age and Trial: younger words improved in accuracy as trials progressed (β = 0.30802, Std. Error = 0.13558, z = 2.272). Critically, there was no significant interaction between Word Age and Pairing, although the numbers trended in the predicted direction (Table 1), driven by higher accuracy for young words when REAL and YOUNGER were paired. Participant Age was not significant, either in isolation or in an interaction.

Table 1. Mean accuracy rates and standard deviations (by speaker) for different word types across different real word pairings in experiment 1.

https://doi.org/10.1371/journal.pone.0210793.t001

Reaction times.

Our reaction time analyses exclude incorrect trials. The dependent variable in our reaction time analyses is log RT. The best model of experiment 1 reaction times is shown in Table 2.

Table 2. Summary of best fit-model for response times in experiment 1.

https://doi.org/10.1371/journal.pone.0210793.t002

Like the accuracy model, we see a main effect of Block (participants are slower in the second critical Block), and Trial (participants get faster within a block as it progresses). There is also a significant three-way interaction between Pairing, Word Age, and Participant Age: younger participants show the predicted association between young words and young faces, but older participants do not. In order to investigate the nature of this interaction further, we fit two separate models, one to the older participants and one to the younger participants. The pairing by word-type interaction did not reach significance for the older participants, but it did reach significance for the younger participants (β = -0.042924, Std. Error = 0.018292, t = -2.35). Fig 1 shows this interaction for younger participants: they are generally faster when YOUNG and REAL are paired, but significantly more so for words that are used more often by younger speakers. However, responses to old-words do not become faster when paired with OLD faces. Note that our RT models all use log RT as the dependent variable, but in our figures we use an exponential function to convert model predictions back to RT, for interpretability.
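The back-transformation used for the figures is simply the exponential of the fitted values on the log scale; a sketch follows, in which the model and data objects (m_rt_young, newdat) are placeholders of our own, not the authors' code.

```r
# Predictions from the log-RT model, fixed effects only (ignoring random effects)
newdat$logRT_hat <- predict(m_rt_young, newdata = newdat, re.form = NA)

# Convert back to milliseconds for plotting
newdat$RT_hat_ms <- exp(newdat$logRT_hat)
```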

Fig 1. Interaction between word type and IAT pairing (condition) for younger participants only (born in 1991 or later).

Left panel shows model predictions, and right panel shows raw distributions of responses.

https://doi.org/10.1371/journal.pone.0210793.g001

Summary of experiment one

This study complements the findings of Walker and Hay [3] by showing that listeners appear to be sensitive to the age distribution of words, since there is evidence that they implicitly associate younger words with younger faces. The evidence for this effect comes from the reaction time data, and from younger participants only. There is also a more general effect that participants (older and younger) are faster when REAL and YOUNG are paired, replicating Rothermund and Wentura’s findings [51, 53], and supporting their argument that salience congruence may account for some of the facilitation in the IAT.

While younger participants were significantly faster responding to young-words compared to old-words when REAL and YOUNG were paired, there was not the predicted crossover effect for old-words when REAL and OLD were paired: access of old-words was not facilitated by a “congruent”-face pairing. It’s possible that this is due to the general difficulty participants have in pairing OLD and REAL (compared to YOUNG and REAL) [51, 53], which delays lexical access more generally and cancels out any facilitation that might come from congruence. That is, the lack of effect isn’t because of a lack of an association between old faces and old words, but a product of this specific methodology, and the fact that there was a crossover effect using these same words in [3] supports this interpretation. The alternative is that this asymmetry in our results reflects a true asymmetry in lexical representation and/or access: our young listeners might attend differently to words coming from older and younger speakers, and encode stronger relationships between young-faces and young-words than between old-faces and old-words. Such an account would be consistent with recent work highlighting the ways in which raw experiences are encoded differently depending on social factors [32].

We did not hypothesize a distinction between the younger and older participants in terms of the Word Age and Pairing interaction, and so we would need to replicate these results before assigning much weight to them. There is certainly evidence in the social psychology literature that older and younger people differ in their conceptualizations of their own age, and of age more generally [66, 67], and evidence from studies of language change suggests that young people, especially adolescents, are most heavily invested in age-based language differentiation [68, 69]. Therefore, it is conceivable that our older and younger participants either differed in how they responded to the labels and faces used within our task, or in how they socially encoded the words used in our study, and future work designed to test these hypotheses could be illuminating.

In experiment 2, we use the same methodology, but substitute the social category of age with gender. Gender has been identified as one of the most robustly attended-to sociolinguistic categories: gendered variants in speech arise early and are attended to early [70], and adult learners are easily able to learn new linguistic associations with gender [71]. Thus, if any social category supports robust generalization from word use, we would predict it to be gender.

Experiment 2: Female and male words, female and male faces

In this experiment, we test for an implicit association between words used more often by men/women and male/female faces. The design is similar to experiment 1, except we used words that are biased in production frequencies such that they are used more often by men than women, or vice versa (as opposed to older and younger speakers). The purpose of this experiment is to conceptually replicate experiment 1, using a different social category of speakers.

Methodology

Stimuli.

The stimuli in this experiment consisted of photographs of male and female faces, and single words presented in text. The 100 words in this study consisted of 50 real words and 50 non-words. The real words were overrepresented in either male or female speech in the Canterbury Corpus [57]. Of the 815,000 transcribed words in the corpus, 401,188 come from male interviewees, and 416,055 from female interviewees. In choosing real words that were skewed in their usage by men or women, we avoided names, function words, and expletives, with a preference for words that had higher overall frequencies. Twenty-five words that skewed female in their usage were selected; these ranged from being around 32:1 (pony) to 1.9:1 (class, teacher) times more common in the female versus the male corpus, and had a frequency range of 31 to 827 ppm (parts per million) in the female corpus. Words that had a frequency of less than 45 ppm in the female corpus all had ratios of at least 7:1, so that the less skewed ratios occurred in more frequent words, where we might expect participants to have had more exposure to the (albeit weaker) bias. The 25 male words ranged from being around 20:1 (vehicle) to 1.97:1 (guys) times more common in the male versus the female corpus, and had a frequency range of around 32–364 ppm in the male corpus. Again, words with a frequency of less than 45 ppm in the male corpus all had ratios of at least 7:1. There was no significant difference between the frequency of male words in the male corpus (mean = 115.26 ppm, std. dev. = 113.94) and the frequency of female words in the female corpus (mean = 215.38 ppm, std. dev. = 238.22) in a Wilcoxon rank sum test (W = 282, p = 0.4227).

The 50 non-words were created from the real words, maintaining stress pattern, syllable structure, and orthographic length. Due to experimenter error, only 49 non-words were presented in Blocks 2 and 4. See Appendix B for the list of real and non-words. In a rating task that followed the main experiment, participants were asked to rate the real words on a 5-point scale of gender usage, from 1, meaning that the word is “more frequently used by males”, to 5, meaning that the word is “more frequently used by females”. Male words received an average rating of 2.58 (sd = .49), and female words received an average rating of 3.53 (sd = .46). A Wilcoxon rank sum test found this difference in overt perceptions of word use depending on gender to be significant (W = 574.4, p < 0.001). These words thus differ from the age-graded words, in that participants have greater conscious awareness of their gendered distribution.

While we already knew that the distributional age-skew of the words used in experiment 1 affected participants’ behavior in a lexical decision task, since the same words had been used in [3], no such prior experiment existed for gender-skewed words. Therefore, we conducted a lexical decision task analogous to [3], where participants were presented with the male words, the female words, 50 gender-neutral words (in terms of their distribution), and 100 nonsense words. The words were recorded by a male and a female, who were both native speakers of NZE and in their early 20s. Twenty-three native speakers of NZE took part in the study (7 M, 16 F). Stimuli for the experiment were divided between two blocks. Each block contained 200 words, split equally between the male and female speaker, and no word was repeated within a block. Therefore, each participant heard all words twice, once from each speaker across the two blocks (with order counterbalanced across participants). While there was no effect of voice-gender and word-gender congruence on error rates, an analysis of response times to correct answers showed a) that listeners were faster overall with the female compared to the male speaker, and b) that this advantage was largest in words that were said more by women in the CC. These results suggest that the gendered distribution of these words also affects lexical access, in a similar way to the age-distribution of words used in experiment 1.

To produce the photos, five females and males were recruited, and their photographs were taken and morphed into novel faces as outlined in experiment 1. Specifically, each original face was morphed with another face of his or her own sex to create a unique face. In total, 20 photographs were created, each depicting a unique face (10 female, 10 male). All photos appeared natural to the authors (examples available at https://github.com/jenniferhay/hayetal-plos2019/supplementaryfigures.pdf).

Participants.

Fifty participants (32 female, 18 male) were recruited at the University of Canterbury, and compensated with a $10 voucher for their time. Participant age ranged from 18–56 years (median = 21). Forty-two of these participants also did experiment 1, prior to experiment 2, in the same experimental session. For experiment 2, we purposely aimed to increase the number of male participants, in order to increase comparability across male and female participants. It thus contains 8 additional males. All participants were native or near-native speakers of New Zealand English (defined as having moved to New Zealand before the age of seven).

Procedure.

The experimental procedure was the same as in experiment 1, except that where participants sorted faces into older and younger categories in experiment 1, they now sorted faces into male and female categories.

Results

Model fitting proceeded as described for experiment 1, but substituting participant gender for participant age, and word gender for word age. The same outlier removal process was used as for experiment 1, and removed 5.5% of the data.

Accuracy.

The overall accuracy rate was 94.6%. Table 3 shows the mean accuracy rates by hand-pairing and word type, and shows that accuracy is higher when the gender of the photo is aligned with word gender: subjects are more accurate when responding to male words when using the same hand to respond to male photos than when using that hand to respond to female photos. When responding to female words, they are most accurate when using the same hand as they are using to classify female photos. The best model of response accuracy to words is shown in Table 4. The model shows an effect of trial (with accuracy decreasing as trials progress), and the predicted interaction between word gender and photo-word pairing.

Table 3. Mean accuracy rates and standard deviations (by speaker) for different word types across different real word pairings in experiment 2.

https://doi.org/10.1371/journal.pone.0210793.t003

Table 4. Summary of best-fit model for accuracy rates in experiment 2.

https://doi.org/10.1371/journal.pone.0210793.t004

Reaction times.

The best model of the logged RTs to correct words is shown in Table 5. It includes two two-way interactions. The first shows that male participants do not speed up as much during the course of the experiment as female participants. The second shows that male participants are faster in the MALE-REAL pairing, and female participants are faster in the FEMALE-REAL pairing, as shown in Fig 2. Like the facilitative effect of pairing YOUNG and REAL or OLD and NOT REAL in experiment 1, this would appear to reflect a general benefit of pairing an unmarked social category (here, the participant’s own gender) with real words, and a marked social category with nonsense words [54]. Critically, there is no effect of word-gender on response times.

Fig 2. Model prediction for response times in experiment 2, as a function of word-pairing and participant gender.

https://doi.org/10.1371/journal.pone.0210793.g002

Table 5. Summary of best-fit model for response times in experiment 2.

https://doi.org/10.1371/journal.pone.0210793.t005

Experiment 2 summary

Like experiment 1, experiment 2 suggests that participants have an association between words that are more likely to be used by a group of speakers–here women and men–and those groups, indicating that listeners are tracking distributions of usage. In experiment 2, the predicted result is apparent in the accuracy data, but not the RT data, whereas in experiment 1, the accuracy data trended in the same direction, but it was the RT data that reached significance. Since both accuracy and response times reflect lexical access, we think the two experiments support an interpretation consistent with the hypotheses under investigation: listeners associate words that are used more often by certain groups of speakers with photos of faces representing those groups of speakers.

Since the photos and the words do not co-occur in presentation (the image of the face is not on the screen at the same time as the word is presented), this already suggests a more abstract relationship between speaker gender and word gender than demonstrated in our earlier work [3]. However, it is quite likely that, because people hear these words more often from female or male speakers, their memories of the words include not only acoustic traces of female and male voices, but also visual information, including the speakers’ faces. This means that we might also get the observed effects in a purely exemplar account of language, since female words would have been encountered more with female faces. In experiment 3, we therefore explore whether we can replicate the effect that we have seen in experiment 2, but instead of using faces, we ask participants to classify stereotypically gendered objects as either male or female.

Experiment 3: Female and male words, female and male objects

Experiment 2 (and the associated lexical decision task) indicates that congruence between a person’s perceived gender (through voice or face) and the gendered distribution of word use facilitates lexical access. Both voices and faces are involved in the speech experience, however, so a congruency effect might arise in the absence of any abstracted generalization regarding gender. In experiment 3, we examine the relationship between words and the gender (female, male) of the people who use those words most often by testing for an implicit association between words and a more secondary gender association via objects. That is, we get more abstract by using objects that have themselves been associated more with females or males, but aren’t necessarily directly associated with memories for the particular words we are interested in. If such an effect is observed, it would provide strong evidence that our observed associative effects are being driven by higher-order contextual labelling of individual speech events, where words and items are being categorized in (and abstracted to) the larger category of either FEMALE or MALE. We set out to test this in experiment 3, by replacing the faces used in experiment 2 with photos of gendered objects.

Methodology

Stimuli.

The stimuli for experiment 3 consisted of the same 50 gendered words, and 50 non-words, from experiment 2, and a set of 20 photographs of objects (10 female, 10 male). Each photo depicted a single object in the center of the image on a plain white background. Photographs were obtained via the creative commons search function (http://search.creativecommons.org/). Photographs of items were selected to be highly gendered and matched according to similar categories (e.g. the category “bag” for women was represented by a purse, while for men it was represented by a briefcase). The colours in the two sets are also highly gendered, with female objects predominantly presented in red and pink, and male objects predominantly in darker colours such as blue and black. The genderedness of the photos, then, is carried by both the object type and the colour of the object.

To ensure that the photographs were appropriate for the experiment, a larger set of candidate photos was rated by 12 participants, aged 20–35. These participants were asked to rate how gendered the items were, how recognizable the items were, and how well matched the pairs of items (e.g., purse and briefcase) were semantically. The items per category and gender with the highest approval ratings were then selected for the experiment. See Fig 3 for examples of the pictures used in experiment 3 (the full set of pictures is available at https://github.com/jenniferhay/hayetal-plos2019/supplementaryfigures.pdf).

Fig 3. Two images used in experiment 3.

(Photos: Sheie, R. (2011). Wingtip [Online image]. Creative Commons Attribution License (CCAL) CC BY 4.0. Retrieved from https://www.flickr.com/photos/85546319@N04/8444009146/in/set-72157631599543637 Vintage 1980s pink polka dot silk blouse [Online image]. (2010). Licensed under Creative Commons Attribution License (CCAL) CC BY 4.0. Retrieved from https://www.flickr.com/photos/huzzahvintage/4664390710/).

https://doi.org/10.1371/journal.pone.0210793.g003

In an attempt to minimize non-hypothesized effects of increased performance when the label ‘Real’ is paired with an ingroup or an unmarked category, in experiments 3 and 4 we used the labels ‘Word’ and ‘Non-Word’ as opposed to ‘Real’ and ‘Not Real’.

Participants.

Twenty-eight (22 female, 6 male) participants were recruited at the University of Canterbury, and compensated with a $10 voucher for their time. Participant age ranged from 18–40 years (median = 20). All participants were native or near-native speakers of New Zealand English (defined as having moved to New Zealand before the age of seven).

Procedure.

The experiment was almost identical to experiment 2, except that where participants had been presented with male and female faces to sort into MALE and FEMALE, they were now presented with the photos of stereotypically male and female objects. They responded to the same words as in experiment 2, and categorized them as WORD or NON-WORD. The experiment was run by the third author (a young, female, native speaker of American English).

Results

The reaction times and error rates were analyzed using the same types of statistical modeling as in experiment 2, with the same factors tested (except that faces were replaced with objects). The outlier removal procedure resulted in removal of 5.8% of the data.

Accuracy.

The overall accuracy rate was 91.6%, and mean accuracy rates by word type and pairing are presented in Table 6. As with the gendered faces, there is an interaction between word gender and photo-word pairing, such that participants are more accurate at responding to words when the word gender matches the photo-word pairing. This interaction is significant in the best model of accuracy, which is shown in Table 7. The only other significant effect is an effect of trial, with participants’ accuracy decreasing as each block progresses.

Table 6. Mean accuracy rates and standard deviations (by speaker) for different word types across different real word pairings in experiment 3.

https://doi.org/10.1371/journal.pone.0210793.t006

Table 7. Summary of best-fit model for accuracy in experiment 3.

https://doi.org/10.1371/journal.pone.0210793.t007

In order to test whether there are any significant differences between the effect sizes in experiments 2 and 3 (i.e., whether faces or objects resulted in stronger effects [37]), we combined the data from both experiments. We started by testing for all four-way interactions between Experiment, Word Gender, Pairing, and Participant Sex, and then pruned the model down until all factors were significant. In the combined data-set, Word Gender x Pairing remains highly robust (Word = male x Pairing = male-word: β = 1.615, Std. Error = 0.31674, z = 5.099), but Experiment does not reach significance in isolation, in pairwise interactions with Word Gender or Pairing, or in the critical three-way interaction. This suggests that the effect of Word Gender x Pairing on accuracy is similar whether the gendered images being sorted are faces or objects. Interestingly, the combined model also shows an accuracy advantage when participant sex and pairing are matched, as was observed in RTs in experiment 2 (Participant = male x Pairing = male-real/word: β = 1.09, Std. Error = 0.32552, z = 3.376). This does not interact with Experiment, and we thus assume that it emerges in this combined data-set due to the increased sample size.
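A sketch of the pooled accuracy analysis described above (the object and column names are again illustrative, and assume the two experiments' data frames share the same columns plus an Experiment indicator):

```r
library(lme4)

combined <- rbind(exp2_data, exp3_data)

# Start with the full four-way interaction and prune non-significant terms
m_full <- glmer(Correct ~ Experiment * WordGender * Pairing * ParticipantSex +
                  (1 + WordGender * Pairing | Subject) + (1 | Word),
                data = combined, family = binomial)

m_pruned <- update(m_full, . ~ . - Experiment:WordGender:Pairing:ParticipantSex)
anova(m_pruned, m_full)   # likelihood-ratio comparison of nested models
```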

Reaction times.

The dependent variable in our reaction time model was log transformed reaction times. The best model is shown in Table 8. There was an unanticipated interaction between trial number and word gender, with speed increasing through the course of each block more dramatically for female words (a decrease in RT of ~60ms) than male words (~20ms). The hypothesized interaction between photo-word pairing and word gender was significant, and is plotted in Fig 4: female words are responded to more quickly when WORD is paired with FEMALE and male words are responded to more quickly when WORD is paired with MALE.

Fig 4. Model predictions of response times by word gender and IAT pair in experiment 3.

https://doi.org/10.1371/journal.pone.0210793.g004

Table 8. Summary of best-fit model for reaction times (logged) in experiment 3.

https://doi.org/10.1371/journal.pone.0210793.t008

Recall that the predicted reaction time effect was not present in experiment 2. In order to establish whether there is a genuine difference between the objects and faces in terms of our predicted effect, we fit a combined model of experiments 2 and 3. We started by testing for all four-way interactions between Experiment, Word Gender, Pairing, and Participant Sex, and pruned the model down until all factors were significant. The resulting model did not include Experiment x Word Gender x Pairing, suggesting there is no robust difference between the experiments in our predicted effect. It does show a Word Gender x Pairing interaction (Word = male x Pairing = male-real/word, β = -0.020445, Std. Error = 0.010541, t = -1.94), which itself interacts with trial, with the predicted effect present at the beginning of each block but not at the end (Word = male x Pairing = male-real/word x Trial, β = 0.016013, Std. Error = 0.006725, t = 2.38). The combined model retains an overall effect of participant sex, in which males are faster in the male-real/word pairing in both experiments (Participant = male x Pairing = male-real/word, β = -0.061041, Std. Error = 0.015764, t = -3.87).

Experiment 3 summary

When using gendered objects instead of gendered faces, we have found evidence that listeners still associate words used more by women or men with FEMALE and MALE categories respectively. Since it is unlikely that participants encountered these words with the objects, this effect is unlikely to be attributed to visual information that is stored together with the words. Rather, it appears that the gendered objects, and the gendered words, share an association with MALE or FEMALE as categories that are abstracted away from specific memories of people.

Unlike experiment 2, we do not find evidence that participants do better when their own gender is paired with real words. However, this is likely due to the smaller number of male participants in experiment 3; consistent with this, there is no significant difference between the experiments in the combined model.

Experiment 4: Female and male words, female and male objects, covert gender match

Experiments 2 and 3 demonstrate that people have associations between MALENESS and FEMALENESS and words that are used more by men and women. In both studies, the IAT task involved sorting gendered images (faces or objects) into MALE and FEMALE categories, making gender an explicit, conscious part of the task.

In our final experiment, we test whether the observed facilitation for object-word congruence depends on using the categorical labels MALE and FEMALE in the task, by removing these labels. By giving participants the labels MALE and FEMALE, we explicitly introduced gender into the task, and our results could reflect priming from the labels (rather than the faces/objects), and/or task-specific strategies by participants (i.e., “This task is easier if I think of these words as gendered”). Just as seeing toy koalas and kangaroos affected participants’ vowel perception in [41] despite no mention of AUSTRALIA, we wanted to see whether our participants associate feminine objects with female-words, without any mention of FEMALE.

We do this by removing the MALE and FEMALE labels from the sorting task and instead using a more covert tactic: images of gendered objects are presented with or without frames, and participants sort images according to whether an image is FRAMED or UNFRAMED. For half the participants, FRAMED is exclusively associated with female objects, and UNFRAMED with male objects. If participants associate these objects with female/male words respectively (through their internal social labels of MALE and FEMALE), we would expect them to be faster at accessing female words when sorting FRAMED and REAL with the same hand. We also predict that this effect might take some time to emerge, as participants consciously or subconsciously activate the gendered associations of the objects.

Methodology

Stimuli.

The words used in experiment 4 were the same gendered words and non-words used in experiments 2 and 3. The images were the same object images used in experiment 3. In this experiment, however, the images appeared either with or without frames around them (see Fig 5).

Fig 5.

Unframed (left) and framed (right) versions of an image used in experiment 4. (Photo: Bennett, A. (2011). Blue [Online image]. Licensed under Creative Commons Attribution License (CCAL) CC BY 4.0. Retrieved from https://www.flickr.com/photos/beeny87/5878115657/).

https://doi.org/10.1371/journal.pone.0210793.g005

Participants.

The participants in this study were the same as those in experiment 3. They completed experiment 4 before experiment 3, in the same experimental session, so that gender had not yet been explicitly mentioned.

Procedure.

The procedure was the same as in the preceding experiments, except that instead of sorting images into male and female categories, participants sorted images based on whether or not they had a frame around them (see Fig 5). Framed items were always linked with one gender and paired with the WORD label. Half of the participants experienced the female objects aligned with FRAME and WORD in the first part of the experiment and the male objects aligned with FRAME and WORD in the later blocks. This order was reversed for the other participants.

Results

The same model fitting procedure was used as in the previous experiments. The outlier removal process resulted in removal of 4.8% of the data.
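Purely for illustration, the snippet below shows one common way such trimming is done, removing responses more than 2.5 standard deviations from each participant's mean log RT. This is an assumed criterion and set of variable names for the sketch, not necessarily the exclusion rule defined in the earlier experiments of this article.

# Illustrative only: by-participant SD-based trimming of log RTs.
# The 2.5 SD threshold and the column names are assumptions, NOT the
# article's reported criterion.
library(dplyr)

exp4_trimmed <- exp4 %>%
  group_by(Participant) %>%
  mutate(z = (logRT - mean(logRT)) / sd(logRT)) %>%
  ungroup() %>%
  filter(abs(z) <= 2.5)

1 - nrow(exp4_trimmed) / nrow(exp4)   # proportion of data removed (4.8% reported above)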

Accuracy.

The overall accuracy rate was 92.5%, and mean accuracy rates across word types and pairings are presented in Table 9. Participants are more accurate with female words in the FEMALE-REAL pairing, but not more accurate with male words in the MALE-REAL pairing. The best-fit model for accuracy showed two three-way interactions, and is given in Table 10. The first was the interaction between Trial, Word Gender, and Pairing, shown in Fig 6: accuracy decreased as trials progressed, except for female words in the female-real pairing, which remained highly accurate throughout. The second was an interaction between Trial, Participant Gender and Pairing, shown in Fig 7: male participants decreased in accuracy more dramatically than females in the FEMALE-WORD pairing, whereas female participants decreased more dramatically in the MALE-WORD pairing. It should be noted that the number of male participants in this experiment is not large (6 M, 22 F), and so this interaction, while significant, should be treated with some caution. If the model is fit without interactions involving Participant Gender, the more critical Trial x Word Gender x Pairing interaction remains intact.
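A minimal sketch of a mixed-effects logistic regression of the general form reported in Table 10 is given below, including the two three-way interactions described above. As before, the data frame and variable names are illustrative rather than the authors' own, and the random-effects structure is an assumption.

# Sketch only: accuracy (1 = correct) modelled with glmer, with the
# Trial x Word Gender x Pairing and Trial x Participant Gender x Pairing
# interactions and crossed random intercepts.
library(lme4)

acc_model_exp4 <- glmer(
  Correct ~ Trial * WordGender * Pairing +
            Trial * ParticipantGender * Pairing +
            (1 | Participant) + (1 | Word),
  data = exp4_acc, family = binomial
)
summary(acc_model_exp4)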

Fig 6. Experiment 4.

Three-way interaction between trial, pairing and word gender. The female words are shown in the left panel, and male words are shown in the right. The female-real pairing is shown in gray and the male-real pairing is shown in blue.

https://doi.org/10.1371/journal.pone.0210793.g006

Fig 7. Three-way interaction between trial, pairing and participant gender.

The female-word pairing is shown in the left panel, and the male-word pairing is shown in the right. Female participants are shown in red, male participants are shown in blue.

https://doi.org/10.1371/journal.pone.0210793.g007

Table 9. Mean accuracy rates and standard deviations (by speaker) for different word types across different real word pairings in experiment 4.

https://doi.org/10.1371/journal.pone.0210793.t009

Table 10. Model coefficients for best fit model of accuracy in experiment 4.

https://doi.org/10.1371/journal.pone.0210793.t010

Response times.

The best-fit model of log RT is presented in Table 11. It contains a three-way interaction between Word Gender, Pairing and Trial, plotted in Fig 8. Female words decrease in RT throughout each condition, but do so fastest in the female-real pairing. Male words decrease in RT in the male-real pairing, but not in the female-real pairing. These results echo the accuracy results, in that there appears to be a learning effect, and it is most apparent for the female words in the female-real pairing. However, they should also be interpreted with some caution, in light of one puzzling detail: while the female words speed up throughout the female-real pairing, they nonetheless remain slower in the female-real condition than in the male-real condition throughout. Inspecting the accuracy and RT results together indicates that the female-real condition is, for some reason, more accurate overall but with slower response times. The relevant aspect of the RT results for our hypothesis is that responses speed up in this condition for the female words, but not for the male words.
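As an illustration of how model predictions such as those plotted in Fig 8 can be derived, the sketch below generates population-level predictions over a grid of predictor values, with random effects excluded (re.form = NA). The model object and column names are hypothetical, the fixed-effects structure is assumed to contain only the three-way interaction discussed here, and the factor levels in the grid must match those in the fitted data.

# Sketch only: predicted RTs for the Trial x Word Gender x Pairing interaction.
library(lme4)

rt_model_exp4 <- lmer(logRT ~ Trial * WordGender * Pairing +
                        (1 | Participant) + (1 | Word),
                      data = exp4_rt)

grid <- expand.grid(
  Trial      = seq(min(exp4_rt$Trial), max(exp4_rt$Trial), length.out = 20),
  WordGender = c("female", "male"),
  Pairing    = c("female-real", "male-real")
)

# Population-level predictions, back-transformed from log RT
# (assuming a natural-log transformation) to milliseconds.
grid$predicted_RT <- exp(predict(rt_model_exp4, newdata = grid, re.form = NA))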

Fig 8. Experiment 4.

Three-way interaction between trial, pairing and word gender. The female words are shown in the left panel, and male words are shown in the right. The female-real pairing is shown in gray and the male-real pairing is shown in blue.

https://doi.org/10.1371/journal.pone.0210793.g008

Table 11. Model coefficients of best fit model for response times in experiment 4.

https://doi.org/10.1371/journal.pone.0210793.t011

Experiment 4 summary

This experiment used the same stimuli and general procedure as experiment 3, in which participants sorted gendered objects and words. Unlike experiment 3, however, participants did not explicitly sort the objects by gender under the labels FEMALE and MALE; instead, they sorted the objects by whether or not they were framed, with all objects related to one gender framed and all objects related to the other gender unframed. Despite the covert nature of the gender sorting, we again find an interaction between word gender and object gender, though the effect takes time to develop over the course of the experiment. We interpret this as indicating that participants take time to learn (consciously or subconsciously) that the framed/unframed images are meaningfully gendered. The interaction is apparent both in accuracy and in RT.

Discussion

This project aimed to investigate the implicit associations people have between word usage and social categories. Across four experiments, we found that participants associated words they had experienced more from certain groups of speakers with abstracted social categories. In experiment 1, reaction times suggested that younger listeners had an association between words encountered more from older/younger speakers and older/younger faces, respectively. In experiment 2, accuracy rates suggested that listeners had an association between words encountered more from female/male speakers and female/male faces, respectively. In experiment 3, we showed that listeners had an association between words encountered more from male/female speakers and male/female objects, respectively (in both response times and accuracy rates), and in experiment 4, we showed that we did not have to ask participants to sort things explicitly by gender to see this effect emerge in accuracy and in response times.

Implications for theories of lexical representation and access

At the most basic level, these data support other accounts arguing that listeners track information about speaker identity, consistent with exemplar models of speech processing [30]: if lexical access is facilitated when a word’s presentation is more similar to previous presentations of the word, then the word’s cognitive representation must be shaped by experience (i.e., the specificity effect). In the current study, words that our participants have heard more often from men or women across their lifespans were generally accessed faster and/or more accurately when paired with male or female images, respectively.

In Walker & Hay [3] we presented words in old/young voices to participants in an auditory lexical decision task, finding facilitation when the voice age was congruent with the age of those who most commonly use the word. We argued that these results suggest that acoustic traces of older/younger voices are stored in memory, and that hearing words in an older/younger voice facilitates lexical access because of an acoustic match (though we did not rule out the role of an indexical match): a bottom-up effect. In experiments 1 and 2, where we pair words with faces, an argument could be made that the facilitation observed is similarly due to a specificity effect, this time visual: participants have encountered these words most often being spoken by people with female/male faces. However, these two experiments were already more abstract than Walker & Hay’s work in a number of ways. First, the words were presented orthographically rather than auditorily. Since the categorization of words as old/young/male/female was based on spoken frequencies, it is improbable that participants have seen these written words more often with male or female faces (and we know that modality is important for specificity effects [72]). Second, the faces and words were not presented simultaneously: at any moment, a participant was sorting either a face or a word. Like other IAT work, then, the facilitation is assumed to be based on the shared categorical associations the faces and words have with the labels OLD, YOUNG, MALE or FEMALE. That is, beyond an acoustic or visual match, word access is being facilitated by an indexical match with a social category such as MALE.

This line of argumentation is more strongly supported by experiment 3, where we find facilitation for visually presented gendered words when they are paired with images of stereotypically gendered objects. Unlike images of faces, which commonly accompany our experience of auditory speech, images of objects and written words are unlikely to have regularly co-occurred, and the objects are therefore unlikely to be stored in episodic traces stemming from the same events as the words. That is, gendered objects are unlikely to be stored in a word’s representation. It is more likely that the objects and words overlap at a shared categorical level, accounting for the facilitated lexical access of certain words (i.e., through a shared and activated categorical label such as FEMALE).

In experiment 4, we found that we could also see an association between gendered words and gendered objects without explicit gendered category labels. Here, participants were asked to sort gender-stereotyped object images under non-gendered labels (FRAMED/UNFRAMED), where the framed objects were always stereotypically associated with one gender and the unframed objects with the other. As participants progressed through an experimental block, the gendered category (e.g., FEMALE) became activated and in doing so facilitated lexical access of female words.

Together, we take these findings as further evidence for a hybrid model of lexical representation [24, 73]. Just as experience with particular phonetic realizations has been shown to generalize to other words containing the same phoneme [12], the consistent experience of certain words coming from certain speaker groups builds relationships between those words and more abstract social categories. Because of this, language processing is facilitated not simply by a direct match with the acoustic and visual contexts in which a word has been encountered, but also by an indexical match between, for example, gendered objects and gendered words [41, 42].

While we have been using the terms “label” and “abstract social category” to describe this indexical link between a word and an object, we are not committed to whether this link is a cognitively real, phoneme-like label (e.g., MALE), or consists purely of “aggregates of traces acting in concert at the time of retrieval” which “represent the category as a whole” [7] (p. 411). Our words and objects are linked by the men and women who predominantly use them, but a man or woman does not have to be present in the probe itself (an orthographically presented word without a speaker; a picture of a briefcase without a person holding it) for gendered knowledge to be activated: the probe’s similarity to memory traces which DO contain a person could produce the association without the need for a label like MALE [7]. One could argue that the evidence for an association developing in experiment 4, in which we removed explicit labels from the task, supports such a label-less, pure-exemplar account. However, just because we did not introduce labels does not mean that participants did not activate something label-like during the experiment, consciously or subconsciously.

While we have shown this effect based on words and objects with real, long-term associations with gendered speakers, we might predict that these indexical generalization effects would hold in lab-introduced associations as well. For example, if we exposed participants to new vocabulary that they heard primarily from male or female speakers, we predict that they would then associate these words with male and female objects. Conversely, if we introduced participants to novel objects, contextualising them as gendered, we predict that participants would show an association between these objects and gendered words, even though they have never heard them together.

Methodological implications

Our results and methodology have implications for the wider IAT-using community. In terms of our results, in experiments 1, 2 and 3, we replicate Rothermund & Wentura [53] by showing that participants are faster when REAL-WORD and YOUNG are paired, and similarly, when REAL-WORD and a participant’s own gender are paired. This provides support for Rothermund & Wentura’s argument that the shared markedness/salience of paired categories may be as important as shared semantics.

Methodologically, as pointed out by de Houwer [74], there is normally a confound between category valence and member valence in the classic IAT design: it is hard to say whether it is the category label FLOWER or category members like tulip that are positively valenced, though de Houwer’s attempt at separating this confound suggests that the valence of the category label is what matters in standard IAT designs (see also [75, 76]). Other researchers, however, have argued that stimulus items do matter [77–79]. All of these studies apart from de Houwer’s have used a cross-block/cross-participant design. In our study, both category-congruent and category-incongruent words were sorted with the same hand at the same time, because the sorting task for target items was a lexical decision task and both types of words were real words. Since we find differences between the real words when they are paired with different gendered labels, we are able to say that the word-gender effect is about more than an association/facilitation between REAL WORD and, for example, the category label FEMALE. Rather, it is a subset of those real words (the ones that participants have experienced more often coming from women) that are accessed faster in the FEMALE-REAL WORD pairing. Researchers, even those not interested in language specifically, might find the combined IAT-lexical decision task useful for distinguishing between associations based on the category label and associations with category members.

Relatedly, in experiment 4 we saw our critical effect emerge even with the use of irrelevant labels. This is not to say that labels did not matter; rather, as a block progressed, participants learnt that frames were consistently around FEMALE/MALE objects, despite explicit labels not being introduced. This demonstrates that participants can invoke and use category labels even when they are not explicitly mentioned in a task [80], and the paradigm could potentially be applied to understanding what sort of labels participants generate on their own, especially in cases where it is unclear what “nominal features” are activated by certain categories [56] (pp. 426–427).

Conclusion

In this study we presented the results of four combined Lexical Decision and IAT experiments, all of which investigated whether lexical access is facilitated when words that participants have experienced more from certain types of speakers are paired with faces of people in that social group, or with objects stereotypically associated with that group. We find evidence to support this hypothesis in all four experiments. Critically, the results suggest that people’s socially skewed experiences with words result not just in an association between the word and certain types of speakers, but also in an association between the word and more abstract social categories. We argue that this provides further support for hybrid models of lexical representation, in which both linguistic and social categories are associated with phonetically detailed lexical storage.

Supporting information

S1 Appendix. Real and nonsense words used in experiment 1.

Due to experimental error, the non-words gid/gozz varied across participants.

https://doi.org/10.1371/journal.pone.0210793.s001

(DOCX)

S2 Appendix. Real and nonsense words used in experiments 2, 3, and 4.

Due to experimental error, the non-word chilkres was presented as chilkes for experiment 3.

https://doi.org/10.1371/journal.pone.0210793.s002

(DOCX)

Acknowledgments

We are grateful to audiences at the New Zealand Institute of Language, Brain and Behaviour, and University of Western Sydney for feedback on presentations relating to this work, to Emily Walters and Jacq Jones for their editorial assistance, and to our PLOS One reviewers.

References

  1. Liberman M. Male and female word usage. Language Log 2014. Available from: http://languagelog.ldc.upenn.edu/nll/?p=13873
  2. Bakhtin MM. Discourse in the novel. In: Holquist M, editor. The Dialogic Imagination: Four essays by MM Bakhtin. Austin and London: University of Texas Press; 1981. pp. 269–434.
  3. Walker A, Hay J. Congruence between ‘word age’ and ‘voice age’ facilitates lexical access. Lab. Phon. 2011; 1(2): 219–237.
  4. Kim J. Perceptual associations between word and speaker age. Lab. Phon. 2011; 7(1): 18, pp. 1–22. http://dx.doi.org/10.5334/labphon.33
  5. Hay J, Foulkes P. The evolution of medial (-t-) in real and remembered time. Language. 2016; 92(2): 298–330.
  6. Hay J. Sociophonetics: The Role of Words, the Role of Context, and the Role of Words in Context. Top. Cogn. Sci. 2018; 10: 696–706. pmid:29498479
  7. Hintzman DL. Schema abstraction in a multiple-trace memory model. Psychological Review. 1986; 93: 411–428.
  8. Johnson K. Speech perception without speaker normalization: An exemplar model. In: Johnson K, Mullennix JW, editors. Talker Variability in Speech Processing. San Diego, CA: Academic Press; 1997. pp. 145–165.
  9. Goldinger SD. Echoes of echoes? An episodic theory of lexical access. Psychological Review. 1998; 105: 251–279. pmid:9577239
  10. Pierrehumbert JB. Exemplar dynamics: Word frequency, lenition and contrast. In: Bybee J, Hopper PJ, editors. Frequency effects and emergent grammar. Amsterdam, NL & Philadelphia, PA: John Benjamins; 2001. pp. 137–158.
  11. Johnson K. Resonance in an exemplar-based lexicon: The emergence of social identity and phonology. J. Phon. 2006; 34: 485–499.
  12. German JS, Carlson K, Pierrehumbert JB. Reassignment of consonant allophones in rapid dialect acquisition. Journal of Phonetics. 2013; 41(3): 228–248.
  13. Craik FIM, Kirsner K. The effect of speaker’s voice on word recognition. Quarterly Journal of Experimental Psychology. 1974; 26: 274–284.
  14. Goldinger SD. Words and voices: Episodic traces in spoken word identification and recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1996; 22: 1166–1183. pmid:8926483
  15. Palmeri TJ, Goldinger SD, Pisoni DB. Episodic encoding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1993; 19: 309–328. pmid:8454963
  16. Levelt WJM. Speaking: From intention to articulation. Cambridge, MA: MIT Press; 1989.
  17. Peperkamp S, Dupoux E. Learning the mapping from surface to underlying representations in an artificial language. In: Cole J, Hualde J, editors. Laboratory Phonology 9. Berlin: Mouton de Gruyter; 2007. pp. 315–338.
  18. Fiasson R. Allophonic imitation within and across word positions. Unpublished PhD thesis. University of Canterbury, New Zealand. 2015.
  19. Kraljic T, Samuel AG. Generalization in perceptual learning for speech. Psychonomic Bulletin and Review. 2006; 13(2): 262–268. pmid:16892992
  20. Tilsen S. Subphonemic and cross-phonemic priming in vowel shadowing: Evidence for the involvement of exemplars in production. Journal of Phonetics. 2009; 37: 276–296.
  21. Nielsen K. Specificity and abstractness of VOT imitation. Journal of Phonetics. 2011; 39: 132–142.
  22. Pierrehumbert JB. The next toolkit. Journal of Phonetics. 2006; 34: 516–530.
  23. Foulkes P, Hay J. The emergence of sociophonetic structure. In: MacWhinney B, O’Grady W, editors. The Handbook of Language Emergence. Oxford: Oxford University Press; 2015. pp. 292–313.
  24. Pierrehumbert JB. Phonological representation: Beyond abstract versus episodic. Annual Review of Linguistics. 2016; 2: 33–52.
  25. Gick B, Derrick D. Aero-tactile integration in speech perception. Nature. 2009; 462: 502–504. pmid:19940925
  26. Hay J, Podlubny R, Drager K, McAuliffe M. Car-talk: Location-specific speech production and perception. J. Phon. 2017; 65: 94–109.
  27. Jacoby LL, Hayman CAG. Specific visual transfer in visual word identification. Journal of Experimental Psychology: Learning, Memory and Cognition. 1987; 110: 306–340.
  28. Blaxton TA. Investigating dissociations among memory measures: Support for a transfer-appropriate processing framework. Journal of Experimental Psychology: Learning, Memory and Cognition. 1989; 15: 657–668.
  29. Srinivas K. Perceptual specificity in non-verbal priming. Journal of Experimental Psychology: Learning, Memory and Cognition. 1993; 19: 582–602.
  30. Foulkes P, Docherty G. The social life of phonetics and phonology. Journal of Phonetics. 2006; 34: 409–438.
  31. Munson B. Levels of phonological abstraction and knowledge of socially motivated speech-sound variation: A review, a proposal, and a commentary on the papers by Clopper, Pierrehumbert, and Tamati; Drager; Foulkes; Mack; and Smith, Hall, and Munson. Journal of Laboratory Phonology. 2010; 1: 157–177.
  32. Sumner M, Kim SK, King E, McGowan K. The socially weighted encoding of spoken words: A dual-route approach to speech perception. Frontiers in Psychology. 2014; 4: 1015. pmid:24550851
  33. Liberman AM, Harris KS, Hoffman HS, Griffith BC. The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology. 1957; 54: 358–368. pmid:13481283
  34. Liberman AM, Harris KS, Kinney J, Lane H. The discrimination of relative onset-time of the components of certain speech and nonspeech patterns. Journal of Experimental Psychology. 1961; 61: 379–388. pmid:13761868
  35. Calder AJ, Young AW, Perrett DI, Etcoff NL, Rowland D. Categorical perception of morphed facial expressions. Visual Cognition. 1996; 3: 81–117.
  36. Levin DT, Beale JM. Categorical perception occurs in newly learned faces, cross-race faces, and inverted faces. Perception & Psychophysics. 2000; 62: 386–401.
  37. Lemm KM, Dabady M, Banaji MR. Gender picture priming: It works with denotative and connotative primes. Social Cognition. 2005; 23: 218–241.
  38. Wittenbrink B, Judd CM, Park B. Implicit racial stereotypes and prejudice and their relationships with questionnaire measures: We know what we think. Journal of Personality & Social Psychology. 1997; 72: 262–274.
  39. Kunda Z, Sherman-Williams B. Stereotypes and the construal of individuating information. Personality & Social Psychology Bulletin. 1993; 19: 90–99.
  40. Payne BK. Prejudice and perception: The role of automatic and controlled processes in misperceiving a weapon. Journal of Personality & Social Psychology. 2001; 81: 181–192.
  41. Hay J, Drager K. Stuffed toys and speech perception. Linguistics. 2010; 48: 865–892.
  42. Szakay A, Babel M, King J. Social categories shared across bilinguals’ lexicons. Journal of Phonetics. 2016; 59: 92–109.
  43. Greenwald AG, McGhee DE, Schwartz JLK. Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology. 1998; 74: 1464–1480. pmid:9654756
  44. Babel M. Dialect convergence and divergence in New Zealand English. Language in Society. 2010; 39: 437–456.
  45. Pantos AJ. Measuring implicit and explicit attitudes toward foreign-accented speech. Unpublished PhD thesis, Rice University. 2010.
  46. Redinger D. Language attitudes and code-switching behaviour in a multilingual educational context: The case of Luxembourg. Unpublished PhD thesis, University of York. 2010.
  47. Campbell-Kibler K. The Implicit Association Test and sociolinguistic meaning. Lingua. 2012; 122(7): 753–763.
  48. Campbell-Kibler K. Connecting attitudes and language behavior via implicit sociolinguistic cognition. In: Grondelaers S, Kristiansen T, editors. Experimental Studies of Changing Language Standards in Contemporary Europe. 2013.
  49. Rubenstein H, Garfield L, Millikan JA. Homographic entries in the internal lexicon. Journal of Verbal Learning & Verbal Behavior. 1970; 9: 487–494.
  50. Scarborough DL, Cortese C, Scarborough HS. Frequency and repetition effects in lexical memory. Journal of Experimental Psychology: Human Learning & Memory. 1977; 3: 1–17.
  51. Rothermund K, Wentura D. Figure–ground asymmetries in the Implicit Association Test (IAT). Zeitschrift für Experimentelle Psychologie. 2001; 48: 94–106. pmid:11392986
  52. Popa-Roch MA, Delmas F. Evidence for the salience asymmetries producing IAT effects: Replication and extension with a French-North African IAT. Revue Internationale de Psychologie Sociale. 2011; 24(1): 37–61.
  53. Rothermund K, Wentura D. Underlying processes in the Implicit Association Test (IAT): Dissociating salience from associations. Journal of Experimental Psychology: General. 2004; 133: 139–165.
  54. Nosek BA, Banaji MR, Greenwald AG. Harvesting implicit group attitudes and stereotypes from a demonstration website. Group Dynamics. 2002; 6: 101–115.
  55. Greenwald AG, Nosek BA, Banaji MR, Klauer KC. Validity of the salience asymmetry interpretation of the Implicit Association Test: Comment on Rothermund and Wentura (2004). Journal of Experimental Psychology: General. 2005; 134: 420–425.
  56. Rothermund K, Wentura D, De Houwer J. Validity of the salience asymmetry account of the Implicit Association Test: Reply to Greenwald, Nosek, Banaji, and Klauer (2005). Journal of Experimental Psychology: General. 2005; 134(3): 426–430.
  57. Gordon E, Maclagan M, Hay J. The ONZE Corpus. In: Beal JC, Corrigan KP, Moisl H, editors. Creating and digitizing language corpora volume 2: Diachronic databases. Basingstoke: Palgrave Macmillan; 2007. pp. 82–104.
  58. Habibi R, Khurana B. Spontaneous gender categorization in masking and priming studies: Key for distinguishing Jane from John Doe but not Madonna from Sinatra. PLoS ONE. 2012; 7(2): e32377. pmid:22389697
  59. Abrosoft FantaMorph (Version 5): Photo morphing software for creating photo morphing pictures and sophisticated animation. Retrieved from www.fantamorph.com.
  60. R Development Core Team. R: A language and environment for statistical computing (Version 3.1.2). R Foundation for Statistical Computing. Vienna, Austria. http://www.R-project.org. 2009.
  61. Bates DM, Maechler M. lme4: Linear mixed-effects models using S4 classes. R package version 142745. 2009.
  62. Baayen RH. languageR: Data sets and functions with "Analyzing Linguistic Data: A practical introduction to statistics". R package version 1.4. 2009.
  63. Baayen RH. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge, UK: Cambridge University Press. 2008.
  64. Baayen RH, Davidson DJ, Bates DM. Mixed-effects modelling with crossed random effects for subjects and items. Journal of Memory and Language. 2008; 59: 390–412.
  65. Clark HH. The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior. 1973; 12: 335–359.
  66. Hummert ML, Garstka TA, O'Brien LT, Greenwald AG, Mellott DS. Using the Implicit Association Test to measure age differences in implicit social cognitions. Psychology and Aging. 2002; 17: 482–495. pmid:12243389
  67. Seccombe K, Ishii-Kuntz M. Perceptions of problems associated with aging: Comparisons among four older age cohorts. Gerontologist. 1991; 31(4): 527–533. pmid:1894157
  68. Tagliamonte SA, D’Arcy A. Peaks beyond phonology: Adolescence, incrementation and language change. Language. 2009; 85(1): 58–108.
  69. Labov W. Principles of linguistic change Vol III: Cognitive and cultural factors. Oxford: Wiley-Blackwell; 2010.
  70. Foulkes P. Exploring social-indexical knowledge: A long past but a short history. Laboratory Phonology. 2010; 1: 5–39.
  71. Rácz P, Hay JB, Pierrehumbert J. Social salience discriminates learnability of contextual cues in an artificial language. Frontiers in Psychology, Language Sciences. 2017; 8: 51.
  72. Frost R, Armstrong BC, Siegelman N, Christiansen MH. Domain generality vs. modality specificity: The paradox of statistical learning. Trends in Cognitive Sciences. 2015; 19(3): 117–125. pmid:25631249
  73. Pierrehumbert JB. Word-specific phonetics. In: Gussenhoven C, Warner N, editors. Laboratory Phonology 7. Berlin, DE & New York, NY: Mouton de Gruyter; 2002. pp. 101–139.
  74. De Houwer J. A structural and process analysis of the Implicit Association Test. Journal of Experimental Social Psychology. 2001; 37: 443–451.
  75. Fazio RH, Olson MA. Implicit measures in social cognition research: Their meaning and use. Annual Review of Psychology. 2003; 54: 297–327. pmid:12172003
  76. Mitchell JP, Nosek BA, Banaji MR. Contextual variations in implicit evaluation. Journal of Experimental Psychology: General. 2003; 132: 455–469.
  77. Steffens MC, Plewe I. Items’ cross-category associations as a confounding factor in the Implicit Association Test. Zeitschrift für Experimentelle Psychologie. 2001; 48(2): 123–134.
  78. Dasgupta N, Greenwald AG. On the malleability of automatic attitudes: Combating automatic prejudice with images of admired and disliked individuals. Journal of Personality and Social Psychology. 2001; 81: 800–814. pmid:11708558
  79. Govan CL, Williams KD. Reversing or eliminating IAT effects by changing the affective valence of the stimulus items. Journal of Experimental Social Psychology. 2004; 40: 357–365.
  80. Craft JL, Simon JR. Processing symbolic information from a visual display: Interference from an irrelevant directional cue. Journal of Experimental Psychology. 1970; 83: 415–420. pmid:4098174