What Homophones Say about Words

Isabelle Dautriche; Emmanuel Chemla

doi:10.1371/journal.pone.0162176

Abstract

The number of potential meanings for a new word is astronomical. To make the word-learning problem tractable, one must restrict the hypothesis space. To do so, current word learning accounts often incorporate constraints about cognition or about the mature lexicon directly in the learning device. We are concerned with the convexity constraint, which holds that concepts (privileged sets of entities that we think of as “coherent”) do not have gaps (if A and B belong to a concept, so does any entity “between” A and B). To derive a linguistic constraint from it, learning algorithms have percolated this constraint from concepts to word forms: some algorithms rely on the possibility that word forms are associated with convex sets of objects. Yet this does not have to be the case: homophones are word forms associated with two separate words and meanings. Two sets of experiments show that when evidence suggests that a novel label is associated with a disjoint (non-convex) set of objects, either a) because there is a gap in conceptual space between the learning exemplars for a given word or b) because of the intervention of other lexical items in that gap, adults prefer to postulate homophony, where a single word form is associated with two separate words and meanings, rather than inferring that the word could have a disjunctive, discontinuous meaning. These results about homophony must be integrated into current word learning algorithms. We conclude by arguing for a weaker specialization of word learning algorithms, which too often could miss important constraints by focusing on a restricted empirical basis (e.g., non-homophonous content words).

Citation: Dautriche I, Chemla E (2016) What Homophones Say about Words. PLoS ONE 11(9): e0162176. https://doi.org/10.1371/journal.pone.0162176

Editor: Philip Allen, University of Akron, UNITED STATES

Received: January 9, 2016; Accepted: August 18, 2016; Published: September 1, 2016

Copyright: © 2016 Dautriche, Chemla. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All files are available from the OSF database (url https://osf.io/u473e/).

Funding: ID was supported by a Graduate Fellowship from the Direction Générale de l’Armement (PhD program Frontières du Vivant). The research was supported by grants from the European Research Council under FP/2007-2013-ERC n°313610 to EC, and from the Agence Nationale de la Recherche ANR-10-IDEX-0001-02, ANR-10-LABX-0087 for the facilities provided by the Departement d'Etudes Cognitives. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Learning the word “cat” implies associating the sequence of sounds /kæt/ with the set of all cats and only cats. Quite generally, one description of the meaning of a content word is its “extension”, i.e., the set of all entities to which that word refers (an idea discussed in detail in the tradition of formal semantics at least since [1]). But language learners need to infer the extension of a word based on a set of exemplars that surely do not exhaust that extension. The underlying inference problem would be unsolvable without prior knowledge, most notably knowledge that constrains the hypothesis space, i.e., the set of potential meanings for words (e.g., [2]–[5]; and [6] for a formal proof).

One way in which learners may reduce their hypothesis space is by privileging some meanings over others. For instance, toddlers and preschoolers prefer to extend a novel word (e.g., assume “blicket” is first associated with a dog) to an object of the same kind (e.g., a cat) rather than to an object of a different kind (e.g., a bone) (e.g., [7], [8]; see also the “shape bias”, showing that infants extend a label on the basis of the shape, [9]).

This follows if learners assume that concepts that have words associated with them are convex (e.g., [10]). A concept is convex if its members share a common set of properties that makes them contiguous in conceptual space (e.g., [10], see also [11] for the idea of “conceptual coherence”). For instance, the category dog or bone is not a possible concept because it does not form a coherent class of objects. Thus, if concepts are expected to be convex and words label concepts, learners may more readily extend a word (e.g., “blicket” designating a dog) to neighboring objects in conceptual space (cat) than to more distant objects (bone).

Thus, current experimental results provide evidence that a convexity constraint guides learners' inferences about word meanings: if A and B can be labeled using the sound /bliket/, then all objects falling “in between” A and B in conceptual space can also be labeled with the sound /bliket/. But these experiments do not distinguish between words and word forms, hence it is unclear whether this constraint applies at the level of the words or at the level of the word forms.

Yet words and word forms can be dissociated. A homophone is a phonological form associated arbitrarily with several meanings (contrary to polysemy, see e.g., Bréal, 1904), which together form a discontinuous set in conceptual space. For instance, the English word form “bat” applies both to the convex concept of animal bats and to the convex concept of baseball bats, but, regardless of how the conceptual space is constructed, not all intervening objects sharing a common property of animal bats and baseball bats count as bats.

Although the domain of application of the convexity constraint, words or word forms, is rarely specified explicitly, all current models of word learning (e.g., [12], [13], [14]) implement, in practice, a convexity constraint at the level of word forms. It is a virtue of these word learning models that they work at the level of word forms, because this is the visible layer of the input: learners hear word forms, not words. But note that, as a consequence of this implementation, these accounts mechanically predict that when encountering a word form that applies to animal bats and baseball bats, English learners should conclude that “bat” applies to any intervening object, as do words that apply very broadly, such as “thing” or “stuff”. The very existence of homophony in human languages thus shows that learners do not adhere blindly to a convexity constraint on word forms. In sum, learners make inferences about the meaning of words based on the occurrence of some word forms across different situations. We will show how learners rely on the (hidden) word level of representation to master a lexicon and how they may capitalize on a proper convexity constraint at this level to learn homophones.

Concretely, our point of departure will be work by Xu and Tenenbaum (2007) [12]. One advantage of their study is that it not only implements the convexity constraint on word forms in a predictive model, but also provides the means to test it in a non-circular way. To do so, they first gathered similarity judgments between pairs of objects, and inferred a tree-structure over the whole set. This tree structure represents the taxonomy between the objects: different dogs are close together and form a subtree, mammals form a (bigger) subtree, etc. Such a hypothesis space reflects the taxonomic assumption [5] that requires words to label the nodes of a tree-structured hierarchy of natural concepts, in line with developmental data (e.g., [4], [5], [7], [8]). Crucially, Xu and Tenenbaum (2007) [12] used this structured conceptual space to test a model of word learning according to which the extension inferred for a given word label should be a set of objects with no gap in conceptual space and which minimally includes all exemplars. Thus, intervening objects, i.e., objects that are in between two learning exemplars, are defined as all objects in the smallest subtree that includes both exemplars (their convex hull). Accordingly, the authors demonstrate that, when exposed to a set of learning exemplars, participants extend the exemplars’ label to all intervening objects belonging to the smallest subtree that contains all these exemplars. For example, when presented with three Dalmatians as exemplars for a new word “fep”, adults readily extend “fep” to the set of all Dalmatians; were they presented with a Dalmatian, a Labrador and a German shepherd for the word “fep”, they would extend the label to the set of all dogs. In other words, participants pick the smallest generalization that satisfies the convexity constraint on word forms.
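The notion of a "smallest subtree containing all exemplars" can be made concrete with a short sketch. The taxonomy below is a hypothetical toy tree chosen for illustration (it is not the stimulus set of [12] or of the present study), and the function names are our own; leaves stand for kinds of objects.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "animal": "thing",
    "vegetable": "thing",
    "dog": "animal",
    "cat": "animal",
    "dalmatian": "dog",
    "labrador": "dog",
    "german_shepherd": "dog",
    "siamese": "cat",
    "carrot": "vegetable",
}


def ancestors(node: str) -> List[str]:
    """Return the path from node up to the root, node included."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def smallest_subtree_root(exemplars: List[str]) -> str:
    """Return the lowest node whose subtree contains every exemplar."""
    common: Set[str] = set(ancestors(exemplars[0]))
    for exemplar in exemplars[1:]:
        common &= set(ancestors(exemplar))
    # Walking up from the first exemplar, the first shared node is the lowest one.
    for node in ancestors(exemplars[0]):
        if node in common:
            return node
    raise ValueError("exemplars do not share a root")


def extension(exemplars: List[str]) -> Set[str]:
    """Return all leaves of the smallest subtree containing the exemplars."""
    root = smallest_subtree_root(exemplars)
    internal = {p for p in PARENT.values() if p is not None}
    leaves = [n for n in PARENT if n not in internal]
    return {leaf for leaf in leaves if root in ancestors(leaf)}
```

Under a convexity constraint on word forms, `extension(["dalmatian", "labrador"])` returns every kind of dog in the toy tree, and `extension(["dalmatian", "carrot"])` returns every leaf, which is the broad "thing"-like generalization that a homophone such as "bat" would wrongly receive.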

The present study explores the situations that lead language learners to postulate homophony for a new word using the word learning paradigm used by Xu and Tenenbaum (2007) [12]. In Experiment 1, we manipulate two factors that should invite learners to favor a homophone interpretation of a novel label:

The size of the gap, in conceptual space, that separates different learning exemplars of a given word. To learn a homophone, language learners are exposed to a discrete set of learning exemplars. For instance, for the word “bat”, they would observe several animal bats and several baseball bats. However, if the underlying true concept were the broad category that encompasses animal bats, baseball bats and all intervening objects (e.g., “thing”), then presumably learners would not observe exemplars confined to two corners of this set. Rather, they would observe a set of learning exemplars randomly (uniformly) sampled from the broad category. Observing exemplars clustered at two distant positions in the hypothesis space, i.e., observing a large gap between the exemplars, may boost the likelihood that the exemplars are sampled from two independent categories, favoring a homophone interpretation.
The intervention of other lexical items in that gap. Evidence for homophony may also come from other words in the lexicon. There has been much evidence that words and their underlying concepts mutually constrain each other. For instance, language learners assume that words do not overlap in meaning (the “mutual exclusivity effect”; e.g., [15]). Having evidence that an additional label points towards an intervening region of the conceptual space (e.g., between animal bats and baseball bats) may help learners discover more subtle configurations of how words map onto meanings.
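The sampling argument behind the first factor can be illustrated with a small calculation. The sketch below is one possible formalization, assuming that each exemplar is drawn uniformly and independently from the extension of the hypothesized meaning and ignoring any prior preference between hypotheses; the category sizes are hypothetical and the function names are our own, not a model reported in this study.

```python
def likelihood_single(n_exemplars: int, category_size: int) -> float:
    """P(exemplars | one broad category), each exemplar drawn uniformly from it."""
    return (1.0 / category_size) ** n_exemplars


def likelihood_homophone(n_exemplars: int, size_a: int, size_b: int) -> float:
    """P(exemplars | two separate categories), each exemplar drawn uniformly
    from the union of the two (disjoint) categories."""
    return (1.0 / (size_a + size_b)) ** n_exemplars


def likelihood_ratio(n_exemplars: int, size_a: int, size_b: int, broad_size: int) -> float:
    """How much more likely the exemplars are under homophony than under the
    single broad category that spans both clusters and the gap between them."""
    return likelihood_homophone(n_exemplars, size_a, size_b) / likelihood_single(
        n_exemplars, broad_size
    )


# Hypothetical sizes: two clusters of 5 objects each, inside a broad category of 100.
ratio_large_gap = likelihood_ratio(3, 5, 5, 100)
# Same clusters, but the broad category adds only 5 intervening objects.
ratio_small_gap = likelihood_ratio(3, 5, 5, 15)
```

In this toy setting the advantage of the homophone hypothesis grows with the number of objects in the gap, which matches the intuition that a larger gap between clustered exemplars makes a single broad category a more suspicious coincidence.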

Our results show that participants refrain from associating a label with a broad concept encompassing all the exemplars. Yet this does not entail that learners postulate homophony in these cases: learners could have accepted that a word maps onto a discontinuous concept (e.g., dog or bone), thereby violating concept convexity. We address this question more directly in Experiment 2. All in all, our results suggest that the effects documented in Experiment 1 are the footprints of homophony: learners prefer to associate a single word form with several words and associated convex concepts, thus preserving concept convexity at the expense of word form convexity. This shows that current accounts of word learning face new challenges when incorporating homophony into the picture and that homophony can reveal (some of) the existing constraints learners deploy while learning words.

Experiment 1: Gap in Conceptual Space and Overall Structure of the Lexicon

We used a word learning paradigm à la Xu and Tenenbaum (2007) [12]: participants were exposed to a new label through a few learning exemplars and asked whether the label should be extended to test items. We introduced a) a large gap in conceptual space between learning exemplars and b) an intervening exemplar with a different label in that gap. We predicted that these two manipulations would lead to a breaking point after which participants would violate a convexity constraint on word forms, i.e., exclude items in the gap from the extension of the label.

Method

Ethics Statement.

All research was approved by the Comité d'Ethique de la Recherche en Santé (2013/46). Following the committee's recommendations, prior to accepting to participate in the online studies, participants were presented with the informed consent document and instructions stating that by clicking “Agree” they indicated their consent to participate in the study.

Participants.

One hundred and five adults were recruited through Amazon’s Mechanical Turk (45 females; M = 33 years; 102 native speakers of English) and compensated $0.50 for their participation. We excluded participants who showed a lack of engagement in the task (criterion: participants who selected no test item in more than 50% of the “attractive” trials, in which at least 3 items should have been selected, see below; n = 0 in Experiment 1A, n = 16 in Experiment 1B) and participants who took part in both versions of the experiment or in a previous pilot version (n = 3 and 5). This resulted in 41 participants in Experiment 1A and 40 participants in Experiment 1B. Data collection was stopped when each experiment had at least 40 participants. The number of participants was established before data collection began.

Procedure and display.

Participants were tested online. They were instructed that they would be exposed to words from an alien language and would have to select images that correspond to those words. In the instructions, participants were shown an example of a trial with pictures and a label that would not appear during the test. In each trial, participants first saw 3 or 4 learning exemplars, presented as the combination of a picture and a sentence. The first three learning exemplars (referred to as le1, le2 and le3 below) were presented in random order and labeled with a novel word, e.g., blicket, via a prompt of the form “This is a blicket” underneath each of them. The fourth learning exemplar (leX below), if present, was labeled with another novel word highlighted in red, as in e.g., “This is a bosa” and was always the right-most exemplar. Once participants pressed a button “Show”, they would see a set of 4 pictures below the learning exemplars and be asked to select from these test items which one(s) could be labeled with the first novel word: “Do you see any other blicket(s)?” (see Fig 1A). They responded by clicking to select none, one or multiple test items. When a picture was selected, its frame became green. Participants could unselect their choice by clicking on it again. Once a response was validated, the set of selected pictures was recorded and the test continued to the next trial.

Fig 1.

A) Screenshots from Experiment 1A. Participants first see the 3 learning exemplars for the word “blicket” and one optional learning exemplar for the word “bosa”. After pressing the “show” button they then see the test pictures and are asked to find the other blickets. Once the pictures are selected (green frame), participants submit their answers by pressing the “done” button. B) Schema of the structure of a trial in conceptual space. The first row of pictures corresponds to the learning exemplars (le1, le2, le3, leX) and the second row to the test items. The intervening item leX appeared only in half of the test trials (hence the parentheses).

https://doi.org/10.1371/journal.pone.0162176.g001

Conditions.

Each participant saw 12 test trials and 10 filler trials.

Test trials. The structure of test trials is represented schematically in Fig 1B. The key factor is how the learning exemplars (le1, le2, le3 and optionally leX) were spread in conceptual space (here a tree-structure) and how the test items were distributed between them. As shown in Fig 1B, there were two gaps between the exemplars: one small gap between le1 and le2 and one much larger gap between le2 and le3. Test items were picked somewhere in the middle of the small gap (middle-small-gap), in the middle of the large gap (middle-large-gap), in the large gap but close to the corresponding exemplars (border-large-gap), or outside the region spanned by the exemplars altogether (out).

Six of the test trials, “Gap trials”, were designed solely to test the effect of the size of a gap between learning exemplars. They displayed three learning exemplars (le1, le2, le3) associated with a to-be-learned label. According to the convexity constraint on word forms, participants should select all test items in the minimal subtree containing all learning exemplars, but we expected that participants would be willing to violate this constraint and exclude middle-large-gap (or at least select it less often than middle-small-gap).

Another 6 test trials, “Gap+Intervention trials”, had a fourth learning exemplar with a secondary label (the leX bosa exemplar in Fig 1). The convexity constraint on word forms applies to single lexical entries and is in principle blind to the rest of the lexicon, but we expected that participants would select the middle-large-gap test item less in these trials with an intervening label than in the test trials without this intervening label.

Filler trials. One filler trial was presented first so that participants could familiarize themselves with the task (although it was not flagged as a practice trial). Nine other fillers were randomly interspersed between the test trials. Six “attractive” fillers were designed such that participants would select at least 3 of the 4 test pictures (3 of these filler trials contained three learning exemplars, all with the same label as in the Gap test trials, and 3 others included a fourth learning exemplar with a secondary label as in the Gap+Intervention test trials). Three “repulsive” fillers implemented the opposite bias: participants were expected to select one or no test picture.

Materials.

Our stimuli relied on a set of to-be-learned labels and taxonomically organized objects.

Labels. 28 phonotactically legal non-words of English were used for both experiments and were not repeated across trials.

Objects in conceptual space. We tested participants on two sets of objects organized into drastically different taxonomic hierarchies: natural objects, with a similarity measure based on phylogenetic trees (Experiment 1A), and artificial objects constructed in a parametric fashion, so that a similarity measure between these objects can be defined in a canonical way (Experiment 1B; Fig 2). Objects from this artificial taxonomy do not exist, so the actual lexicon of our participants cannot influence our experimental results.

Fig 2. Stimuli of Experiment 1B.

Examples of the artificial stimuli used in Experiment 1B, out of a set of 1024 possible unique combinations obtained from 5 parameters (core pattern, core pattern occurrences, size of the core pattern, number of radial lines, number of bumps in the radial lines) with 4 levels each.

https://doi.org/10.1371/journal.pone.0162176.g002
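The count of 1024 follows from 5 parameters with 4 levels each (4^5 = 1024). As a minimal sketch, the parametric space can be enumerated as below; the parameter identifiers and the integer coding of levels are our own illustrative choices, and the actual stimulus generation is described in S1 Supplemental Material.

```python
from itertools import product

# Illustrative identifiers for the five parameters named in the Fig 2 caption.
PARAMETERS = (
    "core_pattern",
    "core_pattern_occurrences",
    "core_pattern_size",
    "radial_lines",
    "radial_line_bumps",
)
LEVELS = range(4)  # four levels per parameter, coded 0..3 for illustration

# One dictionary per possible stimulus: every combination of parameter levels.
stimuli = [
    dict(zip(PARAMETERS, combo))
    for combo in product(LEVELS, repeat=len(PARAMETERS))
]

n_stimuli = len(stimuli)
```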

One important difference with Xu and Tenenbaum’s [12] paradigm is that our conceptual space did not rely on subjective, experimentally-gathered similarity judgments, but rather on objective similarity measures: one based on the distance in the phylogenetic tree and the other based on the parameterization of the objects. Surely these measures are only a proxy for participants’ representation of the similarity relationships between the objects. Yet, any effect that can be detected from these imperfect objective measures will retrospectively validate them as good approximations of the underlying subjective measure. We describe the two sets of objects at the basis of Experiments 1A and 1B, their structure, and how our experimental conditions were obtained in each case in S1 Supplemental Material. The experimental material for both experiments is available at https://osf.io/u473e/?view_only=33576a1ac18746b08d7e3fcc96e10e9a

Presentation and trial generation.

The order of the trials as well as the pairing between the labels and the set of learning exemplars was fully randomized and differed for each participant. All trials were generated automatically following the algorithmic constraints described in S1 Supplemental Material for each stimulus type.

Statistical analysis.

In a mixed logit regression [16], we modeled the selection of a test item (coded as 0 or 1) for each experiment (natural or artificial stimuli). Both models included two categorical predictors with their interaction: Test Item (middle-small-gap, border-large-gap, middle-large-gap, out) and Trial Type (Gap vs. Gap+Intervention) as well as a random intercept and random slopes for both Test Item and Trial Type for participants. We coded our predictors such that selection of middle-large-gap for Gap trials served as a baseline (unless otherwise mentioned) against which we compared a) responses to the other test items, b) the responses to middle-large-gap in Gap+Intervention trials.

All analyses were conducted using the lme4 package [17] of R.
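For readers who prefer an explicit formula, the model described above can be written out as follows. This is a sketch under the assumption of treatment (dummy) coding with middle-large-gap in Gap trials as the reference cell; the symbols are our own notation and do not appear in the original analysis scripts.

```latex
\operatorname{logit} P(y_{ij} = 1)
  = (\beta_0 + u_{0j})
  + \sum_{k=1}^{3} (\beta_k + u_{kj})\, T_{k,ij}
  + (\gamma + v_j)\, G_{ij}
  + \sum_{k=1}^{3} \delta_k\, T_{k,ij}\, G_{ij}
```

Here y_{ij} codes whether participant j selected the test item on observation i, T_{k,ij} are indicators for the three non-baseline test items (middle-small-gap, border-large-gap, out), G_{ij} indicates a Gap+Intervention trial, and u_{0j}, u_{kj}, v_j are the by-participant random intercept and slopes. Under this coding, γ is the effect of the intervening label on middle-large-gap and the δ_k are the interaction terms. In lme4 syntax, with hypothetical column names, such a model would correspond to a formula of the form `selected ~ TestItem * TrialType + (1 + TestItem + TrialType | participant)` fitted with `glmer(..., family = binomial)`.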

Results

Fig 3 reports the average proportion of selection of each test item by Trial Type (Gap vs. Gap+Intervention trials) and Experiment (1A or 1B).

Fig 3. Results of Experiment 1.

Proportion of choice of each test item averaged across Experiment 1A with natural objects (upper panel) and Experiment 1B with artificial objects (lower panel) for each trial type (Gap vs. Gap+Intervention trials). The x-axis follows (with some simplification) the structure in conceptual space: the position of the learning exemplars is indicated among the bars for the test items with the dashed lines. Error bars indicate standard errors of the mean.

https://doi.org/10.1371/journal.pone.0162176.g003

For Gap trials (Fig 3Aa and 3Ab), we replicate the minimal category effect seen in previous results (e.g., [12]), showing that participants are more likely to select a test item belonging to the category which is minimally consistent with the exemplars (middle-small-gap, border-large-gap, middle-large-gap) than a test item outside of this category (out), both for Experiment 1A (β = -3.75, z = -12.20, p < .001) and Experiment 1B (β = -5.17, z = -11.46, p < .001; we Helmert-coded the predictor Test Item to compare the choice of out to the choice of the rest of the test items as a group). Crucially, the size of the gap between learning exemplars modulated participants’ responses. That is, participants selected middle-small-gap items more than middle-large-gap items both in Experiment 1A (M_{middle-large-gap} = 0.59, M_{middle-small-gap} = 0.98; β = 3.72, z = 7.53, p < .001) and in Experiment 1B (M_{middle-large-gap} = 0.44, M_{middle-small-gap} = 0.94; β = 3.45, z = 10.38, p < .001). Participants were thus sensitive to the distribution of the learning exemplars not only with natural stimuli but also with unfamiliar stimuli. This latter case shows that familiarity with the categories (e.g., mammals, carnivores, animals) and possible existing labels for them cannot fully explain the results.

For Gap+Intervention trials (Fig 3Ba and 3Bb), we first replicate the effect described above: participants were sensitive to the size of the gap between the exemplars, that is, they selected middle-small-gap more than middle-large-gap in Experiment 1A (M_{middle-large-gap} = 0.43, M_{middle-small-gap} = 0.97; β = 3.95, z = 9.75, p < .001) and in Experiment 1B (M_{middle-large-gap} = 0.32, M_{middle-small-gap} = 0.82; β = 2.70, z = 10.32, p < .001). Crucially, we expected that the presence of an intervening item would increase participants’ violation of a convexity constraint on word forms.

Indeed, in Experiment 1A, participants selected middle-large-gap less in Gap+Intervention trials than in Gap trials (β = -0.72, z = -3.80, p < .001). Yet, the presence of an intervening lexical item did not affect the choice of any other test items (all ps > 0.4) leading to an interaction effect: the difference between the selection rate of middle-small-gap and middle-large-gap was greater in Gap+Intervention trials than in Gap trials (β = 0.68, z = 2.51, p < .01).

In Experiment 1B, participants similarly selected middle-large-gap less in Gap+Intervention trials than in Gap trials (β = -0.61, z = -2.40, p < .05). But we should pause and note that the same was true for middle-small-gap items (β = -1.40, z = -3.96, p < .001; here the intercept reflected selection of middle-small-gap in Gap trials). This was because the intervening exemplar leX was sometimes close to middle-small-gap (and even closer than it was to middle-large-gap), thus introducing an independent reason not to select middle-small-gap in these intervention trials.

Overall, we did observe that intervening labels block the extension of a word to the minimal category including all observed exemplars, even though this effect was polluted for artificial stimuli.

Discussion

We highlighted two factors that disturb the association of a word form to the single category that minimally includes all its learning exemplars: a) the size of the gap between the exemplars; b) the presence of intervening lexical items. There may be three potential interpretations for these results:

Participants associated a label with two meanings that each satisfy concept convexity. That is, participants postulated homophony, a non-immediate way to bind labels and concepts. Note, however, that we did not test whether participants generalized from the more distant learning exemplar, le3, a point that will be addressed in the next experiment.
Participants associated a label with a set covering entities from several disjoint concepts (e.g., as in dog or bone), thus breaking concept convexity, either because meaning discontinuity is acceptable or because the specific experimental task that we used led them to do so.
Participants did not associate the new word with a meaning at all. Instead, they simply went by similarity of the test items to the learning exemplars: they selected objects close to the exemplars (middle-small-gap) more often than objects further away from them (middle-large-gap). The role of the intervening label may be harder to account for in this view, but one may imagine some strategic effect such that if an object is close to some irrelevant object X, it will decrease the tendency to say that this object belongs to a set that was not said to contain X.

Experiment 2 was designed to distinguish between these three interpretations.

Experiment 2: Linguistic Manipulations

Homophones interact with linguistic constructions in a characteristic way. Zeugmas are the typical rhetorical device used to pun on the different senses of ambiguous words (e.g., [18], [19]) and have been extensively used as a test to distinguish words with an extension that covers a broad category from polysemous and ambiguous words (e.g., [18], [20]). Consider for instance “John and his driving license expired last Thursday” [18], where the verb “expire” has two distinct, but related, senses (i.e., “died” and “not valid anymore”). If the zeugmatic sentence is acceptable, it shows that the relevant word is polysemous or ambiguous (the two meanings are distinct) rather than vague (the boundaries between meanings are indistinct).

Interestingly, zeugmas can be used to distinguish between a homophone, where a label applies to two convex concepts, and a word associated with a disjunctive meaning, where a label would apply to a disjoint concept. For instance, if “blicket” maps onto a disjunctive concept, such as dog or bone, it should be possible to use a plural sentence “these are two blickets” when pointing to a dog and a bone, while it would be zeugmatic to say “these are two bats”, pointing at one animal-bat and one baseball-bat. This is explained in a theory of homophones in which two words, with different meanings, share the same form: one cannot use a single phonological form to refer to both meanings at the same time. However, different tokens of the phonological form may pick out different meanings: it may therefore be more natural to say in a situation as above “This is a bat (pointing at the animal-bat), this is also a bat (pointing at the baseball-bat)”.

We will use these two constructions to test whether the effects we documented in the experiments above are the signatures of homophony. If participants postulated homophony, the plural zeugmatic construction, which is not compatible with homophony, should increase the tendency to form a single convex category encompassing all learning exemplars (as dictated by the convexity constraint over word forms in the absence of homophony), compared to the also construction. This would be evidence that participants did not postulate that a label could map onto a discontinuous concept and that our effects are not solely driven by similarity, since the similarity of the test items to the exemplars is held constant across the two linguistic constructions.