Is “Huh?” a Universal Word? Conversational Infrastructure and the Convergent Evolution of Linguistic Items

A word like Huh? (used as a repair initiator when, for example, one has not clearly heard what someone just said) is found in roughly the same form and function in spoken languages across the globe. We investigate it in naturally occurring conversations in ten languages and present evidence and arguments for two distinct claims: that Huh? is universal, and that it is a word. In support of the first claim, we show that the similarities in form and function of this interjection across languages are much greater than expected by chance. In support of the second claim, we show that it is a lexical, conventionalised form that has to be learnt, unlike grunts or emotional cries. We discuss possible reasons for the cross-linguistic similarity and propose an account in terms of convergent evolution. Huh? is a universal word not because it is innate but because it is shaped by selective pressures in an interactional environment that all languages share: that of other-initiated repair. Our proposal enhances evolutionary models of language change by suggesting that conversational infrastructure can drive the convergent cultural evolution of linguistic items.

In order to analyse these judgements, we combined the annotations into cumulative graded measures using the values in Table A. These measures were computed as follows, taking the Intonation category as an example. Annotators coded for three basic intonation contours: Rising (r), Level (l), and Falling (f). Every coding judgement was counted as a single vote upwards (for Rising), downwards (for Falling) or neutral (for Level). Three judgements of a token as Rising would thus amount to a value of +3. Two judgements of a token as Falling and one as Level would amount to a value of -2. Variance in judgements was allowed because phonetic realisations are always gradient and partly observer-dependent.
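The vote summation described above can be illustrated with a minimal sketch. The function name and the dictionary of vote values are illustrative assumptions, not part of the original materials; the values (+1, 0, -1) follow the verbal description of the Intonation category, not Table A itself, whose contents are not reproduced here.

```python
# Assumed vote values for the Intonation category, following the prose
# description: Rising counts upwards, Falling downwards, Level is neutral.
INTONATION_VOTES = {"r": 1, "l": 0, "f": -1}


def cumulative_score(judgements, votes=INTONATION_VOTES):
    """Sum one vote per coder judgement into a single graded score."""
    return sum(votes[code] for code in judgements)


# The two worked examples from the text.
three_rising = cumulative_score(["r", "r", "r"])        # +3
two_falling_one_level = cumulative_score(["f", "f", "l"])  # -2
```

Because each coder contributes one vote, a token coded by three annotators receives a score between -3 and +3, and disagreement among coders shows up as an intermediate value instead of being discarded.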
The combined scores allow us to visually present the gradient variation in product plots [1]. In product plots, the area of squares is proportional to token count: larger squares mean more tokens. Thus Figure 5 in the main text shows that most tokens in most languages have rising intonation, virtually none have level intonation, and the tokens of two languages (Cha'palaa and Icelandic) are falling. Similarly, Figure A below shows that most tokens in most languages show at least some degree of nasalisation. The procedure of independent coding and subsequent summation gives us graded analytic judgements that are better empirically grounded than simple descriptions in an arbitrary symbol system (e.g. the International Phonetic Alphabet).
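The sizing rule for product plots (area proportional to token count) can be sketched as follows. This is only an illustration of the proportionality, not the plotting code used for the figures; the function name, the `max_side` parameter, and the example scores are hypothetical.

```python
import math
from collections import Counter


def square_sides(scores, max_side=1.0):
    """Map each score to a square side length so that the square's
    area is proportional to the number of tokens with that score."""
    counts = Counter(scores)
    if not counts:
        return {}
    largest = max(counts.values())
    # Side length scales with the square root of the count, so that
    # area (side squared) scales linearly with the count.
    return {score: max_side * math.sqrt(n / largest)
            for score, n in counts.items()}


# Hypothetical scores for one language (not data from the study).
sides = square_sides([3, 3, 3, 3, 2, -1])
```

In this made-up example the score +3 occurs four times and the other two scores once each, so the square for +3 has four times the area, and therefore twice the side length, of the other squares.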

2 Audio files
The 196 audio files are available upon request from the first author.

3 Intonation
Figure 6 in the paper displays pitch tracks for interjections in Spanish (mostly rising) and Cha'palaa (mostly falling). It shows that four of the Spanish contours did not have a complete final rise. In these cases the contours rise and then fall briefly to a mid tone. We inspected these contours and found that the final fall occurred in the final parts of the signal, where intensity was very weak. This final fall is not very salient, and one may wonder whether the Spanish speakers targeted a final mid tone in these cases, as the figure might suggest, or whether they instead targeted and achieved an audible high boundary tone, and then inadvertently produced a mid tone as the glottis adopted its default non-speech configuration during the last portion of the interjection. Some Cha'palaa contours displayed a rather flat pattern. Inspection of these examples revealed that their final portions had creaky voice, which contributes to the auditory impression of a low boundary tone. Pitch could not be measured in these portions, which explains why they appear as truncated flat contours in normalised time.

Figure 7 in the paper displays onsets by language. The coding guidelines focused on any hearable onset, so the auditory judgements displayed in this figure do not provide information on the strength of articulation of the onset. However, we found that the interjection tokens differed quite markedly in this regard. Many Icelandic tokens have a strong breathy /h/ (e.g. Icelandic_01, Icelandic_13, Icelandic_14, Icelandic_18), whereas the Spanish tokens that were marked as having this same sound were much more subtle (e.g. Spanish_08, Spanish_15). The wide distribution of Spanish tokens over the space reflects the fact that the Spanish data consist of lab-quality recordings [2] in which even slight glottal constriction or frication can be detected easily.

5 Nasality and mouth aperture
Some degree of nasality was perceived in the great majority of cases (Figure A).

6 "Other" values
The coding protocol for vowels and consonants explicitly allowed observations outside the range found in preliminary observations. However, this possibility was used for only 3 out of 196 tokens: Chapalaa_21 (described by one coder as "other: bit higher and bit more central", but as /a/ by the other two coders), Siwu_18 (described by one coder as "other: very back from a" but as /a/ by the others), and Dutch_06 (a token in overlap with other speech, described as "mid central" and "can't tell" by the others).