
Agreement on emotion labels' frequency in eight Spanish linguistic areas

  • Ana R. Delgado,

    Roles Conceptualization, Data curation, Methodology, Writing – original draft, Writing – review & editing

    adelgado@usal.es

    Affiliation Facultad de Psicología, Universidad de Salamanca, Salamanca, Spain

  • Gerardo Prieto,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Facultad de Psicología, Universidad de Salamanca, Salamanca, Spain

  • Debora I. Burin

    Roles Conceptualization, Data curation, Writing – original draft, Writing – review & editing

    Affiliation Facultad de Psicología, Universidad de Buenos Aires-CONICET, Buenos Aires, Argentina

Abstract

Various traditions have investigated the relationship between emotion and language. For the basic emotions view, emotional prototypes are lexically sedimented in language, evidenced in cultural convergence in emotional recognition and expression tasks. For constructionist theories, conceptual knowledge supported by language is at the core of emotions. Understanding emotion words is embedded in various interrelated constructs such as emotional intelligence, emotion knowledge or emotion differentiation, and is related to, but different from, general vocabulary. A clear advantage of Emotion Vocabulary over most emotion-related constructs is that it can be measured objectively. In two successive corpus-based studies, we tested the predictions of concordance and absolute agreement on the frequency of use of a total of 100 Spanish emotion labels in the eight main Spanish-speaking areas: Spain, Mexico-Central America, River Plate, Continental Caribbean, Andean, Antilles, Chilean, and the United States. In both studies, the intraclass correlation coefficient was statistically different from the null and very large, over .95, as was Kendall's coefficient of concordance, indicating broad consensus among the Spanish linguistic areas. From an applied perspective, our results provide supporting evidence for the similarity in frequency of the 100 emotion labels, and therefore for the cross-cultural generalizability of their familiarity when they are used as item stems or as experimental stimuli without going through a process of additional adaptation. On a broader scope, these results add evidence on the role of language for emotion theories. In this regard, the countries and regions compared here share the same Spanish language, but differ in several aspects of history, culture, and socio-economic structure.

Introduction

The traditional view of emotions posits that they are basic, universal, phylogenetically shaped processes that are engrained in human biological functioning, and thus organize cognitive, experiential, and behavioural reactions to changes in the environment [1]. Emotions encompass physiology, actions, facial, vocal, and postural expression, and cognitive processes, and have both a rapid response and a social interaction function. Emotional episodes and experiences would conform to universal prototypes, with cultural variations but within a general categorical similarity [2]. Emotional prototypes would be lexically sedimented in language, evidenced in cultural convergence in emotional recognition and expression tasks employing emotional words alone, or in short verbal statements [3, 4]. Although these tasks have been criticized because languages vary in words that refer to specific emotions, and some supposed basic emotions do not have a specific word in some cultures [5], studies on emotional recognition and labelling often find agreement.

A different view of emotions, the constructionist perspective, proposes that conceptual knowledge supported by language is at the core of emotions [5–7]. The summary representation of any emotion category is an abstraction, not a denomination of a natural object such as a body or brain state. Interoceptive sensations, experienced as lower dimensional feelings of affect (valence and arousal), are assumed to be in continuous interpretation, along with other sensory and motor inputs and outputs, by a predictive brain that implements conceptual categories in its internal model to give them meaning [7]. The brain uses emotion concepts to categorize sensations, and to dynamically construct various instances of emotion in specific situations. Socio-cultural mechanisms, especially language, are responsible for organizing and differentiating emotional experiences [6–9]. Language is the “glue” that helps to link bodily states, perceptions of muscular movements in the face and body, and other sensory and motor experiences as instances of a particular emotion concept. For example, emotion words and their associated semantic knowledge have been shown to determine how facial configurations are predicted, encoded, and remembered as emotional expressions [10].

Emotion vocabulary grows in childhood and adolescence, with individual and group differences [11–14]. Understanding of the emotion vocabulary is embedded in various interrelated constructs such as emotional intelligence [15, 16], emotion knowledge [17], or emotion differentiation [18]. Understanding emotion words is an integral part of emotional intelligence, and is related to, but different from, general vocabulary [15]. Being able to distinguish between affective experiences, and to label negative ones, is associated with several indices of mental health in adulthood, and with more adaptive emotion regulation [18]. For instance, in a study using the experience sampling technique, in which participants reported their emotional experience several times a day, patients with social anxiety disorder had less differentiated negative emotions than controls, controlling for intensity and comorbidity, suggesting an association between the anxiety disorder and understanding emotions at a given moment in daily life [19]. For people who experience a traumatic or negative episode, talking and writing about their emotions acts as a buffer for mental health [20].

The relevance of emotion vocabulary in emotion theory and in individual differences has led to various measurement instruments (e.g., for adults, vignette-based: MSCEIT [16], STEU [15]; or word definition: GEMOK-Features [21]). In Spanish, the Emotion Vocabulary Test (EVT) was recently developed [22, 23]. Each of the 40 multiple-choice items of the EVT is composed of a Spanish emotion label (the item stem) and five response options corresponding to the five broad emotion "families" of happiness, sadness, anger, fear, and disgust.

In principle, the use of these 40 Spanish emotion words (the EVT item stems) as psychometric or experimental stimuli in other Spanish-speaking zones would require an adaptation procedure. Under the unitary umbrella of construct validity, content validation strategies are appropriate when the boundaries of a domain can be described [24], as is the case with emotion vocabulary. It is common to have experts adapt test content to other languages or cultures. Here, we propose a less subjective procedure based on a corpus approach. Linguistic corpus analyses have been employed for studying language and cultural comparisons [25] and have gained traction in this century due to the vast amount of linguistic information online and the availability of computerized and big data analytic tools [26, 27]. For example, Westbury et al. [28] calculated the co-occurrence, in a corpus of unselected text from USENET discussion groups, of emotion words taken from basic emotion models.

In the present case, we have focused on the lexical level and on the CORPES XXI corpus [29]. Spanish is the second most spoken mother tongue, with 460 million native speakers in 31 countries [30]. There are eight main Spanish-speaking areas: Spain, Mexico-Central America, River Plate, Continental Caribbean, Andean, Antilles, Chilean, and USA; they are all represented in the Spanish Corpus of the Royal Academy, CORPES XXI, with about 300 million forms from oral texts (10%) and written texts (40% from books, 40% from periodicals, 7.5% from internet material, and 2.5% miscellaneous). Of the texts, 30% are from Spain [29]. Note that the absolute frequency of a word in one linguistic area should not be compared with the absolute frequency of that word in another area because the areas are not equally represented in the corpus. This is why CORPES XXI also offers the possibility of obtaining normalized frequencies per million words (fpmw), i.e., relative frequencies in each area multiplied by one million.
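As an illustration of this normalization, the following minimal Python sketch computes a frequency per million forms from a raw count and a sub-corpus size. The function name and all numbers are hypothetical and are not taken from CORPES XXI; the point is only that very different raw counts can correspond to the same normalized frequency.

```python
def frequency_per_million(count: int, subcorpus_size: int) -> float:
    """Relative frequency of a word in one area, scaled to one million forms."""
    if subcorpus_size <= 0:
        raise ValueError("subcorpus_size must be positive")
    return count / subcorpus_size * 1_000_000


# Hypothetical raw counts and sub-corpus sizes, for illustration only.
area_a = frequency_per_million(count=3_200, subcorpus_size=96_000_000)
area_b = frequency_per_million(count=100, subcorpus_size=3_000_000)

# The raw counts differ by a factor of 32, yet the normalized values coincide.
print(round(area_a, 2), round(area_b, 2))
```

Comparing `area_a` with `area_b`, rather than 3,200 with 100, is what makes the areas comparable despite their unequal representation.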

Initial steps in corpora analyses generate frequency lists, to map out and compare word frequency across either an entire corpus or across particular sub-sets (sub-corpora). Although this would not constitute a deep semantic analysis, it is a first step in comparing the lexical structure, and possible lexical (and cultural) differences. A positive answer to the question "Is there consensus in frequency for the 40 Spanish emotion labels (EVT stems) comparing the eight Spanish speaking linguistic areas?" would provide supporting evidence for the use of the EVT item stems as psychometric or experimental stimuli in any Spanish-speaking area before additional adaptation. In other research settings, it could provide a set of emotion labels with similar frequency across Spanish speaking countries. Frequency is one of the main factors affecting several experimental psycholinguistic outcomes, such as lexical decision, word naming, language comprehension, and memory recall and recognition [31, 32].

On a broader scope, it would add evidence regarding the role of language for emotion theories. In this regard, countries and regions compared here share the same Spanish language, but differ in several aspects in history, culture, and socio-economic structure. Although frequency does not reflect semantic meaning, it is one of the basic dimensions of a lexicon, indicating ease of access; its effects reflect in part semantic activation, given that lexical access is mediated by the number of contexts in which a word tends to occur rather than pure repetition of occurrence [32].

A second study, if consensus results were replicated, would help to content-validate new items/stimuli as well as to reinforce the conclusions of the first study.

Thus, two successive corpus-based studies (CORPES XXI [29]) were carried out to test the predictions of concordance and absolute agreement on the frequency of use of a total of 100 Spanish emotion words, 40 emotion labels from the EVT (Study 1) and 60 new emotion labels (Study 2), in the eight main Spanish-speaking areas (Spain, Mexico-Central America, River Plate, Continental Caribbean, Andean, Antilles, Chilean, and the United States).

Materials and methods

The geographical distribution of the forms in CORPES XXI v. 0.91 for the eight main areas was: Spain (32%), Mexico-Central America (19%), River Plate (14%), Continental Caribbean (12%), Andean (8%), Antilles (7%), Chilean (6%), USA (1%). We did not take into account areas whose representation was under 0.5% (Guinea and the Philippines).

A simple way of testing concordance among areas regarding the order of emotion words is the Kendall Coefficient of Concordance (W). The emotion words can be thought of as the "objects" and the areas as the "judges" that rank them; the difference is that the ranks are now less subjective, because they come from word frequency rather than from expert judgement. The W statistic does not require the assumption of quantitative scaling. Treating fpmw as quantitative, we can also assess absolute agreement by means of an Intra-class Correlation Coefficient (ICC), a measure of the proportion of variance that can be attributed to the measurement objects [33].
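The ranking logic behind W can be sketched in a few lines of Python. This is an illustrative implementation of the textbook formula W = 12S / (k²(n³ − n)), where S is the sum of squared deviations of the rank sums from their mean, k is the number of "judges" and n the number of "objects". It uses average ranks for ties and applies no tie correction. It is not the code used in the study, and the function names and toy data are assumptions.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kendall_w(scores):
    """scores[j][i] is the fpmw of word i in area j; returns Kendall's W."""
    k = len(scores)      # number of "judges" (areas)
    n = len(scores[0])   # number of "objects" (words)
    rank_rows = [average_ranks(row) for row in scores]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = k * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (k ** 2 * (n ** 3 - n))


# Toy data (hypothetical): three areas that order four words identically.
toy = [
    [10.0, 20.0, 30.0, 40.0],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]
print(kendall_w(toy))
```

Because only the ranks enter the computation, the three toy areas agree perfectly even though their frequency values differ, which is why W is described as not requiring quantitative scaling.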

There are various kinds of ICC, depending on the answers to three questions: Do the same "judges" score every "object"? Are the "judges" a sample or a population? Is the reliability of interest that of a single "judge" or of their average? For our data (i.e., the "judges" are the 8 main Spanish linguistic areas and the "objects" are the 40 words (Study 1) or 60 words (Study 2), each "receiving" an fpmw), the ICC kinds would correspond to the following models:

  • One-way random effects: each word fpmw is given in different areas that are sampled from a larger pool of potential areas that are treated as random effects.
  • Two-way random effects: all word fpmw are calculated in all areas; both factors–areas and words–are random effects. It is a consistency coefficient (C-type ICC).
  • Two-way mixed effects: areas are considered as fixed effects but words are treated as random effects. It is an absolute agreement coefficient (A-type ICC).

They are called ICC(1), ICC(C,1), and ICC(A,1) respectively when the unit of analysis is the individual, and ICC(k), ICC(C,k), and ICC(A,k) when it is an average (of k "judges"). Because our objective was to test the hypothesis of consensus regarding the frequency of use of the emotion labels among the 8 linguistic areas, finding ordinal concordance would constitute soft evidence. A large-sized absolute agreement ICC value, over .90, would be considered as strong evidence to corroborate our hypothesis.
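To make the difference between the consistency and absolute agreement coefficients concrete, the sketch below computes the two-way ICCs from the mean squares of a words-by-areas table, following the formulas given by McGraw and Wong [33]: with MSR, MSC and MSE the mean squares for rows (words), columns (areas) and error, ICC(C,1) = (MSR − MSE) / (MSR + (k − 1)MSE) and ICC(A,1) = (MSR − MSE) / (MSR + (k − 1)MSE + k(MSC − MSE)/n). This is an illustration in plain Python, not the study's own analysis; the function name and the toy table are assumptions.

```python
def two_way_icc(table):
    """table[i][j] is the score of object i (word) given by judge j (area)."""
    n = len(table)       # number of objects (rows)
    k = len(table[0])    # number of judges (columns)
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares, one observation per cell.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    return {
        "ICC(C,1)": (msr - mse) / (msr + (k - 1) * mse),
        "ICC(C,k)": (msr - mse) / msr,
        "ICC(A,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC(A,k)": (msr - mse) / (msr + (msc - mse) / n),
    }


# Toy table (hypothetical): the second judge always scores 2 points higher.
toy = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 6.0]]
for name, value in two_way_icc(toy).items():
    print(name, round(value, 3))
```

In the toy table the two judges order and space the objects identically, so the consistency coefficients equal 1, while the constant offset between judges lowers the absolute agreement coefficients. This is why a large absolute agreement value is the more demanding criterion.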

The Kendall coefficient of concordance (W) and the ICC(A,8) for absolute agreement (Spanish linguistic areas are a fixed-effect factor) were calculated by means of the "irr" package [35] for R [34] in the RStudio environment [36]. In addition to these two statistics, and just for comparison purposes, we report results for the remaining ICC two-way models.

Study 1

Materials and procedure

CORPES XXI normalized frequencies per million for the 40 Spanish emotion labels (the stems of the 40 multiple-choice items of the EVT) in each of the 8 main linguistic areas were retrieved on December 5, 2018. They are shown in Table 1, where both words and areas are in alphabetical order.

Results

The Kendall coefficient of concordance was statistically different from the null, W = .960, Chi-squared(39) = 300, p < .001, and very large-sized, as was the ICC (A,8) = 0.995 [F(39,231) = 226, p < .001, 95% CI: 0.993 < ICC < 0.997] indicating absolute agreement, i.e., broad consensus among the eight Spanish linguistic areas.

Different assumptions regarding the various ICC kinds would not change this conclusion, as can be seen from Table 2. The 95% confidence intervals make clear that they all are well over the .90 that we consider would show strong evidence of consensus among areas for the frequency of use of the 40 EVT stem words.

Table 2. Intra-class correlation coefficient two-way models (40 Words, 8 Areas).

https://doi.org/10.1371/journal.pone.0237722.t002

Study 2

Materials and procedure

A list of another 60 Spanish emotion labels was made by looking for synonyms of the EVT stems as well as words from the Spanish semantic field of the empirically-derived English emotion labels [37, 38]. Note that, in any language, the number of emotion labels, as opposed to the number of emotion-laden words [39], is very limited. On March 23, 2019, CORPES XXI normalized frequencies per million for these 60 Spanish emotion words in each of the 8 main linguistic areas were retrieved (Table 3, in alphabetical order).

Results

The Kendall coefficient of concordance was statistically different from the null, W = .963, Chi-squared (59) = 454, p < .001, and very large-sized, as was the ICC (A,8) = 0.996 [F(59,420) = 285, p < .001, 95% CI: 0.995 < ICC < 0.998] indicating absolute agreement, i.e., broad consensus among the eight Spanish linguistic areas.
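The chi-squared values reported alongside W in both studies are consistent with the usual large-sample approximation χ² = k(n − 1)W, with n − 1 degrees of freedom, where k is the number of areas and n the number of words. The short Python check below applies that formula to the rounded W values given in the text; the function name is an assumption, and small discrepancies are expected because W is reported to three decimals only.

```python
def chi_squared_from_w(w: float, n_judges: int, n_objects: int) -> float:
    """Large-sample chi-squared statistic for Kendall's W (df = n_objects - 1)."""
    return n_judges * (n_objects - 1) * w


# Rounded W values as reported in the text; 8 areas in both studies.
study1 = chi_squared_from_w(0.960, n_judges=8, n_objects=40)  # reported: 300, df = 39
study2 = chi_squared_from_w(0.963, n_judges=8, n_objects=60)  # reported: 454, df = 59
print(round(study1, 2), round(study2, 2))
```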

Different assumptions regarding the various ICC kinds would not change this conclusion, as can be seen from Table 4. As in Study 1, the 95% confidence intervals show that they all are over the .90 that we consider would show strong evidence of consensus among areas for the frequency of use of the 60 emotion labels.

Table 4. Intra-class correlation coefficient two-way models (60 Words, 8 Areas).

https://doi.org/10.1371/journal.pone.0237722.t004

Discussion

This study employed a linguistic corpus analysis approach to compare the relative frequency of emotion labels in the eight main Spanish-speaking areas (Spain, Mexico-Central America, River Plate, Continental Caribbean, Andean, Antilles, Chilean, and USA) as provided by the CORPES XXI normalized frequencies [29]. We found very high levels of agreement among areas for the frequency of use of the 40 EVT stem words. The reference corpus is the biggest in Spanish, has high representativeness and balance [26], and includes oral transcriptions from the 21st century [29], so this result is a first step towards establishing lexical agreement on these words across these regions.

Our results constitute a first step in the validation of the EVT for use in any of the Spanish-speaking regions, allowing for a further semantic adaptation process. In a measure of vocabulary knowledge, word frequency is one of the main factors in item difficulty [40–42]. These results suggest an agreement in frequency, and thus difficulty, for the five broad emotion "families" of happiness, sadness, anger, fear, and disgust and their associated 40 items presented in the test. However, in multiple-choice formats, semantic similarity between the correct answer and distractors, as well as distractor word frequency and other properties, are also relevant for item difficulty. As a test, the EVT might need finer tuning. Future corpus studies can examine lexical associations between item words, within and between Spanish-speaking regions (e.g. [28]), or compare those results with different participant samples.

Frequency is one of the main factors affecting several psycholinguistic and memory tasks [31, 32]. Our results also provide other experimental researchers with a set of items calibrated for frequency in most Spanish speaking countries.

From a theoretical perspective, these results, together with those from the replication study, would suggest that people speaking a particular language, although in different countries (thus differing in some cultural aspects), share lexical properties of emotion words. Empirical examination of frequency effects shows that they reflect in part semantic activation, given that lexical access is mediated by the number of contexts in which a word tends to occur rather than pure repetition of occurrence [32]. Thus, these similarities in frequency would tend to agree with the view that emotions constitute basic prototypes [1–4]. Further investigation of empirical semantic judgments in different Spanish-speaking countries could evaluate whether there are, in effect, basic semantic similarities and/or particular nuances in the meaning of emotional vocabulary.

References

  1. Tracy JL, Randles D. Four models of basic emotions: A review of Ekman and Cordaro, Izard, Levenson, and Panksepp and Watt. Emot Rev. 2011; 3: 397–405.
  2. Keltner D, Sauter D, Tracy J, Cowen A. Emotional Expression: Advances in Basic Emotion Theory. J Nonverbal Behav. 2019; 43: 133–160. pmid:31395997
  3. Cordaro DT, Sun R, Keltner D, Kamble S, Huddar N, McNeil G. Universals and cultural variations in 22 emotional expressions across five cultures. Emotion. 2018; 18: 75–93. pmid:28604039
  4. Elfenbein HA, Ambady N. On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychol Bull. 2002; 128: 203–235. pmid:11931516
  5. Russell JA. Culture and the categorization of emotion. Psychol Bull. 1991; 110: 426–450. pmid:1758918
  6. Barrett LF. Emotions are real. Emotion. 2012; 12: 413–429. pmid:22642358
  7. Barrett LF. The theory of constructed emotion: an active inference account of interoception and categorization. Soc Cogn Affect Neurosci. 2017; 12: 1–23. pmid:27798257
  8. Lindquist KA, Satpute AB, Gendron M. Does language do more than communicate emotion? Curr Dir Psychol Sci. 2015; 24: 99–108. pmid:25983400
  9. Doyle CM, Lindquist KA. When a word is worth a thousand pictures: language shapes perceptual memory for emotion. J Exp Psychol Gen. 2018; 147: 62–73. pmid:29309197
  10. Betz N, Hoemann K, Barrett LF. Words are a context for mental inference. Emotion. 2019; 19: 1463–1477. pmid:30628815
  11. Bazhydai M, Ivcevic Z, Brackett MA, Widen SC. Breadth of emotion vocabulary in early adolescence. Imagin Cogn Pers. 2018; 38: 378–404.
  12. Li Y, Yu D. Development of emotion word comprehension in Chinese children from 2 to 13 years old: Relationships with valence and empathy. PLoS One. 2015; 10(12): e0143712. pmid:26647060
  13. Nook EC, Sasse SF, Lambert HK, McLaughlin KA, Somerville LH. Increasing verbal knowledge mediates development of multidimensional emotion representations. Nat Hum Behav. 2017; 1: 881–889. pmid:29399639
  14. Widen SC. Children's interpretation of facial expressions: The long path from valence-based to specific discrete categories. Emot Rev. 2013; 5: 72–77.
  15. MacCann C, Roberts RD. New paradigms for assessing emotional intelligence: Theory and data. Emotion. 2008; 8: 540–551. pmid:18729584
  16. Mayer JD, Salovey P, Caruso DR, Sitarenios G. Measuring emotional intelligence with the MSCEIT V2.0. Emotion. 2003; 3: 97–105. pmid:12899321
  17. Izard CE, Woodburn EM, Finlon KJ, Krauthamer-Ewing ES, Grossman SR, Seidenfeld A. Emotion knowledge, emotion utilization, and emotion regulation. Emot Rev. 2011; 3: 44–52.
  18. Kashdan TB, Barrett LF, McKnight PE. Unpacking emotion differentiation: transforming unpleasant experience by perceiving distinctions in negativity. Curr Dir Psychol Sci. 2015; 24: 10–16.
  19. Kashdan TB, Farmer AS. Differentiating emotions across contexts: comparing adults with and without social anxiety disorder using random, social interaction, and daily experience sampling. Emotion. 2014; 14: 629–638. pmid:24512246
  20. Pennebaker JW, Chung CK. Expressive writing, emotional upheavals, and health. In: Friedman HS, Silver RC, eds. Foundations of Health Psychology. New York: Oxford University Press; 2007. p. 263–284.
  21. Schlegel K, Scherer KR. The nomological network of emotion knowledge and emotion understanding in adults: evidence from two new performance-based tests. Cogn Emot. 2018; 32: 1514–1530. pmid:29235929
  22. Delgado AR, Prieto G, Burin DI. Constructing three emotion knowledge tests from the invariant measurement approach. PeerJ. 2017; 5: e3755. pmid:28929013
  23. Delgado AR, Burin DI, Prieto G. Testing the generalized validity of the Emotion Knowledge test scores. PLoS ONE. 2018; 13(11): e0207335. pmid:30427923
  24. Gignac GE. Psychometrics and the Measurement of Emotional Intelligence. In: Parker J, Saklofske D, Stough C, eds. Assessing Emotional Intelligence. The Springer Series on Human Exceptionality. Boston: Springer; 2009. p. 9–40. https://doi.org/10.1007/978-0-387-88370-0_2
  25. Michel JB, Shen YK, Aiden AP, Veres A, Gray MK; Google Books Team, Pickett JP, Hoiberg D, Clancy D, Norvig P, Orwant J, Pinker S, Nowak MA, Aiden EL. Quantitative analysis of culture using millions of digitized books. Science. 2011; 331: 176–182. pmid:21163965
  26. McEnery T, Xiao R, Tono Y. Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge; 2006.
  27. Flowerdew L. Corpora and Language Education. Basingstoke: Palgrave Macmillan; 2012.
  28. Westbury C, Keith J, Briesemeister BB, Hofmann MJ, Jacobs AM. Avoid violence, rioting, and outrage; approach celebration, delight, and strength: Using large text corpora to compute valence, arousal, and the basic emotions. Q J Exp Psychol. 2014; 68: 1599–1622. pmid:26147614
  29. Real Academia Española. Corpus del Español del Siglo XXI, CORPES XXI. 2018. [internet] http://web.frl.es/CORPES/view/inicioExterno.view
  30. Eberhard DM, Simons GF, Fennig CD. Ethnologue: Languages of the World. Twenty-second edition. Dallas, Texas: SIL International; 2019. [internet] http://www.ethnologue.com.
  31. Brysbaert M, Buchmeier M, Conrad M, Jacobs AM, Bolte J, Bohl A. The word frequency effect: a review of recent developments and implications for the choice of frequency estimates in German. Exp Psychol. 2011; 58: 412–424. pmid:21768069
  32. Plummer P, Perea M, Rayner K. The influence of contextual diversity on eye movements in reading. J Exp Psychol Learn Mem Cogn. 2014; 40: 275–283. pmid:23937235
  33. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996; 1: 30–46.
  34. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2018; [internet] https://www.R-project.org/
  35. Gamer M, Lemon J, Fellows I, Sing P. irr: Various coefficients of interrater reliability and agreement (Version 0.84.1) [software]. 2019; [internet] https://CRAN.R-project.org/package=irr
  36. RStudio Team. RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. 2017; [internet] http://www.rstudio.com/
  37. Cowen AS, Keltner D. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. PNAS. 2017; 114(38): E7900–E7909. pmid:28874542
  38. Keltner D. Toward a consensual taxonomy of emotions. Cogn Emot. 2019; 33: 14–19. pmid:30795713
  39. Pavlenko A. Emotion and emotion-laden words in the bilingual lexicon. Biling: Lang Cogn. 2008; 11: 147–164.
  40. Forster KI, Chambers SM. Lexical access and naming time. J Verbal Learning Verbal Behav. 1973; 12: 627–635.
  41. Monaghan P, Chang YN, Welbourne S, Brysbaert M. Exploring the relations between word frequency, language exposure, and bilingualism in a computational model of reading. J Mem Lang. 2017; 93: 1–21.
  42. Vonk JMJ, Flores RJ, Rosado D, Qian C, Cabo R, Habegger J, et al. Semantic network function captured by word frequency in nondemented APOE ε4 carriers. Neuropsychology. 2019; 33: 256–262. pmid:30489116