Emotional Valence, Arousal, and Threat Ratings of 160 Chinese Words among Adolescents

This study was conducted to provide ratings of valence/pleasantness, arousal/excitement, and threat/potential harm for 160 Chinese words. The emotional valence classification (positive, negative, or neutral) of all of the words corresponded to that of the equivalent English language words. More than 90% of the participants, junior high school students aged between 12 and 17 years, understood the words. The participants were from both mainland China and Hong Kong, thus the words can be applied to adolescents familiar with either simplified (e.g. in mainland China) or traditional Chinese (e.g. in Hong Kong) with a junior secondary school education or higher. We also established eight words with negative valence, high threat, and high arousal ratings to facilitate future research, especially on attentional and memory biases among individuals prone to anxiety. Thus, the new emotional word list provides a useful source of information for affective research in the Chinese language.


Introduction
Recent studies have provided unequivocal evidence that emotion and cognitive processes are closely coupled [1]. Various experimental paradigms have been developed to examine how stimuli with emotional content influence cognition. The dot probe task [2] and the emotional Stroop task [3] are most frequently used for investigating attentional processes. The directed-forgetting paradigm [4] and implicit and explicit memory tasks [5,6] are frequently used to examine memory processes.
Studies such as these may present pictures, faces, or words as emotional stimuli. One advantage of verbal stimuli is the ability to control a series of quantifiable factors (e.g. number of syllables) known to affect word processing, which is not the case for other stimulus materials such as pictures or faces. However, creating highly controlled word-based materials for studying emotional information processing requires not only the careful matching of various factors known to influence word perception, but also a reliable measure of emotional content. For instance, Bradley and Lang [7] produced the Affective Norms for English Words (ANEW), which provide a set of normative emotional ratings for a large number of words. Studies investigating cognitive biases have also provided many appropriate words [8][9][10][11]. The availability of such stimuli makes the replication of results and scientific communication easier and comparative studies with different categories of stimuli and sensorial modalities possible. Affective word lists have been developed in other languages apart from English, including Finnish [12], Spanish [13,14], German [15], Italian [16], European Portuguese [17], and Dutch [18]. Like their English language counterparts, these non-English-language affective word lists have been widely used in memory and perception experiments. For example, Onraedt and Koster [19] used a Dutch affective word database in a study of working memory, whereas Siakaluk, Knol, and Pexman [20] adopted the valence and arousal ratings of affective words as the stimuli in the Stroop task.
Three components of emotion are commonly used [21]: the valence or pleasantness of the stimuli; the arousal or excitement provoked by the stimuli; and the dominance or degree of control exerted by the stimuli. Of these, emotional valence has been identified as the most powerful measure of the emotional nature of stimuli, and has also been shown to capture cognitive resources such as attention [22,23]. The level of arousal is the second major dimension of emotional affect [7,24], and there is increasing evidence for its importance [25]. Dominance has been less commonly used [18], and some studies have only included valence and arousal, not dominance [12,13,15].
In addition to the above three dimensions, studies on anxiety have found an association between threat-related attentional biases and anxiety [26]. However, although not all negative words are necessarily threatening, the two emotional traits are often mingled. Some studies support the idea that threat captures attention in all individuals only if it exceeds a critical threshold [27,28]. It would be useful, therefore, to distinguish between threat and negative emotional valence and the level of threat for each word, yet there is a dearth of information in this area. In the current study, ratings on valence and threat (or the potential harm induced by the stimuli) were obtained for each word to provide data on words that are unpleasant (negative), threatening, and high in arousal. Only one study, on spoken French words, has provided ratings of these three dimensions [29]. Nevertheless, threat word cues are important stimuli for studies on cognitive processing and memory biases [9,30].
Chinese is one of the most widely used languages in the world, yet to the best of our knowledge, no previous study has compared the emotional ratings of Chinese words with ratings of the corresponding words in English norms. Previous studies on Chinese subjects have typically used translated texts and adopted the emotional ratings of English words [31]. However, the polysemy or syntactic ambiguity of a word, or the lack of lexical parallelism between Chinese and English, may undermine the comparability of ANEW's word ratings. A well-known example is the English word "crisis", which is written with two characters in Chinese (危機). The first character (危) represents danger while the second (機) represents opportunity. It has been suggested that the word crisis may not have a high negative valence in Chinese [32]. Furthermore, the perception of certain emotional words may change during adolescence. Considering the increased interest in the study of cognitive processing in children and adolescents, and the advantages associated with the use of words in terms of experimental manipulation and control, the aim of the present study was to validate the emotional valence, arousal level, and threat level of Chinese words. Specifically, we aimed to adapt the word stimuli to Chinese at an appropriate reading level for Chinese adolescents. Our findings provide a set of normative emotional ratings for a large number of words in the Chinese language, which will be a useful tool for future studies of emotion and cognition conducted among adolescents in Chinese communities.
In summary, this study was conducted to (1) establish a list of affective words in Chinese to facilitate cognitive research, especially experiments on working memory and attentional bias for children and adolescents; (2) provide threat ratings for these words, as such ratings are rarely available in the literature; (3) use the threat ratings to examine the valence-threat and arousal-threat bi-dimensional relationships; and (4) create a list of words that are highly negative and elicit high arousal and threatening reactions among people for future research.

Methods
Participants
The participants were 164 students recruited from two secondary schools in mainland China (n = 102; School A, n = 71; School B, n = 31) and one secondary school in Hong Kong (School C, n = 62). All of the participants were adolescents with no history of psychopathology or psychotic spectrum disorders. All were native Chinese speakers: participants from Hong Kong were able to read traditional Chinese and participants from mainland China were able to read simplified Chinese. Accordingly, traditional Chinese was used for all assessment materials (words for rating, inventories, etc.) for participants from Hong Kong, whereas simplified Chinese was used for participants from mainland China. The contents of the two Chinese versions of the materials were identical. Our previous studies have confirmed that these two forms of Chinese can be used interchangeably with little effect on the results [33,34].

Procedure
Written informed consent from the parents or guardians was first obtained through the school teachers. On the assessment day, students were informed that participation was voluntary and that they could withdraw from the study at any time without any negative consequences. Anonymity and confidentiality of participation were assured. Written informed consent was obtained from the students before commencement of the experiment. Ethics approval was obtained from the Human Subjects Ethics Sub-Committee of the City University of Hong Kong.
Rating 300 words on 4 dimensions (valence, arousal, dominance, and threat) would have meant each participant rating 1,200 items, which could have led to fatigue and non-compliance behavior (e.g. missing responses). The following two strategies were adopted to reduce the above issues. First, we excluded dominance as ratings on this dimension are less commonly collected in other studies, according to the literature review. Second, the participants were randomly assigned to one of three groups, each of which rated the words on only one of the three emotional dimensions: 56 (Group 1: mainland China, n = 31; Hong Kong, n = 25) rated the emotional valence of words; 53 (Group 2: mainland China, n = 37; Hong Kong, n = 16) rated the emotional arousal of the same words; and another 51 (Group 3: mainland China, n = 30; Hong Kong, n = 21) rated the threat value of the same words. Gender and age were balanced across groups. As a result, each participant only needed to rate 300 items on one emotional dimension. This arrangement could also minimize the possibility of mutual influence between the emotional valence, arousal, and threat ratings.
The participants in groups 1 and 2 rated the emotional valence and arousal levels of the words, respectively, on a 9-point Self-Assessment Manikin scale (SAM) [35,36]. The participants in group 3 rated the threat levels of the words on a 5-point scale. Details of these rating scales are provided in the Measures section below. A paper-and-pencil procedure was adopted for the affective rating task. The data were collected in the participants' classrooms. In each session, before the data collection, a research assistant with a psychology educational background presented the aim of the study to the students, and emphasized the voluntary nature of their participation and the confidentiality of the results. Students were allowed to drop out of the study at this stage if they did not want to participate. Subsequently, the research assistant explained the affective rating task by describing the use of the SAM scale to groups 1 and 2, and the use of the threat-value scale to group 3. The participants were reminded that a personal, subjective rating was required, and thus there were no correct or incorrect answers. The participants were also told that they could mark a specific response if they did not know the meaning of any of the presented words. Before the start of the assessment, six words (love, boat, bomb, duck, trust, and crime) were used as examples to give the participants a basic reference, and an additional self-created word was used to show how to respond if a word was not comprehensible to them. Finally, questionnaire sets with the word list were distributed to each subject. No time limit was defined, but the subjects were encouraged to answer as quickly as possible. The entire process lasted for about 40 minutes.

Materials
We first selected 300 English verbs and nouns with reference to the ANEW database [36] and the word lists of previous studies investigating cognitive biases, which provide a source for words pertaining to social and psychological threat [7,9,30,37,38]. The 300 words comprised 104 negative, 73 positive, and 123 neutral words. A bilingual translator (native Chinese language speaker) with a psychology education background performed the forward translation from English to Chinese, and another bilingual individual with a psychology education background independently translated it back into English. To ensure the items in both versions achieved grammatical and colloquial appropriateness, differences between the original and back-translated versions were discussed and resolved with the agreement of both translators and the first author of this paper. Two versions of the same word list were created with the items presented in random order to exclude the influence of primacy and recency effects on the participants' ratings.

Measures
Valence and Arousal Scales. The valence and arousal rating scales of the Self-Assessment Manikin (SAM) [7,35] were used to measure valence and arousal. The SAM consists of two 9-point bipolar rating scales (ranging from "1" to "9") with pictorial manikins representing the values on each dimension. In the present study, on the valence scale, "1" was accompanied by a frowning, unhappy figure representing extremely unpleasant, and "9" was accompanied by a smiling, happy figure representing extremely pleasant. On the arousal scale, "1" was accompanied by a relaxed, sleepy figure corresponding to feeling very calm, and "9" was accompanied by an excited, wide-eyed figure corresponding to feeling very excited and aroused. The presentation of the scales was based on empirical evidence that higher numbers intuitively go with positive anchors (e.g. sad to happy rather than happy to sad) [39].
Threat-value Scale. As in the study by Bertels, Kolinsky, and Morais [29], words were rated according to their threat value on a 5-point scale ranging from "1 = not threatening" to "5 = very threatening."
Personal Data. Participants were also asked to provide demographic information (e.g., sex, age, education grade, country/place resided in most between birth and age 7) and answered questions about their language history (e.g., native language, second languages learned), handedness (right-handed, left-handed) and vision (normal, corrected to normal visual acuity). Participants were also asked to indicate whether they knew the meaning of each word ("yes" or "no").

Participant profile
Four participants (2.4%) with more than 10% missing data were excluded from the data analysis, thus the final sample consisted of 160 participants. The participants' demographic characteristics are presented by school in Table 1.
On average, the Hong Kong students were older than the mainland Chinese students, t(1) = -3.53, p < .01, and the Hong Kong sample had fewer girls than the sample from mainland China, χ²(1) = 4.24, p < .05.

Selection of Words
Ten words were excluded from the analysis because 10% or more of the participants (range: 10% to 15%) stated that they did not understand their meaning. The remaining 290 words were categorized into positive, neutral, or negative valence according to the rating scores of the 56 participants assigned to the valence rating group (see Procedure section) based on the criteria of Ferré et al. [13]: less than 4 = negative; 4 to 6 = neutral; 6 to 9 = positive. The resulting valence classification (positive, negative, neutral) of each word was then compared to its a priori classification according to previous studies (see Materials section). Only words that received the same classification in both were selected for further analysis. For example, the word "crazy" (瘋狂 in Chinese) was classified as negative in English according to previous studies but positive in Chinese according to the ratings of the present sample. This word was therefore excluded from the final list. This strategy ensured that the Chinese words in our final list had the same valence as in the existing word lists in other languages, to facilitate communication and comparison of results. Table 2 depicts the concordance rate of the three valence types. There were 160 words with valence identical to previous studies: 25 positive (15.6%), 90 neutral (56.3%), and 45 negative (28.1%). These 160 words were included in our final word list.
Note: High threat rating is defined as a mean score above 3 on a 5-point scale; high negative valence is defined as a mean score below 3 on a 9-point scale; and high arousal is defined as a mean score above 5 on a 9-point scale. High threat, high negative and high arousal words are highlighted in bold font.
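The selection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used to produce the list: the function and field names are hypothetical, the treatment of means falling exactly on 4 or 6 (counted here as neutral) is an assumption because the stated ranges overlap at 6, and apart from the mean valence ratings of confident and despise, which are reported in the Discussion, all values are made up.

```python
def classify_valence(mean_rating):
    """Map a mean 9-point valence rating to a valence category.

    Assumption: means of exactly 4 or 6 are treated as neutral,
    because the published cut-offs overlap at 6.
    """
    if mean_rating < 4:
        return "negative"
    if mean_rating > 6:
        return "positive"
    return "neutral"


def select_words(words, max_unknown=0.10):
    """Keep words that were understood by enough raters and whose rated
    category matches the a priori (English-based) category."""
    kept = []
    for w in words:
        # Drop words that 10% or more of the raters did not understand.
        if w["unknown_rate"] >= max_unknown:
            continue
        # Keep only concordant classifications.
        if classify_valence(w["valence_mean"]) == w["a_priori"]:
            kept.append(w["word"])
    return kept


# Hypothetical records; only the two marked means come from the article.
words = [
    {"word": "confident", "valence_mean": 6.37,  # reported mean
     "a_priori": "positive", "unknown_rate": 0.0},
    {"word": "despise", "valence_mean": 3.16,  # reported mean
     "a_priori": "negative", "unknown_rate": 0.0},
    {"word": "crazy", "valence_mean": 6.50,  # hypothetical mean
     "a_priori": "negative", "unknown_rate": 0.0},
    {"word": "rare_word", "valence_mean": 5.00,  # hypothetical word
     "a_priori": "neutral", "unknown_rate": 0.12},
]

selected = select_words(words)
print(selected)
```

In this sketch, crazy is dropped because its rated category disagrees with the a priori category, and the hypothetical rare_word is dropped because too many raters did not know it.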

Descriptive Statistics
After categorizing the words into positive, negative, and neutral valence, we categorized them as high or low arousal and high or low threat, according to the following strategies: 1. words with a mean arousal rating score above 5.0 according to the 9-point SAM scale were categorized as high arousal words; and 2. words with a mean score above 3.0 on the 5-point threat rating scale were categorized as high threat words.
According to the above strategies, 20 words (12.5%) were categorized as high arousal and 15 words (9.4%) as high threat. As threat ratings are not commonly available in the literature, we provide the means and standard deviations of the high threat words in Table 3. An Excel file with the data for the 160 words and their classification is available as supporting information for the article. The three highest-rated threat words were dying (mean = 4.00, SD = 1.23), surgery (mean = 3.90, SD = 1.33), and suffocate (mean = 3.84, SD = 1.36).
As discussed previously, not all negative words are necessarily threatening, and vice versa. The participants rated 30 words (18.8%) as negative but not as posing a social or psychological threat; for instance, avoid, unkind, ashamed, stupid, embarrass, and lazy. Conversely, all 15 words that were rated as high threat were also rated as negative words. Four words (2.5%) were classified as high threat but low arousal: collapse, idiotic, emergencies, and teased. Nine words (5.6%) were rated as high arousal but low threat: ashamed, stupid, embarrass, reject, terrific, terrified, despise, worried, and coward. Finally, 11 words (6.9%) were rated as high threat and high arousal: hazard, suffocate, assault, annoyed, beating, dying, pounding, cancer, horror, surgery, and violence (Table 4).
It is useful to develop lists of words that are highly negative and that elicit high arousal and threatening reactions among people. We selected words with average scores below 3 on the valence scale (very negative/unpleasant; 9-point scale), above 5 on the arousal scale (high arousal; 9-point scale), and above 3 on the threat scale (high threat; 5-point scale). Eight words fulfilled the above criteria, representing 17.8% of the negative words on our list (see Table 3).
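The three cut-offs can be expressed as a simple filter. The following Python sketch is illustrative only: the function names are hypothetical, and the valence and arousal values in the example are invented placeholders (only the threat means of dying and surgery are taken from the text). The last line simply re-derives the reported share of eight words among the 45 negative words.

```python
def flag_word(valence, arousal, threat):
    """Apply the three cut-offs to the mean ratings of one word."""
    return {
        "high_negative": valence < 3.0,  # 9-point valence scale
        "high_arousal": arousal > 5.0,   # 9-point arousal scale
        "high_threat": threat > 3.0,     # 5-point threat scale
    }


def is_target(valence, arousal, threat):
    """True only if a word meets all three criteria at once."""
    return all(flag_word(valence, arousal, threat).values())


# (valence, arousal, threat) means. Valence and arousal values are
# hypothetical placeholders; the threat means of dying and surgery
# are the ones reported in the text.
ratings = {
    "dying": (2.1, 6.0, 4.00),
    "surgery": (2.8, 5.6, 3.90),
    "lazy": (3.5, 3.9, 2.0),  # entirely hypothetical values
}

targets = [word for word, r in ratings.items() if is_target(*r)]
print(targets)

# Eight target words out of 45 negative words, as a percentage.
share_of_negative = round(8 / 45 * 100, 1)
print(share_of_negative)
```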

Relationships between demographic characteristics and ratings
The means and standard deviations of each emotional dimension by gender and location are depicted in Table 5. The mean ratings for valence and arousal were 4.59 (SD = .62) and 4.21 (SD = 1.20), respectively, on a 9-point scale, whereas the mean rating for threat was 2.20 (SD = .62) on a 5-point scale. Gender (male versus female) and region (Hong Kong versus mainland China) did not affect the ratings, except that the mean valence ratings were higher for participants from Hong Kong than for their mainland counterparts, t(54) = -3.87, p < .01. One-way analysis of variance (ANOVA) revealed no differences in the three emotional dimensions between participants of different ages: positive, F(5,144) = 1.13, p = .35; neutral, F(5,144) = 1.02, p = .41; and negative, F(5,144) = 1.88, p = .10.

Reliability
We adapted the method used by Moors and colleagues [18] in a similar study to calculate the reliabilities for each sample of valence (n = 56), arousal (n = 53), and threat (n = 51) ratings separately (see also [21]). As mentioned before (see Procedure), participants were allocated to one of three groups, with the participants in each group providing ratings on one of the three emotional dimensions (valence, arousal, or threat). Accordingly, participants in each rating group were split into halves according to their serial numbers (odd or even). Intraclass correlation coefficients [40] were high for valence (.98), arousal (.84), and threat (.96). We also split each rating group according to gender and location. All reliabilities were good: gender (male versus

Bi-directional Relationships between the Variables
The Pearson's zero-order correlation coefficients between the emotional dimensions are shown in Table 6. There were significant correlations for all of the bi-dimensional relationships in the expected directions. First, valence and threat were negatively correlated (r = -.79, p < .001), showing that words that were more pleasant were less threatening. Second, a significant positive correlation was obtained between arousal and threat (r = .62, p < .001), i.e. words that were more threatening tended to arouse more excitement among the participants. Finally, a negative linear relationship was obtained between valence and arousal (r = -.43, p < .001), suggesting that words that were more negative tended to elicit more excitement. Bradley and Lang [7] and other studies of affective word adaptation in different languages, such as Spanish [13], European Portuguese [17], and Finnish [12], have reported a quadratic relationship between valence and arousal. This quadratic relationship was thus examined in the current study. A U-shaped distribution (R = .72, p < .001) was obtained, indicating that very positive or very negative words tended to arouse more excitement among the participants. This quadratic relationship also explained more of the variance (52.5%) than the linear relationship (18.6%), showing that it fit the data better than the linear model. The same U-shaped distribution was obtained in the abovementioned studies [7,12,13,17].
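To illustrate how a linear and a quadratic model of arousal on valence can be compared, the following Python sketch computes Pearson's r and the proportion of variance explained (R squared) by polynomial fits of degree 1 and 2. It is a minimal sketch under stated assumptions rather than the analysis reported here: the helper names are hypothetical, the fit uses ordinary least squares solved through the normal equations, and the valence and arousal values are invented to show a U-shaped pattern with a negative linear trend.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def poly_r2(x, y, degree):
    """R squared of a least-squares polynomial fit of the given degree."""
    k = degree + 1
    cols = [[xi ** p for xi in x] for p in range(k)]
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(ci * yi for ci, yi in zip(cols[i], y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(coef[p] * xi ** p for p in range(k)) for xi in x]
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical word-level means (not data from this study).
valence = [1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 5.5, 6.0, 7.0, 8.0]
arousal = [6.5, 6.0, 5.6, 5.0, 4.0, 3.5, 3.6, 4.0, 4.8, 5.5]

r = pearson_r(valence, arousal)
r2_linear = poly_r2(valence, arousal, 1)
r2_quadratic = poly_r2(valence, arousal, 2)
print(round(r, 2), round(r2_linear, 3), round(r2_quadratic, 3))
```

For a degree-1 fit, R squared equals the square of Pearson's r, so the comparison between the two models reduces to how much additional variance the squared valence term explains.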

Discussion
We establish valence, arousal, and threat ratings for a list of 160 Chinese affective words in this study. As mentioned before (see Procedure section), rating 300 words on 4 dimensions (valence, arousal, dominance, and threat) might lead to fatigue and non-compliance behavior.
We excluded power/dominance in the present study for the following reasons. First, power/dominance has been less commonly used and was excluded in several previous studies [12,13,15,18]. Second, we were interested in exploring words that are threatening but not negative and vice versa (see Table 3). Finally, we were motivated to establish a list of words that are highly negative and elicit high arousal and threatening reactions among people for future research.
First, we attempted to ensure that the valence classification (i.e., positive, neutral, or negative) of each of our Chinese words corresponded with the classification of their English counterparts in previous studies [7,30,37,38]. For example, the word confident (自信 in Chinese) was classified as positive both in our study (mean valence rating = 6.37, SD = 2.10) and in the ANEW manual [7] (mean valence rating = 7.98, SD = 1.29). Similarly, the word despise (鄙視 in Chinese) was categorized as negative both on our list and the ANEW list (our sample: mean = 3.16, SD = 2.25; ANEW: mean = 2.03, SD = 1.38). Due to the lack of lexical parallelism between the Chinese and English languages, we contend that words translated from English into Chinese may not necessarily have the same valence. Our present findings, in fact, support this proposition. For instance, the word crazy was classified as a negative word in English according to a previous study [9]. However, the Chinese translation of this word (瘋狂) carries the connotation of elation or ecstasy, and the young people in our sample regarded it as a positive word. As expected, some words that were classified as neutral in English were considered positive or negative in Chinese and vice versa. For example, Stewart and colleagues [30] classified the word motel (旅館 in Chinese) as neutral, but our participants considered it a positive word, perhaps because they associated it with traveling and holidays.
Forty-seven words (16.2%, Table 2) with positive valence in English were rated as neutral words by our participants. A notable example is the word happiness (快樂 in Chinese), which was classified as positive in McCabe's [9] study, but as neutral in the present study. The reason for this difference may be that the word happiness carries a hedonic connotation that may not be regarded as positive in Chinese culture [41]. Finally, 54 words (18.6%) that were classified as negative in English were rated as neutral in the current study. Many of these words have marginal ratings in the ANEW database [7], for example, anxious (ANEW valence mean rating = 4.80) and shy (ANEW valence mean rating = 4.59). Our Chinese participants might also regard shy and anxious as adaptive reactions in some situations because there are notable differences in pro-social behavior between Western and Chinese cultures [41][42][43]. A detailed discussion of the possible reasons for the differences in classification between our findings and those of previous studies is beyond the scope of this paper. We attempted to synchronize the valence of our Chinese words with existing English language word lists, which is an important step in ensuring the consistency of valence across languages, although this attempt reduced the number of words in our list.
Second, our study is one of the few to provide ratings of the threat of emotional words. Highly threatening stimuli are important tools in experimental studies of cognitive biases, especially studies related to the etiology of anxiety [2,26,44,45] and intervention strategies to modify the attentional bias of anxiety-prone adolescents [46][47][48][49]. Our list contained 15 words that the adolescents perceived as eliciting social and psychological threat, and eight of these were rated highly on threat, arousal, and negative valence: annoyed (煩惱); assault (毆打); beating (痛打); cancer (癌症); dying (垂死); horror (驚駭); suffocate (窒息); and surgery (手術). All of these except horror are related to physical suffering and pain, which may be related to adolescents' concerns about their physical well-being and appearance at this stage [50]. These words should be helpful in experiments on cognitive biases related to anxiety, particularly attentional and memory biases, among adolescents. Our results show that high arousal words do not necessarily have high threat values. For instance, the word terrific (優秀 in Chinese) was not considered a high threat word, although it had a high arousal value, and also had a high positive valence rating (mean = 6.19, SD = 2.47). Similarly, not all negative words were rated as threatening. Words such as avoid (逃避), ashamed (羞恥), and embarrassed (尷尬) are obviously negative words, but the adolescents in our sample did not appraise them as possessing a high potential to harm.
Third, the words in our list were comprehensible to the majority (> 90%) of our 12- to 17-year-old secondary school participants. There are two forms of written Chinese: simplified and traditional. The former is used mainly in mainland China and Singapore whereas the latter is used in Hong Kong and Taiwan. These two forms of written Chinese have slightly different lexical and grammatical structures, and some expressions and words in simplified Chinese may not be comprehensible to people using traditional Chinese and vice versa. It is important to have a list of emotional words applicable to adolescents familiar with either of the two written forms of Chinese to facilitate comparison across communities. Furthermore, our words were highly comprehensible to junior secondary students, so it is logical to expect older people with junior secondary education or above (e.g. senior secondary school students, university students, and adults) to understand the meanings of these words. This means that our emotional word list can be applied to people of different age groups, although the valence, arousal, and threat ratings of the words may need to be verified again in future studies.
Moors and colleagues [18] did not find a significant correlation between valence and arousal scores (r = -.01). Although we found a significant linear relationship between these two variables in our study (r = -.43, p < .01), a U-shaped pattern fit our data significantly better than a linear relationship (Fig 1). Consistent with other studies [7,12,13,17,21], words that are very positive or very negative tended to arouse more excitement among our adolescent participants. As expected, the correlational results (Table 6) showed that more negative words tended to be perceived as more threatening (r = -.79, p < .01) and that more threatening words tended to elicit more arousal (r = .62, p < .01).
In sum, we have developed a list of emotional Chinese words with good reliability and with valence classifications that are consistent with emotional word lists in English. The emotional words were comprehensible to individuals with junior secondary education who read either Simplified or Traditional Chinese characters. We collected threat ratings in addition to valence and arousal ratings, which are available as supporting information (S1 Table) in MS Excel format. However, further studies should be conducted to examine whether the same ratings and classifications of words apply to other age groups. Dominance ratings should also be established in future. Finally, a future study could ask the same participants to rate the words according to different dimensions and compare the results to those of the present study.
Supporting Information S1