Validation of the Amsterdam Dynamic Facial Expression Set – Bath Intensity Variations (ADFES-BIV): A Set of Videos Expressing Low, Intermediate, and High Intensity Emotions

Tanja S. H. Wingenbach; Chris Ashwin; Mark Brosnan

doi:10.1371/journal.pone.0147112

Abstract

Most of the existing sets of facial expressions of emotion contain static photographs. While increasing demand for stimuli with enhanced ecological validity in facial emotion recognition research has led to the development of video stimuli, these typically involve full-blown (apex) expressions. However, variations of intensity in emotional facial expressions occur in real life social interactions, with low intensity expressions of emotions frequently occurring. The current study therefore developed and validated a set of video stimuli portraying three levels of intensity of emotional expressions, from low to high intensity. The videos were adapted from the Amsterdam Dynamic Facial Expression Set (ADFES) and termed the Bath Intensity Variations (ADFES-BIV). A healthy sample of 92 people recruited from the University of Bath community (41 male, 51 female) completed a facial emotion recognition task including expressions of 6 basic emotions (anger, happiness, disgust, fear, surprise, sadness) and 3 complex emotions (contempt, embarrassment, pride) that were expressed at three different intensities of expression and neutral. Accuracy scores (raw and unbiased (Hu) hit rates) were calculated, as well as response times. Accuracy rates above chance level of responding were found for all emotion categories, producing an overall raw hit rate of 69% for the ADFES-BIV. The three intensity levels were validated as distinct categories, with higher accuracies and faster responses to high intensity expressions than intermediate intensity expressions, which had higher accuracies and faster responses than low intensity expressions. To further validate the intensities, a second study with standardised display times was conducted replicating this pattern. The ADFES-BIV has greater ecological validity than many other emotion stimulus sets and allows for versatile applications in emotion research. It can be retrieved free of charge for research purposes from the corresponding author.

Citation: Wingenbach TSH, Ashwin C, Brosnan M (2016) Validation of the Amsterdam Dynamic Facial Expression Set – Bath Intensity Variations (ADFES-BIV): A Set of Videos Expressing Low, Intermediate, and High Intensity Emotions. PLoS ONE 11(1): e0147112. https://doi.org/10.1371/journal.pone.0147112

Editor: Cosimo Urgesi, University of Udine, ITALY

Received: July 27, 2015; Accepted: December 29, 2015; Published: January 19, 2016

Copyright: © 2016 Wingenbach et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This work was supported by the Department of Psychology of the University of Bath and doctoral scholarships to TSHW, from the German Academic Exchange Service (DAAD), the FAZIT Stiftung, and the University of Bath Graduate School.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Humans are very social, so it is not surprising our attention is often drawn to faces (e.g. [1]). This can be explained by the multitude of information faces convey about others, which includes such invariant aspects as their sex [2], race [3], and age [4]. We also perceive dynamic features about others from the face, including facial expressions. Facial emotional expressions are utilised within social interactions because they serve several functions. Darwin [5] proposed that facial expressions of emotion are directly linked to the feeling of an emotion, so that facial expressions provide a visual display of the internal emotional states of others. Since these signals about emotional states can be interpreted by an observer, emotional expressions serve a communicative role [6]. Facial expressions of emotion can also be used to regulate the environment, by indicating people’s intentions and actions [7]. When used in a more functional way for social regulation, expressions do not necessarily have to accurately reflect the current emotional state [6]. Given the importance of facial emotional expressions for social interactions and for conveying crucial information about others and ourselves [8], they have attracted a vast amount of research investigating our ability to correctly interpret those expressions.

To date, most facial emotion recognition research has utilised static stimuli with a high intensity of the facial expressions [9–12]. This includes the “Pictures of Facial Affect” [13], which might be the most widely used standardised face emotion stimulus set for research [14]. Based on the Pictures of Facial Affect, the “Facial Action Coding System” (FACS) was developed, where all muscular movements underlying facial expressions are catalogued [15]. These facial muscle movements are called ‘action units’ and specific combinations of such action units have been attributed to specific prototypical emotional facial expressions. For example, a pattern of muscle movements in the face with the lip corners pulled up and crow’s feet at the outer edges of the eyes corresponds to ‘happiness’ [16], which is known as the Duchenne smile [17]. The development and validation of facial emotion expression stimulus sets often involves coding according to the FACS (e.g. [18]).

Research using static stimuli has helped identify the ‘basic emotions’, which are the facial expressions thought to be distinct and recognisable by all humans independent of their culture or race [19, 20, 21], although this view has been questioned by some (e.g. [22, 23]). The six basic emotions consist of anger, disgust, fear, happiness, sadness, and surprise [19]. These emotions are usually well recognised in experiments, although the recognition rates generally vary between the emotions [24]. The high recognition rate of basic emotions has adaptive functions, since it allows for rapid responses to biologically relevant stimuli [19]. The emotions shown to be easiest to recognise throughout the literature are happiness and surprise (e.g. [25, 26]), with fear often being the hardest to recognise (e.g. [27]). Not only accuracy, but also response times are influenced by item difficulty. For facial emotion recognition that means that response times are shortest for clear, unambiguous, and easy to recognise facial expressions (e.g. [28]). Accordingly, facial expressions of happiness are recognised faster than negative emotions [29]. Nonbasic emotions are assumed to have their basis in basic emotions [30], but are paired with self-evaluations [31] and are more likely to be influenced by culture [32]; examples include pride and embarrassment, which are called complex emotions. Complex emotions are generally less well recognised than basic emotions within facial emotion recognition experiments (e.g. see [33]).

Thanks to technological advancements it has become possible to conduct research on facial expressions using dynamic stimuli, which are more aligned to the real-life emotional expressions that are being studied. The application of dynamic stimuli offers the advantage of increased ecological validity. Static images only capture one moment in time, while dynamic stimuli enable the display of the whole progression from neutral to the full apex of the facial expression of emotion. Static images only display the activated facial action units constructing the facial emotional expression, whereas dynamic stimuli provide additional cues, such as temporal characteristics of the activation of the facial features, which are used in decoding facial expressions [9, 12, 34]. In line with that, it has been suggested that dynamic as well as static characteristics are embedded in our representations of emotional facial expressions [35, 36]. Accordingly, facial emotion recognition research has shown that dynamic stimuli lead to higher recognition rates than static stimuli (e.g. [12, 22, 37]).

There are two types of dynamic face stimuli, video sequences based on morphed images and true video recordings of real human faces. Morphed dynamic stimuli are created by morphing two original static images, which may include images of a neutral face and an emotional expression, gradually into each other by creating artificial images according to predefined linear increments (e.g. [35, 38, 39]). The morphing technique allows for a high level of standardisation, as the number of increments and therewith the number of images (frames) as well as the presentation time of each frame and thereby the exposure time can be kept constant across all sequences. Morphed sequences are especially useful when investigating sensitivity in emotion perception from faces (e.g. [40–42]). However, the forced simultaneous changes of facial features that come with morphing pose a limitation for application in facial emotion recognition experiments. The naturalness of computer-generated expressions is questionable, as it is unclear whether the created movements are anatomically feasible [36, 43]. This concerns the onsets of single facial action units, which can vary [44], and the speed of those action units in reaching apex, which varies between emotions (Hara and Kobayashi as cited by [45]). Conversely, true video recordings preserve and capture variations in onsets and speed of facial action units. This has sparked the development of video recordings where professional actors or untrained participants are filmed whilst displaying prototypical facial emotional expressions (e.g. the Amsterdam Dynamic Facial Expression Set, ADFES [33]; Geneva Multimodal Emotion Portrayals, GEMEP [46]; Multimodal Emotion Recognition Test, MERT [47]; [48]; the MPI Facial Expression Database [49]).

One important feature not typically included in published face emotion stimulus sets is variations in intensity level of expressions. Including varying intensities is important, as in social interactions the facial expressions that are displayed spontaneously are mostly of low to intermediate intensity [50] with full intensity expressions being the exception [51]. Subtle displays of face emotion are very commonly seen and therefore are a major part of facial emotion recognition [43]. It has been proposed that people generally are not overly good at recognising subtle expressions [52], and research from static morphed images of varying intensities showed that accuracy [44] and response time [53] are linearly associated with physical expression intensity. Ekman and Friesen [15] suggested intensity ratings in the FACS from trace to maximum highlighting the importance of considering the whole range of emotional expression intensity. Including subtle expressions allows for a broader and more reliable assessment of facial emotion recognition.

Moreover, face emotion stimuli of varying intensities of facial expressions allow for investigation of populations that are thought to have difficulties with facial emotion recognition (e.g. in Autism Spectrum Disorders; for a review see [54]), where it can be examined whether those difficulties present across all intensity levels or for example just for subtler displays. Performance in facial emotion recognition at varying intensities is not only of interest for clinical samples, but also for general populations. For example, a female advantage compared to males in facial emotion recognition is frequently reported (e.g. [55]), however, this is mostly based on full intensity and/or static expressions. A potential research question to investigate is whether females are consistently better than males at recognising facial emotional expressions or whether the advantage is more prominent at certain intensities. Together, stimuli of varying intensities of facial emotional expressions have the advantage to allow for a more specific investigation of group differences in facial emotion recognition or emotion perception.

To our knowledge, there are only a very limited number of stimulus sets including varying intensity of emotional expressions based on dynamic stimuli. One stimulus set containing computer-morphed videos for the six basic emotions at varying intensities based on morphing neutral and emotional expressions has been published (the Emotion Recognition Task, ERT; [40]), as have two true video stimulus sets including varying intensities: the Database of Facial Expressions, DaFEx [56], and the MPI Facial Expression Database [49]. The DaFEx [56] includes three intensity levels of expression for the six basic emotions; however, it is limited in its use for emotion recognition research from faces, as emotions are expressed not only facially but also with body posture, providing further cues useful for decoding. Additionally, the stimuli vary in length by up to 20 seconds. The MPI contains 55 facial expressions of cognitive states (e.g. remembering) and five basic emotions with context variations at two intensity levels each. However, in the MPI the people expressing the emotions facially (encoders) wear a black hat with several green lights on for head tracking purposes, which also covers features generally visible in social interactions (e.g. hair) so that only the face is visible, all of which lowers ecological validity. Validation data have only been published for the high intensity expressions, not low intensity. Neither of the two sets of facial emotion expressions has been standardised by FACS-coding. To date, there is no published and validated stimulus set containing video recordings of facial emotional expressions including varying intensities of expressions of the six basic and also complex emotions. It is advised to include a wide range of emotions in order to increase ecological validity.

The present research searched for a video stimulus set of basic and complex emotional expressions and applied the following criteria to increase ecological validity: 1) the stimuli need to be in colour, 2) standardisation of the encoders’ emotional expressions based on FACS by certified coders, 3) having the whole head, but not the rest of the body of the encoder visible in the videos, 4) have no utterances included to avoid distraction and unwanted further cues to the emotion, 5) a large number of encoders per emotion included, and 6) containing a wide range of emotional expressions.

The ADFES [33] was identified as meeting the criteria outlined above. The ADFES is comprised of 12 Northern European encoders (7 male, 5 female) and 10 Mediterranean encoders (5 male, 5 female) expressing six basic emotions and three complex emotions facially: contempt, pride, embarrassment, as well as neutral expressions. The videos are all 5.5 seconds in length, and there are versions of each video with encoders facing directly into the camera and also from a 45° angle. It is a clear advantage of the ADFES that it contains facial expressions of not only the six basic but also three complex emotions (next to neutral), especially for application in facial emotion recognition research. A low number of emotions may not accurately assess emotion recognition. It has been suggested that a low number of possible emotion response alternatives could constitute a discrimination task, since the participant is asked to discriminate between low numbers of options, which can be mastered by applying exclusion criteria at the cost of the results’ validity [57]. For example, if only one positively valenced emotion is included (usually happiness), then the simple discrimination between positive and negative can lead to full accuracy for happiness rather than actually recognising happiness [58]. However, if the six standard basic emotions and complex emotions (e.g. pride) are included, that increases the likelihood of having more than one positively valenced emotion included in the task and thereby demands more recognition than discrimination ability. A wider range of emotions included in a stimulus set therefore aids not only ecological validity but also the results’ validity.
However, since the ADFES videos only display high intensities of emotional expressions at their endpoint, the current research aimed to create and validate an expanded standard video stimulus set that includes both high intensity and more subtle expressions on the basis of the Northern European and face forward videos from the ADFES; the Amsterdam Dynamic Facial Expression Set—Bath Intensity Variations (ADFES-BIV).

The videos were edited to display three levels of intensity of facial emotional expression: low, intermediate, and high. The general aim of the current study was to test the validity of the ADFES-BIV and thereby establish whether this new video set is a useful tool for emotion research. It was aimed to validate these intensity levels as distinct categories on the basis of their accuracies by replicating the pattern from static images, i.e. low intensity expressions would have lower accuracy rates than the intermediate intensity expressions, which themselves would have lower accuracy rates than the high intensity expressions. Furthermore, it was expected that the intensity levels would differ on response latencies with the same pattern of effect as the accuracy rates, with fastest responses to high intensity expressions, as they portray the emotions most clearly.

It was further aimed to test the ADFES-BIV’s emotion categories and the emotions at each intensity level for validity on the basis of raw hit rates as well as unbiased hit rates (Hu; [59]). It was expected that all categories would have recognition rates significantly above chance level on raw and unbiased hit rates. It was expected that the recognition rates would vary between emotions and the influence of expression intensity was investigated. That is, are emotions that are easy to recognise at high intensity (i.e. surprise, happiness) also easy to recognise at low intensity with accuracies comparable to the accuracies at high intensity? It was further expected that the emotions with highest recognition would also be the emotions with fastest responses and the emotions with lowest recognition the ones with slowest responses.

Study 1

Method

Participants.

The sample consisted of 92 adult participants (41 male, 51 female) recruited from the University of Bath community. Participants were aged between 18 and 45 (M = 23.25, SD = 5.65) and all had normal or corrected-to-normal vision. Participants were enrolled as Undergraduate students (n = 54), Masters students (n = 8), PhD students (n = 22), and staff at the University (n = 8). Undergraduate participants from the Department of Psychology at the University of Bath gained course credit for participation, while all other participants were compensated with £5 for participation. None of the participants reported a clinical diagnosis of a mental disorder. The data on sex differences based on the ADFES-BIV will be reported elsewhere.

Stimuli development.

Permission to adapt the videos was obtained from one of the authors of the ADFES (Fischer A. Personal communication. 19 April 2013). The 10 expressions included in the present research were anger, contempt, disgust, embarrassment, fear, happiness, neutral, pride, sadness, and surprise. Each of the emotions was expressed by 12 encoders; 7 male and 5 female. For each of the 120 videos from the Northern European set (12 encoders x 10 expressions), three new videos displaying three different stages of expression (low, intermediate, and high) were created by extracting consecutive frame sequences starting with a neutral frame (i.e. blank stare). From the neutral expression videos, three different sequences were extracted as well to obtain an equal number of videos per category. Additionally, 10 videos were created for one encoder from the Mediterranean set of the ADFES [33] to be used as an example display of each emotional expression included in the set. This led to a total of 370 videos. The length of each of the videos was set to 26 frames with a frame rate of 25/sec, consistent with the original ADFES [33]. The resulting videos were all 1040ms in length. Since apex of facial expression is generally reached within 1 second for basic emotions (Hara & Kobayashi as cited by [45]), this timing allowed for all videos to start with a neutral facial expression and to continue until the end of the expression. This is more ecologically valid, since outside the laboratory people get to see neutral expressions as a point of reference [43].
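
The stimulus counts and the clip duration reported above follow from simple arithmetic, which the following minimal sketch reproduces. All variable names are illustrative and are not part of the original materials; the numbers are taken from the preceding paragraph.

```python
# Illustrative arithmetic check of the stimulus counts reported in the text.
# Variable names are hypothetical; values come from the paragraph above.

N_ENCODERS = 12          # Northern European encoders
N_EXPRESSIONS = 10       # 9 emotions plus neutral
N_INTENSITIES = 3        # low, intermediate, high
N_EXAMPLE_VIDEOS = 10    # one Mediterranean encoder, used as examples only

FRAMES_PER_VIDEO = 26
FRAME_RATE_HZ = 25

source_videos = N_ENCODERS * N_EXPRESSIONS               # original ADFES clips used
validation_videos = source_videos * N_INTENSITIES        # clips entering validation
total_videos = validation_videos + N_EXAMPLE_VIDEOS      # all newly created clips
duration_ms = FRAMES_PER_VIDEO * 1000 // FRAME_RATE_HZ   # length of each clip

print(source_videos, validation_videos, total_videos, duration_ms)
```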

The three intensity levels created with the current research were defined by adopting the operationalization of Bould and Morris [34]. Accordingly, subtle expressions of face emotion were more ambiguous in nature [51], whereas the high intensity expressions were generally more unambiguous. A similar approach to create the varying intensities of facial emotional expression was followed as was carried out in Bould and Morris [34], where the unfolding facial emotional expressions, from neutral expression to full intensity, were truncated. Precisely, for each new video, the desired frame for each stage of expression corresponding to the appropriate intensity level (i.e. low, intermediate, or high) was selected, and then 25 consecutive frames were included prior to that frame. This created videos of equal length but with the last frame corresponding to the relevant level of emotional intensity. The different intensities reflected the spatial distance of the facial features in comparison to their location within the neutral expression, based on the degree of contraction of the relevant facial muscles for that emotional category. The category of ‘low’ intensity included subtle expressions where only limited degrees of action units in the face are visible. The ‘high’ intensity category included the apex of the emotional expressions within the videos, which involved the maximal contraction of all the relevant facial action units and the greatest spatial distance of facial features from where they appeared in the neutral expression. The ‘intermediate’ category was chosen to be as distinct as possible from the low and high intensity points in the videos, with the muscle contraction and movement of the facial features being mid-point between those intensities (see Fig 1). The individual in this manuscript has given written informed consent (as outlined in PLOS consent form) to publish his photograph.
The categorisation of the videos as being low, intermediate, or high intensity was subsequently confirmed with a study asking participants (N = 30; 50:50 sex ratio) to judge the intensity of the facial expressions on a visual analogue scale ranging from very low to very high intensity (respective to 0–100%). Ethical approval for this study was given by the University of Bath Psychology Ethics Committee and all participants gave written informed consent prior to participation. On average, the perceived intensity of the low intensity videos was 42% (SD = 9.80), 55% for the intermediate intensity videos (SD = 8.54), and the perceived mean intensity of the high intensity videos was 66% (SD = 7.05), whereas the neutral faces were rated with 9% on average (SD = 9.88). The differences between the categories were statistically significant as identified by paired sample t-tests (neutral-low: t(29) = -15.14, p < .001; low-intermediate: t(29) = -16.62, p < .001; intermediate-high: t(29) = -21.23, p < .001). The linear increase of perceived intensity with increasing intensity level is in line with reports from previous research (e.g. [60]).
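
The frame-truncation procedure described earlier in this subsection (selecting a target frame and keeping the 25 frames that precede it) can be sketched as follows. This is only an illustration under stated assumptions: the function name and the target frame numbers are hypothetical, and the actual target frames were chosen by the authors for each video.

```python
def clip_frame_indices(target_frame, n_frames=26):
    """Return the indices of the n_frames consecutive frames ending at target_frame.

    With n_frames=26 this keeps the target frame plus the 25 frames before it.
    """
    start = target_frame - (n_frames - 1)
    if start < 0:
        raise ValueError("target frame is too early for a full-length clip")
    return range(start, target_frame + 1)


# Hypothetical target frames for one source video (not values from the study).
targets = {"low": 40, "intermediate": 55, "high": 70}
clips = {level: clip_frame_indices(frame) for level, frame in targets.items()}

for level, frames in clips.items():
    print(level, frames[0], frames[-1], len(frames))
```

Every clip has the same length, and only its final frame differs in expression intensity, which is the property the truncation approach relies on.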

Fig 1. Neutral expression, last frame of disgust at low, intermediate, and high expression intensity (from left to right).

https://doi.org/10.1371/journal.pone.0147112.g001

Facial emotion recognition task.

The psychological stimuli presentation software E-Prime 2.0 [61] was used to display the emotion recognition experiment and record responses. The participants began the experiment with an affective state assessment on a 5-point Likert scale rating valence and arousal using the non-verbal Self-Assessment Manikin (SAM; [62]) before and after watching a neutral clip; a short documentary about fertilisers (4 minutes and 18 seconds). This clip aimed to settle the participants for the experiment in case any strong and distracting feelings might have been present. The ratings on valence (before: M = 3.76, SD = 0.75, after: M = 3.41, SD = 0.71, Z = -4.33, p < .001) and arousal (before: M = 1.93, SD = 0.82, after: M = 1.78, SD = 0.81, Z = -1.99, p = .047) changed significantly from before to after the neutral clip, so that afterwards the mood of participants was ‘neutral’. Participants then completed 10 practice trials of the emotion recognition task, which included examples of all 10 emotional expressions of one encoder from the Mediterranean set of the ADFES [33] to familiarise participants in general with the task procedures. The 10 example videos of the Mediterranean encoder did not appear again in the experiment; only the Northern European set was used for validation. The answer screen was presented to participants before the practice started to familiarise them with the answer categories, and to ensure they understood all emotion terms. If participants did not understand a category, then definitions were provided from the Oxford English Dictionary. The answer screen contained a list of all the emotion category choices of anger, contempt, disgust, embarrassment, fear, happiness, neutral, pride, sadness, surprise. The answer choices were equally distributed over the screen in 2 columns and 5 rows and appeared in alphabetical order. 
The answer screen always appeared the same way throughout the experiment to avoid participants having to search for a term and thereby biasing response times. The mouse position was not fixed which allowed for proximity to answer categories to differ between trials. Participants used the mouse to click on their chosen answer on the screen when they made their decision, and were asked to respond immediately. Instruction was given to have their hand always on the mouse, so that when the answer slide appeared they could click immediately. The experiment consisted of 360 trials (12 encoders x 10 expressions x 3 intensities) presented in random order for each participant.

Each trial started with a fixation cross for 500ms prior to the presentation of each emotion video. After each video a blank slide appeared for 500ms followed by the answer screen. An infinite response time was chosen to avoid restricting participants in their answer time producing trials with no response. An accidental mouse click outside the valid answer choice fields on the response screen prompted it to re-appear to further avoid missing responses. No feedback about their answer was provided. The mouse cursor only appeared for the emotion labelling display within trials. This way, it could not serve as distraction from the video on the screen. The resolution for the experiment was set to 1024 x 768 on a 21-inch monitor and participants were seated approximately 60cm from the computer screen, which allowed the face stimuli to appear in approximately full-size (1024 x 768) to the participant similar to face-to-face social interactions outside the laboratory.

Questionnaires.

Two different questionnaires were included: the Symptom-Checklist 90-R (SCL-90-R; [63]) and the Autism Spectrum Quotient (AQ; [64]). The AQ data was not used in the analyses here and will be presented elsewhere. The SCL-90-R served as screening instrument for potential clinical disorders since a healthy sample was desired. The Global Severity Index (GSI), i.e. the SCL-90-R sum score, was used as a distress index and marker for potential caseness according to the developer’s suggestion of a GSI score equivalent to a T score of 63. Twenty-six participants scored outside the normal range for non-patients. To test whether or not those individuals influenced the overall accuracy of response, analyses were run with and without these participants included. Only minor changes of less than 1% resulted (total accuracy without ‘cases’ = 69.47% vs. with ‘cases’ = 68.81%); a 1-sample t-test did not identify the difference as significant (p > .05). Therefore, results are presented including those participants.

Procedure.

The testing session was conducted in a quiet laboratory at the University of Bath, with between one and four participants tested simultaneously. All participants underwent the computer task followed by the questionnaires, with participants wearing headphones throughout the testing. Participants were fully debriefed on completion of the study and compensated for participation or granted course credit where applicable. Ethical approval for this study was given by the University of Bath Psychology Ethics Committee and all participants gave written informed consent prior to participation.

Dependent variables

DV 1: Raw hit rates referred to the percentage correct out of the total number of trials for each category. Since there were 10 answer choices on each trial, the chance level of response was 10%. Recognition rates above 10% were therefore necessary for a category to be considered recognisable at a greater than chance level of responding. No data were excluded.

DV 2: Unbiased hit rates (Hu) were calculated based on the formula provided by Wagner [59] for each emotion category and each emotion category across all intensity levels. The formula corrects the accuracy rates by the possibility of choosing the right emotion label by chance as well as answering habits and produces so called ‘unbiased hit rates’, which additionally have the advantage of making the results comparable across studies. In facial emotion recognition tasks where multiple answer choices are provided, there is the possibility for the participant to choose the correct emotion label by chance, which biases the accuracy rates (percentage correct out of all presentations for a category). In addition, answering habits can occur, which also bias the results. An extreme example for such an answering habit would be that a participant assigns one specific emotion category to any sort of emotional display, e.g. always surprise for all fear and surprise displays. This would result in a perfect score for surprise, but does not reflect the ability to recognise surprise, as all fear presentations would be misattributed as surprise. To account for those potential biases, Elfenbein and Ambady [65] advised to use the formula proposed by Wagner [59] for multiple choice facial emotion recognition tasks. No data were excluded.
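
As an illustration of this correction, the following minimal sketch computes Hu from a confusion matrix, assuming the usual statement of Wagner's [59] formula: the squared number of hits for a category divided by the product of the number of stimuli presented for that category and the number of times that response label was used. The function name and the toy counts (36 presentations per emotion, i.e. 12 encoders x 3 intensities) are illustrative and are not data from this study.

```python
from typing import Dict


def unbiased_hit_rates(confusion: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Compute Hu per category from confusion[stimulus][response] = count."""
    categories = list(confusion)
    # How often each label was chosen as a response, across all stimuli.
    response_totals = {
        c: sum(confusion[s].get(c, 0) for s in categories) for c in categories
    }
    hu = {}
    for c in categories:
        hits = confusion[c].get(c, 0)
        stimulus_total = sum(confusion[c].values())
        denominator = stimulus_total * response_totals[c]
        # A label that was never used cannot have any hits; report 0.
        hu[c] = hits ** 2 / denominator if denominator else 0.0
    return hu


# Toy answering habit from the text: every fear and surprise display is
# labelled "surprise" (hypothetical counts, 36 displays per emotion).
confusion = {
    "fear": {"fear": 0, "surprise": 36},
    "surprise": {"fear": 0, "surprise": 36},
}
raw_surprise = confusion["surprise"]["surprise"] / sum(confusion["surprise"].values())
hu = unbiased_hit_rates(confusion)
print(raw_surprise, hu["surprise"], hu["fear"])
```

In this toy case the raw hit rate for surprise is perfect, whereas Hu is halved because the surprise label was also applied to every fear display.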

DV 3: Response time referred to the time participants took to respond from the moment the answer screen was presented until the participant clicked the mouse on their answer choice. Mean response times were computed for each intensity level and emotion category. Only trials with correct responses were used in these analyses.
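
The restriction to correct trials can be sketched as below. The trial records, field names, and response times are hypothetical and serve only to show the filtering and averaging step; they are not data from this study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records for one participant (illustrative values only).
trials = [
    {"intensity": "low", "correct": True, "rt_ms": 1500},
    {"intensity": "low", "correct": True, "rt_ms": 1700},
    {"intensity": "low", "correct": False, "rt_ms": 2500},
    {"intensity": "high", "correct": True, "rt_ms": 900},
    {"intensity": "high", "correct": True, "rt_ms": 1100},
]


def mean_rt_correct(trial_list, key):
    """Mean response time per level of `key`, using correct trials only."""
    grouped = defaultdict(list)
    for trial in trial_list:
        if trial["correct"]:
            grouped[trial[key]].append(trial["rt_ms"])
    return {level: mean(times) for level, times in grouped.items()}


mean_rts = mean_rt_correct(trials, "intensity")
print(mean_rts)
```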

Results

DV 1: Raw hit rates.

The overall accuracy for the task was 69% (SD = 9.02). Taken together, all low intensity videos had a mean raw hit rate of 56% (SD = 11.11), 68% for intermediate intensity (SD = 10.51), and 75% for high intensity (SD = 9.94).

Most of the emotion categories were non-normally distributed with some left- and some right-skewed according to the histograms, and transformations did not normalise the data. Because repeated measures ANOVA is robust to violations of normality [66,67], it was conducted nonetheless. A 3 (intensities) x 9 (emotions) repeated measures ANOVA with Greenhouse-Geisser adjustment of degrees of freedom was applied due to violation of sphericity. Neutral was excluded from this analysis, since it does not have varying intensities of expression. There was a significant main effect of intensity (F(1.72, 156.62) = 491.80, p < .001, partial η² = .844, power = 1.000) and pairwise comparisons showed the intensity levels were all significantly different from each other (p’s < .001) (see Fig 2).

Fig 2. Raw and unbiased hit rates in percentages for the 3 intensity levels.

Error bars represent standard errors of the means.

https://doi.org/10.1371/journal.pone.0147112.g002

The main effect of emotion was significant (F(5.72, 520.43) = 94.81, p < .001, partial η² = .510, power = 1.000) (see Fig 3). Pairwise comparisons showed that the raw hit rates of the emotion categories were significantly different from each other (p’s < .028), with only a few exceptions: disgust and embarrassment did not differ significantly from each other (p = .856), nor did disgust and fear (p = .262) or embarrassment and fear (p = .281). The means and standard deviations of the raw hit rates for the 9 emotion categories and neutral are presented in Table 1.

Fig 3. Raw and unbiased hit rates in percentages for the 10 emotion categories.

Error bars represent standard errors of the means.

https://doi.org/10.1371/journal.pone.0147112.g003

Table 1. Raw Hit Rates (H) and Unbiased Hit Rates (Hu) for the 10 Emotion Categories.

https://doi.org/10.1371/journal.pone.0147112.t001

The intensity x emotion interaction was significant (F(10.99, 999.93) = 20.14, p < .001, partial η² = .181, power = 1.000) (see Fig 4). Pairwise comparisons showed that the raw hit rates of the intensity levels within each emotion were significantly different from each other for most of the emotions (p’s < .037). The exceptions were sadness, where the intermediate intensity was not significantly different from the high intensity (p = .154), and surprise, where the low intensity was not significantly different from the intermediate intensity (p = .103). Pairwise comparisons were also conducted comparing the raw hit rates of the emotions to each other within each intensity level. Most emotions were significantly different from each other (p’s < .042). At low intensity, anger was not significantly different from disgust (p = .709), nor were contempt and pride (p = .411), embarrassment and fear (p = .095), or happiness and sadness (p = .174). At intermediate intensity, anger and sadness were not significantly different from each other (p = .190), nor were disgust and embarrassment (p = .399), disgust and fear (p = .364), embarrassment and fear (p = .950), or happiness and surprise (p = .210). At high intensity, anger and embarrassment were not significantly different from each other (p = .840), nor were disgust and fear (p = .979), embarrassment and sadness (p = .695), or happiness and surprise (p = .256). Table 2 shows the means and standard deviations of the raw hit rates for each emotion category at each intensity level.

Fig 4. Raw hit rates in percentages for the 9 emotion categories at each of the 3 intensity levels.

Error bars represent standard errors of the means.

https://doi.org/10.1371/journal.pone.0147112.g004

Table 2. Raw Hit Rates (H) for the Emotion Categories by Intensity.

https://doi.org/10.1371/journal.pone.0147112.t002

One sample t-tests were conducted to test whether the raw hit rates for each of the 27 categories were significantly different from chance level of responding (10%). With a Bonferroni-corrected p value of .002, all categories were recognised above chance (t(91)’s > 6.29, all p’s < .001).
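The corrected threshold and the test statistic can be sketched as below. The family-wise α of .05 is an assumption (it is not stated in this section, though .05 / 27 ≈ .0019 agrees with the reported .002), and the accuracy scores are made-up values used only to show the computation; obtaining a p value would additionally require the t distribution with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def one_sample_t(scores, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(scores) - mu0) / standard_error, n - 1

# Assumed family-wise alpha of .05 spread over the 27 emotion-by-intensity tests
alpha_corrected = bonferroni_alpha(0.05, 27)

# Hypothetical per-participant hit rates for one category, tested against 10% chance
hypothetical_scores = [0.50, 0.60, 0.40, 0.70, 0.55, 0.45]
t_value, df = one_sample_t(hypothetical_scores, 0.10)
```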

DV 2: Unbiased hit rates.

The overall accuracy of response (unbiased hit rates) for the 360 videos was 53% (SD = 11.28). Collapsed across emotions, the unbiased hit rate was 43% (SD = 11.51) for the low intensity videos, 54% (SD = 12.00) for the intermediate intensity videos, and 63% (SD = 12.54) for the high intensity videos.

Most of the emotion categories were non-normally distributed, with some left- and some right-skewed according to the histograms. Transformations did not normalise the data, but a repeated measures ANOVA was conducted because it is robust to normality violations [66, 67]. A 3 (intensities) x 9 (emotions) repeated measures ANOVA was conducted with Greenhouse-Geisser adjustment of degrees of freedom due to violation of sphericity. There was a significant main effect of intensity (F(1.84, 167.49) = 319.62, p < .001, partial η² = .778, power = 1.000), and pairwise comparisons showed that the intensity levels were all significantly different from each other (p’s < .001) (see Fig 2).

The main effect of emotion was significant (F(4.96, 451.29) = 61.49, p < .001, partial η² = .403, power = 1.000) (see Fig 3). Pairwise comparisons showed that anger, disgust, embarrassment, fear, and happiness were not significantly different from each other (p’s > .172); all other categories were significantly different from each other (p’s < .001). The means and standard deviations of the unbiased hit rates for the 9 emotion categories and neutral are presented in Table 1.

The intensity x emotion interaction was significant (F(10.71, 974.81) = 11.14, p < .001, partial η² = .109, power = 1.000) (see Fig 5). Pairwise comparisons were conducted to compare the unbiased hit rates of the three intensity levels within each emotion. Significant differences were found for most of the emotions (p’s < .014); only for disgust were the accuracies at low and intermediate intensity not significantly different (p = .414). Pairwise comparisons were also conducted comparing the unbiased hit rates of the emotions to each other within each intensity level. Most emotions were significantly different from each other (p’s < .042). At low intensity, anger was not significantly different from embarrassment (p = .705), fear (p = .590), or happiness (p = .086), nor were embarrassment and fear (p = .885), embarrassment and happiness (p = .182), or fear and happiness (p = .287). At intermediate intensity, anger was not significantly different from fear (p = .072) or happiness (p = .899), disgust was not significantly different from embarrassment (p = .660) or fear (p = .250), embarrassment was not significantly different from fear (p = .433), and sadness and surprise were not significantly different from each other (p = .114). At high intensity, anger was not significantly different from embarrassment (p = .128), fear (p = .581), or happiness (p = .191), disgust was not significantly different from fear (p = .529), and embarrassment was not significantly different from happiness (p = .543) or sadness (p = .851), nor were fear and happiness (p = .083) or happiness and sadness (p = .384). Table 3 shows the descriptive statistics of the unbiased hit rates for each emotion at each intensity level.

Fig 5. Unbiased hit rates in percentages for the 9 emotion categories at each of the 3 intensity levels.

Error bars represent standard errors of the means.

https://doi.org/10.1371/journal.pone.0147112.g005

Table 3. Unbiased Hit Rates (Hu) for the Emotion Categories by Intensity.

https://doi.org/10.1371/journal.pone.0147112.t003

One sample t-tests were conducted to test whether the unbiased hit rates for each of the 27 categories were significantly different from chance level (10%). With a Bonferroni-corrected p value of .002, all categories were recognised above chance (t(91)’s > 4.95, all p’s < .001) except for contempt at low intensity (t(91) = 2.95, p = .004).

DV 3: Response times.

Inspection of the Shapiro-Wilk statistics revealed that the response time data for the intensities (correct trials only) were non-normally distributed (low: S-W = .92, df = 92, p < .001; intermediate: S-W = .93, df = 92, p < .001; high: S-W = .93, df = 92, p < .001). The data were therefore normalised using log transformation (low: S-W = .98, df = 92, p = .164; intermediate: S-W = .99, df = 92, p = .480; high: S-W = .99, df = 92, p = .528).

A repeated measures ANOVA for intensity with its three levels revealed a main effect of intensity (F(2, 182) = 120.38, p < .001, partial η² = .569, power = 1.000). Pairwise comparisons showed that the intensities were all significantly different from each other (p’s < .001). Participants took about 100ms longer to respond to low intensity (M = 1075ms, SD = 330) than to intermediate intensity (M = 953ms, SD = 296), and about 100ms longer to respond to intermediate intensity than to high intensity (M = 850ms, SD = 240) (see Fig 6).

Fig 6. Response latencies (in ms) to the three intensity levels of the ADFES-BIV videos from study 1 and the first-last videos from study 2.

Error bars represent standard errors of the means.

https://doi.org/10.1371/journal.pone.0147112.g006

The mean response times for each emotion category were calculated based on correct responses only: neutral (M = 664ms, SD = 145), happiness (M = 816ms, SD = 280), surprise (M = 849ms, SD = 285), sadness (M = 926ms, SD = 370), anger (M = 992ms, SD = 385), disgust (M = 994ms, SD = 451), fear (M = 1156ms, SD = 413), embarrassment (M = 1022ms, SD = 429), pride (M = 1023ms, SD = 937), contempt (M = 1700ms, SD = 1054) (see Fig 7).

Fig 7. Response latencies to the ten emotion categories of the ADFES-BIV in ms.

Error bars represent standard errors of the means.

https://doi.org/10.1371/journal.pone.0147112.g007

Discussion

Study 1 has shown, as hypothesised, that the mean accuracies for the emotion categories at each intensity level were well above chance for the 27 categories of the raw hit rates, as well as for the unbiased hit rates with the exception of contempt at low intensity. In line with the prediction, the emotion categories differed in accuracies and response latencies, with emotions of high recognition also yielding fast responses and vice versa. Results also showed differences between levels of intensity of the expressions in both the unbiased hit rates and raw hit rates, with the lowest mean accuracy for the low intensity expressions (raw: 56%, unbiased: 43%) compared to the intermediate intensity expressions (raw: 68%, unbiased: 54%), which were in turn lower than the high intensity expressions (raw: 75%, unbiased: 63%). The current study further found that the intensity of facial expressions has an influence on response times, as responses were significantly faster, by about 100ms, at each successive intensity level. The fastest responses were given to high intensity expressions and the slowest responses to low intensity expressions, also in line with the prediction.

The differences in accuracies and response latencies across intensity levels can be explained by the varying difficulty of recognising the expressions at varying intensities of expression. That is, facial emotional expressions of high intensity are easier to recognise than those of low intensity, which is reflected in higher accuracy and faster responses to high intensity expressions and lower accuracy and slower responses to low intensity expressions. Facial emotional expressions are harder to recognise at lower intensities because those expressions contain fewer cues that can be used for decoding.

However, there were differences between the intensities in the display time of the emotional expressions seen by participants. In the low intensity videos of the ADFES-BIV the emotional expression was visible for less time than in the intermediate and high intensity videos, and the intermediate intensity videos had the expression displayed for less time than the high intensity videos. The resulting differences in processing time, rather than the intensity of the facial expressions, could underlie the results. To address this issue, versions of each video from the ADFES-BIV were created such that the last frame of the emotion was visible for exactly the same amount of display time across low, intermediate, and high intensity. If the amount of time the expression was displayed was causing the differences in accuracy rates and response latencies across intensity levels, these differences should disappear with this variation of the videos, as the display times were equated. If, instead, it is the degree of intensity that is important rather than the amount of time the expression is displayed, then the same differences in accuracy rates and response latencies should be evident across the intensity levels as in Study 1.

Study 2

Study 2 aimed to validate the results from Study 1 that the intensity levels differ from each other in accuracy and response latencies by controlling for exposure time. A first-last approach, in which only the first and last frames of the videos are shown, was chosen for developing the control stimuli of the ADFES-BIV. Although this means that the progression and temporal characteristics of the individual expressions are lost, the perception of motion remains due to the change from neutral to emotional facial expression. Since temporal characteristics are argued to be part of emotion representation [36] and therefore aid recognition [35, 68], the first-last approach leads to lower accuracy rates than complete dynamic sequences (see [34, 68]). Therefore, lower accuracies were expected for the control stimuli than for the ADFES-BIV, but with the same pattern of recognition and response times: highest accuracy rates and fastest responses to the high intensity videos, lowest accuracy and slowest responses to the low intensity videos.