Spontaneous Facial Mimicry Is Enhanced by the Goal of Inferring Emotional States: Evidence for Moderation of “Automatic” Mimicry by Higher Cognitive Processes

Aiko Murata; Hisamichi Saito; Joanna Schug; Kenji Ogawa; Tatsuya Kameda

doi:10.1371/journal.pone.0153128

Abstract

A number of studies have shown that individuals often spontaneously mimic the facial expressions of others, a tendency known as facial mimicry. This tendency has generally been considered a reflex-like “automatic” response, but several recent studies have shown that the degree of mimicry may be moderated by contextual information. However, the cognitive and motivational factors underlying the contextual moderation of facial mimicry require further empirical investigation. In this study, we present evidence that the degree to which participants spontaneously mimic a target’s facial expressions depends on whether participants are motivated to infer the target’s emotional state. In the first study we show that facial mimicry, assessed by facial electromyography, occurs more frequently when participants are specifically instructed to infer a target’s emotional state than when given no instruction. In the second study, we replicate this effect using the Facial Action Coding System to show that participants are more likely to mimic facial expressions of emotion when they are asked to infer the target’s emotional state, rather than make inferences about a physical trait unrelated to emotion. These results provide convergent evidence that the explicit goal of understanding a target’s emotional state affects the degree of facial mimicry shown by the perceiver, suggesting moderation of reflex-like motor activities by higher cognitive processes.

Citation: Murata A, Saito H, Schug J, Ogawa K, Kameda T (2016) Spontaneous Facial Mimicry Is Enhanced by the Goal of Inferring Emotional States: Evidence for Moderation of “Automatic” Mimicry by Higher Cognitive Processes. PLoS ONE 11(4): e0153128. https://doi.org/10.1371/journal.pone.0153128

Editor: Elisabetta Palagi, University of Pisa, ITALY

Received: October 8, 2015; Accepted: March 24, 2016; Published: April 7, 2016

Copyright: © 2016 Murata et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: The study design, data collection and analysis were supported by Japan Society for the Promotion of Science, Grant-in-aid for Scientific Research on Innovative Areas (#25118004, PI: Tatsuya Kameda). Preparation of the manuscript was supported by Japan Society for the Promotion of Science, Topic-Setting Program to Advance Cutting-Edge Humanities and Social Sciences Research (PI: Tatsuya Kameda).

Competing interests: The authors have declared that no competing interests exist.

Introduction

In human societies there is a continual need to coordinate and cooperate with other non-kin individuals in a wide range of social settings. To coordinate effectively with others while minimizing the potential risk of exploitation, individuals must accurately understand the intentions and emotions of others [1]. While in some instances intentional effort is required to infer the thoughts and feelings of others, in many cases it seems that people can understand each other’s feelings quickly and effortlessly [2]. Spontaneous facial mimicry is considered a key process in the quick and effortless understanding of others’ feelings, as well as in the fostering of bonding with partners [3], along with other forms of physiological mimicry such as synchronization of heartbeat [4] and pupil diameter [5].

Spontaneous facial mimicry could be induced by motor resonance mechanisms grounded in automatic perception-action coupling in the sensorimotor regions [6]. The discovery of “mirror neurons” in monkeys, which are activated during both action observation and production [7], as well as human brain regions having similar properties [8], have given neurophysiological support to direct action-perception matching. In line with these neural findings, a number of studies suggest that facial mimicry may be an automatic, fast, reflex-like mechanism beyond intentional control. For instance, Dimberg and colleagues [9] and Bailey and Henry [10] showed that participants exhibited facial mimicry even when facial stimuli were presented subliminally, and that participants’ muscular movements started within 500ms after stimulus onset [9]. These results suggest that the process occurs largely outside of conscious control. The occurrence of rapid mimicry has also been recently demonstrated in many other non-human mammals, including apes [11, 12], monkeys [13], and dogs [14]. It is also observable quite early in development, as early as the neonatal stage in humans [15] as well as in chimpanzees [16], suggesting that facial mimicry occurs automatically as a reflex-like reaction.

Recently, however, several studies on humans have demonstrated that facial mimicry is affected by socio-ecological factors, including the social relationship between the sender and the receiver, group membership, and so on (see [17] for a recent comprehensive review). For example, Bourgeois and Hess showed that people tended to mimic the facial expressions of in-group members more frequently than those of out-group members [18]. Hofman and colleagues also demonstrated that facial mimicry was affected by the target’s reputation for fairness: compared to a baseline, participants exhibited greater facial mimicry when angry faces of unfair opponents were shown, while mimicry decreased when angry faces of fair opponents were shown [19].

These findings indicate that spontaneous facial mimicry may be moderated by the observers’ tasks or relational goals in social contexts [3, 17]. For example, correctly identifying the emotional states of in-group members, with whom we exchange key resources regularly, is presumably more important than understanding the emotional state of out-group members, with whom we are likely to have little or no contact. Likewise, when interacting with individuals known to have engaged in unfair or dishonest behavior, the necessity for vigilance against potential exploitation and aggression is enhanced. Thus it may be the case that mimicking the negative emotions of unfair targets may prepare us to guard against a potentially exploitative interaction, while matching the anger of fair individuals may be damaging to a potentially beneficial interaction. Although this interpretation is highly speculative, such differential incentive structures related to specific socio-ecological contexts [20, 21] may have contributed to the differential mimicry levels of negative emotions between the fair and unfair individuals observed by Hofman and colleagues [19].

Here we investigate the hypothesis that spontaneous facial mimicry may be moderated by the observer’s goal of understanding a target’s emotional state. Although it has been demonstrated that blocking observers’ facial muscle activity impairs their ability to recognize a target’s expressed emotions [22], there have been few studies that directly address the adjustment of mimicry level in response to the specific goal of understanding another’s emotional state. The only exception, as far as we know, is a study by Cannon, Hayes and Tipper [23], in which participants were explicitly asked to judge either the emotional states of targets (i.e., anger and happiness) or the color of tinted facial photographs. Results showed that participants exhibited greater facial mimicry when they engaged in the emotion-judgment task than in the color-judgment task. Here we aim to examine the robustness of this intriguing finding by extending the target facial stimuli to various emotional expressions beyond anger and happiness. As in Cannon and colleagues [23], we measure participants’ facial muscle activity while they view video clips of targets displaying facial expressions, but we use six target expressions rather then two: happiness, sadness, anger, disgust, fear, and surprise. In Study 1, we use electromyography (EMG) to examine the degree of facial mimicry exhibited by participants when they are explicitly instructed to infer the target’s emotional state, compared to when they receive no such instruction. In Study 2, we introduce another condition in which participants are instructed to infer non-emotional traits of the target (e.g., age, gender, physical attribute, or ethnicity) before the video presentation, and their facial muscle activity is assessed using the Facial Action Coding System (FACS), a less invasive procedure than EMG. We predict that the extent of participants’ facial mimicry will be greater when participants have the specific goal of inferring the targets’ emotional states, compared to when they receive no such instruction, or when they have another goal unrelated to emotional inference.

Study 1

Materials and Methods

Ethics statements.

Study 1 and Study 2 were both approved by the Institutional Review Board of the Center for Experimental Research in Social Sciences at Hokkaido University. Written informed consent was obtained from all participants before beginning the task.

Participants.

Fifty-two Japanese student volunteers (26 females and 26 males; mean age: 19.2 ± 1.1 years) at Hokkaido University in Sapporo participated in this experiment and received 1,000 yen (approximately US$10 at the time) as compensation for their participation. Electromyographic (EMG) data from two participants were excluded due to equipment failure, yielding a total of 50 participants (25 females and 25 males) for analysis.

Stimuli.

Twenty-four morphing video clips of emotional facial expressions were presented to each participant. Morphing video clips were created using facial photos of eight Japanese targets (4 females and 4 males with ages ranging from mid-20s to mid-30s) from the ATR Facial Expression Image Database DB99 (ATR-Promotions, Inc.). For each of six types of emotional expressions (happiness, sadness, anger, disgust, fear, and surprise), participants saw 4 video clips of two female and two male targets (see S1 Table for details about how the eight target persons were assigned to the six types of emotional expressions).

Facial EMG.

EMG recordings were performed while participants viewed the stimuli. Facial EMG was measured on the left side of the face, which has been shown to exhibit a higher mimicry rate as compared to the right side [24]. As shown in Fig 1, electrodes were placed according to the standard procedure [25]. The activities of four muscles of interest (see Fig 1) were measured using Ag / AgCl miniature surface electrodes (EL254S, BIOPAC Systems Inc.) with electrolyte gel (Elefix, Nihon Kohden). The skin was cleansed with disinfectant alcohol and pumice gel (Skin Pure, Nihon Kohden). An AcqKnowledge System with a band pass filter was used to exclude EMG signals outside the relevant range of 1-500Hz. The EMG signals were sampled at 200Hz, integrated with 12.5Hz, then rectified and averaged over 100ms intervals.

Download:

Fig 1. EMG electrode placement and emotional expression measurement in Study 1.

Activity of the Zygomaticus major was measured to assess smiling (related to happiness); activity of the Corrugator supercilii was measured to assess frowning (related to anger, disgust, sadness, and fear); activity of the Levator labii Superioris was measured to assess upper lip raising (related to disgust); and activity of the Lateral frontalis was measured to assess eyebrow raising (related to surprise).

https://doi.org/10.1371/journal.pone.0153128.g001

Procedure.

Participants were randomly assigned either to the Passive condition or the Emotion-Inference condition. After arriving in the laboratory, each participant was taken to a soundproof room and seated in front of a computer. The participant’s face was video-recorded using a camera mounted on the left side of the computer monitor (Qcam Orbit AF, Logitech) throughout the tasks, in order to determine whether facial or body movements irrelevant to facial expression (e.g., yawning, blinking) occurred.

The experimental task consisted of eight blocks, each containing between two and four trials. In each block, various emotional expressions (one per trial) of the same target (see S1 Table) were presented sequentially (see Fig 2). At the beginning of each block, participants were shown an introductory image for 5000ms, which consisted of a photo of the target’s smiling face and a self-introduction text in Japanese for the target (e.g., “My name is Hashimoto”), to familiarize participants with the target’s face before viewing the video clips in the following trials. The order of blocks and trials within each block were counterbalanced across participants.

Download:

Fig 2. Task flow in Study 1.

At the beginning of each block, an introduction-picture was presented, followed by 2–4 trials. (A) In the Emotion-Inference condition, the instruction “How does XXX (e.g., Hashimoto) feel?” was presented in Japanese. (B) In the Passive (control) condition, a fixation cross was presented. Reprinted from the ATR Facial Expression Image Database DB99 under a CC BY license, with permission from ATR-Promotions Inc., original copyright (2006).

https://doi.org/10.1371/journal.pone.0153128.g002

In the Emotion-Inference condition (see A in Fig 2), each trial started with a display of instructions in Japanese, which lasted for 2000ms, explicitly asking participants to infer the emotion felt by the target (e.g., “How does Hashimoto feel?”). In the Passive condition, a fixation cross was presented for 2000ms instead of the instructions (see B in Fig 2). Next, a video clip began with the first frame showing a neutral facial expression lasting for 2000ms, followed by a morph from a neutral to an emotional expression, during which the target’s facial expression changed gradually from neutral to full over 1000ms and stayed at the full expression for the remaining 900ms. After this, a blank screen was shown between trials for 5000ms. Thus, except for the instructions, the procedure and stimuli used were identical across the two conditions.

Data treatment and analysis.

In each trial, the EMG data collected during the 4000ms after the start of the video clip (3900ms for the duration of the clip plus an extra 100ms at the beginning of the waiting period) were z-transformed within each participant and each muscle to permit comparison of activities between the four muscles (see Fig 1). For each trial, the response window was the 1900ms interval during which the face changed from a neutral expression to a full expression. If the video of the participant’s face showed irrelevant facial activity during the response window (i.e., blinking, yawning or turning their eyes away), the associated EMG data were excluded. Because EMG wave amplitudes sometimes exhibit abnormal values due to equipment error [25], if the mean z-score of a participant’s muscle activity was deviant (i.e., more than 3 SD away) from the mean of the muscle activities averaged across all participants, the participant’s data from that muscle site were treated as missing values (though preliminary analyses including these data produced statistically the same conclusions). Each participant’s facial muscular response per trial was calculated by averaging muscle activity during the 1900ms interval after the morphing onset. If a participant mimicked the target’s facial expression, the activities of the facial muscles corresponding to the stimulus expression should be selectively enhanced. Therefore, in the following analysis, we compared the activities of “targeted muscles” used in the movements of each of the emotional-expression stimuli (see Fig 1), with the activities of “non-targeted muscles” during the response window, and we refer to this difference as the muscle “type.”

We used generalized linear mixed effects models (GLMM) to analyze EMG activity for each muscle type. Condition, muscle type (targeted vs. non-targeted) and emotion were entered as fixed effects. Because we had repeated measures from the same participants, and trials were nested within each participant, participants and trials were both treated as random effects in the models. Because facial muscular responses are measured as continuous values ranging from negative to positive, GLMMs were modeled with Gaussian distributions and fitted using the GLIMMIX procedure in SAS statistical software version 9.4 (SAS Institute, Cary, NC).

In the GLMM analysis, the models of all possible combinations of fixed factors and interactions were fitted and compared in terms of the degree of fit according to the Akaike information criterion ([26]; see S2 Table for details about the model selection). If occurrences of facial mimicry are moderated by the conditions as predicted, the best-fit model should include the interaction effect of condition (Passive vs. Emotion-Inference) and muscle type (targeted vs. non-targeted).

Results

Fig 3 shows z-scores of EMG activity for each muscle type (targeted/non-targeted) as a function of the six emotions and two conditions. Consistent with our hypothesis, targeted muscle activity was generally higher than non-targeted in the Emotion-Inference condition, while no such effects were evident in the Passive condition. The GLMM analysis supported this observation: the best-fit model (see S2 Table and S1 Fig for details about the model selection), contained the expected condition x muscle type interaction effect (F_{3, 4580} = 8.71, p < .0001; see S3 Table for parameter coefficients of the selected model). The effect of emotion was also significant (F_{5, 4580} = 2.62, p = .023), indicating that the magnitudes of muscular responses were different across the six emotions.

Download:

Fig 3.

Z scores of EMG activity by emotion for each muscle type (targeted or non-targeted) in (a) the Emotion-Inference condition (N = 26) and (b) the Passive condition (N = 24). Error bars represent standard error of the mean.

https://doi.org/10.1371/journal.pone.0153128.g003

To investigate in more detail how the emotion-inference goal may moderate facial mimicry, we examined changes in the EMG signals over time. Fig 4 shows the time course of EMG signals of the two types of muscle (targeted vs. non-targeted) under the two conditions. In the Emotion-Inference condition, differences in EMG activities between the two types of muscle started to emerge at about 500ms after the onset of morphs. In contrast, the activity of the targeted muscles in the Passive condition remained indistinguishable from the activity of the non-targeted muscles. This indicates that, when participants were instructed to infer the target’s emotion, facial mimicry measured as muscular EMG activity emerged rapidly, immediately after the stimulus (morphing) onset.

Download:

Fig 4.

Time course of EMG activity of targeted and non-targeted muscles from morphing onset in (a) the Emotion-Inference condition and (b) the Passive condition. The horizontal axis shows time elapsed from morphing onset (in milliseconds), and the vertical axis shows z-score of EMG amplitude for each muscle type. Error bars represent standard error of the mean at each time point.

https://doi.org/10.1371/journal.pone.0153128.g004

Study 2

We conducted Study 2 to further test the validity of the moderation effect of goal-setting on facial mimicry as observed in Study 1, with some methodological modifications. The first modification in Study 2 was the use of the Facial Action Coding System (FACS) to assess the facial expressions displayed by participants in response to stimuli. While the EMG measurements used in Study 1 are known to be a sensitive method of detecting both visible and invisible muscular activity, EMG signals may also include noise from body movements irrelevant to emotional expression (e.g., eye blinks, yawns, etc.). Indeed, given the potential for noise in the data, the number of trials per emotion used in Study 1 was relatively low compared to typical EMG emotion research [9, 18, 19], although we have addressed this problem statistically by using a multi-level model that fully captured the nested structure of the data set. Because FACS is a method to specifically code visible facial muscular movements, it allows us to accurately evaluate facial activity with less noise than EMG, as well as to examine whether the goal-dependent moderation of mimicry is evoked in externally visible facial reactions.

Furthermore, we could not completely dismiss the possibility that the effects of the goal-related instructions in Study 1 were merely caused by differences in participants’ levels of concentration between passively viewing morphs and being asked to actively attend to the target’s face, rather than the motivation to infer the target’s emotion. In other words, it is possible that the participants in the Passive condition who received no specific instruction may simply have been less engaged in the task compared with participants in the Emotion-Inference condition. To address this potential problem, the second modification in Study 2 was to ask participants to respond to questions about the facial morph stimuli in both the experimental and control conditions. For the control, we asked participants to reply to questions about external traits that were irrelevant to emotional inference (i.e., age, gender, body shape, and ethnicity) but required the same level of concentration, to ensure that they were given similar motivation to engage in the task.