Recognition of Intensive Valence and Arousal Affective States via Facial Electromyographic Activity in Young and Senior Adults

Background
Research suggests that interaction between humans and digital environments characterizes a form of companionship in addition to technical convenience. To this effect, humans have attempted to design computer systems able to demonstrably empathize with the human affective experience. Facial electromyography (EMG) is one technique that enables machines to access human affective states. Numerous studies have investigated the effects of valence emotions on facial EMG activity captured over the corrugator supercilii (frowning muscle) and zygomaticus major (smiling muscle). The arousal dimension, however, has received little research attention. In the present study, we sought to identify intensive valence and arousal affective states via facial EMG activity.
Methods
Ten blocks of affective pictures were separated into five categories: neutral valence/low arousal (0VLA), positive valence/high arousal (PVHA), negative valence/high arousal (NVHA), positive valence/low arousal (PVLA), and negative valence/low arousal (NVLA), and the ability of each to elicit corresponding valence and arousal affective states was investigated at length. One hundred and thirteen participants were exposed to these stimuli while their facial EMG was recorded. A set of 16 features based on the amplitude, frequency, predictability, and variability of signals was defined and classified using a support vector machine (SVM).
Results
We observed highly accurate classification rates based on the combined corrugator and zygomaticus EMG, ranging from 75.69% to 100.00% for the baseline and five affective states (0VLA, PVHA, PVLA, NVHA, and NVLA) in all individuals. There were significant differences in classification accuracy between senior and young adults, but there was no significant difference between female and male participants.
Conclusion
Our research provides robust evidence for recognition of intensive valence and arousal affective states in young and senior adults. These findings contribute to the successful future application of facial EMG for identifying user affective states in human-machine interaction (HMI) or companion robotic systems (CRS).


Introduction
Alongside the rapid and extensive development of interactive devices such as tablets and smartphones, the interaction between humans and their digital environment has become not only a technical activity, but also a form of empathic companionship; the literature characterizes this relationship as human-computer interaction (HCI) or, more generally, human-machine interaction (HMI). This type of relationship provides not only passive functionalities, but also, ideally, functions based on perception of the user's implicit current needs, responses, preferences, coherence, and intention [1][2]. Such a companionship-by-design system should be able to adapt completely and individually to the user, and reflect the user's situation and emotional states; in other words, the system should have the ability to be empathetic [3][4][5].
To achieve an ideal technical companionship system, the user's affective experience is a vital consideration; accurate and appropriate recognition of the user's emotional states is paramount to the success of this type of technology. In human-human interaction (HHI), individuals recognize emotional states using innate sensors for facial expression, gestures, eye contact, language, speed or tone of speaking, and other natural indicators [6]. A computer has no such innate ability to perceive emotional states. Numerous signals relevant to emotional responses are, however, measurable by devices such as cameras, microphones, and sensors [7]. For example, small electrodes attached to the skin can accurately detect psychobiological changes indicative of the human emotional experience, giving the machine access to human emotional states. One such psychophysiological technique, facial EMG, represents a robust and effective method of recognizing human affective states in HMI [8][9].
Affective responses involve subjective experience, central and peripheral nervous system changes, and behaviors (e.g., facial expressions, gestures, and vocal characteristics) [10]. Facial expressions, specifically, are innate and untrained reactions to affective states [11] which enable us to recognize and communicate emotions transiently as we interact with other people. Affective science research has nevertheless struggled to measure the affective states of human beings quantitatively [10]. Two primary issues are related to this problem (and, as such, to the present study's primary objectives). First, there is no standardized model for evaluating or interpreting emotions or affects [12][13]; however, as Mauss and Robinson [10] suggested, "measures of affective responses seem to be structured along dimensions rather than discrete emotions". Second, due to the sizeable (and rapidly growing) proportion of senior adults in many societies, particularly in Western countries, companion technology is highly desirable because of a shortage of qualified healthcare personnel [9]. A meta-analytic study reviewing emotions and aging suggested that seniors are worse than younger adults at recognizing emotional states, and that there is a general declining trend in emotion recognition with age [14], though the results of this study may not have been fully valid. Aging nonetheless plays an essential role in detecting affective responses.
The tridimensional theory of emotion, which is commonly applied in studies on this subject, evaluates affective states according to valence, arousal, and dominance (VAD). This approach can be dated back to Wundt [15]. "Valence" describes affective states from highly negative (unpleasant) to highly positive (pleasant); "arousal" measures the intensity of affective states ranging from highly calm to highly excited or alert; and "dominance" represents the feeling of being controlled or influenced by external stimuli [16][17]. Lang, Rice, and Sternbach assumed that emotion is comprised of these three dimensions [18]; and research has shown that dominance is highly correlated with the dimension of valence [16]. As the proponents of this theory have suggested, the tridimensional model of emotion can be reduced to two orthogonal dimensions [19][20][21]: valence and arousal, in which all emotions can be classified. We adopted the two-dimensional theory of affective experience in this study.
A number of researchers have investigated the effects of valence alone on facial EMG. Their findings demonstrated that facial EMG captured over the corrugator supercilii (frowning muscle), which is associated with negative emotional expressions, and the zygomaticus major (smiling muscle), which is related to positive emotional expressions [22][23], can differentiate valence emotions and their intensities [24]. In these studies, corrugator EMG amplitude increased in response to negative affective stimuli and decreased with positive affective stimuli compared to neutral stimuli, whereas zygomaticus EMG amplitude increased during positive stimuli [25][26][27][28][29]. In addition, zygomaticus EMG amplitude was not shown to discriminate neutral and negative emotions; thus, corrugator and zygomaticus EMG activities can be considered indicators of negative and positive affective states, respectively [25][28].
To date, the dimension of arousal has not received much attention in facial EMG emotion recognition studies. Very limited evidence from previous studies has shown, though, that zygomaticus EMG differentiates positive valence/low arousal affective states from positive valence/high arousal affective states. In these studies, zygomaticus EMG activity was lower when participants viewed positive valence/low arousal affective images than when they viewed positive valence/high arousal ones [30], and high-arousal affective pictures elicited higher corrugator and zygomaticus EMG activities than low-arousal ones [31]. Corrugator and zygomaticus EMG could not independently differentiate affective states along the arousal dimension [28], although arousal may modulate facial EMG responses when participants are confronted by visual stimuli. Therefore, taking both valence and arousal into account simultaneously is arguably necessary for detecting their compound effects on facial EMG.
Research has also shown that increasing age is correlated with a general decline in physiological functions [32][33], and many previous studies have indicated lower physiological responses in senior adults than in young adults when exposed to affective stimuli [34]. Along with the decline in physiological functions, increased positive well-being, affect control, and affect regulation have been observed in senior compared to young adults [35][36][37]. Influenced by the "positivity effect", seniors typically favor positive stimuli, whereas negative stimuli are more readily recognized by young adults [38]. Senior adults in certain situations experience greater affective responses than young adults do, although evidence suggests weaker affective reactions in senior adults than in young adults in most contexts [39]. Some researchers have claimed that there are no significant age variations in either corrugator or zygomaticus EMG in relation to valence affective states [40], whereas others have identified lower overall corrugator EMG activity in response to affective pictures in senior adults, regardless of valence [41]. Another study found that increasing age is associated with decreased corrugator EMG amplitude in response to neutral stimuli [42]. Evidence for age differences in facial EMG during intensive valence and arousal affective states, however, is scarce in the literature.
In brief, the primary goal of the present study was to assess the potentially distinct effects of intensive valence and arousal affective states on facial EMG detected over the corrugator supercilii and zygomaticus major of study participants. The secondary objective was to investigate differences in EMG activity between senior and young adults, and between female and male participants, in different affective states. We employed a case study followed by classification methods and statistical analysis to achieve these goals.

Materials and Methods

Participants
Seventy young adults (from 20 to 40 years of age, mean = 24.57, SD = 4.37) and 43 senior adults (from 52 to 77 years of age, mean = 64.30, SD = 7.16) participated in this study. The data from two subjects in the young group and three in the senior group were excluded from subsequent analysis due to technical problems (i.e., movements or artifacts). The final sample size was 108 (69 females, 39 males). All participants were healthy and had normal or corrected-to-normal vision. The experiment was designed and implemented according to the ethical guidelines of the University of Ulm and was approved by the university's Ethical Committee (number: 245/08-UBB/se).

Stimuli
To elicit intensive and sustained affective states, which are not easily induced by a static single image in laboratory situations, the stimuli consisted of 10 picture blocks. In each block, 10 affective images with similar rating scores on valence and arousal were combined. Thus, a total of 100 pictures were chosen from the International Affective Picture System (IAPS) [43] and the Ulm pictures [44]; both sets are well designed, standardized, commonly employed for eliciting emotional states along three dimensions (VAD), and often utilized in studies on affective science or affective computing. The content of the pictures ranged from daily experiences (e.g., household furniture) intended to elicit neutral and calm emotions, to extreme encounters (e.g., severe human injuries) that induce highly negative, high-arousal emotions, or to erotic images intended to produce highly positive, high-arousal emotions. The ten picture blocks were divided into five categories (i.e., two picture blocks with similar valence and arousal ratings corresponded to each of the five affective states) according to the standardized rating scores: 1) 0VLA, neutral valence (4.96 ± 0.25) and low arousal (2.75 ± 0.64); 2) NVLA, negative valence (3.74 ± 0.43) and low arousal (3.63 ± 0.51); 3) NVHA, negative valence (2.20 ± 0.64) and high arousal (6.66 ± 0.57); 4) PVLA, positive valence (7.58 ± 0.39) and low arousal (3.26 ± 0.47); and 5) PVHA, positive valence (7.00 ± 0.57) and high arousal (6.50 ± 0.51). Fig 1 shows the location of these five categories of affective states in the valence-arousal space.

Procedure
Participants were seated in a comfortable reclining chair in a sound-attenuated room of the Emotion Lab, Ulm, after being introduced to the experiment and signing an informed consent form. EMG sensors were attached over the relevant facial muscles; afterwards, participants were asked to relax, keep still, and pay attention to the image blocks for the duration of the experiment.
Ten affective image blocks were presented randomly on a 17-inch monitor. Each block, carrying the same probability without any repetition, was presented for 20 s, during which each of the 10 pictures was shown for 2 s continuously. There was a fixed pause of 20 s between blocks (Fig 2). Facial EMG activity was recorded throughout the experiment.
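The presentation timing described above can be sketched as a simple schedule generator. This is only an illustration under stated assumptions: the first block is taken to start at 0 s, no pause is counted after the last block, and the block labels (category plus a "-1"/"-2" suffix) are hypothetical names, not identifiers from the study.

```python
import random

BLOCK_S = 20   # each block lasts 20 s (10 pictures x 2 s each)
PAUSE_S = 20   # fixed pause between consecutive blocks
CATEGORIES = ["0VLA", "NVLA", "NVHA", "PVLA", "PVHA"]


def build_schedule(seed=None):
    """Return a list of (onset_s, offset_s, block_label) in presentation order."""
    rng = random.Random(seed)
    # Two blocks per category; the "-1"/"-2" suffixes are hypothetical labels.
    blocks = [f"{cat}-{i}" for cat in CATEGORIES for i in (1, 2)]
    rng.shuffle(blocks)  # random order, every block shown exactly once
    schedule = []
    onset = 0            # assumption: the first block starts at t = 0 s
    for label in blocks:
        schedule.append((onset, onset + BLOCK_S, label))
        onset += BLOCK_S + PAUSE_S
    return schedule


schedule = build_schedule(seed=1)
total_s = schedule[-1][1]  # time at which the last block ends
```

Under these assumptions the stimulus sequence spans 10 blocks and 9 pauses, i.e., 380 s from the first onset to the last offset.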

EMG Data Acquisition
A NeXus-32 physiological measurement system (NeXus-32, Mind Media, Roermond-Herten, Netherlands) running on a desktop computer was used for the acquisition of facial EMG. The software package Biobserve Spectator (version 2.4.0.5, BIOBSERVE GmbH, Bonn, Germany) was used to record the trigger and psychophysiological data. Facial EMG signals were captured with bipolar miniature silver/silver chloride (Ag/AgCl) skin electrodes 4 mm in diameter with gel-filled attachment spaces. Bipolar electrodes were placed on the participants' left corrugator supercilii and zygomaticus major muscle regions (see Fig 3, EMG signal acquisition) according to the guidelines for EMG placement recommended by Fridlund and Cacioppo [45]. Facial EMG signals were recorded at a sampling rate of 512 Hz.

EMG Signal Processing and Data Reduction
As shown in Fig 3, raw facial EMG signals were filtered offline by a 20-250 Hz band-pass Butterworth filter (order = 4) to exclude motion-related components, and an adaptive filter was applied to remove the 50 Hz power line interference [46]. In order to classify the stimulated affective states using facial EMG, further processing methods including the empirical mode decomposition (EMD) technique and the Hilbert Spectrum (HS) were employed. Data from single (corrugator or zygomaticus) and combined (corrugator and zygomaticus) site(s) were submitted to classification. All the processing, analyses, and machine learning were conducted using the MATLAB software package (versions R2009a and R2015a, Mathworks Inc., Natick, MA, USA).
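To make the power-line removal step concrete, the following is a minimal sketch of one common adaptive approach: a two-weight least-mean-squares (LMS) noise canceller with sine and cosine references at 50 Hz. The text does not say which adaptive algorithm reference [46] uses, so the LMS choice, the step size `mu`, and the function name are assumptions made purely for illustration; the Butterworth band-pass stage is not reproduced here.

```python
import math


def lms_hum_canceller(signal, fs=512.0, f_hum=50.0, mu=0.01):
    """Remove a sinusoidal interference at f_hum with a two-weight LMS filter.

    Returns the cleaned signal, which is the LMS error sequence.
    """
    w_cos, w_sin = 0.0, 0.0  # adaptive weights, initialised to zero
    cleaned = []
    for n, d in enumerate(signal):
        phase = 2.0 * math.pi * f_hum * n / fs
        x_cos, x_sin = math.cos(phase), math.sin(phase)  # reference inputs
        hum_estimate = w_cos * x_cos + w_sin * x_sin
        e = d - hum_estimate             # input minus the current hum estimate
        w_cos += 2.0 * mu * e * x_cos    # LMS weight updates
        w_sin += 2.0 * mu * e * x_sin
        cleaned.append(e)
    return cleaned


# Synthetic check (not study data): a pure 50 Hz hum sampled at 512 Hz.
fs = 512.0
hum = [0.8 * math.sin(2.0 * math.pi * 50.0 * n / fs + 0.3) for n in range(2048)]
residual = lms_hum_canceller(hum, fs=fs)
```

Because the two references span every amplitude and phase of a 50 Hz sinusoid, the weights can converge to an exact match of the hum, and the residual decays towards zero. A smaller `mu` narrows the notch but slows convergence.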

Data Classification
In order to verify the possibility of discriminating distinct affective states (0VLA, PVHA, NVHA, PVLA, NVLA) and also the baseline (forming, in effect, a six-class classification problem), the following sequence of steps for data processing was applied:
1. Feature extraction from particular regions of interest of the signals. One difficulty was defining the length (in seconds) of the region of interest, as there is no specific rule for choosing this length. We assessed distinct empirical combinations as described in Table 1, where similar and distinct periods for baseline and evoked emotions were adopted based on the assumption that evoked emotion could induce early (< 2 s) and/or late (≥ 2 s) changes in the observed time series. In total, 16 features were estimated from each region of interest, yielding a 16-D feature vector that was labeled according to the possible classes (0VLA, PVHA, NVHA, PVLA, NVLA, and baseline). Because two muscles (corrugator and zygomaticus) were examined in this study, the resulting feature vector had 16 features (from corrugator or zygomaticus alone) or 32 features (from corrugator and zygomaticus combined). These features were selected according to the literature [47]. The aim of feature extraction was to capture changes in the signals related to their amplitude, frequency, predictability, and variability. Table 2 presents a summary of the employed features, and a comprehensive description of the features is provided in a previously published study [48].
2. The fitcecoc method available in MATLAB, which fits multiclass models for support vector machines or other classifiers, was employed, and 10-fold cross-validation was used for classifier assessment.

[Table 2, excerpt. Frequency features: Fmed (median frequency: the frequency that bisects the area below the X waveform); Fmode (its fast Fourier transformation equation also applies to the frequency features that follow it). Variability features: Var.]
3. We estimated the success rate of each class, defined as the number of correctly classified patterns divided by the total number of patterns of the class. The entire classifier-training procedure and performance assessment was repeated 10 times to estimate the mean and the standard deviation of the success rate.
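The assessment procedure described above (multiclass coding, 10-fold cross-validation, per-class success rate, and mean and SD over repetitions) can be sketched in plain Python. This is not the authors' MATLAB code. It assumes one-versus-one coding, which is the documented default of fitcecoc, although the text does not state which coding design was used. The classifier itself is omitted, and all function names are hypothetical.

```python
import itertools
import random
from statistics import mean, stdev

CLASSES = ["baseline", "0VLA", "PVHA", "NVHA", "PVLA", "NVLA"]

# One-versus-one coding (assumed): one binary learner per unordered class pair.
binary_problems = list(itertools.combinations(CLASSES, 2))


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled test folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def success_rates(true_labels, predicted_labels):
    """Per-class success rate: correct patterns / total patterns of the class."""
    totals, correct = {}, {}
    for t, p in zip(true_labels, predicted_labels):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {c: correct[c] / totals[c] for c in totals}


def summarize(repetitions):
    """Mean and sample SD of each class's success rate across repeated runs."""
    summary = {}
    for c in repetitions[0]:
        rates = [r[c] for r in repetitions]
        summary[c] = (mean(rates), stdev(rates))
    return summary
```

With six classes, one-versus-one coding yields 15 binary problems. In a full run, each fold returned by `k_fold_indices` would serve once as the test set, `success_rates` would be computed on the pooled predictions, and `summarize` would be applied to the ten repetitions.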

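As an illustration of the median frequency feature (Fmed) listed in Table 2, the sketch below finds the frequency at which the cumulative spectrum first reaches half of its total area. It assumes that the "X waveform" denotes the one-sided power spectrum; the text does not say whether magnitude or power is meant. The function names are hypothetical, and the direct DFT is used only to keep the example dependency-free for short segments.

```python
import cmath
import math


def power_spectrum(x, fs):
    """One-sided power spectrum via a direct DFT (adequate for short segments)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def median_frequency(x, fs):
    """Lowest frequency at which cumulative power reaches half of the total."""
    freqs, power = power_spectrum(x, fs)
    half = sum(power) / 2.0
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]


# Synthetic check (not study data): a weak 32 Hz tone plus a stronger 96 Hz tone.
fs = 512.0
x = [math.sin(2 * math.pi * 32 * t / fs) + 2.0 * math.sin(2 * math.pi * 96 * t / fs)
     for t in range(256)]
fmed = median_frequency(x, fs)
```

In this synthetic signal the 96 Hz component carries four times the power of the 32 Hz component, so the half-area point falls on the 96 Hz bin.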
Classification Results
The classification results are presented in Tables 3-7 for each specific condition listed in Table 1. Table 3 describes the mean (M) and standard deviation (SD) of classification accuracies (ten-fold cross-validation) of the five affective states and the baseline, based on the combined corrugator and zygomaticus EMG data. The classification accuracy reached 100% for the baseline in all seven conditions, whereas accuracy for the five affective states ranged from 87.04% to 88.47% in 1-1, from 87.92% to 90.65% in 1-2, from 84.81% to 88.75% in 1-10-f, from 84.58% to 88.75% in 1-10-l, from 75.69% to 80.69% in 2-2, from 85.19% to 89.07% in 2-10-f, and from 85.05% to 88.47% in 2-10-l. Table 4 presents the M and SD of classification accuracies of the baseline and the five affective states based on the combined corrugator and zygomaticus EMG data according to age. For the baseline, the classification accuracy reached 100% in all conditions except 2-2 in the young group; accuracy for the five affective states ranged from 77.06% to 90.74% in the young group and from 86.25% to 90.25% in the senior group. The classification rates for senior adults were higher than those for young adults in all seven conditions.
With respect to gender, as Table 5 shows, classification based on the combination of corrugator and zygomaticus EMG showed similar results regardless of age; the baseline accuracy was nearly 100% in all conditions except 2-2 in the female group. Accuracy for the five affective states ranged from 81.18% to 90.59% in the female group and from 85.00% to 90.38% in the male group. Tables 6 and 7 show that the M and SD of classification accuracy for the baseline reached nearly 100% with either single corrugator or single zygomaticus EMG, whereas accuracies for all five affective states were relatively low (from 13.19% to 69.86%).

Statistical Results
To compare the effects of single corrugator EMG, single zygomaticus EMG, and the combination of corrugator and zygomaticus EMG for affect recognition, data from Tables 3, 6 and 7 were submitted to repeated-measures ANOVA. The results demonstrated a significant main effect (F(1,41) = 87.34, p < 0.001), and Bonferroni-corrected multiple comparisons revealed significant differences between the combined corrugator and zygomaticus EMG and the single corrugator EMG, between the combined EMG and the single zygomaticus EMG, and between the single corrugator and single zygomaticus EMG (p < 0.001).
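For readers unfamiliar with the Bonferroni procedure, a minimal sketch follows. It multiplies each raw p-value by the number of comparisons (three pairwise comparisons in this analysis) and caps the result at 1. The p-values in the example are placeholders chosen for illustration, not values from the study.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Placeholder p-values for three pairwise comparisons (illustrative only).
adjusted = bonferroni([0.01, 0.04, 0.5])
```

Equivalently, each unadjusted p-value can be compared against the significance level divided by the number of comparisons.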
With respect to age, data from Table 4 were submitted to an independent-samples t-test, and the results showed a significant difference between young and senior participants (t = -2.67, p < 0.01, two-tailed).
Data from Table 5 were submitted to an independent-samples t-test, which showed no significant difference between female and male subjects (t = -1.58, p > 0.05, two-tailed).
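The independent-samples t statistic used in these comparisons can be sketched as follows. The text does not say whether equal variances were assumed, so the pooled-variance (Student) form shown here is an assumption. The function name and the toy inputs are illustrative and are not the accuracies from Tables 4 or 5.

```python
import math
from statistics import mean, variance


def students_t(a, b):
    """Pooled-variance independent-samples t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


# Toy inputs (illustrative only): the first group has the lower mean.
t_stat, df = students_t([1, 2, 3], [2, 3, 4])
```

With this sign convention a negative t means the first group has the lower mean; if the young group was entered first, that matches the higher accuracies reported for senior adults.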

Discussion and Conclusions
The interaction between humans and machines will, in future, most likely be an empathic relationship resembling the "companionship" observed in HMI research. In such a companionship, the ability to perceive a user's current physical conditions and emotional state is highly desirable [1]. In other words, the ideal companion system is able to fully empathize with its individual user [3][4][5]. To achieve this, artificial "eyes" and "ears" such as cameras, microphones, and sensors are required. These physical devices, including psychobiological sensors, provide accurate signals that allow the machine to gain access to human emotional states.
The primary goal of this study was to investigate the performance of facial EMG for recognizing valence-arousal affective states, in an effort to bridge the gap between machine and human emotional experiences. We specifically focused on the classification of intensive valence and arousal affective states based on facial EMG activities captured over the corrugator supercilii and zygomaticus major, as well as on age and gender differences.
Future successful application of facial EMG for identifying affective states in HMI must operate in real time; a machine-learning system could offer the opportunity to classify affective states automatically. With the combination of corrugator and zygomaticus EMG data, the classification rate achieved high accuracies from 75.69% to 100.00% for the baseline, 0VLA, PVHA, PVLA, NVHA, and NVLA in all individuals, in the young and senior groups, and in female and male participants. In contrast, with single corrugator or single zygomaticus EMG data, the classification rate for the baseline reached a perfect level while the other five affective states reached only a very low level. In short, combining corrugator and zygomaticus EMG is the better approach for affect classification.
One interesting finding of this study is that the classification accuracy for senior adults was significantly higher than that for young adults. To our knowledge, no consensus has been reached on age differences in emotion recognition, especially for combined intensive valence and arousal states. As such, our findings provide valuable new information on the topic. No significant difference due to gender was found in the classification results, which is consistent with published findings that, although women may recognize emotion more accurately, they show no difference in intensive affective states compared to men [48].
One limitation of this study is that it only employed facial EMG; other psychobiological channels such as skin conductance level (SCL), supposedly associated with arousal, could help to differentiate valence and arousal affective states. Future research should address this limitation and increase the sample size. Moreover, similar to the interpersonal interaction situation, in HMI, users' expectations regarding cooperation and competition affect their emotional responses [49]. Future studies investigating possible confounds of expectation, motivation, personality, gender, and neurological or psychiatric conditions are necessary due to the complexity of individuals and machine companionship goals. Facial EMG research has predominantly addressed a single emotion dimension, valence. One dimension is not enough, though, so this study went beyond valence alone. To the best of our knowledge, no study has investigated the effect of dominance on facial EMG, even though dominance is assumed to strongly correlate with valence. This should also be considered in future studies of emotion recognition through facial EMG.
In conclusion, the results of our experiments support the use of facial EMG for differentiating valence-arousal emotions. Similar to conditions in natural circumstances, in HMI, emotional experiences consist of many elementary emotions that may rapidly change. Thus, facial EMG response patterns may indicate dynamic emotional states [50]. Moreover, the procedure employed in this study to measure emotions induced by a standardized set of affective visual stimuli may contribute to methods for successfully identifying individual situations in HMI. Future applications, such as real-time calibration methods for emotion recognition in machines or even in companion robotic systems, and for users with personalized needs in healthcare settings, may be informed by the results of this study [51].

Supporting Information
S1 Data. Data of features for 7 conditions. This dataset contains 7 Excel files. Each file includes 32 features (from corrugator and zygomaticus EMG) for a particular combination of baseline and evoked-emotion periods.