An event-related potential comparison of facial expression processing between cartoon and real faces

Faces play important roles in the social lives of humans. Besides real faces, people also encounter numerous cartoon faces in daily life that convey basic emotional states through facial expressions. Using event-related potentials (ERPs), we conducted a facial expression recognition experiment with 17 university students to compare the processing of cartoon faces with that of real faces. The study used face type (real vs. cartoon), emotion valence (happy vs. angry), and participant gender (male vs. female) as independent variables. Reaction time, recognition accuracy, and the amplitudes and latencies of emotion processing-related ERP components, namely the N170, VPP (vertex positive potential), and LPP (late positive potential), served as dependent variables. The ERP results revealed that cartoon faces elicited larger N170 and VPP amplitudes and shorter N170 latencies than did real faces, whereas real faces induced larger LPP amplitudes than did cartoon faces. In addition, the results showed significant differences across brain regions, reflecting a right-hemisphere advantage. The behavioral results showed that reaction times for happy faces were shorter than those for angry faces, that females recognized expressions more accurately than did males, and that males recognized angry faces more accurately than happy faces. Given the modest sample size, these results are suggestive rather than conclusive regarding differences in facial expression recognition and neural processing between cartoon and real faces. Cartoon faces were processed with greater intensity and speed than real faces during the early processing stage, whereas more attentional resources were allocated to real faces during the late processing stage.

With the development of social communication, cartoons have entered people's lives. In addition to real faces, people also encounter many cartoon faces on a daily basis. Moreover, common social networks (e.g., WeChat) provide various cartoon face emoji for communicating and expressing emotions. Compared with real faces, cartoon faces usually have larger eyes, smaller noses, and finer skin texture (Schindler, Zell, Botsch & Kissler, 2017). Chen and colleagues (2010) found that people developed a preference for real faces with larger eyes after adaptation to cartoon faces with the unusually large eyes typical of Japanese cartoons. Some researchers have compared cartoon faces and real faces with regard to recognition accuracy and reaction time. Kendall, Raffaelli, Kingstone, and Todd (2016) asked participants to identify emotions on five sets of briefly presented faces that ranged from photorealistic to fully iconic. The results showed stronger emotion recognition accuracy for cartoonized faces. In another study (Wang, Wang, Wang & Lu, 2012), participants showed faster reaction times to real faces than to cartoon faces when required to determine whether an image was a face or a car. However, research on the recognition of cartoon and real faces has shown mixed results. Using synthesized emotion images, Hoptman and Levy (1988) studied the processing preference of left- and right-handed individuals for cartoon and real faces. The results failed to reveal a significant difference between cartoon and real faces.
Both cartoon and real faces convey emotional information through facial expressions. The six basic facial expressions can be categorized as positive or negative. Research on the reaction times and recognition accuracies of positive and negative facial expressions has yielded mixed results. Some researchers believe that reaction times for positive expressions are faster than those for other facial expressions. Eimer, Holmes and McGlone (2003) found that recognition of happiness is faster than that of other basic facial expressions. In an identification task regarding whether facial expressions were neutral or emotional, participants showed the shortest reaction time for happiness, the longest reaction time for sadness, the lowest error rate for surprise, and the highest error rate for sadness. Calvo and Lundqvist (2008) asked participants to press a different button for each of the six basic facial expressions and found similar results. However, they found that the recognition accuracy for happiness was highest, whereas the recognition accuracy for fear was lowest. Other studies have suggested that people recognize negative facial expressions faster than positive ones. Hansen and Hansen (1988) found that the search speed for angry face targets was faster when both angry and happy faces served as targets and distractors. Eastwood, Smilek and Merikle (2001) reported that the search for negative facial expressions (sadness) was faster than that for positive facial expressions (happiness) when neutral faces were used as distractors.
The above experiments investigated the differences among face types and facial expressions from the perspective of behavioral science. Building on this behavioral research, other researchers have used event-related potentials (ERPs) to study the neurophysiological basis of these differences. ERPs refer to the changes in the electrical potential of various brain regions when a stimulus is applied to or removed from the sensory system or a certain part of the brain (Wei & Luo, 2002). ERPs directly reflect electrical neurological activity. ERPs have been widely applied in face processing research because the technique provides high temporal resolution, real-time and non-invasive measurement, and a connection among stimulus events, psychological reactions, and brain activity. ERPs can be used to classify different visual stimuli and differentiate disparate emotional states. Because no overt participant response is required, ERP testing enables the measurement of emotional attitudes that people are unwilling to express (Bernat, Bunce & Shevrin, 2001).
The ERP components related to faces and facial expressions include the N170, vertex positive potential (VPP), and late positive potential (LPP), among others. The N170 is primarily distributed in the occipito-temporal region of the brain and usually shows a larger response in the right hemisphere (Rossion & Jacques, 2008). The N170 is a face-specific ERP component, and its peak shows face selectivity: it is induced only by face stimuli (i.e., not by furniture, cars, hand gestures, or other stimuli) (Bentin, Allison, Puce & McCarthy, 1996). With regard to face type, research has shown that the N170 components induced by real and cartoon faces do not significantly differ (Wang et al., 2012).
Another study (Sagiv & Bentin, 2001) showed that real faces induced a stronger N170 effect than abstract face sketches; however, the difference between real faces and schematic faces was not significant. Facial expressions are also related to the N170 during early processing (Galli, Feurra & Viggiano, 2006). A meta-analysis revealed that larger N170 amplitudes are associated with facial expressions of anger, fear, and happiness compared with neutral facial expressions (Hinojosa, Mercado & Carretié, 2015). Rellecke, Sommer, and Schacht (2012) required participants to explicitly or implicitly process happy, angry, and neutral faces. Their results showed that emotional faces induced larger N170 amplitudes than did neutral faces under both processing conditions. With respect to different facial expressions, Batty and Taylor (2003) recorded the ERPs of participants responding to the six basic facial expressions and neutral expressions. The results showed that positive expressions resulted in shorter N170 latencies than negative expressions and that fear expressions induced significantly larger amplitudes than did other expressions.
The N170 has a corresponding positive component at the midline central sites, namely the VPP. The VPP and N170 have similar functional properties and are regarded as two manifestations of the same brain processes (Joyce & Rossion, 2005). The VPP sometimes shows greater sensitivity to facial expression information than the N170, and the VPP is influenced by facial expressions in cases where the N170 is not (Ashley, Vuilleumier & Swick, 2004).
In addition, other studies have not found significant differences between the processing of positive and negative expressions (Codispoti, Ferrari & Bradley, 2006; Recio, Sommer & Schacht, 2011).
Krolak-Salmon, Fischer, Vighetto and Mauguière (2001) asked participants to view images of different facial expressions (e.g., fear, happiness, disgust, surprise, and neutral expressions) and recorded their ERPs during two different tasks with the same stimuli. First, participants were instructed to focus on the gender of the faces by counting the number of males or females. Second, they were asked to focus on facial expressions by counting the number of faces that looked surprised.
The results showed significant differences between late-latency ERPs to emotional faces and those to neutral faces (between 250 and 550 ms). The activation was symmetric in the occipital lobe. The ERP components for different facial expressions differed between 550 and 750 ms over the right occipital region. Topographic maps of these differences showed specific right temporal activity related to each emotional expression.
Differences were also found with regard to the processing of different face types. Researchers studied adults' ERP processing of real and cartoon faces with neutral expressions (Wang et al., 2012). The results indicated that real faces induce significantly higher average LPP amplitudes in the occipital and temporal regions than do cartoon faces. The processing of cartoon faces showed obvious lateralization, primarily in the right parieto-occipital area, whereas the processing of real faces occurred in the parieto-occipital areas of both sides. Schindler et al. (2017) employed six face-stylization levels varying from abstract to realistic and investigated differences in the processing of real and cartoon faces. The results showed that LPP amplitude increased as the faces became more realistic. The above studies suggest that different face types and facial expressions induce significantly different ERP components and amplitudes in different brain regions.
In addition, other studies have investigated the influence of participant gender on the recognition of facial expressions. Hoffmann and colleagues (Hoffmann, Kessler, Eppel, Rukavina & Traue, 2010) asked male and female participants to identify six basic but subtle facial emotions (50% emotional content). The results showed that women were more accurate than men at recognizing subtle facial displays of anger, disgust, and fear, suggesting that women are more sensitive to negative emotions. This result might be related to the role of women throughout evolution. Emotion processing has a corresponding neural basis: even when the influence of gender is not reflected in the accuracy or speed of facial expression recognition, differences remain between men and women with regard to neural activity. Wildgruber, Pihan, Ackermann, Erb and Grodd (2002) found no behavioral difference between males and females with regard to differentiating happy from sad sounds. However, higher response amplitudes within the left-hemisphere posterior middle temporal gyrus were found among women compared with men, whereas a larger increase of activation within the right middle frontal gyrus was observed among men. Han, Gao, Humphreys and Ge (2008) found significant differences in the behaviors and brain activities of men and women during emotion-related tasks: women showed faster threat detection times, while men showed stronger posterior parietal activation.
In summary, the existing research suggests that the N170, VPP, and LPP ERP components are closely related to facial expression processing and that each component presents different properties when processing facial expressions on real faces. However, no consistent conclusions exist regarding the comparison of processing methods, speeds, and intensities between cartoon faces and real faces. With respect to facial expression selection, happiness is usually used as a positive expression, whereas anger and sadness are usually selected as negative expressions (Schindler et al., 2017; Hansen & Hansen, 1988; Eastwood et al., 2001; Bentin et al., 1996; Hietanen & Astikainen, 2013). The present study used anger and happiness for comparison because the recognition accuracy of anger is higher than that of sadness (Eimer et al., 2003). Moreover, significant differences exist between males and females with regard to anger recognition accuracy (Calvo & Lundqvist, 2008; Hoffmann et al., 2010). The present study used an ERP methodology to investigate the processing of real and cartoon facial expressions among men and women. We hypothesized that (1) recognition time would be faster for a positive emotion (i.e., happiness) than for a negative emotion (i.e., anger); (2) women would recognize facial expressions faster and more accurately than would men; (3) the late component LPP, but not the N170 or VPP, would be affected by emotional valence; and (4) face type (i.e., real vs. cartoon faces) would influence the amplitudes and latencies of the N170 and VPP as well as the amplitude of the LPP.

Method

Participants
We recruited 17 participants (11 males, 6 females; average age = 24.18 years, SD = 2.32) from universities in Beijing. All participants were right-handed, had normal hearing and vision (with or without correction), and had no history of auditory, neurological, or psychiatric disorders. Participants signed an informed consent document prior to the experiment and were compensated for their time afterward. This study was approved by the ethics committee of the Beijing Key Laboratory of Learning and Cognition of Capital Normal University.

Materials
The pictures used in the experiment were selected from the Chinese Facial Affective Picture System (CFAPS; Wang & Luo, 2005) and the Japanese Female Facial Expression (JAFFE) database. Fifty pictures of happy faces (25 male and 25 female) and 50 pictures of angry faces (25 male and 25 female) were selected from the two picture databases, for a total of 100 pictures. We used MYOTee (a cartoon image editor) to convert these faces into cartoon faces.
Subsequently, we used Photoshop to overlay the cartoon faces onto the original pictures for fine-tuning, retaining the same face structure and hairstyle, to synthesize 100 cartoon facial expression pictures. In total, 200 pictures were used in the experiment. All pictures were presented in black and white at a resolution of 260 × 300 pixels with consistent contrast. Twenty additional volunteers (non-participants; mean age = 25.3 years) evaluated the pictures.
The evaluation included the identification of facial expression type (i.e., by pressing the "G" key for happiness and the "F" key for anger) and a Likert rating of the facial emotion (9 = extremely happy or angry; 1 = not at all happy or angry). The evaluation results revealed a recognition accuracy of 95.9%, with an emotion intensity rating of 4.81 ± 1.91 (Table 1). Therefore, all 200 pictures were retained as stimuli for the experiment.
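The retention criterion described above amounts to summarizing, per picture, how often volunteers chose the intended expression and how intense they rated it. A minimal sketch of this kind of stimulus-validation summary, using simulated (not the authors') rating data and an assumed 80% per-picture accuracy threshold:

```python
import numpy as np

# Hypothetical validation data: rows are volunteers, columns are pictures.
# 'correct' marks whether the chosen key matched the intended expression;
# 'intensity' is the 1-9 Likert rating of emotional intensity.
rng = np.random.default_rng(0)
n_volunteers, n_pictures = 20, 200
correct = rng.random((n_volunteers, n_pictures)) < 0.96   # ~96% simulated accuracy
intensity = rng.integers(1, 10, size=(n_volunteers, n_pictures))

accuracy = correct.mean()              # overall recognition accuracy
mean_intensity = intensity.mean()
sd_intensity = intensity.std(ddof=1)   # sample SD, as typically reported

# A picture would be retained if enough volunteers identified its
# expression correctly (80% is an assumed, illustrative threshold).
per_picture_acc = correct.mean(axis=0)
retained = per_picture_acc >= 0.80
print(f"accuracy = {accuracy:.3f}, intensity = {mean_intensity:.2f} ± {sd_intensity:.2f}, "
      f"retained = {retained.sum()}/{n_pictures}")
```

With the real rating data, `accuracy` and the intensity mean ± SD would correspond to the 95.9% and 4.81 ± 1.91 reported above.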

Procedure
The experiment was conducted in a quiet and dimly lit laboratory. The stimulus images were presented on a 16-inch CRT monitor with a screen resolution of 1920 × 1080. Participants were required to complete facial expression identification tasks according to instructions presented on the monitor, and their electroencephalogram (EEG) data were collected during the experiment. For each trial, a fixation point was presented for 1,000 ms. Subsequently, a facial image was presented, and the participant was required to determine whether the face was happy or angry by pressing a button (happy = 1; angry = 2) within 1,000 ms. If a button was pressed within 1,000 ms, the picture disappeared and a blank screen was presented until the next picture appeared. If no button was pressed, the picture disappeared after 1,000 ms and a blank screen was presented until the next picture appeared. The duration of the blank screen varied randomly from 900 ms to 1,700 ms. Figure 2 shows the experimental procedure. The experiment was divided into two blocks of 100 trials each. The pictures within each block were balanced across conditions. Participants were given 2-3 min to rest between blocks.
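The trial structure above (fixation, face with response window, jittered blank) can be sketched as a schedule generator. This is an illustrative reconstruction, not the authors' actual presentation script, and the balancing scheme (25 identities per face type × valence cell) is assumed from the materials description:

```python
import random

def build_block(pictures, seed=None):
    """Build one block: shuffled picture order plus the per-trial timing
    described in the procedure (1,000 ms fixation, up to 1,000 ms response
    window, 900-1,700 ms jittered inter-trial blank)."""
    rng = random.Random(seed)
    order = pictures[:]
    rng.shuffle(order)
    return [
        {
            "fixation_ms": 1000,                # fixation point
            "picture": pic,                     # face shown for up to 1,000 ms
            "max_response_ms": 1000,            # key 1 = happy, key 2 = angry
            "blank_ms": rng.randint(900, 1700), # jittered inter-trial blank
        }
        for pic in order
    ]

# 100 pictures per block: 2 face types x 2 valences x 25 identities (assumed split)
pictures = [(face_type, valence, i)
            for face_type in ("real", "cartoon")
            for valence in ("happy", "angry")
            for i in range(25)]
block = build_block(pictures, seed=1)
```

The jitter prevents participants from anticipating stimulus onset and decorrelates overlapping ERP responses from successive trials.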

Statistical analyses
The behavioral and EEG data were analyzed using IBM SPSS 19.0 for Windows. N170 was analyzed using a 2 (face type: cartoon vs. real face) × 2 (emotional valence: happiness vs. anger) × 2
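As an illustration of how one factor of such a within-subject design is tested: for a repeated-measures factor with only two levels, the ANOVA main effect F(1, n−1) equals the square of the paired t statistic. A minimal sketch with simulated (not the authors') per-participant N170 amplitudes, using SciPy rather than SPSS:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_subjects = 17  # matching the sample size of the study

# Simulated mean N170 amplitudes (µV) per participant for the two face types;
# cartoon faces are given a slightly larger (more negative) amplitude,
# in line with the direction of the reported effect.
real = rng.normal(-4.0, 1.5, n_subjects)
cartoon = rng.normal(-5.0, 1.5, n_subjects)

t, p = stats.ttest_rel(cartoon, real)
F = t ** 2  # for a 2-level within-subject factor, F(1, n-1) = t(n-1)^2
print(f"t({n_subjects - 1}) = {t:.2f}, p = {p:.4f}, F = {F:.2f}")
```

The full 2 × 2 × 2 repeated-measures ANOVA additionally tests the other main effects and their interactions, which SPSS (or, e.g., statsmodels' `AnovaRM`) computes from the same per-participant condition means.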

N170
Mean amplitudes and latencies of the N170 and VPP, with standard deviations, are shown in Table 3.
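The two measures tabulated there are standard ERP component statistics: the mean amplitude within an a priori time window and the latency of the peak within that window. A minimal sketch on a simulated averaged waveform (the window bounds and waveform here are illustrative assumptions, not taken from the study):

```python
import numpy as np

# Simulated averaged ERP at an occipito-temporal electrode, sampled at 500 Hz.
fs = 500
t = np.arange(-0.1, 0.5, 1 / fs)  # -100 to 500 ms around stimulus onset
# An N170-like negative deflection peaking near 170 ms.
erp = -5.0 * np.exp(-((t - 0.17) ** 2) / (2 * 0.02 ** 2))

# Mean amplitude and peak latency within an assumed N170 window (130-200 ms).
win = (t >= 0.130) & (t <= 0.200)
mean_amp = erp[win].mean()         # mean amplitude in the window (µV)
peak_idx = np.argmin(erp[win])     # most negative sample = N170 peak
peak_latency_ms = t[win][peak_idx] * 1000

print(f"mean amplitude = {mean_amp:.2f} µV, peak latency = {peak_latency_ms:.0f} ms")
```

For a positive component such as the VPP, the peak would instead be located with `np.argmax` over a central-site waveform.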

Processing differences between cartoon and real faces
The differences in the processing of the two face types were primarily reflected in the amplitudes and latencies of the N170, VPP, and LPP. Although no significant difference was found in reaction time, cartoon faces resulted in shorter N170 latencies than did real faces. This finding is inconsistent with previous studies: Wang et al. (2012) found that adults' N170 latencies were shorter when viewing real faces than cartoon faces. These inconsistent results might have been caused by differences in stimulus materials. In Wang's study, real faces were collected from kindergarten children (mean age = ~6 years), and cartoon faces were obtained from screenshots of high-resolution DVD cartoons. In the present study, the real face stimuli were collected from adult facial expression databases (CFAPS and JAFFE), and the cartoon faces were converted from these real faces using MYOTee. The own-age bias (OAB; Wright & Stroud, 2002) might have caused these findings to differ from those of previous studies. The OAB states that people show better performance when recognizing faces from their own age group compared with other age groups.
While both studies used adult participants, the faces viewed in the present study were closer in age to the participants' own age group.
A difference was also found in the N170 and VPP amplitudes during the early processing of real and cartoon faces. Our findings suggest a significant difference in brain processing intensity for the two face types during early processing: cartoon faces were associated with significantly larger amplitudes than real faces. Previous studies (Wang et al., 2012; Sagiv & Bentin, 2001) revealed a U-shaped relationship between the N170 and face realism. That is, both the most abstract and the most realistic faces elicited stronger reactions than medium-stylized faces. The cartoon faces used in this study were converted from real faces using MYOTee; they are more simplified and abstract and, therefore, might have elicited stronger N170 amplitudes. In addition, Proverbio, Riva, Martin and Zani (2010) found that infant faces elicited larger N170 amplitudes than did adult faces, most likely because of juvenile characteristics such as the larger proportion of the eyes. In the present study, the eyes of the cartoon faces were much larger than those of real people (including infants).
Real and cartoon faces also differed in LPP amplitude: the LPP amplitudes induced by real faces were significantly larger than those induced by cartoon faces. This finding is consistent with previous studies. When the neutral expressions of real faces and puppet faces were compared, no differences in the N170 were observed; however, a stronger LPP was found for real faces starting at 400 ms (Ma, Qian, Hu & Wang, 2017). This effect likely reflects the significance and uniqueness of real faces as well as the viewer's understanding of the portrayed individual (Wheatley, Weinberg, Looser, Moran & Hajcak, 2011), because computer-generated faces are usually more difficult to remember (Balas & Pacella, 2015; Crookes et al., 2015). Bruce and Young (1986) considered facial feature encoding and identity recognition to be the second stage of face recognition.
This stage includes the accurate processing of facial information such as age, gender, race, and facial expression. Adults invest more psychological resources in real faces during late face processing. Compared with simplified cartoon faces, real faces convey more personal information and social meaning. The LPP is related to facial attractiveness (Ma et al., 2017; Marzi & Viggiano, 2010; Werheid, Schacht & Sommer, 2007). Therefore, the results of the present study might suggest that real faces are more attractive to adults than simplified cartoon faces.
Significant differences were found among brain regions. LPP amplitudes in the central and parietal areas were greater than those in the occipital area, indicating that the central and parietal areas are the major brain regions for face processing, which is consistent with Bauer's two-route model (1984). This model assumes that two routes exist for face recognition: the ventral route and the dorsal route. The dorsal route is primarily responsible for detecting the meaning of a face, beginning in the visual cortex, passing through the superior temporal sulcus, inferior parietal lobe, and cingulate gyrus, and eventually reaching the limbic system. In addition, the LPP amplitudes at midline and right-hemisphere sites were higher than those in the left hemisphere, suggesting a right-hemisphere advantage. With regard to the N170 component, cartoon faces elicited significantly shorter latencies than did real faces in the right hemisphere, whereas no difference was observed in the left hemisphere. This finding is consistent with previous studies reporting a right-hemisphere advantage in face recognition intensity and speed (Yovel, Levy, Grabowecky & Paller, 2003). Furthermore, cartoon face processing showed obvious lateralization, primarily in the right hemisphere, whereas real face processing was bilateral (Wang et al., 2012).

Differences between angry and happy faces in face processing
The ERP results revealed that LPP amplitudes distinguished emotion type during late processing.
Angry faces induced larger positive LPP waves than did happy faces. This finding is consistent with Zhu and Liu (2014), who found that negative facial expressions were associated with significantly larger LPP amplitudes than were positive and neutral expressions. Previous studies (Holmes, Green & Vuilleumier, 2005; Whalen et al., 1998; Bradley, Mogg & Lee, 1997) reported a bias in attention allocation when individuals recognize different facial expressions, especially negative ones. Facial expression type did not influence the N170. According to the face processing model of Bruce and Young (1986), facial expression processing is independent of facial structural processing; that is, facial emotion information should not influence the N170. The research of Eimer et al. (2003) and Ashley, Vuilleumier and Swick (2004) supports this hypothesis.

Differences between men and women with regard to facial expression processing
Women showed higher facial expression recognition accuracy than did men. Extensive research (Hall, Hutton & Morgan, 2010; Brewster, Mullin, Dobrin & Steeves, 2011; McBain, Norton & Chen, 2009; Megreya, Bindemann & Havard, 2011) has demonstrated that women have a face recognition advantage over men. In addition, an interaction effect was found between facial expression type and gender: men showed higher accuracy for angry faces, whereas women showed no difference between the two expression types. Previous studies did not find this interaction. Overall, recognition accuracy was higher for happy faces than for angry faces in both genders.
The high accuracy of angry face recognition among men might be because they are more physically aggressive than women (Glascock, 2008;Eagly & Steffen, 1986). High aggressiveness is positively correlated with serum testosterone concentration (Hu & Li, 2014). Therefore, men are likely more sensitive to social signals that convey aggressiveness.

Advantages and limitations
Our selected stimuli (i.e., the cartoon and real facial expression pictures) are a strength of this study. One important advantage is that the cartoon faces were converted from real faces and therefore retained the same facial structure and hairstyle. Most existing research has used screenshots of cartoon characters or sketched faces and expression icons (Wang et al., 2012; Eastwood et al., 2001; Sagiv & Bentin, 2001); such stimuli cannot exclude nuanced differences from real faces beyond facial expression. Another advantage is that all of the images were of Asian adults, which avoided the cultural and age confounds that might arise from using Western emotional faces. One limitation of this study is its small sample size, which might have contributed to the large standard deviation in VPP latency. Although the face type × emotion valence interaction was significant, the simple effects were not. Future research should increase the sample size to examine the interaction of face type with facial expression type. Another limitation is that although ERP is advantageous for its temporal resolution, its spatial resolution is low; future research could apply fMRI, which has high spatial resolution. Finally, children are the primary audience for cartoons, and childhood is an important stage for developing emotional cognition. Building on the present study, future studies should compare adults and children with regard to the processing of cartoon and real facial expressions. This line of research might help draw a clearer picture of the developmental course of cartoon face processing.

Conclusions
We used ERPs to measure the brain activity induced by the facial expressions of cartoon and real faces. According to the neurophysiological evidence in this study, face type has a strong but heterogeneous effect on the N170, VPP, and LPP components. During the early processing stage, adults process cartoon faces faster than real faces; however, adults allocate more attentional resources to real faces during the late processing stage. Facial expression type influenced the late-stage LPP component, showing an attentional bias toward negative emotions, whereas the early-stage N170 was not influenced. Future research should use larger sample sizes to examine the interaction between face type (real vs. cartoon) and facial expression.