Subjective ratings and emotional recognition of children’s facial expressions from the CAFE set

Access to validated stimuli depicting children’s facial expressions is useful for different research domains (e.g., developmental, cognitive or social psychology). Yet, such databases are scarce in comparison to others portraying adult models, and validation procedures are typically restricted to emotional recognition accuracy. This work presents subjective ratings for a sub-set of 283 photographs selected from the Child Affective Facial Expression set (CAFE [1]). Extending beyond the original emotion recognition accuracy norms [2], our main goal was to validate this database across eight subjective dimensions related to the model (e.g., attractiveness, familiarity) or the specific facial expression (e.g., intensity, genuineness), using a sample from a different nationality (N = 450 Portuguese participants). We also assessed emotion recognition (forced-choice task with seven options: anger, disgust, fear, happiness, sadness, surprise and neutral). Overall results show that most photographs were rated as highly clear, genuine and intense facial expressions. The models were rated as both moderately familiar and likely to belong to the in-group, obtaining high attractiveness and arousal ratings. Results also showed that, similarly to the original study, the facial expressions were accurately recognized. Normative and raw data are available as supplementary material at https://osf.io/mjqfx/.


Introduction
Children communicate positive and negative emotions through multiple channels, namely vocalizations, gestures, body postures, body movements and facial expressions (for a review, see [3]). Traditionally, research has focused on the latter. Not only do facial expressions signal the children's emotional state, but they can also evoke behavioral motives (e.g., motivation to nurture) in the observers (for a review, see [4]). Importantly, parent-child interaction and parental mental health may be predicted by how accurately the children's emotional expression is perceived (for a review, see [5]). Yet, databases of children's facial expressions are still scarce and usually are only validated for the accuracy of emotional recognition. The goal of the current work was to extend the available norms for the Child Affective Facial Expression (CAFE; [2]), a database that exclusively includes photographs depicting facial expressions of children. Besides emotion recognition, for each stimulus, we also assessed a set of eight subjective evaluative dimensions concerning the model (familiarity, attractiveness, arousal, and in-group belonging) and the expression (valence, clarity, intensity, and genuineness) being portrayed. These additional subjective ratings provide important information that further extends the usefulness of the stimuli set. Specifically, they enable the selection of stimuli through a combination of criteria (e.g., happy faces controlled for attractiveness; fear faces varying in intensity). Static human face stimuli are the most frequently used type of material in emotion recognition and detection studies, which have relied on both behavioral (e.g., forced-choice labeling of emotions; matching task) and non-behavioral methodologies (e.g., functional and structural MRI, EEG; for a review, see [6]).
In studies with child populations these materials are often used to investigate how (and at what age) children are able to understand and identify emotional faces (e.g., [7], for reviews, see [8,9]), or to characterize their affective reactions to emotional facial expressions (e.g., [10]). Importantly, children who are better at recognizing emotions in others also tend to be successful in several socioemotional areas (e.g., greater cooperation and assertion reported by parents, greater social competence reported by teachers, higher liking by peers, for a review, see [11]). Congruently, a wide range of child psychiatric disorders are associated with impairments in facial emotion recognition, which are likely to negatively affect family and peer relationships (for a review, see [12]). For example, children with bipolar disorder or severe mood dysregulation show deficits in labeling emotions, particularly negative emotions such as fear or anger, displayed by adult or child models [13]. This lower performance in emotion recognition tasks was also detected for abused or maltreated children (e.g., [14-16], for a review, see [17]).
Studies with children participants have frequently used facial expression databases depicting adults. For example, Barnard-Brak, Abby, Richman and Chesnut [18] have recently validated a sub-set of the NimStim [19] with a sample of very young children (2-6 years old), and showed that they can accurately label photographs of adults depicting happiness, sadness, anger and fear. Other studies used these materials to investigate whether the findings demonstrated with adult participants also generalize to children. For example, LoBue [20] also used pictures from the NimStim in a study related to emotion detection and showed that children share the attentional bias for angry faces (i.e., angry faces are detected faster than happy or neutral faces). A subsequent study using another database depicting adult models (KDEF; [21]) showed that negative facial expressions impaired children's working memory to a greater extent, when compared to neutral and positive expressions [22].
Other studies have used databases that include stimuli depicting non-adult models that can be presented either to children or to adults. The availability of these databases is important for diverse research areas. In particular, these materials allow the use of peer-aged stimuli in studies with samples of children [23]. For example, a study with young children (3-5 years old) showed that the previously described attentional bias for angry faces is stronger when pictures of child (vs. adult) models are used [24]. Another important line of research did not focus on children's responses, but rather on the behavioral [25,26] or psychophysiological responses of adults in general, or parents [27-29], to children's emotional expressions. For example, Aradhye et al. [4] used photographs of children to examine how different expressions influence the responsiveness of non-kin young adults and found that smiling children were rated as more likely to be adopted than crying children. Other studies have even examined non-normative adult samples (e.g., maltreating parents or parents with psychiatric disorders).
For instance, mothers with borderline personality disorder (vs. controls) showed an overall lower performance in recognizing emotion in children (both their own and unknown children) and tended to misinterpret neutral expressions as sadness [30]. Likewise, neglectful mothers [31] and abusive fathers [32] tend to perceive children's emotional cues more negatively than non-maltreating parents.
Photographs of children's facial expressions can also be used to investigate how variables such as the age of the model influence person [33] or emotion [34] perception. For example, in a recent study by Griffiths, Penton-Voak, Jarrold, and Munafò [35], children and adult participants categorized the facial expressions of prototypes of different age groups (created by averaging photographs of individuals of the same gender and age group). Results showed similar accuracy for both child and adult facial expression prototypes across age groups. Thus, no evidence of own-age advantage emerged in either group of participants. Nevertheless, the age of the model did interact with other variables, such as gender (for a review, see [36]). For example, Parmley and Cunningham [34] showed that adult participants were more accurate in identifying angry expressions displayed by male children than by female children, whereas no sex differences were detected in the identification of angry expressions displayed by adult models.
Currently, there are plentiful validated databases of facial expressions (for a review, see [37]). These databases include dynamic (i.e., videos) and static (i.e., pictures) stimuli depicting human models of different nationalities and cultural backgrounds, expressing a wide range of facial expressions. However, most databases include only young adults as models [19,21,37-39]. A few exceptions include adult models of distinct age groups. For example, the Lifespan Database of Adult Facial Stimuli [40] includes models aged 18 to 93 years, and the FACES database [41] includes models aged 19 to 80 years. As a consequence of this limited availability of validated databases depicting models across the lifespan, researchers often have to develop (and pre-test) new materials. For example, Parmley and Cunningham [34] selected a set of photographs of adults from existing databases, and complemented it with an original set of children's photographs. In Table 1 we present an overview of the databases that include photographs of facial expressions of children (for dynamic stimuli databases, see for example [42,43]).
As shown in Table 1, nine databases exclusively with photographs of children's facial expressions were recently published. These databases comprise standardized stimuli regarding graphic features (e.g., size, color, background) that were typically obtained through photoshoots in controlled settings (the CIF is an exception, with parents conducting the photoshoot and photographs processed by the authors). Facial expressions were prompted by employing different strategies during the photoshoot. For example, the models were exposed to videos (e.g., CEPS) or coached to imagine situations that would elicit the intended expression (e.g., "sitting on chewing gum" for eliciting disgust, DDCF). In other cases, the experience of the situation actually took place during the shoot (e.g., having infants tasting an unfamiliar food such as lemon to induce disgust, TIF). Despite these differences, all databases (except TIF and BIC-Multicolor) include specific emotions like happiness or anger, as well as neutral expressions. The characteristics of the models are also diverse across databases. For example, regarding age, the databases include photographs of infants (e.g., TIF; CIF; BF) or adolescent models (e.g., NIMH-ChEFS; DDCF). Nonetheless, there is a prevalence of Caucasian models across the databases (for exceptions, see [52,53]), which may limit the selection of ecologically valid stimuli in other cultural backgrounds (for a discussion on the implications of the demographic homogeneity of models, see [53]). Regarding the validation procedure, most studies were conducted with adult participants untrained in emotion recognition (an exception is the NIMH-ChEFS, which was subsequently validated with children and adolescents [54]), and typically entailed a forced-choice task to categorize the emotion depicted. In some cases, participants were also asked to rate the child expression in several evaluative dimensions (e.g., intensity, clarity, genuineness). 
The CAFE [1,2] comprises the largest stimuli set (i.e., 1192 photographs) and is one of the most diverse databases regarding the race or ethnicity of the models, including Caucasian/European American, African-American, Latino, Asian, and South Asian children (see Table 1). The set includes a wide range of facial expressions (happiness, sadness, disgust, anger, fear, surprise, neutral), with over 100 photographs per expression (minimum of 103 photographs depicting surprise, and maximum of 230 depicting a neutral expression). Another advantage of this database is the possibility to select different expressions produced by the same model. Moreover, although the models were photographed in constant conditions (e.g., same off-white background with overhead lighting), they are still depicted in a naturalistic way. For example, the hairstyle of the children is visible, in contrast with other databases such as the DDCF, which only shows the facial features and covers hair and ears.
The original CAFE stimuli were photographed by an expert (i.e., trained coder of facial expressions) and then validated by asking a sample of 100 untrained adult participants to identify the expressions (forced-choice task). As argued by LoBue and Thrasher ([2], see also [19]), the use of untrained participants has the advantage of obtaining emotion recognition scores of participants who are similar to those who will be recruited in future studies. In the validation study, the overall accuracy rate was 66%. However, there were significant differences in accuracy across the seven facial expressions, with pictures depicting happiness obtaining the highest accuracy scores (85%), followed by surprise (72%), anger and neutral (66%), disgust (64%), sadness (62%), and fear (42%). These accuracy rates were all significantly different from each other (except for anger vs. neutral and disgust vs. sadness). Results also showed that emotion recognition accuracy was not systematically influenced by the characteristics of the model (i.e., sex and race/ethnicity). Regarding the characteristics of the participants, only a significant effect of sex emerged, such that women raters were more accurate than men at identifying all facial expressions. A recent study examined preschoolers' (3-4 years old) emotional recognition accuracy of a subset of the CAFE, and revealed strong associations between their ratings and those obtained in the original validation with adult participants [55]. Further corroborating the usefulness of this database, since its publication in 2015, the CAFE stimuli have been used as materials in multiple research domains, such as the neural processing of emotional facial expressions [28], attentional bias [24], stereotyping [56-59], and morality [60-62].
The racial/ethnic diversity of the models included in the CAFE makes it a particularly useful database for research in the stereotyping domain, namely to investigate if the racial biases identified in response to adults of specific social groups (e.g., Blacks) generalize to children of that same group. For example, in a sequential priming task, adult participants were faster to identify guns (vs. toys) when preceded by pictures of Black (vs. White) boys, suggesting that the perceived threat typically associated with Black men generalizes to Black boys [59]. Likewise, children expected the same negative event (e.g., biting their tongue) to induce less pain when experienced by Black (vs. White) children, demonstrating that the assumption that Black people feel less pain than White people also generalizes to Black children [56]. Importantly, by including children of different age groups as participants, this latter study also made it possible to identify when such bias emerges in development, given that the effect was only strongly detected by the age of 10.
Our main goal was to further develop the CAFE database by assessing how the stimuli are perceived in a set of eight evaluative dimensions. Some of these dimensions require judgments about the model (i.e., familiarity, attractiveness, arousal, in-group belonging), whereas others are focused on the expression being displayed (i.e., valence, clarity, intensity and genuineness).
The measures regarding the facial expression have been assessed in other databases of children's expressions (see Table 1). In contrast, the measures that entail judgments about the model are less common and have been assessed in validations of databases depicting adults (for a review, see [37]). For example, we included attractiveness ratings because attractive children (similar to attractive adults) are more positively perceived (e.g., more intelligent, honest, pleasant) than less attractive children (for a review, see [63]). Because the stimuli set was developed in a distinct cultural context we also included a measure of target's in-group belonging (i.e., rating of the likelihood of the child being Portuguese). This measure can be of interest given the evidence that the recognition accuracy of facial expressions is higher when there is a match (vs. mismatch) between the cultural group of the expresser and of the perceiver (for reviews, see [64,65]). This in-group advantage for emotion recognition was also found with child participants when judging emotional expressions displayed by adults (e.g., [66]). Moreover, we also included a forced-choice expression recognition task to replicate the original validation study. The comparison of the accuracy scores obtained with our Portuguese sample with those produced by an American sample also informs about the cross-cultural validity of the database.
Lastly, we will also examine if individual factors (e.g., sex of the participant, parental status) impact emotion recognition and subjective ratings of the facial expressions. For example, it was shown that parents of young children rated images portraying facial expressions of infants as clearer, when compared with participants without children, or with older children (TIF database, [49]).

Participants
The sample included 450 adult participants, from 18 to 71 years old (84.7% women; M age = 32.34; SD = 10.76), of Portuguese nationality, who volunteered to participate in a web-survey. Regarding their ethnic/cultural background, most participants reported being of Portuguese ancestry (88.4%). The majority of participants were active workers (54.0%) or students (33.6%), who attained a bachelor's degree (37.8%) or had completed high-school (36.4%). Regarding parental status, 43.8% of the participants were parents, and reported having up to four children (M = 1.66, SD = 0.76), with ages varying between 1 and 40 years old (M age = 9.93, SD = 9.22).

Materials
Our stimuli set included 283 images selected from CAFE [1]. The original database comprises color photographs of children posing in six basic emotional expressions (sadness, happiness, anger, disgust, fear and surprise), plus a neutral expression. The models (N = 154, 58.4% female) were heterogeneous in age (from 2 to 8 years old, M age = 5.3) and ethnic background (50% Caucasian/European American, 17.5% African American, 14.9% Latino, 10.4% Asian and 7.1% South Asian). The models were prompted to display each of the emotions by the photographer, who exemplified the intended expression. All models were covered from the neck down with an off-white sheet. The final set of 1192 photographs corresponds to the number of poses deemed successful. The photographs are available in high resolution (2739 x 2739 pixels) and are standardized regarding background color (off-white), viewing distance and figure-ground composition.
The stimuli sub-set used in the current work was selected based on several criteria. First, we took into consideration the accuracy of emotional categorization (i.e., "proportion of 100 adult participants who correctly identified the emotion in the photograph") reported in the original validation. Only photographs depicting facial expressions correctly identified by more than 50% of the sample were selected (resulting in 891 images). Second, we selected models that included photographs portraying neutral, happy and angry expressions (resulting in 455 images, 63 models). Third, we selected models that exhibited at least four different emotions (besides the neutral expression). Whenever different versions of the same emotion were available for the same model (e.g., happiness displayed with open and closed mouth), we selected the version that obtained the highest accuracy in the original database. Table 2 summarizes the characteristics of the photographs included in our sub-set (N = 283, corresponding to 51 models: 28 female, M age = 4.81; 23 male, M age = 5.00).
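The selection criteria described above can be sketched as a small filter. This is only an illustration under stated assumptions: the record fields (`file`, `model`, `emotion`, `accuracy`), the emotion labels, and the function name are hypothetical and do not reflect the actual layout of the CAFE norms, and the steps are applied in a slightly different order than in the text (the final selection is the same).

```python
from collections import defaultdict

# Expressions that every selected model must provide (criterion 2 in the text).
REQUIRED = {"neutral", "happy", "angry"}


def select_subset(photos, min_accuracy=0.50, min_emotions=4):
    """Apply the three selection criteria to hypothetical photo records.

    photos: iterable of dicts with keys 'file', 'model', 'emotion', 'accuracy'.
    """
    # Criterion 1: recognized by more than 50% of the original raters.
    kept = [p for p in photos if p["accuracy"] > min_accuracy]

    # When a model has several versions of one emotion, keep the most accurate.
    best = {}
    for p in kept:
        key = (p["model"], p["emotion"])
        if key not in best or p["accuracy"] > best[key]["accuracy"]:
            best[key] = p

    by_model = defaultdict(dict)
    for (model, emotion), p in best.items():
        by_model[model][emotion] = p

    selected = []
    for expressions in by_model.values():
        emotions = set(expressions)
        # Criterion 2: neutral, happy and angry photographs must all exist.
        if not REQUIRED <= emotions:
            continue
        # Criterion 3: at least four different emotions besides neutral.
        if len(emotions - {"neutral"}) < min_emotions:
            continue
        selected.extend(expressions.values())
    return selected
```

With such a filter, a model whose angry photograph fell at or below the 50% threshold is dropped entirely, because the required neutral, happy and angry triplet is no longer complete.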

Procedure
The study was reviewed and approved by the Ethics Committee of ISCTE-Instituto Universitário de Lisboa. The study involved human data collection from adult volunteers. The study was noninvasive, no false information was provided, data were analyzed anonymously and written informed consent was obtained. The use of CAFE stimuli was approved by the Ethics Committee of ISCTE-Instituto Universitário de Lisboa and consent was obtained from Databrary via the signature of an Access Agreement. The parents/guardians of the children participating in the original CAFE study [2] signed a release giving permission for the use of their data/image in scientific research.
Participants were invited (e.g., institutional email, social networking websites) to collaborate on a web-survey aimed at testing materials for future studies. The hyperlink directed participants to a secure webpage in Qualtrics. The opening page informed about the goals of the study (evaluation of photographs of children displaying different facial expressions), its expected duration (approximately 20 minutes), and ethical considerations (i.e., anonymity, confidentiality and the possibility to withdraw from the study at any point). After agreeing to collaborate in the study, participants were asked to evaluate each photograph considering their overall perception of the child portrayed (i.e., familiarity, attractiveness, arousal and likelihood of the child being Portuguese) as well as the facial expression displayed (i.e., valence, clarity, genuineness and emotional intensity). All evaluations were made in 7-point rating scales (for detailed instructions and scale anchors, see Table 3). In addition, participants were asked to identify the facial expression by selecting the corresponding label (i.e., sadness, happiness, anger, disgust, fear, surprise or neutral).
Participants were informed that there were no right or wrong answers. Instructions also emphasized that the presentation order of the evaluative dimensions would vary across photographs. Before initiating the evaluation task, participants were required to indicate their nationality (if other than Portuguese they were directed to the end of the survey), gender, current occupation and education.
To prevent fatigue and demotivation, participants were asked to rate a subset of 20 photographs. These photographs were randomly selected from the 283 available to minimize any systematic response bias deriving from the composition of the subsets. Each trial corresponded to the evaluation of one photograph. Specifically, in a single page of the web-survey, the image was presented at the center of the page with all the rating scales below it. The rating scales were presented in a random order across trials. However, the facial expression identification task (labeling) was always presented at the end of each trial. The seven emotional labels were also presented in a random order across trials. At the end of the 20 trials, participants were asked to report their cultural background (i.e., Portuguese of. . . "Portuguese ancestry", "African ancestry", "Brazilian ancestry"; "Ukrainian ancestry" or "Other"), as well as their parenting status (parents were also asked to report the number of children, as well as the age of each child). Finally, participants were asked if their work entails regular contact with children, and if they have social contact with children other than their own (both using the following scale anchors: 1 = No regular contact at all; 7 = Very regular contact). Upon completion of the questionnaire, participants were thanked and debriefed.
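The trial structure described above (a random draw of 20 photographs, rating scales in random order, labeling task last, labels in random order) can be sketched as follows. This is a minimal illustration, not the Qualtrics implementation used in the study: the function and field names are hypothetical, and it assumes that all eight scales are shuffled together on each trial.

```python
import random

MODEL_SCALES = ["familiarity", "attractiveness", "arousal", "in-group belonging"]
EXPRESSION_SCALES = ["valence", "clarity", "genuineness", "intensity"]
LABELS = ["sadness", "happiness", "anger", "disgust", "fear", "surprise", "neutral"]


def build_trials(photo_ids, n_trials=20, rng=None):
    """Draw a random subset of photographs and randomize each trial's layout."""
    rng = rng or random.Random()
    trials = []
    # Sampling without replacement: no photograph is shown twice to a participant.
    for photo in rng.sample(photo_ids, n_trials):
        scales = MODEL_SCALES + EXPRESSION_SCALES
        rng.shuffle(scales)  # rating scales in a new random order on every trial
        trials.append({
            "photo": photo,
            "scales": scales,
            # The labeling task always comes after the scales; only the
            # order of its seven options is randomized.
            "labels": rng.sample(LABELS, len(LABELS)),
        })
    return trials
```

Passing a seeded generator, for example `build_trials(list(range(283)), rng=random.Random(1))`, makes a given participant's sequence reproducible.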

Results
Given that we only retained completed questionnaires for analyses (N = 450), there were no missing cases. The preliminary analysis of the data showed no indication of systematic responses (i.e., participant using the same value of the response scale across dimensions) and a small percentage of outliers (1.02%; outliers were identified considering the criterion of 2.5 standard deviations above or below the mean evaluation of each stimulus in a given dimension). Therefore, no responses were excluded.
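The outlier criterion can be illustrated with a short sketch that flags ratings lying more than 2.5 standard deviations from the mean rating of the same stimulus on one dimension. The function name and input layout are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def outlier_share(ratings_by_stimulus, criterion=2.5):
    """Proportion of ratings beyond `criterion` SDs from their stimulus mean.

    ratings_by_stimulus: dict mapping a stimulus id to the list of ratings
    it received on a single evaluative dimension (at least two per stimulus).
    """
    flagged = 0
    total = 0
    for ratings in ratings_by_stimulus.values():
        m = mean(ratings)
        sd = stdev(ratings)  # sample standard deviation (assumption)
        total += len(ratings)
        if sd == 0:
            continue  # identical ratings: nothing can be flagged
        flagged += sum(abs(r - m) > criterion * sd for r in ratings)
    return flagged / total
```

Running this once per evaluative dimension and pooling the counts would give an overall percentage comparable in spirit to the one reported.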
Below, we will present the analyses required to validate the stimulus set, as well as additional analyses that are potentially useful for researchers interested in using the set:
1. Overall subjective ratings: We present the descriptive statistics of the subjective ratings for the entire sample and compare ratings according to participants' gender and parental status. Additionally, we also examined the associations between evaluative dimensions and examined the role of individual differences (e.g., age, frequency of contact with children in social and work contexts) in these associations.
2. Impact of facial expression and model characteristics on subjective ratings: We compared ratings across evaluative dimensions according to facial expression (i.e., sadness, happiness, anger, disgust, fear, surprise or neutral), and model characteristics (i.e., sex and race/ethnicity of the model).
3. Emotion recognition: We examined individual differences in overall accuracy. We also examined the impact of the expression, as well as the influence of model characteristics, on the accuracy of emotion recognition (mean % of hit rates).
4. Cross-cultural comparison: We compared the accuracy in emotion recognition between the original and the current validation according to emotion type.
5. Frequency distribution: To facilitate the overall characterization of the stimuli in the set we also present the frequency distribution of images across three levels (low, moderate and high) of each evaluative dimension.
Each photograph was evaluated by a minimum of 31 and a maximum of 34 participants. Normative and raw data files are available at https://osf.io/mjqfx/. Appendix A includes item level data (i.e., descriptive results for the set of eight evaluative dimensions and accuracy rates of emotion recognition). Each photograph is described (e.g., file name, model characteristics and facial expression) according to the original CAFE database. Appendix B comprises normative data organized by participant (including socio-demographic information of the raters), overall emotion accuracy rate, and ratings for each evaluative dimension according to facial expression, and model's characteristics (i.e., sex and race/ethnicity). Appendix C includes full raw data.

Overall subjective ratings
We compared ratings across evaluative dimensions against the scale midpoint and tested for gender and parental status differences considering the entire set of stimuli (see Table 4).
Overall, participants evaluated the photographs above the scale midpoint in attractiveness, arousal, clarity, genuineness and intensity, and below the scale midpoint for in-group belonging and valence, all ps ≤ .001. Familiarity ratings did not differ from the scale midpoint, p = .241. Regarding gender differences, results show that women provided higher attractiveness, arousal, in-group belonging, and intensity ratings than men. Lastly, parents evaluated the stimuli as more familiar, more intense, and more aroused than non-parents. The correlations between evaluative dimensions are described in Table 5. Taking the strength of the correlation as the criterion [67], we report correlations that were at least weak (i.e., r ≥ .20). Results showed that clarity was strongly and positively associated with both genuineness and intensity, such that facial expressions rated as clearer were also perceived as more genuine and intense. We also found a strong and positive association between genuineness and intensity. Familiarity ratings showed a moderate positive correlation with in-group belonging (i.e., models rated as more familiar were also perceived as more likely to be Portuguese). We also found the same type of correlation between intensity and arousal (i.e., children displaying more intense expressions were also perceived as more aroused). Attractiveness ratings were only weakly and positively associated with the remaining evaluative dimensions. Weak positive associations were also found between arousal and both clarity and genuineness, and between genuineness and both familiarity and valence.
Frequency of contact with children in a work context was weakly and positively correlated with frequency of contact in a social context, and both variables were also weakly associated with participants' age. Note that overall the associations between these variables and the subjective ratings were non-significant or very weak (i.e., associations between each of these variables and familiarity, as well as between frequency of work and social contact and attractiveness).

Impact of facial expression and model characteristics on subjective ratings
We computed mean ratings for each of the 283 stimuli across the eight evaluative dimensions and conducted three separate univariate ANOVAs to examine the influence of facial expression, the sex and race/ethnicity of the model on each variable (post-hoc comparisons were conducted with Bonferroni correction and only the extreme values will be presented). Descriptive results (means and standard deviations) are summarized in Table 6.
Familiarity. Familiarity ratings varied according to the type of facial expression, F(1,6) = 7.53, MSE = 1.27, p < .001, ηp² = .14. Photographs displaying surprise obtained the highest familiarity ratings, all ps ≤ .008 (but not different from sadness, p = .053, fear, p = .617 and happiness, p = 1.000), and neutral photographs obtained the lowest familiarity ratings, all ps < .001 (but not different from anger, disgust, fear and sadness, all ps = 1.000).
American models obtained the highest attractiveness ratings, all ps ≤ .007 (but not different from Asian and European, both ps = 1.000) and South Asian models obtained the lowest attractiveness ratings, all ps < .001 (but not different from Asian, p = .216, and Latino, p = .602).
Arousal. Arousal ratings varied according to facial expression, F(1,6) = 136.66, MSE = 36.13, p < .001, ηp² = .75. Specifically, we observed that models displaying anger were perceived as more aroused, all ps ≤ .001 (but not different from surprise, p = .214), and that those with neutral expressions obtained the lowest arousal ratings, all ps < .001. Arousal ratings did not vary according to the sex, F < 1, or the model's race/ethnicity, F < 1.
In-group belonging. Ratings regarding the likelihood of the model being Portuguese did not vary according to the emotion displayed, the sex or the model's race/ethnicity, all F < 1.
Valence. Valence ratings varied according to facial expression, F(1,6) = 311.80, MSE = 87.94, p < .001, ηp² = .87, such that photographs displaying happiness were rated as the most positive, all ps < .001, and that photographs displaying anger were rated as the most negative, all ps ≤ .002 (but not different from sadness, p = 1.000). Overall, we observed differences across subjective ratings according to the type of emotional expression, but not according to the characteristics (sex, race/ethnicity) of the models.

Facial expression recognition
Hit scores (%) were obtained for each stimulus by calculating the percentage of participants that correctly recognized the intended expression based on the number of participants that evaluated a given photograph.
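As a worked illustration of this computation, the sketch below derives the hit score of one photograph from the labels chosen by the participants who evaluated it. The function name and the example responses are hypothetical.

```python
def hit_score(intended, responses):
    """Percentage of responses that match the intended expression.

    intended: label of the expression the photograph is meant to depict.
    responses: labels chosen by the participants who rated this photograph.
    """
    if not responses:
        raise ValueError("no responses for this photograph")
    hits = sum(response == intended for response in responses)
    # The denominator is the number of raters of this photograph,
    # not the size of the whole sample.
    return 100.0 * hits / len(responses)
```

For example, a fear photograph labeled as fear by 24 of its 32 raters would obtain a hit score of 75%.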
We also examined the influence of facial expression, and both sex and race/ethnicity of the model by conducting three separate univariate ANOVAs (see Table 6). As expected, accuracy varied according to the facial expression, F(1,6) = 8.94, MSE = 2824.85, p < .001, ηp² = .16 (see Table 6). Post-hoc comparisons with Bonferroni correction showed that photographs displaying happiness obtained the highest accuracy rates, all ps ≤ .001 (but not different from anger, p = .080, and surprise, p = .252), and that photographs displaying fear obtained the lowest accuracy rates, all ps ≤ .040 (but not different from sadness, p = .839, and disgust, p = .869). Accuracy rates did not vary according to the sex, F(1,281) = 1.37, MSE = 505.15, p = .243, ηp² = .01, or the model's race/ethnicity, F < 1. Again, we observed differences in accuracy rates according to the type of expression, but not according to the models' characteristics.

Cross-cultural comparison
To compare the mean accuracy rates observed in our sample (for the same sub-set of stimuli) with those reported in the original validation study [2], we conducted a 2 (sample) x 7 (facial expression) univariate ANOVA. Results

Frequency distribution
We computed descriptive statistics (i.e., means, standard deviations and confidence intervals) for each photograph per evaluative dimension (see https://osf.io/mjqfx/). According to the confidence interval, each photograph was categorized as low (i.e., upper bound below the scale midpoint), moderate (confidence interval included the scale midpoint) or high (lower bound above the scale midpoint) on a given dimension (for a similar procedure, see [68][69][70]). For the valence dimension, the low, moderate and high levels correspond to negative, neutral and positive, respectively. Fig 2 represents the frequency distribution of photographs across dimensions. Regarding the evaluative dimensions concerning the model, results showed that most photographs were perceived as moderate in familiarity (79%) and in likelihood of belonging to the in-group (51%), and as high in attractiveness (75%). In the case of arousal, photographs were distributed across the three levels, with the highest percentage of photographs evaluated as high in arousal. Regarding the dimensions related to the evaluation of the expression, most photographs were perceived as high in intensity (70%), genuineness (67%) and clarity (67%), and also as negative (53%).
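The categorization rule can be illustrated with a short Python sketch, in which a photograph is low when its whole confidence interval lies below the scale midpoint, high when the whole interval lies above it, and moderate otherwise. Several details are assumptions made for illustration and are not taken from the text: the function name `classify`, a 95% interval based on the normal approximation, a midpoint of 4 (as on a 7-point scale), and the example ratings.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def classify(ratings, midpoint=4.0, level=0.95):
    """Label one photograph on one dimension as 'low', 'moderate' or 'high'.

    Assumptions (not from the article): normal-approximation confidence
    interval, 95% level, and a scale midpoint of 4.
    """
    m = mean(ratings)
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for 95%
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    lower, upper = m - half_width, m + half_width
    if upper < midpoint:        # whole interval below the midpoint
        return "low"
    if lower > midpoint:        # whole interval above the midpoint
        return "high"
    return "moderate"           # interval includes the midpoint


# Made-up ratings for three photographs on a single dimension.
high_photo = classify([6, 7, 6, 7, 6, 7, 6, 7])
low_photo = classify([1, 2, 1, 2, 1, 2, 1, 2])
moderate_photo = classify([3, 5, 4, 4, 3, 5, 4, 4])
```

For valence, the same three labels would be read as negative, neutral and positive.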

Discussion
Databases of children's facial expressions have been used in a myriad of research domains, such as emotion detection and recognition, social cognition (e.g., impression formation, stereotypes), and cognitive psychology (e.g., attention bias), with samples of normative or non-normative (e.g., psychiatric disorders) children or adults (parents or non-parents).
In this work, we provide further validation for a sub-set of one of the most comprehensive databases of facial expressions depicting children: the CAFE [2]. This sub-set (283 photographs) is varied regarding the characteristics of the model, as it includes stimuli depicting boys and girls of heterogeneous race/ethnicity. It is also varied in the range of expressions depicted (i.e., sadness, happiness, anger, disgust, fear, surprise, neutral). Moreover, one of the primary criteria for selecting stimuli for the current validation was to select models that exhibited at least four different emotions (51 models), with angry, neutral and happy expressions mandatory. Angry and happy faces have been used to activate negative versus positive valence (e.g., [71]), or as exemplars of socially aversive versus appetitive stimuli (e.g., [72]). The availability of neutral expressions for all the models is also of particular interest, as these stimuli may serve as a baseline in several experimental paradigms (e.g., affective priming, approach-avoidance tasks), or as the target stimuli in impression formation tasks (e.g., [73]). Besides assessing emotion recognition accuracy (as in the original validation), we also asked participants to evaluate each stimulus on eight subjective dimensions focusing on the characteristics of the model or of the expression depicted.
Based on the overall mean ratings, the facial expressions were rated as high in clarity, genuineness and intensity, and the models were perceived as high in attractiveness and arousal, as moderately familiar and as low in their likelihood of in-group belonging. Overall valence ratings were negative, which is not surprising considering the range of facial expressions included (i.e., fear, sadness, anger and disgust vs. happiness, surprise and neutral). Differences according to the sex of the rater were only found for a few dimensions, such that women (vs. men) evaluated the models as more attractive, more aroused and more likely to belong to the in-group, and the expressions as more intense. Parental status also impacted mean ratings, such that parents (vs. non-parents) evaluated the models as more familiar and less aroused, and the expressions as more intense.
The overall accuracy in emotion recognition was satisfactory (77%) and did not vary according to the sex of the rater. This finding contrasts with the results from the original CAFE validation (i.e., higher accuracy rates for female respondents), but is in line with the results obtained in other validations of children's photos (e.g., [49]). Parental status did impact overall accuracy, but in the reverse direction: overall, non-parents were actually more accurate than parents. However, parents of younger children (up to 8 years old, as the models included in our sub-set) were more accurate than those with older children. Previous studies that examined parental status have also failed to demonstrate a general advantage of parents in children's emotion recognition (e.g., [49]). In turn, differences regarding parental status seem to be found only in interaction with other variables, such as sex and type of facial expression [26]. Finally, the overall ratings were not strongly associated with the frequency of contact with children (both in work and social contexts).
Accuracy also varied according to the facial expression, with the highest accuracy rate obtained for happy faces (although not statistically different from anger and surprise). Indeed, studies have consistently shown an advantage in the recognition speed and/or accuracy of happy faces in comparison to other basic emotional categories (for a review, see [74]). The accuracy of emotion recognition was independent of the models' characteristics such as sex or race/ethnicity, replicating the original CAFE validation. Finally, the comparison of emotion recognition between our sample and the original validation, for the same sub-set of stimuli, showed that the overall accuracy rates of the Portuguese sample were lower. However, this difference was smaller than 4% and was due to higher recognition rates for neutral and sad faces in the original sample. In contrast, the accuracy rates for faces depicting surprise were higher in the Portuguese sample, whereas no cross-cultural differences were detected for the other facial expressions.
Overall, we found positive correlations between most evaluative dimensions (e.g., clarity was strongly and positively associated with genuineness and with intensity, and the latter two dimensions were also strongly associated with each other). Importantly, an impact of facial expression was found for all dimensions (except judgments of in-group belonging). For example, happy faces were perceived as the most attractive, positive, clear and genuine, whereas angry faces were rated as the most aroused and intense. The characteristics of the models (i.e., sex, race/ethnicity) did not impact these ratings. Indeed, the only effect regarding race/ethnicity detected was for the attractiveness dimension, with African models rated as the most attractive (along with Asian and European models).
The CAFE database is suitable to be used with adult participants (e.g., to study how normative and non-normative samples differ regarding emotion recognition of child facial expressions). Moreover, this database is particularly useful in research conducted with samples of children as it allows for the use of peer-aged stimuli. Yet, the generalization of the current norms to children should be made cautiously. Although no differences between child and adult raters have been reported regarding emotion recognition performance [55], that might not be the case for some of the subjective dimensions. For example, a recent study showed that although ratings of valence and arousal produced by adults and children regarding facial expressions depicted by adult models were correlated, some differences emerged according to the raters' age group (e.g., children rated all expressions more positively [75]). The replication of the current validation procedure with children is recommended.
In sum, the current CAFE sub-set is diverse regarding the objective characteristics of the models and the range of facial expressions depicted. Note, however, that this sub-set is limited regarding certain emotional expressions (e.g., photographs of fear expressions are only available for 15 models). Another limitation is that several model characteristics (race/ethnicity, sex and emotional expression) are not fully balanced (e.g., South Asian models are all females and Asian models are all males). This imbalance derives both from the distribution of exemplars across all categories in the original database and from the criteria used to select the sub-set for the current study. Also, the choice is limited for researchers interested in ambiguous facial expressions, as only 35 photographs show recognition rates below 50%.
We expanded the original database by assessing an extensive set of evaluative dimensions. Most stimuli were rated as depicting genuine, clear and intense facial expressions. Also, regarding the evaluation of the models, most stimuli were evaluated as portraying familiar and attractive children. Results from the in-group belonging measure suggest the applicability of this set across different cultural backgrounds. For example, Portuguese participants indicated that most pictures (63%) depicted models with a moderate or high likelihood of belonging to their in-group. For the valence and arousal dimensions, the stimuli are more equally distributed across the three levels. Hence, numerous exemplars of each level can be selected for future research. These normative data allow researchers to select adequate stimuli according to different criteria, for example manipulating the dimensions of interest (e.g., type of expression) while controlling for other variables (e.g., model characteristics).
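As a rough illustration of this kind of criterion-based selection, the Python sketch below picks photographs of one expression while holding another dimension at a fixed level. The function name `select_stimuli`, the field names (`photo`, `expression`, `attractiveness`) and the rows are hypothetical and do not reflect the actual layout or content of the normative files.

```python
def select_stimuli(norms, expression, controls):
    """Return ids of photographs showing `expression` whose categorized
    level matches every entry in `controls` (e.g., {"attractiveness": "high"}).

    `norms` is a list of dicts, one per photograph (hypothetical layout).
    """
    return [
        row["photo"]
        for row in norms
        if row["expression"] == expression
        and all(row[dim] == level for dim, level in controls.items())
    ]


# Made-up normative rows, for illustration only.
norms = [
    {"photo": "p01", "expression": "happiness", "attractiveness": "high"},
    {"photo": "p02", "expression": "happiness", "attractiveness": "moderate"},
    {"photo": "p03", "expression": "fear", "attractiveness": "high"},
    {"photo": "p04", "expression": "happiness", "attractiveness": "high"},
]
selected = select_stimuli(norms, "happiness", {"attractiveness": "high"})
```

The same call with a different expression and the same `controls` would yield a comparison set matched on the controlled dimension.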