The Dartmouth Database of Children’s Faces: Acquisition and Validation of a New Face Stimulus Set

Facial identity and expression play critical roles in our social lives. Faces are therefore frequently used as stimuli in a variety of areas of scientific research. Although several extensive and well-controlled databases of adult faces exist, few databases include children’s faces. Here we present the Dartmouth Database of Children’s Faces, a set of photographs of 40 male and 40 female Caucasian children between 6 and 16 years-of-age. Models posed eight facial expressions and were photographed from five camera angles under two lighting conditions. Models wore black hats and black gowns to minimize extra-facial variables. To validate the images, independent raters identified facial expressions, rated their intensity, and provided an age estimate for each model. The Dartmouth Database of Children’s Faces is freely available for research purposes and can be downloaded by contacting the corresponding author by email.


Introduction
Faces are important social stimuli and therefore a frequent focus of scientific investigation. They are used as stimuli in a variety of research areas, including emotion, social attention, speech perception, human face recognition, computer face recognition, eyewitness identification, and in the study of neuropsychological disorders such as autism and prosopagnosia. Within the field of face recognition alone, research covers a breadth of topics including expression recognition, identity perception and memory, gender discrimination, age recognition, and uses methods that range from behavioral testing to neuroimaging and neuropsychology. Access to a well-controlled set of face stimuli is critical to experimental design, and effects such as the own-age bias [1][2][3], where individuals are better at remembering faces from their own age group, demonstrate that the age of faces can be an important consideration in stimulus selection.
Although several databases of adult face stimuli exist (see [4] for an excellent review), very few databases of children's faces are available [5]. Of these, the most extensive is the NIMH Child Emotional Faces Picture Set (NIMH-ChEFS) [5], which includes front-facing images of sixty children between 10 and 17 years of age, posing five facial expressions with direct and averted gaze. This database provides a good variety of images, including faces of children of different races, multiple facial expressions and gaze directions, and visible extra-facial features such as hair, jewelry, and clothing. While this variety increases the usefulness of the database, it compromises certain aspects of stimulus control that may be desirable in some areas of study. Thus, there is a need for a freely available database of children's faces that, while providing stimulus variety, places particular emphasis on stimulus control.

Development of Database
1.1 Image acquisition. Photographs were taken at Dartmouth College and at the University of Minnesota using the same equipment and setup (Figure 1). Children between the ages of 5 and 16 (mean age 9.72, SD = 2.41) were recruited from the Dartmouth community and through the University of Minnesota database of child participants. Children (n = 123, 61 male) and their parents provided written assent and permission to have photographs taken and distributed to other researchers. They also specified whether photographs could be used in scientific publications and/or presentations (e.g. in figures). Children were paid for their participation. Participation was in accordance with the ethical guidelines of the Committee for the Protection of Human Subjects at Dartmouth College.
Children were dressed in black salon gowns and black hats that covered their ears before being seated in front of a black felt backdrop. Glasses and jewelry such as necklaces and earrings were removed. Ceiling lights were kept on, but no camera flash was used. Instead, two spotlights with 250-Watt light bulbs lit the models from 30 degrees on each side of the central camera (0 and 60 degrees with respect to the model). Spotlights were softened with white photography umbrellas. Three Canon EOS Rebel XS cameras were positioned at a distance of 130 cm in front of the model at 0 degrees, and 30 and 60 degrees to the model's left or right. Dot-line remote releases allowed the three cameras to be triggered simultaneously.
Each model was asked to pose eight different facial expressions: neutral, content (smile without teeth), happy (smile with teeth), sad, angry, afraid, surprised, and disgusted. Two happy expressions, which we call "happy" and "content", were included to provide the option of having teeth visible or hidden (teeth can provide feature-based identity information) and to offer a choice of expression intensity (happy with a large smile versus the more subtle "content"/"pleased"). Models were coached by encouraging them to imagine situations that would elicit the desired facial expressions (e.g. Disgust: "Imagine sitting in chewing gum", or Anger: "Imagine your brother or sister broke your PlayStation"), and photos were taken until the photographer was satisfied that the expressions were the best the child could produce. Each facial expression was photographed at least twice and in two lighting conditions. One lighting condition used both spotlights, while the other used only one (Spotlight A, see Figure 1). Models initially faced Camera 1 (see Figure 1). Once photographs were taken for all eight facial expressions under both lighting conditions, the model was repositioned to face Camera 3 and asked to produce all expressions again. Combined, this yielded frontal views as well as 30-degree and 60-degree views of each side of the model's face. An example of all facial expressions and camera angles for one model is shown in Figure 2.
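As a rough illustration of the nominal design described above, the following Python sketch enumerates the expression, camera-angle, and lighting combinations for one model. The signed angle labels and the lighting names are labelling conventions assumed here, not file-naming conventions of the database, and the number of images actually retained per model may differ after quality screening.

```python
from itertools import product

# The eight posed expressions listed in the text.
EXPRESSIONS = ["neutral", "content", "happy", "sad",
               "angry", "afraid", "surprised", "disgusted"]

# Five viewing angles: frontal plus 30 and 60 degrees to each side.
# The sign convention (negative = one side, positive = the other) is assumed.
ANGLES_DEG = [-60, -30, 0, 30, 60]

# Two lighting conditions; the labels are hypothetical.
LIGHTING = ["both_spotlights", "spotlight_a_only"]


def stimulus_conditions():
    """Return every nominal (expression, angle, lighting) cell for one model."""
    return [
        {"expression": e, "angle_deg": a, "lighting": l}
        for e, a, l in product(EXPRESSIONS, ANGLES_DEG, LIGHTING)
    ]


conditions = stimulus_conditions()
print(len(conditions))  # 8 expressions * 5 angles * 2 lighting conditions = 80
```

A listing of this kind can be useful when checking which cells are populated for a given model before building an experiment.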
1.2 Image processing. All images of all models were visually inspected for quality. Poor quality images (in which the model was moving, blinking, making an unidentifiable facial expression, or was out of focus) were removed. Due to the racial homogeneity in the Dartmouth community, the vast majority (118/123) of the children who volunteered to participate were Caucasian. Given the small number of other-race faces in our sample, we chose to restrict the image set to Caucasian models only, leaving images of 50 male and 51 female models. Images were cropped to 300×300 pixels (100 dpi) around the face and were otherwise left unprocessed.

Validation of the Database
2.2 Procedure. Participants rated at least one block of images, but no more than two blocks in one sitting. Blocks included all frontal images from 10 models (one block had 11 female models). Participants were seated at a comfortable viewing distance from a 13-inch MacBook Pro laptop computer. For each block of models, they were asked to perform three tasks: identify the facial expression, rate the intensity of the facial expression, and estimate the model's age.
Participants were first asked to identify the facial expression. The image appeared on the screen above seven possible expression words: neutral, happy, sad, angry, afraid, surprise, disgust. Each word was displayed with a number corresponding to the key participants were to press to select that expression. The expression words were presented in a different order for each block of models. An additional option, "none", was included at the end of the list of expression words. Participants were instructed to choose "none" if they felt that the facial expression did not match any of the expression words. The experimenter explicitly stated that the word "neutral" represented a lack of facial expression, whereas "none" indicated that the facial expression was ambiguous. Content was not one of the expression words offered, so the correct classification for content facial expressions (happy with no teeth visible) was "happy".
After images were classified based on facial expression, participants rated the same images for the intensity of the expression. Images were presented in random order and participants were asked to rate them on a scale of 1 to 5, with 1 representing low intensity and 5 representing high intensity. Participants indicated their rating by key press. Finally, participants were asked to estimate the age of each model. They were first shown three example faces, a 5-year-old, a 10-year-old, and a 15-year-old, to familiarize them with what children of those ages look like. Models for these examples were chosen from other blocks of models. Age estimates were given for one neutral expression image per model and were limited to whole values in years (e.g. 9 years old). Participants indicated their age estimate by typing it in an answer box on the screen and confirming their answer by pressing enter. Participants were given unlimited time for all ratings. The total time to perform all three tasks for one block of images was between 18 and 25 minutes.

Expression Ratings
First, a score reflecting the identifiability of facial expressions was computed for each model. This score was based on the mean number of times the model's posed facial expressions were correctly identified by the raters. These scores ranged from 54.7% to 90.6%, meaning that the best model produced facial expressions that were correctly identified by 90.6% of the raters on average. Based on these scores, the bottom 10 male and 11 female models were removed from the image set, leaving the best 40 male and 40 female models. We chose to retain 40 male and 40 female models in order to maximize the number of identities in the database while removing the models with the lowest ratings. The identifiability scores for the final set of models ranged from 70.0% to 90.6% (Table 1). The remaining analyses were performed using only these 80 models.
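The scoring and selection step can be sketched in Python as follows. This is a minimal illustration rather than the analysis code used for the database: the function names, the tuple layout of the ratings, and the model identifiers are hypothetical, and the score is computed here as the pooled percentage of correct responses per model, which equals the mean of per-image percentages only when every image has the same number of raters.

```python
from collections import defaultdict

# Map each intended expression onto the response word that counts as correct.
# "content" had no response word of its own, so "happy" is the correct answer.
TARGET_WORD = {"content": "happy", "surprised": "surprise", "disgusted": "disgust"}


def identifiability_scores(ratings):
    """ratings: iterable of (model_id, intended_expression, chosen_word).

    Returns {model_id: percent of responses matching the intended expression}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for model_id, intended, chosen in ratings:
        target = TARGET_WORD.get(intended, intended)
        total[model_id] += 1
        correct[model_id] += int(chosen == target)
    return {m: 100.0 * correct[m] / total[m] for m in total}


def keep_top(scores, n):
    """Return the n model ids with the highest identifiability scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy data with made-up model ids and responses.
toy_ratings = [
    ("m01", "happy", "happy"),
    ("m01", "afraid", "surprise"),
    ("m02", "content", "happy"),
    ("m02", "disgusted", "disgust"),
]
scores = identifiability_scores(toy_ratings)
print(scores)               # {'m01': 50.0, 'm02': 100.0}
print(keep_top(scores, 1))  # ['m02']
```

Applying `keep_top` separately to the male and female models with `n = 40` would mirror the selection described above.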
A confusability matrix was computed, indicating the percent of accurate identifications of a given facial expression, and, if inaccurate, which expression was chosen (Figure 3). Cohen's Kappa [6] indicated good agreement between rater-chosen expressions and intended expressions, Kappa = 0.780, 95% CI (0.775, 0.786). On average, the expressions were correctly identified in 79.7% of the images (SD = 22.7%), which is comparable to rates from other published face databases [7]. Happy and Content were the most accurately identified expressions; raters correctly classified 97.8% of the Happy (teeth visible) images and 90.8% of the Content (teeth not visible) faces. The least accurately identified expression was Afraid, which was correctly identified in 49.0% of the images, and most often confused with Surprised (26.0% of the Afraid images). This pattern is similar to that seen for ratings of Ekman and Friesen's Pictures of Facial Affect [8,9]. The mean rating for each model is indicated in Table 1. Ratings for individual images can be found in the supplementary information (Table S1).
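For readers who wish to reproduce this kind of summary on their own ratings, the sketch below computes a row-normalized confusability matrix and the standard unweighted Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), from pairs of intended and chosen labels. The function names and toy data are hypothetical, intended labels are assumed to be already mapped onto the response words, and the confidence interval reported above is not reproduced here.

```python
from collections import Counter


def confusion_matrix(pairs):
    """pairs: list of (intended, chosen) labels.

    Returns {intended: {chosen: percent of that intended expression's responses}}.
    """
    counts = Counter(pairs)
    row_totals = Counter(intended for intended, _ in pairs)
    matrix = {}
    for (intended, chosen), n in counts.items():
        matrix.setdefault(intended, {})[chosen] = 100.0 * n / row_totals[intended]
    return matrix


def cohens_kappa(pairs):
    """Unweighted Cohen's Kappa between intended and chosen labels.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(pairs)
    p_observed = sum(1 for intended, chosen in pairs if intended == chosen) / n
    intended_freq = Counter(intended for intended, _ in pairs)
    chosen_freq = Counter(chosen for _, chosen in pairs)
    # Chance agreement: product of the marginal proportions, summed over labels.
    # Labels never intended (e.g. "none") contribute zero.
    p_expected = sum(intended_freq[k] * chosen_freq[k] for k in intended_freq) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: six made-up responses.
toy_pairs = [
    ("happy", "happy"), ("happy", "happy"), ("happy", "happy"),
    ("afraid", "afraid"), ("afraid", "surprise"),
    ("sad", "sad"),
]
print(confusion_matrix(toy_pairs)["afraid"])  # {'afraid': 50.0, 'surprise': 50.0}
print(round(cohens_kappa(toy_pairs), 3))      # 0.75
```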
Others have found that the gender of the observer may affect memory for faces [10,11] and other stimuli [12] and that these effects may interact with the gender of the face stimuli [13]. We therefore performed a 2×2 analysis of variance (ANOVA) on the accuracy ratings with factors of Rater (Male vs. Female) and Model (Male vs. Female). There was a main effect of Rater gender such that Female raters were more accurate at identifying the intended expression (mean = 80.6%, SD = 23.5%) than Male

Intensity Ratings
An average intensity rating was computed for each image, and these ratings were used to compute an average intensity rating for each expression (Table 2).

Age Ratings
The final 80 models ranged in age from 6 to 16 years (mean = 9.84, SD = 2.33). Age estimates from raters were used to compute a mean age estimate for each model (Table 1). A paired t-test revealed that the mean estimated age of the models (mean = 9.23, SD = 2.27) was significantly younger than their actual mean age, t(79) = 3.51, p < 0.001.
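The paired comparison of actual and estimated ages can be illustrated with a short Python sketch. It computes only the t statistic and degrees of freedom, t = mean(d) / (sd(d) / sqrt(n)) with d the per-model difference; the function name and the toy ages are hypothetical, and obtaining a p value would require a t distribution from a statistics package.

```python
import math
import statistics


def paired_t(actual, estimated):
    """Paired t statistic for actual minus estimated values.

    Returns (t, degrees_of_freedom). Requires at least two pairs and
    non-identical differences (otherwise the standard deviation is zero).
    """
    diffs = [a - e for a, e in zip(actual, estimated)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Toy data: true ages and mean rater estimates for four made-up models.
true_ages = [8, 10, 12, 14]
mean_estimates = [7.5, 9.0, 11.5, 13.0]
t_value, df = paired_t(true_ages, mean_estimates)
print(round(t_value, 3), df)  # 5.196 3
```

A positive t under this sign convention corresponds to estimates that are, on average, younger than the true ages.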
We also computed the difference between the mean estimate score and the true age for each model. The absolute value of the difference scores was used to compute a mean difference between estimated age and true age of all models. This mean difference was 1.36 years, and the range of deviations was −3.32 to +3.15 years. Using these absolute difference scores, we performed a 2×2 ANOVA with factors of Rater (Male vs.
Finally, to assess inter-rater reliability for age estimates, we performed 10 separate two-way mixed Intraclass Correlations (one for each group of models, see Table 3). Four raters did not provide age estimates for all models and so were excluded from this analysis. All average measure Intraclass Correlation Coefficients (ICCs) were significant (all ps < 0.001) and ranged from 0.93 to 0.99, indicating strong agreement for age estimates in all groups.
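As an illustration of the reliability analysis, the sketch below computes an average-measures Intraclass Correlation Coefficient from a models-by-raters table using the two-way ANOVA decomposition, ICC = (MS_models - MS_error) / MS_models. The text does not state whether a consistency or an absolute-agreement definition was used; the consistency form is assumed here, and the function name and toy data are hypothetical.

```python
def icc_average_consistency(table):
    """Average-measures, two-way, consistency-type ICC.

    table: list of rows, one per model, each row holding one age estimate
    per rater (complete data, same raters for every model).
    """
    n = len(table)       # number of models (rows)
    k = len(table[0])    # number of raters (columns)
    grand = sum(sum(row) for row in table) / (n * k)

    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# Toy data: four made-up models rated by three raters.
toy_estimates = [
    [6, 7, 6],
    [9, 10, 9],
    [12, 13, 13],
    [15, 15, 16],
]
print(round(icc_average_consistency(toy_estimates), 3))
```

Because complete data are required, raters with missing estimates would need to be dropped before calling the function, as was done for the four raters mentioned above.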

Discussion
Here we present the Dartmouth Database of Children's Faces, a well-controlled database of faces of 40 male and 40 female Caucasian children. All faces were assessed by at least 20 raters for facial expression identifiability and intensity, and perceived age. There was good agreement between rater-chosen expressions and intended expressions. Consistent with other databases, Happy was most accurately identified [5,7-9,14-16] and Afraid was least accurately identified [7-9,14,16]. As in Ekman and Friesen's Pictures of Facial Affect [8,9], Afraid was most often confused with Surprised. Only a small percentage of images were identified as "None", meaning that most facial expressions could be classified as one of the target expressions. Surprised was the most intense facial expression, whereas Neutral was the least intense. Happy was rated as more intense than Content, indicating that intensity ratings were meaningful. Raters were able to estimate the age of the models within a little over a year of their true age and showed strong agreement in age estimates. Consistent with previous findings of gender differences in face tasks [13,17,18], female raters were more accurate than male raters at identifying facial expressions. Female raters also rated facial expressions as more intense overall. Taken together, these findings support the validity of the expressions, and also provide information that will allow researchers to choose particular intensities of the expressions and perceived ages.
We sought to develop a database of images of children's faces that would be of use to researchers from a variety of fields. One area of research that may particularly benefit from this database is the own-age bias: the effect where individuals are better able to identify faces of people their own age [8,14,16].
While this effect has important implications for stimulus selection in face perception studies, the perception of own- versus other-age faces can be influenced by personal characteristics of the observer (e.g. experience with own- or other-age faces [1][2][3][19][20][21][22][23]). Although most research on the own-age bias has focused on identity recognition (but see [24,25]), one interesting question is whether the own-age bias extends to expression identification. Our database is ideally suited to facilitate studies designed to answer this question. Investigations such as these, on the interaction between stimulus and observer age, provide important information about the development of normal face recognition and support evidence from cross-species [26,27] and own- versus other-race studies [28][29][30] that suggests that a range of facial characteristics impact face perception. Given these considerations, it is worth restating that the image ratings provided here are from adult raters and that rater and model age may interact. Again, this is an interesting question for future research.
Although a single database will not satisfy the needs of every research study, our primary goal was to create a database that varied facial identity, gender, and expression, while minimizing variation in extra-facial features that could be used to distinguish individuals (e.g. jewelry, glasses, etc., but also skin color). While we chose to restrict our database to Caucasian children only, including children of different ethnicities would provide an avenue for answering additional research questions, such as those regarding other-race effects [29,31], and the interaction between the perception of race and age. The acquisition of images of children of different ethnicities was beyond the scope of this study, but this is undoubtedly a desirable addition for future image sets.
In general, findings on the intersection between participant and stimulus emphasize the importance of using a broad range of well-controlled stimuli in vision research. Given the disparity in access to adult versus child face stimuli, we hope that our freely available database of children's faces will help address this issue and fill the current void in stimulus databases, providing a useful tool for future research in a variety of areas.

Supporting Information
Table S1 Image ratings. Image ratings for all images included in the Dartmouth Database of Children's Faces. Images are listed by model number, file name, and intended expression. Rating is percent agreement between rater-chosen expression and intended expression. (PDF)