Familiarity facilitates feature-based face processing

Recognition of personally familiar faces is remarkably efficient, effortless and robust. We asked if feature-based face processing facilitates detection of familiar faces by testing the effect of face inversion on a visual search task for familiar and unfamiliar faces. Because face inversion disrupts configural and holistic face processing, we hypothesized that inversion would diminish the familiarity advantage to the extent that it is mediated by such processing. Subjects detected personally familiar and stranger target faces in arrays of two, four, or six face images. Subjects showed significant facilitation of personally familiar face detection for both upright and inverted faces. The effect of familiarity on target absent trials, which involved only rejection of unfamiliar face distractors, suggests that familiarity facilitates rejection of unfamiliar distractors as well as detection of familiar targets. The preserved familiarity effect for inverted faces suggests that facilitation of face detection afforded by familiarity reflects mostly feature-based processes.


Introduction
Humans are thought to be face-experts. We are able to draw important information from faces such as emotions from facial expressions [1,2], direction of attention from eye gaze and head position [1,2], and recognition of identity [3], [4]. When focusing on face identity, human performance is dramatically different for familiar and unfamiliar faces. Despite the subjective impression of efficient or "expert" perception of faces in general, performance accuracy when discriminating unfamiliar face identities or perceiving that different images are of the same unfamiliar identity are markedly worse than for familiar faces [3,[5][6][7][8][9][10][11][12].
In previous work, we showed that personally familiar faces have a more robust representation as compared to unfamiliar faces for both early detection and perception of social cues. Familiar as compared to unfamiliar faces can be detected with reduced attentional resources and can be processed without conscious awareness [13]. Moreover, social cues, such as eyegaze or head orientation, are processed faster when conveyed by familiar faces [14]. With a saccadic reaction paradigm, we found that participants were able to detect and shift their gaze to familiar faces in 180 ms [15] when the distractors were faces of strangers, a latency shorter a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 than the known evoked potentials that differentiate familiar from stranger faces [16]; but see [17]). Overall, these results highlight a difference in processing between familiar and unfamiliar faces and point to a facilitation of familiar face processing that precedes the activation of a conscious, view-invariant representation [13,15], and that extends to the local features of a familiar face [12,14].
In order to test the hypothesis that fast and efficient detection of familiar faces relies primarily on feature-based processing, we assessed whether the advantage for familiar face detection persists for inverted faces. Face inversion has been used to demonstrate face-specific processing and the role of configural processing when faces are presented upright. Inverting a face disrupts configural and holistic processing, thereby increasing reliance on parts-based processing [18][19][20][21][22][23][24][25][26][27][28][29][30][31]). Faces are characterized by two types of relational/configurational properties: firstorder relational properties (e.g.; eyes above the nose above the mouth) and second-order relational properties (e.g. spacing between the eyes) [32][33][34]. Another term used in the face literature for face processing is "holistic" [35], meaning that all face-parts are processed as a whole [36]. In the present experiment we hypothesized that if familiar face recognition exploits identity-specific local facial features, then the advantage for personally familiar faces should be maintained with face inversion. On the other hand, if familiar face recognition relies on holistic or configural processing, face inversion should eliminate the familiarity advantage. We used a visual search task for personally familiar and unfamiliar identities with upright and inverted faces. The results showed that the advantage for familiar faces persists also after inversion. We discuss these findings in terms of parts-based processing for efficient detection of personally familiar faces.

Methods
Raw data, analysis scripts, and presentation code are available at https://osf.io/p86wb/ Participants 19 subjects (12 male, mean age: 24.79, SD 3.71) from three groups of friends participated in the experiment. No formal power estimate was computed to determine sample size, but we aimed for a sample size that was larger than that in a paper by Tong & Nakayama [37] (8-16 subjects) on a visual search task for one's own face, while recruiting subjects that were highly familiar with the familiar stimuli. We chose friends that had extensive daily interaction with each other occurring for at least one year prior to the experiment. They were recruited from the Dartmouth College graduate and undergraduate community. All had normal or correctedto-normal vision. Subjects were reimbursed for their participation; all gave written informed consent to use their pictures for research and to participate in the experiment in accordance with the Declaration of Helsinki. The Dartmouth Committee for the Protection of Human Subjects approved the experiment (Protocol 21200).

Stimuli
For each subject we created three sets of images: target familiar faces (two identities: one male, one female), target stranger faces (two identities: one male, one female), and distractor stranger faces (twelve identities: 6 male, 6 female). Prior to the experiment, subjects and their friends had their pictures taken to be used as stimuli in the experiment. To ensure that all stimuli were of equal image quality, pictures were taken in a photo studio with standardized lighting, camera placement and camera settings. For each identity we used two different pictures taken in the same session to reduce image-specific learning. The familiar targets were chosen among the subject's friends. The pictures of the 14 stranger individuals (12 distractor identities and 2 target identities) were taken at the University of Vermont with the same lighting, camera placement and settings as used for subjects recruited at Dartmouth College. For each subject the two unfamiliar target identities were chosen randomly. Inverted stimuli were created by rotating the images 180˚. Images were cropped and converted to grayscale using custom code written in Python on Mac OS X 10.9.5. The average pixel intensity of each image (ranging from 0 to 255) was set to 128 with a standard deviation of 40 using the SHINE toolbox (function lumMatch) [38] in MATLAB (R2014a).
Stimuli for visual search trials consisted of two, four, or six face images positioned on the vertices of a regular hexagon centered on the fixation point, such that the center of each image was 7˚of visual angle from the fixation point. Each image subtended 4˚x 4˚of visual angle. The position of the stimuli always created a shape symmetrical with respect to the fixation point (see Fig 1). All face images for each block were either upright or inverted.

Experimental setup
The experiment was run on a GNU/Linux workstation (Xubuntu 14.04 with low-latency kernel 3.13, CPU AMD FX-4350 quad-core 4.2 GHz, 8GB RAM, AMD Radeon R9 270 video card with radeon drivers) and a DELL 2000FP screen, set at a resolution of 1600x1200 pixels with a 60hz refresh rate, using Psychtoolbox (version 3.0.12) in MATLAB (R2014b). Subjects sat at a distance of approximately 50 cm from the screen (eyes to screen) in a dimly lit room.

Task
The experiment consisted of a brief familiarization phase with the same stimuli used in the experiment, followed by a visual search session. In a visual search task, an array of stimuli (here 2, 4, or 6 stimuli) is presented on every trial while measuring participants' reaction times. Linear search curves are fitted in order to relate the number of stimuli presented with subjects' behavior, and to quantify processes that lead to recognition of targets or rejection of distractors. In our experiment, we quantified how these processes were modulated by the main conditions of interest, namely familiarity (target personally familiar or unfamiliar), target orientation (upright or inverted), and target presence.
During the familiarization phase, subjects were shown the 32 images used in the visual search task (two familiar and two unfamiliar targets, and 12 distractors, with two images for each identity, see Fig 1). Images (both upright and inverted) were presented in random order. Each image was presented for two seconds. After the image disappeared, subjects were required to press a key to continue to the next image. They were instructed to carefully observe each face for the entire presentation and to continue at their own pace.
The visual search session consisted of eight blocks, with a short break after the first four blocks. In each block, subjects were instructed to search for one of the four target identities, with one upright and one inverted block for each identity. Within each block, all distractor faces were of the same sex and in the same orientation as the target images. Subjects responded as quickly and accurately as possible by pressing either the left-arrow key (target present) or the right arrow-key (target absent). They received feedback (a beep) if they responded incorrectly or did not respond within three seconds. No feedback was given for correct answers. Eye movements were explicitly allowed [37].
The order of blocks was counterbalanced for familiarity and face orientation within each subject. Familiarity always changed from one block to the next, while inversion changed every two blocks. Because of software error, the sex of the targets wasn't counterbalanced across subjects: 12/19 subjects had male targets in the first half of the experiment and female targets in the second half (and the converse for the remaining 7/19 subjects).
Each block started with 24 practice trials followed immediately by 120 test trials. At the beginning of each block, subjects were cued with one image of the target identity (upright or inverted) and pressed a key to start. On each trial, a central fixation cross appeared for a jittered period between 800-1000 ms, followed by a visual search array of two, four, or six faces displayed for a maximum of three seconds.
Target images appeared in half of the trials. The target was equally likely to appear in the left or right hemifield to avoid possible lateralization biases. Distractor faces were randomly chosen from the set of six distractor identities, and all distractors were different from each other. Stranger target identities never appeared as distractors. Each trial type was repeated 10 times in each block (with distractors randomly sampled every time). Each block thus had 120 trials: 3 (Set Size) x 2 (Target Presence) x 20 (2 different target images x 10 repetitions). The order of the trials within each block was randomized.

Statistical analyses
The analyses were run in R (version 3.2.3). The code for all the analyses are available on the Open Science Framework website (https://osf.io/p86wb/) as RMarkdown notebooks.
To assess statistical significance we fitted Generalized Mixed Models using the package lme4 (version 1.1.11 [39]). Significance of the model parameters was tested using a Type 3 analysis of deviance (Wald's χ 2 test), as implemented in the package car ( [40], version 2.1.1). We also used the following additional packages in our analyses: We analyzed subjects' accuracies using Logit Mixed Models [50], and reaction times of correct trials only with Linear Mixed Models. We preferred the use of Generalized Mixed Models (GMMs) instead of ANOVAs because of the advantages they provide [39,[51][52][53][54]. Unlike ANOVAs, GMMs provide a general framework to model data sampled from different distributions (e.g., accuracies and RTs). Logit Mixed Effect Models can be used to model categorical data, such as accuracies [50]; Linear Mixed Effect Models can be used to model normally distributed data, such as log-transformed reaction times. Moreover, one can better model the data by accounting for the variance generated by the use of specific stimuli; this is accomplished by introducing a random-effects term for the "items" (i.e., the stimuli), which is "crossed" with the random-effects term for the subjects [51]. The inclusion of random effects is then statistically tested to determine whether they help model the data better, by comparing the model of interest with a reduced model (in a stepwise fashion, removing one random effect at a time; see below). The inclusion of random effects modeling the variance generated by both subjects and stimuli at the same time can improve the model fit and reduce Type 1 error rates [54].
Following common practices for analyzing data from visual search paradigms [37,[55][56][57], we fitted separate models for target present and target absent trials to compare search parameters (such as search slopes) for the two target conditions (present/absent) [37,58]. For each model, we entered Set Size, Familiarity, and Target Orientation as main effects with all their interactions. The initial random-effect structure contained both subjects and items terms. For the latter term we entered the combination of stimuli appearing on the screen regardless of their position. This allowed us to model the variance due to subject and item (specific images) differences. We also added an extra regressor that indicated the sex of the target, and added random slopes with respect to this term for both subjects and items. We considered this term as a covariate, and thus we did not analyze it further.
The initial random-effect structure was tested using a log-likelihood ratio test against reduced models (created by removing random slopes first). For the linear models on reaction times in both target present and absent trials, the final structure contained subjects with random slopes and intercepts, and items with random intercepts-the model with random slopes for items failed to converge, thus we used a less complex model. The final logit models on accuracies in target present trials had subjects with random intercepts only, while in target absent trials it had subjects with both random intercepts and slopes.
After fitting the models with zero-sum contrasts for the regressors, we tested statistical significance of the fixed-effect terms using a Type 3 analysis of deviance (Wald's χ 2 test), as implemented in the package car [40]. For the models on reaction times we log-transformed the independent variable to account for the skewness of the distribution of reaction times; visually inspecting the predicted vs. residual plot confirmed that such a transformation provided a better fit for the model. The final linear model was refitted using restricted maximum likelihood estimation (REML). The analyses reported here can be reproduced using the code shared in the OSF repository, under analyses/acc-models.Rmd for the models on accuracy, and analyses/ rt-models.Rmd for the models on reaction times.
We used a bootstrapping procedure [59] to investigate the direction of the significant effects found by the models. Trials were always bootstrapped maintaining the structure of the original dataset. For example, for any bootstrap sample the number of trials within each subject and condition (Set Size, Target Presence, Target Orientation, Familiarity, and Target Sex) was preserved, and trials were sampled with replacement only within the appropriate subject and condition. For the next sections, numbers in square brackets represent 95% basic bootstrapped confidence intervals (CI) after 10,000 replications.
We estimated search slopes and Set Size 1 intercepts to further quantify subjects' behavior. We fitted a regression line for each subject and condition separately to obtain a linear prediction of subjects' reaction times from set size (number of stimuli presented). The slope of such lines provides an estimate of the speed of processing for each presented stimulus [58,60]; Set Size 1 intercepts represent interpolations of those lines were subjects presented with only one stimulus (i.e., regression line computed at Set Size 1). Slopes and Set Size 1 intercepts provide information about distractor-rejection and target-recognition processes respectively [37]. To obtain 95% confidence intervals we bootstrapped the trials (in a stratified fashion, i.e., maintaining the factorial design of the conditions) and ran the regression model again, repeating this process 10,000 times.
Subjects were overall faster when searching for a familiar face than a stranger face, and they were faster with upright faces than inverted faces (see Fig 3). The advantage for familiar faces was 114 ms [97,131] in the upright condition, and 75 ms [55,95] in the inverted condition, with a difference of 39 ms [13,65]). Fig 4 shows [11,32]). The search slopes for inverted faces were steeper than those for upright faces, and they did not differ across familiarity (familiar faces 122 ms/ item [112,130]; stranger faces 116 ms/item [107, 125]; difference -4 ms/item [-17, 8]).

Discussion
In this study subjects searched for friends' faces and strangers' faces in a visual search task. We found a processing advantage for personally familiar faces that was robust to face inversion. Subjects' behavior could be framed in terms of a self-terminating serial search [60,61], with target-absent search slopes about twice the target-present ones. In target present trials subjects were highly accurate both with familiar and stranger targets, showing no evidence of a speedaccuracy trade-off.
Set Size 1 estimates showed that familiar face targets were processed faster than stranger target faces when presented both upright and inverted. This result adds to the evidence that personally familiar faces benefit from facilitated processing in a variety of experimental conditions [13][14][15] and real-life situations [7].
Critically, in this experiment we showed that the advantage of familiar face processing extended to inverted faces. Evidence suggests that turning a face upside-down reduces holistic perceptual processing and favors feature-based processing [18,19,24,25,29]; see also [20,21]. Thus, the faster detection of personally familiar faces in the inverted condition suggests that more efficient processing of personally familiar faces rests largely on the enhanced processing of local facial features.
Our findings extend the theoretical relevance of the results by Tong and Nakayama [37], who used subjects' own faces as familiar identities. By using faces of subjects' friends instead of subjects' own faces, we made the experimental task closer to everyday experience given that we usually spend more time looking at the faces of other people, especially personally familiar others, than at oneself.
We found that the search slopes differed between familiar and stranger conditions for upright faces on target present trials, and for both upright and inverted faces on target absent trials, but not for inverted faces on target present trials. These results indicate that subjects were faster at rejecting a stranger distractor when looking for a familiar face target than when looking for a stranger face target, even in target absent trials, in which the stimulus arrays were equivalent for familiar target and unfamiliar target blocks. The increase of the reaction times based on the number of items in the search array and two-fold greater search slopes for target absent trials are consistent with a serial self-terminating search [60,61] that was faster when searching for familiar face targets than for stranger face targets. This indicates that the internal representation of a familiar face, against which each distractor is compared, is either more robust and precise or sparser. We propose that familiarity with a face, consolidated through extensive exposure and personal interactions, may direct processing to specific features that are diagnostic of a familiar face's identity, whereas the representation of a stranger's face does not include such learned features and is more holistic [62].
The results of our previous experiments support the hypothesis for a streamlined detection of familiar faces based on diagnostic, identity-specific features. We have shown that changes in eye gaze, a local feature that serves as a potent social cue, were detected faster when conveyed by personally familiar faces [14]. We also showed that personally familiar faces were distinguished from stranger faces in a saccadic reaction time task at a latency of 180ms [15]. The very rapid detection of familiarity reported in that study was faster than the time required to build a view-invariant representation of faces the monkey face patch system [63], further corroborating our hypothesis that rapid familiarity detection may be based on a simpler, perhaps feature-based, process.
The slightly smaller effect of familiarity on reaction times for inverted faces than for upright faces, as reflected by the significant Familiarity x Orientation interaction, may suggest that some of the features of familiar face representations that afford more rapid processing are configural or holistic. However, the greater magnitude of the familiar advantage even for the inverted faces shows that this facilitation relies mainly on local features. Related work by others also indicates that configural information is less important for recognition of a familiar identity (see [12] for a cogent argument).
An open question remains regarding the stage at which the advantage for visual processing of familiar faces first appears. Previous evidence supported the idea that this advantage can occur at pre-attentive stages [13], and within 200 ms from stimulus presentation [15] (see also [17]), suggesting that a more advanced stage of processing, when the participant is aware of the identity being shown, might not be necessary. In the present study no measure was collected as to whether participants correctly identified the target face at every trial, thus we were not able to disambiguate between detection and identification. However, building upon previous evidence, we refer to the advantage reported here as an advantage for detection of familiar faces. Moreover, such advantage persists after face inversion, and even when the targets are absent. This suggests that participants could be creating a better search template for familiar faces, but not unfamiliar ones, based on diagnostic features of the face, that were consolidated through repeated and extensive interaction, and stable across limited image variations (see Fig 1).
The experiment reported here was carefully controlled for low-level visual differences that could confound the results. To this end, we used a relatively limited set of images (32 images for 16 different identities), which could have introduced a learning component in the task. However, even if such a learning effect was present, familiar faces still showed an advantage. Learning during the experimental task would have reduced such an effect. The persistence of this advantage indicates that robust representations for personally familiar faces cannot be completely explained by simple visual learning over the time span of an experiment [37]. Building upon controlled experiments such as the one reported here, future studies should adopt more varied (and thus more challenging) images, perhaps taken from naturalistic settings, in order to quantify the advantage for familiar faces with more ecologically valid stimuli [10,12,64].
In summary, the results of our experiment add to the existing evidence that the human visual system is finely tuned for rapid detection of familiar faces, much more so than of stranger faces. Participants searched for a familiar or stranger identity among distractors presented in either an upright or inverted orientation. They responded faster when searching for familiar faces even in the inverted condition. Our results suggest that robust representations for familiar faces contain information about idiosyncratic facial features that allow subjects to detect or reject identities when searching for a friend's face in a crowd of stranger faces.
Supporting information S1 File. Analysis of deviance for the mixed models on accuracy and reaction times. (PDF)