The Part Task of the Part-Spacing Paradigm Is Not a Pure Measurement of Part-Based Information of Faces

Background. Faces are arguably one of the most important object categories encountered by human observers, yet they present one of the most difficult challenges to both the human and artificial visual systems. A variety of experimental paradigms have been developed to study how faces are represented and recognized, among which is the part-spacing paradigm. This paradigm is presumed to characterize the processing of both the featural and configural information of faces, and it has become increasingly popular for testing hypotheses on face specificity and in the diagnosis of face perception in cognitive disorders.
Methodology/Principal Findings. In two experiments we questioned the validity of the part task of this paradigm by showing that, in this task, measuring pure information about face parts is confounded by the effect of face configuration on the perception of those parts. First, we eliminated or reduced contributions from face configuration by either rearranging face parts into a non-face configuration or by removing the low spatial frequencies of face images. We found that face parts were no longer sensitive to inversion, suggesting that the previously reported inversion effect observed in the part task was due in fact to the presence of face configuration. Second, self-reported prosopagnosic patients who were selectively impaired in the holistic processing of faces failed to detect part changes when face configurations were presented. When face configurations were scrambled, however, their performance was as good as that of normal controls.
Conclusions/Significance. In sum, consistent evidence from testing both normal and prosopagnosic subjects suggests the part task of the part-spacing paradigm is not an appropriate task for either measuring how face parts alone are processed or for providing a valid contrast to the spacing task. Therefore, conclusions from previous studies using the part-spacing paradigm may need re-evaluation with proper paradigms.


Introduction
There is a general consensus that the mechanisms involved in face processing are "special," but there is less agreement as to what exactly constitutes this "specialness." A newly developed paradigm, the part-spacing paradigm [1], has become increasingly popular in testing hypotheses for face specificity [2,3] and in the diagnosis of face perception in cognitive disorders [4-9]. By manipulating either the shape of face parts (i.e., the part task) or the fine distances among them (i.e., the spacing task), the measurement afforded by this paradigm is presumed to provide information as to how the brain processes featural and configural information, respectively. The part task, along with the spacing task, provides a perfect tool for examining how featural information of faces is processed and how it interacts with configural information in the context of a whole face (e.g., [3,10]). However, we argue that when it is used to characterize how face parts alone are represented, the part task may confound pure information about face parts alone with the effect of face configuration on the perception of those parts.
Our daily experience in face recognition suggests the importance of using both featural and configural information to correctly identify a specific individual in a fraction of a second. The underlying mechanisms of processing these two types of information are hotly debated, however. A dominant view suggests that faces are encoded and processed as a gestalt, without an internal part structure (i.e., the holistic-encoding hypothesis [11-13]). Recent fMRI and TMS studies, however, challenge this hypothesis by showing that featural information is encoded independently of the configural information, which supports a dual-mode hypothesis [14-19]. For example, a face-selective region in the lateral inferior occipital gyri (i.e., occipital face area, OFA [20,21]) is sensitive only to the presence of real face parts and not to the correct configuration of those parts [22, see also 21]. Also, TMS stimulation of this region selectively disrupts subjects' ability to discriminate faces on the basis of differences in face parts but not on the basis of differences in the spacing among those parts [23].
The dissociation in representing featural versus configural information in the brain suggests that the underlying mechanisms of processing the featural information should be studied outside the context of configural information. This is because evidence from the whole-part effect [12] and the composite effect [24] suggests that the discrimination of face parts is automatically influenced by an intact face configuration. Further, behavioral performance is a sum over outputs from all the stages involved, and therefore a measurement as regards face parts is likely to reflect effects from configural information, even if they had been processed at different stages and/or at different neural substrates. Therefore, to acquire pure information about face parts, the configural information must be removed. However, most recent studies on featural information processing have been carried out in the context of the veridical face configuration. In the part task of the part-spacing paradigm, the shape of face parts (either eyes or mouth) varies, but the first-order face configuration (i.e., the "T"-shaped configuration of eyes above nose above mouth) and the second-order face configuration (i.e., spacing) remain largely unchanged (Figure 1A, top left). This design may lead to conflicting results. For example, there is currently a debate as to whether the inversion of face stimuli affects the processing of the featural and configural information differently [25]. Some studies have shown that the size of the inversion effect on the featural information is as large as that on the configural information, using the part-spacing paradigm [2,3,26,27, but see 1,4,19,28-32]. Therefore, the featural information is proposed to be processed in a holistic fashion and not qualitatively different from that for configural information [3,10].
We argue that the lack of qualitative difference in processing between featural and configural information may actually stem from problems in the design of the part task itself, as the inversion effect observed in the part task may actually reflect an additional contribution from face configuration. Specifically, in the part task of the part-spacing paradigm, the first-order face configuration is always present and, moreover, changing the shapes of face parts alters the second-order face configuration [16,19,33]. When face stimuli are inverted, the contribution from face configuration is eliminated, and therefore a decrease in accuracy is observed in behavioral performance. To test this hypothesis, we either disturbed the processing of face configurations by rearranging face parts in a non-face configuration (Figure 1A, top right), or we kept the veridical face configuration but used high-pass-filtered face images, which is thought to reduce holistic face perception [34-36, but see 37-39] (Figure 1A, bottom left). The inversion effect on face parts in both scrambled faces and high-pass-filtered faces was compared to that for full faces. We predicted that the inversion effect should be either absent or significantly reduced when the holistic processing was interrupted. Finally, we further illuminated the role played by face configuration in the processing of face part information by testing a group of self-reported developmental prosopagnosic subjects [40-42] who are specifically impaired in holistic face perception.

Subjects
Nineteen subjects (ages 21-31; 11 males) with normal face perception and six prosopagnosic subjects (ages 19-20; 1 male) participated in the study. All subjects were right-handed and had normal or corrected-to-normal visual acuity. The prosopagnosic subjects were identified among college students at Beijing Normal University through a 21-item self-report face-recognition questionnaire followed by a one-hour semi-structured interview developed by Kennerknecht et al. [43]. None of the prosopagnosic subjects reported a history of severe head injury or neurological disease, but all complained of severe problems with face recognition in daily life. Because there are no well-accepted standards for screening developmental prosopagnosics, many studies are simply based on self-report and interview results [44]. However, we recognize that this approach is not ideal, and that objective behavioral tests may provide more conservative inclusion criteria to ensure the purity of prosopagnosics (e.g., [3,7,45,46]). The protocol was approved by the IRB of Beijing Normal University. Informed consent was obtained from all subjects before their participation.

Stimulus and Procedure
General Procedure. Computer-based tasks were run on PC desktops using Matlab 6.5 with the psychophysics toolbox extensions [47,48] at a viewing distance of approximately 70 cm from the screen. Two experiments were conducted for this study.
The first experiment included five component tests: one spacing task and four versions of part tasks. For the part tasks, subjects were instructed to discriminate part changes in face stimuli that (i) either had veridical face configurations, or did not, and (ii) contained either full-spectrum spatial frequencies, or only high spatial frequencies, and that were presented either upright or inverted. Before each of the component tests, subjects were explicitly informed as to what aspects of facial information were changed, and ten practice trials were given at the beginning of each block to ensure that the subjects understood the instruction and were familiar with the stimuli. Instructions were to respond as accurately as possible without sacrificing response speed.
The second experiment tested the prosopagnosic subjects with the whole-part task, the spacing task, and the part tasks with face configuration either preserved or scrambled. All subjects but one in the normal group participated in both experiments; one subject did not participate in the part tasks with high-pass-filtered faces.
Stimulus. The face stimuli used in these tests were gray-scale adult Chinese faces with external contour (a roughly oval shape with hair on the top and sides) removed. Three male faces were used to generate the stimuli for the part task, and all stimuli were 7 cm wide and 8.3 cm high (5.7° × 6.8° of visual angle). Four sets of nine faces were generated from a face template containing eyebrows and nose. For the face set used in the standard part task of the part-spacing paradigm (Veridical), either the two eyes or the mouth were replaced in each of the nine faces by eyes and mouths of similar shape from three original male face images (Figure 1, top left). For the face set without veridical face configurations (Scrambled), the face parts from the nine faces in the veridical face set were rearranged in a non-face configuration. This non-face configuration was the same for all face stimuli in this set (Figure 1A, top right). For the high-spatial-frequency (HSF) face set, the above two face sets were Fourier-transformed and multiplied by high-pass Gaussian filters to preserve high spatial frequencies (above 40 cpf) (for details, see [35]).
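The high-pass filtering step can be illustrated with a short Python sketch. This is not the authors' code (the original procedure is described in [35]); the function name `highpass_gaussian`, the exact filter shape, and the use of the cutoff as the Gaussian's standard deviation are assumptions made for the example. Frequencies are expressed in cycles per image, which approximates cycles per face only if the face fills the image.

```python
import numpy as np

def highpass_gaussian(image, cutoff_cycles):
    """Attenuate low spatial frequencies of a 2-D gray-scale image.

    Assumed filter (hypothetical, for illustration only):
    gain(f) = 1 - exp(-f^2 / (2 * cutoff_cycles^2)),
    with f the radial frequency in cycles per image.
    """
    rows, cols = image.shape
    # Frequency coordinates in cycles per image along each axis.
    fy = np.fft.fftfreq(rows) * rows
    fx = np.fft.fftfreq(cols) * cols
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # Gain is 0 at DC and approaches 1 well above the cutoff.
    gain = 1.0 - np.exp(-(radius ** 2) / (2.0 * cutoff_cycles ** 2))
    filtered = np.fft.ifft2(np.fft.fft2(image) * gain)
    return filtered.real
```

Because the gain at zero frequency is 0, the output has zero mean; in practice a mean gray level would typically be added back before display.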
Face stimuli for the spacing task were generated by varying the distance through either vertical displacement (between mouth and nose, 2 mm or 0.17°) or horizontal displacement (between two eyes, 3.4 mm or 0.28°) or both (horizontal displacement between two eyes, and vertical displacement between eyes and nose, 2.7 mm or 0.22°) (Figure 1B). The displaced face stimuli were likely in the normal range of anthropometric norms (vertical displacement between mouth and nose: 1.06 standard deviations (SD); between eyes and nose: 0.64 SD; horizontal displacement between two eyes: 2.21 SD) [49].
Another three male faces were used as targets in the whole-part task. Two distractor faces in the whole-face condition were created for each target face by replacing the eyes, the mouth, or the nose of the target face with the corresponding feature from a different face. The target and distractor whole-face stimuli were 8.9 cm wide and 10.7 cm high (7.3° × 8.7°). The individual face parts in the part condition were cropped from each of the target faces, creating a rectangular section with the feature in the center. The sizes of the face parts varied across different face parts, but the size for the same face part was constant (Figure 1C).
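The visual angles reported for the stimuli follow from their physical size and the viewing distance. The Python sketch below uses the standard formula, angle = 2·atan(size / (2·distance)); the function name `visual_angle_deg` is ours, and the 70 cm distance is taken as exact even though the text gives it only as approximate.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given size
    viewed frontally at a given distance."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

VIEWING_DISTANCE_CM = 70.0  # stated as approximate in the text

# Part-task stimuli (7 x 8.3 cm) and whole-part stimuli (8.9 x 10.7 cm).
for width_cm, height_cm in [(7.0, 8.3), (8.9, 10.7)]:
    w = visual_angle_deg(width_cm, VIEWING_DISTANCE_CM)
    h = visual_angle_deg(height_cm, VIEWING_DISTANCE_CM)
    print(f"{width_cm} x {height_cm} cm -> {w:.1f} x {h:.1f} deg")
```

Under this assumption the computed angles agree with the reported stimulus sizes to one decimal place.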
The Part Task. Two identical faces, or two faces that differed only in eyes or mouth, were presented sequentially, either upright or inverted. Subjects were instructed to judge whether the two faces were identical. Each trial started with a blank screen for 1 s, followed by the first face stimulus presented at the center of the screen for 500 ms. Then, after a blank interval of 1 s, the second stimulus was presented for 500 ms. Each response was followed immediately by visual feedback on its accuracy. Eight conditions from a 2 (Veridical versus Scrambled) × 2 (Full-spectrum versus HSF) × 2 (Upright versus Inverted) design were tested in separate blocks (i.e., 8 blocks in total). Tasks using full-spectrum stimuli were conducted before those using stimuli with only high spatial frequencies, and the test order of the remaining manipulations was counterbalanced across subjects. Each block included a total of 72 trials, half of which consisted of identical faces.
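The block structure of the part task can be sketched in Python as follows. This is an illustration only, not the original Matlab code: all names are hypothetical, and random shuffling is used as a stand-in for the counterbalancing scheme, whose details are not given in the text.

```python
import itertools
import random

CONFIGURATIONS = ("Veridical", "Scrambled")
SPECTRA = ("Full-spectrum", "HSF")
ORIENTATIONS = ("Upright", "Inverted")
TRIALS_PER_BLOCK = 72

def block_conditions():
    """All 2 x 2 x 2 = 8 conditions, one per block."""
    return list(itertools.product(CONFIGURATIONS, SPECTRA, ORIENTATIONS))

def block_order(rng):
    """Full-spectrum blocks first, then HSF blocks.

    Within each half the order is shuffled here; the study
    counterbalanced it across subjects (scheme not specified).
    """
    full = [c for c in block_conditions() if c[1] == "Full-spectrum"]
    hsf = [c for c in block_conditions() if c[1] == "HSF"]
    rng.shuffle(full)
    rng.shuffle(hsf)
    return full + hsf

def make_block(rng):
    """72 trials per block, half 'same' and half 'different'."""
    half = TRIALS_PER_BLOCK // 2
    trials = ["same"] * half + ["different"] * half
    rng.shuffle(trials)
    return trials
```

Calling `block_order(random.Random(1))` yields one possible session order for a single subject, and `make_block` gives the same/different sequence for one block.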
The Spacing Task. The spacing task was also conducted in separate blocks so that the results could be compared directly with those of the part task. The procedure was the same as that of the part task, except that faces were either identical or they differed only in terms of the distances between parts.
The Whole-Part Task. This task had two phases. In the learning phase, subjects were instructed to memorize three faces and their associated names. Only when the subjects could correctly identify all face-name pairs were they allowed to enter the test phase. In the test phase, a question (e.g., "Which is Xiao Zhang's nose (or mouth, or eyes)?") was presented, followed by a choice of two alternative pictures presented to the left and right sides of the screen. The display was left on the screen until the subjects responded. There were two conditions, each consisting of 36 trials. For the part condition, the display contained two isolated features (e.g., two noses): one was from the target face and the other from one of the learned faces. For the whole condition, the display contained two intact faces, with the target and the foil face differing only with respect to the individual feature that had been tested in the part condition; all other feature information was constant. These two conditions were randomly interleaved.

Results and Discussion
We first examined whether there were qualitative differences in subjects' performance in discriminating part changes without the context of the veridical face configuration. The accuracy in discriminating part changes was analyzed in a two-way ANOVA for which the factors were stimulus type (Veridical versus Scrambled) and stimulus orientation (Upright versus Inverted). This ANOVA found significant main effects of stimulus type (F(1,18) = 6.09, p < .03) and stimulus orientation (F(1,18) = 25.37, p < .001). More importantly, a significant two-way interaction of stimulus type by stimulus orientation (F(1,18) = 15.10, p < .001) indicates that the amount of performance decrease in accuracy in discriminating face parts differs across stimulus type (Figure 2A).
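For readers less familiar with this design, the interaction term of a 2 × 2 fully within-subject ANOVA can be computed from a single per-subject contrast: the inversion effect for Veridical faces minus the inversion effect for Scrambled faces. The F statistic of the interaction, with (1, n − 1) degrees of freedom, equals the square of the one-sample t statistic on that contrast. The Python sketch below illustrates this identity; the function name and the accuracy values in the example are hypothetical, not data from this study.

```python
import math
import statistics

def interaction_f(ver_up, ver_inv, scr_up, scr_inv):
    """Interaction F for a 2 x 2 within-subject design.

    Each argument is a list of per-subject accuracies in one cell.
    Returns (F, (df1, df2)) with df1 = 1 and df2 = n - 1.
    """
    # Difference of inversion effects, one value per subject.
    contrast = [(vu - vi) - (su - si)
                for vu, vi, su, si in zip(ver_up, ver_inv, scr_up, scr_inv)]
    n = len(contrast)
    standard_error = statistics.stdev(contrast) / math.sqrt(n)
    t = statistics.mean(contrast) / standard_error
    return t * t, (1, n - 1)

# Hypothetical accuracies for three subjects (illustration only).
f_value, df = interaction_f(
    ver_up=[0.9, 0.9, 0.9], ver_inv=[0.8, 0.7, 0.6],
    scr_up=[0.8, 0.8, 0.8], scr_inv=[0.8, 0.8, 0.8],
)
```

With 19 subjects, as in the full-spectrum analysis, the same computation gives the degrees of freedom (1, 18) reported above.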
In fact, a post-hoc pair-wise t-test with Bonferroni correction showed that the inversion of face stimuli significantly decreased the accuracy of discriminating face parts when face configurations were intact (t(18) = 5.98, Bonferroni corrected p < .001) (Figure 2A, Veridical), replicating previous findings [2,3,26,27,30,31]. Inverting the face parts in a non-face configuration, however, did not significantly decrease the accuracy of discriminating those face parts (t(18) < 1) (Figure 2A, Scrambled). Further, the failure to observe an inversion effect was not due to a floor effect because the subjects' performance was significantly higher than that to be expected from chance (i.e., 50%) whether the faces were upright or inverted (ps < .001). Also, accuracy was significantly lower when the face configuration was scrambled than when it was intact, even when face stimuli were upright (t(18) = 4.92, Bonferroni corrected p < .001). Because the main difference between these two versions of the part task was the presence versus absence of the normal "T" face configuration, the difference in accuracy seems to have reflected an impact made by face configuration.
One may argue, though, that the absence of the inversion effect in the part task with scrambled faces may simply reflect a lack of experience with face stimuli that do not have veridical face configurations. To rule out this possibility, we examined subjects' behavioral performance in discriminating face parts when only high-spatial frequencies of face images were presented while face configurations were preserved (Figure 1A, bottom left). Previous studies have shown that face stimuli containing only high spatial frequencies are processed less holistically [35]. Therefore, we could expect that with HSF faces, the inversion effect for face parts in the context of face configurations would be significantly reduced and not significantly different from that when face configurations were scrambled. Indeed, a two-way ANOVA revealed no significant interaction of stimulus type (Veridical versus Scrambled) by stimulus orientation (F(1,17) = 2.16, p = .16) when faces were high-pass-filtered (Figure 2B). The reduced inversion effect for face parts in the HSF faces was not due to a lack of experience, because low-spatial-frequency faces produced an even larger inversion effect than did full-spectrum faces [35]. Indeed, a significant three-way interaction of stimulus type (Veridical versus Scrambled), stimulus orientation (Upright versus Inverted), by spatial frequency (Full-spectrum versus HSF) (F(1,17) = 7.62, p < .02) further indicates an additional contribution from face configuration on the perception of face parts in the part task. In other words, we found that by either rearranging face parts to a non-face configuration or by removing the low spatial frequencies of face images, the perception of face parts was no longer sensitive to face inversion. This suggests that there is a qualitative difference in processing configural and featural information (see also [25]).
Though our data show that the standard part task is not a pure measurement of part-based information, one might have argued that it could still serve as a contrast for the spacing task, so that researchers could investigate whether a manipulation (e.g., inversion) produced a qualitative difference between configural changes and part changes. This argument is not tenable, either. When faces were inverted, a significant decrease in accuracy in discriminating configural changes was observed (t(18) = 5.65, p < .001) (Figure 2C); but this inversion effect was not significantly different from that for part changes in the standard part task (F(1,18) = 1.65, p = .22) (Figure 2A). On the other hand, the inversion effect for discriminating part changes in scrambled faces (Figure 2A) was significantly smaller than that for discriminating configural changes (F(1,18) = 12.87, p < .005). Therefore, the processing of featural information is indeed qualitatively different from the processing of configural information, but this difference is concealed by the presence of face configuration in the standard part task. In fact, the difference between these two versions of the part task, (Veridical − Scrambled)/Scrambled, was positively correlated with the inversion effect for configural changes in the spacing task, (Upright − Inverted)/Inverted (r = 0.53, p < .02) (Figure 2D), suggesting again that the standard part task involves the processing of configural information rather than simply part-based analysis. Therefore, the standard part task cannot be used as a valid contrast to the spacing task, either.
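The two normalized indices and their correlation can be written out explicitly. The Python sketch below assumes that each index is computed per subject from accuracy scores and that the correlation is a Pearson r across subjects; the function names are ours, and no data from the study are reproduced.

```python
import math

def relative_difference(a, b):
    """Per-subject index (a - b) / b, e.g. (Veridical - Scrambled) / Scrambled
    or (Upright - Inverted) / Inverted."""
    return [(x - y) / y for x, y in zip(a, b)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return covariance / math.sqrt(var_x * var_y)
```

Given per-subject accuracy lists, `pearson_r(relative_difference(veridical, scrambled), relative_difference(upright, inverted))` would give the kind of correlation plotted in Figure 2D; the argument names here are placeholders for those lists.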
Because the part task of the part-spacing paradigm does not take into account the contribution of face configuration, the paradigm may be providing conflicting, or even false, information on face perception. For example, in a recent study in our lab on subjects with developmental prosopagnosia (DP) who show severe face perception deficits [50], the results from the whole-part task [12] and the part task of the part-spacing paradigm showed a conflicting pattern of deficits. The results from the whole-part task showed a significant two-way interaction of subject type (normal subjects versus DPs) by stimulus (whole versus part) (F(1,23) = 5.98, p < .03) (Figure 3A). This indicates that the normal controls were better at discriminating a specific face part in the context of a whole face than when the part was isolated (t(18) = 2.62, p < .02), whereas the DPs' performance showed a part superiority effect (t(5) = 3.05, p < .03). This finding suggests that the DPs are selectively impaired in holistic processing but that their ability to identify isolated face parts is largely intact. The results from the part task of the part-spacing paradigm, however, could indicate that the DPs are impaired in their ability to process face parts, because the DPs' performance in identifying face parts was significantly poorer than that of the controls in this task (t(23) = 2.13, p < .05) (Figure 3B, Veridical). We suggest that these conflicting results are due to a failure to discount the contribution of face configuration in the part task. Indeed, when the DPs were instructed to discriminate face parts in a scrambled face, their performance was as good as that of the normal controls (t(23) < 1) (Figure 3B, Scrambled). In fact, the DPs were impaired only in discriminating configural changes of faces, as the spacing task revealed (t(23) = 3.27, p < .005) (Figure 3B), and this is consistent with the findings from the whole-part task.
Previous studies have revealed that individuals with developmental prosopagnosia may be selectively impaired in the holistic processing of faces [3,6,41,46]. Consistent with these findings, we found that the DPs did not show the whole-part effect, and performed poorly in discriminating both configural and featural information in the context of a whole face. However, when face parts were presented either in isolation or outside the context of face configuration, the DPs' performance was as good as that of normal controls, suggesting that the processing of face parts is dissociable from the processing of face configurations.
In sum, converging evidence from both the inversion effect and the whole-part effect demonstrates that the part task of the part-spacing paradigm confounds pure information about face parts alone with the effect of face configuration on the perception of those parts. It is therefore not an appropriate task for either measuring how face parts alone are processed or for providing a valid contrast to the spacing task. That being said, we are not suggesting that conclusions from previous studies using the part-spacing paradigm are problematic, because the results from the part task do partially reflect an analysis of face parts as the task explicitly instructs. Rather, we simply suggest that, because of the possible influencing role of face configuration in the processing of face parts, conclusions drawn from studies using this paradigm may need further re-evaluation with proper paradigms.