Adjudicating between face-coding models with individual-face fMRI responses

The perceptual representation of individual faces is often explained with reference to a norm-based face space. In such spaces, individuals are encoded as vectors where identity is primarily conveyed by direction and distinctiveness by eccentricity. Here we measured human fMRI responses and psychophysical similarity judgments of individual face exemplars, which were generated as realistic 3D animations using a computer-graphics model. We developed and evaluated multiple neurobiologically plausible computational models, each of which predicts a representational distance matrix and a regional-mean activation profile for 24 face stimuli. In the fusiform face area, a face-space coding model with sigmoidal ramp tuning provided a better account of the data than one based on exemplar tuning. However, an image-processing model with weighted banks of Gabor filters performed similarly. Accounting for the data required the inclusion of a measurement-level population averaging mechanism that approximates how fMRI voxels locally average distinct neuronal tunings. Our study demonstrates the importance of comparing multiple models and of modeling the measurement process in computational neuroimaging.

graphics model. We developed and evaluated multiple neurobiologically plausible 23 computational models, each of which predicts a representational distance matrix and a 24 regional--mean activation profile for 24 face stimuli. In the fusiform face area, a face--25 space coding model with sigmoidal ramp tuning provided a better account of the data 26 than one based on exemplar tuning. However, an image--processing model with 27 weighted banks of Gabor filters performed similarly. Accounting for the data required 28 the inclusion of a measurement--level population averaging mechanism that 29 approximates how fMRI voxels locally average distinct neuronal tunings. Our study 30 demonstrates the importance of comparing multiple models and of modeling the 31 measurement process in computational neuroimaging. 32 Introduction 46 Humans are expert at recognizing individual faces, but the mechanisms that support 47 this ability are poorly understood. Multiple areas in human occipital and temporal 48 cortex exhibit representations that distinguish individual faces, as indicated by 49 successful decoding of face identity from functional magnetic resonance imaging (fMRI) 50 response patterns (1-10). Decoding can reveal the presence of face--identity information 51 as well as invariances. However, the nature of these representations remains obscure 52 because individual faces differ along many stimulus dimensions, each of which could 53 plausibly support decoding. To understand the representational space, we need to 54 formulate models of how individual faces might be encoded and test these models with 55 responses to sufficiently large sets of face exemplars. Here we use representational 56 similarity analysis (RSA) (11) to test face--coding models at the level of the 57 representational distance matrices they predict. Comparing models to data in the 58 common currency of the distance matrix enables us to pool the evidence over many 59 voxels within a region, obviating the need to fit models separately to noisy individual 60 7 Fig 1. Face spaces obtained from a reference PCA model, perceptual similarity judgments and fMRI response patterns in human visual cortex. (a--b) We generated 12 faces in a polar grid arrangement on a 2D slice through the reference PCA space (3 eccentricity levels and 4 directions). The grid was centered on the stimulus space norm (not shown). Euclidean distances between the 12 faces are illustrated in a distance matrix (a) and in a 2D visualization of these distances (b, multidimensional scaling, metric--stress criterion). (c--d) Group--average perceptual face space (N=10) estimated from psychophysical similarity judgments. The distance matrix depicts the percentage of trials on which each pair of faces was rated as relatively more dissimilar. (e--h) Group-average cortical face--space (N=10) estimated from fMRI response patterns in human visual cortex. The distance matrices depict a cross--validated estimate of multivariate discriminability where 0 corresponds to chance--level performance (leave--one--run--out cross--validation, Materials and Methods). Each cortical region was defined in individual participants using independent data. The distance matrices use separate color scales since we expect variable effect sizes across cortical regions. For data from other regions of interest, see S2 Fig. Cortical face spaces are warped relative to the reference PCA space 113 In order to sample human cortical and perceptual face representations, human 114 participants (N=10) participated in a perceptual judgment task followed by fMRI scans. 115 The perceptual judgment task (S1 Movie, 2145 trials over 4 recording days) involved 116 force--choice judgments of the relative similarity between pairs of faces, which were 117 used to estimate a behavioral distance matrix (Materials and Methods). The same faces 118 were then presented individually in a subsequent rapid event--related fMRI experiment 119 (S2 Movie, 2496 trials over 4 recording days). Brain responses were analyzed separately 120 in multiple independently localized visual regions of interest. We compared 121 representational distance matrices estimated from these data sources to distances 122 predicted according to different models using the Pearson correlation coefficient. These 123 distance--matrix similarities were estimated in single participants and the resulting 124 coefficients were Z--transformed and entered into a summary--statistic group analysis for 125 random--effects inference generalizing across participants and stimuli (Materials and 126 Methods). 127 We observed a strong group--average correlation between distance matrices 128 estimated from perceptual dissimilarity judgments and Euclidean distances in the 129 reference PCA space (mean(r)=0.83, mean(Z(r))=1.20, standard error=0.05, p<0.001, 130  Table). Correlations between the reference PCA space and cortical face 131 spaces were generally statistically significant, but smaller in magnitude (Fig 5b--c, S5 Fig,  132 S1 Table). Distances estimated from the fusiform face area were weakly, but highly 133 significantly correlated with the reference PCA space (mean(r)=0.17, mean(Z(r))=0.17, 134 standard error=0.04, p<0.001, Fig 5c). Distances estimated from the early visual cortex 135 were even less, though still significantly, correlated with the reference PCA space 136 (mean(r)=0.07, mean(Z(r))=0.07, standard error=0.04, p=0.044, Fig 5b). These smaller 137 correlations in cortical compared to perceptual face spaces could not be attributed 138 solely to lower functional contrast--to--noise ratios in fMRI data, because the effects 139 generally did not approach the noise--ceiling estimate for the sample (shaded region in 140 Fig 5). The noise ceiling was based on the reproducibility of distance matrices between 141 participants (Materials and Methods). Instead, these findings indicate that the reference 142 PCA space could not capture all the explainable dissimilarity variance in cortical face 143

spaces. 144
Cortical face spaces over--represent eccentricity 145 We quantified the apparent warps in the cortical face spaces by constructing a 146 multiple--regression RSA model, with separate distance--matrix predictors for 147 eccentricity and direction, and for within and across face viewpoints (Fig 2a, Materials  148 and Methods). These predictors were scaled such that differences between the 149 eccentricity and direction parameters could be interpreted as warping relative to a 150 veridical encoding of distances in the reference PCA face space. We also observed strong 151 viewpoint effects in multiple regions, including the early visual cortex (Fig 1e-- Eccentricity changes had a consistently greater effect on representational patterns 155 than direction changes, suggesting that a step change along the radial axis resulted in a 156 larger pattern change than an equivalent step along the tangential axis (Fig 2c--d). The 157 overrepresentation of eccentricity relative to direction was observed in each 158 participant, and in both cortical and perceptual face spaces, although the effect was 159 considerably larger in cortical face spaces. A repeated--measures two--factor ANOVA 160 (eccentricity versus direction, within versus across viewpoint) on the single--participant 161 parameter estimates from the multiple--regression RSA model was consistent with these 162 apparent differences, with a statistically significant main effect of eccentricity versus 163 direction for perceptual and cortical face spaces (all p<0.012, S2 Table). Thus, compared 164 to the encoding in the reference PCA space, cortical face spaces over--represented the 165 radial, distinctiveness--related axis compared to the tangential, identity--related axis. 166 Fig 2. Face--space warping reflects an over--representation of eccentricity over direction information. (a) Squared distances in the reference PCA space were parameterized into predictors coding all 4 combinations of face--space metric (direction, eccentricity) and viewpoint (within, across). This multiple regression RSA model also included separate constant terms for each viewpoint, and was fitted separately to each face space using ordinary least squares (Materials and Methods). Importantly, the scaling of the eccentricity and direction predictors ensures that equal parameter estimates corresponds to a preserved reference PCA space.  Although these findings suggest a larger contribution of face--space eccentricity than 168 direction in visual cortex, we also observed clear evidence for greater--than--chance 169 discrimination performance among faces that differed only in face--space direction. 170 Group--average cross--validated discriminant distances for faces that differed in direction 171 but not eccentricity exceeded chance--level performance (p<0.05) for typical and 172 caricatured faces in both the early visual cortex and the fusiform face area (S3 Fig). Sub--173 caricatured faces were less consistently discriminable. Indeed, direction discrimination 174 increased with eccentricity both within (mean=0.013, standard error=0.007, p=0.036) 175 and across (mean=0.012, standard error=0.005, p=0.016) viewpoint in the fusiform face 176 area (linear effect of sub>typical>caricature), suggesting that direction discrimination 177 increased with face--space eccentricity in a dose--dependent manner (S3 Table). By 178 contrast, within viewpoint discrimination performance in the early visual cortex scaled 179 with eccentricity (mean=0.039, standard error=0.008, p<0.001), but distances that 180 spanned a viewpoint change did not vary with eccentricity (mean=0.009, standard 181 error=0.019, p=0.297). This is consistent with a view--dependent representation in early 182 visual cortex. Thus, cortical regions discriminate identity--related direction information 183 even in the absence of a difference in distinctiveness--related eccentricity information, 184 suggesting that cortical face representations cannot be reduced to a one--dimensional 185 code based on distinctiveness alone. In summary, cortical coding of face--space position 186 is systematically warped relative to the reference PCA space, with a substantial 187 overrepresentation of eccentricity and a smaller, but reliable contribution of face--space 188 direction. 189

Regional--mean fMRI activation increases with face--space eccentricity,
190 but removing such effects does not substantially alter cortical face--191 space warping 192 Cortical face--space warping could not be explained by regional--mean activation 193 preferences for caricatures. We performed a regional--mean analysis of responses in 194 each cortical area, which confirmed previous reports that fMRI responses increase with 195 distinctiveness across much of visual cortex (Fig 6, S6 Fig) (23). In order to test the 196 influence of such regional--mean activation effects on representational distances, we 197 adapted our discriminant distance metric to remove additive and multiplicative overall 198 activation effects (Materials and Methods). Distance matrices estimated using this 199 alternative method were highly similar to ones estimated without removal of overall 200 activation effects (all r=0.9 or greater for the Pearson correlation between group--201 average distance matrices with and without mean removal, S4 Fig). Thus, although 202 eccentricity affected the overall activation in all visual areas, the warping of the cortical 203 face spaces could not be attributed to overall activation effects alone. 204 Accounting for cortical and perceptual face spaces with PCA face--205 space--tuning models and image--computable models 206 We developed multiple computational models, each of which predicts a 207 representational distance matrix and a regional--mean activation profile. These models 208 can be divided into three classes: the sigmoidal--ramp tuning and exemplar tuning 209 models receive face--space coordinates as input, while the Gabor--filter model receives 210 gray--scale pixel intensities from the stimulus images as input. We evaluated each of 211 these three model classes with and without a measurement--level population--averaging 212 mechanism, which approximates how fMRI voxels locally average underlying neural 213 activity. 214 The sigmoidal--ramp--tuning model proposes that the representational space is 215 covered with randomly oriented ramps, each of which exhibits a monotonically 216 increasing response along its preferred direction in face space (Fig 3a). This model is It can be seen that measurement--level population averaging introduces a substantial U--226 shape in the individual response functions, with only a minor deflection in favor of a 227 preferred face--space direction. At the level of Euclidean distances between population 228 response vectors evoked by each face, this leads to exaggerated distances for radial 229 relative to tangential face differences (Fig 3c). In summary, measurement--level 230 population averaging provides a simple means to interpolate between two extreme 231 cases: A value of 0 corresponds to the case where the model's response is perfectly 232 preserved in the fMRI voxels, whereas a value of 1 corresponds to the case where the 233 model's response to a given stimulus is reduced to the arithmetic mean over the model 234

units. 235
In the exemplar model, each unit prefers a location in face space, rather than a 236 direction, and its tuning is described by a Gaussian centered on the preferred location. 237 The representational space is covered by a population of units whose preferred 238 locations are sampled from a Gaussian centered on the norm face (Fig 3d, Materials and  239 Methods). We fitted the Gaussian exemplar--tuning model similarly to the sigmoidal 240 ramp--tuning model, using two parameters that controlled the width of the Gaussian 241 tuning function and the width of the Gaussian distribution from which preferred faces 242 were sampled. We also evaluated a variant of the exemplar model where the 243 distribution of preferred faces followed an inverted--Gaussian distribution (Fig 3e). 244 Population averaging was modeled in the same way as for the sigmoidal--ramp--tuning 245 model using a third parameter. 246 represented. For the spatial frequencies, however, we let the data determine the 256 weighting. We fitted a weighted representational model with one weight for each 257 spatial--frequency bank (5 free parameters, Fig 4c). Local averaging in fMRI voxels was 258 modeled using two stages of measurement--level population averaging: First, filters with 259 tuning centers on either side of the vertical meridian were translated separately toward 260 their respective hemifield--specific population averages. Second, a global--pool averaging 261 was performed similarly to the other models. The contribution of these two population--262 average signals to the measured responses was modeled by 2 additional parameters 263 (Fig 4d). The additional hemifield--specific averaging stage was necessary to account for 264 strong view--specific effects in early visual cortex, but did not materially contribute to 265 the fit in ventral temporal regions. 266  268 We fitted each of the computational models so as to best predict the 269 representational distance matrices from cortical regions and perceptual judgments. We We found that all evaluated models explained almost all the explainable variance for 282 the perceptual face space (Fig 5a), with only negligible differences in cross--validated 283 generalization performance (for all pairwise model comparisons, see S4 Table). The 284 inclusion of measurement--level population averaging had little effect on performance. 285 This is expected because perceptual judgments, unlike fMRI voxels, are not affected by 286 local averaging across representational units. Thus, the behavioral data was ambiguous 287 with regard to the proposed models, which motivates model selection by comparison to 288 the functional imaging data. 289 Unlike the perceptual judgments data, the cortical face spaces exhibited substantial 291 differences between the model fits, with a robust advantage for measurement--level 292 population averaging in most cases. In the following, we focus on generalization 293 performance for fits to the early visual cortex and the fusiform face area (for fits to other 294

Multiple models can explain cortical and perceptual face spaces
The early visual cortex was best explained by the Gabor--filter model, which beat the 296 alternative computational models (p<0.001 for all pairwise model comparisons, S4 297 Table) and came close to explaining all explainable variance given noise levels in the 298 data. Generalization performance for this model was slightly, but significantly better 299 (p=0.009, Fig 5b) when population averaging was enabled. As a control, we also tested 300 raw pixel intensities as the representational units. We found no significant difference in 301  Table), suggesting that these models were able to 307 explain the variance in the dataset that was consistent between participants. We also 308 observed comparable generalization performance for the Gaussian exemplar model. 309 Measurement--level population averaging improved generalization performance in the 310 fusiform face area for both the Gabor--filter (p<0.001) and sigmoidal--ramp--tuning 311 models (p=0.032), but did not improve either of the exemplar models (p=0.263 for 312 Gaussian exemplar, p=0.096 for negative Gaussian exemplar). In summary, the 313 representation in the fusiform face area could be explained by multiple models, and in 314 most cases measurement--level population averaging improved the quality of the fit. 315 316 and Gabor--filter, but not Gaussian--exemplar models 317 Multiple computational models provided qualitatively similar fits to our cortical data 318 at the distance--matrix level. However, we might still be able to adjudicate between them 319 at the level of regional--mean activation profiles. To this end, we obtained activation--320 profile predictions from each model by averaging over all model units. We then 321 correlated the predicted population--mean activation profile with the regional--mean 322 fMRI activation profile for each participant and performed an activation profile 323 similarity analysis analogously to the distance matrix similarity analysis above. We 324 found that the activation profiles from the computational models were predictive of 325 cortical activation profiles, even though these models were fitted to distance matrices 326 rather than to regional--mean fMRI responses (Fig 6, for other regions see S6 Fig). In 327 particular, the sigmoidal--ramp and Gabor--filter models both predicted increasing 328 population--mean responses with face--space eccentricity, while the Gaussian--exemplar 329 model predicted decreasing responses with eccentricity (S5 Table, S6 Table for pairwise  330 comparisons). This constitutes evidence against the Gaussian--exemplar model, under 331 the assumption that neuronal activity is positively associated with regional--mean fMRI 332 response in visual regions (29-33). The preference for faces closer to the PCA--space 333 origin (sub--caricatures) in the Gaussian--exemplar model arises as a necessary 334 consequence of the Gaussian distribution of preferred faces, which is centered on the 335

Regional--mean activation profiles are consistent with sigmoidal--ramp
average face. We also tested a Gaussian exemplar--tuning model with an inverted 336 Gaussian distribution of preferred faces. In this model, more units prefer faces far from 337 the norm (caricatures) than faces close to the norm (sub--caricatures). However, the 338 inverted--Gaussian exemplar model's generalization performance was considerably 339 worse than the standard--Gaussian exemplar model's (Fig 5). 340 In summary, exemplar models accurately predicted representational distances for 341 cortical face spaces when the preferred--face distribution was Gaussian, but such 342 distributions led to inaccurate predictions of regional--mean fMRI activation profiles. 343 Thus, analysis of regional--mean fMRI responses enabled us to adjudicate between 344 models that made similar predictions at the distance--matrix level, and specifically 345 indicated that the Gaussian exemplar model is unlikely to be the correct model for 346 cortical face--space representation, despite a good fit at the distance--matrix level. 347

Measurement--level population averaging is necessary to account for
349 symmetric--view tolerance and over--representation of eccentricity 350 We found that models that included measurement--level population averaging 351 generally outperformed models that did not. This advantage appeared to originate in 352 how models with population averaging captured two effects in the cortical face spaces: 353 symmetric view--tolerance and over--representation of face--space eccentricity relative to 354 direction. 355   Exemplar--coding models exhibited relatively lower generalization performance, or 407 made inaccurate predictions for regional--mean fMRI activation profiles. Importantly, the 408 advantage for both the sigmoidal--ramp and Gabor--filter model depended on the 409 measurement--level population averaging mechanism. The key contribution of our 410 modeling effort is to narrow the set of plausible representational models for the 411 fusiform face area to two models that can explain both representational distances and 412 the regional--mean activation profile. report are driven at least in part by a general mechanism for object individuation in 432 visual cortex rather than the engagement of specialized processing for face recognition. 433 In summary, direct interpretation of regional--mean fMRI activations in terms of 487 neuronal tuning can be misleading when the underlying neuronal populations are 488 heterogeneous. 489 (45). Intuitively, the measurement channels re--describe the space with randomly 503 oriented axes (without any directional bias). However, fMRI voxels are better 504 understood as taking local non--negative weighted averages of neuronal activity, since 505 the association between neural responses and fMRI response is generally thought to be 506 positive. In such representational spaces the axes have orientations that are biased to 507 fall along the all--1 vector. In practice, this measurement model assumes that the fMRI 508 distance matrix for a given region of interest will over--represent distances to the extent 509 that those distances modulate the neuronal population--average response. 510

Modeling of measurement--level population averaging is important for
Here we approximated measurement effects for models with nonlinear parameters 511 by mixing the population average into the predicted representational feature space. 512 Despite the simplicity of this method, our noise--ceiling estimates indicate that the 513 winning models captured nearly all the explainable variance in the current dataset. For 514 model representations without nonlinear parameters, this measurement model can be 515 implemented more easily by linearly combining the model's original distance matrix and 516 the distance matrix obtained for the population average dimension of the space (using 517 squared Euclidean distances estimates, see also 46-48). Thus, the measurement--level 518 population averaging mechanism we propose here is widely and easily applicable to any 519 case where a computational model is compared to neuroimaging data at the distance--520 matrix level. 521 The distance--matrix effects of local pooling of neuronal responses in fMRI voxels is 522 correctly accounted for by our measurement model under the assumption that neurons 523 are randomly intermixed in cortex (i.e. voxels sample random subsets of neurons). This 524 simplifying assumption is problematic for early visual areas, where there is a well--525 established retinotopic organization with a strong contralateral response preference. 526 For the retinotopic Gabor filter model, we therefore added a hemifield--specific pooling 527 stage, which helped account for strong view--sensitivity in occipital regions of interest. 528 Accounting for measurement effects in the presence of topographic organization is likely 529 to prove more challenging for naturalistic stimulus sets. One solution is to account for 530 local averaging in fMRI voxels by local averaging of the model's internal 531 representational map (45). This local--pooling approach can be thought of as providing a 532 further constraint on the comparison between model and data, because smoothing the 533 model representation is only expected to improve the fit if the model response 534 topography resembles the cortical topography. This may provide a means of 535 adjudicating between topographically organized models, even when the models predict 536 similar distance matrices in the absence of measurement--effect modeling. In summary, 537 the global population average is a special dimension of the representational space, 538 which is overrepresented in voxels that pool random subsets of neurons. This effect 539 accounts for much variance in the representational distances in the current study, and is 540 easy to model. Modeling the overrepresentation of the global average, as we did here, is 541 suitable for models that do not predict a spatial organization (e.g. PCA--space coding 542 models). For models that do predict a spatial organization, it may be more appropriate 543 to simulate fMRI voxels by local averaging of the model's representational map (45). 544 545 This study demonstrates the importance of considering multiple alternative models 546 to guide progress in computational neuroimaging. In particular, the finding that 547 practically every model we evaluated exhibited significantly greater--than--zero 548 generalization performance strongly suggests how studies that only evaluate a limited 549 set of candidate models can arrive at misleading conclusions (see e.g. 49). 550

Model comparison is essential for computational neuroimaging
Representational similarity analysis has two key advantages for model comparison 551 relative to alternative approaches. First, competing model predictions can be easily 552 visualized at the level of the best--fitting distance matrix for a given cortical region (e.g., 553 between fMRI responses and psychophysical similarity judgments. Such comparisons 560 are challenging when the data is modeled at a lower level, because modality--specific 561 parameters must be added to each model (e.g., parameters controlling the 562 hemodynamic response function for fMRI, decision--threshold parameters for behavioral 563 judgments). The presence of these non--shared parameters makes it difficult to attribute 564 any apparent modality differences to the data rather than to the model specification. In 565 summary, RSA is a particularly attractive analysis approach for studies that emphasize 566 model comparison. 567 Although the central goal of model comparison is to select the best account of the 568 data, the finding that some models are not dissociable under the current experimental 569 context also has important implications for the design of future studies. Here we 570 demonstrated that Gaussian--distributed exemplar--coding models are less likely to 571 account for human face coding, while accounts based on sigmoidal ramp tuning and 572 Gabor filter outputs perform very similarly. This suggests the need to design stimulus 573 sets that generate distinct predictions from these winning models. For example, 574 presenting face stimuli on naturalistic textured backgrounds may be sufficient to 575 adjudicate between the two models, because the Gabor--filter model lacks a mechanism 576 for figure--ground separation. In conclusion, our study exemplifies the need to test and 577 compare multiple models and suggests routes by which the sigmoidal--ramp--tuning 578 model of face--space coding could be further evaluated. 579 Five additional participants participated in data collection up to the first MRI data 590 recording day, but were not invited to complete the study due to difficulties with 591 vigilance, fixation stability, claustrophobia and/or head movements inside the scanner. 592

Materials and Methods
The analyses reported here include all complete datasets that were collected for the 593 study. 594 Data and software availability 595 The distance matrices we estimated for cortical and perceptual face spaces are 596 available, along with software to re--generate all computational model fits (separate 597 copies deposited on https://osf.io/5g9rv; https://doi.org/10.5281/zenodo.242666). 598 Raw fMRI data will also be available on openfmri.org in the near future. 599 Sampling the reference PCA face space 600 We generated faces using a norm--based model of 3D face shape and texture, which 601 has been described in detail previously (15,25). Briefly, the model comprises two PCA 602 solutions (each trained on 200 faces), one based on 3D shape estimated from laser scans 603 and another based on texture estimated from digital photographs. The components of 604 each PCA solution are considered dimensions in a space that describes natural variation 605 in facial appearance. All stimulus generation was performed using the PCA solution 606 offered by previous investigators, and no further fitting was performed for this study. 607 We yoked the shape and texture solutions in all subsequent analyses since we did not 608 have distinct hypotheses for these. 609 We developed a method for sampling faces from the reference PCA space in a 610 manner that would maximize dissimilarity variance. This is related to the concept of 611 design efficiency in univariate general linear modeling (50), and involves maximizing 612 the variance of hypothesized distances over the stimulus set. Because randomly 613 sampled distances in high dimensional spaces tend to fall in a narrow range of distances 614 relative to the norm (51), we reduced each participant's effective PCA space to 2D by 615 specifying a plane which was centered on the norm of the space and extended at a 616 random orientation. The face exemplars constituted a polar grid on this plane, with 4 617 directions at 60 degrees separation and 3 eccentricity levels (scaled at 30%, 100% and 618 170% of the mean eccentricity in the training face set). The resulting half--circle grid on a 619 plane through the high--dimensional space is adequate for addressing our hypotheses 620 concerning the relative role of direction and eccentricity coding under the assumption 621 that the high--dimensional space is isotropic. The use of a half--circle also serves to 622 address a potential concern that apparent eccentricity sensitivity might arise as a 623 consequence of adaptation to the experimental stimuli (52). Such adaptation effects are 624 only collinear with eccentricity (ie, prototype) coding if the prototype is located at the 625 average position over the experienced stimulus images. For our stimulus space, this 626 average position would fall approximately between sub--and typical faces and between 627 the second and third direction in the PCA space. Contrary to an adaptation account, 628 there was no suggestion in our data that cortical distances were exaggerated as a 629 function of distance along this axis. The orientation of the PCA--space slice was 630 randomized between participants and model fits were based on cross--validation over 631 participants. Under these conditions, any non--isotropicity is only expected to impair 632 generalization performance. In preliminary tests we observed that this method yielded 633 substantially greater dissimilarity variance estimates than methods based on Gaussian 634 or uniform sampling of the space. 635 Face animation preparation 636 We used Matlab software to generate a 3D face mesh for each exemplar. This mesh 637 was rendered at each of the orientations of interest in the study in a manner that 638 centered the axis of rotation on the bridge of the nose for each face. This procedure 639 ensured that the eye region remained centered on the fixation point throughout each 640 animation in order to discourage eye movements. Renders were performed at sufficient 641 increments to enable 24 frames per second temporal resolution in the resulting 642 animations. Frames were converted to gray--scale and cropped with a feathered oval 643 aperture to standardize the outline of each face and to remove high--contrast mesh edges 644 from the stimulus set. Finally, we performed a frame--by--frame histogram equalization 645 procedure where the average histogram for each frame was imposed on each individual 646 face. Thus, the histogram was allowed to vary across time but not across faces. Note that 647 histogram matching implies that the animations also have identical mean gray--scale 648 intensity and root--mean--square contrast. 649 A potential concern with these matching procedures is that they could affect the 650 validity of the comparison to the reference PCA space. However, we found that the 651 opposite appeared to be true: distances in the reference PCA space were more 652 predictive of pixelwise correlation distances in the matched images than in the original 653 images. Thus, the matching procedure did not remove features that were encoded in the 654 PCA space and may in fact have acted to emphasize such features. 655 656 We used a pair--of--pairs task to characterize perceptual similarity (S1 Movie). 657

Perceptual similarity judgment experiment
Participants were presented with two vertically offset pairs of faces on a standard LCD 658 monitor under free viewing conditions, and judged which pair was relatively more 659 dissimilar with a button press on a USB keyboard (two--alternative force choice). Each with GRAPPA acceleration (acceleration factor 2 x 2, 40 x 40 PE lines). Each participant's 707 functional dataset (7376 volumes over 8 scanner runs for the main experiment) was 708 converted to NIFTI format and realigned to the mean of the first recording day's first 709 experimental run using standard functionality in SPM8 710 (fil.ion.ucl.ac.uk/spm/software/spm8/). A structural T1--weighted volume was collected 711 in the first recording day using a multi--echo MPRAGE sequence (1mm isotropic 712 voxels)(54). The structural image was de--noised using previously described methods 713 (55), and the realigned functional dataset's header was co--registered with the header of 714 the structural volume using SPM8 functionality. The structural image was then skull--715 stripped using the FSL brain extraction tool (fmrib.ox.ac.uk/fsl), and a re--sliced version 716 of the resulting brain mask was applied to the fMRI dataset to remove artifacts from 717 non--brain tissue. We constructed design matrices for each experimental run by 718 convolving the onsets of experimental events with the SPM8 canonical hemodynamic 719 response function. Slow temporal drifts in MR signal were removed by projecting out 720 the contribution of a set of nuisance trend regressors (polynomials of degrees 0--4) from 721 the design matrix and the fMRI data in each run. 722 723 We estimated the neural discriminability of each face pair for each region of interest 724 using a cross--validated version of the Mahalanobis distance (56). This analysis improves 725 on the related Fisher's linear discriminant classifier by providing a continuous metric of 726 discriminability without ceiling effects. Similarly to the linear discriminant, classifier 727 weights were estimated as the contrast between each condition pair multiplied by the 728 inverse of the covariance matrix of the residual time courses, which was estimated using 729 a sparse prior (57). This discriminant was estimated separately for the concatenated 730 design matrix and fMRI data in each possible leave--one--out split of the experimental 731 runs, and the resulting weights were transformed to unit length and projected onto the 732 contrast estimates from each training split's corresponding test run (16 estimates per 733 contrast). The 16 run--specific distance estimates were averaged to obtain the final 734 neural discriminability estimate for that participant and region. When the same data is 735 used to estimate the discriminant and evaluate its performance, this algorithm returns 736 the Mahalanobis distance, provided that a full rather than sparse covariance estimator is 737 used (56). However, unlike a true distance measure, the cross--validated version that we 738 use here is centered on 0 under the null hypothesis. This motivates summary--statistic 739 random--effects inference for above--chance performance using conventional T tests. 740

Cross--validated discriminant analysis
We developed a variant of this discriminant analysis where effects that might be 741 broadly described as region--mean--related are removed (S4 Fig). This control analysis 742 involved two modifications to how contrasts were calculated at the level of forming the 743 discriminant and at the level of evaluating the discriminant on independent data. First, 744 each parameter estimate was set to a mean of zero in order to remove any additive 745 offsets in response levels between the conditions. Second, for each pair of mean--746 subtracted parameter estimates, the linear contribution of the mean estimate over the 747 pair was removed from each estimate before calculating the contrast. This corrects for 748 the case where a single response pattern is multiplicatively scaled by the conditions. The 749 resulting control analysis is insensitive to effects driven by additive or multiplicative 750 scaling offsets between the conditions. 751 752 We used a multiple regression model to estimate the relative contribution of 753 eccentricity and direction to cortical and perceptual face--space representations. 754

Multiple regression RSA
Multiple regression fits to distance estimates can be performed after a square transform, 755 since squared distances sum according to the Pythagorean theorem. We partitioned the 756 squared distances in the reference PCA space into variance associated with eccentricity 757 changes by creating a distance matrix where each entry reflected the minimum distance 758 for its eccentricity group in the squared reference PCA matrix (that is, cases along the 759 group's diagonal where there was no direction change). The direction matrix was then 760 constructed as the difference between the squared reference PCA matrix and the 761 eccentricity matrix (Fig 2a) 774 We used a conventional block--based functional localizer experiment to identify 775 category--selective and visually--responsive regions of interest in human visual cortex. 776

Functional regions of interest
Participants fixated a central cross on the screen while blocks of full--color images were 777 presented (36 images per block presented with 222ms on, 222ms off, 16 s fixation). 778 Participants were instructed to respond to exact image repetitions within the block. Gabor filter model 828 The Gabor filter model is an adaptation of a neuroscientifically--inspired model that 829 has previously been used to successfully predict single--voxel responses in the early 830 visual cortex (24, github.com/kendrickkay/knkutils/tree/master/imageprocessing). 831 The model is composed of 5 banks of Gabor filters varying in spatial position, phase (2 832 values) and orientation (8 directions, Fig 4). We measured each filter's response to the 833 last frame of each animation, and corrected for phase shifts by collapsing the two signed 834 phase value filter outputs into a single non--negative estimate of contrast energy 835 (specifically, the square root of the sum over the two squared phase values). This is a 836 standard processing stage in the Gabor filter model (24). The resulting rectified 837 response vectors were weighted according to filter bank membership (5 free 838 parameters). We estimated measurement--level population averaging using two pooling 839 stages: a hemifield--specific pool, where filters were pooled according to whether their 840 centers fell left or right of the vertical meridian, followed by a global pool. 841 Pixelwise correlation predictor 842 We used a fixed control predictor to estimate whether coding based on pixelwise 843 features would produce the same face--space warping we observed in our data (Fig 4). 844 The pixelwise correlation predictor was generated by stacking all the pixels in each of 845 the face animations into vectors and estimating the correlation distance between these 846 intensity values. 847 Estimating the noise ceiling 848 We estimated the noise ceiling for Z--transformed Pearson correlation coefficients 849 based on methods described previously (56). This method estimates the explained 850 variance that is expected for the true model given noise levels in the data. Although the 851 true noise level of the data cannot be estimated, it is possible to approximate its upper 852 and lower bounds in order to produce a range within which the true noise ceiling is 853 expected to reside. The lower bound estimate is obtained by a leave--one--participant--out 854 cross--validation procedure where the mean distance estimates of the training split are 855 correlated against the left--out--participant's distances, while the upper bound is obtained 856 by performing the same procedure without splitting the data. These estimates were 857 visualized as a shaded region in figures after reversing the Z--transform (Fig 4). 858 859 All statistical inference was performed using T--tests at the group--average level 860 (N=10 in all cases except the occipital face area and transverse occipital sulcus, N=9). 861

Statistical inference
Correlation coefficients were Z--transformed prior to statistical testing. Average Z 862 statistics were reverse--transformed before visualization for illustrative purposes. 863 Fold--wise generalization performance estimates are partially dependent, which can 864 lead to sample variance underestimates (58,59) and greater than intended false positive 865 rates when conventional parametric statistics are used. However, we simulated the 866 effects of this potential bias and found no consistent inflation in false--positive rates for 867 simulations of the parameters used in the current study (S1 Code, S9 Fig, S10 Fig). Thus, 868 the inferential statistics reported in the current study appear to be robust to this slight 869 dependence and maintain their intended frequentist properties. 870 performance for two arbitrary models, using methods that closely matched the ones 1115 described in this manuscript (for details, see S1 Code). We estimated potential bias by 1116 comparing cross--validated generalization performance (panels in leftmost column) with 1117 generalization to a withheld validation set (left column, subtraction in right column). 1118 The probability of rejecting the null hypothesis (p<0.05, T test) over 100000 simulations 1119 is plotted for tests of either model against zero (one--tailed test, first two rows of panels), 1120 and of zero difference between the models' generalization performance (two--tailed test,