Information-theoretical analysis of the neural code for decoupled face representation

Processing faces accurately and efficiently is a key capability of humans and other animals that engage in sophisticated social tasks. Recent studies reported a decoupled coding for faces in the primate inferotemporal cortex, with two separate neural populations coding for the geometric position of (texture-free) facial landmarks and for the image texture at fixed landmark positions, respectively. Here, we formally assess the efficiency of this decoupled coding by appealing to the information-theoretic notion of description length, which quantifies the amount of information that is saved when encoding novel facial images with a given precision. We show that although decoupled coding describes the facial images in terms of two sets of principal components (of landmark shape and image texture), it is more efficient (i.e., yields greater information compression) than the encoding in terms of the image principal components only, which corresponds to the widely used eigenface method. The advantage of decoupled coding over eigenface coding increases with image resolution and is especially prominent when coding variants of training-set images that differ only in facial expression. Moreover, we demonstrate that decoupled coding entails better performance in three different tasks: the representation of facial images, the (daydream) sampling of novel facial images, and the recognition of facial identities and gender. In summary, our study provides a first-principles perspective on the efficiency and accuracy of the decoupled coding of facial stimuli reported in the primate inferotemporal cortex.


Introduction
Recognizing faces and facial expressions with high accuracy is central to many cognitive and social tasks that primates (and possibly other animals) perform every day [1]. Several studies reported single neurons in the ventral visual stream, and particularly in the so-called "face patches" of the inferotemporal (IT) cortex, that are exquisitely sensitive to faces [2,3].
While the above studies establish that decoupling information is a key ingredient of facial processing in primates, it is still unclear why this is the case. A plausible formal rationale for the decoupling of shape and texture parameters (as done in the AAM and related models) is that they might vary independently in real-life conditions. For example, small variations in facial expression entail a significant change of shape but not of texture, whereas different conditions of luminosity and age may induce significant variations in texture but not shape [4]. This line of reasoning leads to the untested idea that decoupled coding entails not just a more accurate but also a more efficient (or compact) description of facial data. Indeed, various normative principles have been proposed that characterize the efficient coding of data from a source in terms of information demands [11,12,13] (see references in [14]). From an information-theoretic perspective, a formal measure of the efficiency of a code is its description length: the best model is the one that minimizes the amount of information (bits) required to encode both the data, in terms of the model's latent variables, and the model parameters themselves [15,16,17]. This implies that a more complex model, which has more free parameters and requires more memory to be encoded, will only outperform a simpler model if it affords significantly more data compression, which in turn requires that it captures well the statistical structure of the data.
Here we use the notion of efficient coding to ask whether, why, and in which conditions the neural code for face representation found in monkey IT neurons, which is based on texture-shape decoupling (R_D coding), is more efficient than a simpler description in terms of principal components of facial images, without texture-shape decoupling (eigenface coding, R_E). For this, we compare the description length of (the principal components of) the two elements of R_D coding, namely shape-free texture and shape coordinates, with the description length of (the principal components of) the original facial images, using the same stimulus dataset as in the monkey study of [4]. To preview our results, we show that the neural code based on texture-shape decoupling (R_D) is more efficient than the eigenface coding (R_E). This is because storing the principal components of a few (significant) landmark coordinates comes at the cost of little extra information, but it confers the advantage of uniforming the (shape-free) facial images. In turn, the uniformed facial images are significantly more correlated than the original set of facial images and can be described using fewer principal components, hence yielding an overall positive information gain. In keeping with this, our results reveal that the advantage of decoupled coding increases with image resolution and when encoding variants of training-set images that differ in facial expression. This result is interesting, as it shows that decoupled coding is most effective in a condition that is frequent in social cognitive tasks, such as the identification of changes of expression or age in known faces. Finally, to further consolidate our findings, we show that decoupled coding leads to higher efficiency in a range of cognitively relevant tasks, which include the daydream generation of novel faces, the reconstruction of unknown faces, and the recognition of facial identities and gender.

Database
In our analysis, we use the FEI database [18,19], which was also used in the characterisation of the neural code of facial identity in macaques [4]. The FEI database comprises N = 400 black-and-white pictures of dimension w_max × h_max = 250 × 300 pixels, accompanied by the spatial coordinates of n_ℓ = 46 standard landmarks for each image.

Texture and shape coordinates
Let the training set consist of N_tr facial images, I = {I(n)}_{n=1}^{N_tr}, where I(n) is the n-th image, and of N_tr vectors of shape coordinates, L = {ℓ(n)}_{n=1}^{N_tr}, where ℓ(n) is the vector of shape coordinates characterising the geometry of the n-th facial image. All images are vectors I(n) = (I_1(n), ..., I_{d_t}(n)) of dimension d_t = w × h, where w, h are the width and height of the images in pixels (grid spacing units). The components of ℓ(n) are the x or y Cartesian coordinates of the n_ℓ representative landmarks of the n-th facial image: ℓ(n) = (ℓ_1(n), ..., ℓ_{d_s}(n)), with d_s = 2 n_ℓ.
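As an illustration of these conventions, here is a minimal NumPy sketch (our own illustration, not code from the study; the toy training-set size and random contents are hypothetical, while the dimensions follow the FEI database):

```python
import numpy as np

# Dimensions from the FEI database described above.
w, h = 250, 300            # image width and height in pixels
n_landmarks = 46           # landmarks per face
d_t = w * h                # texture dimension: one coordinate per pixel
d_s = 2 * n_landmarks      # shape dimension: an (x, y) pair per landmark

# Hypothetical toy data standing in for the actual pictures and landmarks.
rng = np.random.default_rng(0)
N_tr = 5
images = rng.integers(0, 256, size=(N_tr, h, w))   # 8-bit grayscale images
I = images.reshape(N_tr, d_t)                      # each image as a d_t-vector
L = rng.integers(1, h + 1, size=(N_tr, d_s))       # shape-coordinate vectors
```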

Formal definitions of eigenface coding (R E ) and decoupled coding (R D )
We consider two alternative neural codes for facial images: eigenface (R_E) coding and decoupled (R_D) coding; see Table 1 and Figure 1.
1. Eigenface coding (R_E) represents the image in terms of the principal components of the original facial image coordinates, x = I. This representation does not make use of the shape coordinates.
2. Decoupled coding (R_D) represents the image in terms of two sets of PCs, one for shape and one for texture coordinates. To obtain these coordinates, each original image I(n) in the training set is first deformed by means of image-deformation algorithms (see [20,21,22] and the Supporting information for details), in such a way that its landmark coordinates ℓ(n) are dragged to the average position of the landmark coordinates in the training dataset, and that the rest of the image pixels are deformed coherently (so that the resulting facial image is as realistic as possible). The resulting image is called the uniformed image Î(n) (see figure 1). We refer to the uniformed (shape-free) image coordinates Î of an image I, given its landmark coordinates ℓ, as the uniformed texture coordinates, or simply texture coordinates. This procedure decouples the original dataset into two datasets of coordinates: the (texture-free) shape coordinates L and the (shape-free) uniformed texture coordinates. A novel image I to be represented is then decomposed into principal components in texture and shape spaces separately.
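The effect of the uniformation step can be illustrated with a deliberately simple 1-D analogue (our own toy example, not the paper's image-deformation algorithm): each "face" is a signal with one bump, its "landmark" is the bump position, and dragging all landmarks to their mean concentrates the variance in a single principal component:

```python
import numpy as np

def uniformize(signals, landmarks):
    """Drag each 'landmark' (bump position) to the mean landmark position."""
    mean_lm = int(round(np.mean(landmarks)))
    aligned = np.array([np.roll(s, mean_lm - lm) for s, lm in zip(signals, landmarks)])
    return aligned, mean_lm

def top_pc_fraction(X):
    """Fraction of total variance carried by the first principal component."""
    Xc = X - X.mean(axis=0)
    lam = np.linalg.eigvalsh(Xc.T @ Xc)
    return lam[-1] / lam.sum()

d = 64
template = np.exp(-0.5 * (np.arange(-8, 9) / 2.0) ** 2)   # one 'facial feature'
signals, landmarks = [], []
for lm, amp in [(20, 1.0), (28, 0.8), (36, 1.2), (44, 0.9)]:
    s = np.zeros(d)
    s[lm - 8: lm + 9] = amp * template
    signals.append(s)
    landmarks.append(lm)
signals = np.array(signals)

frac_raw = top_pc_fraction(signals)
aligned, mean_lm = uniformize(signals, landmarks)
frac_uniformed = top_pc_fraction(aligned)   # alignment concentrates the variance
```

In the real 2-D setting, the deformation must additionally drag the remaining pixels coherently, which is what the algorithms of [20,21,22] provide.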

Description length analysis

Intuition behind the description length analysis
The Principal Component Analysis representation of a given set of coordinates in the face space with p principal components (p-PCA) can be viewed both as a generative model, inducing a Gaussian distribution over facial coordinates, and as a form of data compression and dimensionality reduction [23]. These two aspects are naturally linked by the notion of description length [17]. PCA is a form of dimensionality reduction, since it describes each d-dimensional vector x as a shorter, p-dimensional vector x_p = E_p · x. In turn, this implies a compression ability. Consider a dataset in which each coordinate x_i (say, each pixel value, if the vectors x are images) varies uniformly in a range R. In the absence of any prior knowledge regarding the dataset content, the amount of information per sample and coordinate needed to store the raw dataset with precision ε per coordinate is simply l_0 = log_2(R/ε) bits. Normally, the information needed to store the p principal components of each vector of the dataset D = {x_p(s)}_{s=1}^N is lower than l_0 per sample and coordinate, even if p = d. Indeed, if the dataset exhibits significant correlations between pairs of variables, many principal components will exhibit a variance λ_i lower than the average variance R²/12, and they will consequently require fewer bits to be stored.
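This accounting can be made concrete with a small numeric sketch (ours; the toy data and precision are hypothetical): two strongly correlated coordinates cost far fewer bits per sample after rotating to principal axes than under the raw uniform encoding l_0 = log_2(R/ε):

```python
import numpy as np

eps = 1.0                      # assumed precision per coordinate
R = 256.0                      # coordinate range (8-bit gray levels)
l0 = np.log2(R / eps)          # bits/coordinate for the raw uniform encoding

# Two strongly correlated toy coordinates: x2 is x1 plus small noise.
rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, R, size=10_000)
x2 = x1 + rng.normal(0.0, 2.0, size=10_000)
lam = np.linalg.eigvalsh(np.cov(np.column_stack([x1, x2]).T))  # PCA variances

def gaussian_bits(var, eps):
    """Entropy (bits) of N(0, var) discretised in bins of size eps."""
    return 0.5 * np.log2(2 * np.pi * np.e * var) - np.log2(eps)

bits_raw = 2 * l0                                       # two raw coordinates
bits_pca = sum(gaussian_bits(v, eps) for v in lam)      # after rotating to PCs
```

The small principal component carries almost no variance and hence costs very few bits, which is where the compression comes from.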
The amount of information necessary to encode a dataset D in terms of (the latent variables of) a probabilistic model M of the dataset vectors is called description length, L_M(D). Crucially, the description length is formally related to the Bayesian data evidence, or joint marginal likelihood of the dataset D according to the model M, in the following way:

L_M(D) = −log_2 P_M(D) − N d log_2 ε,

where P_M(D) is the data evidence according to M and ε is the precision per coordinate with which the dataset should be described. Description length is therefore equivalent to, and provides an information-theoretic interpretation of, Bayesian model evidence. The value of p for which the dataset presents the highest Bayesian evidence is the one presenting an optimal accuracy/complexity trade-off and, consequently, the one presenting the lowest description length.
In other words, description length analysis evaluates the efficiency of a particular code, taking into account both its accuracy and its complexity. In this perspective, a good code is one that does not employ too much information to describe a given input with a given tolerance. Indeed, the model that presents the lower description length at fixed precision is also the one that manages to describe the dataset with a smaller error ε, where N d log_2 ε = −log_2 P_M(D) − L, when the amount L of available storage information is fixed.
In the case that we study here, the model M is p-PCA and the explicit expression for P_M(D) is interpretable [17]. The description length may be decomposed into two terms, L(D) = S(D|θ*) + O(θ*), that we will call the empirical entropy S(D|θ*) and the Occam length O(θ*). These two terms may be interpreted, respectively, as the amount of information needed to encode (without losses) the dataset D in terms of p principal components, and the model parameters θ*, stored once for all vectors (and needed to recover each vector x from its principal components x_p). When increasing the number of model parameters p, the empirical entropy of the training dataset decreases, but the Occam length generally increases, since more eigenvectors E_p must be stored, and they must be stored with a higher precision. Overfitting occurs when this balance is no longer favourable, and the description length increases with increasing p. Crucially, the p-PCA model induces a (Gaussian) probabilistic model defined on the d-dimensional linear space of the data also when p < d. The (d − p)-dimensional subspace not spanned by the p fitted empirical eigenvectors (corresponding to the p largest eigenvalues λ_j) is described with a constant, degenerate noise eigenvalue λ̄. As a consequence, although the PCA representation of vectors in terms of p principal components is a lossy representation for p < d, the induced likelihood and empirical entropy terms in the description length of each vector do take the losses into account.
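The induced full-rank Gaussian can be sketched as follows (a minimal illustration under the assumption, stated above, that the d − p discarded directions share one degenerate noise eigenvalue, here taken as the average of the discarded empirical eigenvalues):

```python
import numpy as np

def ppca_covariance(C_emp, p):
    """Model covariance: top-p empirical eigenpairs, degenerate tail average."""
    lam, E = np.linalg.eigh(C_emp)
    lam, E = lam[::-1].copy(), E[:, ::-1]    # descending eigenvalues
    d = lam.size
    if p < d:
        lam[p:] = lam[p:].mean()             # constant noise eigenvalue
    return (E * lam) @ E.T

# Hypothetical toy data with a full-rank empirical covariance.
rng = np.random.default_rng(2)
d, N = 6, 500
data = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))
C_emp = np.cov(data.T)
C2 = ppca_covariance(C_emp, p=2)   # full-rank Gaussian even though p < d
C6 = ppca_covariance(C_emp, p=6)   # p = d reproduces the empirical covariance
```

Because the tail eigenvalues are averaged rather than discarded, the total variance is preserved and the induced likelihood of each vector accounts for the reconstruction loss.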

Definition of the information gap criterion for the comparison of decoupled coding and eigenface coding
We measured the description length (in bits) of the alternative coding schemes R E and R D of facial images I that belong to a set of known images that have been used to train the model (training set) and to a set of unknown images that have not been used to train the model (test set).
Eigenface coding (R_E) encodes the original images I in terms of their principal components. We denote the description length associated with the compression of a dataset I according to R_E as L_{I_tr,p}(I). The two sub-indices of L specify the model: they are, respectively, the training set with which the model parameters have been trained, and the value of p. This information completely determines the p-PCA model.
Decoupled coding R_D describes each facial image I in terms of the principal components of the (shape-free) uniformed texture coordinates Î and the (texture-free) landmark coordinates ℓ. This decoupling is motivated by the hypothesis that shape-free uniformed images can be compressed more easily than the original images and their unknown variants; in other words, that there is a parsimonious description of the dataset of uniformed faces Î when represented in terms of their principal components Î_p. However, since decoupled coding (R_D) exploits both the texture and the shape coordinates (Î and ℓ) of each facial vector, it has to store both sets of principal components (Î_p and ℓ_p) to represent the original image. Moreover, it has to store the principal axes in both the texture and shape coordinate spaces.
The key question addressed here is whether the extra information cost required to store shape coordinates might be compensated by the smaller cost to store the uniformed texture principal components Î .In principle, the uniformed set of images Î might be compressed more easily, given that the inhomogeneities induced by the difference in landmark positions have been removed from the dataset -at least, if the resolution of the image is large enough.This implies that encoding the uniformed images could in principle require a smaller number of PCs without loss of precision, with respect to the set of raw images.
To quantify the difference in description length between uniformed texture coordinates, non-uniformed texture coordinates, and shape coordinates, we define a summary measure that we call the information gap, which jointly considers two factors. The first factor (G_1), the texture information gap, accounts for the difference in the description lengths of the non-uniformed and uniformed image datasets:

G_1 = L_{I_tr, p}(I) − L_{Î_tr, p̂}(Î),

where Î is the dataset composed of the uniformed facial images in I. In both description length terms, the model M is assumed to be p-PCA. In these equations, p may be taken as the optimal value according to Bayesian model selection, i.e., the value (say, p*) for which the description length of I is minimum, and likewise p̂* for Î. (PCA induces a multivariate normal distribution whose average vector μ is the unbiased estimator of this quantity in the dataset I_tr, and whose covariance matrix C shares the p largest eigenvalues and corresponding eigenvectors with the sample covariance matrix of the set I_tr; see the details in the Supplementary information. Note also that eigenface coding is usually understood as a representation in terms of non-local principal components.)
The second factor (G_2) is the description length of the set of shape coordinates L = {ℓ(n)}_n:

G_2 = L_{L_tr, p_s}(L),

where L is the set of landmarks corresponding to the images in I, and L_tr those corresponding to I_tr.
The information gap combines these two factors,

G = G_1 − G_2,

and measures the efficiency (in information-theoretic terms) of decoupled coding R_D compared to eigenface coding R_E. Decoupled coding can be considered more efficient if the information gap G is greater than zero. In other words, given a dataset of facial images, the representation of R_D is more efficient than that of R_E to the extent that it provides a more accurate description of the dataset using the same amount of available information (see also the Supporting information).
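Schematically, the comparison reduces to a few lines (our sketch; we stand in for the paper's exact Bayesian description length with a simple Gaussian codelength, but the gap logic G = G_1 − G_2 is the same whichever description-length estimator is plugged in):

```python
import numpy as np

def gaussian_codelength(D, p, eps=1.0):
    """Bits to encode dataset D (N x d) under a p-PCA Gaussian, at precision eps."""
    D = np.asarray(D, dtype=float)
    N, d = D.shape
    lam = np.linalg.eigvalsh(np.cov(D.T))[::-1].copy()   # descending variances
    if p < d:
        lam[p:] = max(lam[p:].mean(), 1e-12)             # degenerate noise eigenvalue
    bits_per_sample = 0.5 * np.sum(np.log2(2 * np.pi * np.e * lam)) - d * np.log2(eps)
    return N * bits_per_sample

def information_gap(I, I_hat, L, p, p_hat, p_s):
    G1 = gaussian_codelength(I, p) - gaussian_codelength(I_hat, p_hat)  # texture gap
    G2 = gaussian_codelength(L, p_s)                                    # shape cost
    return G1 - G2   # G > 0: decoupled coding R_D beats eigenface coding R_E
```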
Note that in practice, our comparison boils down to computing and comparing the description length of different datasets, Î, I, and L (the first being computed from the last two), in terms of their principal components, i.e., using the same statistical model (p-PCA) for all three datasets but with different covariance matrices (inferred from Î_tr, I_tr, L_tr) and values of p. We can therefore assess the efficiency of decoupled coding R_D by considering the difference in description length between I and Î, and whether it compensates for the description length of L.

Precision of shape and texture coordinates and estimation of the description length
For the calculation of the description length L_{D_tr,p}(D) we exploit the analytical solution of the Bayesian evidence of a multivariate normal distribution [24]. Note that in the case of texture coordinates, which are strongly undersampled (N ≪ d_t), it is essential to use such an (asymptotically) exact expression instead of its more common Bayesian Information Criterion approximation [17,25]; see the Supporting information. The training dataset and the number of principal components completely define the parameters of the inferred normal distribution M = (D_tr, p), whose Bayesian evidence P_M(D) can be estimated analytically (see [24] and the formulae in the Supplementary information). The estimation of the description length depends, as said before, on the precision ε per coordinate with which the data should be described by the probabilistic model:

L_{D_tr,p}(D) = −log_2 P_M(D) − N d log_2 ε.

The second term in this equation is equivalent to the factor transforming the differential entropy of a continuous probability distribution into a genuine entropy [26,27,28] once the continuous variables have been discretised in bins of size ε.
In the case of texture coordinates, in which the vectors are images and the vector components I_i are 8-bit grayscale values in the range (0, 255), the natural cutoff value is ε_t = 1. In the case of shape coordinates, the coordinates ℓ_i are grid integers varying in a range [1, h], so that the natural choice is ε_s = 1 for the largest resolution h_max, and h_max/h for lower resolutions h < h_max (since we change the coordinate resolution by scaling the largest-resolution coordinates as ℓ_i → (h/h_max) ℓ_i). Scaling the precision in this way, the empirical entropies of the shape coordinates do not depend on the resolution (see also Likelihood and evidence of shape coordinates in the Supplementary information). We actually and intentionally underestimate the shape coordinates' cutoff and set ε_s = 0.1 (h_max/h), so that we overestimate the description length of the shape coordinates, in order to present (see the next section) a conservative estimate of the precision range in which decoupled coding prevails.
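For reference, the precision conventions just described amount to the following small helper (ours; h_max = 300 for the FEI pictures):

```python
def precisions(h, h_max=300):
    """Precision per coordinate at image height h (conventions from the text)."""
    eps_texture = 1.0                 # 8-bit gray levels: natural cutoff
    eps_shape = 0.1 * (h_max / h)     # deliberately underestimated by a factor 10
    return eps_texture, eps_shape
```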

Information gap for known facial images in the training set, at different resolutions
In this section, we analyse how the efficiency of the R_D coding varies as a function of the resolution of the dataset images. We expect the information gap to increase with the resolution. If the resolution is so low that the distance between pixels (normalised to the image height, h⁻¹) is of the same order as the typical deviation of the landmark coordinates from their average, ⟨δℓ_i²⟩^{1/2}, the uniformation will have no effect and consequently the R_D code may not pay off in terms of description length. In the opposite situation, ⟨δℓ_i²⟩^{1/2} ≫ h⁻¹, we expect a larger information gap.
To test this hypothesis, we calculate p* for every kind of coordinate and resolution, as the minimum of the L_p curves. p* turns out to be lower than N for all three kinds of coordinates (shape, non-uniformed texture, uniformed texture). The description length of the image dataset is slightly over-linear in the number of pixels d_t, as shown by the lack of superposition of the curves in figure 2. Indeed, the largest images actually contain more information per pixel: this is the information that, according to the p-PCA model, is lost when lowering their resolution to construct the lower-resolution datasets.
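The selection of p* can be sketched as a one-line argmin over the description-length curve (our illustration; the toy curve below mimics only the generic shape of L_p, with a decreasing accuracy term and an increasing Occam term, and is not fitted to the actual data):

```python
def select_p_star(codelengths_by_p):
    """Pick p* minimising the description-length curve {p: L_p}."""
    return min(codelengths_by_p, key=codelengths_by_p.get)

# Toy L_p curve: an accuracy term falling with p plus an Occam term growing
# with p (hypothetical numbers chosen only to produce an interior minimum).
curve = {p: 1000.0 / (1 + p) + 12.0 * p for p in range(1, 50)}
p_star = select_p_star(curve)
```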
As a reference value for the analysis of the description length curves, it is useful to compare the values in the figure with the uniform length l_0/(d_t N), i.e., the minimum amount of information per sample and pixel that it would take to store a dataset consisting of images whose pixels fluctuate independently around their average value in an interval of length R, with R such that the variance per pixel equals the empirical average variance v_t of the dataset I (roughly equal to 37 units per pixel out of 256 in 8-bit grayscale encoding). In other words, if one assumes that the pixel values are uniformly distributed around their average in a d_t-dimensional hypercube of side R = (12 v_t)^{1/2}, then l_0/(d N) = (1/2) log_2(12 v_t) − log_2 ε. This value is very close to the empirical entropy of the dataset corresponding to a PCA model with p = 0 (see the Supporting information): L_0 = S_0 = (1/2){log_2(2π) + 1 + log_2 ṽ} − log_2 ε, where log_2 ṽ = ⟨log_2 λ⟩ (see the proximity of L_0 and l_0 in figure 2). A significant observation is that the standard deviation per pixel, v_t^{1/2}, of the order of 37 units, is not significantly smaller in the dataset of uniformed images Î. This means that uniformed images are more easily compressible not simply because the dataset is less varying, or more homogeneous, but because of the presence of stronger pairwise correlations between pixels in the uniformed images. Stronger correlations induce a more inhomogeneous spectrum of C^(t), i.e., of the eigenvalues λ_p, and, consequently, a lower empirical entropy S of the uniformed dataset.

Information gap for known facial images in the training set that show different facial expressions
In this analysis, we test the hypothesis, proposed in the introduction, that decoupled coding is particularly effective when encoding variants of known facial images that differ only in facial expressions. By definition, variations in facial expression are expected to change mainly the shape coordinates and much less the texture coordinates (which are independent of the positions of the landmarks and hence nearly independent of the facial expression). The information gap should increase in this situation, since the texture coordinates of facial images differing in expression should be more redundant, correlated, and easily compressed; in the language of probability, they should exhibit a larger likelihood.
To test this hypothesis, we computed the training-set information gap for two datasets of size N_tr = 200: the first (called "neutral") consisting of the neutral-expression images of 200 different subjects, and the second (called "mixed") comprising both the neutral and the smiling portraits of the same 100 (randomly selected) subjects. The blue shadowed area in figure 3 indicates the difference in information gap between the "mixed" and the "neutral" training sets. While the information gap of the "mixed" dataset is indistinguishable from that of the "full" dataset of N = 400 images, the "neutral" dataset presents a lower information gap.
This analysis is consistent with our initial hypothesis. Notice that this result is not a trivial consequence of the fact that the "mixed" dataset (consisting of N = 200 portraits of the same 100 subjects) is more easily compressible than the "neutral" one (consisting of N = 200 portraits of 200 different subjects): indeed, for both uniformed and non-uniformed facial images, the description length of the "mixed" dataset is lower than that of the "neutral" dataset. What is less trivial is that the gap G is higher for the mixed images.

Information gap for unknown facial images in the test set that show different facial expressions
Here, we perform a variant of the above analysis, aimed at testing whether the decoupling is particularly efficient when encoding unknown facial images (not belonging to the training set) that correspond to subjects who are present in the training set with a different facial expression.
We have already seen that the training set of uniformed images exhibits a lower description length (and empirical entropy) than the training set of raw dataset images. It is hence reasonable to suppose that decoupling does not only reduce the bias error (on the training set) but also the variance error in the description of unknown facial images belonging to a test set. To test this hypothesis, we calculated the information gap on a test set. Figure 4 reveals that the information gap of the test set is significantly higher than the information gap of the training set, with a p-value lower than 10⁻⁴ (notice the small error bars of the training-set information gap in figure 3). The increment in information gap per sample (roughly 1/6 of the test-set gap) corresponds to the bits that one saves using decoupled coding for unknown smiling faces not belonging to the training set. This implies, as expected, that the R_D coding reduces both the bias and the variance errors for variants of known faces differing in facial expression only. R_D is hence particularly efficient, in terms of information, for encoding facial images of known subjects differing in facial expressions.
It is interesting to compare this result with the texture information gap of a different test set, which we call "non-overlapping", in which the single folds are composed of N/K = 80 facial images corresponding to 40 subjects with both smiling and non-smiling expressions. The non-overlapping test set thus contains images of subjects that are not present in the training set (so that test and training sets contain information regarding different subject identities). Figure 4 shows that the texture information gap G_1 of the non-overlapping dataset is even lower than the training-set gaps. This result shows that the decoupled code R_D is less efficient for encoding unknown facial images corresponding to unknown subjects. Furthermore, it shows that while uniforming variants of known faces that differ only in facial expression reduces both the bias and the variance terms of the entropy (see footnote 8), uniforming facial images of unknown individuals reduces the bias term but increases the variance term. In any case, we stress that the texture information gap G_1 is still larger than G_2 for the non-overlapping test set.
Consequently, the decoupled code is more efficient even when processing unknown-identity facial images, albeit in this case the description length gap is lower.

Footnote 8: In the language of probability, we have seen before that the uniformed images present stronger between-pixel correlations C_ij while presenting a roughly equal total variance, tr(C). This is why, for uniformed faces, the training-set empirical entropy (Σ_i ln λ_i, up to a constant) is lower, and hence the likelihood is higher. A lower test-set entropy would additionally imply that the term tr(C_te · C_tr⁻¹/2) (the difference between test- and training-set entropies, up to a constant) is significantly lower. We call the terms Σ_i ln λ_i and tr(C_te · C_tr⁻¹/2) the bias and variance terms of the entropy, respectively.

Summary of the results of the description length analyses
In sum, our analysis shows that decoupled coding leads to a more efficient encoding of known facial images (i.e., in the training set) compared to eigenface coding, when the images are shown at sufficiently high resolutions and, in particular, when they differ in facial expressions. Furthermore, our results show that the efficiency of decoupled coding is magnified when the task consists in encoding unknown variants of known faces differing in facial expression.

Analysis of the performance of decoupled and eigenface coding in face processing tasks
So far, we have used the normative construct of description length to assess the efficiency of decoupled coding. Here, we ask how the normative advantage of decoupled coding translates into better performance in facial processing tasks, and what exactly the advantages are. For this, we compare the performance of eigenface and decoupled coding in three face-processing simulations that help illustrate the most important differences between the coding schemes; namely, (1) sampling artificial facial images from the learned generative model, (2) recognizing facial identity, and (3) reconstructing unknown faces. Please see the Supporting information for a supplementary (gender classification) simulation.

Simulation 1: Sampling synthetic faces from the trained generative model
The generation of artificial faces is a task widely used in AI to demonstrate the quality of a learning algorithm or encoder. In this simulation, our goal is not to challenge the performance of mainstream machine learning approaches that use deep nets with millions of parameters [29,30,31,32], but rather to test the hypothesis that a very simple (20 degrees of freedom) linear model can generate realistic images when it is based on decoupled coding.
Each PCA-based representation of the training set I induces a simple generative model of faces (see the Supporting information for details). In particular, R_E induces a multivariate Gaussian distribution in the space of facial images, whereas R_D induces two separate Gaussian distributions over uniformed texture and shape coordinates, respectively. It is possible to create synthetic facial images by sampling from the respective probability distributions of R_E and R_D. In the case of R_D, after sampling from both probability distributions, it is necessary to de-uniformize the sampled uniformed texture coordinates given the sampled shape coordinates (see the Supporting information for details about the de-uniformation procedure).
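The sampling step for a single Gaussian (as in the R_E case; for R_D one would sample texture and shape separately and then de-uniformize) can be sketched as follows (our illustration, with a hypothetical diagonal covariance standing in for the learned face-space model):

```python
import numpy as np

def sample_faces(mean, E, lam, p, n_samples, rng):
    """Draw faces from a PCA generative model: mean (d,), eigenvector
    columns E (d, d), PC variances lam (d,), using only the first p PCs."""
    z = rng.normal(size=(n_samples, p)) * np.sqrt(lam[:p])  # PC coordinates
    return mean + z @ E[:, :p].T                            # back to pixel space

# Hypothetical toy model standing in for the learned face-space Gaussian.
rng = np.random.default_rng(4)
d = 16
lam, E = np.linalg.eigh(np.diag(np.linspace(9.0, 1.0, d)))
lam, E = lam[::-1], E[:, ::-1]                              # descending variances
samples = sample_faces(np.zeros(d), E, lam, p=5, n_samples=2000, rng=rng)
```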
Figure 5 shows example synthetic images created by sampling I and (Î, ℓ) from the models induced by R_E and R_D, respectively. In both cases, we used p = 20 degrees of freedom, randomly chosen among the first 40 principal components of each model. Please note that the larger the value of p, the higher the dimension of (the vector space of) the sampled facial images. When small values of p are used, the generative models produce low-dimensional variations of the average face; this implies that the synthetic faces are realistic (free from artefacts) but very stereotyped, with low variability. Using larger values of p is instead a more compelling task, since the generative models are free to produce faces with high variability, but at the same time it is harder for them to produce realistic faces that are free from artefacts.
Figure 5 shows that with a relatively high value p = 20, both eigenface coding (R_E) and decoupled coding (R_D) produce faces with high variability. However, only the faces produced by the latter are realistic and free from artefacts. This simulation therefore shows that a very simple linear model based on decoupled coding (but not on eigenface coding) can produce realistic and varied facial images.
Please note that for this comparison we consider a simple variant of the decoupled R_D coding scheme, which we call concatenated coding, R_c, consisting of the principal components of the concatenated set of texture and shape coordinates (see the Supporting information, section 5, for a formal description). The reason is that R_c permits choosing a single number of principal components, p, in common with R_E, which therefore permits comparing the two codes with the same number of parameters. The use of R_D would require, instead, choosing p_t and p_s separately for a fixed p_t + p_s = p. Indeed, the results of figure 5 are qualitatively identical if one directly compares R_E with R_D, varying p_t = p and fixing p_s to its maximum value (p_s = d_s = 92). We remark that the concatenated coding is considered here and in the next subsection only for the sake of simplicity of the R_E versus R_D comparison. (In particular, we sample 20 principal components x_i from their respective distributions, where the index i may take the values 1, ..., p = 40; the remaining 20 coordinates among the first 40 are set to zero.)
These results depend on the image resolution (which determines the texture coordinate dimension d_t = w × h) and on the number of landmarks n (which determines the shape coordinate dimension d_s = 2n).
Using a larger number of landmarks will enhance the relative relevance of shape coordinates. Moreover, figure 6 also reveals that the distance based on the concatenated code R_c, which exploits the correlation between shape and uniformed texture coordinates, does not perform significantly better than that based on uniformed images. This is due to the fact that the images contain more information than the shape coordinates (since d_t ≫ d_s), and that shape and texture coordinates are only weakly correlated (see the Supporting information).
The reader may find a discussion on the qualitative differences in the shape of the error rate curves of texture and shape coordinates in the Supplementary information.

Simulation 3: Reconstructing novel faces
The reconstruction task consists in representing novel facial images, which do not belong to the training set, in terms of an expansion in p principal components only. In mathematical terms, if x is a facial image or its shape coordinates, the reconstruction of x in terms of the first p principal axes is x_p = E_p^† · E_p · x, where E_p is the p × d matrix of the first p (row) eigenvectors. In the case of the R_E code, we simply project the original image (x = I) onto the first p eigenfaces. In the case of R_c, we perform this operation for both Î_p and L_p and, afterwards, we perform a de-uniformation (see the Supplementary material), leading to a non-uniformed reconstructed image, (Î_p, L_p) → I_p. While the R_E reconstructions exhibit a border artifact, this artifact is absent for R_c, and the representation for high p is slightly more faithful. This is quantitatively illustrated in figure 8, which reports the Mahalanobis distance (see footnote 12) between the target facial image, t, and its reconstruction from p principal components, t_p, according to the representations R_E and R_c. Crucially, in this analysis the target vectors are N_te = 20 neutral, randomly chosen test-set images, not belonging to the training set from which the reconstructing matrices E_p^† · E_p are calculated. Also crucially, the matrix C defining the Mahalanobis distance is likewise estimated from the training set only. As for Simulation 1, the results of this subsection are qualitatively identical if one directly compares R_E with R_D, varying p_t = p and fixing p_s to its maximum value, d_s = 92.
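The projection step can be sketched as follows; this is a minimal illustration (names are ours) assuming orthonormal row eigenvectors, so that E_p^† = E_p^T:

```python
import numpy as np

def reconstruct(x, E_p, mean):
    # E_p: (p, d) matrix of the first p (row) eigenvectors, orthonormal rows.
    # Project the centred vector onto the principal subspace and map it back:
    #   x_p = mean + E_p^T E_p (x - mean)
    return mean + E_p.T @ (E_p @ (x - mean))
```

Since E_p^T E_p is an orthogonal projector, applying `reconstruct` twice gives the same result as applying it once.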

Summary of the results of the face processing tasks
In sum, our analysis shows that a small model using decoupled coding and only 20 degrees of freedom can be sampled to produce realistic synthetic images, whereas sampling from a model using eigenface coding produces less realistic faces with artefacts. Furthermore, decoupled coding greatly facilitates the recognition of familiar faces with novel facial expressions, especially thanks to the fact that texture coordinates remain stable across different expressions. Finally, decoupled coding outperforms eigenface coding in the reconstruction of novel faces. Please see the Supporting information for an additional simulation of gender classification using the two coding schemes.

Discussion and Conclusions
Recent research in neuroscience claims that the neural coding for facial identity in the inferotemporal (IT) cortex of macaques [4,7] implements a decoupled coding scheme in which distinct subpopulations of neurons project facial images onto two distinct sets of axes, which encode the geometric shape of a face and its texture separately.From a computational perspective, decoupled coding affords accurate face processing, permitting the linear decoding of facial features from single cell responses; and it outperforms widely used schemes in vision research, such as eigenface coding [4,7,8].
In this article, we aimed to elucidate the normative reasons for this advantage, by appealing to the notion of description length, which permits quantifying the efficiency of neural coding schemes in information-theoretic terms. The general idea is that the best model is the one that minimizes the amount of memory (bits) required to encode both the data and the model parameters themselves [15,16].
In particular, we analysed the description length of the decoupled coding (R_D), also known as the Active Appearance Model (AAM), which encodes both the principal components of uniformed (shape-free) facial images and their shape (landmark) coordinates. We compared this description length with that of the set of principal components of the (non-uniformed) raw images, i.e., the widely used eigenface coding (R_E). Unlike previous studies that compared alternative (biologically plausible) neural codes [4,7], here we performed an information-theoretic analysis: we evaluated the gain in information entailed by the uniformation of facial images, and compared it with the amount of information required to encode the landmark coordinates. This evaluation necessarily requires calculating the information length of (the principal components of) non-uniformed images, which is precisely what eigenface coding (R_E) does; hence it results in an implicit comparison of decoupled coding (R_D) and eigenface coding (R_E).
Our simulations, on the same database (FEI, see [19]) as in the monkey study of [4], show that decoupled coding (R_D) requires less information to represent the images than eigenface coding (R_E), even though the latter does not require coding for the geometric coordinates of faces. Remarkably, the efficiency gain of decoupled coding (R_D) is especially prominent for high-resolution images and for variants of training-set images that only differ in facial expressions.
The number of landmarks n or, equivalently, the dimension of the shape coordinates (d_s = 2n) is, by construction, much lower than the image resolution d_t. In information-theoretical terms, it is precisely this condition, d_s ≪ d_t, that makes the decoupled coding efficient: encoding the positions of a few landmarks costs much less information than is saved by encoding the resulting uniformed facial images (for high enough image resolution). Indeed, a control analysis shows that this advantage holds until the landmarks become too numerous, i.e., of the order of a few hundred (see the Supplementary information, paragraph "How many landmarks are too many?").
Furthermore, we found that the probabilistic generative model induced by decoupled coding (R_D) achieves good performance in face processing tasks, including sampling artificial or novel faces, recognising face identity and reconstructing novel faces with p principal components. By contrast, a model using eigenface coding (R_E) performs less accurately and produces less realistic faces with artefacts.
Taken together, these results shed light on the normative advantages of the decoupled coding for faces that was empirically reported in the inferotemporal (IT) cortex of macaques [4,7], showing that it is both more efficient (in information-theoretic terms) and more accurate than the alternative eigenface scheme widely used in computer vision.
The dataset of uniformed facial images does not present, as one may have expected, significantly lower variance than the set of non-uniformed images. The efficiency of the decoupled coding comes, instead, from the fact that the pixels of uniformed images are more correlated and, consequently, can be compressed in terms of a few principal components without loss of precision. The amount of information saved in encoding uniformed images compensates, for high enough resolution, for the amount of information needed to encode the principal components of the shape coordinates.
Shape (texture-free) and texture (shape-free) coordinates carry information about naturally different aspects of human faces and can vary independently. Variations in facial expressions and in perspective (e.g., small rotations) mainly modify shape coordinates, but not texture coordinates; conversely, variations in luminosity, suntan, or make-up only modify texture coordinates [35]. This perspective helps explain our finding that decoupled coding is particularly advantageous when encoding variants of known faces or unknown faces (in the test set). When encoding variants of known images, one of the two sets of (shape and texture) coordinates will tend to remain the same as in the known, reference image.
Of note, the advantages of decoupled coding described above come at the price of performing some prerequisite nonlinear computations over facial images: at least landmark detection and image deformation (see the Supporting information). We speculate that, if the neural code for facial identification implements a variant of the AAM [4], these nonlinear computations might be realised by early visual areas, which lie below IT in the neural hierarchy. This speculation remains to be tested in future research, since the neuronal mechanisms of landmark detection and image deformation have not yet been identified.
Finally, future research could address the conditions under which the decoupling emerges spontaneously in deep architectures, as well as a study of the AAM efficiency in which the cost of landmark detection and uniformation is taken into account.

Supporting information
Relation with Bayesian model selection. Bayesian model selection consists in choosing the model M that maximises the Bayesian evidence of a given dataset D; equivalently [17], the best model M is the one that minimises the description length, min_M L_M(D). To verify the validity of condition (3), we have, instead, compared the description lengths of different datasets, I, Î, L, according to the same type of probabilistic model: the multivariate normal distribution whose correlation matrix takes, respectively, the forms C_p, C^(t)_{p_t} and C^(s)_{p_s} for the three datasets. Here, C_p is the matrix whose p largest eigenvalues and corresponding eigenvectors coincide with the sample eigenvalues and eigenvectors of the training set of non-uniformed images, while the remaining d − p eigenvalues are set to a constant (see, for example, [24]); C^(t)_{p_t} and C^(s)_{p_s} are defined analogously for uniformed images and shape coordinates. This is, hence, the opposite situation with respect to Bayesian model selection, in which one compares the evidence of the same dataset according to different models. It is important to remark that, in the present work, we do not aim to perform a comparison, on Bayesian grounds, between eigenface and decoupled codings understood as probabilistic models over the common dataset of original, non-uniformed facial images.
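The equivalence invoked above can be written compactly (up to the precision-dependent constant that converts probabilities into code lengths):

```latex
L_M(\mathcal{D}) \;=\; -\ln P(\mathcal{D}\mid M)
\;=\; -\ln \int P(\mathcal{D}\mid \theta, M)\, P(\theta\mid M)\, \mathrm{d}\theta ,
\qquad
\arg\min_M L_M(\mathcal{D}) \;=\; \arg\max_M P(\mathcal{D}\mid M).
```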
Indeed, the representation R_D induces a probability distribution in the space of non-uniformed images I that is no longer Gaussian (even if the distributions over Î and L are), since it involves nonlinear image deformation operations, which we have completely neglected in our information-theoretical analysis. Within our working hypothesis, we exclude the uniformation (I, L) → Î (prior to the PCA) and de-uniformation (Î_p, L_p) → I_p (posterior to the PCA) operations from the information-theoretical analysis.
In other words, here we consider three probabilistic descriptions over separate spaces: non-uniformed images, uniformed images and shape coordinates. Our conclusions merely rely on the information-theoretical interpretation of the Bayesian evidence of a dataset according to a model, which is related to the amount of information needed to store the dataset in terms of the model's latent variables, with a given precision.
The decoupled code, understood as a probabilistic model on the space of the original images, would be more complex than Gaussian. It would implicitly contain, in some (texture) latent variables, a description of the input image that is invariant under shape transformations; other (shape) latent variables would be invariant under texture transformations of the original image. Our current analysis is to be understood as an estimate of the information-theoretical gain of the facial representation in terms of (principal components of) uniformed images and landmarks, neglecting the nonlinear operations of landmark detection and image deformation that produce these two sets of facial coordinates from the original dataset of images.
Instead, we do perform a genuine Bayesian model selection when choosing the values of p, p_t, p_s that minimise the description length (i.e., that maximise the Bayesian evidence) of each dataset, i.e., of each type of coordinate.
Image uniformation and de-uniformation. The creation of the uniformed texture coordinates Î from the original images I and their shape coordinates L in R_D is implemented, as said before, through image deformation algorithms based on similarity transformations [20]. Such algorithms map the original image into an image whose landmark positions occupy their average values in the dataset. Vice versa, the reconstruction of novel images in R_D requires creating a non-uniformed facial image from the reconstructed shape and texture coordinates L_p, Î_p. We call this operation de-uniformation, (Î_p, L_p) → I_p, where the arrow indicates the image deformation algorithm transforming an image (L_1, I_1) → (L_2, I_2) so that the pixel values of I_2 at the positions given by L_2 are those of I_1 at L_1 (say, I_2(L_2j) = I_1(L_1j), where L_1j is the original Cartesian position of the j-th landmark), and the rest of the pixel values of I_2 are changed accordingly, under a requirement of smoothness. As a consistency check, we have verified that uniforming and subsequently de-uniforming dataset images leads to new images that are visually indistinguishable from the initial ones.
In fig. 9 we illustrate the effect of the image deformation algorithm on a picture of the FEI database.
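As an illustration of the kind of operation involved, a minimal landmark-driven warp can be sketched as follows; this is a simple displacement-field interpolation, not necessarily the similarity-transformation algorithm of [20], and all names are ours:

```python
import numpy as np
from scipy.interpolate import griddata

def deform(image, src_lm, dst_lm):
    # Warp `image` so that pixels at landmarks src_lm land on dst_lm,
    # interpolating a dense displacement field from the landmark shifts.
    # src_lm, dst_lm: (n, 2) arrays of (x, y) landmark positions.
    h, w = image.shape
    gy, gx = np.mgrid[0:h, 0:w]
    # for each target pixel, where to sample in the source image
    dx = griddata(dst_lm, src_lm[:, 0] - dst_lm[:, 0], (gx, gy),
                  method='linear', fill_value=0.0)
    dy = griddata(dst_lm, src_lm[:, 1] - dst_lm[:, 1], (gx, gy),
                  method='linear', fill_value=0.0)
    sx = np.clip(np.rint(gx + dx), 0, w - 1).astype(int)
    sy = np.clip(np.rint(gy + dy), 0, h - 1).astype(int)
    return image[sy, sx]
```

With identical source and destination landmarks the displacement field is zero everywhere, so the warp reduces to the identity, mirroring the consistency check mentioned above.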
How many landmarks are too many? The results presented in the main text lead to a picture of the origin of the efficiency of the decoupled code R_D. In essence, the above analysis suggests that, for the decoupling to be worthwhile, the number of landmarks should be low enough relative to the number of pixels. In this situation, few landmarks, of the order of some tens, require little information to be encoded and, at the same time, they imply a large texture information gap, since the resulting uniformed images are more compressible. Such an optimal value would be much lower than the above estimate of its maximum value, probably of the order of a few hundred landmarks for h ≈ 300.
Likelihood and evidence of the normal distribution. We report the well-known expression for the Bayesian evidence, and related formulae, of the normal distribution associated with the p-PCA representation with p principal components. Given a dataset D composed of N d-dimensional vectors, p-PCA induces a likelihood which is the normal distribution (supposing null averages):

ln P_p(D|θ) = −(N/2) [ d ln(2π) + ln det C + Tr(C^{−1} Σ) ],

where Σ is the unbiased estimator of the correlation matrix of the data D, and where the parameters θ = C are the theoretical correlation matrix, which in p-PCA is constrained to exhibit its d − p lowest eigenvalues equal to a common noise-level value v. The maximum-likelihood estimators θ* for C and v are:

C* = U Λ̃ U^T,    v* = (1/(d − p)) Σ_{j>p} λ_j,

where U is an orthogonal matrix whose top p eigenvectors are those of Σ, and where the diagonal matrix Λ̃ contains the top p eigenvalues of Σ, Λ̃_jj = λ_j for j ≤ p, with the remaining d − p diagonal elements equal to v*. For completeness, we report the expressions for the description length, the empirical entropy and the Occam factor, making explicit the dependence on the number of principal components p:

L_p(D) = S_p(D|θ*) + (Occam factor),    S_p(D|θ*) = −ln P_p(D|θ*) − N d ln ε,

where, as mentioned in the main text, θ* refers to the maximum-likelihood estimator. The equation for the Bayesian evidence (under certain assumptions on the prior variance) takes the form, up to a constant factor and for sufficiently large N [24]:

ln P_p(D) ≈ −(N/2) Σ_{j=1}^{d} ln λ̃_j − ((m + p)/2) ln N,    m := dp − p(p + 1)/2,    (13)

where the λ's are the eigenvalues of Σ in decreasing order, and λ̃_j = λ_j for j ≤ p but λ̃_j = v_p for j > p.
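The maximum-likelihood p-PCA spectrum described above (top-p sample eigenvalues kept, the rest replaced by their mean v*, with m = dp − p(p+1)/2 free parameters) can be sketched as follows; the helper name is ours:

```python
import numpy as np

def ppca_ml(sample_eigvals, p):
    # Maximum-likelihood p-PCA estimate: keep the top-p sample eigenvalues,
    # replace the d-p smallest by their common mean v (the noise level v*),
    # and count the free parameters m = d*p - p*(p+1)/2 of the p eigenvectors.
    lam = np.sort(np.asarray(sample_eigvals, dtype=float))[::-1]
    d = lam.size
    v = lam[p:].mean()
    spectrum = np.concatenate([lam[:p], np.full(d - p, v)])
    m = d * p - p * (p + 1) // 2
    return spectrum, v, m
```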
The explicit form of this last term differs between the cases d > N and d ≤ N, since for d > N the sample correlation matrix has at most N non-null eigenvalues (see [24] for the two expressions). Likelihood and evidence of shape coordinates. For shape coordinates, and for the datasets considered here, d_s < N. In figure 10 (upper panel) we show the behaviour of the training- and test-set (logarithms of the) likelihoods, along with the training- and test-set (logarithms of the) Bayesian evidences of shape coordinates (respectively, ln P(L_tr|C_s), ln P(L_te|C_s), ln P(L_tr), ln P(L_te)). We observe that the behaviour of the training-set evidence is qualitatively similar to that of the test-set likelihood (contrary to the case of texture coordinates, see below).
When commenting on the results of figure 3, we mentioned that the empirical entropy of shape coordinates does not depend on the resolution. Indeed, changing the resolution of the dataset of shape coordinates amounts to multiplying the landmarks' Cartesian coordinates by a factor (w′/w for horizontal, h′/h for vertical coordinates). However, the relevant quantity in these experiments is not the absolute value of the coordinates in w × h grid units, but their normalised value in units of the image height h. If normalised coordinates are considered, the precision should consequently be normalised so as to be inversely proportional to the resolution: for different resolutions we use the resolution-dependent precision ε_s = 0.1 (h_max/h). The overlap of the different curves is a consequence of the fact that no information is lost when scaling both the coordinates and the precision.
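The invariance argument can be checked numerically. Under an isotropic Gaussian model (an illustrative simplification; `code_length` is our own helper), rescaling the coordinates, the model scale and the precision by the same factor leaves the description length unchanged:

```python
import numpy as np

def code_length(X, sigma, eps):
    # Code length (in nats) of a dataset X of shape (N, d) under an isotropic
    # Gaussian of standard deviation `sigma`, discretised at precision `eps`:
    #   L = -sum_n ln N(x_n; 0, sigma^2 I) - N*d*ln(eps)
    N, d = X.shape
    nll = 0.5 * (X ** 2).sum() / sigma ** 2 \
        + 0.5 * N * d * np.log(2 * np.pi * sigma ** 2)
    return nll - N * d * np.log(eps)
```

Scaling X → cX adds N·d·ln c to the negative log-likelihood, which is exactly cancelled by the −N·d·ln(cε) term, explaining why the curves overlap.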
Concatenated coding. The concatenated code R_c consists of the principal components of the set of concatenated (texture and shape) vectors y, y_p = E^(c)_p · y, hence treating shape and texture coordinates on the same footing. For an image dataset in which shape and texture coordinates were completely uncorrelated (say, ⟨L_m Î_i⟩ = 0 ∀ m, i), the concatenated code would exactly coincide with R_D, in the sense that each principal axis would be a (normalised) concatenation of principal axes of the L and Î coordinates. The performance of the R_c code in the face processing tasks presented in the Results section turns out to be almost identical to that obtained using texture coordinates only. The reason is that shape coordinates carry a lower amount of aggregated information and, in any case, the correlations between shape and texture coordinates are significantly smaller than those in the diagonal blocks of C^(c). The advantage of using R_c is that one may fix a single number of principal components. The daydream generation of novel facial images with the R_D code (fixing p_s = d_s at its maximum value) leads to results almost identical to those of R_c in figure 5.
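A minimal sketch of the concatenated code R_c (illustrative helper, assuming row-wise data matrices; for realistic d_t one would diagonalise the Gram matrix instead):

```python
import numpy as np

def concatenated_pcs(T, S, p):
    # T: (N, d_t) uniformed texture vectors; S: (N, d_s) shape vectors.
    # Stack them into single vectors y, then take the top-p PCs of the
    # concatenated correlation matrix C^(c).
    Y = np.hstack([T - T.mean(0), S - S.mean(0)])
    C = Y.T @ Y / (Y.shape[0] - 1)          # unbiased estimator
    w, V = np.linalg.eigh(C)                # eigenvalues in ascending order
    E_p = V[:, ::-1][:, :p].T               # first p (row) eigenvectors
    return E_p @ Y.T                        # (p, N): p PCs per sample
```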
Details of the classification algorithms. The classification tasks are performed via a nearest-neighbour classifier: every vector x is assigned to the class that minimizes the distance from x. If a class contains more than one element, as is the case in the gender classification task (in which the male and female classes contain 200 vectors each, corresponding to half of the raw FEI database), the distance is computed between x and the average of the elements belonging to the class. For the gender identification task we follow a leave-one-out approach: for each vector x, the training set (from which we compute the correlation matrix, which in turn defines the Mahalanobis distance d_p(·, ·)) is composed of all the dataset vectors except for x itself.
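The nearest-mean rule can be sketched as follows; this is a minimal illustration (function name and inputs are ours), with the inverse correlation matrix `C_inv` assumed to be estimated from the leave-one-out training set as described:

```python
import numpy as np

def nn_classify(x, class_means, C_inv):
    # Nearest-mean classifier under the Mahalanobis metric:
    # assign x to the class whose mean minimises (x - mu)^T C_inv (x - mu).
    d2 = [float((x - mu) @ C_inv @ (x - mu)) for mu in class_means]
    return int(np.argmin(d2))
```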
The so-defined training set is also the set from which the average vector of each class is constructed. Results of the gender classification task. In figure 13 we present the results of the gender classification task.
We observe that the shape coordinates alone are sufficient to achieve roughly 90% successful attempts with fewer than 30 PCs. Consistently with the rest of our results, the classification performed in terms of (principal components of) uniformed facial images achieves higher success rates than that using (principal components of) the original facial images (i.e., the R_E representation). Furthermore, the success-rate plateau is reached for a lower number of PCs (p ≈ 30 versus p ≈ 40 for R_E).
Different regularisation schemes. For each generic set of facial coordinates (say, D), we have so far estimated its description length according to the p-PCA model, whose number of principal components is the one that minimises the description length, p* = arg min_p L_{D,p}(D). The inferred probability distribution is a normal distribution whose correlation matrix C_{p*} is consequently different from the empirical correlation matrix, C_{p=min{N,d}}, since not all empirical eigenvalues and eigenvectors are statistically significant given the finiteness of the dataset. (Strictly speaking, in the N < d case the inferred correlation matrix C_p differs from the empirical matrix even for p = N, since it has to be regularised so that its rank is d, and not N.) The normal distribution whose correlation matrix is the empirical matrix would correspond, instead, to maximum-likelihood inference. There are different ways, besides p-PCA, in which the correlation matrix may be inferred beyond the maximum-likelihood criterion. An alternative is linear (identity) shrinkage (see, for example, [36]). Linear shrinkage leads to a correlation matrix which is a convex combination of the unbiased (maximum-likelihood) empirical estimator C and a completely biased (and null-variance) matrix, such as the identity matrix in d dimensions, 1_d. In other words, the regularised shrunk matrix is C_α = (1 − α)C + α 1_d, where α is a real number in [0, 1] that may be chosen by maximising the (cross-validated) out-of-sample likelihood. In the p-PCA scheme, p = 0 and p = min{N, d} are the arguments of the minimum and maximum training likelihood, respectively, and p* lies between them; within the shrinkage scheme, these extreme cases correspond to α = 1 and α = 0, respectively.
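A minimal sketch of the shrinkage estimator and of the cross-validated choice of α (helper names and the candidate grid are illustrative; the paper's actual pipeline may differ):

```python
import numpy as np

def shrink(C, alpha):
    # Linear (identity) shrinkage: C_alpha = (1 - alpha) * C + alpha * 1_d
    return (1.0 - alpha) * C + alpha * np.eye(C.shape[0])

def best_alpha(train, heldout, alphas):
    # Pick alpha by maximum out-of-sample Gaussian log-likelihood.
    C = np.cov(train, rowvar=False)
    def loglik(Ca):
        sign, logdet = np.linalg.slogdet(Ca)
        quad = np.einsum('ni,ij,nj->n', heldout, np.linalg.inv(Ca), heldout)
        d = Ca.shape[0]
        return -0.5 * (quad.sum()
                       + heldout.shape[0] * (logdet + d * np.log(2 * np.pi)))
    return max(alphas, key=lambda a: loglik(shrink(C, a)))
```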
In order to check the robustness of our results with respect to the regularisation scheme, we have computed the information gaps (actually, the gaps in empirical entropy) resulting from the normal probability distributions associated not with p-PCA but with linear shrinkage. We have observed that the results are qualitatively consistent with those presented here. While the lowest description length of the set of landmarks, G_2, is consistent with the one shown in figure 3, the description length gap G_1 is significantly larger for the largest resolution, as can be seen in figure 14. Consequently, the information gap is even larger when regularising the correlation matrices with the shrinkage method.

Both the R_D and the R_E codings represent facial images in terms of Principal Components (PCs), but over different facial coordinates of the training set (i.e., over different datasets). Specifically, they represent a generic image I as follows. Eigenface coding (R_E) represents the image in terms of its PCs, I_p. In mathematical terms, I_p = E^(E)_p · I, where E^(E)_p is the p × d_t matrix composed of the first p (row) eigenvectors of the unbiased estimator of the correlation matrix C of the training-set images, C_ij = ⟨I_i(n) I_j(n)⟩, where ⟨·⟩ = (1/N_tr) Σ_n (·) is the empirical average over the training set, and where all vector components are null-averaged.
d + 5 4 l r I x L 1 i J O U 9 2 I 6 V C I S j K K V / G B E M Q / C i N x P + 9 W a W 3 f n I K v E K 0 g N C j T 7 1 a 9 g k L A s 5 g q Z p M b 4 n p t i L 6 c a B Z N 8 W g k y w 1 P K x n T I f U s V j b n p 5 f O T p + T M K g M S J d q W Q j J X f 0 / k N D Z m E o e 2 M 6 Y 4 M s v e T P z P 8 z O M r n u 5 U G m G X L H F o i i T B B M y + 5 8 M h O Y M 5 c Q S y r S w t x I 2 o p o y t C l V b A j e 8 s u r p H 1 R 9 9 y 6 9 3 B Z a 9 w U c Z T h B E 7 h H D y 4 g g b c Q R N a w C C B Z 3 i F N w e d F + f d + V i 0 l p x i 5 h j + w P n 8 A f T h k Q Y = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 1 x b B h C N A O P A I bG N y h G R o P c S V z s Q = " > A A A B 8 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P R i 9 4 q 2 A 9 I S t l s N + 3 S z S b s T o Q S + j O 8 e F D E q 7 / G m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 M J X C o O t + O 6 W 1 9 Y 3 N r f J 2 Z W d 3 b / + g e n j U N k m m G W + x R C a 6 G 1 L D p V C 8 h Q I l 7 6 a a 0 z i U v B O O b 2 d + 5 4 l r I x L 1 i J O U 9 2 I 6 V C I S j K K V / G B E M Q / C i N x P + 9 W a W 3 f n I K v E K 0 g N C j T 7 1 a 9 g k L A s 5 g q Z p M b 4 n p t i L 6 c a B Z N 8 W g k y w 1 P K x n T I f U s V j b n p 5 f O T p + T M K g M S J d q W Q j J X f 0 / k N D Z m E o e 2 M6 Y 4 M s v e T P z P 8 z O M r n u 5 U G m G X L H F o i i T B B M y + 5 8 M h O Y M 5 c Q S y r S w t x I 2 o p o y t C l V b A j e 8 s u r p H 1 R 9 9 y 6 9 3 B Z a 9 w U c Z T h B E 7 h H D y 4 g g b c Q R N a w C C B Z 3 i F N w e d F + f d + V i 0 l p x i 5 h j + w P n 8 A f T h k Q Y = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 1 x b B h C N A O P A I b G N y h G R o P c S V z s Q = " > A A A B 8 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P R i 9 4 q 2 A 9 I S t l s N + 3 S z S b s T o Q S + j O 8 e F D E q 7 / G m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 M J X C o O t + O 6 W 1 9 Y 3 N r f J 2 Z W d 3 b / + g e n j U N k m m G W + x R C a 6 G 
1 L D p V C 8 h Q I l 7 6 a a 0 z i U v B O O b 2 d + 5 4 l r I x L 1 i J O U 9 2 I 6 V C I S j K K V / G B E M Q / C i N x P + 9 W a W 3 f n I K v E K 0 g N C j T 7 1 a 9 g k L A s 5 g q Z p M b 4 n p t i L 6 c a B Z N 8 W g k y w 1 P K x n T I f U s V j b n p 5 f O T p + T M K g M S J d q W Q j J X f 0 / k N D Z m E o e 2 M 6 Y 4 M s v e T P z P 8 z O M r n u 5 U G m G X L H F o i i T B B M y + 5 8 M h O Y M 5 c Q S y r S w t x I 2 o p o y t C l V b A j e 8 s u r p H 1 R 9 9 y 6 9 3 B Z a 9 w U c Z T h B E 7 h H D y 4 g g b c Q R N a w C C B Z 3 i F N w e d F + f d + V i 0 l p x i 5 h j + w P n 8 A f T h k Q Y = < / l a t e x i t > `0 < l a t e x i t s h a 1 _ b a s e 6 4 = " / t q a p 4 x 9 U u R 6 x 6 N / 8 z b L g C c 3 K j c = " > A A A B + H i c b V D L S s N A F L 3 x W e O j U Z d u B o v o q i R u d C M W 3 b i s Y B / Q h D K Z T t q h k 0 m Y m Q g 1 F P w P N y 4 U c e t P u H f n 3 z h p u 9 D W A w O H c + 7 l n j l h y p n S r v t t L S 2 v r K 6 t l z b s z a 3 t n b K z u 9 d U S S Y J b Z C E J 7 I d Y k U 5 E 7 S h m e a 0 n U q K 4 5 D T V j i 8 L v z W P Z W K J e J O j 1 I a x L g v W M Q I 1 k b q O u X c j 7 E e h B H y K e f j 4 6 5 T c a v u B G i R e D N S u f y 0 L x 4 B o N 5 1 v v x e Q r K Y C k 0 4 V q r j u a k O c i w 1 I 5 y O b T 9 T N M V k i P u 0 Y 6 j A M V V B P g k + R k d G 6 a E o k e Y J j S b q 7 4 0 c x 0 q N 4 t B M F i n V v F e I / 3 m d T E f n Q c 5 E m m k q y

6 8 1
6 n 4 4 u W b O d f f g D 6 + M H 1 O G U s A = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " w O m O / v F S M B 6 9 Z u Z g Y K o m 9 l 4 m KA Y = " > A A A B + H i c b V D L S s N A F J 3 U V 4 2 P R l 2 6 G S y i q 5 K 4 0 Y 1 Y d O O y g n 1 A G 8 p k e t M O n U z C z E S o o V / i R l A R t / 6 E e z f i 3 z h p u 9 D W A w O H c + 7 l n j l B w p n S r v t t F Z a W V 1 b X i u v 2 x u b W d s n Z 2 W 2 o O J U U 6 j T m s W w F R A F n A u q a a Q 6 t R A K J A g 7 N Y H i V + 8 0 7 k I r F 4 l a P E v A j 0 h c s Z J R o I 3 W d U t a J i B 4 E I e 4 A 5 + O j r l N 2 K + 4 E e J F 4 M 1 K + + L D P k 6 c v u 9 Z 1 P j u 9 m K Y R C E 0 5 U a r t u Y n 2 M y I 1 o x z G d i d V k B A 6 J H 1 o G y p I B M r P J s H H + N A o P R z G 0 j y h 8 U T 9 v Z G R S K l R F J j J P K W a 9 3 L x P 6 + d 6 v D M z 5 h I U g 2 C T g + F K c c 6 x n k L u M c k U M 1 H h h A q m c m K 6 Y B I Q r X p y j Y l e P N f X i S N k 4 r n V r w b t 1 y 9 R F M U 0 T 4 6 Q M f I Q 6 e o i q 5 R D d U R R S l 6 Q M / o x b q 3 H q 1 X 6 2 0 6 W r B m O 3 v o D 6 z 3 H 8 Z w l i Q = < / l a te x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " w O m O / v F S M B 6 9 Z u Z g Y K o m 9 l 4 m K A Y = " >A A A B + H i c b V D L S s N A F J 3 U V 4 2 P R l 2 6 G S y i q 5 K 4 0 Y 1 Y d O O y g n 1 A G 8 p k e t M O n U z C z E S o o V / i R l A R t / 6 E e z f i 3 z h p u 9 D W A w O H c + 7 l n j l B w p n S r v t t F Z a W V 1 b X i u v 2 x u b W d s n Z 2 W 2 o O J U U 6 j T m s W w F R A F n A u q a a Q 6 t R A K J A g 7 N Y H i V +8 0 7 k I r F 4 l a P E v A j 0 h c s Z J R o I 3 W d U t a J i B 4 E I e 4 A 5 + O j r l N 2 K + 4 E e J F 4 M 1 K + + L D P k 6 c v u 9 Z 1 P j u 9 m K Y R C E 0 5 U a r t u Y n 2 M y I 1 o x z G d i d V k B A 6 J H 1 o G y p I B M r P J s H H + N A o P R z G 0 j y h 8 U T 9

Figure 1: Schematic illustration of eigenface coding (R_E, left) and decoupled coding (R_D, right). See the main text for explanation.

Table 1: Outline of the two alternative schemes for the representation of facial images: eigenface coding (R_E) and decoupled coding (R_D). θ* denotes the model parameters (the eigenvector matrix E_p and the vector of averages μ), fitted as the Maximum-Likelihood values for a training set D_tr, which may differ from D.
be misleading to interpret the decoupled coding R_D as being local instead, just because it uses landmark coordinates. Indeed, R_E represents facial images in terms of principal components of I, and so does R_D: the principal components of Î are non-local in the sense that each component is a linear combination of pixel intensities occupying different positions in the image canvas; ℓ is non-local as well, in the sense that each of its components is a linear combination of different landmarks' Cartesian coordinates.

G_1 = L_{I_tr, p}(I) − L_{Î_tr, p}(Î)    (1)

i.e., the number of bits needed to compress the database of non-uniformed images I minus the number of bits needed to compress the database of uniformed images Î.
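The information gap of Equation 1 can be estimated numerically. The following is a minimal sketch, not the paper's exact estimator: it computes a two-part description length for a dataset under a p-component PCA Gaussian model, with a BIC-style penalty for the fitted parameters and a discretisation precision `eps`; the function names and the probabilistic-PCA treatment of the residual variance are our assumptions.

```python
import numpy as np

def description_length(X, p, eps=1e-2):
    """Two-part code length (in bits) of dataset X (one sample per row)
    under a p-component PCA Gaussian model. Hedged sketch: BIC-style
    parameter penalty, probabilistic-PCA residual variance."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    # eigenvalues of the empirical covariance, in decreasing order
    evals = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / n))[::-1]
    evals = np.clip(evals, 1e-12, None)
    # retained components keep their own variance; the remaining ones
    # share the average residual variance
    sigma2 = evals[p:].mean() if p < d else 1e-12
    var = np.concatenate([evals[:p], np.full(d - p, sigma2)])
    # Gaussian negative log-likelihood, discretised at precision eps
    nll = 0.5 * n * (np.sum(np.log(2 * np.pi * var)) + d) - n * d * np.log(eps)
    # BIC-style cost of the model parameters (axes, variances, mean)
    penalty = 0.5 * (p * d + p + 1 + d) * np.log(n)
    return (nll + penalty) / np.log(2)

def information_gap(I, I_hat, p):
    """G_1 in the spirit of Equation 1: bits to code the non-uniformed
    images I minus bits to code the uniformed images I_hat, at equal p."""
    return description_length(I, p) - description_length(I_hat, p)
```

A dataset whose retained variances are smaller is cheaper to code, which is the mechanism behind a positive gap when uniformation concentrates the variance.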

Figure 2 shows the description length of uniformed images in the training set (i.e., taking Î = Î_tr in Equation 1, where Î_tr is the whole database of N = 400 uniformed smiling and neutral images) as a function of p, for four different resolutions. [Plot legend: description length per sample and pixel, L/(N d_t) [bits]; uniformed images at w = 25, 50, 100, 250; non-uniformed images at w = 250; uniform code length l_0.]

Figure 2: Description length of uniformed images in the training set. See the main text for explanation.

Figure 2 also shows the description length of non-uniformed images for the largest resolution, w_max × h_max = 250 × 300. We see that, for this resolution, the information gap in Equation 1 is positive: the uniformed images are better compressed than the non-uniformed images, for all values of p. The information gap per sample and coordinate, G/(N d_t), is, as expected, an increasing function of the resolution (see the inset of Figure 2), indicating that the information gap increases faster than linearly in d_t. By contrast, for the two lowest resolutions, decoupled coding does not lead to a gain in information: for w = 25 the information gap is negative, roughly equal to minus one hundred bits per sample. The information gap per sample of the R_D coding increases rapidly with the number of pixels d_t, and it exceeds 10000 bits per image for w = 250. This is evident in Figure 3, which shows the information gap per sample, G/N_tr, of training-set images as a function of the resolution. Figure 3 also shows the description length of the shape coordinates, L_{L_tr, p*_s}(L_tr), which is independent of the image resolution (horizontal line; see the details in the Supporting information). The information gap of the image degrees of freedom is comparable with the shape coordinates' description length L_{L_tr, p*_s}(L_tr) for w = 150, but it is much larger for the largest resolution, w = 250. Summarising, for the largest image resolution, the texture-shape decoupling entails a gain in information: for d_t ≳ 10^4, the condition in Equation 3 is satisfied.

Figure 3: Information gap for facial images in the training set. See the main text for details.
in Equation 1, which is composed of N/K = 80 (with K = 5) images of smiling subjects whose neutral-expression images do belong to the training set. Note that we will call such a set simply the "test set". All the information-theoretical quantities are then cross-validated over different K training/test partitions of the original dataset (by means of the K-fold cross-validation algorithm).
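The K-fold partitioning used above can be sketched generically as follows (a plain index split; the bookkeeping that keeps each test subject's neutral-expression image in the training set is simplified away here):

```python
import numpy as np

def kfold_partitions(n, K=5, seed=0):
    """Yield K (train, test) index partitions of n items, as in the
    K-fold cross-validation used for the test-set quantities."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        yield train, test
```

With N = 400 images and K = 5, each round leaves N/K = 80 images for testing.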

Figure 4: Information gap for facial images in the test set. See the main text for details.

Figure 5: Examples of synthetic faces sampled from the generative models of R_E (top row) and of a simple variant of the decoupled coding scheme R_D, which we call concatenated coding, R_c (bottom row). See the main text for details.

Figure 6: Performance in the face recognition task using the uniformed images, the non-uniformed images, and the shape coordinates.

Figure 7 shows the reconstruction of a novel face according to R_E and R_c for different values of p. The figure illustrates that R_E produces a border artifact: linear combinations of facial images with different facial contours result in an image that tends to be blurred at the margin of the face.
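The p-component reconstruction underlying Figure 7 amounts to projecting the centred image onto the leading principal axes and back-projecting. A generic sketch (function and variable names are ours):

```python
import numpy as np

def pca_reconstruct(x, X_train, p):
    """Reconstruct a flattened image x from its first p principal
    components, with axes estimated on the rows of X_train."""
    mu = X_train.mean(axis=0)
    # principal axes = leading right-singular vectors of the centred data
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    E_p = Vt[:p]                     # p x d matrix of principal axes
    coeffs = E_p @ (x - mu)          # project onto the first p axes
    return mu + E_p.T @ coeffs       # back-project to image space
```

The same projection applies to the concatenated vector y = (ℓ, Î) of R_c, with the de-uniformation step applied afterwards.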
taken as the (training-set) correlation matrix of the eigenface coding, with the maximum number of principal components p_d = N_tr − N_te − 2 = 378. Indeed, for a numerical assessment of the similarity between reconstructed and original faces, one needs a perceptual distance based on a dimensionality reduction, hence based on a representation. In fact, we have observed that, as expected [5], the simple Euclidean metric, i.e. the pixel-wise distance, does not allow to detect any difference between the two representations. By choosing the representation used to compute the similarity between target and reconstructed images equal to one of the representations used for the reconstruction (the eigenface code R_E), we bias the comparison in favour of this code. Despite this unfavourable bias, the R_c code exhibits a lower perceptual distance in Figure 8 for all values of p except the maximum, p = p_d, for which, by construction, the R_E code attains the minimum possible distance d = 0 (since the reconstruction and the target image are expressed in terms of the same principal axes), so that no different model can improve on it. It is remarkable that, for all p < p_d, the R_c code improves on R_E despite such a bias.
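The perceptual distance used here is a Mahalanobis distance in the subspace spanned by the leading eigenface axes, with variances estimated on the training set. A sketch under these definitions (names are ours):

```python
import numpy as np

def mahalanobis_distance(t_rec, t, X_train, p):
    """Mahalanobis distance between a reconstruction t_rec and a target
    t, computed in the first p principal axes of X_train."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    lam, axes = np.linalg.eigh(Xc.T @ Xc / len(X_train))
    order = np.argsort(lam)[::-1][:p]       # p largest eigenvalues
    E_p, lam_p = axes[:, order].T, lam[order]
    diff = E_p @ (t_rec - t)                # difference in principal coords
    return float(np.sqrt(np.sum(diff ** 2 / lam_p)))
```

Weighting each principal coordinate by its inverse variance is what makes this a representation-based (rather than pixel-wise) distance.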

Figure 7: Reconstruction of a test-set facial image with p principal components according to the representations: eigenface coding (R_E, first row) and concatenated coding (R_c, second row). The columns show, respectively, p = 5, 20, 50, 100, 395, and the original image. The image is 100 × 120.

Figure 8: Mahalanobis distance d_{C,p_d}(t_p, t) between target facial images and their reconstructed counterparts with p principal components (in the abscissa), according to the codes R_E and R_c (see the main text for details). The points are averages of the distance over N_te = 20 randomly selected images with neutral facial expression.

Figure 9: An example of usage of the image-deformation software. The image in the centre is the deformation of the right image onto the landmarks of the left image.

h. In Figure 10 (lower panel) we plot the training empirical entropy S_p(L_tr|C_s) = −ln P(L_tr|C_s) − d_s ln ε_s

Figure 10: Left: shape coordinates' likelihoods and evidences (in the BIC approximation) of the test and training sets. Right: empirical entropy of shape coordinates for the full N_tr = 400 training set, for different resolutions.
[Figure 12 schematic labels: original image I; shape-free texture Î; concatenated vector y = (ℓ, Î); principal components y′ = E_c y.]
Figure 12: Schematic representation of the concatenated code.

Figure 13: Success rates in the tasks of gender classification and face recognition, displayed (left) as functions of the number p of principal components; the right column shows the same data (except for the geometric coordinates) in a close-up.

Figure 14: The same as Figures 3 and 4, but with the addition of the test-set texture gap per sample, G_1/N, computed with the shrinkage regularisation method.

Figure 15: First five principal axes of the concatenated code R_c (the five largest-eigenvalue eigenvectors of C^(c)). The j-th column represents the j-th eigenvector. In particular, the i-th row of the table represents the points whose coordinates, in the basis of the principal axes, are all zero except the i-th one, which ranges from −2σ (left) to +2σ (right); σ is taken equal to the square root of the largest eigenvalue λ_1 of the correlation matrix. In other words, the image I in the i-th row and j-th column is obtained by de-uniformation (ℓ, Î) → (ℓ, I), where ℓ and Î are obtained as (ℓ, Î) = y = E† · y′, and where y′ is the vector whose principal components are all null except the i-th one, y′_i = (j − 3)λ_1^{1/2}; E is the matrix of row eigenvectors of C^(c).