Fusion Tensor Subspace Transformation Framework

Tensor subspace transformation, a widely used subspace transformation technique, has gained increasing popularity over the past few years, because many objects in the real world can be naturally represented as multidimensional arrays, i.e. tensors. For example, an RGB facial image can be represented as a three-dimensional array (or 3rd-order tensor): the first two dimensionalities (or modes) represent the facial spatial information, and the third represents the color space information. Each mode of a tensor may express a different semantic meaning, so different transformation strategies should be applied to different modes according to those meanings to obtain the best performance. To the best of our knowledge, however, no existing tensor subspace transformation algorithm implements different transformation strategies on different modes of a tensor accordingly. In this paper, we propose a fusion tensor subspace transformation framework, a novel idea in which different transformation strategies are implemented on separate modes of a tensor. Under the framework, we propose the Fusion Tensor Color Space (FTCS) model for face recognition.


Introduction
Subspace transformation (or subspace analysis [1]), a main type of feature extraction, has gained great popularity over the past few years. Principal Component Analysis (PCA) [2] seeks the optimal projection directions according to maximal variances. Linear Discriminant Analysis (LDA) [3] uses discriminant information to search for the directions that are most effective for discrimination, by maximizing the ratio between the between-class and within-class scatters. Both PCA and LDA aim to preserve the global structure of the samples. Locality Preserving Projections (LPP) [4] aims to preserve the local structure of the original space in the projected subspace. Discriminant Locality Preserving Projections (DLPP) [5] encodes discriminant information into LPP to further improve its discriminant performance for face recognition. All of these algorithms, however, require the objects (samples) to be vectorized.
In the real world, however, many objects are naturally represented by multidimensional arrays, i.e., tensors, such as the color facial images used in face recognition (see Fig. 1). If these objects are vectorized, their natural structure information is lost [6]. As a result, a great deal of interest has arisen in tensor-based methods [7][8][9][10][11]. Among subspace transformation techniques, tensor subspace transformation has likewise become a highly discussed topic. Multilinear Principal Component Analysis (MPCA) [12], a tensor version of PCA, applies the PCA transformation on each mode (or dimensionality) of tensors. Similarly, Discriminant Analysis with Tensor Representation (DATER) [13], General Tensor Discriminant Analysis (GTDA) [14], Tensor Subspace Analysis (TSA) [15], and Discriminant Tensor Subspace Analysis (DTSA) [16] apply LDA, Maximum Scatter Difference (MSD) [17], LPP, and DLPP, respectively, to transform each mode of tensors. All of these tensor subspace transformation methods use a single vector subspace transformation method to transform every mode of a tensor.
However, each mode of a tensor may express a different semantic meaning. For example, a color facial image can be treated as a 3rd-order tensor, where mode-1 and mode-2 represent the facial spatial information and mode-3 represents the color space information (see Fig. 1). The facial spatial information and the color space information are two different types of information, which should be handled by two different transformations to obtain better performance. In other words, for color facial images we should implement one transformation strategy on the first two modes and another on the third mode. Each type of information should thus be handled by the transformation strategy best suited to its semantic meaning.
To the best of our knowledge, there is no existing tensor subspace transformation algorithm that implements different transformation strategies on different modes of tensors according to their semantic meanings. To address this problem, we propose the fusion tensor subspace transformation framework, a novel idea in which different transformation strategies can be implemented on different modes of tensors. Under the framework, we propose the Fusion Tensor Color Space (FTCS) model for face recognition.

Tensor Fundamentals and Denotations
A tensor is a multidimensional array. It is the higher-order generalization of the scalar (0th-order tensor), vector (1st-order tensor), and matrix (2nd-order tensor). In this paper, lowercase italic letters (a, b, ...) denote scalars, bold lowercase letters (a, b, ...) denote vectors, bold uppercase letters (A, B, ...) denote matrices, and calligraphic uppercase letters (A, B, ...) denote tensors. Formal definitions follow [18]. In particular, the product of two matrices can be generalized to the product of a tensor and a matrix: the mode-n product A ×_n U multiplies every mode-n fiber of A by the matrix U.
By tensor decomposition, any tensor A can be expressed as the product

A = C ×_1 U_1 ×_2 U_2 ... ×_N U_N,

where U_n, n = 1,2,...,N, is an orthonormal matrix that contains the ordered principal components for the n-th mode, and C is called the core tensor. Unfolding the above equation on mode n, we have

A_(n) = U_n C_(n) (U_N ⊗ ... ⊗ U_{n+1} ⊗ U_{n-1} ⊗ ... ⊗ U_1)^T,

where the operator ⊗ is the Kronecker product of matrices.
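The mode-n unfolding, the mode-n product, and the Kronecker identity above can be made concrete with a short NumPy sketch (our own illustration, not code from the paper). Note that with NumPy's row-major unfolding the Kronecker factors appear in forward order (U_2 ⊗ U_3 for mode-1), whereas the convention above lists them in reverse; the two conventions differ only in the ordering of columns.

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding: mode-n fibers become the columns of a matrix."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold: rebuild the tensor from its mode-n unfolding."""
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(full), 0, n)

def mode_n_product(A, U, n):
    """Mode-n product A x_n U: multiply every mode-n fiber of A by U."""
    shape = list(A.shape)
    shape[n] = U.shape[0]
    return fold(U @ unfold(A, n), n, shape)

# A small 3rd-order tensor (e.g. a 4x5 image with 3 color channels).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 3))
U1, U2, U3 = (rng.standard_normal((s, s)) for s in (4, 5, 3))

# Transform every mode, as MPCA-style methods do.
B = mode_n_product(mode_n_product(mode_n_product(A, U1, 0), U2, 1), U3, 2)

# Unfolding identity (row-major convention): B_(1) = U1 A_(1) (U2 kron U3)^T.
lhs = unfold(B, 0)
rhs = U1 @ unfold(A, 0) @ np.kron(U2, U3).T
print(np.allclose(lhs, rhs))  # True
```

The same three helpers suffice for all the tensor operations used in the rest of the paper.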
The Connection among PCA, 2D-PCA and MPCA
Before introducing the fusion tensor subspace transformation framework, we first investigate the connection among PCA, 2D-PCA [19] and MPCA. From the previous section, we know that a tensor is the higher-order generalization of the scalar, vector and matrix. Similarly, MPCA is the higher-order generalization of PCA and 2D-PCA.

Table 1. The fusion tensor subspace transformation framework.
INPUT: M tensor objects X_i, i = 1,2,...,M; maximal number of iterations T_max; convergence threshold ε.
OUTPUT: U_n, n = 1,2,...,N.
Initialize U_n with a set of identity matrices;
for t = 1 to T_max do
  for n = 1 to N do
    for i = 1 to M do
      Y_i(n) ← the mode-n unfolding matrix of Y_i^(n), the projection of X_i on all modes except mode n;
    end for
    implement the mode-n transformation on the M matrices Y_i(n) to obtain U_n; (*)
  end for
  if t ≥ 2 and ‖U_n − U_n^pre‖^2 < ε for n = 1,2,...,N, where U_n^pre is U_n in the previous iteration, stop.
end for
MPCA seeks N projection matrices U_1, U_2, ..., U_N such that the projected tensors Y_i = X_i ×_1 U_1^T ×_2 U_2^T ... ×_N U_N^T, i = 1,2,...,M, capture most of the variations observed in the original tensor objects X_i.
The mode-n total scatter matrix is

S^(n) = Σ_{m=1}^{M} (X_m(n) − X̄_(n)) Ũ_Φ(n) Ũ_Φ(n)^T (X_m(n) − X̄_(n))^T, (7)

where X̄_(n) denotes the mode-n unfolding matrix of the mean tensor X̄, X_m(n) denotes the mode-n unfolding matrix of X_m, and Ũ_Φ(n) = U_N ⊗ ... ⊗ U_{n+1} ⊗ U_{n−1} ⊗ ... ⊗ U_1. In Eq. (7), the role of Ũ_Φ(n) is to apply the N−1 fixed projection matrices. If we only transform mode-2 of 2nd-order tensors, U_1 is an identity matrix of size I_1. Then Ũ_Φ(2) is also an identity matrix and X_m(2) = X_m^T. In this case, Eq. (7) is simplified to

S^(2) = Σ_{m=1}^{M} (X_m − X̄)^T (X_m − X̄). (10)

Eq. (10) is exactly the image covariance (scatter) matrix G_t in 2D-PCA [19]. So 2D-PCA is a special case of MPCA: when objects are represented by matrices and only the rows of the matrices need to be transformed, MPCA degenerates into 2D-PCA.
When the X_i, i = 1,2,...,M, are 1st-order tensors (vectors x_i), there is only one mode, the projection reduces to the ordinary vector projection y_i = U_1^T x_i, and Eq. (7) is simplified to

S = Σ_{m=1}^{M} (x_m − x̄)(x_m − x̄)^T,

which is exactly the scatter matrix in PCA. So PCA is a special case of MPCA: when objects are represented by vectors, MPCA degenerates into PCA.
Following the above analysis, 2D-PCA applies the PCA transformation on the rows of matrices, and MPCA applies the PCA transformation on all modes of tensors.
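The degeneration argument can be checked numerically. The sketch below (our own illustration, on synthetic matrices) verifies that, with U_1 fixed to the identity, the MPCA mode-2 scatter of 2nd-order tensors coincides with the image covariance (scatter) matrix G_t of 2D-PCA:

```python
import numpy as np

rng = np.random.default_rng(1)
M, I1, I2 = 20, 6, 5
X = rng.standard_normal((M, I1, I2))   # M grayscale image matrices
Xbar = X.mean(axis=0)

# Mode-2 unfolding of a matrix is its transpose, so Eq. (7) with an
# identity Kronecker factor reads sum_m (X_m(2) - Xbar_(2))(X_m(2) - Xbar_(2))^T:
S2 = sum((Xm.T - Xbar.T) @ (Xm.T - Xbar.T).T for Xm in X)

# 2D-PCA's (unnormalized) image covariance matrix G_t:
Gt = sum((Xm - Xbar).T @ (Xm - Xbar) for Xm in X)

print(np.allclose(S2, Gt))  # True
```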
Similarly, through the above analysis, one can see that DATER applies the LDA transformation on all modes of tensors; likewise, GTDA, TSA and DTSA apply the MSD, LPP and DLPP transformations on all modes of tensors, respectively. Several other tensor subspace transformation methods also apply a single type of transformation on all modes of tensors; due to the page limit, we mention only a portion of these algorithms.

Fusion tensor subspace transformation framework
A tensor subspace transformation method first initializes the N projection matrices U_1, ..., U_{n−1}, U_n, U_{n+1}, ..., U_N as identity matrices or random matrices, and then fixes the N−1 projection matrices U_1, ..., U_{n−1}, U_{n+1}, ..., U_N. These fixed matrices are used to transform the X_i, and the transformed results are unfolded on mode n. Finally, U_n is obtained by implementing a certain transformation on the mode-n unfolding matrices. The solution of U_n thus depends on the other projection matrices, so the N projection matrices are solved by constructing an iterative procedure.
Existing tensor subspace transformation methods implement only one transformation strategy on all modes. In the real world, however, each mode of a tensor may represent a different type of information, and different transformation strategies should be implemented on different modes according to their semantic meanings. We therefore propose the Fusion Tensor Subspace Transformation (FTSA) framework, which is described in Table 1. In Table 1, the statement denoted (*) is the core statement of the framework: for a given represented object, different transformations are used on different modes according to their semantic meanings.
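The alternating loop of the framework can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names ftsa and pca_transform are our own, the data are synthetic, and in a real FTCS-style model the per-mode strategies would differ (e.g. discriminant analysis on the spatial modes and ICA on the color mode).

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding: mode-n fibers become the columns of a matrix."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def pca_transform(mats, L):
    """One possible per-mode strategy: top-L eigenvectors of the mode scatter."""
    mean = np.mean(mats, axis=0)
    Xc = np.hstack([m - mean for m in mats])
    w, V = np.linalg.eigh(Xc @ Xc.T)
    return V[:, np.argsort(w)[::-1][:L]]

def ftsa(X, transforms, dims, T_max=10, eps=1e-1):
    """Sketch of the FTSA loop: `transforms` maps each mode n to its own
    transformation strategy, applied to the M mode-n unfolding matrices.
    X is an (M, I_1, ..., I_N) array of tensor samples."""
    N = X.ndim - 1
    U = [np.eye(X.shape[n + 1]) for n in range(N)]   # initialize as identities
    for t in range(T_max):
        U_pre = [u.copy() for u in U]
        for n in range(N):
            # Project every sample on all modes except mode n.
            Y = X
            for m in range(N):
                if m != n:
                    Y = np.moveaxis(
                        np.tensordot(Y, U[m], axes=([m + 1], [0])), -1, m + 1)
            mats = [unfold(Y[i], n) for i in range(X.shape[0])]
            U[n] = transforms[n](mats, dims[n])      # (*) mode-specific transform
        if t >= 1 and all(np.linalg.norm(U[n] - U_pre[n]) ** 2 < eps
                          for n in range(N)):
            break
    return U

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8, 8, 3))               # 30 synthetic RGB "images"
U1, U2, U3 = ftsa(X, transforms={0: pca_transform, 1: pca_transform, 2: pca_transform},
                  dims={0: 4, 1: 4, 2: 2})
print(U1.shape, U2.shape, U3.shape)  # (8, 4) (8, 4) (3, 2)
```

Swapping the entry of `transforms` for a given mode is all that is needed to fuse different strategies, which is exactly the point of statement (*).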
In the algorithm, we use the maximal number of iterations T_max to handle the case in which the algorithm does not converge. In fact, the convergence of many tensor subspace analysis algorithms cannot be proved in general, but the classification results based on these algorithms have been shown to stabilize after several rounds of iterations (e.g. DATER, 2D-LDA [20]). The convergence of FTSA depends on the specific transformations used.

Fusion tensor color space model
Recent research has shown that color information may help to improve face recognition accuracy. However, the R, G, and B component images in the RGB color space are correlated. Decorrelating the components of these images helps reduce redundancy and is an important strategy for improving the accuracy of the subsequent recognition method [21]. Liu [22] proposed the Uncorrelated Color Space (UCS), the Independent Color Space (ICS), and the Discriminating Color Space (DCS). Specifically, the UCS applies PCA to decorrelate the R, G, and B component images. The ICS and DCS further enhance the discriminating power of the subsequent recognition method by means of Independent Component Analysis (ICA [23]) and LDA, respectively. The experimental results showed that the ICS is the best color space because its components are not only uncorrelated but also independent.
Many papers have reported that discriminant analysis methods on facial images can enhance the subsequent recognition method [3][5]. The Color Image Discriminant (CID) model [24], borrowing the idea of LDA, seeks an optimal color space and an effective recognition method for color images in a unified framework. The Tensor Discriminant Color Space (TDCS) model [25], borrowing the idea of DATER [13], seeks two discriminant projection matrices U_1, U_2 corresponding to the facial spatial information and one color space transformation matrix U_3 corresponding to the color space. In effect, TDCS uses the LDA transformation on both the facial spatial information and the color space information. The same authors [26] also used the elastic net to propose the Sparse Tensor Discriminant Color Space (STDCS) model.
For color space information, however, the ICA transformation is better than the LDA transformation [22]. Motivated by this insight, we propose the Fusion Tensor Color Space (FTCS) model, which applies discriminant analysis to the facial spatial information and ICA to the color space information.
A color facial image is naturally represented by a 3rd-order tensor, where mode-1 and mode-2 carry the facial spatial information and mode-3 carries the color space information. For instance, an RGB image of size I_1 × I_2 is represented as a tensor A ∈ R^{I_1 × I_2 × I_3}, where I_3 = 3. Mode-3 of A is the color variable, whose 3 components correspond to R, G and B in the RGB color space. FTCS uses LDA on the first two modes and ICA on the third mode.
Assume C is the number of individuals, X_i^c is the i-th color facial image of the c-th individual, and M_c is the number of color facial images of the c-th individual, where M = M_1 + M_2 + ... + M_C. The FTCS algorithm seeks two discriminant projection matrices U_1, U_2 and one color space transformation matrix U_3, where U_1 and U_2 are obtained by discriminant analysis and U_3 is obtained by ICA. The mean image of the c-th individual and the mean image of all individuals are defined by

X̄^c = (1/M_c) Σ_{i=1}^{M_c} X_i^c and X̄ = (1/M) Σ_{c=1}^{C} Σ_{i=1}^{M_c} X_i^c.

The between-class scatter and within-class scatter of the color images are defined as

Ψ_b(Y) = Σ_{c=1}^{C} M_c ‖Ȳ^c − Ȳ‖_F^2 and Ψ_w(Y) = Σ_{c=1}^{C} Σ_{i=1}^{M_c} ‖Y_i^c − Ȳ^c‖_F^2.

We can define the mode-n between-class scatter matrix S_b^(n) and the mode-n within-class scatter matrix S_w^(n) as

S_b^(n) = Σ_{c=1}^{C} M_c (X̄_(n)^c − X̄_(n)) Ũ_n Ũ_n^T (X̄_(n)^c − X̄_(n))^T

and

S_w^(n) = Σ_{c=1}^{C} Σ_{i=1}^{M_c} (X_i(n)^c − X̄_(n)^c) Ũ_n Ũ_n^T (X_i(n)^c − X̄_(n)^c)^T,

where Ũ_n = U_N ⊗ ... ⊗ U_{n+1} ⊗ U_{n−1} ⊗ ... ⊗ U_1, n = 1,2,3. Then the between-class scatter Ψ_b(Y) and the within-class scatter Ψ_w(Y) of the projected tensors can be rewritten as

Ψ_b(Y) = tr(U_n^T S_b^(n) U_n) and Ψ_w(Y) = tr(U_n^T S_w^(n) U_n).

So, given U_2 and U_3 (or U_1 and U_3), U_1 (or U_2) can be obtained by the following discriminant analysis:

U_n = arg max_{U_n} tr(U_n^T S_b^(n) U_n) / tr(U_n^T S_w^(n) U_n). (21)

According to the Rayleigh quotient, Eq. (21) is maximized if and only if the matrix U_n consists of the L_n generalized eigenvectors corresponding to the largest L_n generalized eigenvalues of the matrix pencil (S_b^(n), S_w^(n)), which satisfy

S_b^(n) u = λ S_w^(n) u.

Since S_b^(n) and S_w^(n) depend on U_1, ..., U_{n−1}, U_{n+1}, ..., U_N, the optimization of U_n depends on the projections of the other modes.
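One discriminant step of this kind reduces to a generalized eigenproblem, which can be sketched as follows (our own illustration; the scatter matrices here are synthetic placeholders for S_b^(n) and S_w^(n)):

```python
import numpy as np
from scipy.linalg import eigh

# Given the mode-n between- and within-class scatter matrices (symmetric,
# with S_w positive definite here), U_n consists of the generalized
# eigenvectors of the pencil (S_b, S_w) with the largest eigenvalues.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
Sb = A @ A.T                       # stand-in for S_b^(n)
B = rng.standard_normal((8, 8))
Sw = B @ B.T + 8 * np.eye(8)       # stand-in for S_w^(n), well-conditioned

L_n = 4
w, V = eigh(Sb, Sw)                # solves S_b u = lambda S_w u (ascending)
U_n = V[:, np.argsort(w)[::-1][:L_n]]   # eigenvectors of the L_n largest lambda

# Each retained column satisfies the generalized eigenproblem:
u, lam = U_n[:, 0], np.sort(w)[::-1][0]
print(np.allclose(Sb @ u, lam * (Sw @ u)))  # True
```

In the full algorithm this solve is repeated for each spatial mode inside the alternating loop, since S_b^(n) and S_w^(n) change whenever the other projection matrices do.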
In order to obtain U_3, we use ICA (for the ICA operations, we used Hyvärinen's fixed-point algorithm, http://www.cis.hut.fi/projects/ica/fastica/) to decorrelate the RGB color space. We first use U_1 and U_2, obtained through the above discriminant analysis, to transform each image: F_i = X_i ×_1 U_1^T ×_2 U_2^T. The transformed tensors F_i are then concatenated into a 4th-order tensor F ∈ R^{L_1 × L_2 × I_3 × M}. The mode-3 unfolding matrix F_(3) is a 3 × K matrix, where K = L_1 × L_2 × M, and the three rows of F_(3) correspond to the three components of the RGB space, respectively.
The color space transformation matrix U_3 may be derived by applying ICA to F_(3). The ICA of F_(3) factorizes the covariance matrix S_F into the form

S_F = U_3 Δ U_3^T, (24)

where Δ ∈ R^{3×3} is a diagonal, real, positive matrix, and U_3 transforms the RGB color space into a new color space whose three components are independent, or as close to independent as possible. The U_3 in Eq. (24) may be derived using Comon's ICA algorithm by calculating mutual information and high-order statistics. As a result, an iterative procedure can be constructed to obtain U_1, U_2 and U_3.
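The ICA step can be sketched with scikit-learn's FastICA, which implements the fixed-point algorithm referenced above (a sketch on synthetic data; the shapes and the construction of F are placeholders for the spatially projected images):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Stack the spatially projected images into F (L1 x L2 x 3 x M), take the
# mode-3 unfolding F3 (3 x K), and run ICA so that the three color
# components become (maximally) independent.
rng = np.random.default_rng(0)
L1, L2, M = 10, 10, 40
F = rng.standard_normal((L1, L2, 3, M))         # synthetic projected images
F3 = np.moveaxis(F, 2, 0).reshape(3, -1)        # mode-3 unfolding, 3 x K

ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
S = ica.fit_transform(F3.T)                     # K x 3 independent components
U3 = ica.components_.T                          # 3 x 3 color transformation

print(U3.shape)  # (3, 3)
```

The rows of `ica.components_` play the role of the unmixing directions; applying U_3 along mode-3 of each image maps RGB into the learned color space.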

Experiments and results on the AR database
The AR database contains over 4,000 color facial images of 126 people. Each individual participated in two photo sessions, and in both sessions the pictures were taken under identical requirements and conditions. In our experiments, we selected 100 people, with 14 images per individual; occluded face images were excluded. These facial images have been cropped [27] and can be downloaded from the official AR face database web page (http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html). All images are cropped and resized to 32 × 32 pixels. The sample images of one individual from the AR database are shown in Fig. 2, where the images on the top row, from the first session, form the training set, and the images on the bottom row, from the second session, form the testing set.
In this experiment, we trained FTCS, TDCS and CID. (Although STDCS [26] outperforms TDCS, we did not compare FTCS with STDCS, because the motivation of this paper is to implement different transformations on different modes of a tensor.) The convergence threshold ε was set to 0.1, and the learned color space transformation matrices are given in Eqs. (25)-(27). Using these three matrices, we obtained the three color components D_1, D_2, D_3 of CID, the three color components T_1, T_2, T_3 of TDCS, and the three color components F_1, F_2, F_3 of FTCS (see Fig. 3).
Meanwhile, we carried out LDA and 2D-LDA on the corresponding gray images. In LDA and CID, only 99 discriminant projection basis vectors were extracted. For 2D-LDA, TDCS and FTCS, the dimensions of the two spatial modes were both reduced to 10. The score matrices were generated with the Manhattan distance and the Euclidean distance, respectively. The ROC curves of the five methods are shown in Fig. 4. The results indicate that FTCS with the Manhattan distance obtains the best performance. Moreover, the gap between the two curves of FTCS is narrower than the gap between the two curves of TDCS, which shows that FTCS is more robust than TDCS to the choice of distance measure.
Among the five algorithms, LDA and CID operate on vectorized face images. Overall, their performance is poorer than that of the other three algorithms, which operate on tensorized face images. This shows that the facial spatial structure information is important for face recognition. Notably, when the false accept rate is less than 0.03, 2D-LDA outperforms TDCS even though TDCS uses the color information. In that case, the color information fails to help face recognition, because TDCS transforms the color space information with LDA, which is not an optimal transformation for color space information in comparison with ICA. FTCS, by contrast, uses ICA on the color space information and, as a result, obtains the best performance. Table 2 lists the verification rates of the five methods at a false accept rate of 0.1. For both the Manhattan distance and the Euclidean distance, FTCS achieves the best verification rate among the five methods. With the Manhattan distance, 2D-LDA without color information outperforms TDCS with color information, whereas FTCS outperforms 2D-LDA. This again shows that ICA is better suited than LDA for transforming the color space information.
Experiments and results on the LFW face database
As in the AR experiments, we trained CID, TDCS and FTCS and obtained three color space transformation matrices; the resulting color components are illustrated in Fig. 6. These three matrices differ from those in Eq. (25), Eq. (26) and Eq. (27) because of the different training sets. Using these three matrices, we obtained the three color components D_1, D_2, D_3 of CID, the three color components T_1, T_2, T_3 of TDCS, and the three color components F_1, F_2, F_3 of FTCS.

Discussion
Tensor subspace transformation has recently become a highly discussed topic, because many real-world objects can be represented by tensors. For different objects, the semantic meanings of the tensor modes differ; even for the same object, each mode of the tensor may express a different semantic meaning. To the best of our knowledge, there is no existing tensor subspace transformation algorithm that implements different transformation strategies on different modes of tensors according to their semantic meanings. In this paper, we propose the fusion tensor subspace transformation framework, a novel idea in which different transformation strategies can be applied on different modes of a tensor. Under the framework, we propose FTCS for face recognition. The experimental results show that the performance of the proposed algorithm is better than that of existing tensor subspace transformation algorithms. FTCS is only one example of the fusion tensor subspace transformation framework; under the framework, many algorithms can be developed for action recognition, micro-expression recognition, EEG recognition, and so on.