Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing

In the current era of big data, the amount of data available is continuously increasing. Both the number and types of samples, or features, are on the rise. The mixing of distinct features often makes interpretation more difficult. However, separate analysis of individual types requires subsequent integration. A tensor is a useful framework to deal with distinct types of features in an integrated manner without mixing them. On the other hand, tensor data is not easy to obtain since it requires the measurements of huge numbers of combinations of distinct features; if there are m kinds of features, each of which has N dimensions, the number of measurements needed are as many as Nm, which is often too large to measure. In this paper, I propose a new method where a tensor is generated from individual features without combinatorial measurements, and the generated tensor was decomposed back to matrices, by which unsupervised feature extraction was performed. In order to demonstrate the usefulness of the proposed strategy, it was applied to synthetic data, as well as three omics datasets. It outperformed other matrix-based methodologies.


Introduction
In the current era of big data, it is often that massive datasets are obtained, including samples with many features. For example, a video dataset can be regarded as time points (samples) vs pixels (features). Audio files consist of time points (samples) vs amplitude (features), and sets of DNA sequences consist of individuals (samples) vs nuclear acid sequences (features). All of these are provided in the form of a matrix, whose rows and columns are features and samples, respectively (of course, rows and columns are exchangeable). Although processing massive datasets is itself problematic, integrating distinct types of datasets is even more difficult. This problem is often annotated as multi-view data processing. For example, an audio visual file can be regarded as time points (samples) vs pixels and amplitudes(both features). In this paper, I consider two types of specific multi-view data processing: one is sharing samples (hereinafter called Case I) and another is sharing features (hereinafter, called Case II). It is formally possible to deal with these two cases as unified; multiple (m > 0) views of data, i.e., X (k) , k = 1, . . . m, each of which is N k features times M samples shared with multiple views (Case I), a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 can be regarded as a (∑ k N k ) × M matrix where X T is the transposed matrix of X, while, if X (k) , k = 1, . . . m, are M k samples times N features shared with multiple views (Case II), can be regarded as a N × (∑ k M k ) matrix ðX ð1Þ ; X ð2Þ ; Á Á Á ; X ðmÞ Þ: ð2Þ For both cases, however, we are not sure what will happen by simply merging distinct features as one matrix.
In order to address this problem, a variety of multi-view data processes have been proposed [1,2]. Independent of the strategy to integrate multi-view datasets, there must be some weights attributed to each view. Since there are no a priori criteria to optimize these weights, some kind of artificial criteria are required. For example, if samples are classified, weights can be optimized so as to discriminate samples coincidentally from classes. Alternatively, if feature extraction is a task, weights can be optimized so as to generate the "best" features regardless of which features are considered good.
The reason weights are required for individual views is that we do not know whether the same weights are acceptable when simply creating new variables by merging or linearly combining them. Suppose x ðkÞ ij are the observed values attributed to the ith feature of the jth sample in the kth view. Generating a merged matrix is to have a matrix where x ðkÞ ij is placed at the (i+∑ k 0 < k−1 N k 0 )th row and the jth column, as introduced in Eq (1) (Case I). Alternatively, a merged matrix can be generated where x ðkÞ ij is placed at the ith row and the (j + ∑ k 0 < k−1 M k 0 )th column, as introduced in Eq (2) (Case II). This is not necessarily as simple as it may appear. For example, in Case I, if N k varies drastically from view to view, the results may be dominated by the views with the maximum number of features. However, it is not clear if views with more features are more important. Alternatively, if the new feature x 0 ij ¼ P i;ðkÞ C ðkÞ i x ðkÞ ij is generated with the linear combination where C ðkÞ i s are coefficient, there are similar problems. If C ðkÞ i s do not vary dependent upon (k), views with more features may dominate the outcome. Reverting row and column, I will discuss Case II as well. In order to compensate for this discrepancy, each view must be weighted based on criteria which are not naturally unique. No previously proposed strategies were free from this problem.

Converting multi-view matrices into a tensor with multiplication
If we generate a new feature with neither summation nor merging, the product for Case I -x i 1 ;i 2 ;:::;i m ;j ¼ Q m k¼1 x ðkÞ i k ;j -can be regarded as an (m+1) mode tensor. As each newly generated feature is composed of one feature from individual views, no weight is needed. Similarly, in Case II, x j 1 ;j 2 ;:::;j m ;i ¼ Q m k¼1 x ðkÞ i;j k , can be regarded as an (m+1) mode tensor. These tensors are hereinafter called Type 1.
Alternatively, instead of simply multiplying matrix components with the shared columns or rows, they can be summed up as follows:x i 1 ;i 2 ;:::;i m ¼ P j Q m k¼1 x ðkÞ i k ;j (Case I) andx j 1 ;j 2 ;:::;j m ¼ P i Q m k¼1 x ðkÞ i;j k (Case II). These can be regarded as m-mode tensors and are hereinafter called Type II. All variables associated with Type II tensors are written with tildes.
These newly generated m-mode (type II) or (m+1)-mode (type I) tensors can be processed using any kind of tensor manipulation. For example, for a reduced number of features whose combination can express tensors, TD can be used to gain such features.
In the following subsection, I consider four combinations of types and cases, i.e., type I or II tensors for Case I or II multi-view data, case by case.

Definition and terminology of TD
Since TD is not a popular methodology and the usage of TD for FE is rare, I will briefly introduce TD in this subsection.
TD is the expansion of tensor x n 1 ,n 2 ,. . .,n m , n k = 1, . . ., N k , 1 k m in the form x n 1 ;n 2 ;...;n m ¼ where x n k ,ℓ k , 1 k m, are orthogonal matrices. Since x n 1 ,n2,. . .,n m is as large as G(n 1 , n 2 , . . ., n m ), this formula is clearly overcomplete and does not give unique expansion. In this study, in order to decide G(n 1 , n 2 , . . ., n m ), x n k ,ℓ k , 1 k m uniquely, I employ the higher order singular value decomposition (HOSVD) algorithm [23], which has successfully used to analyse microarrays [24] previously. G(n 1 , n 2 , . . ., n m ) is a core matrix. x n k ,ℓ k , 1 k m, are singular value matrices and their column vectors are singular value vectors. G(n 1 , n 2 , . . ., n m ), having larger absolute values, has more contribution to x n 1 ,n 2 ,. . .,n m . Since the combination of x n k ,ℓ k , 1 k m, associated with G(n 1 , n 2 , . . ., n m ) to which larger absolute values were attributed contributes more collectively to x n 1 ,n 2 ,. . .,n m , they are more likely to be associated with one another. For type I tensors this expression is straightforward. (m+1) modes correspond to m+1 components, i 1 , i 2 , . . ., i m , j (Case I) or j 1 , j 2 , . . ., j m , i (Case II), respectively. On the other hand, for type II tensors, m modes correspond to m components, i 1 , i 2 , . . ., i m (Case I) or j 1 , j 2 , . . ., j m (Case II), respectively. Thus, singular value vectors for jth sample (Case I) or ith feature (Case II) are missing. These can be computed viax ðkÞ

The relation to HO GSVD
Higher order generalized singular value decomposition [25] (HO GSVD) is the method that corresponds to singular value vectors when TD is applied to type I tensors. As HO GSVD converts X (k) = U (k) ΛV T where U (k) , Λ and V are the N k × M left singular value matrix, M × M eigenvalue matrix, and the M × M right singular value matrix, U (k) are regarded as feature singular value matrices and V is regarded as a unique (common) sample singular value matrix in the present implementation.

Synthetic dataset
The synthetic dataset used for demonstrating the usefulness of TD based unsupervised FE is defined as: mRNA and miRNA expression profile mRNA and microRNA(miRNA) expression profiles of multi-omics data were downloaded from gene expression omnibus (GEO) using GEO ID GSE28884. At first, GSE28884_RAW.tar was downloaded and expanded. For mRNA, 161 files whose names ended by the string "c.txt. gz" were used. Each file was loaded into R by read.csv command and the second column named "M" was employed as mRNA expression values. Probes not associated with Human Genome Organisation (HUGO) gene names were discarded and 13393 probes were remained. For miRNA, 161 files whose names ended by the string "geo.txt.gz" and the corresponding samples of mRNA expression which were measured were used. Each file was loaded into R by read.csv command and the second column ("Count") was summed using the same third column ("Annotation") values. Sum totals of less than 10 were discarded. As a result, 755 features remained. Finally, the miRNA expression profile matrix is x miRNA i 2 ;j ; 1 i 2 755; 1 j 161, and the mRNA expression profile matrix is x mRNA i 1 ;j ; 1 i 1 13393; 1 j 161. mRNA expressions of epidermal growth factor (EGF) treated breast cancers were downloaded from GEO using GEO ID GSE84096. The file named GSE84096_series_matrix.txt.gz included in "Series Matrix File(s)" was downloaded. Gene expression was divided into 14 control samples and 14 EGF treated samples named x control i;t 1 and x EGF i;t 1 , respectively. mRNA expressions of vaccination experiments were downloaded from GEO using GEO ID GSE18323. Files named GSE18323-GPL570_series_matrix.txt.gz and GSE18323-GPL571_ser-ies_matrix.txt.gz included in "Series Matrix File(s)" were downloaded. As these included two distinct platforms, 22277 commonly included probes were used. Fifty eight samples annotated as "Protected group" (P) at time points T1 to T5, 52 samples annotated as "Delay group" (D) at time points T1 to T5, and 72 samples annotated as "Non-protected group" (NP) at time points T1 to T5, were used. They were named as x Pl i;t 1 , x D i;t 2 and x NP i;t 3 respectively. All expression profiles were standardized as ∑ i x ij = 0, and

PCA-based unsupervised FE
PCA. In contrast to the usual use of PCA, where samples are embedded, the genes were embedded in this implementation. Suppose x ij s satisfy P i x ij ¼ 0; P i x 2 ij =N ¼ 1 and X is a matrix whose elements are x ij . The gram matrix G is defined as G XX T . Eigenvectors u k ¼ ðu k1 and . . . ; u kN Þ T s (1 k min(M, N)) are then obtained as Gu k ¼ l k u k , where u ki is the kth PC score attributed to gene i and λ k s are the eigenvalues ordered as λ k ! λ k+1 . The kth PC loadings attributed to the jth sample v kj are defined as PCA-based unsupervised FE applied to mRNA/miRNA expression. First, the five initial PC loadings v ℓ 1 , j s (for mRNA) and v 0 ' 2 ;j s (for miRNA), 1 ℓ 1 , ℓ 2 5, were confirmed to have significant sample dependence (P-values < 0.05) with categorical regression, where C 0 ' k and C 1 ' k ;S (k = 1, 2) are regression coefficients. δ Sj takes 1 when jth sample belongs to category S, and isotherwise 0.
Then, assuming that PC scores u ℓ 1 ,i1 s (for mRNA) and u 0 ' 2 ;i 2 s (for miRNA) are normally distributed, the P-values were attributed to the i 1 th mRNA and i 2 th miRNA using a χ squared distribution as where s u ' 2 and s u 0 is the probability that the argument is larger than x under the assumption that the arguments obey a χ squared distribution. The P-values were further adjusted by the Benjamini-Hochberg (BH) criterion [26], and those genes associated with adjusted P-values less than 0.01 were selected as the genes associated with the difference between multiple classes.
PCA-based unsupervised FE applied to vaccination experiments. Although similar to the previous section, some differences are: (i) Only the second PC scores attributed to mRNAs were used for FE. (ii) Instead of mRNA and miRNA, patient category groups, P, D, and NP were used. (iii) mRNAs commonly included in three independent FEs performed considering P, D, and NP, were considered.

TD-based unsupervised FE
For type I tensor generated from multi-omics dataset, in order to identify miRNAs and mRNAs associated with identified sample singular value vectors, it was assumed that x mRNA follow multiple normal distributions, and P-values were attributed to the i 1 th mRNA and the i 2 th miRNA using χ 2 distribution.
where σ ℓ 1 (σ ℓ 2 ) are standard deviations of x mRNA ' 1 ;i 1 ðx miRNA ' 2 ;i 2 Þ. P i k s were adjusted by using the BH criterion, and mRNAs and miRNAs associated with the adjusted P-value lower than 0.01 for mRNA and 0.05 for miRNA were selected as those associated with identified sample singular value vectors when TD was applied to type I tensor.
When TD was applied to type II tensor, the computations were similar, excluding that adjusted P-value lower than 0.01 and 1 ℓ 2 5 were used for miRNA, where x mRNA ' 1 ;i 1 and x miRNA ' 2 ;i 2 were replaced withx mRNA ' 1 ;i 1 andx miRNA ' 2 ;i 2 , respectively. For EGF treated cell lines and vaccination experiments, similar procedures were repeated by replacing gene singular value vectors with x ℓ 3 = 2,i (EGF, type I) orx control ;i , (vaccination, type II), respectively. Where HO GSVD was applied to multi-omics datasets, the feature singular value matrices were replaced with the first five column vectors of U (k) (k = 1 for mRNA and k = 2 for miRNA).
Adjusted P values used as thresholds are always 0.01 for EGF treated cell lines, vaccination, and HO GSVD.
Conversion prob ID to HUGO gene name/ensembl gene ID/GENBANK accession ID Coincidences between prob ID (mRNA), HUGO gene names, Ensembl gene IDs were downloaded from GEO using GEO ID GPL3676 for multi-omics datasets, and GPL571/570 for vaccination experiments. GENBANK accession ID attributed to probes identified in the EGF treatment were extracted from GPL16686.

Enrichment analysis of g:profiler
Ensembl gene IDs were uploaded to g:profiler [27]. 13393 Ensembl gene IDs were used as background.
Enrichment analysis of genes identified as outliers using each of the first five mRNA singular value vectors obtained by applying TD based unsupervised FE to type I tensor generated from multi-omics datasets Five distinct P-values were attributed to the i 1 th mRNA using χ 2 distribution: P-values were adjusted using the BH criterion and mRNAs associated with adjusted P-values less than 0.01 were identified as outliers (see S2 Table). S2 Table was uploaded to g:Cocoa in g: profiler with "Gene Ontology/ Biological Process" specified as the targeted ontology.

Enrichment analysis of MSigDB
HUGO gene IDs or GENBANK accession IDs associated with identified mRNAs were uploaded to http://software.broadinstitute.org/gsea/msigdb/annotate.jsp (registration and login are needed). "CGP: chemical and genetic perturbations" was selected for multi-omics data and EGF treated cell lines, while "C7: immunologic signatures" was selected for vaccination experiments.

Correlations in Fig 2(c) and 2(g)
The correlations were computed between two vectors of the length T control +T EGF , (Fig 2(g)), where T control and T EGF are total number of samples in each treatment, respectively. Adjusted P-values attributed to the correlation coefficients were computed via the fdrtool [28] function in the fdrtool package in R [29].

Scaling and shifting prior to plotting Fig 2(d) and 2(h)
As each individual gene expression has its own base line and amplitude, they must be scaled and shifted before being overdrawn. To this end, the linear regression analysis  was employed (Fig 2(c)) where a i and b i are regression coefficients commonly used for control and EGF treated samples. For Fig 2( ;t 2 , respectively. Then, fitted values are used for plots. P-values that exhibit distinction between control and EGF treated sample at time point t were computed by two-sided t test between fa i x control i;t 1 ¼t þ b i j 1 i Ng and fa i x EGF i;t 2 ¼t þ b i j 1 i Ng within N = 558 (Fig 2(d)) or N = 398 (Fig 2(h)) selected mRNA probes.

Correlations in Fig 3(c)
Similar to Fig 2, the correlations were computed between two vectors of length T P +T D +T NP , where T P = 58, T D = 52, and T NP = 72 are the total number of samples in each patient category, respectively. Adjusted P-values were computed via the fdrtool function in the fdrtool package in R [29]. The linear regression analysis was employed, where a i and b i are regression coefficients commonly used for three patient categories (these values differ from those used in Fig 2). Fitted values are used for plots. P-values that exhibit distinction among three patient groups at time point t = 1, 2, 3, 4, 5 days were computed by categorical regression for 104 commonly selected mRNA probes in the three categories. C 0 t and C 1 tS 0 are regression coefficients fitted for fx S i;t j 1 i 104; S 2 ðP; D; NPÞg with fixed t.

Statistical analysis
All statistical analyses were performed in R [29]. HOSVD was performed using the HOSVD function in the rTensor package. PCA was performed using the prcomp function in R. SAM was performed using SAM function in siggenes package. limma was performed in limma function in the limma package. Adjusted P-values were computed by p.adjust function with "BH" options. P-values by χ 2 distribution was computed by pchisq function in R. Categorical regression was performed using the lm function in R. RF was performed using randomForest function in randomForest package. KCCA was performed by KCCA function in the kernlab package.

Results
A Work flow chart and list of the variables introduced are in S1 File.

Synthetic dataset
In order to demonstrate the efficacy of our strategy, I applied TD based unsupervised FE to synthetic data. In the following interpretation, I assumed two views of Case I for synthetic dataset, however interpreting it as Case II is straightforward, thus, I do not consider Case II specifically. The method applied to the synthetic dataset, TD based unsupervised FE, is the extension from the recently proposed PCA based unsupervised FE, which has been successfully applied to various bioinformatics problems [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22].
First, two matrices are generated, each of which is composed of N features times M samples. They are notated as X (k) , and k = 1, 2, respectively, whose components are denoted as x ðkÞ i k ;j , and k = 1, 2, respectively. The first N 0 row vectors (features), x ðkÞ i k ;j , and 1 i k N 0 , are the noise added linear combination (Fig 4(d) and 4(e)) of constant, linear, and half period sinusoidal function (Fig 4(a) and 4(c)). However, since the coefficients of linear combinations were selected such that the correlation between X (1) and X (2) is negligible, identifying a correlation between two matrix row vectors is usually impossible (Fig 4(f)). Remaining row vectors x ðkÞ i k ;j and N 0 < i k N are simply random number 2[0, 1]. The tasks are (i) identify N 0 ordered features, and (ii) identify latent correspondence between two views.
The three mode tensor (type I) shows the first three sample mode singular value vectors, x ℓ 3 ,j , ℓ 3 = 1, 2, 3, obtained with HOSVD, It is obvious that Thus, TD based unsupervised FE applied to type I tensors generated from matrices' products can not only decompose linear combinations, but can also identify the limited number of features, 1 i 1 , i 2 , N 0 , that contribute to correlations between two matrices, X (k) , k = 1, 2.
Although I could successfully demonstrate that my strategy works well, there is one drawback; it is the computationally extensive method, since its memory as well as computational time are proportional to MN 2 . In order to reduce computational resources as much as possible, I summed x ð1Þ i 1 ;j x ð2Þ i 2 ;j and generated the m(= 2) mode tensor (type II), Two sample singular value vectors can be computed as Since I employed the two views problem, although I occasionally got the two mode tensor, i.e.,  Fig 4(d), respectively. Thus, TD applied to type II tensors also successfully identified latent correlations between X (k) , k = 1, 2 (Fig 6(h) and 6(i)). However, it could not depict orthogonal base functions (Fig 4(a)-4(c)) that can be detected by TD applied to type I tensors (Fig 4(g)-4(i)). Additionally, N 0 features were successfully identified as outliers (Fig 5(c) and 5(d)). Thus, at the expense of recognition of orthogonal base functions, TD applied to type II tensors successfully reduced the computational resources needed by 1/M and fulfilled tasks (i) and (ii) as defined above.
In order to see if these strategies are also useful in practice, integrated analyses of multiomics datasets were performed with this strategy and are described in the next subsection.

Multi-omics dataset
In the previous subsection, I demonstrated that applying TD based unsupervised FE to tensors generated from matrices' products could determine latent structures behind pairs of matrices. However, this may only be feasible using synthetic data, as described in the previous subsection. Thus, in order to see if it also works in the situation not prepared specifically fitted to it, I need to show that it works in real situation.
The analysed dataset is composed of two omics profiles. These are mRNA and miRNA profiles which were measured for multi-class breast cancer samples including normal breast tissues [30]. As the samples are shared, the multi-omics data corresponds to Case I data. TD based unsupervised FE was applied to the dataset in order to identify disease critical genes and latent relations between miRNA and mRNA.
At first, TD was applied to the type I tensor generated from the mRNA and miRNA profiles as follows:  are expressions of the i 1 th mRNA and the i 2 th miRNA from the jth sample. In order to determine whether TD can identify disease critical features, categorical regression analysis was applied to sample singular value vector x ℓ 3 ,j , in order to identify coincidences with defined sample classes. If the obtained sample singular value vectors are coincident with sample class labels, it is evidence that TD can process omics profiles properly, as this approach does not employ class labels explicitly. Fig 1 shows the first five sample singular  value vectors, x ℓ 3 ,j , 1 ℓ 3 5, that show significant sample class dependence. Thus, TD could successfully generate disease associated features. This is not a trivial outcome, as sample classification was not used.
Next, I attempted to extract features using the mRNA and miRNA singular value vectors To accomplish this, it was necessary to first identify mRNA and miRNA singular value vectors x mRNA ' 1 ;i 1 and x miRNA ' 2 ;i 2 , associated with sample singular value vectors, x ℓ 3 ,j , 1 ℓ 3 5 identified above. This can be done by investigating G(ℓ 1 , ℓ 2 , ℓ 3 ) since the combinations (ℓ 1 , ℓ 2 , ℓ 3 ) associated with larger absolute G values are regarded as more coincident with one another. Table 1 shows the ranking of G for 1 ℓ 1 , ℓ 2 , ℓ 3 5 (ranked from 1 to 10) based upon their absolute values. The first five sample singular value vectors x ℓ 3 ,j , 1 ℓ 3 5, are associated with the first two miRNA singular value vectors x miRNA ' 2 ;i 2 ; ' 2 ¼ 1; 2 as well as the first five mRNA singular value vectors, x mRNA ' 1 ;i 1 ; 1 ' 1 5, as only these combinations appear within top ten ranked G(ℓ 1 , ℓ 2 , ℓ 3 )s that represent the amount of coincidence among mRNA and miRNA and samples.
In order to evaluate obtained mRNAs associated with the 427 selected mRNA probes and 7 miRNAs biologically, the mRNAs were uploaded for enrichment analysis using MSigDB [31] and the miRNAs to DIANA-mirpath [32]. The top 10 enriched gene sets in MSigDB are chiefly related to breast cancer (eight out of ten, see Table 2), while the top ranked pathways identified DIANA-mirpath are "MicroRNAs in cancer" (Table 3). Thus, I concluded that our strategy is successful despite fully unsupervised nature.
Next, type II tensor generated from mRNA and miRNA profiles was considered. Applying TD to type II tensor,x  TD based unsupervised FE applied to multi-view data sets Table 2. Overlap between mRNAs identified (S1 Table) and MSigDB. Top 10 ranked gene sets are presented. Upper rows: type I, lower rows: type II tensors are considered in each gene set name, respectively. The word "BREAST_CANCER/_DUCTAL_CARCINOMA" was presented in bold face in order to emphasize the overlap with breast cancer related gene sets. K: The number of genes in each gene set, k: The number of genes overlapped. gives us two sample singular value vectors The first five are shown separately in (Fig 1(b) and 1(c)). It is obvious that all of the ten sample singular value vectors are significantly coincident with sample classifications.
To determine whether TD applied to type II tensors could depict latent correlation between miRNA and mRNA, hierarchical clustering was performed betweenx mRNA Fig 7(a)). Here,x mRNA ' 3 ;j ; 3 ' 3 10 are always paired with one ofx miRNA ' 3 ;j ; 3 ' 3 10 (Fig 7(a)). Thus, TD applied to type II tensor could successfully identify latent correlation between two views, i.e., mRNA and miRNA.
In conclusion, TD applied to type II tensor works well.

Temporally differentially expressed genes
Although the application of TD based unsupervised FE to multi-omics data was successful, one may wonder if the application to muti-omics data is reasonable, as synthetic datasets were  more difficult to deal with due to a lack of class labelling. Have I intentionally tried the easiest case? In order to address this possible query, I considered two examples of a more difficult problem: identification of temporally differentially expressed genes. The task is as follows: Given more than one temporal gene expression, identify genes expressed differently among multiple expressions at specific time points. For example, expression of a particular gene obeys f(t) as a function of t under one set of conditions, while it obeys f(t) + C with a constant C under another. Conventionally, this is differentially expressed between the two conditions; however this kind of differential expression is often not of interest when temporal gene expression is considered. One distinction of note would be, for example, differences between expressions at certain time points and not at others (a distinct time dependency between two conditions). However, there are no de facto standard methods that automatically achieve this. In order to address it, TD based unsupervised FE was applied to this problem. The first example of this application is the comparison of non-small lung cancer cell (NSCLC) line H1975, with and without EGF treatment [33]. Gene expression matrices were divided into two groups (with and without EGF treatment). The type I tensor x t 1 ,t2,i is then generated as i;t 2 are ith gene expressions of cell lines with and without EGF treatment, at time points t 1 and t 2 after the EGF or control treatments. As they share features (though not samples) in contrast to the previous application which was Case I data, this example uses Case II data. As the samples in this example are divided into two groups based on EGF treatment, it is not fully unsupervised. It is, however, unsupervised in the sense that the type of temporal difference sought is not defined. The tensor was expanded by HOSVD as shows the x control ' 1 ;t 1 and x EGF ' 2 ;t 2 for ℓ 1 , ℓ 2 = 1, 2, respectively. Obviously, those for ℓ 1 = ℓ 2 = 1 do not have any time dependence while those for ℓ 1 = ℓ 2 = 2 do. Some temporal difference was observed in the latter, however its significance is unclear. In order to determine said significance, genes identified as outliers had to be selected. This selection process began with identifying the gene singular value vectors associated with x control ' 1 ¼2;t 1 and x EGF ' 2 ¼2;t 2 . Table 1 shows the top ranked G(ℓ 1 , ℓ 2 , ℓ 3 )s with larger absolute values. It is obvious that x ℓ 3 = 2,i is associated with x control ' 1 ¼2;t 1 and x EGF ' 2 ¼2;t 2 , as the absolute values of G(2, 2, 1) and G(2, 1, 2) are the second and the third largest in the table. Next, 558 mRNA probes (associated mRNAs are in S1 Table) identified as outliers based on x ℓ 3 = 2,i were selected . Fig 2(c) shows the histogram of correlation coefficients between the vectors generated by connecting x control ' 1 ;t 1 and x EGF ' 2 ;t 2 vs the 558 selected mRNA probes. These are highly correlated (adjusted P-values are less than 0.01). It is remarkable since G(2, 2, 1) and G(2, 1, 2) are smaller than one tenth of G(1, 1, 1) whose absolute value is the largest. This suggests that the amount of contributions of G(2, 2, 1) and G(2, 1, 2) is too little to govern individual gene expression. The high correlation despite this fact speaks to the soundness of our methodology. The next step was to determine whether the 558 mRNA probes selected exhibit temporal differences. The 558 mRNA probes selected are scaled, shifted, and over drawn as boxplot (Fig 2(d)). Though it is difficult to observe, P-values were computed by two-sided t tests between expressions, with and without EGF treatments at time points, 0.5, 1, 2, 4, 6 and 48 hours (Fig 2). These values are significant only at limited time points with and without EFG treatment. These were merely temporally differently expressed genes. Thus, TD based unsupervised FE applied to type I tensors is effective.
To determine the biological reliability of the selected genes, genes associated with selected 558 mRNA probes were uploaded to MSigDB. The second significant gene set was found to be KOBAYASHI_EGFR_SIGNALING_24HR_DN (the adjusted P value is 1.37 × 10 −96 ), which is reasonable as the genes sought were expressed differently with and without EGF treatments.
The next task was to determine whether type II tensors produce similar results. Type II tensorx t 1 ;t 2 , which in this example is the matrix since two views are considered, is defined as a summation of a type I tensor over gene index, that is expanded by HOSVD, which is a simple SVD in the present casẽ and 2(f) shows thex control ' 1 ;t 1 andx EGF ' 2 ;t 2 for ℓ 1 , ℓ 2 = 1, 2. Obviously,x control ' 1 ¼1;t 1 andx EGF ' 2 ¼1;t 2 do not have any time dependence whilex control ' 1 ¼2;t 1 andx EGF ' 2 ¼2;t 2 do, as in the case where TD was applied to a type I tensor. Some temporal difference was observed betweenx control ' 1 ¼2;t 1 andx EGF ' 2 ¼2;t 2 . Again, the significance of this temporal difference was unclear. Genes identified as outliers had to be selected to determine the significance of this temporal difference. While there are two gene singular value vectors,x control 485 and 471 mRNA probes identified as outliers were selected usingx control ' 3 ¼' 1 ;i andx EGF ' 3 ¼' 2 ;i , respectively, among which 398 mRNA probes (associated mRNAs are shown in S1 Table) were commonly selected. Fig 2(g) shows the histogram of correlation coefficients between the vector generated by connectingx control ' 1 ;t 1 andx EGF ' 2 ;t 2 vs 398 commonly selected mRNA probes. These were highly correlated (adjusted P-values are less than 0.01). This is noteworhty as the smallest contribution from the second singular value vector was 1 × 10 −5 . This suggests that the amount of contributions of the second singular value vector were too small to govern individual gene expressions. As before, the high correlation despite this fact speaks to the soundness of our methodology.
The next task was to determine whether the genes selected exhibit temporal difference. The genes selected are scaled, shifted, and over drawn as boxplot (Fig 2(h)). Though it is difficult to observe, P-values computed by a two-sided t test between expression at time points, 0.5, 1, 2, 4, 6 and 48 hours (Fig 2) are significant only at limited time points with and without EFG treatment. Although this is of less significance than TD applied to a type I tensor, these are merely temporally differently expressed genes. Thus, TD based unsupervised FE applied to type II tensor is effective.
In order to see the biological reliability of selected genes, the mRNAs associated with commonly selected 398 mRNA probes were uploaded to MSigDB. The second most significant gene set was determined to be KOBAYASHI_EGFR_SIGNALING_24HR_DN (the adjusted P value is 1.37 × 10 −128 ), which, again, was reasonable as the genes sought expressed differently with and without EGF treatments. Although the number of genes selected was less than that by TD applied to type a I tensor, since the P-value is smaller, the significance was greater than that of TD applied to a type I tensor.
As a whole, both TD applied to type I type II tensors is effective. The next temporally differentially expressed gene detection example is vaccine infection experiment [34]. Patients were divided into three groups, P, D and NP. As sample classification was used, it is also not fully unsupervised. It is, however, unsupervised in the sense that no temporal functional forms are assumed. x P i;t 1 , x D i;t 2 , and x NP i;t 3 are the ith gene expressions of protected, delayed, and non-protected, patients at time points t 1 , t 2 and t 3 after vaccine treatments, respectively. Type I tensor x t 1 ,t 2 ,t 3 ,i , was defined as x t 1 ,t 2 ,t 3 ,i would be expanded by HOSVD as however, the total memory required to store all of this expansion is too large to be prepared. Fortunately, as can be seen in the application to the first temporally differentially expressed gene identification, TD applied to a type II tensor that requires much smaller (1/N) memory can be as effective as TD applied to type I tensor for temporally differentially expressed gene identification. Thus, for this example, I employ only type II tensorx t 1 ;t 2 ;t 3 , defined as that can be expanded asx by HOSVD .  Fig 3(a) and 3(b) shows thex P ' 1 ;t 1 ,x D ' 2 ;t 2 andx NP ' 3 ;t 3 for ℓ 1 ,ℓ 2 = 1, 2. Obviously,x P ' 1 ¼1;t 1 ,x D ' 2 ¼1;t 2 andx NP ' 3 ¼1;t 3 do not have any time dependence while those for ℓ 1 = ℓ 2 = ℓ 3 = 2 do, as in the EGF treated cell line cases. Thoughx P ' 1 ¼2;t 1 ,x D ' 2 ¼2;t 2 andx NP ' 3 ¼2;t 3 also seem to have some temporal difference, its significance is again unclear. To determine the significance of this difference, genes identified as outliers had to be selected. There are three gene singular value vectors: Using these three gene singular value vectors with ℓ 1 = ℓ 2 = ℓ 3 = 2, 104 mRNA probes identified commonly as outliers were selected (S1 Table). Fig 3(c) shows the histogram of correlation coefficients between the vector generated by connectingx P ' 1 ;t 1 ,x D ' 2 ;t 2 andx D ' 3 ;t 3 vs selected 104 mRNA probes. These are highly correlated (adjusted P-values are less than 0.01). This is noteworthy asGð1; 2; 2Þ;Gð2; 1; 2Þ;Gð2; 2; 1Þ, being the three largest core tensors associated with these three gene singular value vectors, had the contributions as small as 1 × 10 −3 ofGð1; 1; 1Þ, the largest one. This suggests that the amount of contributions of the second gene singular value vector is too small to govern individual gene expression. The high correlation despite this fact suggests the soundness of our methodology.
The next task was to determine whether the mRNA probes selected exhibit temporal difference. The mRNA probes selected are scaled, shifted, and overdrawn as boxplot (Fig 3(d)). Though it is difficult to observe, P-values computed by categorical regression assuming three classes (P, D, NP) at time points, 1, 2, 3, 4 and 5 days (Fig 3) are significant at all time points between the three classes. This was clearly not due to simple baseline shifts, as can be seen in Fig 3(d). This is due to as temporally differently expressed genes. Thus, TD based unsupervised FE applied to type II tensor is also effective.
In order to determine the biological reliability of selected genes, the mRNAs associated with the selected 104 mRNA probes were uploaded to MSigDB. The top six significant gene sets were determined to be: GSE13485_X_VS_Y_YF17D_VACCINE_PBMC_DN, In conclusion, TD based unsupervised FE is also effective also for the identification of temporally differentially expressed genes.

Discussion
In the following section I discuss the strategy for applying TD based unsupervised FE to tensors built from matrix products, methodological points of view, and outcomes obtained by applying this strategy to multi-omics datasets from the biological point of views.

Comparisons with various methods applicable to synthetic data
The synthetic datasets that presented in the above sections are very difficult to analyse using standard supervised statistical analysis methods. In the supervised methodology, all background knowledge of given datasets is required, e.g., classification labels or assumed functional forms (for example, monotonic increase/decrease or periodicity). Alternatively, TD applied to type I tensors is Eq (3) and applied to type II tensor is Eq (4), which can be performed in the synthetic dataset prepared without any information in advance. To my knowledge, there are no applicable supervised methodologies for the synthetic datasets presented above. Thus, in the following I discuss only unsupervised methods.
As the first N 0 features are derived from the common bases shown in Fig 4(a)-4(c), it is possible to detect them by computing correlations between them. However, as can be seen in Fig 4(d)-4(f), the correlations are highly non-linear, and it is therefore impossible to detect them (in actuality, the Pearson correlation between two variables shown is Fig 4(f) is as small as -0.01).
One may wonder if correlation analysis considering linear combinations, e.g., canonical correlation analysis (CCA), can depict latent correlation. However, in CCA, as M dimensional vectors as numerous as N must be compared, it is an overcomplete problem when M < N (and this is the present case). Thus, canonical correlation coefficients generated from linear combinations are always 1.0. This means that there is no way to detect latent correlation between N 0 (<N) features. Similar problems stand for nonlinear correlation analysis like kernel CCA (kCCA). KCCA was applied to matrices X (k) , k = 1, 2 in the above synthetic examples. The ten components (this is the kCCA default) generated are classified into two types, each of which are distinct from the first N 0 features and remaining (Fig 8(a)-8(d)). Thus, the correlation is apparently successful, however, both results in the correlation coefficients were as large as 1.0 meaning that kCCA evaluated the correlation of two views between the first N 0 features and that between remaining features the latter composed of simple random numbers as demonstrated in the above. Thus, kCCA cannot distinguish between latent correlation between the first N 0 features and random numbers, and cannot be successfully applied.
Finally, PCA based unsupervised FE, which was recently proposed and successfully applied to various integrated analyses of multi-omics datasets, was again applied here. As PCA is equivalent to singular value decomposition (SVD), ;j , respectively. u ℓ 1 ,i1 ,ℓ 1 = 1, 2 is Fig 8(g) and u ℓ 2 ,i 2 , ℓ 2 = 1, 2 is Fig 8(h). λ ℓ 2 and λ ℓ 3 are the eigen values computed with PCA. Thus Compared with Eq (3), if and PCA is equivalent to TD applied to a a type I tensor. Alternatively, compared with Eq (4), ifG then PCA is equivalent to TD applied to a type II tensor. However, HOSVD does not always produce the solution satisfying the above. For example, since Eq (4) computed with HOSVD a standard SVD,Gð' 1 ; ' 2 Þ is diagonal, while Eq (5) cannot be diagonal since v ℓ 1 ,j is not orthogonal to v 0 ' 2 ;j . Although PCA based unsupervised FE successfully identified the first N 0 features associated with latent correlations as outliers (Fig 8(g) and 8(h)), PC loadings attributed to samples (Fig 8(e) and 8(f)) are almost identical to Fig 4(d) and 4(e), which are not correlated (Fig 4(f)). Therefore PCA based unsupervised FE cannot identify latent correlation. In the previous applications of PCA based unsupervised FE aimed at integrated analysis of multi-omics data [14,22], it was critical to identify pairs of highly correlated PC loadings, otherwise it was not possible to identify which PCs should be used for FE. In this context, PCA based unsupervised FE failed to detect latent correlations among multi-views.
While PCA was unsuccessful, HOSVD was applied to type I tensors in an attempt to decompose v ℓ 1 ,j and v 0 ' 2 ;j (Fig 8(e) and 8(f)), which are not orthogonal, into orthogonal bases x ℓ 3 ,j as shown in Fig 4(g)-4(i). For this reason, TD applied to type I tensors is superior to PCA when applied to the matrix products of synthetic datasets and can depict latent correlation that PCA failed to identify. As for HOSVD applied to type II tensors, Fig 6( Fig 4(d) shows HOSVD applied to type II tensors can depict the latent correlation that PCA based unsupervised FE failed to detect.

Methodological discussion of TD based unsupervised FE applied to multi-omics data
In contrast to synthetic data (to which no supervised methods apply), the supervised method can be applied to the multi-omics data used in this study can be treated with supervised method as the data uses class labels. Strictly speaking, there are no unsupervised methods applicable to multi-view data processing other than TD based unsupervised FE. We can, therefore, even conclude that is the above strategy is sound. That said, it would be beneficial to demonstrate that the unsupervised methodology is superior to the supervised methodology.
As mentioned in above, most multi-view data processing methodologies require optimization of weights. I do not consider such methodologies, since optimizing weights is a complicated unnecessary process. Therefore, they are not comparable with the unsupervised strategy, which involves no parameter optimization processes.
Thus, the multi-view data processing methodology is considered applicable to merged matrices shown in Eqs (1) or (2), which does not include any weight optimization. Here, I consider three alternative methodologies, Significance Analysis of Microarrays (SAM), [35] Limma [36] and randomforest [37] (RF). As SAM and Limma evaluate features independently, weights attributed to views are not required. Although RF evaluates features in more collective ways, the evaluations are tree based, therefore the absolute values of each feature are not important. Table 5 shows the pertinent results. All three methods are inferior to the methodology presented above, as they failed to identify a significantly small number of features. Limma selected all of 755 miRNAs as significant. Roughly speaking, at least half of mRNAs were identified by these three methods.
Although these might be enough to demonstrate the superiority of the methodology presented above, measures were also undertaken to reduce the number of mRNAs identified by reducing threshold P-values ensuring that these three methods identify 426 mRNA probes, which is the same number identified by TD applied to type I tensors. As threshold P-values which are too small without any statistical justifications may not be acceptable, in order to evaluate these three methods, unnatural threshold P-values were intentionally used. Top ranked mRNAs selected by using intentionally reduced threshold P-values were uploaded to MSigDB server. However, breast cancer was rarely identified (Table 6). Breast cancer was identified only once by SAM, and was not detected in either limma or RF. These outcomes are in contrast to Table 2, where eight out of ten significant gene sets are breast cancer-related when TD was applied to either type I or type II tensor. Thus, it is obvious that these methodologies are inferior to the methodology presented above in the present study, i.e., TD based unsupervised FE.
Finally, PCA based unsupervised FE was applied to mRNAs and miRNAs separately ( Table 5). PCA based unsupervised FE identified smaller numbers of mRNAs and miRNAs than the above three methodologies. Especially, since mRNAs selected are almost identical to those identified by TD based unsupervised FE applied to type I tensor (Table 4), PCA based unsupervised FE is as effective for the identification of biologically reliable mRNAs. However, it cannot identify latent correlation between miRNAs and mRNAs, as hierarchical clustering of PC loading attributed to samples identified no pairs of miRNAss and mRNA (Fig 7(b)), which were identified when TD applied to type II tensors was considered and without which no integrated analysis of multi-omics datasets by PCA based unsupervised FE were successful. This indicates that the present datasets were more complex and cannot be dealt with using PCA based unsupervised FE in an integrated manner. Thus, I conclude that TD based unsupervised FE applied to type I or type II tensors is the only method for achieving two tasks: (i) identifying sufficiently small numbers of biologically important features, and (ii) identify latent correspondence between multi-omics profiles. Table 5. The numbers of identified mRNAs and miRNAs (multi-omics) and mRNAs (vaccination) using various methodologies. Multi-omics: Among 427 mRNA probes and 12 miRNAs identified PCA based unsupervised FE, 408 mRNA probes (Table 4) and 9 miRNAs were also identified with TD based unsupervised FE applied to type I tensor.  Table 6. Top 10 significant overlap gene set in MSigDB with top ranked 400 (approx) mRNA probes identified by alternative methods SAM, limma, and RF, as well as 374 mRNA probes identified by HO GSVD. "BREAST_CANCER" was presented in bold to emphasize the overlap with breast cancer, whose counts are in parentheses at the right side of method names. Here, PCA based unsupervised FE is shown to be the only strategy that can compute with the TD based unsupervised FE applied to matrix products. A more detailed comparison of these two strategies may enable us to understand the functionality of TD based unsupervised FE. As can be seen in the application to synthetic data, TD applied to type I tensors attempted to decompose v ℓ 1 ,j and v 0 ℓ 2 ,j into orthogonal bases x ℓ 3 ,j . This also occurred in the application to multi-omics datasets. First, I identified that the first miRNA PC loading, v 0 ' 2 ¼1;j , dominates subsequent PC loadings (more than 80%). Next, I also identified that the first mRNA PC loadings attributed to samples v ℓ 1 ,j and 1 ℓ 1 5, are identical to the first five sample singular value vectors x ℓ 3 ,j , 1 ℓ 3 5. The Pearson correlations between these five loadings and singular value vectors are −0.94, −0.91, 0.88, −0.97 and −0.97 (Here the signs do not mean anything), respectively. It is also shown the regression analysis of the first miRNA PC loading v 0 ' 2 ¼1;j with the top five mRNA PC loadings, v ℓ 1 ,j , 1 ℓ 1

SAM
covers more than 40% of the first miRNA PC loading v 0 ' 2 ¼1;j . Since the dimension of vector is M = 161, only five components that can cover this amount is highly significant. This suggests that TD can easily decompose single miRNA PC loadings v 0 ' 2 ¼1;j into five mRNA PC loading v ℓ 1 ,j , 1 ℓ 1 5. The result is that the first five sample singular value vectors (x ℓ 3 ,j , 1 ℓ 3 5), are almost identical to the first five mRNA PC loadings (v ℓ 1 ,j , 1 ℓ 3 5). Thus, TD based unsupervised FE can decompose the first dominant miRNA PC loading into five basic (orthogonal) mRNA PC loading. This result is analogous to that seen in the application to synthetic dataset (Fig 4).
It is also important to show that the five mRNA PC loadings v ℓ 1 ,j , 1 ℓ 1 5 (or the first five sample singular value vectors x mRNA ' 3 ;j ; 1 ' 3 5) are distinct also from biological point of view. To demonstrate this, five sets of outliner mRNA probes (associated mRNAs are in S2 Table) were selected using each of the first five mRNA singular value vectors (x mRNA ' 1 ;i 1 ; 1 ' 1 5) each of which is coincident with the first five sample singular value vectors, (x mRNA ' 3 ;j ; 1 ' 3 5)as G(ℓ 1 , ℓ 2 , ℓ 3 )s with 1 ℓ 1 = ℓ 3 5 have larger absolute values (Table 1). GO (biological process) BP term enrichments were tested by uploading S2 Table to g:Cocoa in g:profiler [27] (S2 File). It is obvious that five sets of mRNAs identified as outliers using each of the first five mRNA singular value vectors are biologically distinct from one another.
Methodological discussion of TD based unsupervised FE applied to identification of temporally differentially expressed genes At certain time points in EGF treatment experiments, there is only one measurement, which prevents the application of some statistical methods. Therefore, only vaccination samples were considered. Again, SAM, limma, RF and PCA based unsupervised FE were considered. Here, the samples are assumed to be classified into five time points times three treatments (P, D, NP) equalling 15 classes. Results are shown in Table 5. As with multi-omics data, SAM, limma, and RF failed to identify sufficiently small numbers. This is possibly due to the fact that there are 15 classes. Since even the detection of differences between pairs of any two of the 15 classes can effect results, there are many genes identified as having significant differences. On the other hand, 18 mRNA probes were identified by PCA based unsupervised FE with considering common set when separately applied to three gene expression profiles, x P i;t 1 x D i;t 2 and x NP i;t 3 . Thus, relatively successful.
In order to determine biological significance, a reduced number of gene sets was uploaded to MSigDB. Similar to the multi-omics case, SAM indicated too many mRNAs with adjusted P-value = 0, therefore no reduced sets could be generated. Two 300 top ranked mRNA probes sets were generated from limma and RF and associated mRNAs were uploaded to MSigDB together with the genes associated with 18 mRNA probes obtained by PCA based unsupervised FE. Within top 10 ranked significant genes set, no vaccination related genes sets were identified for limma or RF. PCA based supervised FE has only two vaccination related genes sets. GSE13485_X_VS_Y_YF17D_VACCINE_PBMC_DN, (X, Y) = (DAY3, DAY7) and (DAY1, DAY7), were identified as the second (adjusted P = 1.86 × 10 −6 ) and the fourth (adjusted P = 4.30 × 10 −5 ) significant genes sets, which were smaller than those identified when TD was applied to type II tensors, which identified five gene sets associated with vaccination out of six top ranked gene sets.
Thus, one of four tested methods, PCA based unsupervised FE, could produce some significant results which were inferior to those produced by TD based unsupervised FE applied to type II tensors. As a result, TD based unsupervised FE proved more effective than the other methodologies analysed above.

Comparison with HO GSVD
To my knowledge, although there are no methods that comprise a tensor from multiple matrices and applies TD to it, similar trials aiming integration of multiple matrices exist. For example, higher order generalized singular value decomposition (HO GSVD) [25] is one such method. Although HO GSVD does not generate tensor, the outcome is quite similar; a set of feature singular value vectors and a unique (common) sample singular value vector which is equivalent to what TD applied to type I tensors generated from Case I data produces. Although Ponnapalli et al [25] employed the distinct terminology from the present study, I continue to use my own terminology in this subsection to avoid confusion.
First, HO GSVD was applied to synthetic data. Fig 9(a)-9(d) shows the results. Its outcome is close to that when PCA based unsupervised FE was applied to dataset (Fig 8(e)-8(h)). It is in some sense reasonable, since HO GSVD is essentially PCA excluding the fact multiple views share the unique sample singular value matrix, V, where the first the second column vectors correspond to the first PC loading of the first and the second views, respectively.
Next, HO GSVD was applied to multi-omics data. Coincidence between sample singular value vectors and class labeling is shown in Fig 1(d). Although four out of five vectors are significantly coincident with class labeling, significance was substantially less than when TD was applied to type I and II tensors, as the P-values were larger. Thus, HO GSVD can perform well but is less effective than TD applied to type I or II tensors.
Next, 374 mRNA probes identified as outliers using the first five mRNA feature singular value vectors were selected (associated mRNAs are shown in S1 Table). Uploading the mRNAs associated with 374 mRNA probes to MSigDB, I found that only five out of ten top ranked significant genes sets were related to breast cancer (Table 6), while eight out of ten were related to breast cancer when TD was applied to type I or II tensors (Table 2).
HO GSVD cannot be applied to identification of temporally differentially expressed genes, since HO GSVD can be applied only to Case I data where samples are shared between multiple views.
These slightly poorer outcomes of HO GSVD than TD applied to type I or II tensors suggest the usefulness of tensors when analysing multi-view datasets.

Biological validations of mRNAs identified in multi-omics data analysis
In the previous subsection, TD based unsupervised FE applied to product of multi-omica profile matrices was validated chiefly from the methodological perspective, and validated partially from the biological perspective.
In this subsection, I try to validate outcomes biologically in more detail, mainly based upon the consideration from oncology.
The samples analysed are essentially proposing the comparison between tumors with and without metastasis. Thus, it is expected that selected genes are mainly related to cancer oncogenesis related to metastasis.
Since Farazi et al [30] who produced the original study, mainly discuss the aberrant expression of miRNAs among samples, there is no in depth discussion about the role of miRNA/ mRNA in metastasis. However, as can be seen in the below, much can be discussed from their dataset.
In order to biologically investigate a set of mRNAs identified when type I tensors were considered, to the mRNAs were uploaded to g:profiler (see S3 Table). A large number of enrichments of biological terms were identified.
For example, in GO BP terms, "leukocyte activation" (GO:0045321) was enriched. It was reported to be related to metastasis. Ihnen et al [38] reported a tumor biological context of activated leukocyte cell adhesion molecules (ALCAM) for the development of metastases in breast cancer. Strell et al [39] concluded that the first two steps of the extravasation of tumor cells and leukocytes, rolling and adhesion, seem to have similarities with regards to the mechanisms and receptors involved. King et al [40] identified ALCAM in metastasis of breast cancer cells to the lung. These suggested that metastasis was mediated by the extravasation similar to that of leukocytes. In relaton to this, "positive regulation of leukocyte chemotaxis" (GO:0002690) was also enriched. Wu [41] reported the role of chemotaxis in cell migration. Gradient of chemotaxis mediates cell migration, and possibly metastasis, too.
In GO (cellular component) CC terms, "extracellular region" (GO:0005576) was enriched. Cho et al [42] reported that Herceptin binds to the juxtamembrane region of HER2, identifying this site as a target for anticancer therapies, while overexpression of HER2 is found in 20-30% of human breast cancers, and correlates with more aggressive tumours and a poorer prognosis. It is also primary biomarker of breast cancer in the original study [30]. More generally, Versteeg et al [43] suggested the importance of extracellular signaling pathway in cancer metastasis. It mediates blood vessel wall damage, which may allow tumours to migrate through blood vessels.
In GO (molecular function) MF terms, "CXCR3 chemokine receptor binding" (GO:0048248) was enriched. CXCR3 was reported as a molecular target in breast cancer metastasis [44]; it inhibits tumor cell migration and promotes of host anti-tumour immunity. As suggested in the above, chemotaxis mediates cell migration and chemokine receptor CXCR3 agonist prevents human T-cell migration [45]. Other than in relation to metastasis, inhibition of CXCR3 was also known to mediate tumor growth [46]. "RAGE receptor binding" (GO:0050786) was also enriched. RAGE was reported to mediate tumor progression and metastasis through binding to S100A7 by modulating the tumor microenvironment [47]. It recruits MMP9-positive tumor-associated macrophages and mediates cell migrations.
Other than in GO terms enrichment, transcription factor (TF) SOX9 target genes were enriched. The relation between SOX9 and metastasis was pointed out by many papers. Got et al [48] reported that co-expression of Slug and Sox9 promotes the tumorigenic and metastasis-seeding abilities of human breast cancer cells. SOX9 protein, which is normally nuclear, was instead localized in the cytoplasm of 25-30% invasive ductal carcinomas (IDCs) and lymph node metastases [49]. Lei et al [50] also suggested that Sox9 expression is related to breast cancer metastasis. Although the above are all related to breast cancer, Sox9 was frequently reported to be related to metastasis in various other cancers.
KEGG pathway "Primary immunodeficiency" (KEGG:05340) was also enriched. Development of cancer in patients with primary immunodeficiencies was reported [51]. Monozygotic twin brothers with primary immunodeficiency presented with metastatic adenocarcinoma of unknown primary [52].
In conclusion, our method and strategy correctly identified many cancer related biological terms/concept enrichments, especially metastasis in breast cancer, which is coincident with the purpose of the original study that did not produce results produced here.
Although not all are strictly related to breast cancer, all seven miRNAs identified are frequently reported to be related to metastasis.

Conclusion
In this paper, a new strategy aiming at multi-view data processing that makes use of tensors generated from multi-view matrices products was proposed. As tensors can be generated from individual measurements, observation under combined conditions, which is generally required to produce tensors from datasets, is not necessary. FEs were performed using singular value vectors generated from TD and biological feasibility was confirmed via comparisons with previously generated annotated gene expression profiles. As this strategy is not restricted to gene expression, its application to other datasets is feasible.
Supporting information S1 File. Flow chart and variables. A Work flow chart and list of the variables introduced are in S1 File. (PDF) S2 File. BP term enrichment. BP term enrichments by uploading S2 Table to g:Cocoa in g: profiler. (PDF) S1