Identifying Multi-Dimensional Co-Clusters in Tensors Based on Hyperplane Detection in Singular Vector Spaces

Co-clustering, often called biclustering for two-dimensional data, has found many applications, such as gene expression data analysis and text mining. Nowadays, a variety of multi-dimensional arrays (tensors) frequently occur in data analysis tasks, and co-clustering techniques play a key role in dealing with such datasets. Co-clusters represent coherent patterns and exhibit important properties along all the modes. Development of robust co-clustering techniques is important for the detection and analysis of these patterns. In this paper, a co-clustering method based on hyperplane detection in singular vector spaces (HDSVS) is proposed. Specifically in this method, higher-order singular value decomposition (HOSVD) transforms a tensor into a core part and a singular vector matrix along each mode, whose row vectors can be clustered by a linear grouping algorithm (LGA). Meanwhile, hyperplanar patterns are extracted and successfully supported the identification of multi-dimensional co-clusters. To validate HDSVS, a number of synthetic and biological tensors were adopted. The synthetic tensors attested a favorable performance of this algorithm on noisy or overlapped data. Experiments with gene expression data and lineage data of embryonic cells further verified the reliability of HDSVS to practical problems. Moreover, the detected co-clusters are well consistent with important genetic pathways and gene ontology annotations. Finally, a series of comparisons between HDSVS and state-of-the-art methods on synthetic tensors and a yeast gene expression tensor were implemented, verifying the robust and stable performance of our method.


Introduction
Clustering analysis has become a fundamental tool in statistics, machine learning and signal processing [1]. A number of clustering algorithms have been developed, with the general idea of seeking groups among different objects in a full feature space. However, this process has several limitations, such as the adoption of a global-feature similarity among objects and the multiple synthetic and biological tensors are used, and the detected significant co-clusters are analyzed according to genetic pathways and gene ontology (GO) annotations [32] to attest the biological significance of these co-clusters [33].

Notations and Preliminaries
An Nth-order (N-mode) tensor A A ¼ fa i 1 i 2 ...i N ; i k ¼ 1; . . . ; I k g can be defined as a multi-dimensional array, where N is the number of dimensions [23]. Here we adopt a boldface and Eulerscript letter A A to denote a tensor, with its entry notated as a i 1 i 2 . . .i N . Accordingly, a column vector can be denoted using a boldface and lowercase letter, e.g. a, with its ith entry notated as a i ; and a matrix is denoted by a boldface and uppercase letter, e.g. A, with its entry in the ith row and jth column denoted as a ij .
Further, the ith row and jth column of A can be notated as a i: and a :j , respectively. Additionally, fibers and slices of A A can be defined. For example, the column-, row-and tube-fibers of a 3-mode tensor A A 2 R I 1 ÂI 2 ÂI 3 are denoted as a :jk , a i:k , and a ij: , respectively. The horizontal, lateral and frontal slices of this tensor are notated as A i:: , A :j: , and A :: k , respectively.
The process of transforming a tensor into a 2D matrix is called unfolding or flattening. The mode-n unfolded matrix A (n) of a tensor A A 2 R I 1 ÂI 2 Â...ÂI N is a matrix of size I n Â ð Q k6 ¼n I k Þ, with its mode-n fibers reduced to the columns [34]. Unfolded matrices play an important role in the product of a tensor and a matrix. Similar to the product of two 2D matrices, the mode-n product of a tensor A A 2 R I 1 ÂI 2 Â...ÂI N with a matrix U 2 R JÂI n can be denoted as A A Â n U, which is a tensor (size I 1 × . . . × I n−1 × J × I n+1 × . . . × I N ) composed of the following entries, This can also be described by a matrix product in Eq (2), using the unfolded matrices.
Biclustering in Matrices Biclusters in Matrices. In a data matrix A, a bicluster is a sub-matrix of A and represents a coherent pattern [12]. Specifically, we notate a bicluster as A IJ , where I = {i 1 , i 2 , . . ., i s } stands for a subset of rows and J = {j 1 , j 2 , . . ., j t } is a subset of columns. Based on A IJ , we can further define several types of generally-discussed biclusters: 1. constant biclusters, i.e. {a ij = μ j i 2 I, j 2 J}; 2. constant-row or constant-column biclusters, i.e. fa T iJ ¼ m i 1 J j i 2 Ig or {a Ij = μ j 1 I j j 2 J} where 1 J or 1 I is a column vector of ones; 3. additive-row or additive-column biclusters, i.e. fa T iJ ¼ m i 1 J þ a T kJ j i; k 2 Ig or {a Ij = μ j 1 I + a Ik j j, k 2 J}; 4. multiplicative-row or multiplicative-column biclusters, i.e. fa T iJ ¼ m i a T kJ j i; k 2 Ig or {a Ij = μ j a Ik j j, k 2 J}.
Let us consider the analysis of gene expression data ([gene, experimental condition]) as an example. Our aim is to identify gene groups with similar behaviors or functions, such as a group of genes that are highly correlated under a group of experimental conditions [2,12,35]. In this regard, a constant bicluster means that a group of genes have the same expression level under a group of conditions, thus exhibiting certain kinds of homogeneity [35]. Similarly, constant-row, constant-colonm, addtive and multiplicative biclusters can also reveal genes with related behaviours or functions, which can be coordinately investigated and targeted in gene regulation [6,12]. Specifically, a constant-row bicluster means that each gene in a group has the same expression level under all conditions in a group, but different genes may have different expression levels. A constant-column bicluster means that all genes in a group have the same expression level under each condition in a group, but a gene may have different expression levels for different conditions. In an additive bicluster, expression levels of all genes in a group under one condition is higher or lower by a constant than those under another condition. In a multiplicative bicluster, expression levels of all genes in a group under one condition is a multiple of those under another condition.
Most biclustering techniques permutate the original matrix and optimize a scoring function. Commonly-used scoring functions include the sum of squares [11] in Eq (3) and the mean squared residue score [2] in Eq (4), HðI; JÞ ¼ 1 jIjjJj where A IJ is the mean of sub-matrix A IJ , and a iJ and a Ij are the means of row a iJ and column a Ij , respectively. A IJ is defined as a δ-bicluster if H(I, J) δ, where δ(δ > 0) is a pre-specified residue score threshold value.
Unfortunately, H(I, J) can be used only to detect biclusters of types (a), (b) and (c), but not type (d). Therefore, a more general scoring function was proposed [36] for a thorough bicluster-search. This function S(I, J), as expressed in Eq (5), is derived from Pearson's correlation: is Pearson's correlation between two vectors x and y. Normally, a lower S (I, J) value represents a stronger coherence among the involved rows or columns [36]. Similarly as above, we can define a δ-bicluster if S(I, J) δ, where δ > 0. Biclustering Based on Singular Value Decomposition (SVD) and Clustering in Singular Vector Matrices. In biclustering analysis, SVD-based methods play an important role and have been broadly applied to detect significant biclusters. Representative methods include sparse SVD (SSVD), regularized SVD (RSVD), robust regularized SVD (RobRSVD), nonnegative matrix factorization (NMF) and nonsmooth-NMF (nsNMF) [36][37][38][39].
Let A be a matrix of size N × M, and its SVD can be defined as follows, where r is the rank of A, U = [u 1 u 2 . . . u r ] is an N × r matrix of left orthonormal singular vec- is an M × r matrix of right orthonormal singular vectors, and S = diag(σ 1 σ 2 . . . σ r ) is an r × r diagonal matrix with positive singular values (σ 1 ! σ 2 ! . . . ! σ r ). In Eq (6), s i u i v T i is a rank-one matrix that is called an SVD layer. Normally, the SVD layers corresponding to large σ k values are regarded as effective signals, while the rest are considered as noise [36]. Based on effective SVD layers, a rank-l (l < r) approximation of A can be derived by minimizing the squared Frobenius norm: In order to develop SVD-based techniques for co-cluster analysis of high-order tensors, let us first study the properties of biclusters in 2D singular vector spaces. For simplicity, merely coherent patterns along columns are considered here.
Proposition 1: If A c is a bicluster with size n × d, then rank(A c ) 2.
Proof: First, we can rewrite A c as follows, If A c corresponds to a constant or constant-column bicluster, then rank(A c ) 1. Otherwise, if A c corresponds to a multiplicative-column bicluster, then A c can be formulated as Eq (9), As well, here the rank of A c is no more than one (rank(A c ) 1), since k 2 a c :1 ; . . . ; k d a c :1 are linearly dependent on a c :1 . At last, if A c corresponds to an additive bicluster, then the following equation can be derived, Here, at most two vectors, such as a c :1 and a c :1 þ b 2 1 n , are linearly independent. Therefore, rank (A c ) 2. Proof: According to Proposition 1, the rank of A c is at most 2. Because s c That is, Thus, each column of A c can be represented as a linear combination of first two columns, u c 1 and u c 2 , of U c . Geometrically, Eqs (12) and (13) mean that points ðu c m1 ; u c m2 Þðm ¼ 1; 2; . . . ; sÞ are distributed on a line.
Similarly, we can obtain Thus, each row of A c can be represented as a linear combination of first two columns, v c 1 and v c 2 , of V c . Geometrically, Eqs (14) and (15) mean that points ðv c m1 ; v c m2 Þðm ¼ 1; 2; . . . ; tÞ are distributed on a line.
Proposition 3: Assume A c1 and A c2 with sizes s 1 × t and s 2 × t are two different biclusters, 1 , u c1 2 , u c1 3 , u c1 4 , of U c , each column of A c2 can be represented as another linear combination of rows s 1 + 1 to s 1 + s 2 of first four columns, u c2 1 , u c2 2 , u c2 3 , u c2 4 , of U c , and each row of A c can be represented as a linear combination of first four Proof: According to Proposition 1, the rank of A c1 is at most 2, and so is the rank of A c2 .
, where U c1 represents the first s 1 rows and U c2 the where u c1 m (m = 1, 2, 3, 4) are the first four columns of U c1 and u c2 m (m = 1, 2, 3, 4) are the first four columns of U c2 . Similar to the proof for Proposition 2, we can obtain where a c1 j represents the j-th column of A c1 or the first s 1 elements in the j-th column of A c , and a c2 j represents the j-th column of A c2 or the remaining s 2 elements in the j-th column of A c . Geometrically, Eq (17)  In these matrices, "X" represents entries of irrelevant elements, and "A" and "B" represent the entries of two biclusters A and B respectively. The actual values at the location marked by "X" can be different, and they are background noise. The values at the locations marked by "A" and "B" should form bicluster patterns as described in Section "Biclusters in Matrices".
Due to noise, the rank of Matrix 1 can be greater than 2. We can remove small singular values using a method to be discussed in Section "HOSVD". Assume that we still retain 2 singular values, then Eqs (12) to (15) are only approximations for rows 2, 5 and 8. Remember that our task in biclustering is to find these row indices (and column indices 2, 3 and 7). That is, we do not know beforehand rows 2, 5 and 8 contain a bicluster and we need to identify them. Assume that we take SVD of Matrix 1, then from the left singular vector matrix U, points (u m1 , u m2 ) (m = 2, 5, 8) should form a line approximately according to Eqs (12) and (13), while points (u m1 , u m2 ) (m 6 ¼ 2, 5, 8) will be distributed randomly and do not satisfy these equations. Our task now is to detect the line from all points (u m1 , u m2 ) (1 m 10). Once the line is detected, we can then determine which points are on the line. The row indices of these points correspond to the locations of the bicluster. Similarly, we can identify relevant columns of the bicluster by detecting lines using the right singular vector matrix V. To identify biclusters in Matrices 2 and 3, we need to find hyperplanes in singular vector spaces. The lines and hyperplanes can be detected using the linear grouping algorithm (LGA) to be discussed in Section "Linear Grouping Algorithm (LGA)".

Identification of Co-clusters in High-order Tensors
Co-clusters in High-order Tensors. The hyperplane model in singular vector spaces can be extended to the analysis of higher-order tensor data. For example, a co-cluster (represented by a sub-tensor can be similarly defined as in the 2D case. This definition involves the predefined fibers and slices in a 3D tensor. Now we use constant co-clusters as examples, 2. a constant-column-fiber co-cluster can be expressed as {a Ijk = μ jk 1 I j j 2 J, k 2 K}; 3. a constant-horizontal-slice co-cluster can be defined as {A iJK = μ i 1 JK j i 2 I}; Here 1 I or 1 JK is a vector or matrix of ones. Accordingly, additive and multiplicative co-clusters can be also defined. To evaluate the significance of co-clusters in high-order tensors, a scoring function generalized from that for matrices in Eq (5) can be used. Let S I 1 . . .i n . . .I N represent the average of S I 1 i n , . . ., S I n−1 i n , S i n I n+1 , . . ., S i n I N , then we can define the scoring function as Similarly, a lower score indicates a higher significance. A δ-co-cluster can be further defined if S(I 1 , I 2 , . . ., I N ) δ (δ > 0). In particular, we name such a co-cluster in a 3D tensor as a δtricluster.
HOSVD. Similar to SVD, HOSVD that decomposes a tensor into a core tensor and a singular vector matrix along each mode is employed in our method, to extract co-clusters in highorder tensors [23,40,41].
The HOSVD of an N-mode tensor A A 2 R I 1 ÂI 2 Â...ÂI N can be expressed as follows, where U ðnÞ 2 R I n Âr n ðn ¼ 1; . . . ; NÞ is a factor (singular vector) matrix, and T T 2 R r 1 Âr 2 Â...Âr N is the core tensor. The SVD of a matrix A in Eq (6) can be formulated in a similar format with T T ¼ S 2 R I 1 ÂI 2 , as a special case of Eq (20).
On the other hand, HOSVD can be expressed as a matrix format, using the unfolded matrices along each mode, That is, we obtain a matrix A (n) of size I n × I 1 I 2 Á Á ÁI n−1 I n+1 Á Á ÁI N for mode n, and U (n) is the left singular vector matrix of A (n) . When the tensor A A is unfolded to a matrix A (n) , a co-cluster in A A will be unfolded to a bicluster in A (n) . From U (n) , we can find the row indices of A (n) that contain a bicluster, according to Propositions 1 to 3. These row indices correspond to the locations in the tensor along mode n. By combining the row indices of biclusters in all unfolded matrices A (n) (n = 1, . . ., N), we can then find the co-cluster in an N-dimensional space or Nmode tensor A A . Now the major task is to find hyperplanes for each singular vector matrix U (n) . That is, the problem of detecting co-clusters in a multi-dimensional space has been effectively converted to the detection of biclusters in singular vector spaces. For a tensor A A 2 R I 1 ÂI 2 Â...ÂI N , the n-rank of A A ðrank n ðA A ÞÞ is defined as the column rank of A (n) . Let r n ¼ rank n ðA A Þ, (n = 1, . . ., N), then A A has the rank of (r 1 , r 2 , . . ., r N ). Inspired by the rank-l (l < r) SVD approximation of matrix A in Eq (7), the singular-value truncation also can be used to reveal effective signals and reduce noises in a tensor [23]. Accordingly, a portion of singular values S (n) in Eq (22) will be adopted. Thus, we can define a truncated HOSVD depending on the matrix format as follows, for a tensor A A , its decomposition of rankðr 1 ;r 2 ; . . . ;r N Þ, withr n < r n ¼ rank n ðA A Þ for at least one mode (n), is called a truncated HOSVD [40]. This concept for a 3-mode tensor is shown in Fig 1. The optimal rank ðr 1 ;r 2 ; . . . ;r N Þ of a tensor can be determined from a compromise between accuracy and number of singular values used. Because we use the unfolded matrices to Identifying Co-Clusters in Tensors Based on Hyperplane Detection determine the row indices of co-clusters, we find the optimal rankr n of each unfolded matrix A (n) and the optimal rank of the tensor is then (r 1 ,r 2 , . . .,r N ). The accuracy of a matrixÃ is defined using the Frebenius norm as acc ¼ , whereÃ is reconstructed from SVD after truncation of mall singular values. The relative error is defined as 1−acc. Fig 2 presents the relative error curves for two examples A andÃ with different numbers of singular values used. The two 2D synthetic matrices of size 100 × 100, which are similar to Matrix 1 and Matrix 2 discussed below Proposition 3, are embedded with one and two additive biclusters of size 10 × 10, respectively. Each additive bicluster is produced from a seed column of random numbers distributed in a normal distribution N(0, z 2 ) (z = 6 for low nosie level and z = 1.25 for high noise level) and each of the other 9 columns is produced by adding to the seed column a random number with the standard normal distribution. The background of each matrix also contains random numbers with the standard normal distribution. As shown in Fig 2, the corner Identifying Co-Clusters in Tensors Based on Hyperplane Detection point corresponds to the optimal rank (2 and 4) of the two 2D synthetic matrices. As noise increases, the relative error curve becomes smooth and the corner point disappears. Through many simulation experiments, we find that a threshold at 20% relative error can be used to find the optimal rank value.
Linear Grouping Algorithm (LGA). As discussed in Section "Biclustering Based on Singular Value Decomposition (SVD) and Clustering in Singular Vector Matrices", bicluster searching in a 2D matrix A can be transformed into a hyperplane detection problem in the singular vector matrices produced by SVD, which can be generalized to a high-order tensor A A . In detail, a co-cluster in A A can be represented by the biclusters along each mode, and a series of linear relations among the HOSVD-generated singular vectors can be accordingly inferred as below, where the co-cluster is decomposed as . Eq (23) represents a group of hyperplanar (linear) relations in a multi-dimensional space, which can be named hyperplanar co-clusters. Importantly, similar to the 2D case, these hyperplanar relations shed light on the multi-dimensional co-cluster identification in tensors [5,13,36,37,39].
Specifically in our work, the linear grouping algorithm (LGA) was adopted to reveal the linear relations embedded in the singular vector matrices, which were generated by a truncated HOSVD. This model follows the linear patterns among the involved vectors [5]. Originally, LGA clustered data points by fitting a mixture of linear regression models [42], and later an evolved model based on an orthogonal regression approach was proposed, which provided favorable performances in applications with outliers [5]. Recently, this model has been improved to a robust linear clustering method [43]. A simple procedure of LGA can be described as Algorithm 1.
Algorithm 1 Procedure of LGA for a group of vectors U I n Âr n procedure LGA ðU I n Âr n ¼ ½u 1 ; u 2 ; . . . ; u I n T ; K Þ Step 1: Scale the variables through dividing them by the standard deviation, U I n Âr n ¼ ½ũ 1 ;ũ 2 ; . . . ;ũ I n T , Step 2: Select K random sub-samples of sizer n , Step 3: Loop for j = 0, 1, . . ., J do 1. Initialize K orthogonal regression hyperplanes H j ¼ fh j 1 ; h j 2 ; . . . ; h j K g by fitting the samples in G j ¼ fg j 1 ; g j 2 ; . . . ; g j K g, resulting in h j k ¼ fðâ k ;b k Þ jâ T kũ j ki Àb k ¼ 0;ũ j ki 2 g j k ; kâ k k¼ 1; i ¼ 1; 2; . . . ;r n g ðk ¼ 1; 2; . . . ; K Þ. 2. Compute the distance between each hyperplane h j k and each sampleũ i , end for D j reaches the minimum D opt . Return G opt end procedure The purpose of applying the LGA algorithm is to find the row indices of each co-cluster along each mode of a tensor from the corresponding unfolded matrix, as discussed above. This is done by detecting hyperplanes formed by some but not all points (u m1 , u m2 , . . ., u md ) (m = 1, 2, . . ., I n ) in the left singular vector space along each mode. Initially, we choose random points to compute the coefficients of hyperplanes. Then we assign each point to the closest hyperplane. This process is repeated to improve the result. The procedure is similar to the K-means algorithms. The difference is that we deal with hyperplanes while the K-means algorithm involves cluster centers only. We consider a point is on a hyperplane if its distance to the hyperplane is smaller than a pre-specified threshold, which is determined experimentally. From a set of points on the same hyperplane, we can then find the row indices of a corresponding co-cluster. All co-clusters detected from the hyperplanes are subject to an evaluation and elimination procedure discussed below.
Multi-dimensional Co-clustering in Tensors Based on HOSVD and LGA. HOSVD decomposes an N-mode tensor A A into a core tensor and singular vector matrices U (n) along all modes. Hyperplanes embedded in each truncated singular vector matrix were revealed by LGA, based on the natural linear patterns among the vectors. Through combining the products along each mode, the high-order co-clusters in this tensor were successfully identified. To further filter such co-clusters, those having a score (Eq (20)) smaller than or equal to δ, namely S (I 1 , I 2 , . . ., I N ) δ (δ > 0), were extracted and regarded as more significant. A co-cluster is eliminated if it is a part of or identical to another co-cluster. The block diagram of our co-clustering method based on hyperplane detection in singular vector spaces (HDSVS) is shown in

Experiment Results
To verify HDSVS, several data sets, including multiple synthetic and biological tensors, were used in the experiments. Two synthetic tensors were constructed with increased noise and overlapped degree, to evaluate the effects of noise and overlapping complexity to the co-cluster identification. Two biological tensors were from gene expression data from 12 multiple sclerosis patients under an IFN-β therapy [44,45] and spatial/temporal lineage tracing data of embryonic cells in a crowd of Caenorhabditis elegans [34,46], and were used to test the performance of our method in practical problems. HDSVS can be successfully applied to co-clustering in highorder tensors. However, because most existing methods are designed for 2D data, we conducted comparisons of HDSVS with other methods using only matrices or second-order tensors in Section "Experiment Comparisons with Other Methods Using 2D Synthetic Data and 2D Yeast Gene Expression Data". Specifically, 2D synthetic tensors generated based on well-published principles [47] and a 2D yeast gene expression tensor [20] were adopted to compare the performance and robustness of our method with those of existing methods.

Evaluation of Noise and Overlapping Effects in Co-cluster Identification Using Synthetic Tensors
A matching scoring, generated by the Jaccard coefficient [14], was first defined to evaluate the agreement between a detected co-cluster and the true one. Let CL 1 ¼ ðG 1 1 ; G 1 2 ; . . . ; G 1 N Þ and CL 2 ¼ ðG 2 1 ; G 2 2 ; . . . ; G 2 N Þ be two co-clusters in a tensor A A 2 R I 1 ÂI 2 Â...ÂI N , where G j i I i ði ¼ 1; 2; . . . ; N; j ¼ 1; 2Þ is a subset of the ith dimension of A A . The matching score can be expressed as follows, Further, we denote a true co-cluster as CL true and a detected one as δ-CL, then a larger value of MS(CL true , δ-CL) represents a better detection. Based on such matching scores, effects of noise and overlapping complexity on the co-cluster identification will be discussed, using the two synthetic tensors. In the left-hand side, the flow for a truncated HOSVD is shown. The LGA module is presented in the middle. The ranking procedure based on a scoring function, for revealing significant co-clusters (δ-CLs) in a tensor, is listed in the righthand side. doi:10.1371/journal.pone.0162293.g003

Identifying Co-Clusters in Tensors Based on Hyperplane Detection
In the first case, four types of CL true (10 × 10 × 10), constant, constant-column-fiber, additive-fiber and mulitplicative-fiber co-clusters, were embedded into a 3D tensor A A 1 (100 × 100 × 100), whose background was generated based on the standard normal distribution. The four types of co-clusters were generated as follows: 1. constant co-cluster, i.e. {a ijk = 2 j i 2 I, j 2 J, k 2 K}; 2. constant-column-fiber co-cluster, i.e. {a Ijk = μ jk 1 I j j 2 J, k 2 K}, where μ jk was randomly selected from U(−2, 2); 3. additive-fiber co-cluster, i.e. {a Ijk = μ jk 1 I + a I(1) j j 2 J, k 2 K}, where a I(1) is the first fiber of the co-cluster, μ jk and each value of a I(1) were randomly selected from U(−2, 2); 4. multiplicative-fiber co-cluster, i.e. {a Ijk = μ jk a I(1) j j 2 J, k 2 K}, where a I(1) is the first fiber of the co-cluster, where μ jk and each value of a I(1) were randomly selected from U(−2, 2); Then Gaussian white noise with different signal-to-noise ratios (SNRs) was generated to degrade CL true . The proposed HDSVS algorithm and PARAFAC with sparse latent factors [29] was then applied to the noisy tensors, after which MS(CL true , δ-CL) was calculated. The experiment was performed 100 times for each method and the matching scores from all experiments were averaged to obtain the final score for comparison. The matching scores corresponding to various SNRs are summarized in Fig 4a. For the constant co-cluster, the matching score of HDSVS and PARAFAC are 1 for all SNRs. PARAFAC performs better than HDSVS when the SNR is low for the constant-column-fiber (SNR 15) and the multiplicative-fiber (SNR 5) co-cluster. However, as the SNR increases, HDSVS has higher matching scores than PAR-AFAC for the constant-column-fiber (SNR ! 20) and the multiplicative-fiber (SNR ! 10) cocluster. HDSVS is much better than PARAFAC for additive-fiber co-clusters. As discussed above, constant, constant-column and multiplicative biclusters have rank 1, while additive biclusters have rank 2. Our proposed HDSVS algorithm is especially effective for additive coclusters because the hyperplane model fits their linear structures well. The PARAFAC based co-clustering method relies on the K-means clustering formulation and can only work well with co-clusters that can be represented by their centers, corresponding to structures of rank 1.
Likewise, a 3D tensor A A 2 (100 × 100 × 100) with two overlapping co-clusters were generated. The experiment was also performed 100 times. Each time, the types of two overlapping co-clusters were chosen randomly. For simplicity, merely the overlapped cubic patterns were considered in the evaluation, and thus the overlapping degree v can be defined as the size of overlapped cubes in each dimension (0 v 9). As shown in Fig 4b, both methods have reasonably good performance for detecting overlapping co-clusters. However, for all overlapping degrees, HDSVS performs consistently better than PARAFAC.

Co-cluster Identification in Gene Expression Data from Sclerosis Patients under an IFN-β Therapy
A 3D tensor generated from the gene expression data of multiple sclerosis patients, who accepted a treatment of IFN-β injection, is discussed here. Twelve patients with western European ancestry, including eight females and four males (average age of 36.4), were recruited in this study. Fifteen milliliters of EDTA blood sample (at peripheral venous) were drawn from each patient, respectively at the baseline day and 2 days, 1 month, 1 year and 2 years after the initiation of an IFN-β therapy (http://link.springer.com/article/10.1007/s12035-013-8463-1/ fulltext.html) [44,45].
In detail, the gene expression data can be represented by a 3D tensor (gene×patient×time) of size 18862 × 12 × 5. Considering the IFN-β therapy, we only kept the therapy-related genetic pathways (56 genes), leading to a simplified tensor of size 56 × 12 × 5. Regarding each gene×patient matrix (time is fixed) as a layer, Fig 5 shows the heat maps for these five layers.
HDSVS was applied on this simplified tensor A A . Its core tensor T T and singular vector matrices U (n) (n = 1, 2, 3) can be extracted by HOSVD, while considering noises, a truncated HOSVD A A was implemented (T T andŨ ðnÞ ). Specifically, the optimal rank for the truncated HOSVD is (2,4,4), which was derived based on the method discussed in Section "HOSVD" and Fig 2. Once the truncated HOSVD was obtained, LGA was used for the bicluster-detection along each mode (Ũ ðnÞ ). For example, the 56 points inŨ ð1Þ can be linearly grouped into two patterns, and similar patterns can be derived forŨ ð2Þ andŨ ð3Þ . Such linear patterns, at the first two dimensions of eachŨ ðnÞ (or U (n) ), are shown in Fig 6a to 6c, respectively. As a supplementary study, the first three dimensions of eachŨ ðnÞ (or U (n) ) are separately plotted in Fig 7a to 7c. Further, As shown in Fig 6, the 56 genes are divided into two linear groups denoted as E i (i = 1, 2), the 12 patients correspond to three groups P j (j = 1, 2, 3), and the 5 time points represent two  Identifying Co-Clusters in Tensors Based on Hyperplane Detection groups T k (k = 1, 2). Accordingly, we combine these indexes to build a co-cluster as follows, To refine the findings, significant δ-CLs (Eq (19)) were extracted. CL 121 , including 13 genes, 6 patients, and 2 time points, finally stood out with δ = 0.156.
To profile this co-cluster, we observed it along each two modes and now present the heat maps in Fig 8. Fig 8a and 8b display the gene×patient matrices at the baseline time and 1 year after the IFN-β therapy. In Fig 8c through Fig 8h, the gene×time matrices for the 6 involved patients are shown. Here we can detect the similar patterns among the first four patients (Fig  8c to 8f), roughly defining a constant-lateral-slice or additive-lateral-slice co-cluter. Additionally, the patient×time matrices for the 13 genes in this co-cluster are presented in Fig 8i to 8u. Interestingly, an L-shape is revealed in the majority of these figures, and this similarly leads to the formation of an additive-horizontal-slice or multiplicative-horizontal-slice co-cluster. All these results have attested the reliability of our algorithm.
The biological significance of CL 121 was further analyzed according to genetic pathways and GO annotations [33]. Specifically, important contribution of the 13 genes (CXCL10, EIF2AK2, IFIT1, IRF7, IRF9, ISG15, ISG20, MX1, NFKB1, OAS1, RSAD2, STAT1, TLR8) to the biological processes in GO has long been elucidated ( Table 1). As reported in [48,49], the immune system process (GO: 0002376) was consistently annotated, verifying the enrichment of its three sub-terms, immune effector process, defense response to virus and response to virus in our detected co-cluster.
Moreover, the genetic pathways of the 13 genes were analyzed (Table 2). These pathways were mostly inferred from previous studies [44,45]. Interestingly, the enrichment of bone remodeling pathway is shown in our analysis, with a small p-value of 2.3E-4 in Table 2. This finding leads to sufficient biological evidences of correlating bone remodeling with the IFN-β treatment for sclerosis patients. Common characteristics of the patients [48], such as a shorter disease duration, a lower EDSS score and an easier relapse, in CL 121 were further revealed, leading to an effective profile of their disease progression. Overall, our algorithm can be beneficial to personalized therapy design and new drug discovery in the treatments of sclerosis patients.

Co-clustering of Embryonic Cell Cycles in the Lineages of C. Elegans
Recently, the techniques of live-cell imaging microscopy and fluorescent tagging have developed rapidly. These techniques have been broadly used in observing gene expression, nuclei movement and nuclei division, during the embryogenesis of a single cell [34,38,46]. An example of the lineage tracing of C. elegans can be found in Fig 9. The length of a cell cycle in an organism is important in the tracing of an embryonic cell lineage. For different organisms and cells, the length varies significantly. Depending on wellreported protocols in [46], the cell cycle lengths of *300 C. elegans embryos were evaluated by perturbing their 1219 genes in [49] (http://phenics.icts.hkbu.edu.hk/). For simplicity, the 8-cell stage in the AB branch was regarded as our founder-cell stage [34], and 14 descendants of each founder cell were studied. As a summary, a gene×descendant×founder tensor of size 1219 × 14 × 8 was constructed. Importantly, identification of co-clusters in this tensor can lead to the derivation of cell fates in the C. elegans lineage.
Similar to the preceding section, the optimal rank of (2, 5, 8) was derived for the truncated HOSVD. Linear patterns (LGA) inŨ ðnÞ were separately displayed in Fig 10a to 10c, where the first two column vectors were used as representatives. Meanwhile, the first three dimensions of eachŨ ðnÞ (Fig 11a to 11c) and their planar patterns in the 3D space are separately displayed in Fig 11d to 11f.  Specifically, 4, 2 and 2 clusters in the three modes were detected and are shown in Table 3. Here, the two groups of founder cells (F) are consistently traced to their mother cells (ABal/ ABpl and ABar/ABpr). Furthermore, all the cells in D 2 are terminal cells, and only three terminal cells ( Ã aaa, Ã paa and Ã pap) are distributed to D 1 . Compared to ancestors, terminal cells at the same stage were reasonably grouped together, as earlier cell cycles had evolved to different lengths after multiple cell divisions.
A significant δ-CL (CL 122 ), including 42 genes (E 1 ), 5 terminal cells (D 2 ), and 4 founder cells (F 2 ), was detected with δ = 0.0702. The descendant×founder matrices for the corresponding 42 genes are displayed in Fig 12, as a series of heat maps. Likewise, similar patterns (two horizontal black lines) can be identified in the majority of these maps, defining a specific additive or multiplicative co-cluster type. The biological functions of these 42 annotated genes were further explored, using the GO terms and KEGG pathways [33]. Results are listed in Table 4, where the four annotated functional categories have long been emphasized in earlier studies of lineage tracing of C. elegans [34,46,49]. These functional categories may be importantly correlated to the descendant cells of ABar and ABpr branches (F 2 ).   [34], and now they are detected in CL 321 as well. As a great number (379) of genes were involved, multiple functional categories were annotated with  The number of genes in the group is 69 descendant cells (D) D 1 "*a" "*aa" "*aaa" "*ap" "*p" "*pa" "*paa" "*pap" "*pp" D 2 "*aap" "*apa" "*app" "*ppa" "*ppp" founder cells (F) F 1 "ABala" "ABalp" "ABpla" "ABplp"

Experiment Comparisons with Other Methods Using 2D Synthetic Data and 2D Yeast Gene Expression Data
In order to test the performance and robustness of HDSVS, we implemented a series of comparion experiments. Matrices of 2D tensor data were used because most existing methods can process 2D data only and cannot be generalized to higher-order tensors easily, and multiple state-of-the-art biclustering methods were adopted for such comparisons. These methods include ISA [19], CC [2], FABIA [21], BSGP [18], SMR [29] and BiMax [20]. Specifically in our comparison experiments, MTBA (MATLAB Toolbox for Biclustering Analysis) was used as an algorithm-suite. First, 2D synthetic data were generated based on principles in [47], resulting in matrices (500×200) with single-type biclusters (50×50). The background values are generated by a normal distribution N(0, 1). Comprehensively, constant, constant-row/column, additive and mulitplicative biclusters were considered and generated as follows: 1. constant biclusters, i.e. {a ij = 2 j i 2 I, j 2 J}; 2. constant-row or constant-column biclusters, i.e. fa T iJ ¼ m i 1 J j i 2 Ig or {a Ij = μ j 1 I j j 2 J} where 1 T J or 1 I is a column vector of ones, μ i and μ j are drawn from a normal distribution N(0, 1); 3. additive-row or additive-column biclusters, i.e. fa T iJ ¼ m i 1 J þ a T ð1ÞJ j i; k 2 Ig or fa Ij ¼ m j 1 I þ a T ið1Þ j j; k 2 Jg where a (1)J (a i(1) ) is the first row (column) of the biclusters, μ i , μ j and each value of a (1)J (a i(1) ) are drawn from a normal distribution N(0, 1); 4. multiplicative-row or multiplicative-column biclusters, i.e. fa T iJ ¼ m i a T ð1ÞJ j i; k 2 Ig or {a Ij = μ j a I(1) j j, k 2 J} where a (1)J (a i(1) ) is the first row (column) of the biclusters, μ i , μ j and each value of a (1)J (a i(1) ) are drawn from a normal distribution N(0, 1).
For each scenario involving a specific bicluster type, the matching scores of the detected biclusters and the true ones under Gaussian white noise with different SNRs (dB) were derived for each method. These SNR-MS curves of the compared methods and our algorithm are displayed in Fig 13, where parts a, b, c and d show the case concerning constant, constant-row/column, additive and multiplicative biclusters, respectively. Depending on such SNR-MS curves, the robustness of each method can be reasonably measured. As shown in Fig 13, HDSVS performs well and stably under different SNRs. Espetially, it outperforms others when additive biclusters are involved. This validates the robustness and generalization capability of our method. To further evaluate the statistical significance of the results generated by different methods, a biological 2D tensor, namely gene expression data of yeast cells towards different stress conditions, was employed in our comparison experiments [20]. The original microarray data (http://www.tik.ee.ethz.ch/sop/bimax/) contains 2993 genes and 173 stress conditions, and have been normalized using mean centering. A set of Perl modules for accessing GO information and evaluationg the collective annotation of a gene group to GO terms was developed in [32], based on which the statistical significance of each annotation can be calculated. Using such modules, we carried out the GO enrichment significance test for our method and each of the comparison methods. The results from different thresholds are presented in Fig 14, where HDSVS and SMR outperforms other methods in this significance test. However, HDSVS (59 seconds) is about 12 times faster than SMR (786 seconds, when K = 2).

Conclusion
In this paper, we proposed a co-clustering method based on hyperplane detection in singular vector spaces (HDSVS), to identify co-clusters in high-order tensors. Based on linear structures of co-cluster patterns, this algorithm successfully extracted significant co-clusters (δ-CLs). Specifically, linear patterns embedded in the singular vector matrix along each mode, produced by a truncated HOSVD, were the key to co-cluster identification. These linear structures revealed by LGA showed a favorable performance in capturing the significant patterns. HDSVS was validated by multiple synthetic and biological tensors.
It is worth noting that, the performance of HDSVS was investigated with respect to different noise levels and overlapped degrees in tensors. Our method showed a robust performance on noisy tensors, due to the selection of singular vectors by the truncated HOSVD. Meanwhile, the applications of HDSVS to two biological tensors, namely the gene×patient×time tensor and the gene×descendant×founder tensor, validated its reliability in dealing with real-world applications. Especially, the genes in the detected co-clusters were significantly enriched in biologically-verified pathways and GO terms. In addition, comparisons between HDSVS and other popular methods on 2D synthetic data and 2D yeast gene expression data further showed the robustness and stability of HDSVS. The experiment results show that HDSVS is an efficient method for cocluster identification in high-order tensors. In this paper, we have used HOSVD for tensor decomposition. We can also consider the use of several other decomposition methods. For example, the dominant multidimensional subspace in tensor data can be found using higher order orthogonal iteration of tensors (HOOI) [50]. As discussed above, PARAFAC with sparse latent factors [29] has good performance for detecting co-clusters of rank 1 with low SNR. These decomposition methods can be explored in the future to improve the performance of HDSVS.
Supporting Information S1 Dataset. Yeast gene expression data, C. elegans cell cycle data and sclerosis patients gene expression data.