S3CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization

Dongjin Choi; Jun-Gi Jang; U Kang

doi:10.1371/journal.pone.0217316

Abstract

How can we extract hidden relations from tensor and matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is an important tool for this purpose. Designing an accurate and efficient CMTF method has become more crucial as the size and dimension of real-world data grow explosively. However, existing methods for CMTF suffer from low accuracy, slow running time, and limited scalability. In this paper, we propose S³CMTF, a fast, accurate, and scalable CMTF method. In contrast to previous methods, which do not handle large sparse tensors and are not parallelizable, S³CMTF provides parallel sparse CMTF by carefully deriving gradient update rules. S³CMTF asynchronously updates partial gradients without expensive locking. We show theoretically and empirically that our method is guaranteed to converge to a quality solution. S³CMTF further boosts performance by carefully storing intermediate computations and reusing them. We theoretically and empirically show that S³CMTF is the fastest, outperforming existing methods. Experimental results show that S³CMTF is up to 930× faster than existing methods while providing the best accuracy. S³CMTF shows linear scalability in the number of data entries and the number of cores. In addition, we apply S³CMTF to Yelp rating tensor data coupled with 3 additional matrices to discover interesting patterns.

Citation: Choi D, Jang J-G, Kang U (2019) S³CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization. PLoS ONE 14(6): e0217316. https://doi.org/10.1371/journal.pone.0217316

Editor: Junwen Wang, Mayo Clinic Arizona, UNITED STATES

Received: February 13, 2019; Accepted: May 8, 2019; Published: June 28, 2019

Copyright: © 2019 Choi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data files are available from the web page: https://datalab.snu.ac.kr/S3CMTF/.

Funding: This work was supported by the National Research Foundation of Korea (NRF) funded by MSIT (2019R1A2C2004990, and NRF-016M3C4A7952587, PF Class Heterogeneous High Performance Computer Development). The Institute of Engineering Research at Seoul National University provided research facilities for this work. The ICT at Seoul National University provides research facilities for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Given tensor data and related matrix data, how can we analyze them efficiently? Tensors (i.e., multi-dimensional arrays) and matrices are natural representations for various real-world high-order data [1, 2, 3]. For instance, the online review site Yelp provides rich information about users (name, friends, reviews, etc.) and businesses (name, city, Wi-Fi, etc.). One popular representation of such data includes a 3-way rating tensor with (user ID, business ID, time) triplets and an additional friendship matrix with (user ID, user ID) pairs. Coupled matrix-tensor factorization (CMTF) is an effective tool for joint analysis of coupled matrices and a tensor. The main purpose of CMTF is to integrate matrix factorization [4] and tensor factorization [5] to efficiently extract the factor matrices of each mode. The extracted factors have many useful applications such as latent semantic analysis [6, 7, 8], recommendation systems [9, 10], network traffic analysis [11], and completion of missing values [12, 13, 14].

However, existing CMTF methods do not provide good performance in terms of time, accuracy, and scalability. CMTF-Tucker-ALS [15], a method based on Tucker decomposition [16], is applicable only to dense data and is not parallelizable. For sparse real-world data, it treats empty entries as zeros and outputs highly skewed results, which leads to high reconstruction error. Moreover, CMTF-Tucker-ALS does not scale to large data because of the high memory requirement caused by the M-bottleneck problem [17]. CMTF-OPT [12] is a CMTF method based on CANDECOMP/PARAFAC (CP) decomposition [18]. SDF [19] provided Quasi-Newton and nonlinear least squares optimization techniques for general coupled factorization problems where factors may have certain structures such as Toeplitz, orthogonal, and nonnegative. CMTF-Tucker-ALS and CMTF-OPT suffer from high reconstruction error since the former is not applicable to sparse data, and the latter focuses only on the CP model and thus cannot be generalized to the Tucker model. Furthermore, both methods are sequential and cannot easily benefit from multi-core parallelization.

In this paper, we propose S³CMTF (Sparse, lock-free SGD based, and Scalable CMTF), a CMTF method which resolves the problems of previous methods. S³CMTF provides parallel, sparse CMTF based on Tucker factorization unlike previous methods which do not support sparse tensors or cannot be parallelized. We also show that asynchronously parallel stochastic gradient descent (SGD) is useful for S³CMTF in multi-core shared memory systems without expensive locking. S³CMTF further boosts the performance by storing intermediate computation and reusing them. Table 1 shows the comparison of S³CMTF and other existing methods. The main contributions of our study are as follows:

Algorithm: We propose S³CMTF, a coupled tensor-matrix factorization algorithm for matrix-tensor joint datasets. S³CMTF is designed to efficiently extract factors from the joint datasets by taking advantage of sparsity, exploiting intermediate data. We propose a method which resolves conflicts of parallelization and leads to a solution with guaranteed convergence.
Performance: S³CMTF shows the best performance on accuracy, speed, and scalability. S³CMTF runs up to 930× faster and is more scalable than existing methods as shown in Fig 1A. For real-world datasets, S³CMTF converges faster to the better optimum as shown in Fig 1B.
Discovery: Applying S³CMTF on Yelp review dataset with a 3-mode tensor (user, business, time) coupled with 3 additional matrices ((user, user), (business, category), and (business, city)), we observe interesting patterns and clusters of businesses and suggest a process for personal recommendation.


Table 1. Comparison of our proposed S³CMTF and the existing CMTF methods.

S³CMTF outperforms all other methods in terms of time, accuracy, scalability, memory usage, and parallelizability.

https://doi.org/10.1371/journal.pone.0217316.t001


Fig 1. Comparison of our proposed S³CMTF and the existing methods.

(a) For a fixed number of nonzeros, S³CMTF takes constant time as dimensionality grows, while existing methods become slower. Our sequential method S³CMTF-opt1 is 930× and 54× faster than CMTF-OPT and CMTF-Tucker-ALS, respectively. (b) S³CMTF-opt20 shows the best convergence rate and accuracy on the real-world Yelp dataset. CMTF-Tucker-ALS shows O.O.M. in both experiments (O.O.M.: out of memory error).

https://doi.org/10.1371/journal.pone.0217316.g001

Preliminaries and related works

In this section, we describe preliminaries for tensor and coupled matrix-tensor factorization. We list all symbols used in this paper in Table 2.


Table 2. Table of symbols.

https://doi.org/10.1371/journal.pone.0217316.t002

Tensor

A tensor is a multi-dimensional array. Each ‘dimension’ of a tensor is called a mode or way. The length of each mode is called its ‘dimensionality’ and is denoted by I₁, ⋯, I_N. In this paper, an N-mode or N-way tensor is denoted by a boldface Euler script capital (e.g. 𝒳 ∈ ℝ^{I₁×⋯×I_N}), and matrices are denoted by boldface capitals (e.g. A). x_α and a_β denote the entries of 𝒳 and A with indices α and β, respectively.

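We describe tensor operations used in this paper. A mode-n fiber is a vector which has fixed indices except for the n-th index in a tensor. The mode-n matrix product of a tensor 𝒳 ∈ ℝ^{I₁×⋯×I_N} with a matrix A ∈ ℝ^{J×I_n} is denoted by 𝒳 ×_n A and has the size of I₁×⋯×I_{n−1}×J×I_{n+1}×⋯×I_N. It is defined as:

$$(\mathcal{X} \times_n \mathbf{A})_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, a_{j i_n} \qquad (1)$$

where a_{j i_n} is the (j, i_n)-th entry of A. For brevity, we use the following shorthand notation for multiplication on every mode as in [20]:

$$\mathcal{X} \times \{\mathbf{A}\} := \mathcal{X} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \qquad (2)$$

where {A} denotes the ordered set {A⁽¹⁾, A⁽²⁾, ⋯, A^(N)}.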
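To make the mode-n matrix product concrete, the following minimal Python sketch computes it for a tensor stored as a dictionary of nonzero entries. The function name `mode_n_product` and the dictionary layout are illustrative assumptions rather than part of S³CMTF, and modes are 0-based here whereas the text counts them from 1.

```python
def mode_n_product(x, shape, a, n):
    """Mode-n product of a sparse tensor with a matrix.

    x     -- dict mapping index tuples (i_1, ..., i_N) to nonzero values
    shape -- tuple (I_1, ..., I_N) with the length of each mode
    a     -- J x I_n matrix given as a list of J rows
    n     -- 0-based mode index (hypothetical convention for this sketch)
    """
    if any(len(row) != shape[n] for row in a):
        raise ValueError("matrix must have I_n columns")
    j_dim = len(a)
    out = {}
    for idx, val in x.items():
        for j in range(j_dim):
            # Replace the n-th index i_n by j and accumulate x * a[j][i_n].
            new_idx = idx[:n] + (j,) + idx[n + 1:]
            out[new_idx] = out.get(new_idx, 0.0) + val * a[j][idx[n]]
    new_shape = shape[:n] + (j_dim,) + shape[n + 1:]
    return out, new_shape


# A 2 x 2 tensor (a matrix) multiplied along its second mode by a 3 x 2 matrix.
x = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y, y_shape = mode_n_product(x, (2, 2), a, n=1)
```

The result has size 2 × 3: the multiplied mode takes the row count J of the matrix, while every other mode keeps its length.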

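We use the following notation for multiplication on every mode except the n-th mode:

$$\mathcal{X} \times_{-n} \{\mathbf{A}\} := \mathcal{X} \times_1 \mathbf{A}^{(1)} \cdots \times_{n-1} \mathbf{A}^{(n-1)} \times_{n+1} \mathbf{A}^{(n+1)} \cdots \times_N \mathbf{A}^{(N)}$$

We examine the case that an ordered set of row vectors {a⁽¹⁾, a⁽²⁾, ⋯, a^(N)}, denoted by {a}, is multiplied to a tensor 𝒳. First, consider the multiplication for every corresponding mode. By Eq (1),

$$\mathcal{X} \times \{\mathbf{a}\} = \sum_{\alpha \in \Omega_{\mathcal{X}}} x_\alpha \prod_{m=1}^{N} a^{(m)}_{i_m}$$

where Ω_𝒳 denotes the index set of 𝒳 and a⁽ᵐ⁾_k denotes the k-th element of a^(m). Then, consider the multiplication for every mode except the n-th mode. Such multiplication results in a vector of length I_n. The k-th entry of the vector is

$$\left[\mathcal{X} \times_{-n} \{\mathbf{a}\}\right]_k = \sum_{\alpha \in \Omega^{(n)}_{k}} x_\alpha \prod_{m \neq n} a^{(m)}_{i_m} \qquad (3)$$

where Ω⁽ⁿ⁾_k denotes the index set of 𝒳 having its n-th index as k. α = (i₁ i₂⋯i_N) denotes the index for an entry.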

Tucker decomposition

Tucker decomposition is one of the most popular tensor factorization models. Tucker decomposition factorizes an N-mode tensor 𝒳 ∈ ℝ^{I₁×⋯×I_N} into a core tensor 𝒢 ∈ ℝ^{J₁×⋯×J_N} and factor matrices U⁽ⁿ⁾ ∈ ℝ^{I_n×J_n} (n = 1, ⋯, N) satisfying

$$\mathcal{X} \approx \tilde{\mathcal{X}} = \mathcal{G} \times \{\mathbf{U}\}$$

The element-wise formulation of the Tucker model is

$$x_\alpha \approx \tilde{x}_\alpha = \mathcal{G} \times \{\mathbf{u}\}_\alpha = \sum_{\beta = (j_1 \cdots j_N)} g_\beta \prod_{n=1}^{N} u^{(n)}_{i_n j_n} \qquad (4)$$

where α is a tensor index (i₁i₂⋯i_N), and u⁽ⁿ⁾_{i_n} denotes the i_n-th row of factor matrix U⁽ⁿ⁾. {u}_α denotes the set of factor rows {u⁽¹⁾_{i₁}, ⋯, u^(N)_{i_N}}. The core tensor 𝒢 indicates the relation between the factors in the Tucker formulation. When the core tensor size is restricted as J₁ = J₂ = ⋯ = J_N and the core tensor structure is hyper-diagonal, it is equivalent to CANDECOMP/PARAFAC (CP) decomposition. An orthogonality constraint can optionally be imposed on the Tucker decomposition by forcing the factor matrices to have orthonormal columns (i.e. U⁽ⁿ⁾ᵀU⁽ⁿ⁾ = I for n = 1, ⋯, N where I is an identity matrix).
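The element-wise Tucker model can be illustrated with a short Python sketch: one reconstructed entry is the sum, over all core positions, of a core value times one factor value per mode. The name `tucker_entry` and the dictionary-based core are assumptions made for this illustration only; indices are 0-based.

```python
def tucker_entry(core, factors, alpha):
    """Reconstruct one tensor entry with the element-wise Tucker model.

    core    -- dict mapping core indices (j_1, ..., j_N) to core values
    factors -- list of N factor matrices; factors[n][i] is the i-th row of U^(n)
    alpha   -- tensor index (i_1, ..., i_N), 0-based
    """
    total = 0.0
    for beta, g in core.items():
        term = g
        for n, (i_n, j_n) in enumerate(zip(alpha, beta)):
            term *= factors[n][i_n][j_n]
        total += term
    return total


factors = [
    [[1.0, 2.0], [3.0, 4.0]],  # U^(1): 2 x 2
    [[5.0, 6.0], [7.0, 8.0]],  # U^(2): 2 x 2
]
full_core = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
# A hyper-diagonal core reduces the model to CP: a sum of rank-one terms.
diag_core = {(0, 0): 1.0, (1, 1): 1.0}

x_tucker = tucker_entry(full_core, factors, (0, 0))  # 5 + 12 + 30 + 48 = 95.0
x_cp = tucker_entry(diag_core, factors, (0, 1))      # 1*7 + 2*8 = 23.0
```

With the hyper-diagonal core, only matching column indices contribute, which is the CP special case mentioned above.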

Coupled matrix-tensor factorization

Coupled matrix-tensor factorization (CMTF) is proposed for joint factorization of a tensor and matrices. CMTF integrates matrix factorization and tensor factorization.

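Definition 1. (Coupled Matrix-Tensor Factorization) Given an N-mode tensor 𝒳 ∈ ℝ^{I₁×⋯×I_N} and a matrix Y ∈ ℝ^{I_c×K} where c is the coupled mode, 𝒳̃ = 𝒢 × {U} and Ỹ = U⁽ᶜ⁾Vᵀ are the coupled matrix-tensor factorization. U⁽ᶜ⁾ ∈ ℝ^{I_c×J_c} is the c-th mode factor matrix, and V ∈ ℝ^{K×J_c} denotes the factor matrix for the coupled matrix. Finding the factor matrices and core tensor for CMTF is equivalent to solving

$$\min_{\mathcal{G},\, \mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(N)},\, \mathbf{V}} \; \|\mathcal{X} - \mathcal{G} \times \{\mathbf{U}\}\|^2 + \|\mathbf{Y} - \mathbf{U}^{(c)} \mathbf{V}^{T}\|^2 \qquad (5)$$

where ‖ • ‖ denotes the Frobenius norm.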

Various methods have been proposed to efficiently solve the CMTF problem. An alternating least squares (ALS) method CMTF-Tucker-ALS [15] was proposed. CMTF-Tucker-ALS is based on Tucker-ALS (HOOI) [21] which is a popular method for fitting the Tucker model. Tucker-ALS suffers from a crucial intermediate memory-bottleneck problem known as M-bottleneck problem [17] that arises from materialization of a large dense tensor as intermediate data where . Generalized coupled tensor factorization frameworks [22, 23] have been proposed, and they propose multiplicative methods for non-negative factorization. SDF [19] provided Quasi-Newton and nonlinear least squares optimization techniques for general coupled factorization problems where factors may have certain structures as Toeplitz, orthogonal and nonnegative. A Bayesian method [24] has been proposed. It suggests a generative model for tensor factorization and gets parameters with Gibbs sampling method. Most methods for CMTF use CP decomposition model for where J₁ = J₂ = ⋯ = J_N and the core tensor is hyper-diagonal [12, 25, 26, 27, 28, 19]. CMTF-OPT [12] is a representative algorithm for this problem which uses nonlinear conjugate gradient descent method to find factors. HaTen2 [26, 29], and SCouT [25] propose distributed methods for CMTF using CP decomposition model based on the MapReduce framework. Turbo-SMT [27] provides a time-boosting technique for CP-based CMTF methods.

Note that Eq (5) requires all data entries of 𝒳 and Y to be observed. When 𝒳 and Y are sparse, unobserved values are set to zero, which results in low accuracy. However, most real-world datasets are highly sparse. For example, the density of the real-world tensors we use for experiments varies from 10⁻⁷ to 10⁻⁴. For this reason, the above methods show low accuracy on real-world sparse data; this paper focuses on solving CMTF for sparse data.

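Definition 2. (Sparse CMTF) When 𝒳 and Y are sparse, sparse CMTF aims to find factors only considering the observed entries. Let 𝒲⁽¹⁾ indicate the observed entries of 𝒳 such that

$$w^{(1)}_\alpha = \begin{cases} 1 & \text{if } x_\alpha \text{ is observed} \\ 0 & \text{otherwise} \end{cases}$$

Let W⁽²⁾ indicate the observed entries of Y analogously. We modify Eq (5) as

$$\min_{\mathcal{G},\, \mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(N)},\, \mathbf{V}} \; \|\mathcal{W}^{(1)} * (\mathcal{X} - \mathcal{G} \times \{\mathbf{U}\})\|^2 + \|\mathbf{W}^{(2)} * (\mathbf{Y} - \mathbf{U}^{(c)} \mathbf{V}^{T})\|^2 \qquad (6)$$

where * denotes the Hadamard product (element-wise product).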
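Because the indicator weights zero out every unobserved position, the sparse objective can be evaluated by looping over observed entries only. The sketch below illustrates this under stated assumptions: the names `sparse_cmtf_loss` and `tucker_entry` are hypothetical, the coupled matrix is assumed to be approximated by U⁽ᶜ⁾Vᵀ, and no regularization is included.

```python
def tucker_entry(core, factors, alpha):
    """Element-wise Tucker reconstruction of one tensor entry."""
    total = 0.0
    for beta, g in core.items():
        term = g
        for n, j_n in enumerate(beta):
            term *= factors[n][alpha[n]][j_n]
        total += term
    return total


def sparse_cmtf_loss(x_obs, y_obs, core, factors, v, c):
    """Sum of squared errors over observed tensor and matrix entries only.

    x_obs -- dict {(i_1, ..., i_N): value} of observed tensor entries
    y_obs -- dict {(i, k): value} of observed coupled-matrix entries
    v     -- K x J_c factor matrix of the coupled matrix (list of rows)
    c     -- 0-based coupled mode (assumed convention for this sketch)
    """
    loss = 0.0
    for alpha, x_val in x_obs.items():
        loss += (x_val - tucker_entry(core, factors, alpha)) ** 2
    for (i, k), y_val in y_obs.items():
        # Assumed model of the coupled matrix: y_ik ~ <row i of U^(c), row k of V>.
        y_hat = sum(u * w for u, w in zip(factors[c][i], v[k]))
        loss += (y_val - y_hat) ** 2
    return loss


factors = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
core = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
v = [[1.0, 1.0]]
x_obs = {(0, 0): 95.0, (1, 1): 280.0}  # model values are 95 and 281
y_obs = {(1, 0): 5.0}                  # model value is 3 + 4 = 7
loss = sparse_cmtf_loss(x_obs, y_obs, core, factors, v, c=0)  # 0 + 1 + 4 = 5.0
```

The cost of one evaluation grows with the number of observed entries rather than with the full size I₁ × ⋯ × I_N of the tensor.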

CMTF-Tucker-ALS does not support sparse CMTF since it calculates the singular vectors of a full, dense matrix. CMTF-OPT provides a single-machine approach to sparse CMTF for the CP model, and CDTF [30] and FlexiFaCT [28] provide distributed methods for sparse CMTF for the CP model. Note that all of these existing sparse CMTF methods are based on the CP model. Our method addresses the more general Tucker decomposition setting and is also easily applied to the CP model.

Proposed method

Overview

S³CMTF provides an algorithm for the joint factorization of Tucker decomposition. The major challenge of parallel Tucker decomposition is to avoid the race condition, and design an efficient algorithm for updating factors.

In this section, we describe S³CMTF (Sparse, lock-free SGD based, and Scalable CMTF), our proposed method for fast, accurate, and scalable CMTF. Our purpose is to minimize the number of race conditions with probabilistic guarantee by exploiting problem characteristic and minimize calculations by exploiting intermediate data.

We first propose a lock-free parallel method S³CMTF-base; then, we propose a time-improved version S³CMTF-opt. Fig 2 shows the overall scheme of S³CMTF. S³CMTF-base employs asynchronous parallel SGD for the parallel update with proper workload distribution, and S³CMTF-opt further improves the speed of S³CMTF-base by exploiting intermediate data and reusing them.


Fig 2. The scheme for S³CMTF.

https://doi.org/10.1371/journal.pone.0217316.g002

Objective function & gradient

We discuss the improved formulation of the sparse CMTF problem defined in Definition 2. For simplicity, we consider the case that one matrix Y is coupled to the c-th mode of a tensor 𝒳. Naive calculation of Eq (6) takes excessive time and memory since it materializes a dense tensor. Therefore, we re-formulate the CMTF objective function f to exploit the sparsity of data and to add regularization. f is the weighted sum of two functions f_t and f_m, which are element-wise sums of squared reconstruction errors and regularization terms of tensor 𝒳 and matrix Y, respectively. (7) where λ_m is a balancing factor of the two functions. where α = (i₁⋯i_N), Ω_𝒳 is the observable index set of 𝒳, and λ_reg denotes the regularization parameter for factors.

We rewrite the equation so that it is amenable to SGD update: where α = (i₁⋯i_N). Note that is the subset of Ω_𝒳 having i_n as the n-th index. Now we formulate f_m, the sum of squared errors of the coupled matrix and the regularization term corresponding to the coupled matrix.

We calculate the gradient of f (Eq (7)) with respect to the factors and the core for the stochastic gradient descent update. Suppose we pick one tensor index α ∈ Ω_𝒳 and one matrix index β = (j₁j₂) ∈ Ω_Y. We calculate the corresponding partial derivatives of f with respect to the factors and the core tensor as follows. (8a) (8b) (8c) (8d)

Note that our coupled matrix-tensor factorization model is easily generalized to the case where multiple matrices are coupled to a tensor. We couple multiple matrices to a tensor in the sections on experiments and discovery.

Multi-core parallelization

How can we parallelize the SGD updates for CMTF across multiple cores? In CMTF, SGD is hard to parallelize without conflicts since concurrent updates may attempt to write the core tensor to memory at the same time [31]. One solution to this problem is memory locking and synchronization. However, locking incurs considerable overhead. Therefore, we use a lock-free strategy to parallelize S³CMTF. We develop a parallel update scheme for S³CMTF by adopting the HOGWILD! update scheme [32]. For any SGD problem, a hypergraph is induced whose nodes represent parameters and whose hyperedges represent the sets of parameters related to each data point.

Definition 3. (Induced Hypergraph) The objective function in Eq (7) induces a hypergraph G = (V, E) whose nodes represent factor rows and the core tensor. Each entry of 𝒳 and Y induces a hyperedge e ∈ E consisting of the corresponding factor rows or core tensor. Fig 3A shows an example induced hypergraph of S³CMTF.


Fig 3. Example hypergraphs induced by S³CMTF objective function (Eq (7)).

A matrix Y is coupled to the second mode of 𝒳 with a coupled factor matrix V. Each node represents a factor row or the core tensor. Each hyperedge includes the factors involved in one SGD update. (a) Induced hypergraph with the core tensor. Every hyperedge corresponding to a tensor entry includes 𝒢. (b) Induced hypergraph without the core tensor. The graph has a sparse structure, as every node is shared by only a few hyperedges.

https://doi.org/10.1371/journal.pone.0217316.g003

Lock-free parallel updates often converge nearly linearly for a sparse SGD problem in which conflicts between different updates rarely occur [32]. However, in CMTF with Tucker formulation, every update of tensor entries includes the core tensor as shown in Fig 3A. We therefore allocate the update of the core tensor to one dedicated CPU core and multiply its step size by the number of cores P to keep the expected step size unchanged, which leads to line 7 of Algorithm 1 described in the next section. We then obtain the new induced hypergraph in Fig 3B. The previous induced hypergraph (Fig 3A) implies that all factor updates (red, blue, and orange hyperedges) conflict with each other when updating the core tensor, resulting in unexpected behavior. In contrast, the new induced hypergraph shows that the update of factors is independent of that of the core tensor.

Note that our problem with this induced hypergraph is a general case of matrix completion problem in [32] which provides convergence guarantee of lock-free parallelism; each edge in our hypergraph entails N vertices, while that in [32] entails only 2 vertices.
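The effect of removing the core tensor from the hyperedges can be illustrated with a small Python sketch. It builds the hyperedges of Definition 3 for a toy set of entries and reports the largest number of hyperedges sharing one node, a rough proxy for how often concurrent updates could collide. The node labels and function names are assumptions made for this illustration, not part of S³CMTF.

```python
from collections import Counter


def induced_hyperedges(tensor_indices, matrix_indices, c, include_core):
    """Build one hyperedge (a set of parameter nodes) per observed entry.

    Nodes are labelled ("U", mode, row) for factor rows, ("V", row) for rows
    of the coupled factor matrix, and ("G",) for the core tensor.
    c is the 0-based coupled mode (assumed convention for this sketch).
    """
    edges = []
    for alpha in tensor_indices:
        edge = {("U", n, i_n) for n, i_n in enumerate(alpha)}
        if include_core:
            edge.add(("G",))
        edges.append(frozenset(edge))
    for i, k in matrix_indices:
        edges.append(frozenset({("U", c, i), ("V", k)}))
    return edges


def max_node_degree(edges):
    """Largest number of hyperedges that share a single node."""
    degree = Counter(node for edge in edges for node in edge)
    return max(degree.values())


tensor_indices = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
matrix_indices = [(0, 0)]
with_core = induced_hyperedges(tensor_indices, matrix_indices, c=1, include_core=True)
without_core = induced_hyperedges(tensor_indices, matrix_indices, c=1, include_core=False)
```

In this toy case the core node belongs to every tensor hyperedge (degree 3), whereas without it the most shared node is a coupled factor row that appears in one tensor entry and one matrix entry (degree 2).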

Algorithm 1 S³CMTF-base

Require: Tensor 𝒳 ∈ ℝ^{I₁×⋯×I_N}, rank (J₁, ⋯, J_N), number of parallel cores P, initial learning rate η₀, decay rate μ, coupled mode c, and coupled matrix Y

Ensure: Core tensor 𝒢, factor matrices U⁽¹⁾, ⋯, U^(N), V

1: Initialize 𝒢, U⁽ⁿ⁾ for n = 1, ⋯, N, and V randomly

2: repeat

3: for , ∀β = (j₁j₂) ∈ Ω_Y in random order do in parallel

4: if α is picked then

5: (,⋯,,) ←compute_gradient(α,x_α,)

6: , (for n = 1, ⋯, N)

7: (executed by one dedicated CPU core)

8: end if

9: if β is picked then

10: ,

11:

12: ,

13: end if

14: end for

15: η_t = η₀(1 + μt)⁻¹

16: until convergence conditions are satisfied

17: for n = 1, …, N do

18: Q⁽ⁿ⁾,R⁽ⁿ⁾ ← QR decomposition of U⁽ⁿ⁾

19: U⁽ⁿ⁾ ← Q⁽ⁿ⁾,

20: end for

21:

22: return , U⁽¹⁾, ⋯, U^(N), V

S³CMTF-base

We present our method, S³CMTF-base, which combines the aforementioned techniques. S³CMTF-base solves the sparse CMTF problem with the parallel SGD techniques explained above. Algorithm 1 shows the procedure of S³CMTF-base. In the beginning, S³CMTF-base initializes the factor matrices and the core tensor randomly (line 1 of Algorithm 1). The outer loop (lines 2-16) repeats until the factor variables converge. The inner loop (lines 3-14) is performed by several cores in parallel. In each inner loop, S³CMTF-base selects an index which belongs to Ω_𝒳 or Ω_Y in random order (line 3). If a tensor index α is picked, then the algorithm calculates the partial gradients of the corresponding factor rows using compute_gradient (Algorithm 2) in line 5, and updates the factor row vectors (line 6). The core tensor 𝒢 is updated by one dedicated CPU core (line 7). Note that if line 7 is run by multiple cores, a core may interrupt another core’s update of 𝒢 by overwriting the gradient ∂f/∂𝒢, which leads to an unexpected update of 𝒢 and hinders convergence; thus, we eliminate the possibility of such conflict by allocating the update of 𝒢 to the dedicated CPU core. The update of line 7 is done independently by the dedicated CPU core, but concurrently with the gradient calculation (line 5) and factor updates (line 6) of other CPU cores. The gradient is multiplied by the number P of cores to compensate for the one-core update so that SGD uses the same expected learning rate for all the parameters. If a coupled matrix index β is picked, then the gradient update is performed on the corresponding factor row vectors (lines 9-13). At the end of the outer loop, the learning rate η_t of the t-th iteration is monotonically decreased [33] (line 15). QR decomposition is applied to the factors to satisfy the orthogonality constraint on the factor matrices (lines 17-20). QR decomposition of U⁽ⁿ⁾ generates Q⁽ⁿ⁾, a matrix with orthonormal columns of the same size as U⁽ⁿ⁾, and a square matrix R⁽ⁿ⁾ ∈ ℝ^{J_n×J_n}. Substituting U⁽ⁿ⁾ by Q⁽ⁿ⁾ (line 19) and 𝒢 by 𝒢 × {R} (after the N-th execution of line 19) results in orthogonal factors with equivalent factorization quality [5]. In the same manner, we substitute V by VR⁽ᶜ⁾ᵀ (line 21) since U⁽ᶜ⁾Vᵀ = Q⁽ᶜ⁾R⁽ᶜ⁾Vᵀ = Q⁽ᶜ⁾(VR⁽ᶜ⁾ᵀ)ᵀ.
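The tensor-entry branch of the procedure can be sketched sequentially in Python. This is a minimal illustration under stated assumptions, not the authors' C++ implementation: it assumes a per-entry loss of 0.5 · (x_α − x̃_α)² with no regularization term, it runs on a single thread, and the names `learning_rate`, `predict`, and `sgd_step` are hypothetical. The decay schedule follows line 15 of Algorithm 1, and the core step is scaled by the number of cores as described for line 7.

```python
def learning_rate(eta0, mu, t):
    """Decayed step size eta_t = eta0 / (1 + mu * t) (line 15 of Algorithm 1)."""
    return eta0 / (1.0 + mu * t)


def predict(core, factors, alpha):
    """Element-wise Tucker reconstruction of one tensor entry."""
    total = 0.0
    for beta, g in core.items():
        term = g
        for n, j_n in enumerate(beta):
            term *= factors[n][alpha[n]][j_n]
        total += term
    return total


def sgd_step(core, factors, alpha, x_val, eta, num_cores=1):
    """Apply one in-place SGD update for the tensor entry (alpha, x_val).

    Assumption of this sketch: loss 0.5 * (x - x_hat)^2, no regularization.
    """
    err = x_val - predict(core, factors, alpha)
    n_modes = len(alpha)
    # Compute every gradient from the current parameters before updating any.
    row_grads = [[0.0] * len(factors[n][alpha[n]]) for n in range(n_modes)]
    core_grads = {}
    for beta, g in core.items():
        coeffs = [factors[n][alpha[n]][beta[n]] for n in range(n_modes)]
        full = 1.0
        for coef in coeffs:
            full *= coef
        core_grads[beta] = -err * full
        for n in range(n_modes):
            partial = g
            for m in range(n_modes):
                if m != n:
                    partial *= coeffs[m]
            row_grads[n][beta[n]] += -err * partial
    for n in range(n_modes):
        row = factors[n][alpha[n]]
        for j in range(len(row)):
            row[j] -= eta * row_grads[n][j]
    # The core step is scaled by the number of cores, mirroring line 7.
    for beta in core:
        core[beta] -= eta * num_cores * core_grads[beta]
    return err


core = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.5}
factors = [[[0.5, 0.5]], [[0.5, 0.5]]]
error_before = abs(1.0 - predict(core, factors, (0, 0)))
sgd_step(core, factors, (0, 0), 1.0, eta=learning_rate(0.1, 0.1, 0))
error_after = abs(1.0 - predict(core, factors, (0, 0)))
```

In a real multi-core run the factor-row updates would execute concurrently without locks while a single core applies the core update; this sketch only shows the arithmetic of one step.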

Algorithm 2 compute_gradient(α,x_α,)

Require: Tensor entry x_α, , core tensor

Ensure: Gradients ,,⋯,,

1:

2: for n = 1, ⋯, N do

3:

4: end for

5:

6: return ,,⋯,,

Algorithm 3 compute_gradient_opt(α,x_α,)

Require: Tensor entry x_α, , core tensor

Ensure: Gradients ,,⋯,,

1:

2: for do

3:

4:

5: end for

6: for n = 1, …, N do

7:

8: end for

9:

10: return ,,…,,

S³CMTF-opt

There is much room for improvement in calculations of S³CMTF-base. The computational bottleneck of S³CMTF-base is compute_gradient. There are implicitly redundant calculations during multiple tensor-matrix products. For example, calculation of is repeated N times for every execution of compute_gradient (Algorithm 2) in line 5 of Algorithm 1. The calculation of for the n-th mode is equivalent to a special case of a well-studied operation, matricized tensor times Khatri-Rao product (MTTKRP). MTTKRP is an operation to compute X_(n) ⊙_{∀k ≠ n} A^(k) where X_(n) is a matricized tensor along the n-th mode, and ⊙ denotes the Khatri-Rao product [34]. is equivalent to an MTTKRP G_(n) ⊙_{∀k ≠ n} u^(k) where the matrix A^(k) is replaced by the vector u^(k).

Calculating MTTKRP along all modes is known as the CP gradient problem. In compute_gradient, we need to calculate for all N modes (line 3 of Algorithm 2), raising the special case of the CP gradient problem. To solve the particular CP gradient problem faster, we propose a method to avoid redundant computations by reusing the intermediate calculations in previous steps. Calculation of is equivalent to a summation of (Eq 3) which is a product of the core value and N − 1 related factor values. Before the calculation of the CP gradient, is calculated in line 1 of Algorithm 2. We exploit the fact that is the summation of the product (Eq 4), the product of a core value and all N related factor values. In S³CMTF-opt, we save time by storing the intermediate calculations for and reusing them.

Definition 4. (Intermediate Data) When updating the factor rows for a tensor entry , we define (j₁j₂⋯j_N)-th element of intermediate data :

There is no extra time required for calculating because is generated while calculating . Lemma 1 shows that is calculated by summing all entries of .

Lemma 1. For a given tensor index α, the estimated tensor entry .

Proof. The proof is straightforward by Eq (4).

We use with following Collapse operation to calculate gradients efficiently.

Definition 5. (Collapse) The Collapse operation of the intermediate tensor on the n-th mode outputs a row vector defined as:

Collapse operation aggregates the elements of intermediate tensor with respect to a fixed mode. We re-express the calculation of gradients for tensor factors in Eqs (8a)–(8d) in an efficient manner.

Lemma 2. (Efficient Gradient Calculation) The following statements are equivalent calculations of the gradients as in Eqs (8a)–(8d). (9a) (9b) (9c) where α = (i₁ i₂⋯i_N), and ⊘ denotes element-wise division.

Proof. In Lemma 1, Eq (9a) is proved. To prove the equivalence of Eq (9b) and the Eq (8a), it suffices to show We use Eq (3) for the proof. and . Next, to show the equivalence of Eq (9c) and the second equation of Eq (8), it suffices to show .
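The reuse idea behind Lemmas 1 and 2 can be checked numerically with a small Python sketch. It assumes, following the description above, that each element of the intermediate data is the product of one core value and all N related factor values, that Collapse sums the intermediate data over every mode except the n-th, and that the factor values used in the element-wise division are nonzero. All function names are hypothetical.

```python
def intermediate_data(core, factors, alpha):
    """One stored product per core position: core value times N factor values."""
    inter = {}
    for beta, g in core.items():
        term = g
        for n, j_n in enumerate(beta):
            term *= factors[n][alpha[n]][j_n]
        inter[beta] = term
    return inter


def collapse(inter, n, j_dim):
    """Sum the intermediate data over all modes except the n-th."""
    row = [0.0] * j_dim
    for beta, val in inter.items():
        row[beta[n]] += val
    return row


def naive_mode_vector(core, factors, alpha, n, j_dim):
    """Direct product of the core with every factor row except the n-th."""
    row = [0.0] * j_dim
    for beta, g in core.items():
        term = g
        for m, j_m in enumerate(beta):
            if m != n:
                term *= factors[m][alpha[m]][j_m]
        row[beta[n]] += term
    return row


factors = [[[1.0, 2.0]], [[5.0, 6.0]]]
core = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
alpha = (0, 0)

inter = intermediate_data(core, factors, alpha)
x_hat = sum(inter.values())  # summing all stored products gives the estimate
collapsed = collapse(inter, 0, 2)
# Element-wise division by the mode-0 factor row recovers the naive result.
reused = [s / u for s, u in zip(collapsed, factors[0][alpha[0]])]
direct = naive_mode_vector(core, factors, alpha, 0, 2)
```

Here the stored products are 5, 12, 30, and 48, so the estimate is 95, the collapsed vector is [17, 78], and dividing by the factor row [1, 2] gives [17, 39], the same as the direct computation. The division step needs care in practice whenever a factor value is zero or very small.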

S³CMTF-opt replaces compute_gradient (Algorithm 2) of S³CMTF-base with compute_gradient_opt (Algorithm 3), the time-optimized alternative. We prove that the new calculation scheme is faster than the previous one.

Lemma 3. compute_gradient_opt is faster than compute_gradient. The theoretical time complexity of compute_gradient is and the time complexity of compute_gradient_opt is where J₁ = J₂ = ⋯ = J_N = J.

Proof. We assume that I₁ = I₂ = ⋯ = I_N = I for brevity. First, we calculate the time complexity of compute_gradient (Algorithm 2). Given a tensor index α, computing (line 1 of Algorithm 2) takes . Computing () (line 3) takes . Thus, aggregate time for calculating the row gradient for all modes (lines 2-4) takes . Calculating (line 5) takes . In total, compute_gradient takes time. Next, we calculate the time complexity of compute_gradient_opt (Algorithm 3). Computing an entry of intermediate data (line 3 of Algorithm 3) takes . Aggregate time for getting (lines 2-5) is since . Calculating row gradient for all modes (lines 6-8) takes since Collapse operation takes . Calculating gradient for core tensor (line 9) takes . In total, compute_gradient_opt takes time.

Analysis

We analyze the proposed method in terms of time complexity per iteration. For simplicity, we assume that I₁ = I₂ = ⋯ = I_N = I, and J₁ = J₂ = ⋯ = J_N = J. Table 3 summarizes the time complexity (per iteration) and memory usage of S³CMTF and other methods. Note that the memory usage refers to the auxiliary space for temporary variables used by a method.


Table 3. Comparison of time complexity (per iteration) and memory usage of our proposed S³CMTF and other CMTF algorithms.

S³CMTF-opt shows the lowest time complexity and S³CMTF-base shows the lowest memory usage. For simplicity, we assume that all modes are of size I, of rank J, and an I × K matrix is coupled to one mode. P is the number of parallel cores. (* indicates the lowest time or memory).

https://doi.org/10.1371/journal.pone.0217316.t003

Lemma 4. The time complexity (per iteration) of S³CMTF-base is and the time complexity (per iteration) of S³CMTF-opt is where P denotes the number of parallel cores.

Proof. First, we check the time complexity of S³CMTF-base. When a tensor index α is picked in the inner loop (line 4 of Algorithm 1), calculating gradients with respect to tensor factors (line 5) takes as shown in Lemma 3. Updating factor rows (line 6) takes , and updating core tensor (line 7) takes . If a coupled matrix index β is picked (line 9), calculating (line 10) takes . Calculating and updating the factor rows corresponding to coupled matrix entry (lines 10-12) take . All calculations except updating core tensor (line 7) are conducted in parallel. Finally, for all and β ∈ Ω_Y, S³CMTF-base takes for one iteration. S³CMTF-opt uses compute_gradient_opt instead of compute_gradient in line 5 of Algorithm 1, whose time complexity is shown in Lemma 3. Overall running time per iteration for S³CMTF-opt is .

Experiments

In this and the next section, we experimentally evaluate S³CMTF. In particular, we answer the following questions.

Q1: Performance How accurate and fast is S³CMTF compared to competitors?

Q2: Scalability How do S³CMTF and other methods scale in terms of dimensionality, the number of observed entries, and the number of cores?

Q3: Discovery What are the discoveries of applying S³CMTF on real-world data?

The source codes of our method and datasets used in this paper are available at https://datalab.snu.ac.kr/S3CMTF.

Experimental settings

Data.

Table 4 shows the data we used in our experiments. We use three real-world datasets, MovieLens (http://grouplens.org/datasets/movielens/10m), Netflix (http://www.netflixprize.com), and Yelp (http://www.yelp.com/dataset_challenge), as well as synthetic data to evaluate S³CMTF. Each entry of the real-world datasets represents a rating, which consists of (user, ‘item’, time; rating) where ‘item’ indicates ‘movie’ for MovieLens and Netflix, and ‘business’ for Yelp. We use (movie, genre) and (movie, year) as coupled matrices for MovieLens and Netflix, respectively. We use (user, user) friendship matrix, (business, category) and (business, city) matrices for Yelp. Particularly for scalability experiments, we generate 3-mode synthetic random tensors with dimensionality I and corresponding coupled matrices to observe speed property while size is varying. We vary I in the range of 1K∼100M and the number of tensor entries in the range of 1K∼100M. We set the number of entries as for synthetic coupled matrices. We generated observed indices randomly, and their entries to follow uniform random distribution between 0 and 1.


Table 4. Summary of the data used for experiments.

‘K’ means thousand, and ‘M’ million. Tensors and matrices of density 1 are fully observed.

https://doi.org/10.1371/journal.pone.0217316.t004

Measure.

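We use test RMSE as the measure for tensor reconstruction error:

$$\text{RMSE} = \sqrt{\frac{1}{|\Omega_{test}|} \sum_{\alpha \in \Omega_{test}} (x_\alpha - \tilde{x}_\alpha)^2}$$

where Ω_test is the index set of the test data tensor, x_α stands for each test tensor entry, and x̃_α is the corresponding reconstructed value.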
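As a minimal illustration of the measure, the following Python sketch computes the root mean squared error over a set of held-out entries. The function name `rmse` and the callable `predict` are assumptions for this example; any model that returns a reconstructed value for an index could be plugged in.

```python
import math


def rmse(test_entries, predict):
    """Root mean squared error over held-out tensor entries.

    test_entries -- dict {(i_1, ..., i_N): observed value}
    predict      -- callable returning the reconstructed value for an index
    """
    if not test_entries:
        raise ValueError("test set must not be empty")
    squared = sum((val - predict(alpha)) ** 2 for alpha, val in test_entries.items())
    return math.sqrt(squared / len(test_entries))


# Toy check: a constant prediction of 2.0 is off by 1 on both test entries.
test_entries = {(0, 0, 0): 1.0, (0, 1, 2): 3.0}
score = rmse(test_entries, lambda alpha: 2.0)  # sqrt((1 + 1) / 2) = 1.0
```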

Methods.

For fair comparison, we compare single-core runs of S³CMTF-base and S³CMTF-opt with other single-machine CMTF methods: CMTF-Tucker-ALS and CMTF-OPT (described in the Preliminaries and related works section). To examine multi-core performance, we run two versions of S³CMTF-opt: S³CMTF-opt1 (1 core) and S³CMTF-opt20 (20 cores). We exclude distributed CMTF methods [25, 26, 28] since they are designed for Hadoop with multiple machines, and thus take too much time in a single-machine environment. For example, [17] reported that HaTen2 [26] takes 10,700s to decompose a 4-way tensor with I = 10K and , which is almost 7,000× slower than our single-machine implementation of S³CMTF-opt. For CMTF-Tucker-ALS, we implemented a C++ version based on Tucker-MET [20], and for CMTF-OPT, we implemented a C++ version of CMTF-OPT [12]. Our implementation of CMTF-OPT solves Eq (6) by sparse matrix operations. We implement S³CMTF in C++. For all of our C++ implementations, we used C++11 with the O2 optimization flag. We used Armadillo 7.700 with LAPACK 3.7.0 and BLAS 3.7.0 for matrix operations such as eigenvector calculations. We used the OpenMP 4.0 library for multi-core parallelization of S³CMTF.

We conduct all experiments on a machine equipped with Intel Xeon E5-2630 v4 2.2GHz CPU and 256GB RAM. We mark out-of-memory (O.O.M.) error when the memory usage exceeds the limit.

Hyperparameters.

We choose the hyperparameters that result in the best reconstruction error on a 10% validation set by random grid search: tensor rank J, regularization factor λ_reg, balancing factor λ_m, initial learning rate η₀, and decay rate μ. We set λ_reg = 0.1, λ_m = 10, and μ = 0.1 for all datasets. For rank and initial learning rate, we use J = 12 and η₀ = 0.001 for MovieLens, J = 11 and η₀ = 0.001 for Netflix, and J = 10 and η₀ = 0.0005 for Yelp. For synthetic datasets, we use J = 10 for all experiments.

Performance of S³CMTF

We observe the performance of S³CMTF to answer Q1. As seen in Figs 1B and 4, S³CMTF converges faster to the optimum with the lowest test error than existing methods with the following details.


Fig 4. Test RMSE of S³CMTF and other CMTF methods over iterations.

S³CMTF-opt20 shows the best convergence rate and accuracy.

https://doi.org/10.1371/journal.pone.0217316.g004

Accuracy.

We divide the entries of each data tensor into a train set (80%) and a test set (20%). A lower error at the same elapsed time implies better accuracy and faster convergence. Figs 1B and 4 show the change in test RMSE of each method on three datasets over elapsed time, which answers Q1. S³CMTF achieves the lowest error compared to the others for the same elapsed time. S³CMTF-opt20 achieves the lowest errors of 1.253, 0.9147, and 0.8037, while the best competing method, CMTF-OPT, gives errors of 1.370, 1.018, and 0.8125 for the Yelp, Netflix, and MovieLens datasets, respectively. Note that the competing method CMTF-Tucker-ALS either yields an O.O.M. error (for Yelp) or results in the highest error.
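The evaluation protocol above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: observed entries are represented as `(index, value)` pairs, and the function names, the fixed seed, and the toy data are all hypothetical.

```python
import math
import random

def split_entries(entries, train_ratio=0.8, seed=0):
    """Shuffle the observed entries and split them into train and test lists."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def rmse(entries, predict):
    """Root mean square error of predict(index) over (index, value) pairs."""
    if not entries:
        raise ValueError("need at least one entry")
    total = sum((value - predict(index)) ** 2 for index, value in entries)
    return math.sqrt(total / len(entries))

# Hypothetical toy data: 10 observed entries of a 3-way rating tensor.
toy = [((i, i % 3, i % 2), float(1 + i % 5)) for i in range(10)]
train, test = split_entries(toy)

# Trivial baseline predictor: always answer with the mean training rating.
mean_rating = sum(value for _, value in train) / len(train)
baseline_rmse = rmse(test, lambda index: mean_rating)
```

Any factorization model can be plugged in through the `predict` callable, so the same `rmse` function serves every method being compared.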

Running time.

We compare our method with the multi-core version of SALS-single [30], a parallel CP decomposition algorithm, to demonstrate the high performance of S³CMTF compared to state-of-the-art decomposition algorithms. We used a non-coupled CP version of our method, S³CMTF-CP-opt, obtained by setting the core tensor to be hyper-diagonal and not coupling any matrices. Fig 5 shows that S³CMTF is better than SALS-single in terms of both error and time for the MovieLens dataset. S³CMTF-TUCKER explicitly denotes S³CMTF-opt for the Tucker model.
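The relation between the two models can be checked numerically: a Tucker model whose core is zero everywhere except on the hyper-diagonal produces the same value as a CP model with the diagonal entries as component weights. The sketch below shows this for one entry of a 3-way tensor. All function names and the toy numbers are illustrative assumptions.

```python
def hyper_diagonal_core(weights):
    """Build a rank x rank x rank core that is zero off the hyper-diagonal."""
    rank = len(weights)
    return [[[weights[a] if a == b == c else 0.0 for c in range(rank)]
             for b in range(rank)] for a in range(rank)]

def tucker_value(core, rows):
    """Tucker model value for one entry, given the three factor rows."""
    a_row, b_row, c_row = rows
    return sum(core[a][b][c] * a_row[a] * b_row[b] * c_row[c]
               for a in range(len(a_row))
               for b in range(len(b_row))
               for c in range(len(c_row)))

def cp_value(weights, rows):
    """CP model value for the same entry: a weighted sum of rank-one terms."""
    a_row, b_row, c_row = rows
    return sum(w * a * b * c for w, a, b, c in zip(weights, a_row, b_row, c_row))

# Hypothetical factor rows of one (i, j, k) entry with rank 2.
rows = ([1.0, 2.0], [0.5, -1.0], [3.0, 0.25])
weights = [1.0, 1.0]
tucker = tucker_value(hyper_diagonal_core(weights), rows)
cp = cp_value(weights, rows)
```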

Fig 5. Comparison with SALS-single for MovieLens dataset.

We compare two non-coupled versions of S³CMTF, S³CMTF-CP-opt and S³CMTF-TUCKER-opt, with the parallel CP decomposition method SALS-single. For (a), we set 1 mark per 20 iterations for clarity. (a) S³CMTF converges faster to a lower error than SALS does. (b) S³CMTF-CP-opt is 2.3× faster than SALS-single.

https://doi.org/10.1371/journal.pone.0217316.g005

Scalability analysis

We present the scalability of our proposed S³CMTF and its competitors to answer Q2, in terms of two aspects: data scalability and parallel scalability. We use synthetic data of varying size for evaluation. We show that the running time (for one iteration) of S³CMTF follows our theoretical analysis.

Data scalability.

The time complexity of CMTF-Tucker-ALS and CMTF-OPT have and as their dominant terms, respectively. In contrast, S³CMTF exploits the sparsity of the input data, and its time complexity is linear in the number of observed entries (of the tensor and of the coupled matrix, |Ω_Y|) and independent of the dimensionality (I), as shown in Lemma 4. Figs 1A and 6A show that the running time (for one iteration) of S³CMTF on real world data sets follows our theoretical analysis. First, we fix the number of tensor entries to 1M and |Ω_Y| to 100K, and vary the dimensionality I from 1K to 100M. Fig 1A shows the running time (for one iteration) of all methods with J = 10. Note that all of our proposed methods achieve constant running time as the dimensionality increases because they exploit the sparsity of the data by updating only the factors related to observed data entries. However, CMTF-Tucker-ALS and CMTF-OPT show exponentially increasing running time, and CMTF-OPT shows O.O.M. when I = 10M. Next, we investigate the data scalability over the number of entries, as shown in Fig 6A. We fix I to 10K and raise the number of tensor entries from 10K to 100M. CMTF-Tucker-ALS shows O.O.M. when , and CMTF-OPT shows near-linear scalability. Focusing on the results of S³CMTF, all three versions of our approach show a linear relation between running time and the number of entries.
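The reason the cost can be independent of I is that a stochastic gradient pass visits only the observed entries and touches only the factor rows those entries name. The sketch below illustrates this with a plain squared-error SGD pass on a 3-way Tucker model. It is not the authors' update rule: it omits regularization, the coupled-matrix term, the orthogonality handling, and parallelism. The dictionary-based row storage, the constant initialization, and all names are assumptions made for the illustration.

```python
def get_row(factor, index, rank, init=0.5):
    """Fetch a factor row, creating it only when an observed entry needs it."""
    return factor.setdefault(index, [init] * rank)

def predict(core, rows):
    """Tucker model value for one entry, given its three factor rows."""
    a_row, b_row, c_row = rows
    return sum(core[a][b][c] * a_row[a] * b_row[b] * c_row[c]
               for a in range(len(a_row))
               for b in range(len(b_row))
               for c in range(len(c_row)))

def sgd_epoch(core, factors, entries, rank, lr):
    """One pass over the observed entries; returns the squared error seen."""
    sse = 0.0
    for index, value in entries:
        rows = [get_row(factors[m], index[m], rank) for m in range(3)]
        old = [row[:] for row in rows]  # snapshot: all gradients use one point
        err = predict(core, old) - value
        sse += err * err
        for a in range(rank):
            for b in range(rank):
                for c in range(rank):
                    g = core[a][b][c]
                    rows[0][a] -= lr * err * g * old[1][b] * old[2][c]
                    rows[1][b] -= lr * err * g * old[0][a] * old[2][c]
                    rows[2][c] -= lr * err * g * old[0][a] * old[1][b]
                    core[a][b][c] -= lr * err * old[0][a] * old[1][b] * old[2][c]
    return sse

rank = 2
core = [[[0.5] * rank for _ in range(rank)] for _ in range(rank)]
factors = [{}, {}, {}]  # one sparse row store per mode
# Hypothetical observed entries; one index is huge to mimic a large dimension.
entries = [((0, 0, 0), 5.0), ((0, 1, 1), 3.0), ((1, 0, 1), 4.0),
           ((99_999_999, 1, 0), 3.0), ((1, 2, 2), 4.0)]
history = [sgd_epoch(core, factors, entries, rank, lr=0.01) for _ in range(3)]
```

Each entry costs a fixed amount of work for a fixed rank, so one pass is proportional to the number of observed entries, and only three rows of the first mode are ever stored even though one index is close to 100M.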

Fig 6. Comparison of scalability.

(a) S³CMTF shows linear scalability as the number of entries increases. (b) S³CMTF-base and S³CMTF-opt show linear speed up as the number of cores grows. O.O.M.: out of memory error.

https://doi.org/10.1371/journal.pone.0217316.g006

Parallel scalability.

We conduct experiments to examine the parallel scalability of S³CMTF on shared memory systems. For measurement, we define speed up as (iteration time on 1 core)/(iteration time). Fig 6B shows the linear speed up of S³CMTF-base and S³CMTF-opt. The slope of the parallel scalability curve is not one (which would mean perfectly parallelizable) since a growing number of cores leads to more concurrent read accesses to memory, which cause conflicts. S³CMTF-opt shows a higher speed up than S³CMTF-base because it reduces read accesses to the core tensor by utilizing intermediate data.
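The speed up measure defined above is a simple ratio, and dividing it by the core count shows how far a run is from a slope of one. The sketch below uses made-up timings purely for illustration; they are not measurements from the paper, and `parallel_efficiency` is a helper added here rather than a quantity the text defines.

```python
def speed_up(time_one_core, time_n_cores):
    """Speed up as defined in the text: (time on 1 core) / (iteration time)."""
    if time_one_core <= 0 or time_n_cores <= 0:
        raise ValueError("iteration times must be positive")
    return time_one_core / time_n_cores

def parallel_efficiency(time_one_core, time_n_cores, cores):
    """Speed up divided by the core count; 1.0 would mean perfect scaling."""
    return speed_up(time_one_core, time_n_cores) / cores

# Hypothetical seconds per iteration (NOT measurements from the paper).
timings = {1: 8.0, 2: 4.4, 4: 2.5, 8: 1.6}
for cores, seconds in sorted(timings.items()):
    print(cores,
          round(speed_up(timings[1], seconds), 2),
          round(parallel_efficiency(timings[1], seconds, cores), 2))
```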

Discovery

In this section, we use S³CMTF to mine real-world Yelp data, to answer question Q3 posed at the beginning of the previous section. First, we demonstrate that S³CMTF discerns business entities better than the naive decomposition method by jointly capturing spatial and categorical prior knowledge. Second, we show how S³CMTF can be applied to real recommender systems. It is an open challenge to jointly capture the spatio-temporal context along with user preference data [35]. We exemplify a personal recommendation for a specific user. For discovery, we use the total Yelp data tensor along with the coupled matrices, as explained in Table 4. For better interpretability, we found a non-negative factorization by applying the projected gradient method [36]. We do not impose the orthogonality condition, in order to keep non-negativity, and each column of the factors is normalized.

Discovery

First, we compare the discernment of S³CMTF and the Tucker decomposition. We use the business factor U⁽²⁾. Fig 7A shows the gap statistic values obtained by clustering business entities with the k-means clustering algorithm. The gap statistic is a theoretical tool to measure separability between k-means clusters [37]. A higher gap statistic means higher separability between clusters. S³CMTF shows higher gap statistic values than the Tucker decomposition, which means that S³CMTF outperforms the naive Tucker decomposition for entity clustering with respect to the gap statistic.
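For reference, the gap statistic of [37] can be written as follows. The notation is taken from that reference rather than from this paper: C_r is the r-th of k clusters with n_r members, d_{ii'} is the squared Euclidean distance between points i and i', and the expectation is over samples of size n drawn from a reference (null) distribution.

```latex
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i, i' \in C_r} d_{ii'},
\qquad
\mathrm{Gap}_n(k) = \mathbb{E}^{*}_{n}\!\left[\log W_k\right] - \log W_k
```

A larger value means the within-cluster dispersion W_k is much smaller than would be expected for data with no cluster structure.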

Fig 7.

(a) Gap statistics on U⁽²⁾ of S³CMTF and the Tucker decomposition for Yelp dataset. S³CMTF outperforms the naive Tucker decomposition for its clustering ability. (b) Visualization of the personal recommendation scenario.

https://doi.org/10.1371/journal.pone.0217316.g007

As the difference between S³CMTF and the Tucker decomposition is in the existence of coupled matrices, the high performance of S³CMTF is attributed to the unified factorization using spatial and categorical data as prior knowledge. Table 5 shows the found clusters of business entities. Note that each cluster represents a certain combination of spatial and categorical characteristics of business entities.

Table 5. Clustering results on business factor U⁽²⁾ found by S³CMTF.

We found dominant spatial and categorical characteristics in each cluster. Businesses in the same cluster tend to be in adjacent cities and are included in similar categories.

https://doi.org/10.1371/journal.pone.0217316.t005

User-specific recommendation

Commercial recommendations are one of the most important applications of factorization models [4, 9]. Here we illustrate how factor matrices are used for personalized recommendations with a real example. Fig 7B shows the process for recommendation. Below, we illustrate the process in detail.

An example user, Tyler, has a factor vector u, namely the user profile, which has been calculated from previous review histories.
We then calculate the personalized profile matrix, which measures the amount of interaction of the user profile with the business and time factors.
The norms of the rows of the personalized profile matrix indicate the influence of latent business concepts on Tyler. Dominant and weak concepts are found based on the calculated norm values. In the example, B4 is the dominant latent concept, and B7 is the weak one.
We inspect the corresponding columns of the business factor matrix U⁽²⁾ and find relevant business entities with high values for the found concepts (B4 and B7).
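The steps above can be sketched on toy data. The sketch assumes that the personalized profile matrix is the mode-1 product of the core tensor with the user profile u, so that its rows correspond to latent business concepts and its columns to latent time concepts. That formula is an assumption made for illustration, as are the function names and all numbers.

```python
import math

def profile_matrix(core, u):
    """P[b][t] = sum_a u[a] * core[a][b][t] (assumed mode-1 product)."""
    n_b, n_t = len(core[0]), len(core[0][0])
    return [[sum(u[a] * core[a][b][t] for a in range(len(u)))
             for t in range(n_t)] for b in range(n_b)]

def row_norms(matrix):
    """Euclidean norm of every row."""
    return [math.sqrt(sum(x * x for x in row)) for row in matrix]

def top_entities(business_factor, concept, k):
    """Indices of the k businesses with the largest value in a concept column."""
    order = sorted(range(len(business_factor)),
                   key=lambda i: business_factor[i][concept], reverse=True)
    return order[:k]

# Hypothetical toy sizes: 2 user, 3 business, and 2 time concepts.
core = [
    [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]],
    [[0.0, 0.1], [2.0, 1.0], [0.0, 0.1]],
]
u = [0.2, 0.9]             # user profile (one row of the user factor)
U2 = [[0.9, 0.1, 0.0],     # business factor: 4 businesses x 3 concepts
      [0.1, 0.8, 0.1],
      [0.0, 0.7, 0.3],
      [0.2, 0.0, 0.9]]

P = profile_matrix(core, u)
norms = row_norms(P)
dominant = max(range(len(norms)), key=norms.__getitem__)
weak = min(range(len(norms)), key=norms.__getitem__)
strong_entities = top_entities(U2, dominant, 2)
weak_entities = top_entities(U2, weak, 2)
```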

We found both strong and weak entities by the above process. The strong and weak entities provide recommendation information by themselves, in the sense that the probability that the user likes them is high and low, respectively, and they also give extended user preference information. For example, strong entities for Tyler are related to ‘spa & health’ and located in neighboring cities in Arizona, US. Weak entities are related to ‘grill & restaurants’ and located in Toronto, Canada. The captured user preference information can make commercial recommender systems interpretable when combined with additional user-specific information such as address and current location.

Conclusion

We propose S³CMTF, a fast, accurate, and scalable CMTF method. S³CMTF provides up to 930× faster running times and the best accuracy through sparse CMTF with carefully derived update rules, lock-free parallel SGD, and reuse of intermediate computation results. S³CMTF shows linear scalability in the number of data entries and the number of parallel cores. Moreover, we show the usefulness of S³CMTF for cluster analysis and recommendation by applying it to real-world Yelp data. As future work, recent advances in CP gradient algorithms [38, 39] could be applied to our method. Future work also includes extending the method to a distributed setting.

References

1. Park N, Jeon B, Lee J, Kang U. BIGtensor: Mining Billion-Scale Tensor Made Easy. In: Proceedings of the International Conference on Information and Knowledge Management. ACM; 2016.
2. Park N, Oh S, Kang U. Fast and Scalable Distributed Boolean Tensor Factorization. In: Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE; 2017. p. 1071–1082.
3. Oh S, Park N, Sael L, Kang U. Scalable Tucker Factorization for Sparse Tensors—Algorithms and Discoveries. In: Data Engineering (ICDE), 2018 IEEE 34th International Conference on. IEEE; 2018. p. 1120–1131.
4. Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer. 2009;42(8).
5. Kolda TG, Bader BW. Tensor decompositions and applications. SIAM review. 2009;51(3):455–500.
6. Ding C, Li T, Peng W. On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Computational Statistics & Data Analysis. 2008;52(8):3913–3927.
7. Peng W, Li T. On the equivalence between nonnegative tensor factorization and tensorial probabilistic latent semantic analysis. Applied Intelligence. 2011;35(2):285–295.
8. Xu W, Liu X, Gong Y. Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval. ACM; 2003. p. 267–273.
9. Karatzoglou A, Amatriain X, Baltrunas L, Oliver N. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In: Proceedings of the fourth ACM conference on Recommender systems. ACM; 2010. p. 79–86.
10. Rendle S, Schmidt-Thieme L. Pairwise interaction tensor factorization for personalized tag recommendation. In: Proceedings of the third ACM international conference on Web search and data mining. ACM; 2010. p. 81–90.
11. Sael L, Jeon I, Kang U. Scalable tensor mining. Big Data Research. 2015;2(2):82–86.
12. Acar E, Kolda TG, Dunlavy DM. All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint arXiv:1105.3422. 2011.
13. Acar E, Rasmussen MA, Savorani F, Næs T, Bro R. Understanding data fusion within the framework of coupled matrix and tensor factorizations. Chemometrics and Intelligent Laboratory Systems. 2013;129:53–63.
14. Narita A, Hayashi K, Tomioka R, Kashima H. Tensor factorization using auxiliary information. Data Mining and Knowledge Discovery. 2012;25(2):298–324.
15. Ozcaglar C. Algorithmic data fusion methods for tuberculosis. Rensselaer Polytechnic Institute; 2012.
16. Tucker LR. Some mathematical notes on three-mode factor analysis. Psychometrika. 1966;31(3):279–311. pmid:5221127
17. Oh J, Shin K, Papalexakis EE, Faloutsos C, Yu H. S-HOT: Scalable High-Order Tucker Decomposition. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM; 2017. p. 761–770.
18. Hitchcock FL. The expression of a tensor or a polyadic as a sum of products. Studies in Applied Mathematics. 1927;6(1-4):164–189.
19. Sorber L, Van Barel M, De Lathauwer L. Structured data fusion. IEEE Journal of Selected Topics in Signal Processing. 2015;9(4):586–600.
20. Kolda TG, Sun J. Scalable tensor decompositions for multi-aspect data mining. In: Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on. IEEE; 2008. p. 363–372.
21. De Lathauwer L, De Moor B, Vandewalle J. On the best rank-1 and rank-(r 1, r 2,…, rn) approximation of higher-order tensors. SIAM journal on Matrix Analysis and Applications. 2000;21(4):1324–1342.
22. Ermiş B, Acar E, Cemgil AT. Link prediction in heterogeneous data via generalized coupled tensor factorization. Data Mining and Knowledge Discovery. 2015;29(1):203–236.
23. Yılmaz KY, Cemgil AT, Simsekli U. Generalised coupled tensor factorisation. In: Advances in neural information processing systems; 2011. p. 2151–2159.
24. Khan SA, Leppäaho E, Kaski S. Bayesian multi-tensor factorization. Machine Learning. 2016;105(2):233–253.
25. Jeon B, Jeon I, Sael L, Kang U. Scout: Scalable coupled matrix-tensor factorization-algorithm and discoveries. In: Data Engineering (ICDE), 2016 IEEE 32nd International Conference on. IEEE; 2016. p. 811–822.
26. Jeon I, Papalexakis EE, Kang U, Faloutsos C. Haten2: Billion-scale tensor decompositions. In: Data Engineering (ICDE), 2015 IEEE 31st International Conference on. IEEE; 2015. p. 1047–1058.
27. Papalexakis EE, Faloutsos C, Mitchell TM, Talukdar PP, Sidiropoulos ND, Murphy B. Turbo-smt: Accelerating coupled sparse matrix-tensor factorizations by 200x. In: Proceedings of the 2014 SIAM International Conference on Data Mining. SIAM; 2014. p. 118–126.
28. Beutel A, Talukdar PP, Kumar A, Faloutsos C, Papalexakis EE, Xing EP. Flexifact: Scalable flexible factorization of coupled tensors on hadoop. In: Proceedings of the 2014 SIAM International Conference on Data Mining. SIAM; 2014. p. 109–117.
29. Jeon I, Papalexakis EE, Faloutsos C, Sael L, Kang U. Mining billion-scale tensors: algorithms and discoveries. The VLDB Journal. 2016;25(4):519–544.
30. Shin K, Sael L, Kang U. Fully scalable methods for distributed tensor factorization. IEEE Transactions on Knowledge and Data Engineering. 2017;29(1):100–113.
31. Bradley JK, Kyrola A, Bickson D, Guestrin C. Parallel coordinate descent for l1-regularized loss minimization. arXiv preprint arXiv:1105.5379. 2011.
32. Recht B, Re C, Wright S, Niu F. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In: Advances in neural information processing systems; 2011. p. 693–701.
33. Bottou L. Stochastic gradient descent tricks. In: Neural networks: Tricks of the trade. Springer; 2012. p. 421–436.
34. Bader BW, Kolda TG. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing. 2007;30(1):205–231.
35. Gao H, Tang J, Hu X, Liu H. Exploring temporal effects for location recommendation on location-based social networks. In: Proceedings of the 7th ACM conference on Recommender systems. ACM; 2013. p. 93–100.
36. Lin CJ. Projected gradient methods for nonnegative matrix factorization. Neural computation. 2007;19(10):2756–2779. pmid:17716011
37. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2001;63(2):411–423.
38. Vannieuwenhoven N, Meerbergen K, Vandebril R. Computing the gradient in optimization algorithms for the CP decomposition in constant memory through tensor blocking. SIAM Journal on Scientific Computing. 2015;37(3):C415–C438.
39. Phan AH, Tichavskỳ P, Cichocki A. Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations. IEEE Transactions on Signal Processing. 2013;61(19):4834–4846.
