Abstract
How can we accurately decompose a temporal irregular tensor while incorporating a related knowledge graph tensor in both offline and online streaming settings? PARAFAC2 decomposition is widely applied to the analysis of irregular tensors consisting of matrices with varying row sizes. In both offline and online streaming scenarios, existing PARAFAC2 methods primarily focus on capturing dynamic features that evolve over time, since data irregularities often arise from temporal variations. However, these methods tend to overlook static features, such as knowledge-based information, which remain unchanged over time.
In this paper, we propose KG-CTF (Knowledge Graph-based Coupled Tensor Factorization) and OKG-CTF (Online Knowledge Graph-based Coupled Tensor Factorization), two coupled tensor factorization methods designed to effectively capture both dynamic and static features within an irregular tensor in offline and online streaming settings, respectively. To integrate knowledge graph tensors as static features, KG-CTF and OKG-CTF couple an irregular temporal tensor with a knowledge graph tensor by sharing a common axis. Additionally, both methods employ relational regularization to preserve the structural dependencies among the factor matrices of the knowledge graph tensor. To further enhance convergence speed, we utilize momentum-based update strategies for factor matrices. Through extensive experiments, we demonstrate that KG-CTF reduces error rates by up to 1.64× compared to existing PARAFAC2 methods. Furthermore, OKG-CTF achieves up to 5.7× faster running times compared to existing streaming approaches for each newly arriving tensor.
Citation: Lee S, Park Y-C, Kang U (2025) Offline and online coupled tensor factorization with knowledge graph. PLoS One 20(11): e0336100. https://doi.org/10.1371/journal.pone.0336100
Editor: George Vousden, Public Library of Science, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: June 20, 2025; Accepted: October 21, 2025; Published: November 12, 2025
Copyright: © 2025 Lee et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data files are available at https://github.com/snudatalab/KG-CTF.
Funding: This work was supported by the National Research Foundation of Korea (NRF) funded by MSIT (2022R1A2C3007921), Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT), [No. 2022-0-00641, XVoice: Multi-Modal Voice Meta Learning], [No. RS-2024-00509257, Global AI Frontier Lab], [No. RS-2021-II211343, Artificial Intelligence Graduate School Program (Seoul National University)], and [No. RS-2021-II212068, Artificial Intelligence Innovation Hub (Artificial Intelligence Institute, Seoul National University)]. The Institute of Engineering Research and the ICT at Seoul National University provided research facilities for this work. U. Kang is the corresponding author. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Given a temporal irregular tensor and a knowledge graph tensor, how can we accurately decompose these tensors to derive meaningful latent factors? Many real-world datasets are represented as irregular tensors such as stock data, traffic data, and music data. An irregular tensor consists of matrices with varying row sizes, while the column sizes remain fixed. For example, in stock data, each matrix corresponds to a specific stock, where the number of rows varies based on different time periods, but the columns represent common stock features such as opening price, closing price, and trading volume.
Large-scale factual information is effectively organized into a knowledge graph, which represents information through entities and the relationships between them. A knowledge graph is structured as a graph-based database that represents information in the form of triples (es, r, eo), where the subject (head) entity es is connected to the object (tail) entity eo through the relation r. A knowledge graph can be modeled as a third-order binary array in tensor form, where each entry represents a triplet, with a value of 1 indicating a known fact and 0 representing an unknown or missing fact. Various tensor decomposition methods for knowledge graphs [1–5] have been developed to capture hidden relational patterns and improve the accuracy of missing information prediction.
Many tensor decomposition methods [6–12] have been proposed to extract patterns from high-dimensional data. Recently, PARAFAC2 decomposition [13–17] has attracted much attention as a tool for analyzing irregular tensors. These approaches [14,16] primarily employ the Alternating Least Squares (ALS) algorithm, incorporating various optimization techniques to enhance the efficiency of updates. However, many existing PARAFAC2 decomposition methods have significant limitations. First, they emphasize modeling temporal dynamic features (e.g., daily stock prices and trading volumes) while neglecting static features (e.g., a company’s sector) that remain unchanged over time. Both dynamic and static features are equally crucial for accurately modeling the factor matrices. Second, incorporating static features into existing PARAFAC2 decomposition methods [14,18] often introduces a significant bias toward the temporal features, ultimately failing to explicitly model auxiliary information. TASTE [19] incorporates both dynamic and static features through coupled matrix and tensor factorization; however, it fails to represent static features as high-dimensional data, potentially resulting in information loss. Third, while existing PARAFAC2-based decomposition methods [15,20] introduce additional regularization terms to incorporate information from multiple sources, these terms often hinder convergence during the learning process. Therefore, the major challenges to be addressed are: 1) how to effectively integrate dynamic and static information in irregular tensors, 2) how to accurately capture patterns within static information, and 3) how to develop an update rule with stable convergence.
In this paper, we propose KG-CTF and OKG-CTF, two accurate coupled tensor factorization methods to integrate dynamic and static information in irregular tensors by leveraging knowledge graphs. KG-CTF runs in an offline setting, while OKG-CTF is designed for an online streaming setting. The main ideas of KG-CTF and OKG-CTF are as follows: 1) integrate a temporal irregular tensor with a knowledge graph tensor to model both dynamic and static features, 2) introduce an effective regularization that captures the relationships within the factor matrices of the knowledge graph, and then 3) ensure stable and fast convergence of the complex loss function through a momentum-based ALS update mechanism. Furthermore, in an online streaming setting, OKG-CTF efficiently deals with newly incoming tensors by avoiding direct computations on the entire tensor, instead leveraging only the new data and updated factor matrices. Experimental results show that KG-CTF and OKG-CTF outperform existing PARAFAC2 decomposition methods in missing value prediction tasks, achieving superior accuracy in both offline and online settings.
The contributions of this paper are summarized as follows:
- Method. We propose KG-CTF and OKG-CTF, accurate tensor factorization methods that couple knowledge graphs with temporal irregular tensors in both offline and online streaming settings.
- Theory. We provide a theoretical analysis of the convergence of KG-CTF and OKG-CTF, demonstrating that they effectively decrease the loss function.
- Experiments. Extensive experiments demonstrate that KG-CTF and OKG-CTF reduce error rates by up to 1.64× and improve running times by up to 5.7× compared to existing PARAFAC2 methods. Additionally, we open-source a large-scale knowledge graph containing stock information from South Korea, the United States, Japan, and China.
The code and datasets are available at https://github.com/snudatalab/KG-CTF.
Preliminaries and problem definition
In this section, we explain preliminaries of irregular tensor notations, PARAFAC2 decomposition, and knowledge graph, followed by the problem definitions. Table 1 summarizes the symbols used throughout the paper.
Irregular tensor
An irregular tensor is denoted as $\{\mathbf{X}_k\}_{k=1}^{K}$, consisting of a collection of slice matrices, where each slice $\mathbf{X}_k \in \mathbb{R}^{I_k \times J}$ represents the k-th matrix, and K is the total number of slices. Note that the row size $I_k$ varies across slice matrices, while the column size $J$ remains the same for all.
PARAFAC2 decomposition
PARAFAC2 decomposition [21] has been widely used for analyzing irregular tensors. Given each k-th slice matrix $\mathbf{X}_k \in \mathbb{R}^{I_k \times J}$ of an irregular tensor, PARAFAC2 decomposes each slice into three matrices $\mathbf{U}_k$, $\mathbf{S}_k$, and $\mathbf{V}$, as illustrated in Fig 1. This decomposition is formulated as $\mathbf{X}_k \approx \mathbf{U}_k \mathbf{S}_k \mathbf{V}^{T}$ for $k = 1, \ldots, K$, where $\mathbf{U}_k$ is an $I_k \times R$ matrix, $\mathbf{S}_k$ is an $R \times R$ diagonal matrix, and $\mathbf{V}$ is a $J \times R$ matrix shared across all slices, with $R$ as the target rank.
The objective function of the PARAFAC2 decomposition is formulated as follows:

$$\min_{\{\mathbf{U}_k\},\{\mathbf{S}_k\},\mathbf{V}} \sum_{k=1}^{K} \left\| \mathbf{X}_k - \mathbf{U}_k \mathbf{S}_k \mathbf{V}^{T} \right\|_F^2 \quad \text{(1)}$$

where $\|\mathbf{X}\|_F$ represents the Frobenius norm of matrix $\mathbf{X}$, defined as $\|\mathbf{X}\|_F = \sqrt{\sum_{i,j} x_{ij}^2}$, with $x_{ij}$ denoting the (i,j)-th element of matrix $\mathbf{X}$.
To ensure the uniqueness of the solution, several studies [22,23] reformulate $\mathbf{U}_k$ with $\mathbf{U}_k = \mathbf{Q}_k \mathbf{H}$, where $\mathbf{Q}_k \in \mathbb{R}^{I_k \times R}$ is a column-orthogonal matrix, and $\mathbf{H} \in \mathbb{R}^{R \times R}$ is a shared matrix across all slices.
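To make the model concrete, the following sketch builds a small irregular tensor that exactly satisfies the PARAFAC2 model with the uniqueness constraint $\mathbf{U}_k = \mathbf{Q}_k \mathbf{H}$; all sizes and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
R, J = 3, 5                       # target rank and shared column count
row_sizes = [8, 6, 10]            # I_k varies across slices (irregular tensor)

H = rng.standard_normal((R, R))   # shared R x R matrix
V = rng.standard_normal((J, R))   # shared J x R factor

slices = []
for I_k in row_sizes:
    # Column-orthogonal Q_k via a reduced QR decomposition.
    Q_k, _ = np.linalg.qr(rng.standard_normal((I_k, R)))
    S_k = np.diag(rng.standard_normal(R))     # diagonal S_k
    X_k = Q_k @ H @ S_k @ V.T                 # X_k = U_k S_k V^T with U_k = Q_k H
    slices.append((X_k, Q_k, S_k))
```

Each slice has its own row count $I_k$ but the same column count $J$, which is exactly the irregular-tensor structure PARAFAC2 is designed for.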
Knowledge graph
A knowledge graph is a collection of triplets that encode facts through entities and their relationships, denoted as $\mathcal{G} = \{(h, r, t)\}$. Specifically, each triplet (h,r,t) captures the semantic connection between a head entity h and a tail entity t through the relationship r. For example, the triplet (Harry Potter, Written by, J.K. Rowling) represents the relationship between the book Harry Potter (head entity) and the author J.K. Rowling (tail entity) through the relation Written by in the context of book knowledge. Similarly, in the context of travel, (Central Park, Located in, New York City) indicates that Central Park (head entity) is related to New York City (tail entity) via the relation Located in. This structured representation of a knowledge graph facilitates the efficient integration of large-scale factual information, serving as additional contextual data. A knowledge graph is expressed as a three-dimensional tensor, which takes the form of either a regular or an irregular tensor. In this paper, we represent the knowledge graph $\mathcal{G}$ as an irregular tensor $\{\mathbf{Y}_k\}_{k=1}^{K}$, where each $\mathbf{Y}_k \in \{0,1\}^{N_k \times L}$ is an item-specific slice matrix that forms an (entity-relation) matrix, with rows corresponding to item-related entities and columns representing relations. A more detailed explanation of this representation is provided later in this paper.
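As a toy illustration of this representation (the entity and item names below are ours, not from any dataset), the following sketch converts a handful of triplets into per-item binary slice matrices:

```python
import numpy as np

# Hypothetical triplets (head entity, relation, tail item); illustrative only.
triplets = [
    ("California",   "located_in",  "Apple"),
    ("IT",           "industry_of", "Apple"),
    ("Texas",        "located_in",  "Dell"),
    ("IT",           "industry_of", "Dell"),
    ("Michael_Dell", "founder_of",  "Dell"),
]
relations = sorted({r for _, r, _ in triplets})

def kg_slice(item):
    """Build the binary (entity x relation) slice Y_k for one item."""
    ents = sorted({h for h, _, t in triplets if t == item})
    Y = np.zeros((len(ents), len(relations)))
    for h, r, t in triplets:
        if t == item:
            Y[ents.index(h), relations.index(r)] = 1.0
    return ents, Y

ents_apple, Y_apple = kg_slice("Apple")
ents_dell, Y_dell = kg_slice("Dell")
# Row counts differ per item (irregular); the column count equals the number of relations.
```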
Problem definition
We define the problems of coupled tensor factorization with knowledge graph in offline and online streaming settings as follows:
Problem 1 (Coupled Tensor Factorization with Knowledge Graph).
Given.
- (1) A temporal irregular tensor $\{\mathbf{X}_k\}_{k=1}^{K}$ with slice matrices $\mathbf{X}_k \in \mathbb{R}^{I_k \times J}$,
- (2) A knowledge graph tensor $\{\mathbf{Y}_k\}_{k=1}^{K}$ with slice matrices $\mathbf{Y}_k \in \{0,1\}^{N_k \times L}$,
Find. Factor matrices $\mathbf{U}_k$, $\mathbf{S}_k$, $\mathbf{Q}_k$, and $\mathbf{E}_k$ for $k = 1, \ldots, K$, and common factor matrices $\mathbf{V}$, $\mathbf{H}$, and $\mathbf{R}$ with R as the target rank, where each k-th slice matrix of the temporal irregular tensor is approximated by $\mathbf{X}_k \approx \mathbf{U}_k \mathbf{S}_k \mathbf{V}^{T}$, each k-th slice matrix of the knowledge graph tensor by $\mathbf{Y}_k \approx \mathbf{E}_k \mathbf{S}_k \mathbf{R}^{T}$, and each temporal factor matrix by $\mathbf{U}_k \approx \mathbf{Q}_k \mathbf{H}$.
Problem 2 (Online Coupled Tensor Factorization with Knowledge Graph).
Given.
- (1) New rows $\mathbf{X}_k^{new}$ of existing slice matrices in a streaming temporal irregular tensor,
- (2) A knowledge graph tensor $\{\mathbf{Y}_k\}_{k=1}^{K}$ with slice matrices $\mathbf{Y}_k \in \{0,1\}^{N_k \times L}$,
- (3) Pre-existing factor matrices $\mathbf{U}_k^{old}$, $\mathbf{S}_k$, $\mathbf{Q}_k^{old}$, $\mathbf{E}_k$, $\mathbf{V}$, $\mathbf{H}$, and $\mathbf{R}$ of the accumulated irregular tensors $\{\mathbf{X}_k^{old}\}_{k=1}^{K}$ and $\{\mathbf{Y}_k\}_{k=1}^{K}$,
Find. Factor matrices $\mathbf{U}_k$, $\mathbf{S}_k$, $\mathbf{Q}_k$, and $\mathbf{E}_k$ for $k = 1, \ldots, K$, and common factor matrices $\mathbf{V}$, $\mathbf{H}$, and $\mathbf{R}$ for the entire tensor $\{\mathbf{X}_k\}_{k=1}^{K}$ with $\mathbf{X}_k = [\mathbf{X}_k^{old}; \mathbf{X}_k^{new}]$, where ; denotes the vertical concatenation of matrices, the old rows of the k-th existing slice matrix are approximated by $\mathbf{X}_k^{old} \approx \mathbf{U}_k^{old}\mathbf{S}_k\mathbf{V}^{T}$, and the new rows of the k-th existing slice matrix are approximated by $\mathbf{X}_k^{new} \approx \mathbf{U}_k^{new}\mathbf{S}_k\mathbf{V}^{T}$. Each temporal factor matrix $\mathbf{U}_k^{old}$ is approximated by $\mathbf{Q}_k^{old}\mathbf{H}$, $\mathbf{U}_k^{new}$ by $\mathbf{Q}_k^{new}\mathbf{H}$, and each k-th slice matrix of the knowledge graph tensor by $\mathbf{E}_k\mathbf{S}_k\mathbf{R}^{T}$.
Proposed method for offline tensors: KG-CTF
We propose KG-CTF (Knowledge Graph-based Coupled Tensor Factorization), an accurate coupled tensor factorization method designed to effectively capture both dynamic and static features in irregular tensors.
Overview
KG-CTF tackles the following challenges to achieve accurate coupled tensor factorization.
- C1. Incorporating the static features in data. While dynamic features effectively capture temporal variations, they often fail to represent time-invariant characteristics inherent in the data. How can we effectively integrate both dynamic and static information for irregular tensor data?
- C2. Capturing the relational patterns of knowledge graphs. Knowledge graphs are structured in a triple format, where two entities are connected through a specific relationship. Effectively learning these relational dependencies is essential for leveraging static information. How can we accurately capture the relational structures of knowledge graphs?
- C3. Accelerating ALS updates. Traditional ALS-based PARAFAC2 decomposition methods involve multiple regularization terms, often resulting in slow and unstable convergence. How can we accelerate ALS updates while optimizing complex loss functions?
To address these challenges, we propose the following key ideas (see Fig 2):
- I1. Formulating a loss function for PARAFAC2-based coupled tensor factorization with a knowledge graph integrates temporal irregular tensors with knowledge graph tensors, effectively incorporating both dynamic and static information.
- I2. Relational regularization enhances the learning of complex relational dependencies among factor matrices representing entities and relationships in the knowledge graph, thereby improving the representation of static information.
- I3. Momentum-based ALS utilizes past update directions to accelerate ALS-based factor matrix updates, ensuring faster and more stable convergence.
(a) We introduce a method that couples a temporal tensor with a Knowledge Graph (KG) tensor by sharing the diagonal matrix Sk. This coupling allows the factor matrices to jointly learn both temporal and static features within an irregular tensor. (b) Relational regularization captures the relationship between the factor matrices derived from the KG tensor, improving their interpretability. (c) To ensure consistent entity representations across all slices, we initialize the entity factor matrix of each slice with the updated entity embeddings from the previously processed slices. This approach maintains coherence in entity factors across slices, promoting stable learning. (d) We apply a momentum update strategy to ensure faster and stable convergence for factor matrices. At each iteration t, the momentum update refines the current ALS update step by incorporating the update direction from iteration t−1, leading to a more efficient final update step.
Coupled tensor factorization
We formulate a loss function that effectively learns both static and dynamic information during the decomposition of irregular tensors. To capture dynamic information, we leverage the temporal irregular tensor $\{\mathbf{X}_k\}_{k=1}^{K}$. As described in Eq (1), each slice matrix $\mathbf{X}_k$ is factorized using the PARAFAC2 decomposition with a target rank R, yielding three factor matrices: $\mathbf{U}_k$, $\mathbf{S}_k$, and $\mathbf{V}$. For example, in stock data analysis, $\mathbf{U}_k$ represents the time factor matrix, $\mathbf{S}_k$ corresponds to the stock-specific factor matrix, and $\mathbf{V}$ serves as the shared stock feature matrix. The loss function $\mathcal{L}_X$ designed to capture this dynamic information is formulated as follows:

$$\mathcal{L}_X = \sum_{k=1}^{K} \left\| \mathbf{X}_k - \mathbf{U}_k \mathbf{S}_k \mathbf{V}^{T} \right\|_F^2$$
Next, we introduce the knowledge graph tensor as a source of static information and integrate it with the existing temporal irregular tensor $\{\mathbf{X}_k\}_{k=1}^{K}$. To effectively model the knowledge graph, we represent it as an irregular tensor for two key reasons. First, the number of entities associated with each item varies, inherently resulting in an irregular tensor structure. Second, the conventional entity-relation-entity representation of knowledge graphs leads to exceedingly large and sparse entity factor matrices due to the vast number of entities involved. If the entity factor matrix were constructed using the complete set of entities, the resulting tensor would be highly sparse, making an irregular tensor representation a more effective alternative. To construct this irregular tensor for the knowledge graph, we define a slice matrix $\mathbf{Y}_k \in \{0,1\}^{N_k \times L}$ for each item k, where rows correspond to entities linked to the k-th item and columns represent all relation types. Specifically, each matrix $\mathbf{Y}_k$ is derived from knowledge graph triplets (h,r,t), where the tail entity t denotes the k-th item, the head entity h is an entity associated with that item, and r is the relation type. The (i,j)-th entry of $\mathbf{Y}_k$ is set to 1 if the i-th entity is connected to item k via the j-th relation, and 0 otherwise. For example, in a stock-related knowledge graph, each item corresponds to a specific stock, with $\mathbf{Y}_k$ representing the entity–relation matrix of each stock. Triplets such as (California, located in, Apple) and (IT, industry of, Apple) indicate that the stock Apple is linked to entities California and IT via the relations located in and industry of, respectively. Accordingly, in $\mathbf{Y}_k$ for Apple, the row for California has a 1 in the located in column, and the row for IT has a 1 in the industry of column. Therefore, we define the knowledge graph tensor as $\{\mathbf{Y}_k\}_{k=1}^{K}$, where each slice matrix $\mathbf{Y}_k$ (representing entity-relation interactions for an item) is factorized into three components: $\mathbf{E}_k$, $\mathbf{S}_k$, and $\mathbf{R}$. Here, $\mathbf{E}_k \in \mathbb{R}^{N_k \times R}$ is the entity factor matrix, $\mathbf{S}_k \in \mathbb{R}^{R \times R}$ is the item-specific diagonal factor matrix, and $\mathbf{R} \in \mathbb{R}^{L \times R}$ is the relation factor matrix. In this notation, $N_k$ denotes the number of entities linked to the k-th item, L represents the total number of relation types, and R is the target rank. For stock datasets, $\mathbf{Y}_k$ is decomposed into $\mathbf{E}_k$, $\mathbf{S}_k$, and $\mathbf{R}$, representing entity-specific features related to each stock, stock-level latent factors, and the relation factor matrix shared across all stocks, respectively. Therefore, the loss function $\mathcal{L}_Y$ for the learning of static information is formulated as follows:

$$\mathcal{L}_Y = \sum_{k=1}^{K} \left\| \mathbf{Y}_k - \mathbf{E}_k \mathbf{S}_k \mathbf{R}^{T} \right\|_F^2$$
To ensure consistent entity representations across different slices and mitigate the inconsistencies that arise from independently decomposing each slice in PARAFAC2, we introduce a shared entity factor matrix approach. As shown in Fig 2(c), the embedding of an entity remains the same across all slices, preserving its representation throughout the decomposition process. For each slice k, the entity factor matrix $\mathbf{E}_k$ is initialized using the following strategy: (1) if an entity has appeared in previous slices, its embedding is inherited from the corresponding rows of the entity factor matrices of those slices; (2) otherwise, its embedding is randomly initialized. This iterative approach ensures the stability of entity representations while also enhancing their interpretability across slices.
Finally, we couple the knowledge graph tensor with the temporal irregular tensor by sharing the common axis $\mathbf{S}_k$. Since $\mathbf{S}_k$ is a diagonal matrix that encodes slice-specific information, sharing it allows for simultaneous learning of the dynamic characteristics from the temporal tensor and the static properties embedded in the knowledge graph tensor. Furthermore, unlike the other factor matrices (e.g., $\mathbf{U}_k$, $\mathbf{V}$, $\mathbf{E}_k$, or $\mathbf{R}$) that are domain-specific, $\mathbf{S}_k$ represents item-level latent factors. Therefore, sharing this static matrix $\mathbf{S}_k$, which is unaffected by temporal dynamics, ensures interpretability while avoiding conflicts between modalities. The overall loss function $\mathcal{L}_{couple}$ that couples these two tensors is formulated as follows:

$$\mathcal{L}_{couple} = \mathcal{L}_X + \mathcal{L}_Y = \sum_{k=1}^{K} \left( \left\| \mathbf{X}_k - \mathbf{U}_k \mathbf{S}_k \mathbf{V}^{T} \right\|_F^2 + \left\| \mathbf{Y}_k - \mathbf{E}_k \mathbf{S}_k \mathbf{R}^{T} \right\|_F^2 \right)$$
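The coupled objective can be sketched in a few lines of NumPy, assuming the reconstructions $\mathbf{X}_k \approx \mathbf{U}_k\mathbf{S}_k\mathbf{V}^T$ and $\mathbf{Y}_k \approx \mathbf{E}_k\mathbf{S}_k\mathbf{R}^T$ with the shared diagonal $\mathbf{S}_k$ described above; the variable names are ours:

```python
import numpy as np

def coupled_loss(X, Y, U, E, S, V, Rm):
    """Sum of squared Frobenius residuals of the temporal and KG fits.
    The diagonal S_k is shared between the two terms, which is the coupling."""
    loss = 0.0
    for X_k, Y_k, U_k, E_k, S_k in zip(X, Y, U, E, S):
        loss += np.linalg.norm(X_k - U_k @ S_k @ V.T, "fro") ** 2
        loss += np.linalg.norm(Y_k - E_k @ S_k @ Rm.T, "fro") ** 2
    return loss

rng = np.random.default_rng(1)
R = 2
V = rng.standard_normal((4, R))     # shared feature factor (J = 4)
Rm = rng.standard_normal((3, R))    # shared relation factor (L = 3)
U = [rng.standard_normal((n, R)) for n in (5, 7)]
E = [rng.standard_normal((n, R)) for n in (2, 3)]
S = [np.diag(rng.standard_normal(R)) for _ in range(2)]
# Exactly reconstructable slices give zero loss.
X = [U_k @ S_k @ V.T for U_k, S_k in zip(U, S)]
Y = [E_k @ S_k @ Rm.T for E_k, S_k in zip(E, S)]
```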
Regularization
We add effective regularizations to the loss function.
Relational regularization.
We introduce a relational regularization term to ensure that the factor matrices obtained from the knowledge graph tensor effectively encode relational structures. Existing methods [24–26] struggle to fully capture the intricate relational dependencies present in knowledge graphs, largely because they overlook the directional nature of these relationships. To address this shortcoming, we incorporate a regularization term that explicitly models the triple structure of the knowledge graph, thereby improving the representation of directional relationships.
Based on the approach proposed in [27], we assume that each knowledge graph triplet (h,r,t) satisfies the condition $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$, where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ correspond to the vector representations of the head entity, relation, and tail entity, respectively. Here, the norm $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_2$ represents the Euclidean distance, enforcing relational consistency within the knowledge graph. For instance, in a stock-related knowledge graph, if the head entity is iPhone, the relation is produced by, and the tail entity is Apple Inc., the relationship is captured by ensuring that $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_2$ remains close to zero.
To effectively learn these relational dependencies, we decompose each knowledge graph slice $\mathbf{Y}_k$ into three factor matrices. The entity factor matrix $\mathbf{E}_k$ represents the head entities, the relation factor matrix $\mathbf{R}$ captures relational properties, and the item factor matrix $\mathbf{S}_k$ serves as the representation of the tail entities. To handle cases where an item is connected to multiple entities through the same relation, the regularization term is minimized using the least squares method.

For instance, if an item $i_1$ is linked to entities $e_2$ and $e_3$ through relation $r_1$, the objective is to minimize $\|\mathbf{e}_2 + \mathbf{r}_1 - \mathbf{i}_1\|_2^2 + \|\mathbf{e}_3 + \mathbf{r}_1 - \mathbf{i}_1\|_2^2$. Given that $\mathbf{e}_2$ and $\mathbf{e}_3$ are fixed, this can be equivalently rewritten as minimizing $\|\frac{\mathbf{e}_2 + \mathbf{e}_3}{2} + \mathbf{r}_1 - \mathbf{i}_1\|_2^2$. Here, $\frac{\mathbf{e}_2 + \mathbf{e}_3}{2}$ represents the average embedding of $e_2$ and $e_3$. Thus, to incorporate this principle, we define the relational regularization term as follows:

$$\sum_{k=1}^{K} \left\| \mathbf{1}_{L} \operatorname{diag}(\mathbf{S}_k)^{T} - \left( \mathbf{R} + \mathbf{D}_k^{-1} \mathbf{Y}_k^{T} \mathbf{E}_k \right) \right\|_F^2$$

where $\mathbf{D}_k \in \mathbb{R}^{L \times L}$ is the diagonal degree matrix associated with $\mathbf{Y}_k$, and the (l,l)-th element of $\mathbf{D}_k$ corresponds to the number of entities connected to the l-th relation (i.e., the sum of the elements in the l-th column of $\mathbf{Y}_k$). Additionally, $\mathbf{1}_{L} \in \mathbb{R}^{L \times 1}$ is a matrix filled with ones. $\mathbf{1}_{L}\operatorname{diag}(\mathbf{S}_k)^{T}$ represents the tail, meaning the embedding of the k-th item padded L times. The matrix $\mathbf{R}$ represents the relation factor, while $\mathbf{D}_k^{-1}\mathbf{Y}_k^{T}\mathbf{E}_k$ represents the head, computed as the average embedding of entities linked by the same relation. Here, $\mathbf{E}_k$ represents the embeddings of the $N_k$ entities connected to the k-th item. The product $\mathbf{Y}_k^{T}\mathbf{E}_k$ yields a matrix whose l-th row contains the sum of the embeddings of entities linked to the k-th item through relation l. By applying $\mathbf{D}_k^{-1}$, we normalize this sum by the number of entities per relation, so that the l-th row of $\mathbf{D}_k^{-1}\mathbf{Y}_k^{T}\mathbf{E}_k$ corresponds to the average embedding of entities connected to the k-th item via relation l. To further illustrate how the relational regularization term operates, we provide an example in Fig 3. This figure illustrates a case where a stock h is connected to three entities via four relation types, resulting in a slice matrix $\mathbf{Y}_h \in \{0,1\}^{3 \times 4}$. The regularization encourages the tail embedding of stock h, repeated four times as $\mathbf{1}_{4}\operatorname{diag}(\mathbf{S}_h)^{T}$, to be aligned with the sum of the relation embeddings $\mathbf{R}$ and the average head entity embeddings associated with each relation, computed as $\mathbf{D}_h^{-1}\mathbf{Y}_h^{T}\mathbf{E}_h$. This alignment reduces inconsistency between the stock embedding and its relational context in the knowledge graph, resulting in more semantically coherent representations across domains.
Given four relation types and three entities connected to stock h, the slice matrix $\mathbf{Y}_h$ encodes binary connections between entities and relations. The relational regularization term aligns the tail embeddings $\mathbf{1}_{4}\operatorname{diag}(\mathbf{S}_h)^{T}$ of stock h with the sum of the relation embeddings $\mathbf{R}$ and the average head embeddings $\mathbf{D}_h^{-1}\mathbf{Y}_h^{T}\mathbf{E}_h$ computed over entities connected to stock h per relation.
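A minimal sketch of this regularizer for a single slice, under our reading that the item (tail) embedding is the diagonal of $\mathbf{S}_k$; the guard against relations with no connected entity is our addition:

```python
import numpy as np

def relational_reg(Y_k, E_k, Rm, w_k):
    """TransE-style term for one slice: align the item embedding w_k, tiled
    over the L relations, with the relation embeddings plus the average
    head-entity embedding per relation (D_k^{-1} Y_k^T E_k)."""
    L = Y_k.shape[1]
    deg = Y_k.sum(axis=0)                      # entities per relation (degree)
    deg = np.where(deg > 0, deg, 1.0)          # guard empty relations
    avg_head = (Y_k.T @ E_k) / deg[:, None]    # average head embedding per relation
    tail = np.ones((L, 1)) @ w_k[None, :]      # item embedding padded L times
    return np.linalg.norm(tail - (Rm + avg_head), "fro") ** 2

# Toy slice: 3 entities, 4 relations, rank 2.
Y_k = np.array([[1., 0., 0., 1.],
                [0., 1., 0., 1.],
                [0., 0., 1., 0.]])
E_k = np.zeros((3, 2))
Rm = np.zeros((4, 2))
w_k = np.zeros(2)
# With all-zero embeddings the residual vanishes.
```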
Uniqueness regularization.
To enhance the interpretability of the factor matrices while preserving accuracy, we employ uniqueness regularization by reformulating $\mathbf{U}_k$ as $\mathbf{Q}_k\mathbf{H}$, as established in previous studies [22,23], where $\mathbf{Q}_k \in \mathbb{R}^{I_k \times R}$ is a column-orthogonal matrix. This approach mitigates the issue of arbitrary rotations in factor matrices, ensuring that the decomposition remains unique, up to scaling and permutation. To enforce this constraint, we incorporate the following regularization terms into Eq (1):

$$\mu_u \sum_{k=1}^{K} \left\| \mathbf{U}_k - \mathbf{Q}_k \mathbf{H} \right\|_F^2$$

where $\mu_u$ is a hyperparameter to control the effect of uniqueness for $\mathbf{U}_k$.
L2 regularization.
We apply L2 regularization to the factor matrices $\mathbf{S}_k$, $\mathbf{E}_k$, $\mathbf{V}$, and $\mathbf{R}$ to mitigate overfitting and enhance numerical stability throughout the optimization process:

$$\lambda \left( \sum_{k=1}^{K} \left( \|\mathbf{S}_k\|_F^2 + \|\mathbf{E}_k\|_F^2 \right) + \|\mathbf{V}\|_F^2 + \|\mathbf{R}\|_F^2 \right)$$

where $\lambda$ is a hyperparameter controlling the regularization strength.
Loss function.
We define the following loss function $\mathcal{L}$, incorporating the effective regularization terms:

$$\mathcal{L} = \sum_{k=1}^{K} \Big( \left\| \mathbf{X}_k - \mathbf{U}_k \mathbf{S}_k \mathbf{V}^{T} \right\|_F^2 + \left\| \mathbf{Y}_k - \mathbf{E}_k \mathbf{S}_k \mathbf{R}^{T} \right\|_F^2 + \mu_r \left\| \mathbf{1}_{L}\operatorname{diag}(\mathbf{S}_k)^{T} - \mathbf{R} - \mathbf{D}_k^{-1}\mathbf{Y}_k^{T}\mathbf{E}_k \right\|_F^2 + \mu_u \left\| \mathbf{U}_k - \mathbf{Q}_k\mathbf{H} \right\|_F^2 + \lambda \left( \|\mathbf{S}_k\|_F^2 + \|\mathbf{E}_k\|_F^2 \right) \Big) + \lambda \left( \|\mathbf{V}\|_F^2 + \|\mathbf{R}\|_F^2 \right) \quad \text{(2)}$$

where $\mu_r$, $\mu_u$, and $\lambda$ are hyperparameters weighting the relational, uniqueness, and L2 regularization terms, respectively.
Algorithm 1 KG-CTF with momentum-based ALS.
Input: $\{\mathbf{X}_k\}_{k=1}^{K}$ and $\{\mathbf{Y}_k\}_{k=1}^{K}$
Output: $\mathbf{U}_k$, $\mathbf{S}_k$, $\mathbf{Q}_k$, and $\mathbf{E}_k$ for $k = 1, \ldots, K$, and $\mathbf{V}$, $\mathbf{H}$, and $\mathbf{R}$
Parameter: target rank R and momentum strength β
1: Initialize the matrices $\mathbf{U}_k$, $\mathbf{S}_k$, $\mathbf{Q}_k$, and $\mathbf{E}_k$ for $k = 1, \ldots, K$, and $\mathbf{V}$, $\mathbf{H}$, and $\mathbf{R}$
2: repeat
3: for $k = 1, \ldots, K$ do
4: Update $\mathbf{U}_k$ using Eq (3)
5: Refine $\mathbf{U}_k$ with the momentum update
6: Update $\mathbf{S}_k$ using Eq (4)
7: Refine $\mathbf{S}_k$ with the momentum update
8: Update $\mathbf{E}_k$ using Eq (8)
9: Update $\mathbf{Q}_k$ using Eq (6)
10: end for
11: Update $\mathbf{V}$ using Eq (5)
12: Refine $\mathbf{V}$ with the momentum update
13: Update $\mathbf{R}$ using Eq (9)
14: Update $\mathbf{H}$ using Eq (7)
15: until the maximum iteration is reached, or the error ceases to decrease
Momentum update procedure
We introduce a Momentum-based ALS algorithm that leverages a momentum mechanism to enhance the convergence speed of ALS when optimizing complex loss functions. By incorporating a fraction of the prior update direction into the newly computed factor matrix, the algorithm achieves more stable and faster factor updates. For a factor matrix $\mathbf{A}$, let $\mathbf{A}^{(t)}$ denote its value at iteration t. After performing the standard ALS update to compute $\hat{\mathbf{A}}^{(t+1)}$, the update is further refined by incorporating a momentum term:

$$\mathbf{A}^{(t+1)} = \hat{\mathbf{A}}^{(t+1)} + \beta \left( \mathbf{A}^{(t)} - \mathbf{A}^{(t-1)} \right)$$

where β is the momentum coefficient, typically ranging between 0 and 1. During the initial iterations (e.g., t < 5), β is set to zero to allow the algorithm to stabilize without the influence of previous updates. In subsequent iterations, the momentum coefficient β is assigned a positive value to integrate a portion of the previous update direction into the current update, thus accelerating convergence and enhancing optimization efficiency. By incorporating this momentum mechanism, the ALS algorithm improves its ability to process high-dimensional data, ultimately achieving faster and more stable convergence.
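The refinement step can be sketched as follows; the warmup threshold of five iterations follows the text, while the function and variable names are ours:

```python
import numpy as np

def momentum_step(A_als, A_prev, A_prev2, t, beta=0.5, warmup=5):
    """Refine the plain ALS result A_als with the previous update direction
    (A_prev - A_prev2). Momentum is disabled during the first `warmup` iterations."""
    b = 0.0 if t < warmup else beta
    return A_als + b * (A_prev - A_prev2)

A_prev2 = np.zeros((2, 2))          # A^(t-1)
A_prev = np.ones((2, 2))            # A^(t)
A_als = np.full((2, 2), 2.0)        # fresh ALS solution

early = momentum_step(A_als, A_prev, A_prev2, t=1)    # warmup: plain ALS result
late = momentum_step(A_als, A_prev, A_prev2, t=10)    # momentum applied
```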
Algorithm 1 presents an overview of the proposed Momentum-ALS algorithm. The algorithm starts with the random initialization of the factor matrices, followed by iterative ALS updates. For each slice k, the factor matrices are initially updated using the standard ALS procedure, and a momentum term is subsequently applied to refine the final update.
We use an alternating optimization-based update procedure, where each factor matrix is updated independently while keeping the other factor matrices fixed.
Updating $\mathbf{U}_k$. We update $\mathbf{U}_k$ by setting $\partial\mathcal{L}/\partial\mathbf{U}_k = 0$ and rearranging the terms with the following lemma:

Lemma 1. When fixing all the factor matrices except for $\mathbf{U}_k$, the following update for $\mathbf{U}_k$ minimizes $\mathcal{L}$ (Eq (2)):

$$\mathbf{U}_k \leftarrow \left( \mathbf{X}_k \mathbf{V} \mathbf{S}_k + \mu_u \mathbf{Q}_k \mathbf{H} \right) \left( \mathbf{S}_k \mathbf{V}^{T} \mathbf{V} \mathbf{S}_k + \mu_u \mathbf{I} \right)^{-1} \quad \text{(3)}$$ □

Proof: See Appendix. □
Updating $\mathbf{S}_k$. To update $\mathbf{S}_k$, we first transform $\mathbf{S}_k$ for $k = 1, \ldots, K$ into $\mathbf{W} \in \mathbb{R}^{K \times R}$ whose k-th row contains the diagonal elements of $\mathbf{S}_k$ (i.e., $\mathbf{W}(k,:) = \operatorname{diag}(\mathbf{S}_k)^{T}$). Then, we update $\mathbf{S}_k$ by setting $\partial\mathcal{L}/\partial\mathbf{W}(k,:) = 0$ and rearranging the terms based on the loss function (Eq (2)). We update $\mathbf{W}$ row by row with the following lemma:

Lemma 2. When fixing all the factor matrices except for $\mathbf{S}_k$, the following update for the k-th row of the factor matrix $\mathbf{W}$, which corresponds to the diagonal elements of $\mathbf{S}_k$, minimizes $\mathcal{L}$ (Eq (2)):

$$\mathbf{W}(k,:)^{T} \leftarrow \left[ \mathbf{V}^{T}\mathbf{V} * \mathbf{U}_k^{T}\mathbf{U}_k + \mathbf{R}^{T}\mathbf{R} * \mathbf{E}_k^{T}\mathbf{E}_k + (\mu_r L + \lambda)\mathbf{I} \right]^{-1} \left[ (\mathbf{V} \odot \mathbf{U}_k)^{T}\operatorname{vec}(\mathbf{X}_k) + (\mathbf{R} \odot \mathbf{E}_k)^{T}\operatorname{vec}(\mathbf{Y}_k) + \mu_r \left( \mathbf{R} + \mathbf{D}_k^{-1}\mathbf{Y}_k^{T}\mathbf{E}_k \right)^{T}\mathbf{1}_{L} \right] \quad \text{(4)}$$

where $\operatorname{vec}(\cdot)$ is a vectorization of a matrix, $\odot$ and $*$ denote the Khatri-Rao product and the element-wise multiplication, respectively, and $\mathbf{1}_{L} \in \mathbb{R}^{L \times 1}$ is a vector filled with ones. □

Proof: See Appendix.
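The identity behind this row-wise update, $\operatorname{vec}(\mathbf{U}_k\mathbf{S}_k\mathbf{V}^{T}) = (\mathbf{V} \odot \mathbf{U}_k)\operatorname{diag}(\mathbf{S}_k)$ with a column-major vectorization and a column-wise Khatri-Rao product, can be checked numerically:

```python
import numpy as np

def khatri_rao(V, U):
    """Column-wise Khatri-Rao product: column r is kron(V[:, r], U[:, r])."""
    return np.column_stack([np.kron(V[:, r], U[:, r]) for r in range(U.shape[1])])

rng = np.random.default_rng(6)
I, J, R = 4, 3, 2
U = rng.standard_normal((I, R))
V = rng.standard_normal((J, R))
w = rng.standard_normal(R)                # diagonal of S_k

X = U @ np.diag(w) @ V.T
# vec(X) with column-major ("Fortran") ordering equals (V ⊙ U) w.
```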
Updating $\mathbf{V}$. We update $\mathbf{V}$ by setting $\partial\mathcal{L}/\partial\mathbf{V} = 0$ and rearranging the terms with the following lemma:

Lemma 3. When fixing all the factor matrices except for $\mathbf{V}$, the following update for $\mathbf{V}$ minimizes $\mathcal{L}$ (Eq (2)):

$$\mathbf{V} \leftarrow \left( \sum_{k=1}^{K} \mathbf{X}_k^{T}\mathbf{U}_k\mathbf{S}_k \right) \left( \sum_{k=1}^{K} \mathbf{S}_k\mathbf{U}_k^{T}\mathbf{U}_k\mathbf{S}_k + \lambda\mathbf{I} \right)^{-1} \quad \text{(5)}$$ □

Proof: See Appendix.
Updating $\mathbf{Q}_k$ and $\mathbf{H}$. We update $\mathbf{Q}_k$ and $\mathbf{H}$, which are factor matrices for the uniqueness regularization, with the following lemmas:

Lemma 4. When fixing all the factor matrices except for $\mathbf{Q}_k$, the following update for the matrix $\mathbf{Q}_k$ minimizes the loss function (Eq (2)):

$$\mathbf{Q}_k \leftarrow \mathbf{B}_k\mathbf{C}_k^{T} \quad \text{(6)}$$

where $\mathbf{Q}_k$ is a column-orthogonal matrix (i.e., $\mathbf{Q}_k^{T}\mathbf{Q}_k = \mathbf{I}$), and $\mathbf{B}_k$ and $\mathbf{C}_k$ are the left and right singular vector matrices of $\mathbf{U}_k\mathbf{H}^{T}$, respectively. □

Proof: See Appendix.

Lemma 5. When fixing all the factor matrices except for $\mathbf{H}$, the following update for the factor matrix $\mathbf{H}$ minimizes the loss function (Eq (2)):

$$\mathbf{H} \leftarrow \frac{1}{K}\sum_{k=1}^{K}\mathbf{Q}_k^{T}\mathbf{U}_k \quad \text{(7)}$$ □

Proof: See Appendix.
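The column-orthogonal update of Lemma 4 is an orthogonal Procrustes problem; a sketch with illustrative sizes:

```python
import numpy as np

def update_Q(U_k, H):
    """Column-orthogonal Q_k minimizing ||U_k - Q_k H||_F (orthogonal Procrustes):
    Q_k = B C^T, where B and C are the singular vector matrices of U_k H^T."""
    B, _, Ct = np.linalg.svd(U_k @ H.T, full_matrices=False)
    return B @ Ct

rng = np.random.default_rng(2)
H = rng.standard_normal((3, 3))
# Build U_k that is exactly Q H for a known column-orthogonal Q, so the
# optimal residual is zero and the recovered Q_k must reproduce U_k.
Q_true, _ = np.linalg.qr(rng.standard_normal((6, 3)))
U_k = Q_true @ H
Q_k = update_Q(U_k, H)
```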
Updating $\mathbf{E}_k$. We update $\mathbf{E}_k$ by setting $\partial\mathcal{L}/\partial\mathbf{E}_k = 0$ and rearranging the terms with the following lemma:

Lemma 6. Solving the Sylvester equation (Eq (8)) with respect to $\mathbf{E}_k$ minimizes the loss function (Eq (2)) when all other factor matrices are fixed:

$$\mu_r \mathbf{Y}_k\mathbf{D}_k^{-2}\mathbf{Y}_k^{T}\mathbf{E}_k + \mathbf{E}_k\left( \mathbf{S}_k\mathbf{R}^{T}\mathbf{R}\mathbf{S}_k + \lambda\mathbf{I} \right) = \mathbf{Y}_k\mathbf{R}\mathbf{S}_k + \mu_r \mathbf{Y}_k\mathbf{D}_k^{-1}\left( \mathbf{1}_{L}\operatorname{diag}(\mathbf{S}_k)^{T} - \mathbf{R} \right) \quad \text{(8)}$$

Note that the Sylvester equation [28] has the form $\mathbf{A}\mathbf{X} + \mathbf{X}\mathbf{B} = \mathbf{C}$, where we set $\mathbf{A} = \mu_r\mathbf{Y}_k\mathbf{D}_k^{-2}\mathbf{Y}_k^{T}$, $\mathbf{B} = \mathbf{S}_k\mathbf{R}^{T}\mathbf{R}\mathbf{S}_k + \lambda\mathbf{I}$, and $\mathbf{C}$ to the right-hand side of Eq (8) to solve for $\mathbf{X} = \mathbf{E}_k$. □

Proof: See Appendix. □
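A Sylvester system of this form can be solved with standard routines (e.g., scipy.linalg.solve_sylvester); the numpy-only sketch below uses the equivalent Kronecker-product linear system, with random stand-ins for the coefficient matrices:

```python
import numpy as np

def solve_sylvester_np(A, B, C):
    """Solve A X + X B = C via the equivalent Kronecker linear system
    (I ⊗ A + B^T ⊗ I) vec(X) = vec(C), with column-major vec.
    Fine for small matrices; use scipy.linalg.solve_sylvester at scale."""
    n, m = C.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.reshape(-1, order="F"))
    return x.reshape((n, m), order="F")

rng = np.random.default_rng(3)
N_k, R = 4, 2
# SPD stand-ins keep the system well-posed, mirroring Gram-matrix structure.
A = rng.standard_normal((N_k, N_k)); A = A @ A.T + N_k * np.eye(N_k)
B = rng.standard_normal((R, R));     B = B @ B.T + R * np.eye(R)
C = rng.standard_normal((N_k, R))
E_k = solve_sylvester_np(A, B, C)
```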
Updating $\mathbf{R}$. We update $\mathbf{R}$ by setting $\partial\mathcal{L}/\partial\mathbf{R} = 0$ and rearranging the terms with the following lemma:

Lemma 7. When fixing all the factor matrices except for $\mathbf{R}$, the following update for the factor matrix $\mathbf{R}$ minimizes the loss function (Eq (2)):

$$\mathbf{R} \leftarrow \left( \sum_{k=1}^{K} \left[ \mathbf{Y}_k^{T}\mathbf{E}_k\mathbf{S}_k + \mu_r\left( \mathbf{1}_{L}\operatorname{diag}(\mathbf{S}_k)^{T} - \mathbf{D}_k^{-1}\mathbf{Y}_k^{T}\mathbf{E}_k \right) \right] \right) \left( \sum_{k=1}^{K} \mathbf{S}_k\mathbf{E}_k^{T}\mathbf{E}_k\mathbf{S}_k + (\mu_r K + \lambda)\mathbf{I} \right)^{-1} \quad \text{(9)}$$ □

Proof: See Appendix. □
We iteratively update the factor matrices using the update procedures in Lemmas 1 to 7. Each update is the exact closed-form ALS solution of a quadratic subproblem with all other factors fixed, which ensures a monotonic decrease of the objective and convergence to a stationary point of the loss function.
Time complexity of KG-CTF
We provide the time complexity of KG-CTF.
Theorem 1. The time complexity of KG-CTF is given by

$$O\left( \sum_{k=1}^{K} \left( I_k J R + N_k L R + N_k^3 + R^3 \right) \right)$$ □

Proof: See Appendix.

According to Theorem 1, KG-CTF runs in time that scales linearly with the total number of tensor entries, ensuring practicality even for very large datasets. It is also noteworthy that, in most practical settings, the KG information tables are orders of magnitude smaller than the primary tensor ($N_k \ll I_k$). Consequently, the cubic term $\sum_{k=1}^{K} N_k^3 + KR^3$ is asymptotically dominated by the linear term in $I_k$, and its contribution to the overall runtime is negligible.
Proposed method for online tensors: OKG-CTF
We propose OKG-CTF (Online Knowledge Graph-based Coupled Tensor Factorization), an accurate coupled tensor factorization method designed to handle both dynamic and static features in streaming irregular tensor data.
Overview
OKG-CTF addresses the following challenges for accurate coupled tensor factorization in an online streaming setting.
- C1. Incorporating static features into streaming tensor data. Existing methods in online streaming settings focus only on dynamic characteristics, ignoring the time-invariant features inherent in the data. How can we efficiently employ both dynamic and static features for streaming irregular tensor data?
- C2. Improving efficiency in an online streaming setting. Previous coupled tensor factorization methods fail to work efficiently for streaming data due to repeated computations involving old data. How can we minimize the computational cost and space cost as new tensor data arrive over time?
We address the above challenges with the following ideas (see Fig 4):
- I1. Formulating a loss function for PARAFAC2-based streaming coupled tensor factorization using a knowledge graph enables the effective integration of streaming temporal irregular tensors with knowledge graph tensors, capturing both dynamic and static information in an online setting.
- I2. Avoiding explicit computations on old data enables OKG-CTF to efficiently update factor matrices by dividing the terms related to old and new data.
We formulate a method to couple a streaming temporal tensor with a knowledge graph tensor by sharing the diagonal matrix Sk. This enables factor matrices to jointly learn temporal and static features within an irregular tensor in an online setting.
Loss function for a streaming setting
We propose a loss function designed to efficiently capture both dynamic and static features from a streaming temporal irregular tensor. The loss function in Eq (2), when applied in streaming settings, suffers from inefficiencies due to redundant computations over the accumulated tensor whenever new data arrive. This highlights the need for a strategy tailored to the streaming setting. Therefore, we reformulate the loss function by separating the terms related to old data from those related to new data. When new data are added to each slice matrix of the temporal irregular tensor, the loss function is defined as follows:
where the forgetting factor is a hyperparameter that determines the relative importance of newly arrived data compared to older data. Specifically, we divide each term of the loss into a part that involves only the old data and a part that involves only the newly arrived data. This approach ensures efficient computation by avoiding redundant operations on old data and focusing on learning from newly arrived data. The forgetting factor plays a crucial role in controlling the balance between retaining historical information and adapting to recent changes. A smaller value emphasizes recent data, enabling rapid adaptation to new patterns but increasing sensitivity to short-term noise. Conversely, a larger value gives greater weight to historical information, enhancing stability while slowing adaptation to distributional shifts. Thus, the forgetting factor defines the trade-off between adaptivity and stability, which makes selecting an appropriately balanced value important in practice.
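The effect of the forgetting factor can be illustrated concretely: when each streaming step forms its loss as (forgetting factor) × (old loss) + (new loss), a batch that arrived t steps ago is effectively down-weighted geometrically. A minimal sketch, where the name `mu` is an illustrative stand-in for the paper's forgetting-factor symbol:

```python
import numpy as np

def effective_weights(mu, num_updates):
    """Effective weight of each past batch after `num_updates` streaming
    updates, assuming the loss at each step is: mu * (old loss) + (new loss).
    The batch that arrived t steps ago ends up weighted by mu**t."""
    return np.array([mu ** t for t in range(num_updates)][::-1])

# A small forgetting factor forgets history quickly; a large one retains it.
fast = effective_weights(0.1, 4)   # strongly favors the newest batch
slow = effective_weights(0.9, 4)   # keeps older batches influential
```

For instance, with four batches the oldest batch keeps weight mu³: 0.001 under mu = 0.1 but 0.729 under mu = 0.9, which is exactly the adaptivity-versus-stability trade-off described above.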
Streaming update procedure
We propose an efficient ALS update rule optimized for streaming settings, in which one factor matrix is updated while all the other factor matrices are fixed. This rule eliminates the inefficiency caused by repeatedly recalculating terms involving old data whenever new data arrive, focusing computation on the new data only.
When new data are added, the loss function is divided into two parts: one for the old data and another for the new data. During the update process, terms related to old data are loaded without any recomputation, while only the terms associated with the new data are calculated. In the subsequent step, as additional new data arrive, the previously loaded old factor matrix is combined with the computed factor matrix to form the updated old factor matrix, which is then loaded in the next step. This method effectively avoids unnecessary computational growth as data accumulate, ensuring consistent efficiency over time.
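The old/new split is what makes each ALS subproblem cheap: the normal equations of a least-squares update depend on the data only through Gram-type matrices, so the statistics of old data can be cached once and folded in with the forgetting factor instead of being recomputed. A minimal sketch under generic names (`G_old`, `c_old`, and `mu` are illustrative, not the paper's notation):

```python
import numpy as np

def streaming_ls_update(A_new, b_new, G_old, c_old, mu, reg=1e-6):
    """One streaming least-squares step for min ||A x - b||^2 accumulated
    over time with forgetting factor mu. Old data enter only through the
    cached Gram matrix G_old = A_old^T A_old and vector c_old = A_old^T b_old,
    so the cost per step depends on the new rows only."""
    G = mu * G_old + A_new.T @ A_new        # fold new data into cached stats
    c = mu * c_old + A_new.T @ b_new
    x = np.linalg.solve(G + reg * np.eye(G.shape[0]), c)
    return x, G, c                          # carry G, c to the next update

rng = np.random.default_rng(0)
R = 4
G, c = np.zeros((R, R)), np.zeros(R)
for _ in range(3):                          # three arrivals of new rows
    A = rng.standard_normal((50, R))
    b = rng.standard_normal(50)
    x, G, c = streaming_ls_update(A, b, G, c, mu=0.9)
```

Each arrival touches only the 50 new rows; the accumulated history never has to be revisited, which mirrors how the terms related to old data are loaded without recomputation.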
Algorithm 2 provides an overview of the proposed Momentum-ALS algorithm for OKG-CTF in a streaming setting. It starts by randomly initializing new factor matrices, followed by iterative ALS updates. For each slice k, all factor matrices are updated using standard ALS, with a momentum term applied to refine the final updates. The updated new factor matrices are then merged with the existing ones to update the old factor matrices for the next streaming data.
Algorithm 2 OKG-CTF with momentum-based ALS.
Input: a new incoming tensor slice and the pre-existing factor matrices, including V, H, and R
Output: the updated factor matrices, including V, H, and R
Parameter: target rank R, momentum strength β
1: Initialize the new factor matrices randomly
2: repeat
3:  for each slice k do
4:   Update the slice-wise factor matrix using Eq (11)
5:   Apply the momentum update to the result
6:   Update the slice-wise factor matrix using Eq (12)
7:   Apply the momentum update to the result
8:   Update the slice-wise factor matrix using Eq (8)
9:   Update the slice-wise factor matrix using Eq (14)
10:  end for
11: Update V using Eq (13)
12: Apply the momentum update to the result
13: Update R using Eq (9)
14: Update H using Eq (15)
15: until the maximum iteration is reached, or the error ceases to decrease
16: Update the old factor matrices by merging in the updated new factor matrices
17: Update the remaining old quantities for the next streaming step
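The momentum step interleaved with the ALS updates can be sketched as extrapolating each factor matrix along its most recent change. A minimal illustration, where the least-squares solve is a generic stand-in for the paper's Eq-specific closed-form updates and only the momentum form follows the text:

```python
import numpy as np

def momentum_step(F_ls, F_prev, beta):
    """Extrapolate an ALS solution F_ls along the direction of its change
    from the previous iterate F_prev, with momentum strength beta."""
    return F_ls + beta * (F_ls - F_prev)

# Toy quadratic subproblem: the ALS update is the least-squares solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 3))
B = rng.standard_normal((30, 2))
F_prev = np.zeros((3, 2))
F_ls = np.linalg.lstsq(A, B, rcond=None)[0]
F_next = momentum_step(F_ls, F_prev, beta=0.5)
```

With beta = 0 the step reduces to plain ALS; larger beta pushes the iterate further along the recent update direction, which accelerates convergence up to the point where oscillation sets in, as the experiments later show.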
Updating the first factor matrix. We update this matrix by setting the corresponding partial derivative of the loss function to zero and then rearranging the terms with the following lemma:
Lemma 8. When fixing all the factor matrices except for the one being updated, the resulting closed-form update minimizes the loss function (Eq (10)). □
Proof: See Appendix.
Updating W. To update W, we first transform the diagonal matrices Sk for k = 1...K into the matrix W whose k-th row contains the diagonal elements of Sk. Then, we update W by setting the corresponding partial derivative of the loss function (Eq (10)) to zero and rearranging the terms. We update W row by row with the following lemma:
Lemma 9. When fixing all the factor matrices except for W, the resulting closed-form update for the k-th row of W, which corresponds to the diagonal elements of Sk, minimizes the loss function (Eq (10)). Here, vec(·) denotes the vectorization of a matrix, ⊙ and * denote the Khatri-Rao product and the element-wise multiplication, respectively, and 1 is a vector filled with ones. □
Proof: See Appendix.
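The operators appearing in Lemma 9 can be made concrete: the Khatri-Rao product is the column-wise Kronecker product, and vec(·) stacks the columns of a matrix. A small NumPy sketch, independent of the paper's specific matrices:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (n x k) and B (m x k),
    giving an (n*m) x k matrix whose r-th column is kron(A[:, r], B[:, r])."""
    n, k = A.shape
    m, _ = B.shape
    return np.einsum("ir,jr->ijr", A, B).reshape(n * m, k)

A = np.arange(6.0).reshape(3, 2)
B = np.arange(8.0).reshape(4, 2)
KR = khatri_rao(A, B)                  # 12 x 2
vecA = A.reshape(-1, order="F")        # vec(.) stacks the columns of A
H = A * A                              # element-wise (Hadamard) product
```

These are exactly the building blocks of the row-wise update: the Khatri-Rao product collapses two factor matrices column by column, and vectorization turns the matrix equation into an ordinary linear system.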
Updating the next factor matrix. We update this matrix by setting the corresponding partial derivative of the loss function to zero and then rearranging the terms with the following lemma:
Lemma 10. When fixing all the factor matrices except for the one being updated, the resulting closed-form update minimizes the loss function (Eq (10)). □
Proof: See Appendix.
Updating the factor matrices for the uniqueness regularization. We update the two factor matrices responsible for the uniqueness regularization with the following lemmas:
Lemma 11. When fixing all the factor matrices except for the column-orthogonal matrix being updated, the update that minimizes the loss function (Eq (10)) is obtained by solving the Orthogonal Procrustes problem [29] due to column-orthogonality: the solution is the product of the left and right singular vector matrices of the corresponding cross-product matrix. □
Proof: See Appendix.
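The Orthogonal Procrustes step in Lemma 11 has a standard SVD solution: if the relevant cross-product matrix has thin SVD Z diag(s) Pᵀ, the optimal column-orthogonal matrix is Z Pᵀ. A NumPy sketch under generic names (the matrix M here is illustrative data, not the paper's specific cross-product):

```python
import numpy as np

def procrustes(M):
    """Column-orthogonal Q maximizing trace(Q^T M), i.e. the Orthogonal
    Procrustes solution: Q = Z P^T, where M = Z diag(s) P^T is the thin SVD."""
    Z, _, Pt = np.linalg.svd(M, full_matrices=False)
    return Z @ Pt

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 3))
Q = procrustes(M)
s = np.linalg.svd(M, compute_uv=False)   # the attained objective is sum(s)
```

The returned Q satisfies QᵀQ = I by construction, and trace(QᵀM) equals the sum of the singular values of M, which is the maximum attainable over column-orthogonal matrices.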
Lemma 12. When fixing all the factor matrices except for the one being updated, the resulting closed-form update minimizes the loss function (Eq (10)). □
Proof: See Appendix.
Updating the remaining factor matrices. We update the two remaining factor matrices by setting the corresponding partial derivatives of the loss function to zero and then rearranging the terms with the following lemmas:
Lemma 13. The update rule that minimizes the streaming loss function (Eq (10)) is identical to the update defined in Lemma 6 for the offline loss function (Eq (2)). □
Lemma 14. The update rule that minimizes the streaming loss function (Eq (10)) is identical to the update defined in Lemma 7 for the offline loss function (Eq (2)). □
We alternately update the factor matrices with the update procedure in Lemmas 8 to 14. Each update is the exact closed-form ALS solution of a quadratic subproblem with all other factors fixed, which ensures a monotonic decrease of the objective and convergence to a stationary point.
Time complexity of OKG-CTF
We provide the time complexity of OKG-CTF for updating the factor matrices.
Theorem 2. The time complexity of OKG-CTF is given by
□
Proof: See Appendix.
Theorem 2 establishes that OKG-CTF’s time cost is linear in the time length Ik,new of the newly arrived data, rather than in the accumulated length of the tensor as it grows. This leads to significantly reduced computational cost, especially in long streaming sequences.
Experiments
We conduct experiments to explore the following research questions:
- Q1. Offline performance. How accurately does KG-CTF predict missing values in real-world irregular tensors?
- Q2. Online streaming performance. How efficiently and accurately does OKG-CTF update factor matrices when new data are added to existing slice matrices?
- Q3. Ablation study. How do the relational regularization and the accelerated update method contribute to the overall performance of KG-CTF and OKG-CTF?
- Q4. Hyperparameter sensitivity. How much do the hyperparameters affect the performance of KG-CTF and OKG-CTF?
Experimental settings
We provide a detailed overview of our experimental setup, including datasets, baselines, task descriptions, evaluation metrics, and hyperparameters.
Datasets. We conduct experiments on six real-world datasets, as summarized in Tables 2 and 3. Each dataset comprises a temporal irregular tensor paired with a corresponding knowledge graph tensor. These datasets cover stock markets from the United States (S&P500, NYSE, NASDAQ), South Korea, China, and Japan. For the stock datasets, each temporal irregular tensor consists of a collection of matrices, where each matrix corresponds to an individual stock. Each slice matrix follows a (date, feature) format, with features categorized into two groups: (1) six fundamental features, including opening price, closing price, highest price, lowest price, adjusted closing price, and trading volume, and (2) 83 technical indicators derived from these fundamental features using the Technical Analysis library (https://technical-analysis-library-in-python.readthedocs.io/en/latest/). For the knowledge graph datasets, we construct triple-based knowledge graphs covering all stocks in the six datasets using the ICKG [30] model. The StockKG dataset contains a total of 89,822 entities, including 14,019 stock entities and 15 distinct types of relations. These datasets are represented as irregular tensors in the format (entity, relation, stock), where each slice matrix is structured as (entity, relation). Here, entities denote the set of associated entities for each stock, and the dataset comprises 15 distinct relation types.
# of nnz indicates the number of nonzeros.
Update cycle denotes the number of time steps between each update.
We preprocess the real-world datasets by applying normalization methods based on their specific characteristics. For the six stock datasets, we perform z-normalization on each j-th column of the slice matrix X. In contrast, the knowledge graph tensor, consisting of binary values (0 or 1), remains unchanged without any additional normalization.
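The column-wise z-normalization applied to each stock slice matrix can be sketched as follows (a minimal illustration; the slice matrix here is random stand-in data in the (date, feature) format):

```python
import numpy as np

def z_normalize_columns(X, eps=1e-12):
    """Z-normalize each column j of a slice matrix X: subtract the column
    mean and divide by the column standard deviation."""
    mu = X.mean(axis=0, keepdims=True)
    sigma = X.std(axis=0, keepdims=True)
    return (X - mu) / (sigma + eps)

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 6)) * 5.0 + 2.0   # e.g. a (date, feature) slice
Xn = z_normalize_columns(X)                     # each column: mean 0, std 1
```

After normalization every feature column has zero mean and unit variance, which puts prices and volumes on comparable scales; the binary knowledge graph tensor is left untouched, as stated above.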
Competitors. We compare KG-CTF against existing PARAFAC2 decomposition methods designed for irregular tensors:
- PARAFAC2-ALS: A standard PARAFAC2 decomposition approach utilizing alternating least squares. This method iteratively updates the target factor matrix while keeping all other factor matrices fixed.
- ATOM [15]: A PARAFAC2-based method tailored for handling missing values in temporal irregular tensors. ATOM introduces temporal smoothness regularization, ensuring gradual changes in temporal factor matrices over time.
- CTF-ALS: A Coupled Tensor Factorization (CTF) approach that employs ALS to jointly decompose multiple related tensors sharing a common axis. This method is designed to capture shared patterns across coupled tensors.
- TASTE [19]: A coupled matrix and tensor factorization framework that integrates both temporal and static features. It combines a non-negative PARAFAC2 model with non-negative matrix factorization to improve joint modeling capabilities.
We compare OKG-CTF with existing streaming PARAFAC2 decomposition methods in an online streaming setting:
- PARAFAC2-ALS: A baseline PARAFAC2 decomposition method that iteratively updates the target factor matrix using alternating least squares.
- SPADE [31]: An efficient PARAFAC2 decomposition method designed to update new slice matrices.
- DASH [32]: An advanced PARAFAC2 decomposition method capable of efficiently handling both new slice matrices and new rows in existing slice matrices.
- CTF-ALS: A Coupled Tensor Factorization (CTF) method using ALS to jointly factorize multiple related tensors sharing a common axis, learning shared patterns across coupled tensors.
Task. To evaluate the performance of KG-CTF and OKG-CTF, we perform a missing value prediction task on temporal irregular tensor data. We randomly divide the data into training and test entries with test ratios of 10%, 20%, and 30%.
Metric. We evaluate the model performance using the Root Mean Squared Error (RMSE), computed as RMSE = sqrt( (1/|Ω|) Σ_{(i,j,k)∈Ω} (X_ijk − X̂_ijk)² ), where Ω represents the set of test entries in the input tensor X, and X̂ denotes the reconstructed tensor obtained from the learned factor matrices. A lower RMSE score indicates better tensor decomposition accuracy.
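The RMSE over held-out test entries can be computed directly from a boolean mask over the tensor entries (a small sketch with toy values; the mask plays the role of the test set Ω):

```python
import numpy as np

def rmse(X, X_hat, mask):
    """Root Mean Squared Error over the test entries indicated by mask,
    i.e. sqrt(mean((X - X_hat)^2)) restricted to the held-out set."""
    diff = (X - X_hat)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_hat = np.array([[1.0, 2.5], [2.0, 4.0]])
mask = np.array([[False, True], [True, False]])   # held-out test entries
err = rmse(X, X_hat, mask)                        # sqrt((0.25 + 1.0) / 2)
```

Only the masked entries contribute, so training entries never leak into the reported error.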
Hyperparameters. The hyperparameters for KG-CTF and OKG-CTF include the target rank R, the relational, uniqueness, and L2 regularization strengths, the forgetting factor, and the momentum coefficient β. We set the hyperparameters as follows: the target rank R to 5 for the China, Japan, and Korea datasets, and to 12 for the S&P500, NYSE, and NASDAQ datasets; the four regularization hyperparameters to 10, 1, 0.01, and 0.1, respectively; and the momentum coefficient β to 0.5.
Offline performance (Q1)
We compare the performance of KG-CTF with baseline models for the task of missing value prediction at test ratios ranging from 10% to 30%. According to Table 4, KG-CTF consistently achieves lower error rates across most datasets. For instance, in the Japan Stock dataset, KG-CTF shows an error rate approximately 1.64× lower than that of PARAFAC2-ALS at 10% test ratio and about 1.52× lower at 30% test ratio. Traditional methods reveal their limitations as they focus primarily on learning dynamic features while neglecting static features. In the NASDAQ, S&P500, and Korea Stock datasets, however, the performance gaps are less pronounced since KG coverage is skewed—a few stocks are linked to many entities while most have very few—so KG signals enhance a limited subset of stocks, reducing the overall benefit of coupling. In contrast, NYSE, China, and Japan Stock datasets exhibit more balanced entity coverage, allowing KG signals to be utilized more uniformly across stocks and resulting in more consistent performance improvements. Additionally, as the test ratio increases from 10% to 30%, the reliance of competing models on dynamic features causes their performance to degrade more significantly, further widening the gap with KG-CTF.
Note that bold and underlined fonts indicate the lowest and second-lowest errors, respectively. KG-CTF outperforms the competitors across all datasets and missing value ratios.
Online performance for newly arrived data (Q2)
We evaluate the performance of OKG-CTF with baselines in terms of local errors and efficiency.
Local Error. We evaluate OKG-CTF’s performance by analyzing local errors, measuring the mean and standard deviation across all updates. Table 5 shows that OKG-CTF consistently achieves much lower local errors than competing models. For example, in the S&P500 dataset, OKG-CTF performs up to 1.17× better than existing models. This is because OKG-CTF learns both dynamic and static features by incorporating auxiliary information with each new data arrival, resulting in more accurate factor matrices.
Note that bold and underlined fonts indicate the lowest and second-lowest errors. OKG-CTF outperforms the competitors across all datasets and missing value ratios.
Efficiency. We evaluate the performance of OKG-CTF compared to competing models in a streaming setting. In Fig 5, the running time for the N-th update represents the update time for the N-th newly arrived data, rather than cumulative time. Fig 5 presents the results for all datasets in a streaming setting where new rows of existing slice matrices arrive. OKG-CTF demonstrates superior performance, achieving up to 5.7× faster speeds than traditional static PARAFAC2 decomposition methods and streaming PARAFAC2 decomposition methods. Notably, in the NYSE and S&P500 datasets (Fig 5(b) and 5(c)), OKG-CTF achieves faster updates than competing models. This shows that OKG-CTF maintains competitive speed while combining static and dynamic features to learn richer information.
Scalability for newly arrived data size. We evaluate the scalability of OKG-CTF with respect to the size of a new incoming tensor by measuring its running time. We measure execution times across five update cycles [20, 40, 60, 80, 100]. The length of new rows added to existing slice matrices increases linearly with the update cycle. As shown in Fig 6, the results are presented using box plots: the orange line marks the median, the box covers the lower (Q1) and upper (Q3) quartiles, and the horizontal lines mark the lower and upper whiskers. Fig 6 clearly shows that, in the setting where only new rows are added to existing slice matrices, the execution time of OKG-CTF scales linearly with the update cycle.
The size of new rows of existing slice matrices is linearly proportional to an update cycle.
Ablation study (Q3)
We conduct an ablation study to evaluate how relationship learning from the knowledge graph and accelerated learning via momentum updates affect prediction accuracy for both KG-CTF and OKG-CTF. Table 6 provides the global error results for KG-CTF under 20% missing-value test ratio, while Table 7 reports the local error results for OKG-CTF under 30% test ratio. For each method, we consider three reduced variants: (i) KG-CTF-R and OKG-CTF-R, which omit the relational regularization derived from the knowledge graph; (ii) KG-CTF-M and OKG-CTF-M, which exclude the momentum updates; (iii) KG-CTF-RM and OKG-CTF-RM, which remove both components and thus resemble traditional CTF-ALS. As shown in Tables 6 and 7, KG-CTF and OKG-CTF consistently show the lowest prediction errors, confirming that incorporating both relational information and momentum updates leads to superior performance. Even though KG-CTF-R and OKG-CTF-R lose some predictive accuracy by excluding knowledge graph relationships, they still outperform their respective -RM variants. Likewise, KG-CTF-M and OKG-CTF-M underscore the benefit of momentum updates, since their performance degrades relative to the full models. These results collectively demonstrate that our models effectively exploit both static relational structure and dynamic temporal information, allowing them to learn more accurate factor matrices for irregular time-evolving tensors.
-R and -M indicate the elimination of relational regularization and no momentum update, respectively. Bold and underlined fonts indicate the lowest and second-lowest errors. Note that both the relational regularization and momentum update contribute to improving the prediction accuracy.
Hyperparameter sensitivity (Q4)
We analyze the hyperparameter sensitivity of KG-CTF and OKG-CTF by measuring prediction errors across various key hyperparameters, including target rank, relational regularization, uniqueness, L2 regularization, forgetting factor, and momentum coefficient. Our experiments test four target rank values R of [3, 6, 9, 12], five relational regularization strengths of [0.01, 0.1, 1, 10, 100], five uniqueness strengths of [0.01, 0.1, 1, 10, 100], five L2 regularization strengths of [0.01, 0.1, 1, 10, 100], five forgetting factor values of [0.001, 0.01, 0.1, 1, 10], and five momentum coefficients β of [0.0, 0.3, 0.5, 0.7, 0.9]. We use the S&P500 (SP) and NASDAQ (ND) datasets to illustrate KG-CTF’s performance (see Fig 7), and the China Stock (CHN) and Korea Stock (KOR) datasets to examine OKG-CTF (see Fig 8).
Note that SP stands for S&P500 and ND stands for NASDAQ.
Note that CHN stands for China Stock and KOR stands for Korea Stock.
Rank. We evaluate KG-CTF’s performance on S&P500 and NASDAQ datasets across four target rank values (3, 6, 9, 12). As shown in Fig 7(a) and 7(e), the prediction errors steadily decrease as the target rank increases, reaching their lowest at the highest rank R = 12. These results show that larger rank values consistently yield lower prediction errors, confirming that a higher rank helps capture more complex patterns in the data and improves accuracy.
Relational pattern. Both KG-CTF and OKG-CTF incorporate relational regularization derived from a knowledge graph. For KG-CTF, Fig 7(b) and 7(f) illustrate the influence of learning relationships from the knowledge graph on the NASDAQ and S&P500 datasets. Specifically, for the NASDAQ dataset, a relational regularization strength of 1 achieves lower errors than smaller values such as 0.1 or 0.01. In contrast, the S&P500 dataset performs better with smaller values. This difference arises because the NASDAQ dataset contains a larger number of entities, allowing the model to effectively leverage richer relational patterns. However, excessively large relational regularization strengths can lead to overfitting, particularly on datasets with fewer entities like S&P500, causing the model to learn noise rather than meaningful relational structures. A similar pattern appears for OKG-CTF in Fig 8(b) and 8(f). On both the China and Korea Stock datasets, a moderate strength (around 1) produces optimal performance, confirming the importance of appropriately tuning relational regularization to avoid overfitting.
Uniqueness. We assess the impact of uniqueness regularization on the performance of KG-CTF and OKG-CTF. In both KG-CTF (Fig 7(c), 7(g)) and OKG-CTF (Fig 8(c), 8(g)), we observe that an excessively large uniqueness regularization strength degrades performance. Although moderate regularization helps improve the interpretability of the factor matrices, excessive regularization negatively affects the overall performance.
L2. We examine how L2 regularization affects prediction performance for both KG-CTF and OKG-CTF. As shown in Fig 7(d) and 7(h) (for KG-CTF), the error rates remain largely unchanged across different L2 regularization strengths. This stability arises from the normalized input tensors, which align the scales of the factor matrices, as well as from the abundant data in the KG-CTF datasets, which support accurate factor updates even without strong regularization. In contrast, Fig 8(d) and 8(h) (for OKG-CTF) reveal that larger values clearly increase prediction errors. Because OKG-CTF operates on newly arrived, smaller datasets, excessive L2 regularization overly constrains the model, shrinking parameter values prematurely and preventing it from capturing subtle, local patterns. Thus, while the large datasets in KG-CTF reduce sensitivity to L2 regularization, careful tuning of the L2 strength is necessary for OKG-CTF to balance preventing overfitting against preserving essential information in smaller, more localized datasets.
Forgetting factor. OKG-CTF explicitly includes a forgetting factor for time-evolving tensors. As illustrated in Fig 8(a) and 8(e) for the China and Korea stock datasets, excessively large values of the forgetting factor negatively impact the prediction accuracy by overly emphasizing outdated data, while very small values discard useful historical information too quickly. In the Korea stock dataset, a moderate value achieves the best performance, balancing the short-term volatility and long-term trends inherent in stock data. Thus, selecting an intermediate forgetting factor is crucial, enabling the model to dynamically adapt to recent data trends while preserving sufficient historical context for stable and accurate predictions.
Momentum coefficient. We quantitatively analyze the impact of momentum-based updates on convergence speed in both KG-CTF and OKG-CTF. We measure convergence as the first iteration (epoch) at which the RMSE drops below a dataset-specific target threshold, with lower values indicating faster convergence. As shown in Fig 9, increasing the momentum coefficient β generally leads to faster convergence across the NASDAQ, NYSE, and S&P500 datasets. For example, in NASDAQ, the model with momentum reaches the target RMSE within just 9 iterations, whereas the baseline without momentum (β = 0) requires 49 iterations, a 40-iteration improvement. This result highlights that the momentum mechanism becomes particularly effective in high-dimensional, large-scale datasets. However, in the S&P500 dataset, excessively large values slow down convergence or cause instability, indicating that overly strong momentum may harm training stability. Fig 10 shows similar trends in OKG-CTF. Across all datasets, a moderate momentum coefficient consistently achieves the fastest convergence; for instance, NYSE reaches the target RMSE in only 5 iterations under this setting. By contrast, higher momentum values (e.g., 0.9) result in slower convergence and degraded final performance. Overall, these results confirm that moderate momentum effectively accelerates convergence, whereas excessively large values can cause oscillations and suboptimal learning.
Note that ND and SP denote NASDAQ and S&P500, respectively.
Note that ND and SP denote NASDAQ and S&P500, respectively.
Related work
We describe related works for existing tensor decomposition methods for irregular tensors, streaming tensors, and coupled tensor factorization approaches.
Irregular tensor decomposition
Many studies [18,33] have introduced PARAFAC2 decomposition methods for analyzing irregular tensors. Unlike the traditional PARAFAC2-based ALS algorithm [22], RD-ALS [16] and DPar2 [14] apply preprocessing steps prior to factor matrix updates. ATOM [15] further improves the handling of temporal irregular sparse tensors, particularly those with missing values. However, they primarily focus on capturing dynamic features, while static features, which are equally crucial factors, are largely neglected. Integrating static features into the PARAFAC2 model often results in significant bias toward learning dynamic features. This limitation extends to other PARAFAC2-based approaches that do not explicitly incorporate side information. To address this, KG-CTF couples the temporal irregular tensor with the knowledge graph tensor by sharing a common axis, enabling the joint modeling of dynamic and static features while preserving the inherent relational structures among factor matrices.
Streaming tensor decomposition
In online streaming settings, tensor decomposition methods [6,34,35] have been developed to efficiently update factor matrices as new data arrive. DAO-CP [9] adapts CP decomposition by detecting changes in tensor streams and selectively reusing or recomputing factor matrices. STF [7] incorporates attention-based temporal regularization to leverage past and future information for improved online learning. For irregular tensors, SPADE [31] updates factor matrices when new slices are added, while DASH [32] extends this by handling both new rows within existing slices and new slice matrices, enabling more flexible updates. However, existing methods focus on dynamic features, neglecting static features, which are equally important. A key limitation of online PARAFAC2 models is that they cannot incorporate static information in an online learning framework, as it remains unchanged over time and does not fit within their update mechanisms. While spectral regularization has been explored in neural networks and factorization models [36–38] to incorporate auxiliary structures, its use in online tensor factorization remains limited. Meanwhile, OKG-CTF effectively integrates dynamic and static features in online irregular tensor decomposition, enabling richer information learning while maintaining efficient factor matrix updates.
Coupled tensor factorization
Previous studies [39–42] have proposed Coupled Matrix-Tensor Factorization (CMTF) methods to jointly decompose tensors and matrices. HaTen2 [43] and SCouT [44] extend CMTF by implementing distributed CP-based factorization within the MapReduce framework. TASTE [19] introduces a coupled irregular tensor decomposition approach for EHR data, while C3APTION [45] builds on TASTE by integrating a (non-negative) PARAFAC2 model with a (non-negative) CP model. These methods leverage complementary information to achieve data fusion, enabling the extraction of richer information through tensor decomposition. However, relying on simple matrices to represent large-scale auxiliary information has limitations, particularly in capturing intricate relationships. In contrast, KG-CTF and OKG-CTF introduce novel data fusion methods that represent knowledge graph data as irregular tensors, effectively capturing relational structures within the factor matrices. KG-CTF [39] is our preliminary work on coupled tensor factorization, designed to jointly integrate temporal irregular tensors with knowledge graph tensors. In this work, we extend KG-CTF further to effectively handle streaming irregular tensors.
Conclusion
In this paper, we propose KG-CTF and OKG-CTF, accurate coupled tensor factorization methods that extend the traditional PARAFAC2 approaches by incorporating a knowledge graph tensor. This integration enables the modeling of both dynamic and static characteristics within temporal irregular tensors in offline and online settings, respectively. Existing methods predominantly emphasize capturing dynamic temporal features while overlooking the intricate multi-dimensional relationships embedded in the data. To address these limitations, KG-CTF and OKG-CTF incorporate relational regularization to effectively capture relational structures within the knowledge graph. Additionally, a momentum-based update strategy is employed to accelerate the factor matrix updates. Our proposed methods achieve superior performance over existing PARAFAC2-based approaches by jointly learning dynamic and static features, leading to improved accuracy with minimal computational overhead. Extensive experimental evaluations confirm that both methods significantly reduce error rates, establishing them as robust solutions for complex temporal irregular tensor analysis. Future work includes extending KG-CTF and OKG-CTF to a broader range of datasets and further optimizing them to enhance scalability.
Supporting information
S1 Text. Detailed proofs for Lemmas 1 to 12 and Theorems 1 and 2.
https://doi.org/10.1371/journal.pone.0336100.s001
(PDF)
References
- 1. Shao P, Zhang D, Yang G, Tao J, Che F, Liu T. Tucker decomposition-based temporal knowledge graph completion. Knowledge-Based Systems. 2022;238:107841.
- 2. Balažević I, Allen C, Hospedales T. TuckER: tensor factorization for knowledge graph completion. In: EMNLP. 2019.
- 3. Zhang J, Lu CT, Cao B, Chang Y, Philip SY. Connecting emerging relationships from news via tensor factorization. In: Big Data. 2017.
- 4. Padia A, Kalpakis K, Finin T. Inferring relations in knowledge graphs with tensor decompositions. In: Big Data. 2016.
- 5. Nickel M, Tresp V, Kriegel HP. A three-way model for collective learning on multi-relational data. In: ICML. 2011.
- 6. Jang J-G, Kang U. Static and streaming tucker decomposition for dense tensors. ACM Trans Knowl Discov Data. 2023;17(5):1–34.
- 7. Ahn D, Kim S, Kang U. Accurate online tensor factorization for temporal tensor streams with missing values. In: CIKM. 2021.
- 8. Jang JG, Kang U. Fast and memory-efficient tucker decomposition for answering diverse time range queries. In: KDD. 2021.
- 9. Son S, Park Y-C, Cho M, Kang U. DAO-CP: data-adaptive online CP decomposition for tensor stream. PLoS One. 2022;17(4):e0267091. pmid:35421202
- 10. Oh S, Park N, Lee S, Kang U. Scalable tucker factorization for sparse tensors: algorithms and discoveries. In: ICDE. 2018.
- 11. Jang JG, Kang U. D-Tucker: fast and memory-efficient tucker decomposition for dense tensors. In: ICDE. 2020.
- 12. Ahn D, Jang JG, Kang U. Time-aware tensor decomposition for sparse tensors. Mach Learn. 2022;111(4):1409–30.
- 13. Jang JG, Park Y-C, Kang U. Fast and accurate PARAFAC2 decomposition for time range queries on irregular tensors. In: CIKM. 2024.
- 14. Jang JG, Kang U. DPar2: fast and scalable PARAFAC2 decomposition for irregular dense tensors. In: ICDE. 2022.
- 15. Jang JG, Lee J, Park J, Kang U. Accurate PARAFAC2 decomposition for temporal irregular tensors with missing values. In: Big Data. 2022.
- 16. Cheng Y, Haardt M. Efficient computation of the PARAFAC2 decomposition. In: ACSSC. 2019.
- 17. Perros I, Papalexakis EE, Wang F, Vuduc R, Searles E, Thompson M. SPARTan: scalable PARAFAC2 for large & sparse data. In: KDD. 2017.
- 18. Kim J, Park KH, Jang JG, Kang U. Fast and accurate domain adaptation for irregular tensor decomposition. In: KDD. 2024.
- 19. Afshar A, Perros I, Park H, Defilippi C, Yan X, Stewart W. TASTE: temporal and static tensor factorization for phenotyping electronic health records. In: CHIL. 2020.
- 20. Yin K, Afshar A, Ho JC, Cheung WK, Zhang C, Sun J. LogPar: logistic PARAFAC2 factorization for temporal binary data with missing values. In: KDD. 2020.
- 21. Harshman RA. PARAFAC2: mathematical and technical notes. UCLA Working Papers in Phonetics. 1972;22:30–44.
- 22. Kiers HAL, ten Berge JMF, Bro R. PARAFAC2—Part I. A direct fitting algorithm for the PARAFAC2 model. J Chemometrics. 1999;13(3–4):275–94.
- 23. ten Berge JMF, Kiers HAL. Some uniqueness results for PARAFAC2. Psychometrika. 1996;61(1):123–32.
- 24. Roald M, Schenker C, Cohen JE, Acar E. PARAFAC2 AO-ADMM: constraints in all modes. In: EUSIPCO. 2021.
- 25. Kazemi SM, Poole D. SimplE embedding for link prediction in knowledge graphs. Advances in Neural Information Processing Systems. 2018;31.
- 26. Trouillon T, Dance CR, Gaussier É, Welbl J, Riedel S, Bouchard G. Knowledge graph completion via complex tensor factorization. Journal of Machine Learning Research. 2017;18(130):1–38.
- 27. Bordes A, Usunier N, Garcia-Durán A, Weston J, Yakhnenko O. Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems. 2013;26.
- 28. Hu DY, Reichel L. Krylov-subspace methods for the Sylvester equation. Linear Algebra and its Applications. 1992;172:283–313.
- 29. Golub GH, Van Loan CF. Matrix computations. JHU Press; 2013.
- 30. Li XV. FinDKG: dynamic knowledge graph with large language models for global finance. 2023. https://ssrn.com/abstract=4608445
- 31. Gujral E, Theocharous G, Papalexakis EE. SPADE: streaming PARAFAC2 decomposition for large datasets. In: SDM. 2020.
- 32. Jang JG, Lee J, Park Y-C, Kang U. Fast and accurate dual-way streaming PARAFAC2 for irregular tensors: algorithm and application. In: KDD. 2023.
- 33. Kwon T, Ko J, Jung J, Jang JG, Shin K. Compact decomposition of irregular tensors for data compression: from sparse to dense to high-order tensors. In: KDD. 2024.
- 34. Lee D, Shin K. Robust factorization of real-world tensor streams with patterns, missing values, and outliers. In: ICDE. 2021.
- 35. Zhou S, Erfani S, Bailey J. Online CP decomposition for sparse tensors. In: ICDM. 2018.
- 36. Park Y-C, Jang JG, Kang U. Fast and accurate partial fourier transform for time series data. In: KDD. 2021. p. 1309–18.
- 37. Park Y-C, Kim J, Kang U. Fast multidimensional partial fourier transform with automatic hyperparameter selection. In: KDD. 2024. p. 2328–39.
- 38. Park Y-C, Kim K, Kang U. PuzzleTensor: a method-agnostic data transformation for compact tensor factorization. In: KDD. 2025. p. 2234–44.
- 39. Lee S, Park Y-C, Kang U. Accurate coupled tensor factorization with knowledge graph. In: Big Data. 2024.
- 40. Chen H, Li J. Collective tensor completion with multiple heterogeneous side information. In: Big Data. 2019.
- 41. Choi D, Jang J-G, Kang U. S3CMTF: fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization. PLoS One. 2019;14(6):e0217316. pmid:31251750
- 42. Acar E, Kolda TG, Dunlavy DM. All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint. 2011. https://arxiv.org/abs/1105.3422
- 43. Jeon I, Papalexakis EE, Kang U, Faloutsos C. HaTen2: billion-scale tensor decompositions. In: ICDE. 2015.
- 44. Jeon B, Jeon I, Sael L, Kang U. SCouT: scalable coupled matrix-tensor factorization: algorithm and discoveries. In: ICDE. 2016.
- 45. Gujral E, Theocharous G, Papalexakis EE. C3APTION: constrained coupled CP and PARAFAC2 tensor decomposition. In: ASONAM. 2020.