Features extracted using tensor decomposition reflect the biological features of the temporal patterns of human blood multimodal metabolome

High-throughput omics technologies have enabled the profiling of entire biological systems. For the biological interpretation of such omics data, two analyses, hypothesis- and data-driven analyses including tensor decomposition, have been used. Both analyses have their own advantages and disadvantages and are mutually complementary; however, a direct comparison of these two analyses for omics data is poorly examined.We applied tensor decomposition (TD) to a dataset representing changes in the concentrations of 562 blood molecules at 14 time points in 20 healthy human subjects after ingestion of 75 g oral glucose. We characterized each molecule by individual dependence (constant or variable) and time dependence (later peak or early peak). Three of the four features extracted by TD were characterized by our previous hypothesis-driven study, indicating that TD can extract some of the same features obtained by hypothesis-driven analysis in a non-biased manner. In contrast to the years taken for our previous hypothesis-driven analysis, the data-driven analysis in this study took days, indicating that TD can extract biological features in a non-biased manner without the time-consuming process of hypothesis generation.


Introduction
The introduction of high-throughput technologies has enabled the profiling of entire biological systems by acquiring omics data such as genomes, transcriptomes, epigenomes, and metabolomes [1,2]. The biological interpretation of such omics data requires integrating, summarizing, and visualizing the omics data to acquire a complete picture of the biological system [3]. A variety of bioinformatics methodologies have been developed to address the challenge of processing large amounts of complex omics data [4][5][6][7]. There are two analyses used for omics analyses: hypothesis-driven and data-driven [4,8]. Hypothesis-driven analysis tests the hypothesis made by the researcher, whereas data-driven analysis does not require a hypothesis to be made by the researcher in advance. Hypothesis-driven analysis is a subjective analysis that extracts the features of omics data by intuition. This result and its biological interpretation are direct and easy to understand. However, because hypothesis-driven analysis is a human-task, hypothesis generated by the same data may differ between individuals, and the extraction of features can be biased depending on prior knowledge. Also, hypothesis generation relies on human inspections and trial and error, which makes it time-consuming. Datadriven analysis is an objective analysis that extracts features by statistical analysis. Because data-driven analysis is a computational task, feature extraction (FE) can avoid bias from individuals and prior knowledge and is much faster than hypothesis-driven analysis. However, the extracted feature is not necessarily easy to understand and it is sometimes difficult to interpret the biological data. Hypothesis-and data-driven analyses have their own advantages and disadvantages, and are mutually complementary. However, a direct comparison between these two analyses is poorly examined. We previously used hypothesis-driven analysis for time series data of various blood metabolites such as amino acids and lipids, including blood glucose and hormones during oral glucose ingestion, and found four features with temporal patterns among individuals and molecules [4]. However, because the number of molecules targeted was limited and different statistical methods were used to calculate the features, analyst bias is a concern. In addition, the FE took years. To address these issues, we attempted a non-biased FE using tensor decomposition (TD).
Here, we used a data-driven approach based on TD as a multivariate analysis method applied to multi-omics datasets. TD enables data-driven analyses such as data dimensionality reduction, classification, and potential FE [9,10] and has been widely applied to omics studies [3,7,[11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26]. Omberg et al. integrated genome-scale mRNA expression data from three cell cycle time courses in yeast to identify genes and the differential effects of gene-mediated biological processes on cell cycle progression using TD. TD also has been applied to analyses within each omics [11,12,15,16,18] and among multiple omics [3,14,17]. Recently, the application of TD-based unsupervised FE was proposed [7,20,21,23,24,26]. Taken together, the data-driven analyses using TD in these studies have shown that biologically meaningful features can be extracted by a data-driven non-biased method from datasets with various modes. However, TD has not been applied to a time series dataset of human blood metabolites and hormone concentrations before and after glucose ingestion. Among several TD-based data-driven analyses, we used the algorithm used in the previously proposed TD-based unsupervised FE [7,20,21,23,24,26].
In this study, we applied TD to time series datasets representing changes in the concentrations of 562 molecules (555 human blood metabolites and 7 hormones) in 20 healthy subjects before and at 14 time points after ingestion of 75 g oral glucose to extract features. We obtained the core tensor and individual-, time-, and molecule-related singular vectors. Reconstructing the time series using only the dominant singular vectors allowed us to better interpret the features of the temporal patterns of the molecules. We characterized each molecule by individual dependence (constant or variable) and time dependence (later or earlier peaks). The moleculerelated singular vectors obtained by TD reflected three of the four features characterized by our previous hypothesis-driven study [4]. We also extracted 68 molecules showing feature time and individual dependencies through the unsupervised learning method by using TD. The extracted molecules significantly overlapped with the analyzed molecules in our previous study [4]. Therefore, by applying TD to the dataset characterized in our earlier study, we were able to extract the features of the target molecules and reveal the temporal patterns in that study [4]. This result not only confirms the validity of our previous findings but also shows the usefulness of TD as a FE method.
Next, we applied the TD method to a dataset representing the concentration changes of 40 molecules in three healthy subjects at 26 time points by three different oral doses (25g, 50g, 75g) and two different patterns of glucose ingestion (bolus or 2h-continuous ingestion). We obtained the core tensor and individual-, time-, molecule-, and experimental condition-related singular vectors. Reconstructing the time series using only the dominant singular vectors allowed us to better interpret the features of the temporal patterns of the molecules. We characterized the temporal pattern of each molecule by its experimental condition dependence (constant or variable) and time dependence (early or late peaks). Thus, we applied TD to a time series dataset of human blood metabolites and hormone concentrations before and after glucose ingestion with various modes, and extracted the features in a non-biased manner. Of the four features of temporal patterns in the hypothesis-driven analysis from our previous study [4], the following three features were extracted by data-driven analysis: the "amplitude and rate" components of temporal patterns, the similarity of temporal patterns among individuals, and the similarity of temporal patterns among molecules. The only feature not extracted was the relationship among individuals over time. This resultindicates that TD can extract some of the same features obtained by hypothesis-driven analysis in a non-biased manner. In addition, FE using TD took only days in this study. As we extracted biological features in a non-biased manner without the time-consuming process of hypothesis generation, we propose that TD be the first choice for the analysis of omics data.

Subjects
The study included 20 healthy subjects for 'third-order tensor with individual mode, time mode, and molecule mode' and 3 healthy subjects for 'fourth-order tensor with individual mode, time mode, experimental condition mode, and molecule mode.' The subjects' profiles are shown in S1 Table, and all subjects provided written informed consent. Subjects were recruited through a snowball sampling and self-selection.

Blood sampling experiment
We used human blood samples obtained previously [4,27]. Briefly, after a 10 h overnight fast, subjects underwent the oral glucose tolerance test in the morning. An intravenous catheter was inserted into the vein of the forearm and fasting samples were drawn twice.

'fourth-order tensor with individual mode, time mode, experimental condition mode, and molecule mode'
A glucose solution containing 25, 50, or 75 g glucose was orally ingested. The ingestion method was rapid within a minute (bolus ingestion) and continuous over the course of 2 h (2 h continuous ingestion). For continuous ingestion, we connected the tube to a noncontact microdispenser robot (Mr. MJ; MECT Co., Osaka, Japan) [28]. The glucose solution was ingested from the tube, and blood samples were obtained every 10 min until 240 min after sugar ingestion [27]. Subjects remained at rest throughout the test. Blood samples were rapidly centrifuged.
Here, we set an interval of 1-2 months for each experiment, because these types of studies take a long period of time and require several hours and fasting by the subjects for each experimental condition.

Ethics committee certification
We complied with Japan's Ethical Guidelines for Epidemiological Research, and the study was approved by the institutional review board and ethics committee of Tokyo University Hospital (No. 10264-(4)).

Introduction of TD
For 'third-order tensor with individual mode, time mode, and molecule mode,' we applied TD to the multimodal data of 562 blood molecules after ingestion of 75 g oral glucose at 14 time points in 20 healthy subjects (Fig 1). Multimodal data are a data structure X with three axes (modes): molecules, individuals, and time. We applied a TD called Tucker decomposition. We schematically illustrate the Tucker decomposition model for the three-dimensional data case in Fig 1. The data cube X (X2R 20×14×562 ) was decomposed into several components by Tucker decomposition, and the number of components differed in the three modes (i.e., dimensions or axes). Here, G(l 1 l 2 l 3 )2R 20×14×562 is the core tensor and U2R 562×562 , V2R 20×20 , W2R 14×14 are the orthogonal matrices. The extracted components were characterized by the columns of the three orthogonal matrices (U, V, W) (i.e., the singular vectors of each mode), meaning that the extracted components had a dependence on each column of the original data. The model of the original data is a weighted sum of the outer products among the columns of (U, V, W). The core tensor G represents the weighted value of the product of single components.
x ijk ¼ X 20 We applied one of the algorithms of the Tucker decomposition, called higher order singular value decomposition (HOSVD) because the proposed application of TD-based unsupervised FE used this method [7,20,21,23,24,26].
This method is easier to compute and has a more converging algorithm than the other method because it does not need initialization and the iterative computation [21]. The components of X (x ijk ) can be decomposed as in (Eq 1), where G(l 1 l 2 l 3 )2R 20×14×562 is the core tensor; For 'fourth-order tensor with individual mode, time mode, experimental condition mode, and molecule mode,' The data were formatted as a tensor (X = {x ijkm }2R 3×26×6×40 ), representing the concentration of the i th individual, j th time point, k th condition, and m th molecule. We normalized x ijkm as P 26 j¼1 x ijkm ¼ 0; [21] was applied to x ijk .

TD-based unsupervised FE
For 'third-order tensor with individual mode, time mode, and molecule mode,' we chose w l 3 k to select biologically meaningful molecules. After selecting w l 3 k , assuming that w l 3 k were normally distributed, the p-values were attributed to the k th molecules using the χ squared distribution as: where s w l 3 is the standard deviation of w l 3 k , and Pχ 2 [> x] is the probability that the argument is larger than x under the assumption that the arguments obey a χ squared distribution. The p-values were further adjusted by the Benjamini-Hochberg (BH) criterion [21], and those molecules associated with adjusted p < 0.1 were selected.

Reconstructed time courses
For 'third-order tensor with individual mode, time mode, and molecule mode,' the original time series of the molecule was reconstructed using only constant individual dependence (u l 1 i ; l 1 ¼ 1) and variable individual dependence (u l 1 i ; l 1 ¼ 2) (Fig 3, see Result). For "individuals similar pattern" and "individuals different pattern," the concentration of the i th individual of the j th time point of the kth molecule is defined as follows: x Individuals similar ijk ¼ X 14 x Individuals different ijk ¼ X 14 where (1 or 2l 2 l 3 )2R 20×14×562 is the core tensor, U ¼ fu l 1 i g 2 R 20�20 , V ¼ fv l 2 j g 2 R 14�14 ; and W ¼ fw l 3 k g 2 R 562�562 are the orthogonal matrices.
For 'fourth-order tensor with individual mode, time mode, experimental condition mode, and molecule mode,' the original time series of the molecule was reconstructed using only constant experimental condition dependence (w l 3 k ; l 3 ¼ 1) and variable individual dependence (w l 3 k ; l 3 ¼ 2) (Fig 7). For "conditions similar pattern" and "conditions different pattern," the concentration of the i th individual's of the j th time point of the k th molecule of the m th experimental condition is defined as follows: x Conditions different ijkm ¼ X 3 where G(l 1 l 2 1 or 2l 4 )2R 3×26×6×40 is the core tensor, U ¼ fu are the orthogonal matrices.

Singular vectors obtained by TD ('third-order tensor with individuals mode, time mode, and molecule mode')
We performed TD of three modes data of the concentrations of 562 blood molecules (7 hormones, 555 metabolites) at 14 time points in 20 healthy subjects after ingestion of 75 g oral glucose, and attempted to capture the differences in temporal patterns of molecular concentration among individuals and molecules by glucose ingestion with TD. The data are formatted as tensor X = {x ijk }2R 20×14×562 , representing the concentration of the i th individual of the j th time point of the k th molecule (Fig 1). We normalized x ijk as P 14 j¼1 x ijk ¼ 0; Higher order singular value decomposition [21] (HOSVD) was applied to x ijk . Note that the molecular assay used in this study was absolute quantitation, thus normalization for inter-sample comparisons of blood samples is not required.
To select a set of molecules with specific time and individual dependence, we used the method proposed by Taguchi (2020). In this method, we focused on the absolute value of G (l 1 l 2 l 3 ) and investigated the combinations of u l 1 i ; v l 2 j ; w l 3 k that shared a large absolute value of G(l 1 l 2 l 3 ). u l 1 i ; v l 2 j ; w l 3 k are singular vectors related to individuals, time, and molecules, respectively. This is because we could consider a molecule with a large value of u l 1 i , which shares a large absolute value of G(l 1 l 2 l 3 ) with a specific pair of individual and time singular vectors, as shown in S1A Fig, to have specific individual and time dependencies. Because u l 1 i ; v l 2 j ; w l 3 k are the unit vectors, note that G(l 1 l 2 l 3 ) represents the weighted value of the product of any single component, as well as how much each dependence is related to each other (Eq 1).
We sorted core tensor components G(l 1 l 2 l 3 ) in the order of their absolute values (Table 1), focusing only on the top four core tensor components and associated singular vectors (S1B Fig, Table 1). Explained variance indicates the ratio of the variance of the low-rank approximation using each component to the variance of the original data.
As discussed later, these four were sufficient for replicating the features of temporal patterns that characterized differences among molecules and individuals in our previous study [4]. These large absolute values of G(l 1 l 2 l 3 ) corresponded to the singular vectors u l 1 i ; l 1 ¼ 1; 2; v l 2 j ; l 2 ¼ 1; 2; w l 3 k ; l 3 ¼ 1; 2; 4 of individual, time, and molecule (Table 1). Thus, we next interpreted these singular vectors.
u l 1 i are the individual-related singular vectors; these values mean individual dependence (Fig 2A and 2B). u l 1 i ; l 1 ¼ 1 showed a constant value among individuals (Fig 2A). For   Abbreviations for the representative molecules are as follows: Cit, citrulline; CRP, C-reactive peptide; FFA, free fatty acid; 3-OH, u l 1 i ; l 1 ¼ 2, the values showed variable among individuals. (Fig 2B). This result means that u l 1 i ; l 1 ¼ 1 represents the common component among individuals and u l 1 i ; l 1 ¼ 2 represents the variable component among individuals. This result also means that the common component among individuals is larger than the variable component among individuals because u l 1 i ; l 1 ¼ 1 shares a larger absolute value of G(l 1 l 2 l 3 ) than u l 1 i ; l 1 ¼ 2. v l 2 j are the time-related singular vectors (Fig 2C and 2D). These values are time-dependent. v l 2 j ; l 2 ¼ 1 peaked at the later time point (later than 60 min) (Fig 2C), v l 2 j ; l 2 ¼ 2 peaked at the earlier time point (at 60 min) (Fig 2C), and v l 2 j ; l 2 � 3 showed a zigzag temporal pattern with different periods (Fig 2D). This result indicates that the early and late peaks are the main components of temporal changes. w l 3 k are the molecule-related singular vectors ( Fig 2E). These values mean molecule dependence. w l 3 k ; l 3 ¼ 1 showed large values for amino acids such as leucine and valine ( Fig 2E). w l 3 k ; l 3 ¼ 2 showed large values for glucose metabolism-related molecules such as glucose and insulin ( Fig 2E). Both w l 3 k ; l 3 ¼ 1; 2 showed large values for lipids such as free fatty acid (FFA) and ketone ( Fig 2E). We focused on the top four core tensor components G(l 1 l 2 l 3 ) in terms of absolute values (S1B Fig, Table 1). Following this, p-values were attributed to the i th molecule for each of w l 3 k ; l 3 ¼ 1; 2; 4 (Methods). P-values were corrected using the BH criterion [21], resulting in the selection of 20 molecules for w l 3 k ; l 3 ¼ 1, 23 molecules for w l 3 k ; l 3 ¼ 2, and 19 molecules for w l 3 k ; l 3 ¼ 4 (S3 Table).

Set of molecules selected by TD and decomposed time series by the dominant singular vectors ('third-order tensor with individuals mode, time mode, and molecule mode').
We investigated the overlap among the selected sets of molecules using w l 3 k ; l 3 ¼ 1; 2; 4, and found that the overlap was small and the time and individual dependencies of each selected set of molecules were specific ( Fig 3A). To better interpret the features of the temporal patterns of the molecules, we reconstructed the time series using only the singular vectors that we focused on (Fig 3B, see Methods). This is equivalent to a low-rank approximation of the tensor.
Because we were interested in features of the temporal patterns that are similar among individuals of each molecule and the temporal patterns that are different among individuals, we reconstructed the original time series of the molecule using only the constant individual dependence ðu l 1 i ; l 1 ¼ 1Þ but also the variable individual dependence (u l 1 i ; l 1 ¼ 2) (Fig 3B, see Methods). Here, we refer to the time series reconstructed using only (u l 1 i ; l 1 ¼ 1) as "individuals similar pattern," and the time series reconstructed using only (u l 1 i ; l 1 ¼ 2) as "individuals different pattern." We focused on leucine, FFA, glutamic acid, and glucose, which showed significant responses before and after glucose ingestion in our previous study [4].
For leucine, as one of the representative molecules of the amino acid (Fig 3B), "individuals similar pattern" peaked at 150 min and showed a sustained temporal pattern. "individuals different pattern" showed the most variation at 45 min, but the absolute value of"individuals different pattern" was smaller than "individuals different pattern" for the other molecules ( Fig 3B). Twenty-three molecules selected only by w l 3 k ; l 3 ¼ 1 included branched chain amino 3-hydroxybutyric acid; Ketone, Total ketone body; Glc, glucose; Ins, insulin; Leu, leucine; Val, valine. The label colors correspond to the metabolic group list. https://doi.org/10.1371/journal.pone.0281594.g002

PLOS ONE
Features extracted by tensor decomposition reflect the biological features of the human blood metabolome acids such as leucine and valine, and aromatic amino acids such as tyrosine and phenylalanine (Fig 3, S1 Table). These molecules had a constant individual dependence (u l 1 i ; l 1 ¼ 1) with a sustained time dependence (v l 2 j ; l 2 ¼ 1) (Fig 2, Table 1). This result suggests that the amino acids selected only w l 3 k ; l 3 ¼ 1 had similar sustained temporal patterns among individuals.
For FFA selected only by w l 3 k ; l 3 ¼ 2, the "individuals similar pattern" showed a early peak with a peak at 150 min (Fig 3), which is similar to leucine. However, the "individuals difference pattern" of FFA showed the most variation at 210 min, and also showed variation between -10 min and 10 min, which differed from leucine. Twenty-five molecules selected only by w l 3 k ; l 3 ¼ 2 included glucose metabolism-related molecules and lipids such as FFA and acetoacetic acid (Fig 3, S1 Table). These molecules had variable individual dependence ðu l 2 ; j ; l 2 ¼ 2Þ with time dependence of the later peak (v l 2 j ; l 2 ¼ 1), while constant individual dependence ðu l 1 i ; l 1 ¼ 1Þ with time dependence of the earlier peak (v l 2 j ; l 2 ¼ 2) (Fig 2, Table 1). This result suggests that the glucose metabolism-related molecules and lipids, which were selected only by w l 3 k ; l 3 ¼ 2; had similar temporal patterns for the earlier peak and different temporal patterns for the later peak among individuals.
For glutamic acid selected only by w l 3 k ; l 3 ¼ 4, the "individual similar pattern" peaked at 210 min and showed a temporal patterns for the later peak, which is slower than leucine and FFA. "individual different pattern" of glutamic acid showed a large variation from 120 to 240 min (Fig 3), and also showed a large variation from 30 to 45 min (Fig 3), which is different from leucine and FFA. Seventeen molecules selected only by w l 3 k ; l 3 ¼ 4 included molecules such as citrate, succinate, and malate, which constitute the TCA cycle, and amino acids such as glutamic acid and glutamine (Fig 3, S1 Table). These molecules had a time dependence of the earlier peak (v l 2 j ; l 2 ¼ 2) with variable individual dependence ðu l 1 i ; l 1 ¼ 2Þ (Fig 2, Table 1). This result suggests that the time dependence of the earlier peak of these molecules are different among individuals. For glucose both selected by both w l 3 k ; l 3 ¼ 2, "individuals similar pattern" peaked at 50 min and showed a time dependence of the earlier peak, which is faster than leucine, FFA, and glutamic acid. The "individuals difference pattern" of glucose showed the largest variation at 180 min (Fig 3), and also showed a large variation from -10 to 10 min, from 75 to 90 min, and at 240 min (Fig 3), which is different from leucine, FFA, and glutamic acid. For the overlap, the molecules selected by both w l 3 k , l 3 = 2 and l 3 = 4 were glucose, creatine, and creatinine (Fig 3, S1 Table). This result suggests that these molecules are characterized by both similar temporal patterns for the earlier peak among individuals and different temporal patterns for the later peak among individuals; and different temporal patterns for the earlier peak among individuals.
For "individuals similar patterns," leucine and glutamic acid showed temporal patterns for the later peak, whereas FFA, and glucose showed temporal patterns for the earlier peak (Fig 3,  Table 1). Furthermore, the different temporal patterns among individuals were later peak for glucose-related molecules and lipids (Table 1). For the "individuals different pattern," the four molecules did not show a clear time dependence, but the time points with large variation were different among molecules (Fig 3). Taken together, each metabolic group was characterized by individual dependence (constant or variable) and time dependence (later peak or earlier peak).

Comparison of the results of our previous study with this study
In our previous study, we used hypothesis-driven analysis and characterized the temporal patterns among individuals and molecules by hypothesis-driven analysis with the following four features [4]: the decomposability into "amplitude" and "rate" components, the similarity of temporal patterns among individuals, the relationship among individuals' over time, and the similarity of temporal patterns among molecules (S3 Fig). In this study, we used TD as datadriven analysis. We investigated whether the features extracted by data-driven analysis in this study reflected the four features derived from the hypothesis-driven analysis. Here, we used the value of the square of w l 3 k to select the molecules.
In our earlier study of the hypothesis-driven analysis, we decomposed the temporal pattern of the molecule into "amplitude of response (AUC)" and "rate of response (T AUC1/2 )" as the first feature [4]. The larger the value of AUC positively or negatively, the larger the response from fasting after glucose ingestion positively (increase) or negatively (decrease), respectively. The larger the value of T AUC1/2 , the slower the response. In this data-driven analysis, we used w l 3 k ; l 3 ¼ 1; 2; 4 to select molecules whose time dependence was later peak or earlier peak. In other words, w l 3 k ; l 3 ¼ 1; 2; 4 are likely to capture the features of the temporal pattern such as the amplitude and rate of response. Therefore, we investigated the relationship between the values of w l 3 k ; l 3 ¼ 1; 2; 4 and AUC, T AUC1/2 . Correlation analysis showed a significant relationship between the value of abs (w l 3 k ; l 3 ¼ 1) and the amplitude of the response, and between the value of abs (w l 3 k ; l 3 ¼ 2) and the rate of the response, respectively (Fig 4A and  4B, Table 2). Thus, the value of w l 3 k ; l 3 ¼ 1 was negatively correlated with the amplitude of response, and the value of w l 3 k ; l 3 ¼ 2 was negatively correlated with the rate of response, indicating that these two values extracted by data-driven analysis reflected the features of amplitude and rate of response derived from the hypothesis-driven analysis.
In our earlier study, we defined the feature of the similarity of temporal patterns among individuals, by calculating for each molecule a correlation coefficient connecting all time courses, combining two selected from all individuals as the index of the similarity (S3 Fig) [4]. The higher the value of this index of similarity, the higher the similarity of temporal patterns among individuals. Because we used w l 3 k ; l 3 ¼ 1 and 2 to select the molecule with constant individual dependence (Fig 2A), we investigated the relationship between the absolute values of both w l 3 k ; l 3 ¼ 1; 2 and the similarity of temporal patterns among individuals. Multiple regression analysis showed a significant relationship between the value of abs(w l 3 k ; l 3 ¼ 1; 2) and the index of the similarity (adjusted coefficient of determination is 0.598 (p = 5.75 x 10 −15 ) ( Fig 4C). Thus, the value of w l 3 k ; l 3 ¼ 1; 2 reflects the similarity of the temporal pattern among individuals.
In our earlier study, we defined the feature of the change in the relationship among individuals over time, by calculating the average changes in z-score values over time for each molecule at each time point as the index of the change in the relationship among individuals over time [4]. The larger the value of the index, the more constant the relationship among individuals over time. Because we used both w l 3 k ; l 3 ¼ 2 and 4 to select molecules with variable individual dependence, we investigated the relationship between the absolute values of w l 3 k ; l 3 ¼ 2; 4 and the magnitude of the change in the relationship among individuals over time. Correlation analysis showed no significant relationship between the value of abs (w l 3 k ; l 3 ¼ 2; 4) and the magnitude of the change in the relationship among individuals over time ( Table 2). We also investigated other relationships between the absolute value of w l 3 k ; l 3 ¼ 1 and the magnitude of the change over time in the relationship among individuals. Correlation analysis showed a significant relationship between the value of abs(u l 1 ; i ; l 1 ¼ 1) and the magnitude of the change in the relationship among individuals over time, but the correlation coefficient was low (r = 3.51 x 10 −1 ) (Fig 4D, Table 2). Therefore, we conclude that TD cannot be used to extract features that reflect the magnitude of the change in the relationship among individuals over time.

Fig 4. Comparison of tensor decomposition analysis with hypothesis-driven analysis. A
The distribution of abs (w l 3 k ; l 3 ¼ 1) and magnitude of response for each molecule. B The distribution of abs (w l 3 k ; l 3 ¼ 2) and rate of response for each molecule. C The distribution of abs (w l 3 k ; l 3 ¼ 1), abs (w l 3 k ; l 3 ¼ 2) and the similarity of temporal pattern among individuals for each molecule. D The distribution of abs (w l 3 k ; l 3 ¼ 1) and the change in the relationship among individuals over time for each molecule. E The distribution of abs (w l 3 k ; l 3 ¼ 1) and the number of molecules with similar temporal pattern for each molecule. Dot colors correspond to the metabolic groups (inset). Representative molecules are labeled. Their abbreviations are as follows: Cit, citrulline; CRP, C-reactive peptide; FFA, free fatty acid; GIP, gastric inhibitory polypeptide (active); Glc, glucose; Ile, isoleucine; Ins, insulin; Ketone, total ketone bodies; Leu, leucine. https://doi.org/10.1371/journal.pone.0281594.g004

PLOS ONE
Features extracted by tensor decomposition reflect the biological features of the human blood metabolome In our earlier study, we defined the feature of the similarity of temporal patterns among molecules [4]. Because the similarity of temporal patterns among molecules was defined for a pair of molecules, we defined the index as a quantitative value of the similarity of temporal patterns among molecules for each molecule. Molecules with large values of this index have a large number of molecules with similar temporal patterns. For details, please refer to our earlier study [4]. In this study, we used w l 3 k ; l 3 ¼ 1; 2; 4 to select molecules whose time dependence is later peak or ealier peak. Therefore, we investigated the relationship between the absolute values of w l 3 k ; l 3 ¼ 1; 2; 4 and the number of molecules with similar temporal patterns. Correlation analysis showed that a significant relationship between the value of abs (w l 3 k ; l 3 ¼ 1) and the number of molecules with similar temporal patterns ( Fig 4E, Table 2). Therefore, the value of w l 3 k ; l 3 ¼ 1 reflects the similar temporal patterns among molecules.
Taken together, the following three features were extracted by data-driven analysis: the "amplitude and rate" components of temporal patterns, the similarity of temporal patterns among individuals, and the similarity of temporal patterns among molecules. The only feature not extracted was the relationship among individuals over time, indicating that TD can extract some of the same features obtained by hypothesis-driven analysis in a non-biased manner (Fig 4, S3 and S4 Figs).

Molecules targeted in our earlier study and extracted molecules in this study
In this study, we used TD to extract 68 molecules showing specific time dependence and individual dependence by unsupervised learning (Fig 5, S4 Table). We examined the overlap with the 83 molecules analyzed in our earlier study; the number of overlapped molecules was 55, which is 55/83 = 66% compared to the 83 molecules analyzed in our earlier study (Fig 5, S4  Table). This result indicates that our earlier hypothesis driven analysis [4] included most of the specific molecules extracted by data-driven analysis using TD in this study. Fisher's exact test showed that the overlap was significant at a significance level of 0.01 (p = 3.88 x 10 −43 ). The overlapped molecules included glucose and insulin, amino acids such as leucine and valine, and lipids such as FFA and total ketone body (Fig 5, S4 Table), which we analyzed in detail in our earlier study [4]. The set of molecules extracted by TD but not analyzed in our earlier study included molecules such as adrenaline, acetoacetic acid, and noradrenalin (Fig 5, S4  Table). Although these molecules look to show specific temporal patterns, they had many missing points and were excluded from the analysis in our earlier study. Note that the TD was analyzed with missing points as zero [21].

PLOS ONE
By contrast, many of the molecules included in our earlier hypothesis-driven analysis but not extracted by TD were ions and other metabolites (Fig 5, S4 Table). These molecules did not show a specific temporal pattern by glucose ingestion, although they had no missing points. The overlapped 55 molecules was 55/68 = 81% compared to 68 molecules extracted using TD (Fig 5). This result indicates that the features extracted by TD are useful in biology, as data-driven analysis sufficiently captured the features obtained by our previous hypothesisdriven analysis.
Taken together, we extracted the features of the molecules and temporal patterns that were targeted in our earlier hypothesis-driven analysis by applying TD to the datasets that were characterized in our earlier study [4]. This result shows not only the validity of the results of our previous study, but also the usefulness of TD as a FE method.

Singular vectors obtained by TD ('fourth-order tensor with individuals mode, time mode, experimental condition mode and molecule mode')
We further analyzed a new dataset that additionally included differences in experimental conditions such as different doses of glucose and ingestion patterns of glucose ingestion (S5 Table). As a new dataset, we targeted molecules of glucose metabolism-related, amino acids, and lipids. For these data, three healthy volunteers ingested either three different doses of glucose (25, 50, and 75 g) rapidly or glucose over 2 h in six experiments to obtain time series data (S5 Fig) [27]. The rapid ingestion paradigm is referred to as bolus ingestion, whereas the slow ingestion paradigm is referred to as continuous ingestion over 2 h. We performed TD of four modes data of the concentrations of 40 blood molecules (4 hormones, 36 metabolites) at 26 time points in three healthy subjects by three different oral doses and two different patterns of glucose ingestion, and tried to capture the differences of temporal pattern of molecular concentration among individuals, molecules, and experimental conditions by TD.
We sorted the core tensor components G(l 1 l 2 l 3 l 4 ) in the order of their absolute values (Table 3, see Methods), and focused only on the top four core tensor components and  [4] and extracted molecules in this study. The molecules included in each Venn diagram are listed in S4 Table. https://doi.org/10.1371/journal.pone.0281594.g005 associated singular vectors (S6 Fig, Table 3). As discussed later, these four were sufficient to characterize the differences in experimental conditions. These large absolute values of G (l 1 l 2 l 3 l 4 ) corresponded to the singular vectors u l 1 i ; l 1 ¼ 1; v l 2 j ; l 2 ¼ 1; 2; w l 3 k ; l 3 ¼ 1; 2; y l 4 m ; l 4 ¼ 1; 2 of individual, time, experimental condition, and molecule (Table 3). Thus, we next interpreted these singular vectors.
u l 1 i are the individual-related singular vectors. This value means individual dependence (Fig 6A). u l 1 i ; l 1 ¼ 1 showed a constant value among individuals (Fig 6A). v l 2 j are the timerelated singular vectors (Fig 6B). This value means time-dependence. v l 2 j ; l 2 ¼ 1 peaked at the later time point (later than 60 min) (Fig 6B), v l 2 j ; l 2 ¼ 2 peaked at the earlier time point (at 60 min) (Fig 6B). This result indicated that the early and late peaks were the main components of the temporal pattern.
w l 3 k are the experimental condition-related singular vectors (Fig 6C and 6D). This value means experimental condition dependence. w l 3 k ; l 3 ¼ 1 shows the constant values among the experimental conditions (Fig 6C). w l 3 k ; l 3 ¼ 2 shows the variable values among the experimental conditions (Fig 6D). w l 3 k ; l 3 ¼ 2 also shows that the value of bolus condition decreases from positive to negative in a dose-dependent manner, and the value of continuous condition negatively increase in a dose-dependent manner. Taken together, w l 3 k ; l 3 ¼ 2 divides not only the duration of the bolus or continuous condition but also the axis of the dose.

The individual-related singular vectors and decomposed time series by the dominant singular vectors ('fourth-order tensor with individuals mode, time mode, experimental condition mode and molecule mode')
We selected the top four pairs as relatively large G(l 1 l 2 l 3 l 4 ) (S6 Fig, Table 3). In previous sections (Figs 2-5), We calculated the p-values for each molecule in order to select biologically meaningful molecules. However, we could not extract molecules with a significantly large value of y l 4 m possibly because of the small number of molecules. Therefore, we summarized the features of the temporal patterns of the molecules based on the distribution of y l 4 m (Fig 7A).
The magnitude of the absolute value of the molecule-related singular vectors captured the features of the temporal pattern after glucose ingestion (Fig 4, S2 Fig). We focused on the absolute value of the molecule-related singular vectors. Abs (y l 4 m ) values are the absolute value of the molecule-related singular vectors. Abs (y l 4 m ; l 4 ¼ 1) values of amino acids such as leucine and valine (Fig 7A) and lipids such as FFA and ketone ( Fig 7A) were larger, and those of the molecules related to glucose metabolism such as glucose and insulin (Fig 7A) were smaller.  Abs (y l 4 m ; l 4 ¼ 2) values of lipids ( Fig 7A) and glucose metabolism-related molecules ( Fig 7A) were larger, and those of amino acids were smaller (Fig 7A). This result indicates that glucose metabolism-related molecules, amino acids and lipids have specific individual, time, and experimental condition dependence.
To better interpret the features of the temporal patterns of the molecules, we reconstructed the time series using only the singular vectors (Fig 7B, see Methods). Because we focused on the features of the temporal patterns that were similar among experimental conditions of each molecule and the temporal patterns that were different among experimental conditions, we  reconstructed the original time series of the molecule using only the constant condition dependence (w l 3 k ; l 3 ¼ 1) and the variable condition dependence (w l 3 k ; l 3 ¼ 2) (Fig 7B, see Methods). Here, we refer to the time series reconstructed using only (w l 3 k ; l 3 ¼ 1) as "conditions similar pattern," and the time series reconstructed using only (w l 3 k ; l 3 ¼ 2) as "conditions different pattern." The original data of leucine, FFA, and glucose, and the "conditions similar pattern" and "conditions different pattern" are shown (Fig 7B).
Both y l 4 m ; l 4 ¼ 1; 2 shared a constant individual dependence (u l 1 i ; l 1 ¼ 1) (Table 3). This result indicates that all molecules had a similar temporal patterns among individuals. For leucine, as one of the representative molecules of the amino acid (Fig 7A), "conditions similar pattern" showed a later peak from 120 to 150 min (Fig 7B). "Conditions different pattern" of leucine showed variation from 30 to 100 min and from 160 to 210 min (Fig 7B). The earlier peak values from 30 to 100 min changed from negative to positive from bolus ( Fig 7B) to continuous (Fig 7B), whereas later peak values from 160 to 210 min changed from negative to positive from continuous ( Fig 7B) to bolus (Fig 7B). Because "conditions similar pattern" peaked at negative values, the more negative the value of the peak in the "Conditions difference pattern," the larger the peak in the original time series. Thus, the earlier peak was larger for bolus, and the later peak was larger for continuous. Other amino acids with large abs (y l 4 m ; l 4 ¼ 1) had constant conditional dependence (w l 3 k ; l 3 ¼ 1), with time dependence of the later peak (v l 2 j ; l 2 ¼ 1), whereas variable condition dependence (w l 3 k ; l 3 ¼ 2) had time dependence of the earlier peak (v l 2 j ; l 2 ¼ 2) (Table 3). This result suggested that the temporal patterns of amino acids showed similar temporal patterns for the later peak and different temporal patterns for the earlier peak among the experimental conditions.
For glucose, the representative molecule of glucose metabolism-related molecules, "conditions similar pattern" showed an early peak from 30 to 40 min (Fig 7B). "Conditions different pattern" showed variation from 0 to 40 min, 60 to 110 min, and 200 to 240 min, but the variation from 200 to 240 min was smaller than the other two durations (Fig 7B). During 0 to 40 min and 200 to 240 min, the peak value changed from negative to positive from continuous ( Fig 7B) to bolus (Fig 7B), whereas from 60 to 110 min, the peak value changed from negative to positive from bolus ( Fig 7B) to continuous (Fig 7B). Because "conditions similar pattern" peaked at positive values, this result indicates that the earlier peak during 0 to 40 min was larger for bolus and the later peak during 60 to 110 min was larger for continuous. This result also indicates that the other peak during 200 to 240 min after glucose ingestion was larger for bolus although the variation among experimental conditions was not as large as the others. Glucose metabolism-related molecules such as glucose and insulin with large abs (y l 4 m ; l 4 ¼ 2) had constant experimental condition dependence (w l 3 k ; l 3 ¼ 1) with time dependence of the early peak (v l 2 j ; l 2 ¼ 2) (Table 3), but variable experimental condition dependence ðw l 3 k ; l 3 ¼ 2Þ with the time dependence of the late peak (v l 2 j ; l 2 ¼ 1) (Table 3). These results suggest that the temporal patterns of glucose metabolism-related molecules showed similar temporal patterns for the earlier peak and different temporal patterns for the later peak among the experimental conditions.
For FFA, as one of the representative molecules of the lipid, "conditions similar pattern" peaked from 110 to 120 min, later than those of glucose metabolism-related molecules but earlier than amino acids (Fig 7A). "Conditions different pattern" showed variation from -5 to 10 min, 40 to 60 min, and 140 to 210 min (Fig 7B). The peak values changed from negative to positive from bolus ( Fig 7B) to continuous (Fig 7B) from 5 to 10 min, 40 to 60 min, and from negative to positive from continuous to bolus during 140 to 210 min. Because "conditions similar pattern" peaked at negative values, this result indicates that the earlier peak was larger for bolus and the later peak was larger for continuous. Abs (y l 4 m ; l 4 ¼ 1) of lipids were also larger than those of glucose metabolism-related molecules but smaller than those of amino acids (Fig 7A). In addition, abs (y l 4 m ; l 4 ¼ 2) of lipids were large (Fig 7A). This result suggests that the similar temporal patterns among experimental conditions for lipids showed a later peak than glucose metabolism-related molecules and earlier peak than amino acids. This result also suggests that the different temporal patterns among experimental conditions for lipids showed an earlier peak than glucose metabolism-related molecules and later peak than amino acids.
Taken together, the similar temporal patterns among experimental conditions changed from the earlier peak to later peak in the order of amino acids, lipids, and glucose metabolismrelated molecules (Fig 7, Table 3). "Conditions similar patterns" of leucine, FFA, and glucose showed these time dependencies (Fig 7B). The molecule-related singular vectors of each molecule also indicated that the different temporal patterns among experimental conditions changed from the earlier peak to the late peak in the order of amino acids, lipids, and glucose metabolism-related molecules (Fig 7, Table 3). "Conditions different pattern" for leucine, FFA, and glucose showed large variation at different time points (Fig 7B). "Conditions different pattern" also showed that the earlier peak was larger for bolus and the later peak was larger for continuous ( Fig 7B). This supports that the change in w l 3 k ; l 3 ¼ 2 from positive to negative from 25B to 75C indicated a change from the earlier peak to the later peak. Taken together, the temporal patterns of each molecule were characterized by experimental condition dependence (constant or variable) and time dependence (early or late peak).

The temporal pattern of 'third-order tensor with individual mode, time mode, and molecule mode'
In this study, we applied TD to 'third-order tensor with individual mode, time mode, and molecule mode' as multimodal data, and extracted features of temporal patterns obtained by previous data-driven analysis (Figs 1-5, Tables 1, 2). In our earlier study [4], we selected target molecules from the same dataset and characterized four features of temporal patterns of molecules among individuals and molecules derived from the hypothesis-driven analysis.
The features extracted by the TD in this study reflected three of the four features of the temporal pattern (Fig 4, S3, S4 Figs). This result suggests that not only the three features of the temporal pattern capture the features of the dataset but also that a feature can only be extracted by hypothesis-driven analysis. However, we could not exclude the possibility that the lower core tensor components reflect such feature, and further study is necessary to address this issue.
We also extracted 68 molecules that showed specific time and individual dependencies through unsupervised learning by using TD (Fig 5). The extracted molecules included most of the molecules analyzed in our earlier study, which ensured the validity of the extraction of feature by our earlier hypothesis-driven study [4]. In addition, the features extracted by TD sufficiently captured the features of the temporal patterns of the molecules that were of interest in our previous study, indicating that the results of data-driven analysis are useful in biology.

Interpretation of time patterns by reconstructing time series
To better interpret the features of the temporal patterns of molecules, we reconstructed the time series using only the singular vectors of interest (Figs 3 and 7, see Methods). For example, a molecule such as glucose, which has multiple time and individual dependencies (Fig 3), is difficult to extract multiple biological features. We improved the interpretability by reconstructing the time series and visualizing it (Fig 3B). In the field of biology, the extraction of waveforms by low-rank approximation in functional magnetic resonance imaging has been studied [32]. In our earlier study, the low-rank representation of the time series facilitated biological interpretation and led to useful discussions [4].

The temporal pattern of 'fourth-order tensor with individual mode, time mode, experimental condition mode, and molecule mode'
We also applied TD to 'fourth-order tensor with individual mode, time mode, experimental condition mode, and molecule mode' to characterize the experimental condition-dependent temporal patterns for each molecule (Fig 6). The similar and different temporal patterns among experimental conditions changed from the earlier peak to the later peak in the order of amino acids, lipids, and glucose metabolism-related molecules (Fig 7, Table 3).
"Conditions different pattern" of leucine, glucose, and FFA showed a large variation at different time points. "Conditions different pattern" of leucine, glucose, and FFA showed that the earlier peaks were larger for bolus ingestion and the later peaks were larger for continuous ingestion. In a previous study in which healthy subjects drank the same dose of glucose solution in either a bolus or continuous manner, the peaks by continuous ingestion were later than those by bolus ingestion for the glucose metabolism-related molecules such as glucose, insulin, C-peptide, and GIP [33]; branched-chain amino acids such as valine, leucine, and isoleucine; and metabolites including FFAs and hormones [34]. The results of this study were consistent with previous studies in which the dependence of the experimental conditions was evaluated for each molecule [33,34]. However, in this study, we could evaluate not only the dependence among experimental conditions for each molecule (Fig 7B) but also the difference in dependence of the experimental conditions among molecules (Fig 7A). This indicates that the analysis flow using TD is valid for multimodal data and can simultaneously extract multiple biological features by data-driven analysis. In this study, we applied TD to a time series dataset of human blood metabolites and hormone concentrations before and after glucose ingestion. We extracted individual-dependent temporal patterns for each molecule and experimental condition-dependent temporal patterns for each molecule by using TD. Because previous studies have used TD to extract features from other omics dataset such as transcriptome [12,13,15,16,18], it would be possible to extract features as temporal patterns from time series data of other omics dataset.

Research limitations and prospects
We focused on the top four pairs as the core tensors with the largest absolute values (Tables 1,  3). However, because this selection method introduces analyst bias, selection criteria will be needed in the future. In addition, a few molecules possibly had a specific dependence on the core tensor that we did not focus on in this study. A decomposition method that can extract the feature dependence of a few molecules will be necessary. TD didn't extract the feature that reflect the relationship among individuals over time. We attributed this to the normalization method. In this study, we could not extract features related to the variation in absolute values among individuals because we normalized data by time mode not individuals mode. The various features of the responses of molecules (temporal patterns) by glucose ingestion we focused on in this study were clarified by a combination of hypothesis-driven and data-driven analyses [4]. However, we could not identify the mechanisms behind the feature of dataset. For this purpose, it would be effective to formulate hypotheses based on the features revealed in this study and to make plans for new experiments or analyze time series data using mechanistic mathematical models. We discussed interindividual differences in the role of incretins in the regulation of blood glucose levels by mathematical model analysis focusing on glucose metabolism-related molecules [27].

Conclusion
In this study, we first applied TD to a dataset representing changes in the concentrations of 562 molecules (7 hormones and 555 metabolites) at 14 time points in 20 healthy subjects by 75 g glucose ingestion ('third-order tensor with individual mode, time mode, and molecule mode'). We obtained the core tensor and individual-, time-, and molecule-related singular vectors. By reconstructing the time series using only the singular vectors that we focused on, we could better interpret the features of the temporal patterns of the molecules. We characterized each molecule by individual dependence (constant or variable) and time dependence (later or earlier peaks).
The molecule-related singular vectors obtained by TD reflected three of the four features characterized by our earlier hypothesis-driven study [4]. We also extracted 68 molecules showing feature time and individual dependencies through unsupervised learning method by using TD. The extracted molecules overlapped significantly with the analyzed molecules in our earlier study [4]. Therefore, by applying TD to the dataset characterized in our earlier study, we extracted the features of the target molecules and the revealed temporal patterns in our earlier study [4]. This result not only confirms the validity of the results of our earlier study [4] but also shows the usefulness of TD as a FE method.
Next, we applied the TD method to a dataset representing the concentration changes of 40 molecules in three healthy subjects at 26 time points by three different oral doses and two different patterns of glucose ingestion ('fourth-order tensor with individual mode, time mode, experimental condition mode, and molecule model'). We obtained the core tensor and individual-, time-, molecule-, and experimental condition-related singular vectors. By reconstructing the time series using only the singular vectors we focused on, we could better interpret the features of the temporal patterns of the molecules. We can characterize the temporal pattern of each molecule by its experimental condition dependence (constant or variable) and time dependence (early or late peaks). We applied TD to a time series dataset of human blood metabolites and hormone concentrations before and after glucose ingestion with various modes, and extracted biological features in a non-biased manner without time-consuming process of hypothesis generation. We propose that TD can be the first choice for analysis of omics data. In our earlier study, we used hypothesis-driven analysis and characterized the temporal patterns among individuals and among molecules by the hypothesis-driven analysis with four features [4]: the decomposability into "amplitude" and "rate" components, the similarity of temporal patterns among individuals, the relationship among individuals' over time, and the similarity of temporal patterns among molecules.