Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing | PLOS One

Fig 1.

Boxplots of sample singular value vectors x_ℓ₃,j (a) when TD was applied to the type I tensor and (b), (c), 1 ≤ ℓ₃ ≤ 5, when TD was applied to the type II tensor, generated from mRNA and miRNA expression profiles of multi-omics datasets. (d) Sample singular value vectors when HO GSVD was applied to multi-omics datasets. P-values computed by categorical regression for (a) to (d) are shown below the figures.

Fig 2.

The results of TD applied to type I tensor generated from EGF treatment experiments.
Sample singular value vectors, Black open circle: Red open circle: (a) ℓ₁ = 1 (b) ℓ₁ = 2. (c) Histogram of the correlation coefficients between sample (time) singular value vectors and the expression profiles of the 558 selected individual mRNA probes. (d) Boxplot of the scaled and shifted expression profiles of the 558 selected individual mRNA probes. Black: control, Red: EGF-treated cell lines. (e) to (h) are the same as (a) to (d), but for the type II tensor. Black open circles: Red open circles: (e) ℓ₁ = 1 (f) ℓ₁ = 2. (g) Histogram of the correlation coefficients between sample (time) singular value vectors and the expression profiles of the 398 selected individual mRNA probes. (h) Boxplot of the scaled and shifted expression profiles of the 398 selected individual mRNA probes. Black: control, Red: EGF-treated cell lines. P-values computed by t-test on the 558 (d) or 398 (h) mRNA probes, comparing samples with and without EGF treatment, are shown below the figures.

Fig 3.

The results of TD applied to type II tensor generated from vaccination.
Sample singular value vectors, Black open circle: Red open circle: Green open circle: (a) ℓ₁ = ℓ₂ = ℓ₃ = 1 (b) ℓ₁ = ℓ₂ = ℓ₃ = 2. (c) Histogram of the correlation coefficients between sample singular value vectors and the expression profiles of the 104 selected individual mRNA probes. (d) Boxplot of the scaled and shifted expression profiles of the 104 selected individual mRNA probes. Black: P, Red: D, Green: NP cell lines. P-values computed by categorical regression among the P, D, and NP groups are shown below the figures.

Fig 4.

The results of TD applied to the type I tensor generated from a synthetic dataset (M = 50).
(a) to (c) are orthogonal base functions: (a) constant, (b) linear, (c) half-period sinusoidal. (d) and (e) are the base functions used for generating . (d) k = 1, (e) k = 2. (f) is the scatter plot of (d) and (e). (g) to (i) are the first, second, and third sample singular value vectors x_ℓ₃,j (ℓ₃ = 1, 2, 3), computed by applying TD to the synthetic data.

Fig 5.

Feature singular value vectors when TD was applied to type I tensor generated from synthetic data.
(a) and (b) . For the type II tensor, (c) and (d) . Red open circles are features with 1 ≤ i₁, i₂ ≤ N₀ and black open circles are features with N₀ < i₁, i₂ ≤ N.

Fig 6.

The results of TD applied to type II tensor generated from synthetic dataset (M = 50).
(a) ℓ₃ = 1 (b) ℓ₃ = 2 (c) ℓ₃ = 3. (d) ℓ₃ = 1 (e) ℓ₃ = 2 (f) ℓ₃ = 3. (g): (a) vs (c), (h): (b) vs (d), γ = 0.97, P = 0, (i): (c) vs (f), γ = 0.97, P = 0. γ: Pearson correlation coefficients. P: associated P-values.

Table 1.

The 10 top-ranked G(ℓ₁, ℓ₂, ℓ₃)s with the largest absolute values among 1 ≤ ℓ₁, ℓ₂, ℓ₃ ≤ 10 when TD was applied to the type I tensor generated from and (left) and and (right).

Table 2.

Overlap between mRNAs identified (S1 Table) and MSigDB.
The top 10 ranked gene sets are presented. For each gene set name, the upper row corresponds to the type I tensor and the lower row to the type II tensor. The word “BREAST_CANCER/_DUCTAL_CARCINOMA” is set in bold to emphasize the overlap with breast cancer-related gene sets. K: the number of genes in each gene set; k: the number of overlapping genes.
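
As an illustration of how K and k relate to significance, the overlap between a selected gene list and a gene set is often scored with a hypergeometric upper-tail test. The sketch below assumes that test; the caption does not state which statistic was actually used, and the function name, the universe size, and the example counts are hypothetical.

```python
from math import comb


def overlap_p_value(universe: int, selected: int, K: int, k: int) -> float:
    """Hypergeometric upper tail: P(overlap >= k).

    universe: total number of genes considered (assumed, not from the caption)
    selected: number of genes in the identified list
    K:        number of genes in the gene set
    k:        number of genes shared by the list and the gene set
    """
    total = comb(universe, selected)
    upper = min(K, selected)
    # Sum the probabilities of every overlap size from k up to the maximum possible.
    tail = sum(
        comb(K, i) * comb(universe - K, selected - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical numbers, for illustration only.
p_small = overlap_p_value(universe=20000, selected=400, K=300, k=30)
p_tiny_case = overlap_p_value(universe=4, selected=2, K=2, k=2)  # exactly 1/6
```

A larger k for the same K gives a smaller tail probability, which is why gene sets with many overlapping genes rank near the top of such a table.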

Table 3.

Results of DIANA-mirPath using the seven miRNAs identified.
The top 10 significant KEGG pathways are presented. gene: number of genes overlapping with the miRNAs' target genes; miRNA: number of overlapping miRNAs. Numbers on either side of “/” correspond to the type I/type II tensors, respectively.

Fig 7.

Hierarchical clustering of (x_mRNA) and (x_miRNA).
(a) When TD was applied to the type II tensor, and (b) v_ℓ₃,j (for mRNA, labelled as PC) and (for miRNA, labelled as PCM) when PCA was separately applied to miRNA and mRNA (1 ≤ ℓ₃ ≤ 10). Distances were the negatives of the absolute values of the Pearson correlation coefficients. The Unweighted Pair Group Method with Arithmetic mean (UPGMA) was employed.
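
The clustering described in this caption can be sketched with SciPy, where UPGMA corresponds to average linkage. This is a minimal sketch under stated assumptions, not the authors' code: the distance is shifted to 1 − |r| so that it is non-negative (adding a constant to every distance leaves the UPGMA merge order unchanged), and the four toy vectors and the function name are hypothetical stand-ins for the singular value vectors.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def upgma_on_abs_correlation(vectors: np.ndarray) -> np.ndarray:
    """Average-linkage (UPGMA) clustering of row vectors.

    Distance is 1 - |Pearson r|, a shifted version of the -|r| distance
    named in the caption (assumption: the shift is only for non-negativity).
    """
    r = np.corrcoef(vectors)                 # pairwise Pearson correlations
    dist = np.clip(1.0 - np.abs(r), 0.0, None)
    np.fill_diagonal(dist, 0.0)              # self-distance is exactly zero
    condensed = squareform(dist, checks=False)
    return linkage(condensed, method="average")


# Toy vectors over 20 "samples" (hypothetical, not the paper's data).
t = np.linspace(0.0, 1.0, 20)
vectors = np.vstack([
    t,                      # 0: linear trend
    1.0 - 2.0 * t,          # 1: perfectly anti-correlated with 0, so |r| = 1
    np.sin(2 * np.pi * t),  # 2: partially correlated with 0
    (t - 0.5) ** 2,         # 3: uncorrelated with 0 by symmetry
])
Z = upgma_on_abs_correlation(vectors)
```

Because the absolute value is taken, vectors 0 and 1 are merged first even though their correlation is negative; the sign of a singular value vector is arbitrary, which is the usual motivation for a sign-insensitive distance.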

Table 4.

Comparison between 426 mRNA probes identified by TD based unsupervised FE applied to type I tensor and 374 mRNA probes identified by TD based unsupervised FE applied to type II tensor, or 427 probes identified by PCA based unsupervised FE separately applied to miRNA/mRNA.
S: selected, NS: not selected.

Fig 8.

Two alternative methods applied to synthetic data.
(a) to (d): The results of kCCA. Vertical axes are the coefficients used for linear combinations of N features; horizontal axes are i₁ and i₂, i.e., the indices attributed to the N features. (a) and (b): the first view; (c) and (d): the second view. (a) and (c): the first type of kCCA results; (b) and (d): the second type of kCCA results. The distinction between the two types of kCCA results coincides with the distinction between features with latent correlation (1 ≤ j ≤ N₀) and those without correlation (N₀ < j ≤ N). However, as the computed correlation coefficients were as high as 0.99, kCCA failed to identify latent correlation. (e) to (h): PCA separately applied to the two views in the synthetic data. (e) and (f): the first PC loadings attributed to the M samples in each view. (g) and (h): the first and second PC scores attributed to the N features in each view. Red open circles are features with latent correlation (1 ≤ j ≤ N₀). Black circles are those composed of random numbers (N₀ < j ≤ N).
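
Panels (e) to (h) attribute PC loadings to samples and PC scores to features. One way to obtain both from a single view is an SVD of the feature-by-sample matrix, as sketched below. This is an assumption-laden illustration: the centering choice, the function name, and the toy data (N₀ features sharing one latent profile plus noise) are hypothetical and are not the synthetic data used in the article.

```python
import numpy as np


def pca_features_as_points(X: np.ndarray, n_components: int = 2):
    """PCA of an (N features x M samples) matrix, treating features as points.

    Returns (scores, loadings):
      scores   - shape (N, n_components), one row per feature
      loadings - shape (M, n_components), one row per sample
    Assumption: each sample (column) is centered over the features.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings


# Hypothetical toy view: the first N0 features share a latent sample profile.
rng = np.random.default_rng(0)
N, N0, M = 100, 10, 50
latent = rng.normal(size=M)
view1 = rng.normal(size=(N, M))
view1[:N0] += 3.0 * latent

scores, loadings = pca_features_as_points(view1, n_components=2)
```

Applying the same function to a second view and comparing the two sets of loadings is the "separately applied" setting of the caption; nothing in this sketch couples the views, which is the limitation the comparison in the figure is about.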

Table 5.

The numbers of identified mRNAs and miRNAs (multi-omics) and mRNAs (vaccination) using various methodologies.
Multi-omics: Among the 427 mRNA probes and 12 miRNAs identified by PCA-based unsupervised FE, 408 mRNA probes (Table 4) and 9 miRNAs were also identified by TD-based unsupervised FE applied to the type I tensor.

Table 6.

Top 10 gene sets in MSigDB that significantly overlap with the approximately 400 top-ranked mRNA probes identified by the alternative methods SAM, limma, and RF, as well as with the 374 mRNA probes identified by HO GSVD.
“BREAST_CANCER” is set in bold to emphasize the overlap with breast cancer; the corresponding counts are given in parentheses to the right of the method names.

Fig 9.

The results of HO GSVD applied to the synthetic data.
Red open circles are features with latent correlation (1 ≤ j ≤ N₀). The first (a) and the second (b) sample singular value vectors and the first vs the second feature singular value vectors of the first (c) and the second views (d).
