Dimensionality reduction of longitudinal ’omics data using modern tensor factorizations

doi:10.1371/journal.pcbi.1010212

Fig 1.

Illustration of settings and workflow.

Center rhombus describes the typical data produced in a longitudinal experiment, where high-dimensional samples are collected from multiple subjects across various timepoints. Left-hand side summarizes standard, non-tensor-based workflows, including (top to bottom) ordination plots with repeated measurements, per-timepoint multivariate analysis, and discovery funnels based on univariate time-series analysis. Top right: schematic derivation of tcam from tsvdm. Bottom right: tcam’s output and its applications, including exploratory analysis of the data in a reduced feature space where variation between points reflects differences between high-dimensional temporal trajectories, feature engineering for downstream ML workflows, and feature selection for downstream univariate exploration.

Fig 2.

Subject-centered view of a 3rd-order tensor.

a An illustration of the data structure. b The right panel presents a breakdown of the left tensor into m horizontal slices that are p × n matrices.
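The subject-centered layout in panel b can be sketched in NumPy as below. This is a minimal illustration, not code from tcam: it assumes samples arrive in long format (one row per subject/timepoint sample), that the p axis indexes features and the n axis indexes timepoints, and that every subject is sampled at every timepoint. The function name `build_subject_tensor` is hypothetical.

```python
import numpy as np

def build_subject_tensor(subject_ids, time_ids, values):
    """Stack long-format samples into an (m, p, n) tensor (hypothetical helper).

    values: (num_samples, p) array, one row per (subject, timepoint) sample.
    Assumes a complete design: every subject observed once at every timepoint.
    """
    subject_ids = np.asarray(subject_ids)
    time_ids = np.asarray(time_ids)
    values = np.asarray(values, dtype=float)
    subjects = np.unique(subject_ids)            # sorted subject labels -> m
    times = np.unique(time_ids)                  # sorted timepoints -> n
    m, n, p = subjects.size, times.size, values.shape[1]
    X = np.full((m, p, n), np.nan)
    s_idx = np.searchsorted(subjects, subject_ids)
    t_idx = np.searchsorted(times, time_ids)
    # Sample k fills the tube X[s_idx[k], :, t_idx[k]] with its p feature values
    X[s_idx, :, t_idx] = values
    if np.isnan(X).any():
        raise ValueError("missing (subject, timepoint) samples")
    return X, subjects, times
```

Each horizontal slice `X[i]` is then the p × n matrix of subject i, matching the breakdown in panel b.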

Fig 3.

Illustration of the tsvdm decomposition for a 3rd-order tensor.

Left-hand side of the equation shows the data tensor; right-hand side shows its tsvdm factors: two ⋆_M-orthogonal tensors and, between them, an f-diagonal tensor.
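To make the decomposition concrete, the NumPy sketch below computes a tsvdm-style factorization: transform every mode-3 tube by M, take an ordinary matrix SVD of each frontal slice in the transformed domain, and map the factors back with M⁻¹. Assumptions: real data, an invertible M (an orthonormal DCT-II matrix is used here purely for illustration), the tube axis last, and ⋆_M transposition taken as facewise transposition, which holds for real M. All function names are hypothetical helpers, not the tcam package API.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix (rows are cosine basis vectors); illustrative M
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C

def mode3(X, M):
    # Apply M to every mode-3 tube: out[i, j, :] = M @ X[i, j, :]
    return np.einsum("ijt,lt->ijl", X, M)

def ftranspose(X):
    # Transpose every frontal slice X[:, :, t]
    return X.transpose(1, 0, 2)

def mprod(A, B, M):
    # A *_M B: facewise matrix products in the M-domain, mapped back by M^{-1}
    C_hat = np.einsum("ijt,jkt->ikt", mode3(A, M), mode3(B, M))
    return mode3(C_hat, np.linalg.inv(M))

def tsvdm(X, M):
    # Return U, S, V with X = U *_M S *_M V^T and S f-diagonal
    X_hat = mode3(X, M)
    m, p, n = X.shape
    k = min(m, p)
    U_hat = np.zeros((m, k, n))
    S_hat = np.zeros((k, k, n))
    V_hat = np.zeros((p, k, n))
    for t in range(n):
        # One ordinary (thin) matrix SVD per transformed frontal slice
        u, s, vt = np.linalg.svd(X_hat[:, :, t], full_matrices=False)
        U_hat[:, :, t], S_hat[:, :, t], V_hat[:, :, t] = u, np.diag(s), vt.T
    M_inv = np.linalg.inv(M)
    return mode3(U_hat, M_inv), mode3(S_hat, M_inv), mode3(V_hat, M_inv)
```

Because the inverse transform only mixes entries within a tube, the zero off-diagonal tubes of the transformed S stay zero, so S is f-diagonal in the original domain as well.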

Fig 4.

Illustration of the tcam mapping defined in Eq 3 and Algorithm 2.

Top: right multiplication of new data point (a matrix) by , followed by application of M (middle), and concatenation (bottom).
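The three steps in the caption (right multiplication, application of M, concatenation) can be sketched as follows. This is a hedged illustration: it assumes the right factor is the ⋆_M right singular tensor V from a tsvdm of the training tensor, that new samples are p × n horizontal slices with time on the last axis, and that concatenation stacks the n transformed faces one after another. The published mapping (Eq 3, Algorithm 2) may include further steps, such as centering or keeping only a subset of coordinates, which this sketch omits; all names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix; an illustrative choice of M
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C

def mode3(X, M):
    # out[i, j, :] = M @ X[i, j, :]
    return np.einsum("ijt,lt->ijl", X, M)

def mprod(A, B, M):
    # A *_M B: facewise matrix products in the M-domain, mapped back by M^{-1}
    C_hat = np.einsum("ijt,jkt->ikt", mode3(A, M), mode3(B, M))
    return mode3(C_hat, np.linalg.inv(M))

def right_singular_tensor(X, M):
    # V from X = U *_M S *_M V^T, via one matrix SVD per transformed frontal slice
    X_hat = mode3(X, M)
    m, p, n = X.shape
    V_hat = np.zeros((p, min(m, p), n))
    for t in range(n):
        _, _, vt = np.linalg.svd(X_hat[:, :, t], full_matrices=False)
        V_hat[:, :, t] = vt.T
    return mode3(V_hat, np.linalg.inv(M))

def tcam_map(Y, V, M):
    # Y: (q, p, n) stack of new samples, each a p x n horizontal slice
    Z = mprod(Y, V, M)        # step 1: right multiplication, shape (q, k, n)
    Z_hat = mode3(Z, M)       # step 2: apply M along the tube (time) axis
    q, k, n = Z_hat.shape
    # step 3: concatenate the n faces (each of length k) into one vector per sample
    return Z_hat.transpose(0, 2, 1).reshape(q, n * k)
```

Each sample is mapped independently, so `tcam_map(X[i:i+1], V, M)` returns the same row as mapping the whole tensor at once; the same M must be used for fitting V and for mapping.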

Fig 5.

Comparison of tcam with existing matrix-based methods.

a PCA of all timepoints, colored by participant. b Regression line of the mean distance between subjects across all timepoints (x) and at baseline (y); distances computed using PC₁ and PC₂. c Leading tcam factors. d Bar graph showing the top 2.5% of features contributing to the variation in F₁. e Comparison of discovery rates for univariate hypothesis testing (lmer) between the naïve workflow (left) and the tcam-based pruning workflow (right). f Venn diagram and bar graphs. Bars show the median per-subject iAUC for all detected bacteria (q < 0.05); the Venn diagram relates each bacterium to the workflow in which it was detected. Names of microbial species included in the probiotics mix are highlighted in bold.
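Panel b can be sketched as below under one reading of the caption, which this sketch assumes rather than confirms: PCA on all samples pooled across timepoints, then, for every pair of subjects, x is their distance in the PC₁/PC₂ plane averaged over timepoints and y is their distance at baseline (the first timepoint), with a least-squares line fit through the pairs. The array layout (subjects, timepoints, features) and all names are illustrative assumptions.

```python
import numpy as np

def pc_scores(samples, n_components=2):
    # PCA via SVD of the column-centered sample matrix; returns PC scores
    centered = samples - samples.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def baseline_vs_mean_distance(X):
    # X: (m subjects, T timepoints, n features); timepoint 0 is baseline
    m, T, n = X.shape
    scores = pc_scores(X.reshape(m * T, n)).reshape(m, T, 2)
    # Euclidean distance between every pair of subjects at every timepoint: (m, m, T)
    dist = np.linalg.norm(scores[:, None] - scores[None, :], axis=-1)
    i, j = np.triu_indices(m, k=1)   # each unordered subject pair once
    pair_dist = dist[i, j]           # (pairs, T)
    x = pair_dist.mean(axis=1)       # mean distance over all timepoints
    y = pair_dist[:, 0]              # distance at baseline
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept, x, y
```

If subjects do not change over time, x and y coincide and the fitted line has slope 1 and intercept 0.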

Fig 6.

Comparison of tcam with existing tensor-based methods.

a Scatter plot of the data from [16] obtained using CTF; inset: pairwise PERMANOVA. b Scatter plot of the data from [16] obtained using tcam; inset: pairwise PERMANOVA. c Funnel comparing discovery rates of CTF-based and tcam-based pruning strategies. d Time series of significant bacteria (lmer) found using the tcam-based pruning strategy on top of LFB normalization. e Bar plot of the top and bottom 2.5% loadings for F₃, and heatmap of per-subject AUC (log scale) for the same features; color bar indicates z-score-normalized value.
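Selecting the features at both extremes of a factor's loadings, and z-scoring a feature-by-subject matrix for a heatmap, can be sketched as follows. Assumptions: the 2.5% cut is converted to a feature count by rounding (at least one feature), and z-scores are computed per feature (row) across subjects; the paper may define either step differently. Names are hypothetical.

```python
import numpy as np

def extreme_loadings(loadings, fraction=0.025):
    # Indices of the lowest and highest `fraction` of loadings (at least one each)
    loadings = np.asarray(loadings, dtype=float)
    k = max(1, int(round(fraction * loadings.size)))
    order = np.argsort(loadings)
    bottom = order[:k]          # most negative loadings first
    top = order[::-1][:k]       # most positive loadings first
    return bottom, top

def zscore_rows(A):
    # Standardize each row (feature) across columns (subjects); constant rows map to 0
    A = np.asarray(A, dtype=float)
    mu = A.mean(axis=1, keepdims=True)
    sd = A.std(axis=1, keepdims=True)
    return (A - mu) / np.where(sd > 0, sd, 1.0)
```

With 200 features, a 2.5% cut keeps 5 features at each end.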

Fig 7.

tcam’s applicability to proteomics datasets.

a Scatterplot of the leading tcam factors significantly correlated with insulin resistance or sensitivity. Points are colored by insulin status: insulin resistant (IR) or insulin sensitive (IS). b Heatmap showing the sum of the top and bottom 25 features, ranked by their loadings, contributing to the variation on F₁; color bar indicates z-score-normalized value.

Fig 8.

tcam enables new discoveries and is amenable to ML applications.

a ROC curve for an MLP model trained to classify remission/flare based on tcam-transformed data from all timepoints. b Bar plot showing importance scores of the top 5% ranked features. c Scatterplot of tcam scores computed on the top 5% most important features.
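The ROC curve in panel a summarizes how well classifier scores separate the two classes across all decision thresholds. The NumPy sketch below computes the curve and its trapezoidal AUC from labels and scores; it does not reproduce the MLP or the tcam features. Assumptions: binary labels coded 0/1 (for example flare = 1, an arbitrary illustrative choice) with both classes present; function names are hypothetical.

```python
import numpy as np

def roc_points(y_true, scores):
    # Sort by decreasing score; a stable sort keeps tie handling deterministic
    order = np.argsort(-np.asarray(scores, dtype=float), kind="mergesort")
    y = np.asarray(y_true, dtype=float)[order]
    s = np.asarray(scores, dtype=float)[order]
    # One threshold per distinct score: the last position of each run of equal scores
    distinct = np.where(np.diff(s))[0]
    idx = np.r_[distinct, y.size - 1]
    tps = np.cumsum(y)[idx]          # true positives above each threshold
    fps = (idx + 1) - tps            # false positives above each threshold
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr

def auc(fpr, tpr):
    # Trapezoidal area under the ROC curve
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Grouping equal scores into a single threshold means tied predictions produce a diagonal segment rather than an order-dependent staircase.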
