Table 1.
Dimensionality reduction methods.
Table 2.
Example implementations.
Fig 1.
For spectral methods, the eigenvalues can be used to decide how many dimensions are sufficient. The number of dimensions to keep can be selected with an "elbow rule": retain the components that precede the point at which the eigenvalues drop sharply and then level off. In the example shown, this means keeping the first five principal components.
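The elbow rule is usually applied by eye. As a rough sketch only, one way to approximate it is to keep the components that come before the largest drop between consecutive eigenvalues. The eigenvalues below are hypothetical, and the function names and the largest-gap heuristic are assumptions made for illustration, not a rule taken from the figure.

```python
def explained_variance_ratio(eigenvalues):
    """Share of the total variance carried by each component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def elbow_by_largest_gap(eigenvalues):
    """Number of components before the largest drop between consecutive
    eigenvalues. Assumes the eigenvalues are sorted in decreasing order."""
    gaps = [eigenvalues[i] - eigenvalues[i + 1]
            for i in range(len(eigenvalues) - 1)]
    return gaps.index(max(gaps)) + 1


# Hypothetical eigenvalues, sorted in decreasing order.
eigenvalues = [5.0, 4.2, 3.6, 3.1, 2.7, 0.4, 0.3, 0.2, 0.1, 0.1]

n_keep = elbow_by_largest_gap(eigenvalues)   # 5 for these values
ratios = explained_variance_ratio(eigenvalues)
kept_share = sum(ratios[:n_keep])            # variance share of kept components
```

A heuristic like this can disagree with a visual reading when the eigenvalues decay smoothly, so it is best treated as a starting point and checked against the plot.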
Fig 2.
Two simulated Gaussian clusters projected on the first and the second PCs. Incorrect aspect ratio in a rectangular (a) and square (b) plot. Correct aspect ratio in (c, d) where the plot's height and width are adjusted to match the variances in PC1 and PC2 coordinates. Colors shown in (d) indicate the true Gaussian group membership. Dim1, dimension 1; Dim2, dimension 2; PC, principal component; PCA, PC analysis.
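A correct aspect ratio means that one unit along PC1 has the same physical length as one unit along PC2. The sketch below computes the figure height that achieves this for a chosen width; the function name and the coordinates are hypothetical and serve only as an illustration.

```python
def figure_height(width, pc1_coords, pc2_coords):
    """Height that makes one unit on PC2 as long as one unit on PC1,
    given the figure width and the plotted coordinates."""
    pc1_range = max(pc1_coords) - min(pc1_coords)
    pc2_range = max(pc2_coords) - min(pc2_coords)
    return width * pc2_range / pc1_range


# Hypothetical PC coordinates: PC1 spans 8 units, PC2 spans 2 units.
pc1 = [-4.0, -1.0, 0.5, 4.0]
pc2 = [-1.0, 0.5, 1.0, -0.5]

height = figure_height(8.0, pc1, pc2)   # 2.0: the plot is four times wider than tall
```

Plotting libraries often provide this directly. In matplotlib, for example, calling `set_aspect("equal")` on the axes enforces equal unit lengths on both axes.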
Fig 3.
PCA on the wine dataset shows how the representation of the variables can be used to interpret the new dimensions. Correlation circle (a) and PC1 contribution plot (b). AlcAsh, alcalinity of ash; Dim1, dimension 1; Dim2, dimension 2; Flav, flavanoids; NonFlav Phenols, nonflavanoid phenols; OD, OD280/OD315 of diluted wine; PC, principal component; PCA, PC analysis; Phenols, total phenols; Proa, proanthocyanins.
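The quantities behind a correlation circle and a contribution plot can be sketched as follows, assuming PCA on standardized variables (that is, on the correlation matrix). The correlation of a variable with a PC is its loading times the square root of the eigenvalue, and its percent contribution to a PC is 100 times the squared loading. The data here are simulated and hypothetical, not the wine dataset, and `variable_representation` is a name introduced for this example.

```python
import numpy as np


def variable_representation(X):
    """Return (correlations, contributions) for PCA on standardized data.

    correlations[j, k]  : correlation of variable j with PC k
    contributions[j, k] : percent contribution of variable j to PC k
    """
    # PCA on standardized variables is an eigendecomposition of the
    # correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort PCs by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    correlations = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    contributions = 100.0 * eigvecs ** 2
    return correlations, contributions


# Hypothetical data: 60 samples, 4 variables, the first two strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(60, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.3 * rng.normal(size=60),
    base[:, 1],
    rng.normal(size=60),
])

correlations, contributions = variable_representation(X)
circle_coords = correlations[:, :2]    # arrow tips for a correlation circle
pc1_contrib = contributions[:, 0]      # bar heights for a PC1 contribution plot
```

The contributions to each PC sum to 100, so a variable contributing more than 100 divided by the number of variables carries more than an equal share of that component.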
Fig 4.
A single plot for the wine dataset combines both the samples' and the variables' projection to the first two principal components. AlcAsh, alcalinity of ash; Dim1, dimension 1; Dim2, dimension 2; Flav, flavanoids; NonFlav Phenols, nonflavanoid phenols; OD, OD280/OD315 of diluted wine; PCA, principal component analysis; Phenols, total phenols; Proa, proanthocyanins.
Fig 5.
Observations in PCA plots may cluster into groups (a) or follow a continuous gradient (b). Dim1, dimension 1; Dim2, dimension 2; PCA, principal component analysis.
Fig 6.
(a) A PCA sample projection on the wine dataset shows that, based on their properties, wines tend to cluster in agreement with the grape variety classification: Nebbiolo, Grignolino, and Barbera. (b) A PCA biplot can be used to find which groups of wines tend to have higher levels of which property. Dim1, dimension 1; Dim2, dimension 2; PCA, principal component analysis.
Fig 7.
DiSTATIS on multiple distance tables defined for the same observations. Multiple distances can be computed from different data modalities, e.g., gene expression, methylation, clinical data, or from data resampled from a known data-generating distribution.
Fig 8.
When consecutive eigenvalues are nearly equal, the PCA representation is unstable. PCA, principal component analysis.
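A simple screen for this situation is to look for consecutive eigenvalues whose relative difference is small. The sketch below is one possible check; the function name, the 10% threshold, and the example eigenvalues are all assumptions chosen for illustration.

```python
def near_ties(eigenvalues, rel_tol=0.1):
    """Indices i at which eigenvalue i and eigenvalue i + 1 differ by less
    than rel_tol relative to eigenvalue i. Assumes decreasing order."""
    flagged = []
    for i in range(len(eigenvalues) - 1):
        current, following = eigenvalues[i], eigenvalues[i + 1]
        if current > 0 and (current - following) / current < rel_tol:
            flagged.append(i)
    return flagged


# Hypothetical eigenvalues: the first two are nearly equal.
eigenvalues = [4.0, 3.9, 1.0, 0.2]
ties = near_ties(eigenvalues)   # [0]: PC1 and PC2 are not well separated
```

When a pair is flagged, the individual axes of that pair should not be interpreted separately, although the subspace they span together may still be stable.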
Fig 9.
Stability in the DR output coordinates for each data point. Projections of bootstrap samples for two 10D simulated datasets with rank 2 (a) and rank 5 (b) onto the first two PCs aligned using a Procrustes transformation. Smaller, circular markers correspond to each bootstrap trial, and larger, diamond markers are coordinates of the full dataset. DR, dimensionality reduction; PC, principal component.
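The kind of procedure behind such a plot can be sketched as follows. It assumes that each bootstrap trial refits PCA on rows resampled with replacement, projects every original data point onto the refitted axes, and aligns the result to the full-data coordinates with an orthogonal Procrustes rotation. These choices, the function names, and the simulated data are assumptions for illustration and may differ from how the figure was actually produced.

```python
import numpy as np


def pca_axes(X, k=2):
    """Column means and the top-k principal axes (as columns) of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T


def procrustes_rotation(A, B):
    """Orthogonal matrix R minimizing the Frobenius norm of A @ R - B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def bootstrap_projections(X, n_boot=50, k=2, seed=0):
    """Full-data PC coordinates and Procrustes-aligned bootstrap coordinates."""
    rng = np.random.default_rng(seed)
    mean, axes = pca_axes(X, k)
    reference = (X - mean) @ axes                     # full-data coordinates
    aligned = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))    # resample rows with replacement
        b_mean, b_axes = pca_axes(X[idx], k)
        scores = (X - b_mean) @ b_axes                # project all original points
        aligned.append(scores @ procrustes_rotation(scores, reference))
    return reference, np.stack(aligned)


# Hypothetical 10D dataset of rank 2 plus a small amount of noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))
X = X + 0.05 * rng.normal(size=(100, 10))

reference, aligned = bootstrap_projections(X)
# Average per-point spread across bootstrap trials (smaller means more stable).
spread = aligned.std(axis=0).mean()
```

Plotting `aligned` as small markers and `reference` as large markers gives a display of the same kind as the one described. A spread that is small relative to the scale of `reference` suggests that the two-dimensional representation is stable.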