Figure 1.
Rough overview of the proposed extension of PPLS-DA.
Figure 2.
Extension of PPLS-DA for a given stepsize and range of the power parameter.
The power parameter is denoted by γ, and the prediction error (number of wrongly classified samples of the inner test set) is abbreviated as PE. The parameter γ is varied in 11 steps. Cj, j = 1, …, 5, denotes the jth component. The function min(f) returns the minimum of the function f. In each cross-validation step, the outer training set is randomly split into an inner training set (70% of the samples) and an inner test set (30%); these cross-validation steps correspond to sampling with replacement. The optimal γ-value and the optimal number of components are determined after 50 repeats.
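The selection loop described in this caption can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. `fit_predict` is a hypothetical stand-in for a PPLS-DA fit that returns predicted labels. The 11-point grid for the power parameter (written γ here) is assumed to be 0, 0.1, …, 1. The optimum is assumed to be the (γ, number of components) pair with the smallest PE averaged over the 50 repeats.

```python
import numpy as np


def select_gamma_and_ncomp(X, y, fit_predict, gammas, max_comp=5,
                           n_repeats=50, train_frac=0.7, seed=0):
    """Choose the power parameter and the number of components by repeated
    random 70/30 splits of the outer training set (X, y).

    fit_predict(X_tr, y_tr, X_te, gamma, ncomp) is a hypothetical callable
    standing in for a PPLS-DA fit; it must return predicted labels for X_te.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    # pe[r, g, k] = number of misclassified inner-test samples (PE)
    pe = np.zeros((n_repeats, len(gammas), max_comp))
    for r in range(n_repeats):
        # every repeat draws a new split independently of the previous ones
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        for g, gamma in enumerate(gammas):
            for k in range(1, max_comp + 1):
                y_hat = fit_predict(X[tr], y[tr], X[te], gamma, k)
                pe[r, g, k - 1] = np.sum(y_hat != y[te])
    # assumption: the optimum minimises the PE averaged over all repeats
    mean_pe = pe.mean(axis=0)
    g_best, k_best = np.unravel_index(np.argmin(mean_pe), mean_pe.shape)
    return gammas[g_best], int(k_best) + 1, mean_pe


gammas = np.linspace(0.0, 1.0, 11)  # assumed 11-step grid 0, 0.1, ..., 1
```

Each repeat draws a fresh random split without regard to earlier ones, so a sample can land in the inner test set in several repeats. This is one reading of "sampling with replacement" at the level of splits.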
Figure 3.
Condition index for the first five eigenvalues.
The condition index of the jth eigenvalue λ_j (j = 1, …, p, where p is the number of features) is used as a measure of variable dependence. It compares the largest eigenvalue λ_1 with λ_j, where the eigenvalues are assumed to be ordered so that λ_1 ≥ λ_2 ≥ … ≥ λ_p. The increase of the first five condition indices (j = 1, …, 5) reflects the collinearity of the features: a rapid increase indicates strong linear dependence among the features, whereas a slow increase indicates weak dependence.
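The condition indices can be computed as in the following Python sketch. It assumes the common definition κ_j = √(λ_1/λ_j) (some authors omit the square root). It also assumes that the eigenvalues λ_1 ≥ λ_2 ≥ … are those of the sample covariance matrix of the features. Both are assumptions made for illustration.

```python
import numpy as np


def condition_indices(X, n_first=5):
    """Condition indices of the first n_first eigenvalues of the feature
    covariance matrix, assuming kappa_j = sqrt(lambda_1 / lambda_j)."""
    Xc = X - X.mean(axis=0)  # centre each feature (column)
    n = Xc.shape[0]
    # the squared singular values of the centred data, divided by n - 1,
    # are the eigenvalues of cov(X); this avoids forming the p x p matrix,
    # which helps when p >> n as for gene expression data
    s = np.linalg.svd(Xc, compute_uv=False)  # returned in descending order
    lam = s ** 2 / (n - 1)                   # lambda_1 >= lambda_2 >= ...
    return np.sqrt(lam[0] / lam[:n_first])
```

Comparing `condition_indices` for independent features with the result for two nearly collinear columns shows the steep increase described in the caption.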
Figure 4.
Plot of the 50 largest eigenvalues of cov(X) (bars) and of the absolute covariance between the principal components and the response vector y (dots) for the experimental data sets and for case 3 of the simulated data.
The eigenvalues λ_j, j = 1, …, 50, are scaled relative to the largest eigenvalue, and the absolute covariances between the jth principal component t_j and the response vector y are scaled in the same way. Here y_i equals 1 if sample i belongs to the first group and −1 otherwise.
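The quantities plotted in this figure can be sketched in Python as follows. The function name is hypothetical. Scaling each quantity by its own maximum is an assumption. The response y is the ±1-coded group indicator described in the caption. The principal components are obtained from an SVD of the column-centred data.

```python
import numpy as np


def eigen_and_response_cov(X, y, n_top=50):
    """Scaled eigenvalues of cov(X) and scaled |cov(t_j, y)| for the
    n_top leading principal components t_j (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    # rows of Vt are the principal axes (eigenvectors of cov(X)),
    # ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_top, len(s))
    lam = s[:k] ** 2 / (n - 1)           # eigenvalues of cov(X), descending
    T = Xc @ Vt[:k].T                    # principal component scores t_1..t_k
    yc = y - y.mean()
    cov_ty = np.abs(T.T @ yc) / (n - 1)  # |cov(t_j, y)|
    # assumption: each quantity is scaled by its own largest value
    return lam / lam[0], cov_ty / cov_ty.max()


# example response coding: +1 for the first group, -1 otherwise
labels = np.array(["a", "b", "a", "b"])
y_example = np.where(labels == "a", 1.0, -1.0)
```

Because the scores are centred, cov(t_j, y) equals v_j^T cov(X, y) for the jth principal axis v_j. Components with large variance therefore need not be the ones most related to the response, which is the point the dots make against the bars.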
Table 1.
Overview of the experimental data sets.
Figure 5.
Mean PE of PPLS-DA using ,
and
, PLS-DA, t-LDA and SVM for the five cases of the simulated data.
Table 2.
The mean number of components used for simulated data for and
= 0.1.
Figure 6.
Average loading weights of the first component for the simulated data (case 3).
The simulated data of case 1 are constructed such that the technical variance is of the same magnitude as the biological variance. Ten differentially expressed genes with a mean class difference are simulated. Loading weights for the first component as calculated by PPLS-DA are shown for the power parameter
(A) and
(B) using 50 inner cross-validation steps and a stepsize of
. The averages are computed over 100 choices of the outer training and outer test sets.
Figure 7.
Mean PE of PPLS-DA using ,
and
, PLS-DA, t-LDA and SVM for the five cases of the experimental data sets.
Table 3.
The mean number of components used for the experimental data sets.