Fig 1.
Procedures for training, testing and validation of the classifiers.
(a) In the first approach, the training and test/validation sets were treated as entirely separate sets. (b) In the second approach, batch effects between the training and test sets were corrected for by surrogate variable analysis, after which the sets were separated and the models were trained and tested. Each sample in the validation set was corrected with the surrogate variables twice, once labelled as ALS and once labelled as control, before the performance of the models was assessed.
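The double correction of validation samples in (b) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a single surrogate variable whose gene-wise loadings (`gamma`), together with gene-wise intercepts (`intercept`) and ALS effects (`beta`), were already estimated on the training data. It also assumes that the new sample's surrogate-variable score is estimated by least squares, in the spirit of frozen SVA. The function names are hypothetical.

```python
from typing import Dict, List


def correct_under_label(y: List[float], intercept: List[float], beta: List[float],
                        gamma: List[float], label: int) -> List[float]:
    """Remove one (assumed) surrogate variable from sample y, given a hypothesised
    label (0 = control, 1 = ALS)."""
    # Residual after removing the gene-wise intercept and the hypothesised label effect
    resid = [yi - a - b * label for yi, a, b in zip(y, intercept, beta)]
    # Least-squares estimate of this sample's score on the single surrogate variable
    denom = sum(g * g for g in gamma)
    z = sum(g * r for g, r in zip(gamma, resid)) / denom
    # Subtract the estimated batch component from the original expression values
    return [yi - g * z for yi, g in zip(y, gamma)]


def correct_twice(y: List[float], intercept: List[float], beta: List[float],
                  gamma: List[float]) -> Dict[str, List[float]]:
    """Return the sample corrected once as if it were a control and once as if it were ALS."""
    return {
        "control": correct_under_label(y, intercept, beta, gamma, 0),
        "ALS": correct_under_label(y, intercept, beta, gamma, 1),
    }
```

Because a validation sample's true label is unknown at correction time, the hypothesised label changes the residual and therefore the estimated batch score. This is why two corrected versions of each sample arise.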
Table 1.
Baseline characteristics.
Fig 2.
Elimination of expression heterogeneity by surrogate variable analysis.
The left heatmap displays the expression of the 5,000 most variable probes before correction by surrogate variable analysis (SVA). The right heatmap displays the expression of these 5,000 probes after SVA correction. Rows represent arrays and columns represent probes. Arrays are clustered by hierarchical clustering. Black lines indicate patients and grey lines control subjects. Red lines indicate arrays hybridized on Illumina’s HumanHT-12 version 3 BeadChips and blue lines those hybridized on version 4. Before SVA correction, the arrays cluster perfectly by platform; after SVA correction, these batch effects are removed.
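Selecting the probes shown in the heatmaps can be sketched as below. The caption does not state how variability was measured, so ranking probes by their sample variance across arrays is an assumption here, and `most_variable_probes` is a hypothetical helper.

```python
import statistics
from typing import Dict, List


def most_variable_probes(expr: Dict[str, List[float]], n: int = 5000) -> List[str]:
    """Return the IDs of the n probes with the highest sample variance across arrays.

    expr maps a probe ID to its expression values, one value per array.
    Sample variance as the variability measure is an assumption.
    """
    ranked = sorted(expr, key=lambda probe: statistics.variance(expr[probe]), reverse=True)
    return ranked[:n]
```

The resulting probe list would then be used to subset the expression matrix before hierarchical clustering of the arrays.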
Table 2.
Pathway analysis.
Fig 3.
Predicted probabilities for the training and test/validation sets.
Boxplots of the probabilities predicted by the four models (LDA, SVM, NSC and LASSO) in the training and test/validation sets for approach 1 (a) and approach 2 (b).
Fig 4.
Receiver operating characteristic curves for the validation set.
(a) Receiver operating characteristic curves for the SVM, NSC and LASSO classifiers in the validation set when discriminating ALS cases from controls and (b) when discriminating ALS cases from ALS-mimics.
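ROC curves and the areas under them can be computed from the predicted probabilities as sketched below. This is a generic illustration rather than the authors' code. `roc_points` sweeps a threshold over the observed scores. `roc_auc` uses the Mann-Whitney formulation, which gives the probability that a randomly chosen ALS case scores higher than a randomly chosen control (or mimic), with ties counted as one half. Both function names are hypothetical.

```python
from typing import List, Tuple


def roc_points(case_scores: List[float], control_scores: List[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs from sweeping a threshold high to low."""
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # A sample is called "ALS" when its predicted probability is at least t
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((fpr, tpr))
    return points


def roc_auc(case_scores: List[float], control_scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties contribute half a concordant pair
    return wins / (len(case_scores) * len(control_scores))
```

Applying the trapezoid rule to the points from `roc_points` gives the same value as `roc_auc` for the same scores.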
Fig 5.
Survival curves for predicted survival classes.
(a) Differences in survival time between the so-called “long survivors” and “short survivors” in the training set, which was used to train the nearest shrunken centroid survival model. (b) Differences in observed survival between the predicted “long survivors” and the predicted “short survivors” in the test set.
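Survival curves such as these are commonly drawn with the Kaplan-Meier estimator. The caption does not name the estimator, so the sketch below is an assumption. Each group, for example the predicted “long survivors”, would be passed to it separately. `kaplan_meier` is a hypothetical helper.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct time with at least one death.

    events[i] is 1 if death was observed at times[i] and 0 if the subject was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects (deaths and censorings) sharing this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both deaths and censored subjects leave the risk set after t
        at_risk -= removed
    return curve
```

Censored subjects do not lower the curve themselves but shrink the risk set for later event times. This is why the estimator can use incomplete follow-up.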