Evaluation of machine learning algorithms and structural features for optimal MRI-based diagnostic prediction in psychosis

doi:10.1371/journal.pone.0175683

Table 1.

Demographic and clinical characteristics of samples of both patient groups and of healthy controls.

Fig 1.

Data features generated from the individual structural T1 magnetic resonance images.

Each feature type was used as input data to evaluate the predictive capacity of the different classifiers. Grey and white matter were considered separately when using voxel-based features. Left and right hemispheres were considered separately when vertex-based cortical information was applied.

Fig 2.

General cross validation scheme applied to evaluate the classification accuracy in all combinations of algorithms and data features.

For most classifiers, cross-validation is used at two levels: at an outer level for training and testing, and within each training sample to select the optimal values for the regularization parameters (delta). The effect of nuisance covariates on the test data is regressed out using coefficients fitted in the training data. Individual performances are given as the proportion of test individuals correctly classified.
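
The covariate step in this scheme can be illustrated with a minimal sketch. It assumes a single nuisance covariate, a single feature, and ordinary least squares; the helper names (`fit_nuisance`, `residualize`, `k_fold_indices`) and the toy values are hypothetical and are not taken from the paper. The inner loop that tunes the regularization parameter is omitted.

```python
from statistics import mean

def fit_nuisance(covariate, feature):
    """Least-squares slope and intercept of one feature on one covariate."""
    cx, cy = mean(covariate), mean(feature)
    sxx = sum((x - cx) ** 2 for x in covariate)
    sxy = sum((x - cx) * (y - cy) for x, y in zip(covariate, feature))
    slope = sxy / sxx if sxx else 0.0
    return slope, cy - slope * cx

def residualize(covariate, feature, slope, intercept):
    """Remove the fitted covariate effect from a feature."""
    return [y - (intercept + slope * x) for x, y in zip(covariate, feature)]

def k_fold_indices(n, k):
    """Split range(n) into k contiguous (train, test) index pairs."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    folds = []
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = [j for j in range(n) if j < bounds[i] or j >= bounds[i + 1]]
        folds.append((train, test))
    return folds

# Toy data (hypothetical): the feature depends linearly on age.
age = [20, 30, 40, 50, 60, 70]
volume = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]

test_residuals = []
for train, test in k_fold_indices(len(age), 3):
    # Coefficients come from the training subjects only ...
    slope, intercept = fit_nuisance([age[i] for i in train],
                                    [volume[i] for i in train])
    # ... and are then applied unchanged to the held-out subjects.
    test_residuals.extend(residualize([age[i] for i in test],
                                      [volume[i] for i in test],
                                      slope, intercept))
```

Fitting the coefficients on the training subjects only keeps information from the test subjects out of the preprocessing step.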

Fig 3.

Classification accuracies for each combination of algorithm and feature type applied to the healthy vs. schizophrenia classification.

Mean accuracy for the 10 test samples (in green), approximate 95% confidence interval for the mean accuracy (in blue) and highest and lowest accuracy values (in red) are shown for each combination. Rid: Ridge regression, Las: Lasso regression, Ela: Elastic net regularization, L0: L0-norm regularization, SVC: Support vector classifier, RDA: Regularized discriminant analysis, GPC: Gaussian process classifier, RF: Random forest.
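
The three summaries plotted for each combination can be sketched as follows. The caption does not say how the approximate interval was computed; this sketch assumes a normal approximation (mean plus or minus 1.96 standard errors), and the function name `summarize_accuracies` and the fold accuracies are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_accuracies(accuracies):
    """Mean, normal-approximation 95% CI for the mean, and range."""
    m = mean(accuracies)
    se = stdev(accuracies) / sqrt(len(accuracies))  # standard error of the mean
    return {
        "mean": m,
        "ci_low": m - 1.96 * se,   # assumed interval construction
        "ci_high": m + 1.96 * se,
        "min": min(accuracies),
        "max": max(accuracies),
    }

# Hypothetical accuracies for 10 test samples (not values from the paper).
fold_accuracies = [0.70, 0.75, 0.65, 0.80, 0.70, 0.72, 0.68, 0.74, 0.71, 0.75]
summary = summarize_accuracies(fold_accuracies)
```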

Fig 4.

Classification accuracies for each combination of algorithm and feature type applied to the healthy vs. bipolar disorder classification.

Mean accuracy for the 10 test samples (in green), approximate 95% confidence interval for the mean accuracy (in blue) and highest and lowest accuracy values (in red) are shown for each combination. Rid: Ridge regression, Las: Lasso regression, Ela: Elastic net regularization, L0: L0-norm regularization, SVC: Support vector classifier, RDA: Regularized discriminant analysis, GPC: Gaussian process classifier, RF: Random forest.

Fig 5.

Classification accuracies for each combination of algorithm and feature type applied to the bipolar disorder vs. schizophrenia classification.

Mean accuracy for the 10 test samples (in green), approximate 95% confidence interval for the mean accuracy (in blue) and highest and lowest accuracy values (in red) are shown for each combination. Rid: Ridge regression, Las: Lasso regression, Ela: Elastic net regularization, L0: L0-norm regularization, SVC: Support vector classifier, RDA: Regularized discriminant analysis, GPC: Gaussian process classifier, RF: Random forest.

Fig 6.

Accuracy rates averaged over all classifiers for the different feature types.

Pairs of features showing significant differences in paired Wilcoxon tests (p < 0.05) are marked. Most of the significant differences involve a higher accuracy rate for grey matter VBM and WBM. a: significantly different from VBM_GM, b: significantly different from WBM_GM, c: significantly different from VolumeR.
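
A paired Wilcoxon (signed-rank) comparison of two accuracy series can be sketched as below. This is only an illustration: it assumes the two-sided normal approximation without tie or continuity correction, which may differ from the variant the authors used, and the function name is hypothetical.

```python
from math import erfc, sqrt

def wilcoxon_signed_rank(a, b):
    """Return (W+, W-, two-sided p) using the normal approximation."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mu) / sigma
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return w_plus, w_minus, p_value
```

With only a handful of paired values the normal approximation is rough, and an exact test would usually be preferred.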

Fig 7.

Accuracy rates averaged over all feature types for the eight classifiers.

Pairs of classifiers showing significant differences from paired Wilcoxon tests (with p < 0.05) are marked. None of the classifiers clearly outperforms the others. a: significantly different from L0, b: significantly different from GPC, c: significantly different from Elastic.

Table 2.

Mean accuracy rate and area under the receiver operating characteristic curve (AUC) for the eight classifiers on VBM grey matter.

Fig 8.

Receiver operating characteristic curves (ROCs) for the different classifiers applied to grey matter VBM.

The best classification performance is observed in the healthy vs. schizophrenia classification. The overlap between curves in each plot points to similar classification levels attained by the different algorithms. AUC: Area under the receiver operating characteristic curve. There are no ROCs for the regularized discriminant analysis because no reliable individual probabilities were available for this algorithm.
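
Because the AUC depends on individual class probabilities, it can be computed directly from them. The sketch below uses the rank-based definition: the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name and the toy labels and scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = patient, 0 = control (hypothetical values).
example_auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```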

Fig 9.

Brain maps of coefficients from fitted classifiers on grey matter VBM images together with effect size maps as given by standard univariate t-tests.

Values for the random forest classifier are variable importance measures derived from the Gini index. The functions used for the Gaussian process classifier and the regularized discriminant analysis did not provide fitted coefficients, so maps were not available for these two algorithms.

Fig 10.

Plots of t-test based effect sizes (x-axis) versus (non-zero) coefficients from the different classifiers (y-axis) applied to grey matter VBM data.

Nonparametric local regression (lowess) lines are shown in blue. Random forest values are variable importances derived from the Gini index. No coefficients were available for the Gaussian process classifier and the regularized discriminant analysis.

Table 3.

Mean classification accuracies obtained by combining all data features together, after dimensionality reduction based on Principal Component Analysis (PCA Based combination), and after selecting only the 1% of variables with largest t values (t-thresholded combination).

Fig 11.

Estimated classification accuracies obtained by considering all feature types together as predictors.

In (A), a principal component analysis was first applied to the merged data to reduce computational burden and dimensionality; in (B), only the 1% of variables with the largest t values from univariate two-group comparisons were considered. Green line: mean accuracy for the 10 test samples; blue lines: approximate 95% confidence intervals for the mean accuracy; red lines: highest and lowest accuracy values. ridge: Ridge regression, lasso: Lasso regression, elastic: Elastic net regularization, L0: L0-norm regularization, SVC: Support vector classifier, RDA: Regularized discriminant analysis, GPC: Gaussian process classifier, RF: Random forest.
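
The thresholding in (B) can be illustrated with a small sketch. It assumes a pooled-variance two-sample t statistic and ranking by absolute value; the caption states neither detail, and the function names and toy data are hypothetical.

```python
from math import ceil, sqrt
from statistics import mean, variance

def t_statistic(x, y):
    """Pooled-variance two-sample t statistic (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    if pooled == 0:
        return 0.0
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))

def top_fraction_by_t(group_a, group_b, fraction=0.01):
    """Indices of the features with the largest |t|, keeping `fraction` of them."""
    n_features = len(group_a[0])
    scores = [
        abs(t_statistic([row[j] for row in group_a],
                        [row[j] for row in group_b]))
        for j in range(n_features)
    ]
    keep = max(1, ceil(fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

# Toy data (hypothetical): 200 features, of which only 7 and 42 differ by group.
group_b = [[(i % 2) * 0.1 for _ in range(200)] for i in range(4)]
group_a = [[v + (5.0 if j in (7, 42) else 0.0) for j, v in enumerate(row)]
           for row in group_b]
selected = top_fraction_by_t(group_a, group_b)
```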

Fig 12.

Classification accuracies generated by multi-class classifiers on grey matter VBM using the one-vs-one approach.

All algorithms were used except the regularized discriminant function analysis, which did not report reliable class probabilities. Overall accuracies are plotted together with accuracies for the three groups separately. Green line: mean accuracy for the 10 test samples; blue lines: approximate 95% confidence intervals for the mean accuracy; red lines: highest and lowest accuracy values. ridge: Ridge regression, lasso: Lasso regression, elastic: Elastic net regularization, L0-norm: L0-norm regularization, SVC: Support vector classifier, GPC: Gaussian process classifier, RF: Random forest.
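
The one-vs-one idea can be sketched as follows: one binary classifier is consulted per pair of classes and the class collecting the most pairwise wins is chosen. Majority voting is an assumption here; the caption's reference to class probabilities suggests the authors may have combined probabilities instead. The function names, the class labels, and the one-dimensional centroid rule are all hypothetical stand-ins for fitted classifiers.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(sample, classes, pairwise_predict):
    """Pick the class with the most wins over all pairwise comparisons."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_predict(a, b, sample)] += 1
    # Ties are broken by the order of `classes` (max returns the first maximum).
    return max(classes, key=lambda c: votes[c])

# Hypothetical stand-in for fitted binary classifiers: nearest 1-D centroid.
CENTROIDS = {"HC": 0.0, "BD": 1.0, "SZ": 2.0}

def nearest_centroid(a, b, sample):
    """Return whichever of classes a and b has the closer centroid."""
    return a if abs(sample - CENTROIDS[a]) <= abs(sample - CENTROIDS[b]) else b

labels = ["HC", "BD", "SZ"]
prediction_high = one_vs_one_predict(1.9, labels, nearest_centroid)
prediction_low = one_vs_one_predict(0.1, labels, nearest_centroid)
```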

Table 4.

Mean accuracies obtained by all classifiers (apart from the regularized discriminant function analysis) using a one-vs-one and a one-vs-all multi-class approach on grey matter VBM images.

Lower and upper limits for the 95% bootstrap confidence intervals are also reported. 0.333 is the expected accuracy when no real predictive power is present.
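
A percentile bootstrap interval for a mean accuracy can be sketched as below. The resampling unit (here, the per-sample accuracies), the number of resamples, and the percentile method are assumptions, since the caption does not specify them; the function name and the accuracy values are hypothetical.

```python
import random
from statistics import mean

def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical three-class accuracies for 10 test samples.
accuracies = [0.40, 0.45, 0.38, 0.50, 0.42, 0.47, 0.41, 0.44, 0.39, 0.46]
ci_low, ci_high = bootstrap_ci(accuracies)
```

An interval whose lower limit lies above 0.333 would indicate performance better than the stated no-information level.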

Table 5.

Mean accuracies obtained by classifiers that provide inbuilt multiclass functionality (all but the L0-norm and the support vector classifiers).

Lower and upper limits for the 95% bootstrap confidence intervals are also reported. 0.333 is the expected accuracy when no real predictive power is present.
