^{*}

^{¶}

¶ Membership of the Alzheimer's Disease Neuroimaging Initiative is provided in the Acknowledgments.

ADNI, one of the funders of this study, is partly funded through generous contributions from the following: Abbott; Amorfix Life Sciences Ltd.; AstraZeneca; Bayer HealthCare; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors. .

Conceived and designed the experiments: RC FCH MAE. Performed the experiments: RC FCH MAE. Analyzed the data: RC FCH MAE. Wrote the paper: RC FCH MAE.

Machine learning neuroimaging researchers have often relied on regularization techniques when classifying MRI images. Although these were originally introduced to deal with “ill-posed” problems it is rare to find studies that evaluate the ill-posedness of MRI image classification problems. In addition, to avoid the effects of the “curse of dimensionality” very often dimension reduction is applied to the data.

Baseline structural MRI data from cognitively normal and Alzheimer's disease (AD) patients from the AD Neuroimaging Initiative database were used in this study. We evaluated here the ill-posedness of this classification problem across different dimensions and sample sizes and its relationship to the performance of regularized logistic regression (RLR), linear support vector machine (SVM) and linear regression classifier (LRC). In addition, these methods were compared with their principal components space counterparts.

In voxel space the prediction performance of all methods increased as sample sizes increased. They were not only relatively robust to the increase of dimension, but they often showed improvements in accuracy. We linked this behavior to improvements in conditioning of the linear kernels matrices. In general the RLR and SVM performed similarly. Surprisingly, the LRC was often very competitive when the linear kernel matrices were best conditioned. Finally, when comparing these methods in voxel and principal component spaces, we did not find large differences in prediction performance.

We analyzed the problem of classifying AD MRI images from the perspective of linear ill-posed problems. We demonstrate empirically the impact of the linear kernel matrix conditioning on different classifiers' performance. This dependence is characterized across sample sizes and dimensions. In this context we also show that increased dimensionality does not necessarily degrade performance of machine learning methods. In general, this depends on the nature of the problem and the type of machine learning method.

In the past, it has been argued that when classifying brain structural MRI (sMRI) images, the performance of machine learning techniques is greatly affected by the curse of dimensionality (CoD). The term CoD was introduced by Richard Bellman while working on optimization problems

Recent work provides evidence that dimension reduction for classifying brain MRI images may not be necessary to achieve a high level of prediction performance in some situations. Reports in the fMRI literature

We have recently proposed solving similar structural MRI classification problems associated with AD following a different path, which is based on large-scale regularization. We use regularized logistic regression (RLR) based on a coordinate-wise descent technique as implemented in the GLMNET library _{1} penalty to the elastic net case

Most of the methods described above that are capable of handling sMRI or fMRI high dimensional data are related, in one way or another to regularization theory. Regularization techniques have been widely adopted by machine learning researchers because of their capacity to deal with problems with small sample sizes and large numbers of variables, situations for which traditional statistical techniques are not well suited

Although regularization methods are often applied to deal with the sMRI image classification problems, it is rare to find in the literature an analysis of this problem ill-posedness, even though regularization techniques were first created to deal with the ill-posedness of practical problems. The present study provides a view of the classification of MRI images from the perspective of linear ill-posed problems. For these the degree of ill-posedness is linked to the conditioning and structure of singular values of the linear kernel matrices

Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public private partnership. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California – San Francisco. ADNI is the result of efforts of many coinvestigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 adults, ages 55 to 90, to participate in the research. These would include approximately 200 cognitively normal older individuals to be followed for 3 years, 400 people with MCI to be followed for 3 years and 200 people with early AD to be followed for 2 years. For up-to-date information, see

We used ADNI subject data collected from 50 clinic sites that had their own individual IRB approval. Study subjects gave written informed consent at the time of enrollment for data collection, storage and research and completed questionnaires approved by each participating sites' Institutional Review Board (IRB). The data were anonymized before being shared. We used baseline structural MRI data from 727 subjects. Of these, 205 were cognitively normal controls (CN), 171 had AD, and 351 had mild cognitive impairment (MCI) at baseline

Cognitive Status | Number | Mean Age (std) | Sex (M/F) | Mean MMSE (std) |

CN | 205 | 76.1 (5.0) | 112/93 | 29.1 (1.0) |

MCI | 351 | 75.1 (7.3) | 228/123 | 27.1 (1.8) |

AD | 171 | 75.5 (7.7) | 95/76 | 23.4 (2.0) |

We used baseline 1.5 T T1-weighted MRI data as described in the ADNI acquisition protocol

We evaluated the performance of logistic regression with elastic net regularization as implemented in the GLMNET library ^{th} sample or feature vector containing the GM voxels entering the analysis, ^{th} label (0 for cognitively normal participants, 1 for participants with Alzheimer's disease), _{1} and L_{2} penalties. The regularization parameter

This is the most common classification technique used to analyze sMRI data

Kloppel and colleagues used hard-margin (HM) and soft margin (SM) linear SVM to classify structural MRI images

SVM based on the representer theorem and the properties of the reproducing kernel Hilbert spaces

The linear regression classifier is estimated by solving the least squares problem:

Principal component analysis (PCA) is a powerful tool for dimension reduction that is often used in biomedical applications to deal with high-dimensional problems. After standardization of the data, we used the princomp MATLAB function to project the GM imaging data (matrix X) corresponding to CN and AD ADNI participants into principal component space. The performance of LRC, RLR and SVM are evaluated. To estimate the SVM we did not follow the kernel implementation described above, instead we used the LIBLINEAR library (version 1.8). It provides a fast implementation of the SVM with L_{2} regularization based on coordinate descent techniques

To evaluate classifier performance, in all cases we reserved a fixed dataset of 100 randomly selected participants (50 CN and 50 AD) for testing. Training datasets were randomly generated based on the remaining CN and AD participants. To estimate the optimal values of the regularization parameters both in voxel and PC spaces, we combined 10-fold cross-validations (CV) and grid search. For RLR we set the parameter

We computed overall classification accuracy to evaluate classifier performance as:

In the context of discrete problems ill-posedness often translates into ill-conditioning of the linear kernel matrix. A common approach for evaluating the degree of ill-posedness for these problems is the study of linear kernel matrix conditioning via the use of the singular value decomposition(SVD)

The evaluation was based on the GM images of the CN and AD participants that were vectorized and used as samples to estimate the different classification models. For each sample size and dimension, computations were repeated 100 times for different samples of CN and AD participants selected at random. The imaging data were always normalized by subtracting the mean and dividing by the standard deviation in a voxel-wise fashion.

We studied the impact of dimension and training dataset sample size in the voxel space on RLR, LRC and SM-SVM in terms of prediction performance (overall classification accuracy) when discriminating between GM images of AD and CN ADNI participants. The sample size was varied from 20 to 210 (20, 30, 40, 50, 60, 90, 120, 150, 180, 210) with balanced numbers of CN and AD participants. We varied the dimensions, given by the number of voxels included in the analysis, from

To study the conditioning of the linear kernel associated with classification of sMRI images using our ADNI dataset we estimated the singular values, rank and the condition numbers of all linear kernel matrices (K) across all samples sizes, dimensions and iterations tested in Experiment # 1.

We compared across 100 iterations the prediction performance of RLR, LRC, and SVM both in voxel and PC spaces for a very large dimension (D = 750 K voxels) and a large but fixed sample size (210). To select the optimal number of components for each iteration, we estimated classification accuracy for different sets of components generated by adding one component at a time, starting from the first component up to the total number of components. The maximum value of accuracy obtained across different sets of components was taken as the final result for a given iteration.

The results of the first experiment describing the impact of dimension and sample size in the voxel space on the RLR and SM-SVM performances are presented in

The dependence of classification accuracy on sample size is depicted for all sample sizes, dimensions and methods we tested. Each panel represents the results of the three methods for a different dimension (50 K, 150 K, 310 K and 750 K). Each curve represents the average performance of a specific method across 100 iterations; bars depict one standard deviation.

More detailed information is provided for four sample sizes (40, 60, 180 and 210) using box plots. Each panel shows the behavior of the three methods across the selected sample sizes for a fixed dimension. For all three methods classification accuracy increased as the sample size increased. The performance of the two regularized methods (RLR and SVM) was in general very similar across all situations. Surprisingly, LRC was often very competitive, although it clearly performed worse for larger samples and lower dimensions.

An alternative view of the results in

Most of the results of the second experiment are shown in

For fixed sample size, improvements of the linear kernels matrices conditioning were observed as the dimension increased. The effect is more apparent for large sample sizes, when the difference in kernel's conditioning across dimensions was greatest. The worse conditioned kernel matrices were observed for larger sample sizes and the smallest dimension (50 K).

This figure highlights that for a fixed dimension increases in sample sizes led to poorer conditioning of the kernels matrices (especially when the dimension was small) while at the same time as we observed in previous figures, it led to increases in classification accuracy.

The structure of singular values across selected sample sizes (40, 60, 180 and 210) and iterations is shown for two dimensions (50 K and 750 K). The median values of the singular values for the 100 iterations are plotted in logarithmic scale. The additional information brought by the increase of sample size was reflected by patterns of greater singular values when sample sizes were large, with the exception of the singular values located towards the “tails”, which caused poorer kernel matrices conditioning especially for the smaller dimension (50 K).

In general, the ill-conditioning effects we discuss are very mild since all the kernel matrices were full rank and the reciprocal of their condition numbers were very far from the machine's precision. However, effects due to differences in conditioning did exist, and they influenced the performance of these methods as we have shown. In

The main results of the Experiment # 3 are presented in

The voxel space performances of the three methods are compared with their PC counterparts. We can see that, in general, relatively little or no gains were achieved by the three methods in the PC space. While LRC and the SVM showed slight improvements in performance, RLR showed slight decreases in performance.

This study provides a better understanding of the impact of dimension on linear classification methods that directly operate on high-dimensional neuroimaging spaces defined by voxels. It provides further evidence to dispel the common belief that due to the Curse of Dimensionality (CoD), feature selection is always necessary to achieve good performances in high-dimensional problem. Our research produced strong evidence about the robustness of several linear classifiers to increased dimensionality (up to 750 K voxels) in the context of sMRI data classification.

Some researchers have pointed out that the effect of the CoD on a machine learning algorithm depends on how it deals with sample neighborhoods

It is striking how competitive the LRC performed (especially for the higher dimension studied here) with the much more sophisticated regularized methods, such as RLR and SVM. These results are possibly related to a phenomenon characteristic of high dimensional spaces called “data piling”, recently discovered and studied _{2} regularized logistic regression and linear SVM was reported, with much smaller dimensions, fixed sample size and no optimization of regularization parameters.

There are several limitations in our study. We have evaluated the impact of dimension in a specific manner: by changing the threshold of the DARTEL study customized GM template, we generated problems of different number of variables. Although other ways of evaluating the effect of dimension could be devised, the approach chosen here naturally appears in the problem. In a GM voxel space analysis, we would first introduce the voxels with higher probability of being located in the GM tissue since it has been reported by several groups

Finally, we are not arguing that feature selection is not necessary or useful. Nor have we claimed that the methods studied here are the best solution to the problem of sMRI brain images classification. Our point is that, in some problems as the one we studied the commonly assumed devastating effect of dimension on classifiers performance is not necessarily present. The huge differences in dimensions between the voxel and PC spaces were not reflected in large differences in performance of these three methods. There are different factors that can influence the impact of dimension on a machine learning algorithm as for example the ill-posedness of the problem defined by the nature of the data. Therefore, the usefulness of feature selection in general will depend on the nature of the specific problem, employed machine learning algorithm, signal to noise ratio (e.g., sample size, etc) of the data, and other factors.

The performances of a hard margin (HM)-SVM and SM-SVM were compared across sample sizes and two dimensions 50 K and 750 K using a similar setup as in Experiment # 1. The HM-SVM shows, in general, a similar behavior to its regularized counterpart, although in situations of worse conditioning of the kernel matrices in this study (50 K and larger samples sizes) it underperforms in a similar fashion as the LRC does. The HM-SVM was implemented by setting the parameter C to 10^{6}.

(TIF)

Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete list of ADNI investigators can be found at:

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Amorfix Life Sciences Ltd.; AstraZeneca; Bayer HealthCare; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (

We wanted also to thank the anonymous reviewers for their thoughtful comments that helped to improve this work.