Classification of early-MCI patients from healthy controls using evolutionary optimization of graph measures of resting-state fMRI, for the Alzheimer’s disease neuroimaging initiative

Identifying individuals with early mild cognitive impairment (EMCI) can be an effective strategy for early diagnosis and for delaying the progression of Alzheimer’s disease (AD). Many approaches have been devised to discriminate those with EMCI from healthy control (HC) individuals. Selection of the most effective parameters has been one of the challenging aspects of these approaches. In this study we suggest an optimization method based on five evolutionary algorithms that can be used in optimization of neuroimaging data with a large number of parameters. Resting-state functional magnetic resonance imaging (rs-fMRI) measures, which capture functional connectivity, have been shown to be useful in prediction of cognitive decline. Analysis of functional connectivity data using graph measures is a common practice that results in a great number of parameters. Using graph measures we calculated 1155 parameters from the functional connectivity data of HC (n = 72) and EMCI (n = 68) participants extracted from the publicly available database of the Alzheimer’s disease neuroimaging initiative (ADNI). These parameters were fed into the evolutionary algorithms to select a subset of parameters for classification of the data into two categories of EMCI and HC using a two-layer artificial neural network. The algorithms achieved classification accuracies of up to 94.55%, which is extremely high considering the single-modality input and the small number of participants. These results highlight the potential application of rs-fMRI and the efficiency of such optimization methods in classification of images into HC and EMCI. This is of particular importance considering that MRI images of EMCI individuals cannot be easily identified by experts.


Introduction
Alzheimer's disease (AD) is the most common type of dementia, with around 50 million patients worldwide [1,2]. AD is usually preceded by a period of mild cognitive impairment (MCI) [3,4]. Identifying the subjects with MCI could be an effective strategy for early diagnosis and for delaying the progression of AD towards irreversible brain damage [5][6][7]. While researchers have been fairly successful in diagnosing AD, they have been significantly less successful in diagnosing MCI [8][9][10][11]. In particular, detection of the early stages of MCI (EMCI) has proven to be very challenging [12][13][14]. Therefore, in this study we propose a novel method based on evolutionary algorithms to select a subset of graph features calculated from functional connectivity data to discriminate between healthy participants (HC) and EMCI.
It has been shown that the brain goes through many functional, structural and physiological changes prior to any obvious behavioral symptoms in AD [15][16][17]. Therefore, many approaches have been devised based on biomarkers to distinguish between HC, different stages of MCI, and AD [18][19][20]. For example, parcellation of structural magnetic resonance imaging (MRI) data has been used in many studies, as brain structure changes greatly in AD [21][22][23][24]. Further, in two studies, we showed that T1-weighted MRI (structural MRI; sMRI) can be used in classification of AD and MCI. Indeed, the majority of early studies looking at classification of AD and HC were done on sMRI [22]. This is mostly due to the costs and accessibility of sMRI data [23].
While structural neuroimaging has shown some success in early detection of AD, functional neuroimaging has proven to be a stronger candidate [25][26][27]. Functional MRI (fMRI) allows for the examination of brain functioning while a patient is performing a cognitive task. This technique is especially well suited to identifying changes in brain functioning before significant impairments can be detected on standard neuropsychological tests, and as such is sensitive to early identification of the disease processes [28,29]. While fMRI requires participants to perform a task, resting-state fMRI (rs-fMRI) is capable of measuring the spontaneous fluctuations of brain activity without any task, hence it is less sensitive to individual cognitive abilities [30][31][32].
One important feature of rs-fMRI is the ability to measure functional connectivity changes [33,34], which has been shown to be a prevalent change in AD [35][36][37][38]. Furthermore, it is shown that the increased severity of cognitive impairment is associated with increasing alteration in connectivity patterns, suggesting that disruptions in functional connectivity may contribute to cognitive dysfunction and may represent a potential biomarker of impaired cognitive ability in MCI. In particular, research has highlighted that longitudinal alterations of functional connectivity are more profound in earlier stages as opposed to later stages of the disease [39]. Therefore, analysis of functional connectivity can provide an excellent opportunity in identification of early states of AD.
As functional connectivity analysis inherently relies on networks of activity, researchers have used graph theory measures to investigate the global, as well as local, characteristics of different brain areas [40-43]. This method has been used successfully in a wide range of applications in both healthy participants and patients. Graph theory analysis of rs-fMRI data, however, leads to a large number of parameters. Therefore, to reduce computational complexity, it is essential to select an optimal subset of features that can lead to high discrimination accuracy [60,61]. Feature selection is particularly complicated due to the non-linear nature of classification methods: more parameters do not necessarily lead to better performance, and the parameters also depend on one another [62,63]. Therefore, it is extremely important to utilize a suitable optimization method that can deal with nonlinear high-dimensional search spaces.
Evolutionary algorithms (EA) are biologically-inspired algorithms that are extremely effective in optimization problems with large search spaces [64][65][66]. These methods, in contrast with many other search methods such as complete search, greedy search, heuristic search and random search [67,68], do not suffer from stagnation in local optima and/or high computational cost [69,70]. Feature selection has been used to improve the quality of the feature set in many machine learning tasks, such as classification, clustering and time-series prediction [71]. Classification and time-series prediction are particularly relevant to many neurodegenerative diseases: classification can be used in identification of those with brain damage [72,73] and time-series prediction can be used in estimation of disease progression [74,75].
EA has been used in characterization and diagnosis of AD [76][77][78]. Such methods have achieved reasonably high accuracy in classification of AD and HC (70-95%). They, however, have been unsuccessful in classification of the MCI patients [79]. Therefore, in this study, we devised a method that achieves higher accuracy in the classification of HC and EMCI participants compared to the past-published research. We used MRI and rs-fMRI data of a group of healthy participants and those with EMCI. We applied graph theory to extract a collection of 1155 parameters. This data is then given to five different EA methods to select an optimum subset of parameters. These selected parameters are subsequently given to an artificial neural network to classify the data into two groups of HC and EMCI. We aimed at identifying the most suitable method of optimization based on accuracy and training time, as well as identifying the most informative parameters.

Participants
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD).
Data for 140 participants were extracted from the ADNI [80][81][82]. See Table 1 for the details of the data. EMCI participants had no other neurodegenerative diseases except MCI. The EMCI participants were recruited with memory function approximately 1.0 SD below expected education-adjusted norms [83]. HC subjects had no history of cognitive impairment, head injury, major psychiatric disease, or stroke.
application of denoising methods for reduction of physiological effects and other sources of noise. The processing stage consisted of extraction of functional connectivity and graph theory measures. In this stage, through two pipelines, a collection of 1155 parameters is extracted (see below) [84,85]. These parameters are then given to one of the dimension reduction methods (five EA and one statistical method) to select a subset of features. The selected features are finally given to an artificial neural network to classify the data into two categories of HC and EMCI. The classification was performed via a 90/10 split; 90% of the data was used for training and 10% of the data was used for validation. See Fig 1 for the summary of the procedure of the method.

Data acquisition and processing
Brain structural sMRI data with 256×256×170 voxels and 1×1×1 mm³ voxel size were extracted for all subjects. MRI data preprocessing steps consisted of non-uniformity correction, segmentation into grey matter, white matter and cerebrospinal fluid (CSF) and spatial normalization to MNI space.

Fig 1. Summary of the procedure of the method. Preprocessing, parcellation of brain areas (132 regions based on the AAL and Harvard-Oxford atlases) and extraction of the functional connectivity (8 network parameters with a total of 32 nodes), as well as the 7 graph parameters, are done using the CONN toolbox [84]. Subsequently the global network is calculated based on the network parameters. The 1155 ([132 brain regions + 32 nodes of brain networks + 1 global network] × 7 graph parameters) extracted parameters are given to one of the optimization methods to select the subset of parameters that leads to the best classification. Optimization methods consisted of five evolutionary algorithms (boxes with grey shading) and one statistical algorithm. The outputs of these methods are given to an artificial neural network (ANN) with two hidden layers to classify the data into HC and EMCI. AAL: Automated anatomical labeling; GA: Genetic algorithm; NSGA-II: Nondominated sorting genetic algorithm II; ACO: Ant colony optimization; SA: Simulated annealing; PSO: Particle swarm optimization; seven graph features: Degree centrality, betweenness centrality, path length, clustering coefficient, local efficiency, cost and global efficiency.

The CONN toolbox [84,85] is used to process the sMRI and rs-fMRI data. The output of this toolbox is 1155 values derived from: (a) 132 distinct brain areas according to the Automated Anatomical Labeling (AAL) and Harvard-Oxford atlases, (b) eight brain networks containing 32 nodes and (c) a global network parameter that is the average of seven graph parameters [86][87][88]. For each of these 165 elements, seven graph parameters are computed (165 × 7 = 1155), see below. See supplementary data for details of these parameters. The sMRI images were used to register the functional images and improve the analysis of the rs-fMRI data.

Functional connectivity
Functional connectivity, also called "resting state" connectivity, is a measure of the temporal correlations among the blood-oxygen-level-dependent (BOLD) signal fluctuations in different brain areas [89][90][91]. The functional connectivity matrix is the correlation, covariance, or mutual information between the fMRI time series of every two brain regions, which is stored in an n×n matrix for each participant, where n is the number of brain regions obtained by atlas parcellation [91]. To extract functional connectivity between different brain areas we used Pearson correlation coefficients [84,92]. In this formulation, S is the BOLD time series at each voxel (for simplicity, all time series are assumed to be centered to zero mean), R is the average BOLD time series within an ROI, r is the spatial map of Pearson correlation coefficients, and Z is the seed-based correlation (SBC) map of Fisher-transformed correlation coefficients for this ROI [93].
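The symbol definitions above are consistent with the standard seed-based correlation formulation. As a hedged sketch only, assuming the usual Pearson correlation between zero-mean time series followed by the Fisher (inverse hyperbolic tangent) transform, and with the voxel index x and time index t introduced here for illustration, the two quantities could be written as:

```latex
r(x) = \frac{\int S(x,t)\, R(t)\, dt}
            {\left( \int R^{2}(t)\, dt \; \int S^{2}(x,t)\, dt \right)^{1/2}},
\qquad
Z(x) = \tanh^{-1}\!\big( r(x) \big)
```

The Fisher transform maps correlation values from the bounded interval (−1, 1) onto the whole real line, which makes the values better suited to subsequent thresholding and group-level statistics.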

Graph parameters
We used the graph theory technique to study topological features of functional connectivity graphs across multiple regions of the brain [86][87][88][89][90][91][92][93][94]. Graph nodes represented brain regions and edges represented interregional resting-state functional connectivity. The graph measurements in all of the ROIs are defined using nodes = ROIs, and edges = suprathreshold connections. For each subject, the graph adjacency matrix A is computed by thresholding the associated ROI-to-ROI Correlation (RRC) matrix r by an absolute (e.g., z>0.5) or relative (e.g., highest 10%) threshold. Then, from the resulting graphs, a number of measurements can be computed that address topological properties of each ROI within the graph as well as of the entire network of ROIs. The adjacency matrix is employed for estimating common features of graphs including (1) degree centrality (the number of edges that connect a node to the rest of the network), (2) betweenness centrality (the proportion of shortest paths between all node pairs in the network that pass through a given index node), (3) average path length (the average distance from each node to any other node), (4) clustering coefficient (the proportion of ROIs that have connectivity with a particular ROI that also have connectivity with each other), (5) cost (the ratio of the existing number of edges to the number of all possible edges in the network), (6) local efficiency (the network ability in transmitting information at the local level), (7) global efficiency (the average inverse shortest path length in the network; this parameter is inversely related to the path length) [95].
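The thresholding step and some of the measures listed above can be illustrated with a small standard-library Python sketch. It is not the CONN toolbox implementation: the function names, the 4-node example matrix and the choice of an absolute threshold of 0.5 are assumptions made for illustration, and only degree, cost, clustering coefficient and global efficiency are covered.

```python
from collections import deque

def threshold_adjacency(z, thr=0.5):
    """Binary adjacency matrix: edge i-j when z[i][j] > thr (no self-loops)."""
    n = len(z)
    return [[1 if i != j and z[i][j] > thr else 0 for j in range(n)]
            for i in range(n)]

def degree(adj):
    """Degree centrality: number of edges attached to each node."""
    return [sum(row) for row in adj]

def cost(adj):
    """Ratio of existing edges to all possible edges."""
    n = len(adj)
    edges = sum(sum(row) for row in adj) / 2
    return edges / (n * (n - 1) / 2)

def hop_distances(adj, src):
    """Breadth-first search distances from src; unreachable nodes are omitted."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(adj[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def global_efficiency(adj):
    """Average inverse shortest path length over all ordered node pairs."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        d = hop_distances(adj, i)
        total += sum(1.0 / d[j] for j in d if j != i)
    return total / (n * (n - 1))

def clustering_coefficient(adj, i):
    """Fraction of pairs of neighbours of node i that are themselves connected."""
    nbrs = [j for j, c in enumerate(adj[i]) if c]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(adj[a][b] for x, a in enumerate(nbrs) for b in nbrs[x + 1:])
    return links / (k * (k - 1) / 2)

# Hypothetical Fisher-transformed correlation matrix for four ROIs.
z_example = [[0.0, 0.8, 0.7, 0.1],
             [0.8, 0.0, 0.6, 0.2],
             [0.7, 0.6, 0.0, 0.9],
             [0.1, 0.2, 0.9, 0.0]]
adj_example = threshold_adjacency(z_example, thr=0.5)
```

With this example matrix, four of the six possible edges survive the threshold, so the cost is 4/6.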

Dimension reduction methods
We used five EAs to select the most efficient subset of features. These algorithms are as follows:
Genetic algorithm (GA): GA is one of the most advanced algorithms for feature selection [96]. This algorithm is based on the mechanics of natural genetics and biological evolution for finding the optimum solution. It consists of five steps: selection of initial population, evaluation of fitness function, pseudo-random selection, crossover, and mutation [97]. For further information refer to the supplementary Methods section. Single point, double point, and uniform crossover methods are used to generate new individuals. In this study we used 0.3 and 0.1 as mutation percentage and mutation rate, respectively, with 20 members per population, a crossover percentage of 14 and a selection pressure of 8 [74,98].
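The five GA steps can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the fitness function is a toy stand-in for classifier accuracy, tournament selection replaces the pseudo-random selection with selection pressure described above, only single-point crossover is shown, and all function names are hypothetical. The population size (20) and mutation rate (0.1) follow the values quoted in the text.

```python
import random

def toy_fitness(mask, informative=frozenset({1, 4, 7})):
    """Hypothetical stand-in for classifier accuracy: rewards selecting the
    informative features and lightly penalises subset size."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & informative) - 0.1 * len(chosen)

def tournament(pop, fitness, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)

def single_point_crossover(a, b, rng):
    """Swap the tails of two parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]

def ga_select(n_features, fitness, pop_size=20, generations=50,
              mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population of binary feature masks.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)          # Step 2: fitness evaluation
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)   # Step 3: selection
            p2 = tournament(pop, fitness, rng)
            c1, c2 = single_point_crossover(p1, p2, rng)  # Step 4: crossover
            children.append(mutate(c1, mutation_rate, rng))  # Step 5: mutation
            children.append(mutate(c2, mutation_rate, rng))
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)  # keep the best mask seen so far
    return best
```

Calling `ga_select(10, toy_fitness)` returns a binary mask of length 10; in a real pipeline the toy fitness would be replaced by the cross-validated accuracy of the classifier trained on the selected features.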
Nondominated sorting genetic algorithm II (NSGA-II): NSGA-II is a method for solving multiobjective optimization problems that captures a number of solutions simultaneously [99]. All the operators in GA are also used here. NSGA-II uses binary features to fill a mating pool. Nondomination and crowding distance are used to sort the new members. For further information refer to the supplementary Methods section. In this study the mutation percentage and mutation rate were set to 0.4 and 0.1, respectively; population size was 25, and crossover percentage was 14.
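The nondomination sorting mentioned above can be illustrated as follows. This sketch assumes two objectives that are both minimised (for example, classification error and number of selected features); the objectives, the example values and the function names are illustrative assumptions, and the crowding-distance step is omitted.

```python
def dominates(a, b):
    """a dominates b when it is no worse on every objective and strictly
    better on at least one (all objectives are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated_fronts(points):
    """Split candidate solutions into successive Pareto fronts.
    Returns lists of indices into `points`, best front first."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Hypothetical candidates as (classification error, number of features).
candidates = [(0.10, 30), (0.08, 50), (0.12, 10), (0.10, 40), (0.15, 60)]
fronts = non_dominated_fronts(candidates)
```

In this example the first front contains the three candidates that trade error against subset size without being beaten on both at once; the remaining two candidates fall into later fronts.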
Ant colony optimization algorithm (ACO): ACO is a metaheuristic optimization method based on the behavior of ants [100]. This algorithm consists of four steps: initialization, creation of ant solutions (a set of ants build a solution to the problem being solved using pheromone values and other information), local search (improvement of the solutions created by the ants), and global pheromone update (update of the pheromone variables based on the search actions followed by the ants) [101]. ACO requires a problem to be described as a graph: nodes represent features and edges indicate which features should be selected for the next generation. In feature selection, ACO tries to find the best solutions using prior information from previous iterations. The search for the optimal feature subset consists of an ant traveling through the graph with the minimum number of nodes required to satisfy the stopping criterion [102]. For further information refer to the supplementary Methods section. We used 10, 0.05, 1, 1 and 1 for the number of ants, evaporation rate, initial weight, exponential weight, and heuristic weight, respectively.
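A simplified view of the construct-then-update loop is sketched below. It assumes a fixed subset size, a uniform heuristic, a unit pheromone deposit on the best subset found so far and no separate local search step; these choices, the scoring function and all names are illustrative assumptions rather than the configuration used in the study. The number of ants (10), evaporation rate (0.05) and unit weights follow the values quoted above.

```python
import random

def aco_select(n_features, subset_size, score, n_ants=10, iterations=30,
               evaporation=0.05, alpha=1.0, beta=1.0, heuristic=None, seed=0):
    rng = random.Random(seed)
    pheromone = [1.0] * n_features                 # initial weight
    heuristic = heuristic or [1.0] * n_features    # uniform prior desirability
    best_subset, best_score = None, float("-inf")
    for _ in range(iterations):
        for _ in range(n_ants):
            # Each ant builds a subset, sampling features with probability
            # proportional to pheromone**alpha * heuristic**beta.
            candidates = list(range(n_features))
            subset = []
            for _ in range(subset_size):
                weights = [pheromone[f] ** alpha * heuristic[f] ** beta
                           for f in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(pick)
                candidates.remove(pick)
            s = score(subset)
            if s > best_score:
                best_subset, best_score = sorted(subset), s
        # Global pheromone update: evaporate, then reinforce the best subset.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += 1.0
    return best_subset, best_score
```

Because reinforced features are sampled more often in later iterations, the search gradually concentrates on subsets that resemble the best one found so far.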
Simulated annealing (SA): SA is a stochastic search algorithm, which is particularly useful in large-scale linear regression models [103]. In this algorithm, the new feature subset is selected entirely at random based on the current state. After an adequate number of iterations, a dataset can be created to quantify the difference in performance with and without each predictor [104,105]. For further information refer to the supplementary Methods section. We set the initial temperature and temperature reduction rate to 10 and 0.99, respectively.
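The acceptance rule and cooling schedule can be sketched as follows, using the initial temperature (10) and reduction rate (0.99) quoted above. The single-bit-flip neighbourhood, the Metropolis acceptance criterion, the iteration count and the function names are assumptions chosen for illustration; the scoring function stands in for classifier accuracy.

```python
import math
import random

def sa_select(n_features, score, iterations=500, t0=10.0, cooling=0.99, seed=0):
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temperature = t0
    for _ in range(iterations):
        # Random neighbour: flip one feature in or out of the subset.
        neighbour = current[:]
        flip = rng.randrange(n_features)
        neighbour[flip] = 1 - neighbour[flip]
        s = score(neighbour)
        delta = s - current_score
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = neighbour, s
            if s > best_score:
                best, best_score = neighbour[:], s
        temperature *= cooling
    return best, best_score
```

At high temperature almost any move is accepted, which lets the search leave poor regions; as the temperature decays the procedure behaves increasingly like a greedy local search.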
Particle swarm optimization (PSO): PSO is a stochastic optimization method based on the behavior of swarming animals such as birds and fish. Each member finds optimal regions of the search space by coordinating with other members in the population. In this method, each possible solution is represented as a particle with a certain position and velocity moving through the search space [106][107][108]. Particles move based on a cognitive parameter (defining the degree of acceleration towards the particle's individual local best position) and a global parameter (defining the acceleration towards the global best position). The overall rate of change is defined by an inertia parameter. For further information refer to the supplementary Methods section. In our simulations we used 20 as the swarm size; the cognitive and social parameters were set to 1.5 and the inertia to 0.72.
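The velocity and position updates described above can be sketched with the quoted parameter values (swarm size 20, cognitive and social parameters 1.5, inertia 0.72). The sketch minimises a generic continuous objective; how positions are mapped onto binary feature masks in the study is not specified here, so that step is left out. The function name, the initial range and the iteration count are assumptions.

```python
import random

def pso_minimise(objective, dim, swarm_size=20, iterations=100,
                 inertia=0.72, cognitive=1.5, social=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                    # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best position
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + social * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, `pso_minimise(lambda x: sum(v * v for v in x), 3)` searches for the minimum of a three-dimensional sphere function.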
Statistical approach: To create a baseline against which to compare the dimension reduction methods based on evolutionary algorithms, we also used a statistical approach that selects features based on the statistical difference between the two groups. We compared the 1155 parameters between the two groups using independent two-sample t-tests. Subsequently we selected the parameters based on their sorted p values.
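A minimal sketch of this baseline is given below. It assumes the pooled-variance (Student) form of the two-sample t-test and ranks features by the absolute t statistic; because every feature is compared over the same two groups and therefore has the same degrees of freedom, this ordering matches the ordering by p value. The function names and example data are hypothetical, and features with zero variance in both groups are not handled.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for samples a and b."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def rank_features(hc, emci):
    """hc, emci: lists of per-participant feature vectors.
    Returns feature indices, most discriminative first."""
    n_features = len(hc[0])
    scores = []
    for f in range(n_features):
        t = t_statistic([row[f] for row in hc], [row[f] for row in emci])
        scores.append((abs(t), f))
    return [f for _, f in sorted(scores, key=lambda s: -s[0])]

# Hypothetical data: feature 0 separates the groups, feature 1 does not.
hc_example = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [1.1, 6.0]]
emci_example = [[2.0, 4.5], [2.1, 3.5], [1.9, 5.5], [2.2, 4.0]]
ranking = rank_features(hc_example, emci_example)
```

Taking the first k entries of the ranking gives the k-feature subset that this baseline would pass to the classifier.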

Classification method
For classification of EMCI and HC we used a multi-layer perceptron artificial neural network (ANN) with two fully-connected hidden layers with 10 nodes each. Classification was performed via 10-fold cross-validation. We used the Levenberg-Marquardt back propagation (LMBP) algorithm for training [109][110][111] and mean square error as a measure of performance. The LMBP has three steps: (1) propagate the input forward through the network; (2) propagate the sensitivities backward through the network from the last layer to the first layer; and finally (3) update the weights and biases using Newton's computational method [109]. In the LMBP algorithm the performance index F(x) is formulated in terms of e, the vector of network errors, and x, the vector of network weights and biases. The network weights are updated using the Hessian matrix H and its gradient G, both of which are calculated from the Jacobian matrix J. The Jacobian matrix is in turn calculated from a^{m−1}, the output of the (m−1)th layer of the network, and S^m, the sensitivity of F(x) to changes in the network input element in the mth layer; S^m is calculated from w^{m+1}, the neuron weight at the (m+1)th layer, and n, the network input [109].
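For orientation, the quantities named above can be written in the form commonly used in textbook presentations of the Levenberg-Marquardt algorithm. This is a hedged sketch of the standard formulation, not necessarily the exact notation of [109]: the damping parameter \mu, the iteration index k, the element indices i and j, and the derivative matrix \dot{F}^{m} of the layer transfer functions are symbols introduced here as assumptions.

```latex
\begin{aligned}
F(x) &= \sum_{i} e_i^{2} = e^{T} e \\
x_{k+1} &= x_k - \left[ J^{T} J + \mu I \right]^{-1} J^{T} e \\
H &\approx J^{T} J, \qquad G = J^{T} e \\
\frac{\partial e}{\partial w^{m}_{i,j}} &= S^{m}_{i}\, a^{m-1}_{j} \\
S^{m} &= \dot{F}^{m}\!\left( n^{m} \right) \left( W^{m+1} \right)^{T} S^{m+1}
\end{aligned}
```

When \mu is large the update approaches a small gradient-descent step, and when \mu is small it approaches the Gauss-Newton step; adapting \mu between iterations is what lets the method interpolate between the two regimes.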

Results
The preprocessing and processing of the data was successful. We extracted 1155 graph parameters per participant (see S1-S11 Figs). This data was used for the data optimization step. Using the five EA optimization methods and the statistical method, we investigated the performance of the classification for different numbers of parameters per subset. Fig 2 shows the performance of these methods for 100 subsets of parameters with 1 to 100 parameters. These plots are created based on 200 repetitions of the EA algorithms. To investigate the performance of the algorithms with more repetitions, we ran the same algorithms with 500 repetitions. These simulations showed no major improvement from increased repetition (maximum 0.84% improvement; see S11 Fig).
A threshold of 90% was chosen as the desired performance accuracy. The statistical method consistently performed below this threshold. The five EA methods achieved this performance with varying numbers of parameters. Fig 3 shows the accuracy percentage and the optimization speed of the five EA methods.
There is a growing body of literature showing gender differences. It has been shown that women are more likely to suffer from AD. Therefore, to investigate whether our analysis method performs better on a particular gender, we split the data into two groups of female and male participants. Our analysis showed that there is no meaningful difference between the two groups (see S2 Table).
To investigate whether increasing number of parameters would increase performance, we performed similar simulations with maximum 500 parameters in each subset. This analysis showed that the performance of the optimization methods plateaus without significant increase from 100 parameters (Fig 4). This figure shows that performance of the optimization methods was between 92.55-93.35% and 94.27-94.55% for filtered and absolute accuracy, respectively. These accuracy percentages are significantly higher than 81.97% and 87.72% for filtered and absolute accuracy in the statistical classification condition.
To investigate the contribution of different parameters in the optimization of classification we looked at the distribution of parameters in the 100 subsets calculated above (Fig 5).
To identify the parameters that are most predominantly involved in classification of HC and EMCI, we extracted the list of the five most indicative brain regions and networks (Table 2). These are selected based on the total number of times that they appeared in the 100 simulations using the five EAs.

Discussions
Using the CONN toolbox, we extracted 1155 graph parameters from rs-fMRI data. The optimization methods showed superior performance over statistical analysis (average 20.93% superiority). The performance of the EA algorithms did not differ greatly (range 92.55-93.35% and 94.27-94.55% for filtered and absolute accuracy, respectively), with PSO performing the best (mean 0.96% superior performance) and SA performing the worst (mean 1.07% inferior performance) (Fig 2). The minimum number of required parameters to guarantee at least 90% accuracy differed quite greatly across the methods (PSO and SA requiring 7 and 49 parameters, respectively). The processing time to achieve at least 90% accuracy also differed across the EA methods (SA and NSGA-II taking 5.1s and 22.4s per optimization, respectively) (Fig 3). Increasing the number of parameters per subset did not greatly increase the accuracy of the methods (Fig 4).
Classification of data into AD and HC has been investigated extensively. Many methods have been developed using different modalities of biomarkers. Some of these studies achieved accuracies greater than 90% [112]. Classification of earlier stages of AD, however, has been more challenging; only a handful of studies have achieved accuracy higher than 90% (Table 3). The majority of these studies implemented convolutional and deep neural networks that require extended training and testing durations with many input data. [114] (see also [115][116][117][118]). Our method achieved an accuracy of 94.55%. To the best of our knowledge, among all the studies published to date, this accuracy level is the second highest accuracy after Wang et al (2019) [114].

PLOS ONE
Classification of early-MCI patients using rs-fMRI
Research has shown that having a combination of information from different modalities supports higher classification accuracies. For example, one study showed that a combination of PET, MRI and neuropsychological test scores (NTS) can improve performance by more than 20% as compared to only PET or MRI [118]. In another study, Kang et al. (2020) showed that a combination of diffusion tensor imaging (DTI) and MRI can improve accuracy by more than 20% as compared to DTI and MRI alone [147]. Our analysis, while achieving superior accuracy compared to a majority of the prior methods, was based on a single MRI biomarker, which has a lower computational complexity than multi-modality data.
Interpretability of the selected features is one advantage of the application of evolutionary algorithms as the basis of the optimization algorithm. This is in contrast with algorithms based on CNN or deep neural networks (DNN) that are mostly considered as black boxes [149]. Although research has shown some progress in better understanding the link between the features used by the system and the prediction itself in CNN and DNN, such methods remain difficult to verify [150,151]. This has reduced trust in the internal functionality and reliability of such systems in clinical settings [152]. Our suggested method clearly selects features based on activity of distinct brain areas, which are easy to interpret and understand. This can inform future research by bringing the focus to brain areas and the links between brain areas that are more affected by mild cognitive impairment. Our analysis showed that the dorsal attention network is altered in EMCI, confirming past literature [154][155][156]. The dorsal attention network, together with the ventral attention network, forms the human attention system [157]. The dorsal attention network employs dorsal fronto-parietal areas, including intraparietal sulcus (IPS) and frontal eye fields (FEF). It is involved in mediation of goal-directed processes and selection of stimuli and responses. Specifically, our data highlighted the role of the FEF in the dorsal attention network. This is in line with past literature showing the role of FEF in cognitive decline [158]. Our data also revealed the importance of superior frontal gyrus (SFG) in cognitive decline [159,160]. SFG is thought to contribute to higher cognitive functions and particularly to working memory (WM) [161]. Additionally, SFG interconnects multiple brain areas that are involved in a diverse range of cognitive tasks such as cognitive control and motor behavior [162].

Table 3. Summary of the studies aiming at categorization of healthy (HC) and mild cognitive impairment (MCI) using different biomarkers and classification methods. Only best performance of each study is reported for each group of participants and classification method. Further details of the following studies are in S1
In terms of graph parameters, our results showed the importance of local efficiency, betweenness centrality and degree centrality in classification of EMCI and HC. Local efficiency is a parameter for the transfer of information in a part of the network. This parameter indicates the efficiency between two nodes and represents the efficiency in exchange of information through a network edge [87,163]. Reduction of this parameter has been linked with cognitive decline in past literature [164]. Betweenness centrality for any given node (vertex) measures the number of shortest paths between pairs of other nodes that pass through this node, reflecting how efficiently the network exchanges information at the global level. Betweenness centrality is high for nodes that are located on many short paths in the network and low for nodes that do not participate in many short paths [164]. Finally, degree centrality reflects the number of instantaneous functional connections between a region and the rest of the brain within the entire connectivity matrix of the brain. It can assess how much a node influences the entire brain and integrates information across functionally segregated brain regions [165] (see also [166]). Our data showed that changes in these parameters can effectively contribute to classification of early-MCI patients from healthy controls.
We implemented five of the most common evolutionary algorithms. They showed similar overall optimization performance ranging between 92.55-93.35% and 94.27-94.55% for filtered and absolute accuracy, respectively. They, however, differed in optimization curve, optimization time and diversity of the selected features. PSO could guarantee a 90% accuracy with only 7 features. SA on the other hand required 49 features to guarantee a 90% accuracy. Although SA required more features to guarantee a 90% accuracy, it was the fastest optimization algorithm with only 5.1s for 49 features. NSGA-II on the other hand, required 22.4s to guarantee a 90% accuracy. These show the diversity of the algorithms and their suitability in different applications requiring highest accuracy, least number of features or fastest optimization time [71,76,167].
One distinct characteristic of GA and NSGA-II was the more focused search amongst features as compared to the other methods. GA and NSGA-II selected 222 and 224 distinct features in the first 100 parameter sets, respectively, while the other methods covered almost the whole collection of features, covering more than 97.6%. Notably GA and NSGA-II showed "curse of dimensionality" (also known as "peaking phenomenon") with optimal number of features around 50 parameters [168][169][170][171]. Therefore, perhaps the features selected by GA and NSGA-II are more indicative of distinct characteristics of the differences between HC and EMCI.
Our analysis was conducted on a sample of 140 participants. This number of data points is relatively small in the context of classification using ANN and CNN. For instance, Wang et al (2019) [114] used 593 samples. Having additional samples can provide more reliable results. Therefore, future research should aim to explore a larger dataset.
In this study, we proposed a method for classification of the EMCI and HC groups using graph theory. These results highlight the potential application of graph analysis of functional connectivity and the efficiency of evolutionary algorithms in combination with a simple perceptron ANN in the classification of images into HC and EMCI. We proposed a fully automatic procedure for prediction of early stages of AD using rs-fMRI data features. This is of particular importance considering that MRI images of EMCI individuals cannot be easily identified by experts. Further development of such methods can prove to be a powerful tool in the early diagnosis of AD.

63. Chu C., Hsu A. L., Chou K. H., Bandettini P. & Lin C. P. Does feature selection improve classification accuracy? Impact of sample size and feature selection on classification using anatomical magnetic