Hybrid multivariate pattern analysis combined with extreme learning machine for Alzheimer’s dementia diagnosis using multi-measure rs-fMRI spatial patterns

Background. Early diagnosis of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) is essential for timely treatment. Machine learning and multivariate pattern analysis (MVPA) for the diagnosis of brain disorders are attracting increasing attention in the neuroimaging community. In this paper, we propose a voxel-wise discriminative framework applied to multi-measure resting-state fMRI (rs-fMRI) that integrates hybrid MVPA and an extreme learning machine (ELM) for the automated discrimination of AD and MCI from the cognitively normal (CN) state.

Materials and methods. We used two rs-fMRI cohorts: the public Alzheimer’s Disease Neuroimaging Initiative database (ADNI2) and an in-house Alzheimer’s disease cohort from South Korea, both including individuals with AD, MCI, and normal controls. After extracting three-dimensional (3-D) patterns measuring regional coherence and functional connectivity during the resting state, we performed univariate statistical t-tests to generate a 3-D mask that retained only voxels showing significant changes. Starting from these initial univariate features, we enhanced the discriminative patterns with MVPA feature reduction using support vector machine-recursive feature elimination (SVM-RFE) and the least absolute shrinkage and selection operator (LASSO), each combined with the univariate t-test. Classification was performed by an ELM, and its performance was compared with that of linear and nonlinear (radial basis function) SVMs.

Results. The maximal accuracies achieved by the method in the ADNI2 cohort were 98.86% (p<0.001) and 98.57% (p<0.001) for AD vs. CN and MCI vs. CN, respectively. In the in-house cohort, the corresponding accuracies were 98.70% (p<0.001) and 94.16% (p<0.001).

Conclusion. From a clinical perspective, combining an extreme learning machine with hybrid MVPA applied to concatenations of multiple rs-fMRI biomarkers could assist clinicians in AD and MCI diagnosis.


Introduction
Alzheimer's disease (AD) is the most common neurodegenerative disease and accounts for 60% to 70% of dementia cases in aging societies. It is characterized by cognitive decline and short-term memory loss [1,2]. Mild cognitive impairment (MCI) is regarded as the prodromal stage of AD, and subjects with MCI are at high risk of developing AD [3]. Because AD and MCI are neurodegenerative conditions that progressively impair memory, the development of early diagnostic tools is of clear importance.
In recent years, resting-state functional magnetic resonance imaging (rs-fMRI) has been shown to be a powerful tool for analysing spontaneous blood-oxygen-level-dependent (BOLD) signal fluctuations to map neural activity associated with a variety of brain functions. To map the brain areas involved in a given cognitive function, the BOLD signal is analysed at the level of the individual voxel [4]. Statistical analysis is then performed on all voxels to identify regions whose BOLD signal shows significant effects. This approach, referred to as univariate t-test analysis, is performed independently on each voxel and has been used in neuroimaging research for decades [5,6]. However, it can only reveal differences between group averages and is not sufficient to diagnose individual subjects. Therefore, a machine learning (ML) approach known as multivariate pattern analysis (MVPA) has recently shown promise for classifying individual subjects from neuroimaging scans [7,8]. Multivariate methods such as support vector machine-recursive feature elimination (SVM-RFE) and the least absolute shrinkage and selection operator (LASSO) exploit the mutual relationships between multiple voxels and spatial patterns. Combining the univariate t-test with multivariate MVPA approaches is therefore expected to improve prediction performance compared with either approach alone.
Previous fMRI studies have indicated that the pathophysiology of AD/MCI can be associated with statistical changes, at the group-average level, in the regional coherence of spontaneous low-frequency (<0.08 Hz) BOLD fluctuations measured in the resting state and analysed using univariate t-tests. The metrics used in these studies included regional homogeneity (ReHo) [9,10], amplitude of low-frequency fluctuation (ALFF) [11][12][13], fractional ALFF (fALFF), and functional connectivity (FC) [14]. For example, He et al. [10] showed that the posterior cingulate cortex (PCC) and the precuneus (PCu) have the largest ReHo differences between the AD and CN groups (p<0.05). The ALFF and fALFF study by Han et al. [15] revealed that MCI patients had decreased fALFF values in the PCC/PCu and hippocampus, and increased fALFF values in several other regions, including the occipital and temporal cortices. Rs-fMRI FC, investigated by Li et al. [16], showed that the regions with high FC were mostly located in the default mode network (DMN) and mainly involved the bilateral PCu and PCC [17]. These are all statistically significant findings at the group level. However, the ability of these AD/MCI-related biomarkers to discriminate individual subjects has not been evaluated. Because the discrimination task must automatically classify each subject into one of the studied groups (AD/MCI vs. CN), it is considerably more difficult than the study of differences between groups [18,19].
In neuroimaging studies, preprocessed brain scans commonly contain hundreds of thousands of non-zero voxels, far more than the number of subjects (often fewer than 1000). Selecting an adequate subset of relevant training features (voxels) is therefore critical for good generalization, for reducing the risk of overfitting, and for limiting computational complexity. A growing trend is the design of ML-based feature reduction techniques integrated with classification methods for the voxel-based automated discrimination of patients with brain disorders, including AD and MCI (see the reviews [18,19]). Many studies have demonstrated the relevance of feature selection. Statistical hypothesis t-tests have been widely and successfully used not only to detect group differences but also for feature selection. The technique relies on an optimal significance threshold (p-value) that selects a subset of important features from the whole-brain features. Although t-test-based feature selection is computationally efficient and easy to implement, it has a significant drawback: it does not consider interactions between multiple features or spatial patterns, which reflect the inherently multivariate nature of fMRI data. By contrast, MVPA methods do evaluate the relationships between multiple patterns. However, the primary drawback of whole-brain MVPA is its computational cost, owing to the high dimensionality of 3-D data and the large number of images being analyzed [20][21][22]. Thus, a univariate feature selection step should be performed prior to MVPA to reduce the dimensionality enough to fit memory constraints and improve computational efficiency, while ensuring high sensitivity to fine-grained discriminative spatial patterns and preserving the appealing properties of whole-brain fMRI analysis and the multivariate nature of fMRI data [21,22].
In practice, many previous studies have successfully employed hybrid combinations of filter-based t-tests and MVPA techniques, such as wrapper-based SVM-RFE, to diagnose brain disorders such as ADHD [23][24][25], MCI [26][27][28], autism [29], and AD [30,31] from neuroimaging data, or for high-dimensional gene selection [22,32], with reported accuracies above 90%.
In this study, we propose an ML-based AD/MCI diagnosis framework combining MVPA and extreme learning machines (ELM) applied to multi-measure rs-fMRI data. We first extracted 3-D maps of regional coherence (ReHo, ALFF, and fALFF) and of resting-state FC (rsFC), namely degree centrality (DC) and seed-based rsFC, for each subject. We then performed univariate two-sample t-tests on the whole-brain 3-D maps between two pre-defined training groups to generate an analysis mask that retained only an initial set of relevant features (voxels) showing significant changes in any one of the measures (e.g., ReHo, fALFF, rsFC). Next, MVPA techniques, namely the wrapper-based SVM-RFE proposed by Guyon [20] and the embedded LASSO, were applied to optimize discriminative performance. We used ELM and competing methods, including linear and non-linear SVM classifiers, to distinguish AD/MCI patients from CN controls. We hypothesized that a hybrid combination of univariate t-tests and MVPA applied to a concatenation of multiple functional biomarkers could boost classification performance. The major contributions of this study can be summarized as follows:
• We propose a voxel-wise ML-based discriminative framework integrating an ELM classifier and hybrid MVPA techniques for automated AD/MCI diagnosis using multi-measure rs-fMRI.
• The proposed framework extracts as much information as possible from multiple rs-fMRI biomarkers of the public Alzheimer's Disease Neuroimaging Initiative (ADNI2) cohort and an in-house AD cohort from South Korea and, as a result, achieves higher classification accuracies than previous studies.
• We demonstrate that, compared with the conventional univariate t-test alone, hybrid combinations with multivariate methods (univariate t-test + SVM-RFE and univariate t-test + LASSO) increase the classification performance of the discriminative patterns.
• We show, for the first time, that an ELM classifier combined with hybrid feature selection methods outperforms linear and radial basis function (RBF) SVM classifiers for AD/MCI identification based on multi-biomarker rs-fMRI.
• We show that the highest classification accuracies are achieved when the patterns from all regional coherence and functional connectivity biomarkers are concatenated. This suggests that different brain regions suffer different functional losses due to AD/MCI; hence, a classification framework should include as many informative changes as possible to achieve the best performance.
The remainder of this paper is organized as follows. Section 2 provides details on the datasets, subjects, preprocessing of rs-fMRI data, classification algorithms, univariate and MVPA feature reduction techniques, and permutation test used for the validation of the results. Section 3 presents the comparative results, while Section 4 is devoted to the discussion and conclusions of the article.

Materials and methods
We used two independent rs-fMRI datasets: the ADNI2 dataset, which is publicly available online, and an in-house dataset whose subjects were recruited from Chosun University Hospital in Gwangju, South Korea.

Subjects
ADNI2 cohort. We used a cohort of 33 (17 females) Alzheimer's disease (AD) subjects, 31 (14 females) early mild cognitive impairment (MCI) subjects, and 31 (17 females) cognitively normal (CN) subjects from the ADNI2 database, which is publicly available on the web (www.adni.loni.usc.edu). The mean ages of the AD, MCI, and CN groups were 73.59 ± 5.18, 74.52 ± 5.18, and 74.66 ± 5.56 years, respectively. General criteria for categorizing AD, MCI, and CN subjects are described in detail on the ADNI web site (http://adni.loni.ucla.edu). The subjects ranged in age from 56 to 89 years, and functional assessments of AD/MCI patients, such as the Mini-Mental State Examination (MMSE) and Clinical Dementia Rating (CDR), were performed independently by the research institutions. The general criteria were as follows: CN subjects had MMSE scores between 24 and 30 and a CDR of 0, and were non-depressed, non-MCI, and non-demented. MCI patients had MMSE scores between 24 and 30, CDR scores between 0 and 1, no significant impairment in other cognitive domains, essentially preserved activities of daily living, and no dementia. AD patients had MMSE scores between 15 and 26 and CDR scores of 0.5 or 1, and met the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association (NINCDS/ADRDA) criteria for probable AD. To minimize the effect of differing image sizes and resolutions, we selected images from subjects with the same image dimensions and resolution, and used only the baseline fMRI scans.
In-house cohort. A total of 365 subjects were included in the in-house dataset: 81 AD subjects, 132 MCI subjects, and 152 CN subjects. This dataset was part of a large cohort enrolled at the National Dementia Research Center, Chosun University, Gwangju, South Korea. All subjects provided written informed consent before data collection. For AD patients unable to give consent, their next of kin gave consent before participation. Psychological tests or assessments were not used to determine whether subjects were able to provide written informed consent. The consent procedure and data acquisition were approved by the Institutional Review Board (IRB) of Chosun University Hospital, Gwangju, South Korea (IRB number 2013-12-018). Subjects were between 56 and 87 years of age, and their study partners were able to provide independent functional evaluations. The MMSE and CDR scores and the other clinical criteria for inclusion in the three groups were the same as in the ADNI2 cohort. The demographics of the participants from both cohorts are shown in Table 1, and subject IDs are provided in the supporting S1 Table.

Rs-fMRI data acquisition
ADNI2 cohort. ADNI2 subjects were scanned at different centres using 3.

Preprocessing of rs-fMRI data
Preprocessing of the rs-fMRI data was carried out using the Data Processing Assistant for Resting-State fMRI (DPARSF; http://www.restfmri.net) [33] and the Statistical Parametric Mapping platform (SPM8; http://www.fil.ion.ucl.ac.uk/spm). All Digital Imaging and Communications in Medicine (DICOM) files were obtained from the scanners as described above and converted into the Neuroimaging Informatics Technology Initiative (NIfTI) file format. The first 10 time points of each participant were discarded to allow for signal equilibration and participants' adaptation to the scanning noise. The functional images then underwent the following preprocessing steps: slice-timing correction, with the last slice as the reference; realignment to compensate for head movement, using a Friston 24-parameter model (6 head motion parameters, 6 head motion parameters from the previous time point, and the 12 corresponding squared items); coregistration of the individual structural images (T1-weighted MPRAGE) to the mean functional image after realignment; spatial normalization of the rs-fMRI data to a standard space with the Diffeomorphic Anatomical Registration Through Exponentiated Lie algebra (DARTEL) toolbox [34] (resampled voxel size = 3 × 3 × 3 mm³); and spatial smoothing with a 6-mm full-width at half-maximum (FWHM) Gaussian kernel. Linear trend removal and temporal band-pass filtering (0.01 Hz < f < 0.08 Hz) were then performed on the time series of each voxel. Finally, we regressed out cerebrospinal fluid and white matter signals, as well as six head-motion parameters, to further reduce the effects of nuisance signals and focus on the gray matter signal. A mask image was created from the intersection of the subject-specific normalized T1 anatomical images, and only voxels within this mask were analyzed further. The mask image was also used for multiple-comparison correction in later analyses.
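The motion-expansion and nuisance-regression steps can be illustrated with a minimal NumPy sketch. It assumes the six realignment parameters are available as a (T, 6) array and that mean white-matter and CSF signals have already been extracted; the function names `friston24` and `regress_out` are hypothetical, and this is not the DPARSF implementation.

```python
import numpy as np

def friston24(motion):
    """Expand (T, 6) rigid-body motion parameters into the Friston 24-parameter set:
    current values, values at the previous time point, and the squares of both."""
    motion = np.asarray(motion, dtype=float)
    prev = np.vstack([np.zeros((1, motion.shape[1])), motion[:-1]])  # t-1 values, zero at t=0
    return np.hstack([motion, prev, motion ** 2, prev ** 2])

def regress_out(ts, confounds):
    """Remove nuisance signals from (T, V) voxel time series by ordinary least squares
    and return the residuals (the intercept is removed as well)."""
    design = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ beta

# Toy usage on synthetic data (shapes are assumptions, not the study's scans)
rng = np.random.default_rng(0)
T = 120
motion = rng.normal(scale=0.1, size=(T, 6))
wm_signal, csf_signal = rng.normal(size=T), rng.normal(size=T)  # hypothetical mean WM/CSF signals
confounds = np.column_stack([friston24(motion), wm_signal, csf_signal])
ts = rng.normal(size=(T, 50))            # 50 synthetic voxels
clean = regress_out(ts, confounds)
```

Passing `motion` instead of `friston24(motion)` gives the six-parameter variant named in the final regression step.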
Because of the small size of the datasets, we used leave-one-out cross-validation (LOO-CV) for the ADNI2 cohort and 10-fold cross-validation (10-fold CV) for the in-house cohort to validate the classification performance of the methods. In LOO-CV, one sample was selected as test data and the rest were used for training. In 10-fold CV, 90% of the data were used for training and the remaining 10% for testing. Given the training 3-D spatial maps, we performed univariate statistical t-tests to obtain a 3-D mask identifying a set of 'active' voxels. We then applied the MVPA techniques (SVM-RFE and LASSO) to the 1-D concatenated training features to select the most relevant features for training the ELM and SVM classifiers. Finally, using the indices of the highest-ranked features on the training data, we extracted the same features from the test data for classification.
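The key point of this scheme is that every selection step is fitted inside the training folds. A minimal scikit-learn sketch of that structure follows, under several assumptions: the data are synthetic, `SelectFpr` with `f_classif` stands in for the voxel-wise two-sample t-test (for two groups the ANOVA F statistic equals t², so the p-values coincide with a pooled-variance t-test), and a linear SVM stands in for the ELM, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFpr, f_classif
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for subjects x concatenated voxel features (assumed sizes)
X, y = make_classification(n_samples=64, n_features=2000, n_informative=20,
                           random_state=0)

# Feature selection sits inside the pipeline, so in every fold it is fitted on the
# training subjects only; test subjects are reduced to the same voxel indices.
model = make_pipeline(SelectFpr(f_classif, alpha=0.05), SVC(kernel="linear"))

# LOO-CV (as used for ADNI2) and 10-fold CV (as used for the in-house cohort)
loo_acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
tenfold_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
tenfold_acc = cross_val_score(model, X, y, cv=tenfold_cv).mean()
print(f"LOO accuracy: {loo_acc:.3f}, 10-fold accuracy: {tenfold_acc:.3f}")
```

Because the selector is part of the pipeline, `cross_val_score` refits it on each training split, so a test subject never influences which voxels are kept.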

Feature extraction
We describe here some biomarkers measured from rs-fMRI using the Resting-State fMRI Data Analysis Toolkit (REST) toolbox [35]. The measures can be categorized into regional spontaneous measures (ReHo, ALFF, fALFF), and functional connectivity measures (DC, seed-based rsFC), as described below.
Regional homogeneity (ReHo). We used the ReHo measure to explore regional brain activity during the resting state. ReHo was computed voxel-wise as Kendall's coefficient of concordance (KCC) [36] between the fMRI time series of a given voxel and those of its nearest neighbours, yielding an individual ReHo map for each subject. A larger ReHo value for a voxel indicates higher regional coherence within the cluster formed by that voxel and its nearest neighbours. Several recent studies have shown the potential value of ReHo in clinical applications [9,10,37].
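As an illustration, Kendall's W and a voxel-wise ReHo map can be computed directly with NumPy and SciPy. This is a minimal sketch assuming a 27-voxel (3 × 3 × 3) neighbourhood, a 4-D array of preprocessed data, and a boolean brain mask; it is not the REST implementation, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def kendall_w(ts):
    """Kendall's coefficient of concordance for a (K, n) array: K time series, n time points."""
    k, n = ts.shape
    ranks = np.apply_along_axis(rankdata, 1, ts)   # rank each series across time points
    r_sum = ranks.sum(axis=0)                      # summed rank R_i at each time point
    s = np.sum((r_sum - r_sum.mean()) ** 2)
    return 12.0 * s / (k ** 2 * (n ** 3 - n))

def reho_map(data, mask):
    """Voxel-wise ReHo over a 3x3x3 neighbourhood (up to 27 voxels, centre included).
    data: 4-D array (x, y, z, t); mask: 3-D boolean brain mask."""
    reho = np.zeros(mask.shape)
    for x, y, z in zip(*np.nonzero(mask)):
        sl = tuple(slice(max(c - 1, 0), c + 2) for c in (x, y, z))
        cluster = data[sl][mask[sl]]               # (K, t) series of in-mask neighbours
        reho[x, y, z] = kendall_w(cluster)
    return reho

rng = np.random.default_rng(0)
data = rng.normal(size=(6, 6, 6, 100))             # synthetic volume: 6x6x6 voxels, 100 volumes
mask = np.ones((6, 6, 6), dtype=bool)
reho = reho_map(data, mask)
```

Identical time series across a neighbourhood give W = 1, and unrelated series give values near 0.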
Amplitude of low-frequency fluctuation (ALFF) and fractional ALFF (fALFF). Regional spontaneous activity can be examined with the ALFF measure and its improved version, the fALFF measure. After preprocessing, the filtered time series was transformed to the frequency domain using a fast Fourier transform (FFT) to obtain the power spectrum. The square root of the power spectrum (amplitude) was averaged over frequencies between 0.01 and 0.08 Hz at each voxel to give the ALFF measure [11,38]. The fALFF measure is a modified version of ALFF, defined as the ratio of the average amplitude in the low-frequency range (0.01 to 0.08 Hz) to that of the entire frequency range (0 to 0.25 Hz) [33].
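The ALFF and fALFF definitions translate into a few lines of NumPy. This sketch assumes TR = 2 s (consistent with a 0 to 0.25 Hz full range) and computes fALFF as a ratio of summed amplitudes, the usual convention; a ratio of average amplitudes differs from it only by a constant factor for a fixed scan length. Because band-pass filtering removes most power outside 0.01 to 0.08 Hz, fALFF is normally computed on the unfiltered series. The function name is hypothetical.

```python
import numpy as np

def alff_falff(ts, tr=2.0, band=(0.01, 0.08)):
    """ALFF and fALFF of one voxel time series (TR in seconds is an assumed input)."""
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()                                  # remove the DC component
    freqs = np.fft.rfftfreq(ts.size, d=tr)               # 0 Hz .. Nyquist (0.25 Hz at TR = 2 s)
    power = np.abs(np.fft.rfft(ts)) ** 2 / ts.size       # one-sided power spectrum
    amp = np.sqrt(power)                                 # amplitude = square root of power
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    alff = amp[in_band].mean()                           # mean amplitude in 0.01-0.08 Hz
    falff = amp[in_band].sum() / amp[1:].sum()           # band / full range (DC bin excluded)
    return alff, falff

# A 0.05 Hz oscillation lies inside the band; a 0.2 Hz oscillation lies outside it
t = np.arange(200) * 2.0                                 # 200 volumes, TR = 2 s
alff_in, falff_in = alff_falff(np.sin(2 * np.pi * 0.05 * t))
alff_out, falff_out = alff_falff(np.sin(2 * np.pi * 0.2 * t))
```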
Degree centrality (DC). We used a commonly employed graph-based measure of network organization, degree centrality (DC), to perform a whole-brain exploration of the regions affected by AD and MCI. Within the study mask, individual network centrality maps were generated voxel-wise. First, the preprocessed functional data were subjected to a voxel-based whole-brain correlation analysis: the time course of each gray-matter voxel of each participant was correlated with the time course of every other voxel to obtain a correlation matrix. An undirected adjacency matrix was then obtained by thresholding the correlations at r > 0.25 [39,40], and the DC of each voxel was computed as the sum of the weights of its supra-threshold connections. Finally, each individual voxel-wise DC map was converted into a z-score map by subtracting the mean DC across the entire brain and dividing by the standard deviation of the whole-brain DC.
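A minimal dense-matrix sketch of the weighted DC computation follows, assuming a (T, V) array of gray-matter voxel time series. At whole-brain scale the V × V correlation matrix would not fit in memory, so a practical version would compute it in blocks; the function name is hypothetical.

```python
import numpy as np

def degree_centrality_z(ts, r_thresh=0.25):
    """Weighted degree centrality z-map from (T, V) voxel time series."""
    r = np.corrcoef(ts, rowvar=False)            # (V, V) Pearson correlations between voxels
    np.fill_diagonal(r, 0.0)                     # ignore self-connections
    weights = np.where(r > r_thresh, r, 0.0)     # keep only connections with r > threshold
    dc = weights.sum(axis=1)                     # weighted degree of each voxel
    return (dc - dc.mean()) / dc.std()           # z-score across the whole brain

rng = np.random.default_rng(0)
ts = rng.normal(size=(200, 100))                 # 200 time points, 100 synthetic voxels
ts[:, :10] += 2.0 * rng.normal(size=(200, 1))    # voxels 0-9 share a common signal
dc_z = degree_centrality_z(ts)
```

The voxels sharing a common signal end up with the highest DC z-scores, since they are the only ones with many connections above r = 0.25.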
Seed-based resting-state functional connectivity (rsFC). To examine detailed rsFC differences among the AD, MCI, and CN groups at the regional level, we performed seed-based rsFC analysis. The mean time course within each seed was extracted by averaging the time courses of all voxels belonging to the seed and was then correlated with the time course of every voxel. The resulting correlation coefficients were converted to z-scores using Fisher's r-to-z transform to improve normality [16,41]. We selected the bilateral PCC, bilateral hippocampus, and bilateral precuneus as seeds; Table 2 provides detailed information about the seeds.
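The seed-based computation can be sketched in NumPy as follows, assuming a (T, V) time-series array and a boolean seed mask over the V voxels (both synthetic here). The small clipping guard before the Fisher transform is an added assumption that keeps the z-value finite for voxels perfectly correlated with the seed; the function name is hypothetical.

```python
import numpy as np

def seed_fc_z(ts, seed_mask):
    """Fisher z map of the correlation between a seed's mean time course and every voxel.
    ts: (T, V) voxel time series; seed_mask: boolean array over the V voxels."""
    seed = ts[:, seed_mask].mean(axis=1)                        # mean seed time course
    tsc = ts - ts.mean(axis=0)
    sc = seed - seed.mean()
    r = (tsc.T @ sc) / (np.linalg.norm(tsc, axis=0) * np.linalg.norm(sc))  # Pearson r per voxel
    r = np.clip(r, -0.999999, 0.999999)                         # keep arctanh finite (assumed guard)
    return np.arctanh(r)                                        # Fisher r-to-z transform

rng = np.random.default_rng(0)
common = rng.normal(size=150)
ts = rng.normal(size=(150, 40))
ts[:, :5] += 3.0 * common[:, None]      # voxels 0-4 form a hypothetical seed region
ts[:, 5] -= 3.0 * common                # voxel 5 is anti-correlated with the seed
seed_mask = np.zeros(40, dtype=bool)
seed_mask[:5] = True
z = seed_fc_z(ts, seed_mask)
```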
Feature concatenation. Combining multiple measures can be an effective way to boost the performance of a machine learning system [42] and has been used in many research domains, including neuroimaging classification [43]. In this work, we investigated a simple feature concatenation that links multiple feature measures computed from the same dataset. We expected feature concatenation to enhance accuracy and to capture direct or indirect associations between the multiple features extracted from the same fMRI data.

Feature reduction techniques
The number of predictor voxels in our spatial maps was larger than the number of subjects. A dimensionality reduction step was therefore necessary to select the most relevant features, discard redundant features and noise, and avoid numerical singularities and overfitting, thereby enhancing classification performance. Importantly, feature reduction was performed using the training data only; the brain regions identified during training were then used to assess the classifier's predictive accuracy [44] on the test data. We used the univariate t-test and MVPA approaches, including SVM-RFE and LASSO, as voxel-wise feature reduction techniques. The univariate t-test evaluates each voxel independently, whereas multivariate RFE and LASSO exploit the mutual associations between multiple features and spatial patterns. We also used hybrid combinations of the univariate and MVPA approaches, aiming to outperform the individual techniques.
Univariate two-sample t-test. Many neuroimaging studies have used univariate statistical tests to show abnormalities, at the level of the average signal, in one or more brain features of a diseased group compared with a control group [19]. Recently, classification studies have used t-tests to select informative features for machine learning in neuroimaging [8,45]. The key results of such statistical tests are usually expressed as p-values; the optimal p-value cutoff for selecting relevant features is then determined through cross-validation, and the selected features are used in the subsequent machine learning analysis. In this study, we applied t-test-based feature reduction to machine-learning-based diagnosis. Using t-tests on the training dataset, we generated an analysis mask that retained only the voxels showing significant changes in any of the feature measures (ReHo, ALFF, fALFF, DC, rsFC) between the two groups, at threshold p-values of p<0.05 (|t|>1.9715), p<0.01 (|t|>2.599), and p<0.001 (|t|>3.3381). To correct for multiple comparisons, the minimum cluster size corresponding to a corrected p<0.05 was determined for each voxel-level threshold by Monte Carlo simulation with the AlphaSim program in REST [35] (1000 iterations). The resulting cluster sizes were 85 voxels (2295 mm³), 18 voxels (486 mm³), and 6 voxels (162 mm³) for individual voxel p-values of 0.05, 0.01, and 0.001, respectively. Fig 2 shows the regions selected by the univariate t-test applied to the ReHo maps of one training fold, for AD vs. CN and MCI vs. CN (out of >62 folds for the ADNI2 cohort and 10 folds for the in-house cohort).

Support vector machine-recursive feature elimination (SVM-RFE). While the t-test is a univariate procedure that does not take into account interactions between multiple features and spatial patterns, SVM-RFE is a multivariate, wrapper-based feature reduction algorithm that repeatedly fits a model and removes the weakest features until a specified number of informative features remains. The ranking criterion of SVM-RFE is closely tied to the SVM model. In each RFE iteration, an SVM model is trained, and the feature with the smallest ranking criterion is removed, since it has the least effect on classification; the remaining features are kept for the SVM model in the next iteration. This sequential process is repeated until all features have been eliminated, and the features are then ranked by their order of elimination: the later a feature is eliminated, the more important it is considered [46]. A detailed description of the SVM-RFE algorithm can be found in [20]. In this work, after applying SVM-RFE, we kept the top-ranked training features that maximized cross-validated accuracy for training the classifiers. Fig 3 illustrates the hybrid combination of the univariate t-test with multivariate SVM-RFE, as well as with LASSO, for selecting the most relevant features.
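The hybrid t-test + SVM-RFE step can be sketched with SciPy and scikit-learn. Assumptions: all measures share one voxel grid and are given as subjects × voxels arrays, the data are synthetic, scikit-learn's `RFECV` with a linear `SVC` plays the role of SVM-RFE together with the cross-validated choice of feature count, and the AlphaSim cluster-size correction is omitted. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def ttest_union_mask(measures, y, p_thresh=0.05):
    """Voxels significant (two-sample t-test, p < p_thresh) in ANY measure.
    measures: dict name -> (n_subjects, n_voxels) training maps on a shared voxel grid."""
    mask = None
    for maps in measures.values():
        _, p = ttest_ind(maps[y == 1], maps[y == 0], axis=0)
        mask = (p < p_thresh) if mask is None else (mask | (p < p_thresh))
    return mask

def concat_masked(measures, mask):
    """Concatenate the masked voxels of every measure into one feature vector per subject."""
    return np.hstack([maps[:, mask] for maps in measures.values()])

# Synthetic training data: 60 subjects, 500 voxels, three measures (assumed sizes)
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
measures = {name: rng.normal(size=(60, 500)) for name in ("ReHo", "fALFF", "DC")}
measures["ReHo"][y == 1, :20] += 1.0          # group effect in 20 ReHo voxels

mask = ttest_union_mask(measures, y, p_thresh=0.05)
X_train = concat_masked(measures, mask)

# SVM-RFE: drop 10% of the remaining features per step; keep the subset with best CV accuracy
rfe = RFECV(SVC(kernel="linear"), step=0.1,
            cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0), scoring="accuracy")
rfe.fit(X_train, y)

# Test subjects are reduced with the SAME mask and RFE support learned on the training data
test_measures = {name: rng.normal(size=(4, 500)) for name in measures}
X_test = rfe.transform(concat_masked(test_measures, mask))
```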
Least absolute shrinkage and selection operator (LASSO). LASSO is an example of MVPA feature reduction that combines an error term with a regularization term; it has been successfully applied in neuroimaging machine learning tasks to mitigate problems related to the so-called curse of dimensionality. LASSO computes the model coefficients γ_j by minimizing the following function:

$$\min_{\gamma_0,\,\gamma}\left(\frac{1}{2n}\sum_{i=1}^{n}\left(u_i-\gamma_0-x_i^{T}\gamma\right)^{2}+\lambda\sum_{j=1}^{q}\left|\gamma_j\right|\right)$$

where x_i is the voxel-wise feature input data, a vector of q values at observation i; n is the number of observations; and u_i is the response at observation i. Lambda (λ) is a non-negative user-defined regularization parameter that controls the balance between the number of non-zero coefficients γ_j (sparsity) and prediction accuracy: as λ increases, the model becomes increasingly sparse and retains fewer features, while as λ approaches 0, the model becomes less sparse and retains more features [5]. The intercept γ_0 is a scalar. The penalty term is the l1 norm of the coefficient vector γ [47][48][49].
In this paper, we chose the value of λ that minimized the cross-validated mean squared error (MSE), as shown in Fig 4. The hybrid combination of the univariate t-test and multivariate LASSO for selecting the most discriminative training features is shown in Fig 3.
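A minimal scikit-learn sketch of this λ selection on synthetic data follows. It assumes that group labels are coded 0/1 and used as the response u_i; scikit-learn's `LassoCV` minimizes the same objective, with `alpha` in place of λ, and selects the value with the lowest cross-validated MSE.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 300))            # training subjects x t-test-masked voxels (synthetic)
u = np.repeat([0.0, 1.0], 40)             # group labels used as the regression response u_i
X[u == 1, :10] += 1.0                     # 10 voxels carry a group effect

# LassoCV minimises (1/(2n)) * ||u - gamma_0 - X @ gamma||^2 + alpha * ||gamma||_1,
# so alpha plays the role of lambda; alpha_ is the value with the lowest CV MSE.
lasso = LassoCV(cv=KFold(n_splits=10, shuffle=True, random_state=0)).fit(X, u)
selected = np.flatnonzero(lasso.coef_)    # voxels kept: non-zero coefficients gamma_j
print(f"lambda = {lasso.alpha_:.4f}, {selected.size} voxels selected")
```

The indices in `selected` would then be applied unchanged to the test subjects of the same fold.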

Classification
In this study, three machine learning classification algorithms were used, namely ELM, linear SVM, and non-linear SVM. We compared the results of all the classifiers, and ELM proved to be the most efficient algorithm in terms of both computation time and accuracy. Each method is briefly described below.

ELM classifier. An ELM consists of an input layer, a hidden layer, and an output layer. Whereas traditional feedforward neural networks adjust the weights and biases of all layers with gradient-based learning algorithms, ELM assigns the input weights and hidden-layer biases randomly, without iterative adjustment, and computes the output weights by solving a single linear system [23]. ELM therefore learns much faster than traditional neural networks and is widely employed as an efficient learning algorithm in various classification applications [24]. In this work, we used a sigmoid activation function, and the number of hidden nodes was varied between 1 and 400; this parameter was tuned by grid search on the training data to maximize cross-validated accuracy. To minimize random effects due to weight initialization, each number of hidden nodes was evaluated 100 times and the average performance is reported.
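The ELM training rule can be written in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes input weights and biases drawn uniformly from [-1, 1], one-hot targets, and the Moore-Penrose pseudo-inverse for the output weights; the class name `ELM` is hypothetical.

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer extreme learning machine with a sigmoid activation."""

    def __init__(self, n_hidden=100, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))      # sigmoid(XW + b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        # Input weights and hidden biases are random and never updated
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T       # output weights: least-squares solution
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 5)), rng.normal(2.0, 1.0, size=(50, 5))])
y = np.repeat([0, 1], 50)
# Average over 100 random initialisations, mirroring the 100 repetitions per hidden-node count
accs = [np.mean(ELM(n_hidden=20, seed=s).fit(X, y).predict(X) == y) for s in range(100)]
mean_acc = float(np.mean(accs))
```

Training accuracy is averaged here only to keep the example short; in the protocol described above, the averaged quantity would be the cross-validated accuracy for each hidden-node count.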
SVM classifier. Support vector machines (SVMs) have become popular as supervised classifiers of fMRI data because of their high performance, their ability to deal with large high-dimensional datasets, and their flexibility in modeling diverse sources of data [4]. In the present study, we used a linear SVM and a non-linear SVM with a radial basis function (RBF) kernel. The SVM parameters to be tuned are the gamma value of the kernel scale (γ) and the box constraint (C). We used a greedy search on the training data to tune these parameters so as to maximize cross-validated accuracy. The search grids for γ and C were both set to [0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000].
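The parameter search over the listed grids can be reproduced with scikit-learn's `GridSearchCV`, shown here as an exhaustive grid search on synthetic data; this is a swapped-in technique, since the text describes a greedy search. Note that scikit-learn's `gamma` multiplies the squared distance in the RBF kernel, exp(-gamma·||x − x'||²), so it is not necessarily the same quantity as a "kernel scale" parameter in other toolboxes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

grid = [0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000]    # values listed in the text
X, y = make_classification(n_samples=60, n_features=30, n_informative=5, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# RBF SVM: search C and gamma jointly; linear SVM: only C is relevant
rbf = GridSearchCV(SVC(kernel="rbf"), {"C": grid, "gamma": grid},
                   cv=cv, scoring="accuracy").fit(X, y)
lin = GridSearchCV(SVC(kernel="linear"), {"C": grid}, cv=cv, scoring="accuracy").fit(X, y)
print(rbf.best_params_, lin.best_params_)
```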

Cross-validation, performance evaluation, and significance testing methods
Cross-validation. We used leave-one-out cross-validation (LOO-CV) for the ADNI2 cohort and 10-fold cross-validation (CV) for the in-house cohort. In LOO-CV, N-1 of the N subjects were used for training and the remaining one for testing, and the procedure was repeated for all N subjects. In 10-fold CV, the subjects were randomly divided into 10 folds; in each round, nine folds (90% of the data) were used for training and the remaining fold (10%) for testing.

Significance testing. To assess the statistical significance of the classifiers' performance, a permutation test was performed on the classification accuracies: the labels of the test data in each of the N (LOO-CV) or 10 (10-fold CV) folds were randomly permuted 1000 times to estimate the probability of a successful classification by chance. The lower the p-value of the permuted prediction rates against the prediction rate obtained with the original labels, the more significant the classifier's performance.
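A minimal sketch of a label-permutation test for an accuracy score follows. It pools the test predictions from all folds, which simplifies the per-fold permutation described above, and it adds 1 to the numerator and denominator so that the observed labelling counts as one permutation (a common convention assumed here). The function name is hypothetical.

```python
import numpy as np

def permutation_p_value(y_true, y_pred, n_perm=1000, seed=0):
    """Observed accuracy and its p-value against accuracies obtained with shuffled labels."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    observed = np.mean(y_true == y_pred)
    null = np.array([np.mean(rng.permutation(y_true) == y_pred) for _ in range(n_perm)])
    # +1 counts the observed labelling as one of the permutations
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value

y = np.repeat([0, 1], 32)
acc_perfect, p_perfect = permutation_p_value(y, y)    # perfect predictions on 64 subjects
```

With 1000 permutations, the smallest attainable p-value is 1/1001, which is why results are reported as p<0.001 rather than as an exact value.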

Results
Classification results: Univariate t-test
ADNI2 cohort. Tables 3 and 4 summarize the performance of all the competing methods on the ADNI2 cohort in discriminating AD from CN and MCI from CN, respectively. In terms of mean diagnostic accuracy, the ELM classifier with concatenated features obtained a maximal accuracy of 89.92% (p-value<0.001), with a sensitivity of 86.51%, specificity of 84.17%, balanced accuracy of 84.58%, PPV of 94.00%, and NPV of 87.40%, when discriminating AD from CN; and a maximal accuracy of 85.81% (p-value<0.001), with a sensitivity of 86.67%, specificity of 85.83%, balanced accuracy of 84.85%, PPV of 86.50%, and NPV of 90.00%, when discriminating MCI from CN. The concatenated measure outperformed all individual measures. In addition, the ELM outperformed the linear and RBF-based non-linear SVMs in diagnostic accuracy for all measures, including the concatenated ones.
In-house cohort. The results on the in-house dataset are summarized in Table 5 (AD vs. CN) and Table 6 (MCI vs. CN). Our proposed method with the ELM classifier achieved high mean accuracies for all types of measures (above 90% for AD vs. CN; around 80% for MCI vs. CN). In AD vs. CN, the concatenation of all measures resulted in a maximal mean accuracy of 94.45% (p-value<0.001), with a sensitivity of 83.67%, a specificity of 96.67%, a balanced accuracy of 90.17%, a PPV of 95.67%, and an NPV of 91.07%; in MCI vs. CN, the maximal mean accuracy was 87.20% (p-value<0.001), with a sensitivity of 78.85%, a specificity of 87.50%, a balanced accuracy of 81.69%, a PPV of 84.66%, and an NPV of 81.27%. Again, the combined measures outperformed the individual measures. In addition, the mean accuracy of the ELM classifier was superior to that of the linear and non-linear SVMs, as shown in Tables 5 (AD vs. CN) and 6 (MCI vs. CN).
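For reference, the reported metrics follow from the binary confusion matrix. The sketch below assumes the patient group (AD or MCI) is the positive class, coded as 1; the function name is hypothetical.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, balanced accuracy, PPV and NPV."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": sens,                 # true positive rate
        "specificity": spec,                 # true negative rate
        "balanced_accuracy": (sens + spec) / 2,
        "ppv": tp / (tp + fp),               # positive predictive value
        "npv": tn / (tn + fn),               # negative predictive value
    }

m = binary_metrics([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1])
```

In this small example there are 2 true positives, 1 false negative, 3 true negatives and 1 false positive, so sensitivity is 2/3 and specificity is 3/4.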

Classification results: Group differences and classifications
To date, there are no guidelines for the optimal user-defined significance threshold (p-value) for selecting the relevant features to be used in machine learning for differentiating AD and MCI from CN [5,44]. To investigate the effect of the univariate statistical threshold, Table 7 shows the ELM classification performance at different p-values (p = 0.05, 0.01, and 0.001). Interestingly, the best performance was found with the least stringent threshold (p-value = 0.05) for both datasets and both classification problems (AD vs. CN and MCI vs. CN). Specifically, in the ADNI2 cohort, the maximal mean accuracy in AD vs. CN classification was , with a sensitivity of 78.85% and a specificity of 87.50%. We therefore conclude that a more stringent significance threshold (p-value = 0.01, 0.001) does not necessarily result in stronger classification performance and, conversely, that high classification performance does not necessarily imply strong differences between the group means.

Classification results: Hybrid combination of MVPA methods
In the previous section, we reported results using only univariate t-tests, without MVPA methods, for discriminating AD and MCI from CN. In this section, we examine hybrid combinations of t-tests and multivariate techniques, namely LASSO and SVM-RFE. Table 8 presents the AD and MCI discrimination performance of the ELM classifier using only the univariate t-test (on concatenated features) and using its combination with LASSO or SVM-RFE. The results show that the ELM classifier combined with the hybrid feature optimization framework outperformed the same classifier without feature optimization in both cohorts and for both AD and MCI discrimination (accuracies of up to 98.86% for AD and 98.57% for MCI diagnosis in the ADNI2 cohort, and up to 98.70% for AD and 94.16% for MCI diagnosis in the in-house cohort). In addition, the ELM performance with the combined univariate t-test and SVM-RFE was clearly superior to that with the combined univariate t-test and LASSO. Interestingly, the hybrid combinations of SVM-RFE with t-tests at different p-value thresholds resulted in similar accuracies. This can be explained as follows: we chose the number of highest-ranked features by grid-search cross-validation on the training data only, and SVM-RFE eliminated the remaining, lower-ranked features. Even with different p-value thresholds, the number of top-ranked features passed to the classifier was the same, which resulted in similar performance.
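The hybrid selection scheme described above can be sketched with scikit-learn, under several assumptions that differ from the paper. For two groups, the one-way ANOVA F statistic from `f_classif` equals the squared pooled-variance t statistic, so `SelectFpr(f_classif, alpha=p)` reproduces a two-sample t-test threshold. scikit-learn's `RFE` ranks features by `coef_`, so a linear SVM stands in for the RBF-based ranking used in this study, and an RBF `SVC` stands in for the ELM classifier. The data, the grid of feature counts, and the helper name `build_hybrid_pipeline` are hypothetical. Wrapping every step in one `Pipeline` inside `GridSearchCV` keeps the t-test, the RFE ranking, and the choice of feature count confined to the training folds.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectFpr, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_hybrid_pipeline(p_threshold):
    """Hypothetical t-test filter -> SVM-RFE -> classifier pipeline."""
    return Pipeline([
        # Two groups: ANOVA F = t**2, so this keeps voxels with t-test p < p_threshold
        ("ttest", SelectFpr(f_classif, alpha=p_threshold)),
        # RFE needs coef_, hence a linear SVM ranks the features here
        ("rfe", RFE(SVC(kernel="linear"), step=0.1)),
        # Stand-in for the ELM classifier used in the paper
        ("clf", SVC(kernel="rbf")),
    ])


rng = np.random.default_rng(0)
X = rng.normal(size=(80, 500))           # 80 subjects x 500 concatenated voxel features
y = np.repeat([0, 1], 40)                # 0 = CN, 1 = patient (synthetic labels)
X[y == 1, :30] += 1.5                    # hypothetical group effect in 30 voxels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

results = {}
for p in (0.05, 0.01, 0.001):
    search = GridSearchCV(
        build_hybrid_pipeline(p),
        param_grid={"rfe__n_features_to_select": [5, 10, 20]},
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    )
    search.fit(X_train, y_train)         # the grid search only ever sees training data
    kept = search.best_estimator_.named_steps["rfe"].n_features_
    results[p] = (kept, search.score(X_test, y_test))
    print(f"p < {p}: {kept} features kept, test accuracy {results[p][1]:.2f}")
```

Whatever the t-test threshold, the classifier only ever sees the number of features picked by the grid search, which is one way the similar accuracies across p-values can arise.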

Comparison with previous studies
In recent years, many studies have been carried out to classify AD/MCI subjects using rs-fMRI. Studies based on binary classification reported accuracies ranging from about 75% to about 95% [18,19]. Table 9 summarizes the results of recently published studies using rs-fMRI-based machine learning to discriminate AD and MCI from CN and compares them with our results. Notably, our method outperformed those proposed in [26,50-52], which used the same MCI and CN subject selection from the ADNI2 cohort. A direct performance comparison with other studies would not be fair because of differences in datasets, preprocessing pipelines, feature measures, and classifiers. Nevertheless, it is noteworthy that the proposed method achieved the highest accuracy among the methods listed in Table 9 for classifying AD and MCI vs. CN using only rs-fMRI data.

Feature selection techniques on ADNI cohort
Recent years have seen wide application of MVPA feature selection methods to neuroimaging data sets from the public ADNI cohorts. In Table 10 [...] uninformative sources of noise. Salvatore et al. used PCA to reduce the dimensionality of WM and GM density maps [64]; the reduced density maps were then used by SVM classifiers to identify AD (accuracy = 76%) and MCI (accuracy = 72%) patients from CN. Similar predictive improvements from a single MVPA feature selection method or from hybrid combinations were obtained in unimodal rs-fMRI [26,27], sMRI [60,63,65], and PET [60,66] studies, and in multimodal sMRI+PET studies [60,66]. Hybrid combinations of feature selection methods have been used successfully to diagnose AD and MCI. In [26,27], Wee et al. combined two filter-based methods (t-test and minimum redundancy maximum relevance, mRMR) with the wrapper-based SVM-RFE method to select the most discriminative functional connectivity features extracted from rs-fMRI images. They reported maximum accuracies of 92.35% and 84% for identifying AD and MCI patients, respectively, from healthy controls. In other studies [67,68], a hybrid voxel-wise feature selection approach combining the t-test with a Fisher criterion-based genetic algorithm was proposed to predict AD patients from CN using segmented GM images. The authors reported that the hybrid method's accuracy (93.01%) was superior to that obtained with PCA-based feature selection (88.70%) and with no feature selection (87.63%). In addition, combinations of PCA with LDA and FDR (Fisher discriminant ratio) as feature selection methods outperformed the whole-brain voxel-wise approach, achieving AD classification accuracies of up to 96.7% and 89.5% for PET and SPECT images, respectively [69].

By contrast, some studies have reported that feature selection without prior knowledge did not increase classification accuracy. Chu et al. [70] compared four common feature selection methods applied to segmented GM maps from T1 MRI scans of three groups (AD, MCI, CN): 1) ROIs pre-selected on the basis of prior knowledge, 2) the univariate t-test, 3) RFE, and 4) the t-test constrained by ROIs. Surprisingly, the results showed that 1) the predictive accuracies with either the univariate t-test or RFE were no better than those achieved using the whole-brain data, and 2) the hybrid method (t-test + ROI), which used the ROIs as a spatial constraint and the t-test to rank features, did significantly improve classification accuracy in AD vs. CN and MCI vs. CN. Similarly, voxel-wise hybrid combinations of the t-test and SVM-RFE applied to whole-brain GM maps did not significantly improve AD and MCI diagnostic performance compared with the whole-GM approach [65].
Hybrid combinations of feature selection methods have also been used for AD and MCI classification in cohorts other than the standard ADNI data sets. For example, Jie et al. [28] combined the t-test and RFE to select the most discriminative topological features extracted from fMRI scans for discriminating MCI from CN subjects, and reported a maximal accuracy of 91.9%. Another study [31] used a hybrid feature selection approach combining three filter-based and two wrapper-based methods, and compared the performance of six different combinations of them. Using the proposed hybrid approach with an SVM classifier and LOO-CV, the authors reported a best accuracy of 90.4% for diagnosing AD patients from the Open Access Series of Imaging Studies (OASIS) database (http://www.oasis-brains.org/).

The benefits of MVPA feature reduction methods
It is known that the performance of pattern recognition methods such as SVM and ELM decreases as the number of non-informative features increases [19]. Machine learning techniques take advantage of the multivariate nature of fMRI data and can identify maximally discriminative spatial patterns [58]. In the present work, we examined an approach to fMRI pattern discrimination based on ELM and hybrid multi-voxel feature reduction, combining univariate and MVPA methods. Our results show that the conventional univariate t-test, used alone, can be combined with a classifier to identify AD/MCI patients. However, as shown in Table 7, a very low p-value cut-off does not guarantee a strongly informative feature, and a larger p-value does not necessarily indicate an irrelevant feature. Discarding voxels based only on statistical tests sensitive to group means could therefore lead to a loss of discriminative ability, so additional MVPA methods should be used in combination with the univariate group-level t-test.
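For readers unfamiliar with ELM, the following is a minimal sketch of a single-hidden-layer extreme learning machine for a binary problem. The hidden-layer size, the sigmoid activation, the 1/sqrt(d) scaling of the input weights, and the ±1 target coding are illustrative assumptions, not the settings used in this study, and `ELMClassifier` is a hypothetical name. The defining property of ELM is kept: the input weights and biases are random and never trained, and only the output weights are solved in closed form with the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`).

```python
import numpy as np


class ELMClassifier:
    """Hypothetical minimal extreme learning machine for two classes."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid of a fixed random projection; these weights are never trained
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random input weights, scaled by 1/sqrt(n_features) to limit saturation
        self.W = self.rng.normal(scale=1.0 / np.sqrt(n_features),
                                 size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.where(y == 1, 1.0, -1.0)  # +1 patient, -1 control
        # Output weights: least-squares solution via the pseudoinverse
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)


# Toy data: two Gaussian clouds in 20 dimensions (synthetic, not rs-fMRI)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 20)),
               rng.normal(1.0, 1.0, size=(200, 20))])
y = np.repeat([0, 1], 200)
order = rng.permutation(len(y))
train, test = order[:200], order[200:]

elm = ELMClassifier(n_hidden=50, seed=0).fit(X[train], y[train])
accuracy = np.mean(elm.predict(X[test]) == y[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

Prediction reduces to one random projection, a sigmoid, and a dot product, consistent with the observation in the limitations that the test phase is computationally light. Because the output weights are fitted by least squares over every input dimension, uninformative features still enter the hidden layer, which is one intuition for why feature reduction matters.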
We also demonstrated that the hybrid combination of multi-voxel methods (t-test + SVM-RFE and t-test + LASSO) increases the discriminative power of the patterns (Table 8). In this study, we searched for the most relevant discriminative patterns using SVM-RFE, which iteratively eliminates the lowest-ranked features according to an RBF-kernel SVM, and LASSO, which, through an L1 penalty, retains a sparse set of features that contribute most to the model fit on the training data. Because the univariate method is less sensitive, a sensible way to combine univariate and multivariate selection is to use a larger p-value threshold (thus avoiding the exclusion of potentially relevant voxels) and then remove irrelevant voxels based on multivariate ranking functions.
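The "lenient threshold first, multivariate pruning second" strategy can be sketched for the LASSO branch as follows. This is an illustration under assumptions, not the authors' implementation: LASSO is fitted as an L1-penalized least-squares regression on ±1 labels with scikit-learn's `Lasso`, the penalty `alpha=0.1` is fixed by hand (in practice it would be tuned by cross-validation on training data), and the data and the helper `ttest_then_lasso` are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import Lasso


def ttest_then_lasso(X, y, p_threshold=0.05, alpha=0.1):
    """Lenient t-test filter followed by LASSO; returns candidate and kept voxel indices."""
    _, p = ttest_ind(X[y == 1], X[y == 0], axis=0)
    candidates = np.flatnonzero(p < p_threshold)      # lenient univariate step
    # L1-penalised least squares on +/-1 labels; most weights shrink to exactly zero
    lasso = Lasso(alpha=alpha).fit(X[:, candidates], np.where(y == 1, 1.0, -1.0))
    return candidates, candidates[lasso.coef_ != 0]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))                      # synthetic subjects x voxels
y = np.repeat([0, 1], 50)
X[y == 1, :20] += 1.0                                 # hypothetical signal in voxels 0-19

candidates, selected = ttest_then_lasso(X, y)
print(f"{candidates.size} voxels pass p < 0.05; LASSO keeps {selected.size}")
```

At p < 0.05, roughly 5% of the null voxels pass the filter by chance alongside the informative ones; the multivariate step then discards most of these false positives while keeping the voxels that jointly explain the labels.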

Clinical significance of the results
The regions showing significant changes in a univariate t-test play an important role in achieving highly accurate differential diagnosis when used in combination with MVPA feature reduction methods. The following discussion of the significant regions may have clinical relevance.
We showed that the most discriminative patterns were obtained when all information from the regional coherence and functional connectivity measures was combined. This may imply that different parts of the brain undergo different functional failures as a consequence of AD/MCI. Therefore, classification methods should include as much informative change as possible to achieve optimal discrimination.
One important finding of the current study is that the significant regional features depend on the data sample: our binary classification results across folds indicated that the significant features change when the cross-validation subgroups of AD and MCI subjects change. Therefore, no specific regional feature can be labeled an appropriate global biomarker for AD or MCI diagnosis. For instance, Figs 5 and 6 present an example of the statistical group-level differences between AD and CN, and between MCI and CN, for all measures in one CV fold. Regions with significant changes were mostly located in the DMN (mainly involving the prefrontal cortex, the PCu, and the PCC).
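The fold dependence of the significant voxels can be quantified by recomputing the t-test mask on each cross-validation training fold and measuring how much the masks overlap. The sketch below uses synthetic data with a weak hypothetical effect and the Jaccard index (intersection over union) as the overlap measure; both the data and the helper names `fold_masks` and `jaccard` are illustrative assumptions, not part of the study.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import StratifiedKFold


def fold_masks(X, y, p_threshold=0.05, n_splits=5, seed=0):
    """t-test masks computed separately on each CV training fold."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    masks = []
    for train_idx, _ in cv.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        _, p = ttest_ind(X_tr[y_tr == 1], X_tr[y_tr == 0], axis=0)
        masks.append(p < p_threshold)
    return masks


def jaccard(a, b):
    """Overlap of two boolean masks: |a AND b| / |a OR b|."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))          # 60 subjects x 2000 voxels (synthetic)
y = np.repeat([0, 1], 30)
X[y == 1, :100] += 0.5                   # weak hypothetical effect in 100 voxels

masks = fold_masks(X, y)
overlaps = [jaccard(a, b) for a, b in combinations(masks, 2)]
print(f"mean pairwise Jaccard overlap: {np.mean(overlaps):.2f}")
```

Even though the training folds share most of their subjects, the masks differ noticeably, because borderline and chance-level voxels enter and leave the mask from fold to fold.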

Limitations and future perspectives
Notwithstanding the discriminative power of the proposed framework for AD and MCI, this work has several limitations. First, the limited sample size of the in-house cohort (81 AD, 132 MCI, and 152 CN), and especially of the ADNI2 cohort (33 AD, 31 MCI, and 31 CN), limited how much the algorithm could learn during the training phase. These small datasets are therefore unlikely to represent the patient population adequately, so the generalization of our results to other groups is not guaranteed.
A second limitation concerns model complexity, as our proposed voxel-wise method may require more computation and resources than methods based on regions of interest (ROIs). However, this computational and resource burden occurs only in the training phase, which can be performed offline, whereas testing involves only simple computations. Thus, from a clinical perspective, we believe this limitation is acceptable given the better accuracies obtained.
Third, our multi-measure classification framework only considers functional MRI data. However, it is expected that combining as many modalities as possible would be advantageous for the discrimination of AD and MCI from CN [71]. Accordingly, in future studies, we plan to develop a multi-modal classification framework combining multiple data sources, including structural MRI and PET data.

Conclusion
In conclusion, we demonstrated the feasibility of using rs-fMRI scans for AD/MCI prediction in individual subjects. Using a standard Alzheimer's Disease Neuroimaging Initiative cohort and an in-house AD cohort from South Korea, the proposed framework extracts the maximum amount of AD/MCI-related change from concatenations of multiple rs-fMRI biomarkers, leading to higher classification accuracies than those of the recent studies compared in Table 9. The combination of t-test-based univariate and RFE-based multivariate feature selection applied to the concatenated rs-fMRI measures provided the best discriminative performance when the selected features were used by the ELM classifier, which outperformed linear and non-linear SVM classifiers. These results may guide future studies using rs-fMRI scans to classify patients with preclinical AD or MCI.
Supporting information S1 Table. The subject IDs of the three groups of the ADNI2 cohort used in this study. (DOC)