Multimodal Discrimination of Alzheimer’s Disease Based on Regional Cortical Atrophy and Hypometabolism

Structural magnetic resonance imaging (MRI) and 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) have been widely employed in the diagnosis of both Alzheimer’s disease (AD) and mild cognitive impairment (MCI), which has led to the development of methods to distinguish AD and MCI from normal controls (NC). Synaptic dysfunction leads to a reduction in the rate of glucose metabolism in the brain and is thought to reflect AD progression. FDG-PET has the unique ability to estimate glucose metabolism, providing information on the distribution of hypometabolism. In addition, patients with AD exhibit significant neuronal loss in cerebral regions, and previous AD research has shown that structural MRI can sensitively measure cortical atrophy. In this paper, we introduce a new method to discriminate AD from NC based on complementary information obtained from FDG-PET and MRI. For accurate classification, surface-based features were employed and 12 predefined regions were selected from previous MRI and FDG-PET studies. Partial least squares-linear discriminant analysis (PLS-LDA) was employed for classification. We obtained 93.6% classification accuracy, 90.1% sensitivity, and 96.5% specificity in discriminating AD from NC. The classification scheme had an accuracy of 76.5%, a sensitivity of 46.5%, and a specificity of 89.6% for discriminating MCI from AD. Our method exhibited superior classification performance compared with single-modal approaches and yielded accuracy comparable to that of previous multimodal classification studies using MRI and FDG-PET.


Introduction
Alzheimer's disease (AD), the most common cause of dementia in the elderly, is a gradually progressive degenerative neurological disorder characterized by increased cognitive impairment, neurofibrillary tangles, characteristic degenerative pathology, and synaptic loss compared with normal aging, while mild cognitive impairment (MCI) represents an intermediate stage between normal aging and clinically probable AD [1][2][3][4][5][6]. Early diagnosis of AD and distinguishing MCI from normal controls (NC) are important because effective intervention at the earlier stages of AD may delay disease onset or reduce its prevalence [7,8]. While many neuroimaging modalities have been studied to detect structural and functional changes in the brain due to AD pathology, T1-weighted volume structural magnetic resonance imaging (MRI) and 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) are widely used in the early diagnosis of AD because they capture many of the important structural and functional changes that occur as part of the pathology of AD. Decline in synaptic number, which coincides with accumulation of neurofibrillary tangles and is associated with abnormal cytoarchitecture, may appear as cortical atrophy in structural MRI [9,10]. Indeed, many studies have used structural MRI to detect AD-induced cerebral atrophy and changes in shape [11][12][13][14][15][16][17][18][19]. In addition, decreased glucose metabolism, also known as hypometabolism, is thought to result from a reduction in neuronal activity caused by neuronal death and synaptic dysfunction, and can be detected as a lower intensity by FDG-PET [20][21][22][23][24][25][26]. Many studies have employed FDG-PET to diagnose AD based on hypometabolism [22][27][28][29][30]. Since structural MRI and FDG-PET detect different aspects of neuronal changes, their complementary sensitivity to the disease might be beneficial to the early diagnosis of AD.
Indeed, it has already been reported that utilizing specific combinations of MRI and FDG-PET features can enhance classification performance compared with single-modal image features [31][32][33][34][35].
Accurate classification requires features that clearly represent AD characteristics and are robust to technical limitations. Voxel- and region of interest (ROI)-based approaches in volume space have been widely used in discriminating AD and MCI from normal controls (NCs) using MRI [11][12][13][14][15] and PET [27][28][29][30]. Voxel intensity, however, tends to be influenced by the partial volume effect (PVE) from the different brain tissues because of limited voxel resolution. Furthermore, the lack of a sound biological basis for volumetric spatial normalization leads to poor anatomical correspondence [36,37], especially in individuals with anatomical abnormalities. Surface-based approaches have been suggested as a way to overcome the limitations of voxel- and volumetric ROI-based approaches because cortical surfaces generally provide better accuracy and correspondence [37][38][39]. Many studies using MRI have extracted cortical thickness as a surface-based feature in the classification of AD and MCI. For example, Oliveira et al. [16], Querbes et al. [17], and Lerch et al. [19] used the mean cortical thickness of neuroanatomical ROIs as diagnostic features to classify AD patients. Since surface-based features are generally limited to capturing changes in the cerebral cortex, and subcortical structures such as the hippocampus are known to be particularly vulnerable during the early stages of AD, Desikan et al. [18] combined the volume of subcortical ROIs, including the amygdala and hippocampus, with mean cortical thickness.
FDG-PET is also associated with PVE issues that lead to misestimation of hypometabolism in the presence of cortical atrophy [40]. Thus, partial volume correction (PVC) is required to identify changes in true radiopharmaceutical uptake by removing the atrophy effect from glucose metabolism [41][42][43]. Park et al. [44] proposed surface-based statistical parametric mapping of PET intensity, and showed that surface-based FDG uptake is more precise and robust than voxel-based measurements with respect to PVC and spatial normalization. However, despite the advantages of cortical surface-based FDG-PET analysis, this approach has not, to the best of our knowledge, been applied to the classification of AD.
The high dimensionality of features in classification studies can be a challenge because a number of features far exceeding the number of samples significantly complicates evaluation of classifier robustness [45,46]. Therefore, dimension reduction is considered a necessary step in classification studies, and two distinct approaches, data-driven and prior-knowledge-based, are generally applied. The data-driven approach consists of region selection and dimension reduction. In region selection, discriminant regions are chosen from group comparisons by selecting significant voxels (i.e. opting for highly ranked voxels or ROIs from statistical results) [18,30,47,48]. Alternatively, dimension reduction can be applied before training a classifier. For example, the manifold harmonic transform has been used to represent vertex-wise cortical thickness data as spatial frequency components [49], and locally linear embedding, an unsupervised machine learning algorithm, has been used to transform regional features to a lower dimensional space [50]. Although data-driven approaches can achieve high classification accuracy, the results might reflect a specific training dataset rather than biologically relevant AD pathology. In contrast, features that are defined from known neurodegenerative pathology, independently of the dataset, could better support diagnosis and clinical interpretation in multimodal image classification than data-driven feature extraction. Indeed, previous studies have reported that the use of prior knowledge yields better accuracy or diagnostic performance than data-driven feature selection methods [51,52].
The objective of this paper was to combine multimodal neuroimaging features from structural MRI and FDG-PET to discriminate between AD, MCI, and NC. To avoid PVE issues in imaging space, surface-based features were extracted from both structural MRI and FDG-PET instead of volumetric features. To effectively reduce the high dimensionality of the feature space, we selected 12 anatomical areas that were frequently reported in previous neuroimaging studies as AD-associated regions. We then tested the discriminative power of each selected region. Finally, we validated the diagnostic accuracy of our method and compared it with the results of previous classification studies.

Methods and Materials
Ethics statement
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/) from over 50 sites. The institutional review boards at all participating sites approved the study, written consent was obtained from all participants, and the data were anonymized before being shared. More information can be found at http://www.adni-info.org/scientists/doc/ADNI_Protocol_Extension_A2_091908.pdf.

Data
We used the baseline imaging data of 319 subjects (71 AD, 163 MCI and 85 NC) from the ADNI database (http://adni.loni.usc.edu/) (Table 1). The datasets included standard T1-weighted images and FDG-PET images. T1-weighted images were acquired using a repeated volumetric three-dimensional (3D) magnetization-prepared rapid acquisition gradient echo (MPRAGE) sequence with varying resolution (typically 0.94×0.94 mm in-plane spatial resolution and 1.2 mm thick sagittal slices). We co-registered the corresponding repeated 3D MPRAGE images to obtain an increased signal-to-noise ratio (SNR). Only images obtained using 1.5T scanners were used in this study. FDG-PET images were acquired using Siemens, GE, or Philips PET scanners according to the ADNI protocol (http://adni.loni.usc.edu/) with multiple frames (six-frame scan for 30 minutes) of 3D data, starting approximately 30 minutes after injection of FDG (for all subjects: 197±47 MBq). The dynamic scans were reconstructed using scanner-specific algorithms, coregistered to the first frame, and averaged to create a single image.

Image processing
Cortical thickness measurement. Structural MRIs were registered to the ICBM 152 average template using a linear transformation, corrected for intensity nonuniformity artifacts, and discretely classified into white matter (WM), gray matter (GM), cerebrospinal fluid (CSF) and background using an advanced neural network classifier [53,54]. Hemispheric cortical surfaces were automatically extracted from each T1-weighted image using the Constrained Laplacian-based Automated Segmentation with Proximities (CLASP) algorithm, which reconstructs the inner cortical surface by deforming a spherical mesh onto the WM/GM boundary and then expanding the deformable model to the GM/CSF boundary [39,55]. Cortical thickness was defined using the t-link method, which captures the Euclidean distance between linked vertices [39,56]. Each individual thickness map was transformed to a surface group template using a two-dimensional (2D) surface-based registration [37], and the mean cortical thickness of 39 regions was computed using a surface-based automated anatomical labeling (AAL) template [57].
Surface-based FDG uptake. We aligned FDG-PET images to the corresponding structural MRI using a rigid body transformation, segmented the cerebellum [58] where glucose utilization is relatively preserved [59], and extracted the distribution volume ratio (DVR) image for intensity normalization. The three partial volume estimation maps of GM, WM and CSF indicating the portion of tissues within each voxel were calculated from MRI scans, and a weighted partial volume estimation map (wPVE) was calculated by the weighted sum of three partial volume estimation maps under the assumption that CSF is a non-uptake region and WM uptake is approximately one fourth that of GM [60]. The wPVE was smoothed with a 6 mm full width at half maximum (FWHM) Gaussian filter to consider the resolution of the FDG-PET image [40,44]. The intensity profile of the wPVE along the linked vertices between GM/CSF and WM/GM boundaries was derived from the volume image, and 5 equal proportions were linearly interpolated. A wPVE surface map (swPVE) was obtained by averaging the values of 6 intermediate vertices including vertices of the GM/CSF and WM/GM boundaries (Fig 1). In a similar way, the intensity profile of the DVR image along the linked vertices between GM/CSF and WM/GM boundaries was derived from the volume image and 5 equal proportions were linearly interpolated. FDG-PET surface maps (sFDGs) were obtained by averaging the values of 6 intermediate vertices including vertices of GM/CSF and WM/GM boundaries (Fig 1). The partial volume corrected sFDG (csFDG) was obtained by dividing sFDG by swPVE after diffusion smoothing with a 20 mm FWHM filter. Each csFDG was transformed to the surface template utilizing sphere-to-sphere warping surface registration and 39 regional uptake values were obtained using the AAL template [37,57].
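The arithmetic of the surface-based partial volume correction can be illustrated with a minimal Python sketch. It assumes the tissue weights stated above (GM = 1, WM = one fourth of GM, CSF = 0) and omits both smoothing steps (the 6 mm Gaussian filter and the 20 mm diffusion smoothing). The function names, the `sample` callable that returns a volume intensity at a 3D position, and the `eps` guard against division by zero are hypothetical and are introduced only for illustration; they are not the authors' implementation.

```python
from statistics import mean

# Tissue weights as described in the text: CSF has no uptake, WM is ~1/4 of GM.
GM_W, WM_W, CSF_W = 1.0, 0.25, 0.0


def weighted_pve(gm, wm, csf):
    """Weighted partial volume estimate (wPVE) of one voxel from its tissue fractions."""
    return GM_W * gm + WM_W * wm + CSF_W * csf


def profile_points(inner_xyz, outer_xyz, n_divisions=5):
    """Positions along the link from the WM/GM vertex to the GM/CSF vertex.

    Five equal divisions give six points, including both boundary vertices.
    """
    return [
        tuple(a + (b - a) * k / n_divisions for a, b in zip(inner_xyz, outer_xyz))
        for k in range(n_divisions + 1)
    ]


def surface_value(sample, inner_xyz, outer_xyz):
    """Mean intensity of a volume image over the six profile points.

    `sample` is a hypothetical callable mapping an (x, y, z) position to an intensity.
    """
    return mean(sample(p) for p in profile_points(inner_xyz, outer_xyz))


def corrected_uptake(sfdg, swpve, eps=1e-6):
    """Partial volume corrected surface uptake: csFDG = sFDG / swPVE."""
    return sfdg / max(swpve, eps)
```

For example, a voxel with 60% GM and 40% WM has a wPVE of 0.7, so a surface uptake of 1.4 at a vertex with that swPVE would be corrected to 2.0.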
Hippocampal volume and its FDG uptake. Hippocampus segmentation was performed separately using an automated method based on the graph-cuts algorithm [61] combined with atlas-based segmentation and morphological opening [62]. A priori information combining atlas-based segmentation with estimation of partial volume probabilities at each voxel was applied to define the initial hippocampal region for the graph-cuts algorithm in this framework. Morphological opening was applied to reduce errors in the graph-cuts results. The segmented hippocampal volume was normalized by the individual intracranial volume to account for differences in brain size [63]. PVC was performed by dividing the DVR image by the wPVE map of the segmented hippocampus.
Volume-based features. Automatic whole-brain segmentation into 58 regions was performed in the native space of each MRI using the AAL template [57]. We computed the volumes of 39 GM regions in the cortex and obtained their FDG uptakes using the masked segmentation of the aligned DVR image. The native MRI was used for comparison purposes, and the results were interpreted using the surface-based AAL template. Volume normalization and PVC for all regions were performed in the same way as described in the section "Hippocampal volume and its FDG uptake".
We also performed two-sample t-tests between the AD and NC groups in 40 regions and selected the top 12 regions by absolute t-value. This data-driven selection served as a comparison for our prior-knowledge-based feature selection method.
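A data-driven ranking of this kind can be sketched in a few lines of Python. The sketch assumes a Student's t statistic with pooled variance; the text does not state whether equal variances were assumed, so this choice, the function names, and the dictionary layout (region name mapped to a list of per-subject values) are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Student's two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def top_regions(ad, nc, k=12):
    """Rank regions by |t| between groups and return the k highest.

    `ad` and `nc` map each region name to the list of values for that group.
    """
    scores = {region: abs(two_sample_t(ad[region], nc[region])) for region in ad}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because the ranking is computed from the same subjects that are later classified, a selection like this can adapt to the particular dataset, which is the concern raised about data-driven feature selection in the Introduction.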

Partial least squares-linear discriminant analysis (PLS-LDA) classification
Latent variable values were derived using the regression coefficients of the partial least squares (PLS) model, which is typically used to reduce data dimensionality [72,73]. The PLS model finds the orthogonal linear combinations of the data matrix X and class vector Y that explain the covariance between X and Y. In the present study, X was formed as an N×d matrix (N: the number of subjects in a pair of diagnostic groups, d: the number of features) and Y was coded as either 0 or 1 according to the diagnostic group. The PLS model can be written as:

$$X = TP^{T} + E, \qquad Y = UQ^{T} + F$$

where T and U are the N×A matrices of latent component values for X and Y, E and F are residual error terms, P and Q are the associated normalized loading matrices, and A is the number of PLS latent components. The inner relationship of maximal covariance between the values for each latent component is given by:

$$u_a = \beta_a t_a, \qquad a = 1, \ldots, A$$

where the vectors $t_a$ and $u_a$ are the values of the a-th PLS latent component for X and Y, and $\beta_a$ is the regression coefficient for the a-th latent component. The optimal number of latent components (K) was determined by the prediction residual sum of squares algorithm [74,75].
Linear discriminant analysis (LDA) was used to generate the classification system. LDA finds the projection that maximizes the between-class variance relative to the within-class variance and then maps the data onto the resulting axes in order to maximally separate the groups in the dataset. A simple description of the LDA classifier is given as follows [76]. Suppose K is the optimal number of latent components and the vector $T = (t_1, \ldots, t_K)^{T}$ is the latent variable, assumed to have a normal distribution within class g = 0,1 (like class vector Y) with mean $\mu_g$ and covariance matrix $\Sigma_g$. In the LDA classifier, $\Sigma_g$ is assumed to be the same for all classes, i.e. $\Sigma_g = \Sigma$ for all g. Using estimates $\hat{\mu}_g$ and $\hat{\Sigma}$ in place of $\mu_g$ and $\Sigma$, the discriminant rule assigns the i-th
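For reference, the textbook form of the two-class LDA rule under a shared covariance matrix is given here. This is the standard expression, not necessarily the exact one used in this study; $T_i$ denotes the latent variable vector of the i-th subject, and the class prior estimates $\hat{\pi}_g$ are a symbol assumed for this illustration.

```latex
\delta_g(T_i) = T_i^{\top}\hat{\Sigma}^{-1}\hat{\mu}_g
  - \tfrac{1}{2}\,\hat{\mu}_g^{\top}\hat{\Sigma}^{-1}\hat{\mu}_g
  + \log\hat{\pi}_g, \qquad g \in \{0, 1\}

\hat{y}_i = \arg\max_{g \in \{0, 1\}} \delta_g(T_i)
```

With equal priors, the rule reduces to assigning a subject to the class whose mean is closer in Mahalanobis distance under $\hat{\Sigma}$.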

Validation
We used a leave-one-out cross-validation (LOOCV) strategy to determine classification performance (accuracy, sensitivity and specificity) [77]. The accuracy of a classifier was defined as the proportion of correctly classified subjects among all test outcomes, sensitivity was defined as the true positive fraction, and specificity was defined as the proportion of true negatives calculated by LOOCV. Specifically, all subjects except one were used as a training dataset to generate the classifier, and the left-out subject was then classified with that classifier. Since LOOCV was performed exactly once for each subject per comparison, there was no bias at the subject level. The predicted values of the left-out subjects mapped onto the LDA axes were used to build a receiver operating characteristic (ROC) curve, which provides an overall measure of classifier performance. Several simple logistic regression models were applied to identify the discriminant power of each of the selected regions between AD and NC subjects. No covariate was included, and the statistical P-value and area under the ROC curve (AU-ROC) were computed for each logistic regression model.
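The validation loop and the performance measures can be sketched in plain Python as follows. The classifier used here is a deliberately simple stand-in (a threshold at the midpoint between the two class means of a single feature), not the PLS-LDA classifier of this study; `fit_midpoint`, `score_midpoint`, and the other names are hypothetical. The AU-ROC is computed as the fraction of positive/negative pairs in which the positive subject receives the higher score, with ties counted as one half.

```python
from statistics import mean


def loocv_scores(x, y, fit, score):
    """Hold out each subject once, train on the rest, and score the held-out subject."""
    out = []
    for i in range(len(x)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        out.append(score(model, x[i]))
    return out


def metrics(scores, y, threshold=0.0):
    """Accuracy, sensitivity and specificity of thresholded scores (1 = positive class)."""
    pred = [1 if s > threshold else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, y))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, y))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, y))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, y))
    return {
        "accuracy": (tp + tn) / len(y),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(scores, y):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Stand-in classifier (hypothetical): midpoint between class means of one feature.
def fit_midpoint(train_x, train_y):
    m1 = mean(v for v, t in zip(train_x, train_y) if t == 1)
    m0 = mean(v for v, t in zip(train_x, train_y) if t == 0)
    return (m0 + m1) / 2, (1 if m1 >= m0 else -1)


def score_midpoint(model, value):
    midpoint, sign = model
    return sign * (value - midpoint)
```

Any model selection, such as choosing the number of PLS latent components, would need to be repeated inside each training fold for the held-out score to remain unbiased.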

Classification with multi-modal imaging features
The performance of the classification method based on the 24 selected multi-modal features (SMFs) was assessed among three clinically relevant pairs of diagnostic groups (AD/NC, AD/ MCI, and MCI/NC). Table 2 shows the results of the LOOCV of PLS-LDA in terms of accuracy, sensitivity, and specificity. With respect to the diagnosis of AD from NC, we achieved a 93.6% classification accuracy, 90.1% sensitivity, and 96.5% specificity for the SMF set. On the contrary, the best accuracy of the 24 selected single-modal features (SSF) was 87.8% when only FDG uptake was used. The ROC curves for the predicted values based on SMF or SSF are displayed in Fig 3. The AU-ROC of the SMF was 0.951, indicating an excellent diagnostic power that was better than that of the SSF. These results indicated that classification with SMF exhibited an improved performance compared with any other procedure using SSF alone, which also held true for the results of other diagnostic groups except for FDG uptake between MCI and AD (see Table 2 and Fig 3).
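The reported AD/NC figures can be checked for internal consistency against the group sizes given in the Data section (71 AD, 85 NC). The short Python sketch below recovers the subject counts implied by the rounded percentages; these counts are an inference made here for illustration and are not reported in the text, and the function name is hypothetical.

```python
def implied_counts(n, pct):
    """All integer counts k out of n whose percentage rounds to pct (one decimal place)."""
    return [k for k in range(n + 1) if round(100 * k / n, 1) == pct]


N_AD, N_NC = 71, 85  # group sizes from the Data section

tp = implied_counts(N_AD, 90.1)  # AD subjects classified as AD (sensitivity)
tn = implied_counts(N_NC, 96.5)  # NC subjects classified as NC (specificity)

# Accuracy implied by those counts, in percent with one decimal place.
accuracy = round(100 * (tp[0] + tn[0]) / (N_AD + N_NC), 1)
```

Under these assumptions, each percentage corresponds to exactly one count (64 of 71 AD and 82 of 85 NC), and the resulting accuracy of 146/156 matches the reported 93.6%.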
With respect to distinguishing MCI from NC, we obtained a classification accuracy of 69.0% (AU-ROC: 0.721) for the SMF while each classifier of the SSF achieved a classification accuracy of 66.5% (AU-ROC: 0.697) by MRI and 68.6% (AU-ROC: 0.698) by FDG-PET. There was similar accuracy among SMF and SSFs, but ROC curves showed obvious difference in their values. On the other hand, the classification accuracy of the SMF between MCI and AD was 76.5%, which was higher than the 75.6% accuracy achieved with MRI alone. Although the classification accuracy of the SSF for AD from MCI using FDG-PET alone was the same as that of the SMF, a clear disparity was noted in that SMF was superior to SSF in AU-ROC (SMF: 0.799 and SSF: 0.753).

Table 2. Comparison of classification performance among SSF and SMF. Classification accuracy (acc.), sensitivity (sens.), specificity (spec.), and area under the receiver operating characteristic curve (AU-ROC) are shown.

MRI
Table 3 shows the classification performance of three alternative feature sets: 24 data-driven features from the 12 regions selected with a two-sample t-test, 24 volume-based features from the same regions as the SMF, and 80 whole-brain features from 40 regions (the 39 regions of the surface-based AAL template plus the hippocampus). Data-driven and volume-based features yielded lower accuracy than the SMF, whereas whole-brain features yielded slightly higher accuracy in AD/MCI classification (78.2%) and MCI/NC classification (70.2%) than the SMF (76.5% and 69.0%, respectively).

Regional features
Simple logistic regression models were applied to each selected regional feature (p < 0.001, see Table 4). All of the features from structural MRI showed high beta coefficients and significant differences between groups. However, FDG uptake in four regions (inferior occipital, parahippocampal, rectus, and supramarginal gyri) showed no significant distinction between AD and NC (p > 0.05), indicating a lack of discriminant power as single features. Interestingly, this result was in disagreement with previous studies indicating that these four regions are related to neurodegenerative pathology.

Discussion
In this paper, we proposed a method for classifying AD and MCI based on cortical surface-based features obtained from structural MRI and FDG-PET. Furthermore, we used predefined regions to prevent bias related to the number of features and to data-driven feature selection. Our method achieved better diagnostic accuracy than single-modal, voxel-based, and data-driven features. Specifically, we achieved 93.6% classification accuracy, 90.1% sensitivity, and 96.5% specificity for the diagnosis of AD from NC.

Surface-based multi-modal imaging features
When multi-modal features such as structural MRI and CSF [78], FDG-PET and CSF [79], and MRI, FDG-PET, and CSF [31][32][33][34][35] are used together in the classification of AD and MCI, better performance is generally achieved compared with the use of single-modal features. Consistent with these previous studies, our multi-modal classification combining structural MRI and FDG-PET was more accurate than single-modal classification for all pairs of diagnostic groups regardless of the method of feature selection (see Tables 2 and 3).
In the present study, we employed surface-based features in our classification scheme in order to address the spatial normalization, smoothing, and PVC issues associated with voxel-based analysis. Surface-based registration seems to be a more robust method for analyzing abnormal brains [36,37]. Moreover, three-dimensional Gaussian smoothing for increasing SNR in volume space cannot be adapted to the complicated gyral pattern of human brain architecture. Because of the complicated sulcal/gyral morphology, smoothing across the cortical surface can be a more reliable method [80]. Likewise, surface-based PVC methods have the advantage of not only eliminating PVE from cortical atrophy, but also achieving high spatial accuracy due to spatial normalization and smoothing [44]. The advantages of surface-based analysis included a higher diagnostic performance than voxel-based methods, and we compared the classification accuracies of the proposed methods with voxel-based features to show that the surface-based features did indeed yield better performance under all conditions (Tables 2 and 3). Several previous studies have used multi-modal features similar to those in our study (see Table 5). However, due to discrepancies in datasets, feature extraction methods, the number of modalities, and classifiers, it may be improper to directly compare our results with these studies in terms of the advantages of surface-based features and predefined regions. We therefore also applied other classifiers to our dataset: the support vector machine (SVM), which is the classifier most often used in previous discriminant studies [81], and multi-modal imaging and multi-level characteristics with multi-classifier (M3), which incorporates features from multimodal imaging data through weighted voting [82].
While the M3 method yielded higher classification results in AD/MCI and MCI/NC classification, it is still notable that our classification method achieved higher accuracy between NC and AD, as shown in S1 Table.

Feature selection based on AD pathology
Feature selection is required to select effective features and obtain optimal accuracy in classification studies [45,46]. Proper feature selection methods can improve diagnostic performance, especially with prior knowledge of the disease [51,52]. In this study, we selected features from 12 predefined regions associated with neurodegeneration based on prior knowledge (Table 4); the diagnostic accuracy with this approach was better than that obtained with data-driven selection or with no feature selection (see Tables 2 and 3). The 12 predefined regions selected in this study are widely known to be related to neurodegenerative pathology. Neuronal damage of the orbitofrontal cortex has been examined from the viewpoint of neurofibrillary tangles, which are masses of hyper-phosphorylated tau proteins observed postmortem [83]. Some neuroimaging studies with FDG and MRI have shown that the characteristics of the orbitofrontal cortex, including the superior/inferior/medial orbital and rectus gyri, are clearly separable from those of NC [66,69,[84][85][86]. As it plays a central role in memory, the temporal lobe, which contains the hippocampus, parahippocampal gyrus, and middle temporal gyrus, exhibits the most functionally and structurally distinctive patterns in AD and MCI patients [22,24,65,70,[87][88][89]. In particular, the hippocampus and parahippocampal gyrus exhibit a strong correlation with the posterior cingulate cortex [90][91][92]. In addition, previous AD pathology studies have demonstrated hypometabolism and cortical atrophy in the posterior cingulate cortex based on this correlation [23,64,66,68,70,[93][94][95].
Along with the posterior cingulate cortex, the precuneus is among the earliest regions to show functional change in FDG studies [68,70,96], and it also shows significant atrophy in AD patients [67,71]. Impairment of language function is also clinically important in AD patients, who show synaptic loss and dysfunction in the parietotemporal cortex, including the angular gyrus, supramarginal gyrus, and inferior parietal lobule [64,66,69,70,86,95]. Based on these findings, we selected these regions for discriminating AD, MCI, and NC.
Some FDG features had lower discriminant power than others, although previous findings provide sufficient support for their biological relevance (Table 4). There are two possible explanations for the lower diagnostic performance of the four FDG regional features (inferior occipital, parahippocampal, rectus, and supramarginal gyri). First, FDG uptake appears to provide largely redundant information compared with structural features in classification methods that use multimodal imaging features [31,97]. Second, it is possible that characteristics of the dataset influence the classification performance of each individual feature. In future work, we hope that using different datasets will clarify whether these features are actually unnecessary for classification.

Methodological issues of surface-based FDG-PET
Our cortical surface-based FDG-PET analysis differed from that of Park et al. [44] in several respects. First, the virtual glucose uptake image, referred to as iPVE in Park et al. [44], was generated by smoothing the segmented GM and WM regions, which can represent the intensity reduction of FDG uptake due to PVE but does not precisely quantify the mixed tissues. Because iPVEs are generated from binary categorized images consisting of GM, WM and CSF, PVE may remain in the iPVE. Moreover, assigning the maximum value of the intensity profile to vertices on the surface may result in inexact mapping. The locations of the maximum FDG uptake and the maximum PVE are likely to differ, which might make PVC less accurate (see right side in Fig 1). To overcome this issue, we used the mean intensity instead of the maximum value. Given the disadvantages described above, it may make more sense to use swPVE for PVC rather than iPVE. The comparison between iPVE and swPVE using correlation with cortical thickness is shown in Fig 4. Indeed, swPVE showed a slightly higher correlation with cortical thickness (r = 0.792) than iPVE (r = 0.785), suggesting that it is the more reliable of the two with respect to PVE estimation.

Fig 4. Plot representing the correlation between whole brain mean value of PVE maps and cortical thickness. Both PVE maps were significantly correlated with cortical thickness. swPVE exhibited a higher correlation (r = 0.792) than that of iPVE (r = 0.785). doi:10.1371/journal.pone.0129250.g004

Limitations
The results for MCI classification were not as good as for AD classification because of the nature of the MCI cohort, which has generally heterogeneous characteristics [98]. Therefore, some studies have divided MCI subjects into subtypes, i.e. stable vs. progressive or converter vs. non-converter, based on changes in disease status [27,79,99]. While classification for distinct MCI subtypes might show better results, we did not divide the MCI group in this study because of a lack of longitudinal information for disease status in some subjects.

Acknowledgments
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_