A Subset of Cerebrospinal Fluid Proteins from a Multi-Analyte Panel Associated with Brain Atrophy, Disease Classification and Prediction in Alzheimer’s Disease

In this exploratory neuroimaging-proteomic study, we aimed to identify CSF proteins associated with AD and to test their prognostic ability for disease classification and for predicting MCI to AD conversion. Our study sample consisted of 295 subjects with baseline CSF multi-analyte panel data and MRI, downloaded from ADNI. Firstly, we tested the associations of CSF proteins (n = 83) with measures of brain atrophy, CSF biomarkers, ApoE genotype and cognitive decline. We found that several proteins (primarily CgA and FABP) were related to either brain atrophy or CSF biomarkers. In relation to ApoE genotype, a unique biochemical profile characterised by low CSF levels of Apo E was evident in ε4 carriers compared to ε3 carriers. In an exploratory analysis, 3/83 proteins (SGOT, MCP-1, IL-6r) were also found to be mildly associated with cognitive decline in MCI subjects over a 4-year period. Future studies are warranted to establish the validity of these proteins as prognostic factors for cognitive decline. For disease classification, a subset of proteins (n = 24) combined with MRI measurements and CSF biomarkers achieved an accuracy of 91.5% (Sensitivity 87.7%; Specificity 94.3%; AUC 0.95) and correctly detected 94.1% of MCI subjects progressing to AD at 12 months. The subset included FABP, CgA, MMP-2, and PPP as strong predictors in the model. Our findings suggest that the panel of proteins identified here may contain important candidates for improving the earlier detection of AD. Further targeted proteomic and longitudinal studies are required to validate these findings and establish their generalisability.


Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disorder pathologically characterised by lesions of misfolded proteins, the loss of synapses and an overall reduction in brain volume. There is accumulating evidence to suggest that the clinical symptoms of the disease are preceded by a long presymptomatic phase (~15-20 years) of abnormal β-amyloid (Aβ) aggregation in the form of extracellular senile plaques [1,2]. The neuropathology of the disease is associated with the development of neurofibrillary tangles prior to the onset of cognitive impairment and the subsequent emergence of full-blown dementia [3,4]. The failure of several clinical trials of therapeutic strategies targeting amyloid deposition has given impetus to the search for biomarkers earlier in the AD pathological cascade, prior to the development of cognitive symptoms.
One approach is to study structural neuroimaging biomarkers of AD, which have been advocated for use in early diagnosis [5] as well as for predicting disease progression in a prodromal form of the disease known as Mild Cognitive Impairment (MCI) [6]. Another rich source of biomarkers is cerebrospinal fluid (CSF), particularly concentrations of Aβ1-42, p-tau181 and t-tau, which reflect biochemical changes associated with Aβ deposition, neurofibrillary tangle formation and neuronal cell death [7,8].
Several neuroimaging studies have since found that combining MRI measures from regions affected in AD with CSF biomarkers can provide complementary information for disease classification and prediction [9,10]. Nevertheless, there remains substantial overlap in CSF biomarker concentrations between AD patients and cognitively normal (CN) individuals at increased risk of developing the disease [11]. Moreover, additional biomarkers are still required to understand the exact temporospatial relationship between Aβ deposition and tau neurodegeneration at different stages of the disease. Early genetic and in-vivo experimental studies have suggested that markers of inflammation, microglial activity and synaptic function may reflect biochemical changes associated with Aβ toxicity and tau neurodegeneration [12,13]. While some proteomic studies using multiplex platforms have identified a number of candidate proteins in AD [14][15][16], few have been validated or tested in relation to well-established neuroimaging endophenotypes of AD pathology. Discovering proteins related to established measures of disease pathology may yield biologically important peripheral signatures of mechanisms acting early in the disease.
In this study we aimed to discover CSF proteins associated with AD pathophysiology by testing the multiplex panel against established neuroimaging measures, CSF biomarkers of AD, Apolipoprotein E (ApoE) genotype and cognitive decline. Most importantly, we aimed to identify a subset of proteins from the multiplex panel and to test its diagnostic utility, together with existing AD biomarkers, for disease classification and for predicting MCI to AD conversion at follow-up.

Participants
Data used in this study were obtained from the ADNI database (adni.loni.ucla.edu). ADNI was launched by the National Institute on Aging (NIA) and is a multicenter project, supported by private pharmaceutical companies and non-profit organisations, for the development of biomarkers to monitor disease progression in MCI and AD [17]. ADNI subjects aged 55-90 were recruited from over 50 sites across the U.S. and Canada (for further information, see www.adni-info.org). Written informed consent was obtained from all participants, and ethics committee approval was obtained at each participating site. A total of 295 subjects with baseline structural imaging and multiplex CSF data were available for analysis: 142 subjects with MCI, 65 patients with AD and 88 healthy control subjects.
CSF multiplex proteomic samples were measured for levels of 159 analytes using the Human Discovery Multi-Analyte Profile (MAP) 1.0 panel and the Luminex 100 platform developed by Rules Based Medicine, Inc. (RBM) (Austin, TX) [20]. This panel uses multiplex immunoassay technology to measure a range of inflammatory, metabolic, lipid and other disease-relevant proteins. The protocol used to quantify CSF analytes is described in detail elsewhere [21,22]. Of the 159 analytes, only the 83 with <10% missing values were retained for analysis; the remaining 76 analytes were mostly below the assay detection limit or had other assay limitations. Each analyte has an individual standard curve with 6-8 reference standards, and each plate is run with 3 levels of QCs (low, medium and high) for each analyte. A total of 16 CSF samples were retested using a separate, never-before-thawed replicate aliquot on the fifth of the five 96-well plates to provide blinded test/retest quality control data. Assays are qualified based on least detectable dose, precision, cross-reactivity, dilutional linearity, spike recovery (assessment of accuracy) and test/retest performance. Cross-validation against alternative methods is reported for some assays where feasible. Further information on the processing, aliquoting and storage of analytes is described in the ADNI Biomarker Core Laboratory Standard Operating Procedures (http://adni.loni.usc.edu/wpcontent/uploads/2012/01/2011Dec28-Biomarkers-Consortium-Data-Primer-FINAL1.pdf). Further assay documentation and validation reports are available from Myriad RBM (www.myriadrbm.com). The distribution of each CSF protein was checked for normality and, where appropriate, transformed using the Box-Cox method to approximate a normal distribution.
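The two preprocessing steps above, dropping analytes with 10% or more missing values and Box-Cox transforming skewed distributions, can be sketched as follows. This is a minimal stdlib Python illustration under stated assumptions, not the ADNI/RBM pipeline: the dict-of-lists layout, the function names and the λ search grid from -2 to 2 are hypothetical, and λ is chosen by maximising the standard Box-Cox profile log-likelihood.

```python
import math
from statistics import fmean


def filter_by_missingness(table, max_missing=0.10):
    """Keep analytes whose fraction of missing values (None) is below max_missing.

    `table` maps analyte name to a list of per-subject values (hypothetical layout).
    """
    return {
        name: values
        for name, values in table.items()
        if sum(v is None for v in values) / len(values) < max_missing
    }


def boxcox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under normality (additive constants dropped)."""
    n = len(values)
    y = boxcox(values, lam)
    mu = fmean(y)
    var = sum((v - mu) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_boxcox(values, grid=None):
    """Pick lambda from a grid by maximum likelihood; return (lambda, transformed values)."""
    if grid is None:
        grid = [k / 10 for k in range(-20, 21)]  # assumed search range -2.0 .. 2.0
    best = max(grid, key=lambda lam: boxcox_loglik(values, lam))
    return best, boxcox(values, best)


# Invented example: "A" has exactly 10% missing and is dropped; "B" is log-normal-like.
panel = {
    "A": [1.0] * 9 + [None],
    "B": [math.exp(x) for x in (-2, -1, -0.5, 0, 0.5, 1, 2, 0, 0.5, -0.5)],
}
kept = filter_by_missingness(panel)
lam, transformed = fit_boxcox([v for v in kept["B"] if v is not None])
```

For log-normally distributed values such as analyte "B", the likelihood peaks at λ = 0, so the transform reduces to a log transform.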
Information regarding the biological preparation of CSF samples and the quality control criteria of the RBM Human Discovery MAP panel can be found on the ADNI websites [23,24]. A complete list of the analytes is given in S1 Table.

Magnetic Resonance Imaging Data Acquisition and Analysis
Structural MRI images (1.5 T) were acquired at multiple ADNI sites across the US and Canada based on a standardized protocol [25]. The imaging protocol included a high-resolution sagittal 3D T1-weighted MPRAGE volume (voxel size 1.1 × 1.1 × 1.2 mm³). The MPRAGE volume was acquired using a custom pulse sequence specifically designed for the ADNI study to ensure compatibility across scanners [26]. Full brain and skull coverage was required for all MR images according to previously published quality control criteria [27,28].
Image analysis was carried out using the FreeSurfer image analysis pipeline (version 5.1.0) to produce 34 regional cortical thickness and 23 subcortical volumetric measures, as previously described [29,30]. All volumetric measures were normalized by each subject's intracranial volume, while cortical thickness measures were used in their raw form [31]. Hippocampal and entorhinal cortex volumes were selected a priori as key regions reflecting AD pathology. A previously validated MRI-based marker of AD and MCI, SPARE-AD (Spatial Pattern of Abnormalities for Recognition of Early AD), was also used as a neuroimaging marker of AD; individual SPARE-AD scores, which have shown diagnostic and predictive value, were used in the analysis. Details of this method have been widely published elsewhere [9,32,33]. A complete list of regional MRI measures is given in S2 Table.
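As an illustration of how the regional features might be assembled, the sketch below assumes the simplest ratio form of intracranial volume (ICV) normalisation, dividing each subcortical volume by the subject's ICV, while passing cortical thickness values through unchanged. The record layout, region names and numbers are hypothetical, and the ratio method is an assumption; residual-based ICV correction is a common alternative.

```python
def build_mri_features(volumes, thickness, icv):
    """Return a feature dict: volumes divided by ICV (ratio method), thickness kept raw.

    volumes: dict of region -> volume in mm^3 (hypothetical region names)
    thickness: dict of region -> mean cortical thickness in mm
    icv: intracranial volume in mm^3
    """
    if icv <= 0:
        raise ValueError("intracranial volume must be positive")
    features = {f"{region}_vol_icv": vol / icv for region, vol in volumes.items()}
    features.update({f"{region}_thk": thk for region, thk in thickness.items()})
    return features


# Invented example subject
subject = build_mri_features(
    volumes={"left_hippocampus": 3600.0},
    thickness={"left_precuneus": 3.1},
    icv=1_500_000.0,
)
```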

Statistical Analysis
Firstly, the RBM panel of CSF proteins was tested in relation to regional MRI measurements (hippocampal and entorhinal volume), an MRI-based measure known as SPARE-AD score, and CSF biomarkers (Aβ1-42, P-tau181, T-tau) using a Spearman rank partial correlation test adjusted for age, gender, years of education and ApoE ε4 genotype. Secondly, CSF proteins from the RBM panel were tested in relation to ApoE polymorphisms (ε2 carriers, ε3 carriers and ε4 carriers) using a generalized linear model adjusting for age, gender and years of education. Thirdly, to test the effect of CSF proteins on longitudinal MMSE score, we used a linear mixed model approach. Global MMSE score was the response variable, and time from the baseline visit in months, CSF protein level, age, sex, years of education and ApoE ε4 genotype were included as fixed effects. Models contained a random intercept and slope. The applicability of our mixed models was assessed by comparing models with and without a random effect of data collection site, by checking the linearity of CSF proteins over time within subjects, and by examining the normality of model residuals using diagnostic plots. All models were tested in both AD patients (n = 59) and MCI subjects (n = 142) with serial MMSE measurements. Because a large number of proteins from the RBM panel were tested, we used a false discovery rate correction to account for multiple comparisons.
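The text does not name the false discovery rate procedure. The Benjamini-Hochberg step-up adjustment is a common choice (it is what R's `p.adjust(p, method = "BH")` computes), so the stdlib Python sketch below assumes it. It returns adjusted p-values in the original order, which are then compared with the chosen FDR level; the p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for four proteins, tested at an FDR of 0.05
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [qi < 0.05 for qi in q]
```

With 83 proteins, each raw p-value is scaled by up to 83/rank, which is why mild associations rarely survive correction.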
A multivariate support vector machine (SVM) algorithm was applied to the ADNI cohort, in an unbiased fashion, to distinguish AD patients from CN individuals. In particular, a linear SVM was constructed using the LIBSVM implementation [34]. The parameter C, which controls the trade-off between maximising the margin of the separating hyperplane and minimising training errors, was optimized using 5-fold cross-validation on the training set. The grid search routine suggested by Hsu et al. (2010) [35] was used to identify optimal parameter settings for differentiating AD from CN individuals. A multi-kernel learning approach for linear SVM [36] was implemented to handle variables of a different nature. A general framework for kernel methods used to integrate data from different modalities has been described in more detail previously [36][37][38].
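To make the tuning procedure concrete, the sketch below grid-searches C over the coarse grid C = 2^-5, 2^-3, ..., 2^15 recommended in the Hsu et al. guide, scoring each value by 5-fold cross-validated accuracy. It is written in stdlib Python and swaps LIBSVM for a simple Pegasos-style subgradient solver of the linear soft-margin SVM, so it is illustrative only. For linear kernels, a weighted sum of per-modality kernels is equivalent to scaling each modality's features by the square root of its weight and concatenating them; the hypothetical `combine_modalities` helper shows this with fixed, assumed weights, whereas multi-kernel learning would learn the weights.

```python
import random


def combine_modalities(blocks, weights):
    """Concatenate per-modality feature lists scaled by sqrt(weight).

    For linear kernels this gives <z, z'> = sum_m w_m <x_m, x'_m>, a weighted sum
    of modality kernels. Weights are fixed here (assumed), not learned.
    """
    out = []
    for block, w in zip(blocks, weights):
        out.extend(w ** 0.5 * v for v in block)
    return out


def train_linear_svm(X, y, C, epochs=50, seed=0):
    """Pegasos subgradient solver for a linear soft-margin SVM (stand-in for LIBSVM).

    y must contain +1/-1 labels; a constant 1.0 feature models the bias term.
    """
    rng = random.Random(seed)
    n = len(X)
    Xb = [list(x) + [1.0] for x in X]
    lam = 1.0 / (C * n)  # Pegasos regulariser equivalent to SVM cost C
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def decision(w, x):
    """Linear discriminant score; >= 0 is classed as +1."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def cv_accuracy(X, y, C, k=5, seed=0):
    """k-fold cross-validated accuracy of the linear SVM for a given C."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], C)
        correct += sum((decision(w, X[i]) >= 0) == (y[i] > 0) for i in test)
    return correct / len(X)


def grid_search_C(X, y, k=5):
    """Coarse grid C = 2^-5, 2^-3, ..., 2^15; return (best C, its CV accuracy)."""
    grid = [2.0 ** p for p in range(-5, 16, 2)]
    scores = {C: cv_accuracy(X, y, C, k) for C in grid}
    best = max(grid, key=scores.get)
    return best, scores[best]


# Invented two-modality toy data: two well-separated groups.
rng = random.Random(1)
X, y = [], []
for label, centre in ((1, 2.0), (-1, -2.0)):
    for _ in range(20):
        mri = [centre + rng.gauss(0, 0.5)]
        csf = [centre + rng.gauss(0, 0.5), rng.gauss(0, 1)]
        X.append(combine_modalities([mri, csf], [0.5, 0.5]))
        y.append(label)
best_C, acc = grid_search_C(X, y)
```

A finer grid around the best coarse value, as the Hsu et al. guide also suggests, would be the natural next step.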
To identify a subset of CSF proteins associated with AD, we adopted a recursive feature elimination (RFE) wrapper. The final subset of CSF proteins (CSF RFE subset) was then combined with CSF biomarkers and regional MRI measurements to test its utility for disease classification. Classification accuracy for each model was evaluated using ten-fold cross-validation, and accuracy, sensitivity, specificity and area under the curve (AUC) were used to compare AD vs. CN models.
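A recursive feature elimination wrapper can be sketched in the style of SVM-RFE: fit a linear SVM, drop the feature with the smallest squared weight, and repeat until the desired number of features remains. The stdlib Python sketch below assumes this criterion and one-at-a-time elimination, neither of which the text specifies, and again uses a simplified Pegasos solver in place of LIBSVM; the data are invented. Because selection uses the class labels, running it inside each cross-validation training fold avoids optimistic accuracy estimates.

```python
import random


def train_linear_svm(X, y, C=1.0, epochs=50, seed=0):
    """Pegasos subgradient solver for a linear SVM; the last weight is the bias."""
    rng = random.Random(seed)
    n = len(X)
    Xb = [list(x) + [1.0] for x in X]
    lam = 1.0 / (C * n)
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(a * b for a, b in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * a for a in w]
            if margin < 1.0:
                w = [a + eta * y[i] * b for a, b in zip(w, Xb[i])]
    return w


def svm_rfe(X, y, n_keep, C=1.0):
    """Return indices of the n_keep features surviving recursive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        Xs = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(Xs, y, C)[:-1]  # drop the bias weight
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        remaining.pop(worst)  # remove the least informative feature
    return remaining


# Invented data: feature 0 separates the classes, features 1 and 2 are noise.
rng = random.Random(3)
X, y = [], []
for label in (1, -1):
    for _ in range(30):
        X.append([2.0 * label + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
selected = svm_rfe(X, y, n_keep=1)
```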
MCI subjects were divided into those who progressed to an AD diagnosis (MCI converters, MCI-c) and those who remained clinically stable over a 12-month follow-up period (MCI non-converters, MCI-nc). Models from the AD vs. CN comparisons were then used as trained classifiers to prospectively predict MCI to AD conversion in MCI-c, as well as clinical stability at 12 months in MCI-nc. Discriminant scores from each model were used to classify MCI subjects as having either an AD-like or a CN-like phenotype. The combined model (CSF RFE subset + CSF biomarkers + regional MRI measures) was also used to predict MCI to AD conversion in moderately late MCI-c (subjects who progressed to AD between 18 and 24 months of follow-up) and late MCI-c (subjects who progressed to AD at 36 months). MCI-nc predictions were also made using the combined model for subjects who remained clinically stable at 0-12, 18-24 and 36 months of follow-up.
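The prediction step amounts to thresholding each MCI subject's AD vs. CN discriminant score and tallying agreement with the observed outcome. The sketch below assumes a threshold of 0 on the SVM decision value (scores of 0 or above are AD-like). It counts an MCI-c subject as correctly predicted when AD-like and an MCI-nc subject when CN-like. The scores and group labels are invented.

```python
def phenotype(score, threshold=0.0):
    """Map an AD-vs-CN discriminant score to a phenotype label (threshold assumed)."""
    return "AD-like" if score >= threshold else "CN-like"


def percent_correct(scores, converted, threshold=0.0):
    """Percentage of subjects whose predicted phenotype matches their outcome.

    converted=True (MCI-c): correct if AD-like; False (MCI-nc): correct if CN-like.
    """
    target = "AD-like" if converted else "CN-like"
    hits = sum(phenotype(s, threshold) == target for s in scores)
    return 100.0 * hits / len(scores)


# Invented discriminant scores for two follow-up groups
groups = {
    "MCI-c, 0-12 months": ([1.3, 0.4, 0.9, -0.2], True),
    "MCI-nc, 0-12 months": ([-0.8, 0.6, -1.1, 0.2], False),
}
summary = {name: percent_correct(s, conv) for name, (s, conv) in groups.items()}
```

At the reported rates, for example, 94.1% of the 34 early MCI-c corresponds to 32 correctly predicted subjects (32/34 ≈ 0.941).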
The R statistical software environment (v. 3.1.0; The R Foundation for Statistical Computing) was used to perform all statistical analyses.

Demographic Characteristics
Baseline demographic, cognitive, MRI and CSF biomarker characteristics of the ADNI cohort are presented by diagnostic group in Table 1. Significant between-group differences were found in hippocampal and entorhinal volume, SPARE-AD score, CSF biomarkers of AD, MMSE score and ApoE ε4 genotype. Age, gender and years of education did not differ significantly between groups.

CSF proteins from the multiplex RBM panel associated with neuroimaging markers of brain atrophy and CSF biomarkers of AD
Due to the exploratory nature of this study, we first tested the association of the entire multiplex panel of CSF proteins (n = 83) with neuroimaging and CSF biomarkers to identify candidates related to AD pathogenesis. Associations were tested using a partial Spearman rank correlation test adjusting for age, gender, years of education and ApoE ε4 genotype. Several proteins (n = 50) were associated with either neuroimaging markers of brain atrophy or CSF biomarkers of AD (Fig 1), and many proteins in this subset (n = 37) were significantly associated with both P-tau181 and T-tau CSF levels. Reduced levels of CgA were significantly associated with the neuroimaging and CSF biomarkers across all comparisons, but only the associations with hippocampal (p < 0.001) and entorhinal volume (p = 0.008) remained significant after multiple comparison correction. Increased levels of Fatty Acid Binding Protein (FABP) were most strongly associated with CSF levels of P-tau181 and T-tau, as well as with SPARE-AD score. Results from the partial Spearman rank correlation test are displayed in S3 Table. Note that most associations were mild and based on p-values uncorrected for multiple comparisons.
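A partial Spearman correlation can be computed by rank-transforming the protein, the outcome and the covariates, regressing the covariate ranks out of both ranked variables, and correlating the residuals (R's `ppcor` package, for instance, offers a Spearman partial correlation via `pcor.test(..., method = "spearman")`). The stdlib Python sketch below follows the residual approach under stated assumptions: the function names are illustrative, p-values are omitted, and it is not presented as the authors' exact implementation. The data vectors are invented.

```python
import math
from statistics import fmean


def rankdata(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def ols_residuals(y, Z):
    """Residuals of y after least-squares regression on the rows of Z plus an intercept."""
    X = [[1.0] + list(row) for row in Z]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan with partial pivoting.
    M = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[a][p] / M[a][a] for a in range(p)]
    return [yi - sum(b * xi for b, xi in zip(beta, row)) for row, yi in zip(X, y)]


def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_spearman(x, y, covariates):
    """Spearman partial correlation of x and y given covariates (list of columns)."""
    Z = list(zip(*[rankdata(c) for c in covariates]))  # covariate ranks, one row per subject
    rx = ols_residuals(rankdata(x), Z)
    ry = ols_residuals(rankdata(y), Z)
    return pearson(rx, ry)


# Invented vectors: a protein, an outcome (e.g. a regional volume) and one covariate (e.g. age)
age = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
protein = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5]
outcome = [1, 4, 1, 4, 2, 1, 3, 5, 6, 2, 3, 7]
r = partial_spearman(protein, outcome, [age])
```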

CSF proteins associated with different ApoE gene polymorphisms
CSF proteins from the RBM panel were also tested in relation to ApoE polymorphism rather than diagnostic status, comparing CSF levels across ApoE genotype groups (ε2 carriers, ε3 carriers and ε4 carriers). We found that 9 CSF proteins were associated with the overall effect of ApoE genotype (Apo E, FABP, FGF-4, IL-8, AGRP, MIF, IL-3, ANG-2 and Osteopontin). However, only CSF levels of Apo E, IL-3 and MIF were found to differ between ApoE groups; CSF levels of these proteins in each ApoE group are shown in Fig 2. The strongest overall effect was observed for CSF levels of Apo E, which survived multiple comparison correction (p = .00046; FDR-corrected p = .034). Pairwise comparisons revealed that Apo E levels were significantly lower in ε4 carriers, irrespective of diagnosis, than in ε2 carriers (t = -3.63; p < .0001), and significantly lower in ε3 carriers than in ε2 carriers (t = -2.57; p = .027).

CSF proteins related to the rate of cognitive decline on longitudinal MMSE score
We also tested the association of baseline CSF proteins with the rate of cognitive decline using three or more serial MMSE measurements. In AD patients (n = 59), no CSF protein significantly predicted longitudinal change in MMSE score. However, in MCI subjects (n = 142), three proteins (SGOT, MCP-1 and IL-6r) significantly predicted cognitive decline (Table 2). Again, these associations were mild and none remained significant after multiple comparison correction.
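For reference, a random intercept and slope model of the kind described in the Statistical Analysis can be written as below. The text lists the fixed effects but not how a protein's effect on the rate of decline is encoded. This sketch assumes a protein-by-time interaction, so that β₃ captures the association between baseline protein level and the slope of MMSE change. Here i indexes subjects and j visits, t_ij is months since baseline, P_i is the baseline CSF protein level, and c_i collects age, sex, years of education and ApoE ε4 status.

```latex
\mathrm{MMSE}_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 P_i + \beta_3 \,(P_i \times t_{ij})
  + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

Under this parameterisation, a negative β₃ would mean that higher baseline levels of the protein go with a steeper MMSE decline.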

Disease Classification
The recursive feature elimination (RFE) wrapper identified a subset of 24 CSF proteins that best distinguished AD patients from CN individuals (Table 3). Overall, including these CSF proteins from the RBM panel improved the accuracy and specificity of models. In particular, combining the CSF RFE subset with CSF biomarkers resulted in an accuracy of 84.3% and an AUC of 0.91 (Table 4). Adding the CSF RFE subset to the CSF biomarker model improved sensitivity from 70.8% to 83.1%, a statistically significant improvement (Venkatraman's test: Z = 2.94; p = .0042). Moreover, the CSF RFE subset combined with CSF biomarkers and regional MRI measures achieved an accuracy of 91.5% (SEN = 87.7%, SPE = 94.3%, AUC = 0.95), which was significantly better than using CSF biomarkers alone (Z = 2.91; p = .0036) (Fig 3). In this combined model (CSF RFE subset + CSF biomarkers + regional MRI measures), CgA, FABP, MMP-2 and PPP contributed most strongly to the detection of AD. However, the model combining regional MRI measures and CSF biomarkers gave the best result, with an accuracy of 92.2% (SEN = 85.7%, SPE = 96.4%, AUC = 0.96), although it was not significantly better than the CSF RFE subset combined with CSF biomarkers and regional MRI measures (Z = 0.38; p = 0.70). All model results are shown in Table 4.
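Because accuracy is a weighted average of sensitivity and specificity, it must lie between them. Assuming all 65 AD and 88 CN subjects contribute cross-validated predictions, the reported SEN = 87.7% and SPE = 94.3% correspond to about 57/65 and 83/88 correct classifications, giving an accuracy of 140/153 ≈ 91.5%. The stdlib Python sketch below checks that arithmetic and adds a rank-based (Mann-Whitney) AUC estimate for completeness. The counts are back-calculated from the reported percentages, not taken from the study data, and the function names are illustrative.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts (AD = positive)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return acc, sens, spec


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(score_pos > score_neg), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Counts back-calculated from the reported SEN/SPE (assumption: 65 AD, 88 CN evaluated)
acc, sens, spec = confusion_metrics(tp=57, fn=8, tn=83, fp=5)
# The same accuracy written as the group-size-weighted average of sensitivity and specificity
weighted = (sens * 65 + spec * 88) / 153
```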

MCI to AD conversion prediction
Over a follow-up period of 36 months, 72 MCI subjects (50.7%) in our sample converted to an AD diagnosis. Table 5 shows the number of MCI subjects predicted as either AD-like or CN-like at the 12-month follow-up using each AD vs. CN model. Firstly, we tested all models in early MCI-c, i.e. MCI subjects who progressed to an AD diagnosis between 0 and 12 months (n = 34). The combination of the CSF RFE subset with CSF biomarkers and regional MRI measures gave the best result, correctly predicting 94.1% of MCI-c as progressing to AD, whereas the regional MRI measures and CSF biomarker model predicted only 76.5%. We therefore further tested the combined model in moderately late MCI-c (n = 26), who were correctly predicted as progressing to AD with 92.3% accuracy, and in late MCI-c (n = 12), who were correctly predicted with 82.4% accuracy. Fig 4a displays the predicted probabilities from the combined model for MCI subjects who progressed to AD, overlaid for comparison with the predicted probabilities of AD patients and CN individuals. The majority of subjects who converted to AD at the different follow-up periods already showed an AD-like phenotype at the prodromal MCI stage (p > 0.05; Kolmogorov-Smirnov test).
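The comparison of predicted-probability distributions can be illustrated with a two-sample Kolmogorov-Smirnov test. The stdlib Python sketch below computes the statistic D, the largest gap between the two empirical CDFs, and an asymptotic p-value from the Kolmogorov series with the small-sample correction given in Numerical Recipes. R's `ks.test` instead uses exact p-values for small samples without ties, so results can differ. Which groups were compared (presumably MCI-c versus AD patients) is an assumption, and the probabilities are invented.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value via the Kolmogorov series (Stephens' correction)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    total, sign, prev = 0.0, 1.0, 0.0
    for k in range(1, 101):
        term = sign * 2.0 * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) <= 1e-3 * prev or abs(term) <= 1e-8 * total:
            return min(max(total, 0.0), 1.0)
        sign, prev = -sign, abs(term)
    return 1.0  # series did not converge (d near 0): no evidence of a difference


# Invented predicted probabilities of being AD-like
mci_c = [0.91, 0.84, 0.77, 0.95, 0.62, 0.88]
ad = [0.93, 0.81, 0.97, 0.72, 0.89, 0.85]
d = ks_statistic(mci_c, ad)
p = ks_pvalue(d, len(mci_c), len(ad))
```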
In contrast, MCI-nc predictions were less accurate, ranging from 25.0% to 57.4%. Over a 36-month follow-up period, 70 MCI subjects (49.3%) remained clinically stable. The regional MRI measures model yielded the best prediction at the 12-month follow-up. For the combined model, Fig 4b shows an almost bimodal distribution of MCI-nc predicted values, with some subjects correctly predicted as CN-like and others predicted as having an AD-like phenotype. Although these subjects remained clinically stable over their respective follow-up periods, some are expected to convert in the near future, which may explain their AD-like phenotype at baseline. Further follow-up will determine whether these subjects remain clinically stable or convert to an AD diagnosis.

Data are percentages, with confidence intervals in parentheses. ACC = accuracy, SENS = sensitivity, SPE = specificity, AUC = area under the curve.
The combined model includes regional MRI measures, CSF biomarkers of AD and the CSF RFE subset of proteins (n = 24). doi:10.1371/journal.pone.0134368.t005

Discussion
In this neuroimaging-proteomic study there were a number of key findings. Firstly, we identified several CSF proteins (n = 50) related to neuroimaging phenotypes of brain atrophy and CSF biomarkers of AD, suggesting that these candidates may be related to AD pathophysiology. Secondly, a unique biochemical profile of CSF proteins, characterised by reduced levels of Apo E protein in ε4 carriers, was associated with ApoE genotype. Thirdly, some proteins (SGOT, MCP-1 and IL-6r) were related to longitudinal change in MMSE score over a 4-year period. Although the statistical effects associated with this finding were mild and no result survived multiple comparison correction, further studies are needed to determine whether these proteins may serve as prognostic factors for the rate of cognitive decline. More importantly, we showed that reducing the RBM panel to a subset of 24 CSF proteins complemented existing AD biomarkers for AD detection and MCI to AD conversion prediction. Our findings agree with some previous studies identifying a panel of candidate proteins associated with AD [15,21,39]. In particular, our first finding showed that several proteins were associated with brain atrophy and CSF biomarkers; however, CgA and FABP levels were the most consistently associated across our comparisons. Although these effects were mild, previous studies have linked these candidates to AD pathophysiology [40,41]. For instance, we found elevated levels of FABP protein to be significantly related to SPARE-AD score, which agrees with previous reports of elevated FABP levels in AD and prodromal MCI subjects [15,21,42]. CgA, whose levels were related to hippocampal and entorhinal volume in our study, has also been previously linked to early synaptic dysfunction in AD [40], reduced microglial regulation of synaptic function [43], and Aβ1-42 metabolism in CN individuals [44].
We also found that a unique biochemical profile of CSF proteins was associated with different ApoE gene polymorphisms. For instance, Apo E protein and IL-3 levels were reduced in ε4 carriers, whilst MIF protein levels were elevated. Previous studies have also reported a peripheral CSF signature associated with ApoE genotype [44], and similar findings have been observed in blood plasma [45]. Moreover, many of the CSF candidates previously described in the literature (e.g. FABP, FGF-4, IL-8, AGRP, ANG-2 and Osteopontin) also showed mild associations with ApoE genotype, suggesting that the biological variability of proteins identified in AD cases may be partly driven by genotype status.
For differentiating between AD and CN individuals, we found that a subset of proteins (n = 24) from the RBM multiplex panel improved the accuracy and performance of models, but did not achieve a better accuracy than the combination of regional MRI measures and CSF biomarkers. Nevertheless, the combination of regional MRI measures, CSF biomarkers (Aβ1-42, T-tau and P-tau) and the CSF RFE subset achieved an accuracy of 91.5%. In this subset, four proteins, namely CgA, FABP, MMP-2 and PPP, were the strongest predictors for distinguishing AD from CN individuals. This agrees with previous studies showing that CSF candidates identified using immunoassay panel technology can complement CSF biomarkers of AD for the earlier detection of AD [21,46]. Recent neuroimaging-proteomic studies have also shown several proteins to be associated with longitudinal rates of brain atrophy [39] as well as whole brain atrophy [47]. To our knowledge, this is the first study to test whether CSF proteins from an immunoassay panel can complement CSF biomarkers and regional MRI measures for disease classification and prediction. Previous studies have suggested that conventional imaging, such as MRI, combined with biomarkers from different modalities may be complementary for the early and specific diagnosis of AD [48,49]. Several studies have reported that this combined approach improved AD classification [10,36] and MCI to AD conversion prediction [9,50].
For MCI to AD conversion prediction, the model combining the CSF RFE subset, CSF biomarkers and regional MRI measures also gave the best results and outperformed all other models. The model was particularly sensitive in correctly identifying MCI-c with an AD-like phenotype (Fig 4a), suggesting that our panel of 24 proteins may also have prognostic potential for detecting prodromal AD. However, the model failed to correctly identify many MCI-nc, with a large proportion predicted as AD-like, suggesting a lack of specificity. Several previous studies of MCI to AD conversion prediction using similar high-dimensional pattern classification algorithms have also noted the heterogeneity of the MCI-nc group [10,51]. It is anticipated that many MCI-nc will convert to AD in the near future. Although future studies with longer follow-up will refine our estimates of specificity, the ability to identify an AD-like phenotype in MCI subjects many years before clinical diagnosis could provide a useful tool for earlier diagnosis.
Despite some promising results, our findings have a number of limitations. Firstly, although we identified a number of CSF proteins showing promise for AD detection and MCI to AD conversion prediction, our results are limited by the inability to validate these candidates in an independent cohort. Future studies are therefore warranted to explore the prognostic potential of the candidates identified here in other well-characterised prospective cohorts. Nonetheless, we show that the panel of CSF proteins used for detecting AD also has good prognostic potential for detecting AD at the prodromal or amnestic MCI stage. Secondly, the CSF proteins identified in this study came from a multiplex panel of proteins known to be associated with microglial activity and synaptic function. It is possible that an alternative set of CSF proteins unrelated to these processes could also show strong effects in detecting AD and predicting MCI to AD conversion.
In summary, the relation of CSF proteins to key neuroimaging phenotypes and traditional CSF biomarkers provides some evidence of their importance in reflecting early neuropathological changes in AD pathogenesis. Combining a subset of proteins (n = 24) from the RBM multiplex panel with established AD biomarkers provides further evidence that peripheral CSF proteins can improve the accuracy and prognostic ability of biomarkers for disease classification and progression. Future studies are warranted to validate our findings and establish their generalisability in other well-characterised independent cohorts.

Supporting Information
S1 Table. Complete list of analytes from the RBM multiplex panel used for analysis. (DOCX)
S2 Table. Complete list of regional MRI measures from the FreeSurfer image analysis pipeline used for analysis. (DOCX)
S3 Table. Baseline CSF proteins that were significantly associated with regional MRI measures, SPARE-AD score or CSF biomarkers in AD patients and MCI subjects (n = 207). (DOCX)