Can Genetic Analysis of Putative Blood Alzheimer’s Disease Biomarkers Lead to Identification of Susceptibility Loci?

Although 24 Alzheimer’s disease (AD) risk loci have been reliably identified, a large portion of the predicted heritability for AD remains unexplained. It is expected that additional loci of small effect will be identified with an increased sample size. However, the cost of a significant increase in case-control sample size is prohibitive. The current study tests whether exploring the genetic basis of endophenotypes, in this case based on putative blood biomarkers for AD, can accelerate the identification of susceptibility loci using modest sample sizes. Each endophenotype was used as the outcome variable in an independent GWAS. Endophenotypes were based on circulating concentrations of proteins that contributed significantly to a published blood-based predictive algorithm for AD. Endophenotypes included Monocyte Chemoattractant Protein 1 (MCP1), Vascular Cell Adhesion Molecule 1 (VCAM1), Pancreatic Polypeptide (PP), Beta2 Microglobulin (B2M), Factor VII (F7), Adiponectin (ADN) and Tenascin C (TN-C). Across the seven endophenotypes, 47 SNPs were associated with outcome with a p-value ≤ 1×10⁻⁷. Each signal was further characterized with respect to known genetic loci associated with AD. Signals for several endophenotypes were observed in the vicinity of CR1, MS4A6A/MS4A4E, PICALM, CLU, and PTK2B. The strongest signal was observed in association with Factor VII levels and was located within the F7 gene. Additional signals were observed in MAP3K13, ZNF320, ATP9B and TREM1. Conditional regression analyses suggested that the SNPs contributed to variation in protein concentration independent of AD status. The identification of two putatively novel AD loci (in the Factor VII and ATP9B genes), which have not been located in previous studies despite massive sample sizes, highlights the benefits of an endophenotypic approach for resolving the genetic basis of complex diseases. The coincidence of several of the endophenotypic signals with known AD loci may point to novel genetic interactions and should be further investigated.


Introduction
All of the common loci that have been linked to late onset Alzheimer's disease (AD) other than APOE have small effect sizes and a large portion of the predicted heritability for AD remains unidentified [1]. A number of explanations and potential sources have been postulated for this missing heritability, which is observed for many complex human diseases. Examples include rare variants with large effect sizes, epistatic interactions between multiple common alleles, inflated heritability statistics and genetic heterogeneity, among others [2].
Another approach to the identification of genes involved in Alzheimer's disease pathogenesis is to ascertain quantitative endophenotypes that are associated with AD risk and then look for genetic variants that are associated with those endophenotypes. Endophenotypes are intermediate traits that are closer to the underlying molecular mechanism than the complex phenotype, and are in principle more likely to be affected by genetic variation. Discovering the genetic and environmental factors that contribute to complex human diseases, and developing effective therapies, often requires understanding endophenotypes of the disease. For example, discovery of genetic factors contributing to coronary artery disease and the eventual development of effective therapies based on HMG-CoA reductase inhibition was made possible by understanding the endophenotype of hypercholesterolemia [3]. Potential endophenotypes of Alzheimer's disease include quantitative neuroimaging, such as measures of hippocampal atrophy [4][5][6], or levels of amyloid or tau proteins in the brain or cerebrospinal fluid (CSF) [7][8][9][10]. An additional and still evolving source of AD biomarkers is the pool of circulating proteins in the blood [11][12][13][14].
Our objective in this project was to identify the genetic variants that impact concentrations of proteins associated with diagnostic status for Alzheimer's disease. It was expected that genotypes of some variants would be correlated with protein levels and AD status while others would be correlated with protein levels alone. We used conditional regression analysis to assess the relationship between AD risk, biomarkers, SNPs and non-genetic risk factors.

Materials and Methods
Study Cohorts: TARCC and ADNI
TARCC (Texas Alzheimer's Research and Care Consortium) methodologies have been described in detail elsewhere [15]. Criteria for categorizing subjects as probable AD, mild cognitive impairment (MCI) or normal control (NC) are based on neurocognitive evaluations, family and/or caregiver interviews and medical history. NC subjects must have normal psychometric test scores and a clinical dementia rating (CDR) score of 0. MCI subjects are classified based on the Mayo Clinic Alzheimer's Disease Research Criteria [16]. Patients are characterized as probable AD according to the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) and the Alzheimer's Disease and Related Disorders Association (ADRDA) criteria [17]. Each site that enrolled participants operates with Institutional Review Board (IRB) approval, and each of the following IRBs approved this study: University of North Texas Health Science Center IRB, University of Texas Southwestern Medical Center IRB, Baylor College of Medicine IRB, University of Texas Health Science Center at San Antonio IRB, and Texas Tech University Health Sciences Center IRB. Written informed consent was obtained for every participant at the site of enrollment. Data that were used in this study as a validation set were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (adni.loni.usc.edu). Details of ADNI clinical evaluation and sample characterization are described elsewhere [18,19]. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California-San Francisco. The initial goal of ADNI was to recruit 800 subjects, but ADNI has been followed by ADNI-GO and ADNI-2. To date these three protocols have recruited over 1500 adults, ages 55 to 90. The follow-up duration of each group is specified in the protocols for ADNI-1, ADNI-2 and ADNI-GO. For up-to-date information, see www.adni-info.org. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgment_List.pdf. Demographic data for the TARCC and ADNI cohorts are provided in Table 1.

Measurement of Serum/Plasma Proteins
During clinical visits, blood was drawn from each subject; both plasma and serum were collected from TARCC subjects, who were non-fasting, whereas only plasma was collected from ADNI subjects, who were fasting. Plasma and serum were isolated from whole blood samples as described previously for each cohort [15], [18,19]. Frozen specimens (serum for TARCC subjects and plasma for ADNI subjects), either from baseline or from the year-one

Genotyping
The TARCC cohort was genotyped using the Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA), which includes 906,600 SNP markers. The ADNI cohort was genotyped using the Illumina 610-Quad BeadChip (Illumina, San Diego, CA), which includes 550,000 SNP markers. Both panels provide genome-wide coverage. The BirdSeed v2 algorithm [20] was manually optimized and used for genotype calling.

Quality Control Measures
Locally developed Java programs (collectively termed MACHTools) were used to perform critical data quality checking and filtering, imputation analysis, and data restructuring to improve overall computational performance. Participants were excluded from analysis if blood protein concentration data were not available, if the recorded sex did not agree with chromosome markers, or if >5% of the markers failed to genotype successfully. Markers were excluded from analysis if genotypes were missing for ≥5% of samples, if they were monomorphic (threshold set at 0.01), or if they were out of Hardy-Weinberg equilibrium (threshold set at 0.000001). In addition, genotype calls for important markers were manually checked and recalled as necessary to account for obvious atypical hybridization intensities (such as those discussed by Didion et al. [21]). This checking was conducted on the entire sample, without knowledge of diagnostic status or phenotype. Results were limited to loci with a minor allele frequency greater than 5%.
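The marker-level filters described above can be illustrated with a short Python sketch. This is not the MACHTools code, which is not shown in this paper. The function name `marker_passes_qc`, the 0/1/2 genotype coding, the reading of the 0.01 threshold as a minor allele frequency cutoff, and the use of a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium (rather than an exact test) are all assumptions made for illustration.

```python
import math

def marker_passes_qc(genotypes, max_missing=0.05, min_maf=0.01, hwe_alpha=1e-6):
    """Return True if one marker passes the three filters (hypothetical helper).

    genotypes: list with one entry per sample, coded 0/1/2 (copies of one
    allele) or None for a failed call. Thresholds mirror the values in the text;
    the direction of each comparison is an assumption.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    # Filter 1: proportion of samples with a missing call.
    if 1 - len(called) / len(genotypes) >= max_missing:
        return False
    n = len(called)
    n_aa, n_ab, n_bb = (called.count(k) for k in (0, 1, 2))
    freq = (2 * n_bb + n_ab) / (2 * n)      # frequency of the counted allele
    # Filter 2: (near-)monomorphic markers.
    if min(freq, 1 - freq) < min_maf:
        return False
    # Filter 3: Hardy-Weinberg equilibrium, chi-square goodness of fit (1 df).
    expected = (n * (1 - freq) ** 2, 2 * n * freq * (1 - freq), n * freq ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    p_hwe = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square, 1 df
    return p_hwe >= hwe_alpha

# Toy markers (made-up counts, not study data).
in_equilibrium = [0] * 490 + [1] * 420 + [2] * 90
all_heterozygous = [1] * 1000
monomorphic = [0] * 1000
poorly_called = in_equilibrium + [None] * 100
```

In this toy input only `in_equilibrium` is retained; the other three markers each fail one of the filters. A production pipeline would more likely rely on an established tool such as PLINK for these checks.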

Data Analysis
An analysis pipeline was developed to fully analyze these GWA data in association with the quantitative RBM traits (Fig 1). Principal component analysis was performed using the EIGENSTRAT tool [22] to determine covariates for population substructure. Relevant eigenvectors were used as covariates in the analyses, along with sex and education. The following plasma/serum protein concentrations were used as quantitative phenotypes: Adiponectin, Beta 2 Microglobulin, Factor VII, Monocyte Chemotactic Protein 1, Pancreatic Polypeptide, Tenascin C, and Vascular Cell Adhesion Molecule 1 (Table 2). Additionally, age of onset and case/control status were analyzed. Preliminary linear mixed model regressions were generated for all quantitative phenotypes using PLINK [23]. Phasing, imputation (using data from the HapMap II, HapMap III and 1000 Genomes databases), and subsequent regressions with the newly imputed genotypes were performed using custom applications of the program MaCH [24]. Genotype calls at all significant SNPs were manually checked for proper clustering as described above. After re-clustering, the association regressions and imputation analyses were repeated. This iterative loop was executed three times. Lambda calculations and QQ plots were used to confirm the absence of underlying biases and/or confounders. Manhattan plots were generated for each GWA study in both TARCC and ADNI. Both Manhattan and QQ plots were generated using ggplot2 in R [25]. Final association results for typed and imputed SNPs in the TARCC and ADNI data sets were combined by meta-analysis using METAL [26]. For each quantitative trait, signals with p-values ≤ 1×10⁻⁷ in the meta-analysis were further investigated by plotting the local 1 Mbp window (+/- 500 kbp) for all three association analyses (TARCC, ADNI, and the METAL meta-analysis) using LocusZoom [27]. Only signals that were significant in the meta-analysis at p ≤ 1×10⁻⁷ and showed evidence of a significant peak in both the TARCC and ADNI cohorts were reported.
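Two of the steps above, the lambda (genomic inflation) calculation and the combination of the two cohorts, can be sketched in Python as follows. This is an illustration only, not the pipeline used in the study. The function names are hypothetical, and the sample-size-weighted Z-score scheme is assumed here because the text does not state which METAL weighting scheme was applied. The sample sizes in the usage lines are made up.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def genomic_inflation(p_values):
    """Lambda: median observed 1-df chi-square over its expected median under the null.

    Each two-sided p-value (0 < p <= 1) is converted to chi-square = z**2,
    with z = inv_cdf(p / 2).
    """
    observed = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = _STD_NORMAL.inv_cdf(0.25) ** 2  # about 0.455
    return median(observed) / expected_median

def weighted_z_meta(studies):
    """Sample-size-weighted Z-score meta-analysis for one SNP.

    studies: iterable of (two_sided_p, effect_sign, sample_size) tuples.
    Returns (combined Z, combined two-sided p).
    """
    numerator = 0.0
    sum_sq_weights = 0.0
    for p_value, effect_sign, sample_size in studies:
        direction = 1 if effect_sign >= 0 else -1
        z = -_STD_NORMAL.inv_cdf(p_value / 2) * direction
        weight = sample_size ** 0.5            # weight = sqrt(N)
        numerator += weight * z
        sum_sq_weights += weight ** 2
    z_meta = numerator / sum_sq_weights ** 0.5
    return z_meta, 2 * _STD_NORMAL.cdf(-abs(z_meta))

# Evenly spread p-values mimic a null scan, so lambda should be close to 1.
null_p = [(i + 0.5) / 10001 for i in range(10001)]
lam = genomic_inflation(null_p)

# Two hypothetical cohorts with concordant effect directions reinforce each other.
z_same, p_same = weighted_z_meta([(1e-3, +1, 400), (1e-3, +1, 600)])
# Opposite directions with equal weight cancel out.
z_opp, p_opp = weighted_z_meta([(1e-3, +1, 500), (1e-3, -1, 500)])
```

A lambda well above 1 would suggest residual stratification or genotyping artifacts. The requirement that a signal also be visible in each cohort separately is a further filter that this sketch does not implement.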

Conditional Regression Methods
Relationships between diagnostic status, each protein biomarker and its associated genotypes were assessed in a series of conditional regressions. In this set of experiments, we included only combinations of proteins and genotypes that were identified as significant in the meta-analyses of both TARCC and ADNI cohorts (Table 3). The conditional regression analyses were conducted using two analytical designs. In the first design, AD status (AD or NC) was the dependent variable, with either protein concentration or genotype as the independent variable. The residuals from this regression were then used as the dependent variable in a second regression, with whichever of genotype or protein concentration had not been used in the first regression as the independent variable. In the second design, each protein concentration was the dependent variable, with either AD status or genotype as the independent variable. The residuals from this regression were then used as the dependent variable in a second regression, with whichever of genotype or AD status had not been used in the first regression as the independent variable.
All conditional regressions were performed in R. The glm function was used for regressions when the dependent variable was continuous (protein concentration) and the lm function was used when the dependent variables were non-parametric (AD status or residuals). All initial regression equations were adjusted for sex, years of education and population substructure (the 10 most relevant eigenvectors from the principal components analysis). For these analyses, an adjusted p-value of 0.01 in either cohort, or 0.05 in both cohorts, was considered significant.
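The two-stage logic of the second design (protein concentration regressed on AD status, then the residuals regressed on genotype) can be sketched as follows. The study's analyses were run in R with covariates; this Python version is a simplified illustration that omits the covariates, uses simulated data rather than study data, and relies on hypothetical helper names (`simple_ols`, `conditional_regression`).

```python
import random

def simple_ols(y, x):
    """Least-squares fit of y = a + b*x; returns (intercept, slope, residuals)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals

def conditional_regression(outcome, first_predictor, second_predictor):
    """Stage 1: outcome ~ first_predictor. Stage 2: stage-1 residuals ~ second_predictor."""
    _, _, stage1_residuals = simple_ols(outcome, first_predictor)
    _, stage2_slope, _ = simple_ols(stage1_residuals, second_predictor)
    return stage1_residuals, stage2_slope

# Simulated data: protein level depends on both genotype and AD status.
rng = random.Random(0)
n = 500
genotype = [float(sum(rng.random() < 0.3 for _ in range(2))) for _ in range(n)]  # 0/1/2
ad_status = [float(rng.random() < 0.5) for _ in range(n)]                        # 0 = NC, 1 = AD
protein = [1.0 + 0.5 * g - 0.3 * s + rng.gauss(0.0, 0.5)
           for g, s in zip(genotype, ad_status)]

# Does genotype explain protein variation that remains after AD status is removed?
residuals, snp_slope = conditional_regression(protein, ad_status, genotype)
```

Because the simulated genotype effect is independent of AD status, the stage-2 slope stays close to the simulated value of 0.5. Swapping the roles of the variables gives the first design, although a binary outcome such as AD status would ordinarily call for a logistic rather than a linear first stage.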

Results
Quantile-quantile (QQ) plots showed no evidence of population substructure or inflation due to mistyped SNPs for any trait in either cohort (Figs 2-5). Meta-analyses showed many interesting signals (supplemental data), including a strong replication of the association between AD status and variants in the APOE/TOMM40 region. Conversely, no genome-wide significant (GWS; p < 1×10⁻⁷) associations were observed for age of onset of disease symptoms. In this paper, we focus on four associations between genetic loci and three of the seven endophenotypes analyzed. In these four instances, the association reached GWS in the meta-analyses and evidence for each of the signals was observed in both the TARCC and ADNI cohorts (Table 3). The strongest associations were found for blood concentrations of Factor VII (F7). This signal contained 12 SNPs on chromosome 13 within the F7 gene that were associated with serum/plasma concentrations of Factor VII at genome-wide significance (Fig 2). Significance within this region ranged from p = 9.66×10⁻⁷ to p = 2.67×10⁻⁸. The signal within F7 is a sharp peak that roughly corresponds to the width of the F7 gene.
Table 3. Genome-wide significant signals for each endophenotype. P-values, chromosomal and gene location are presented for each signal from the meta-analyses (Meta) and from the individual (ADNI) and (TARCC) cohorts. P-values are also shown for the association between each endophenotypic signal and age of onset (AOO) and case-control status (CC).
Seventeen SNPs on chromosome 18 were significantly associated with serum/plasma levels of monocyte chemoattractant protein-1 (MCP-1) (Fig 3). These polymorphisms were concentrated within the ATP9B gene. Significance within this signal ranged from p = 9.70×10⁻⁷ to p = 3.88×10⁻⁷. As with F7, the width of the ATP9B signal corresponds to the length and location of the ATP9B gene. Associations for many SNPs throughout the entire ATP9B gene region are elevated, forming a plateaued signal.
One of the many interesting associations that did not have support in both cohorts was between blood concentrations of MCP-1 and polymorphisms on chromosome 6 within the triggering receptor expressed on myeloid cells 1 (TREM1) gene (Fig 3). In this case, the meta-analysis showed an association that reached genome-wide significance (Table 3), but the signal was only apparent within the ADNI cohort (Fig 3).
Four SNPs on chromosome 3 were significantly associated with serum/plasma levels of adiponectin (Table 3). These polymorphisms were concentrated within the MAP3K13 gene. Significance within this signal ranged from p = 9.52×10⁻⁷ to p = 3.16×10⁻⁸. Unlike the signals within F7 and ATP9B, the MAP3K13 signal for adiponectin is much narrower than the MAP3K13 gene (Fig 4). In addition, although the signal reaches 10⁻⁸ for a pair of SNPs, there are far fewer SNPs in the MAP3K13 signal than in the F7 and ATP9B signals.
Despite the limited sample size in the present study, several previously reported associations for case-control status were replicated at p ≤ 0.05, and all published SNPs that were in the dataset showed a trend for association (Table 4). The most strongly associated SNP (rs2075650), which reached GWS for association with case-control status (Fig 5), is located within an intron of the translocase of outer mitochondrial membrane 40 (TOMM40) gene [28]. TOMM40 is in the same region as the APOE gene, but has been reported to contribute additional genetic risk for AD [28].
Conditional regression analyses recapitulated the GWAS results for F7, MCP-1 and adiponectin (Table 5). In addition, there were significant associations between AD status and blood concentrations of F7, MCP-1 and adiponectin, which were expected, given the membership of these proteins in the AD biomarker panel [12]. Finally, conditional regression analyses suggested that the SNPs contributed to variation in protein concentration independent of AD status. It was not possible to determine whether genotypes also contributed directly to disease risk.

Discussion
The use of quantitative endophenotypes as outcome variables in genome-wide association studies has proven to be useful for identifying the genetic basis of complex disease [29][30][31][32][33][34]. This method is likely to be maximally effective for diseases that exhibit significant phenotypic heterogeneity, such as Alzheimer's. The use of endophenotypes presumably provides increased statistical power due to greater proximity of the outcome variable to functional genetic variants, which reduces the impact of confounding non-genetic factors.
The strongest overall signal in the meta-analysis was between diagnostic status for Alzheimer's disease and a group of SNPs in the region of the APOE gene. Given the well-replicated strength of the APOE signal, this result was not surprising even with the small sample size that was employed in the present study. Among the endophenotypes, the strongest signal observed in the meta-analysis was between the serum/plasma concentration of F7 and a group of SNPs within the F7 gene on chromosome 13. Factor VII is a serine protease that is a key member of the coagulation cascade [35]. Along with tissue factor, F7 is responsible for initiating the coagulation cascade. The process begins with release of tissue factor from the external wall of blood vessels following vascular injury. Once inside the circulation, tissue factor binds to F7, which is converted to F7a, leading to conversion of factors IX and X into the active proteases factors IXa and Xa [35]. Factor VII is a vitamin K dependent enzyme and the target of warfarin and other anticoagulants that are used to prevent thrombosis and thromboembolism [36]. Serum concentrations of F7 were negatively associated with AD status in prior work [12]. Polymorphisms within the F7 gene have not been suggested previously as contributing to AD risk, despite multiple large-scale studies. Nevertheless, a SNP within this region (rs6046) has been associated with variation in risk for cardiovascular disease, venous thrombosis and stroke [37][38][39][40][41], conditions that are associated with risk for AD and other forms of dementia. The rs6046 polymorphism, which is located in exon 9 and is predicted to cause the substitution of glutamine in place of arginine at amino acid position 353 (R353Q), has been shown to result in reduced levels of F7 activity [39]. The haplotype containing this SNP has been reported as both protective and a risk factor for coagulation-related disease phenotypes [39][40][41]. The rs6046 SNP was associated with a later age of AD onset in our study.
Monocyte chemoattractant protein-1 (MCP-1) is one of the key chemokines involved in the regulation of monocyte and macrophage migration during the inflammatory response (see Deshmane et al. 2009 for review [42]). A variety of cells produce MCP-1, including epithelial cells, smooth muscle cells, astrocytes and microglia [42]. Increased MCP-1 has been shown to contribute to a variety of disease states, including Alzheimer's disease [43], atherosclerosis [44,45], increased risk for AD following traumatic brain injury [46], insulin resistance [47], and neuronal death following ischemia [48]. Serum concentrations of MCP-1 were negatively associated with AD status in prior work [12]. We observed a significant signal in association with MCP-1 levels located on chromosome 18 in the coding region for ATP9B (ATPase, class II, type 9B). ATP9B is a class 2 P4-ATPase. Generally speaking, the P4-ATPases orchestrate phospholipid translocation from the exoplasmic to the cytoplasmic leaflet, which is critical for the maintenance of biological membrane characteristics and protein trafficking through vesicular transport. Alterations in the functionality of this family of flippases have been associated with multiple diseases and disorders (e.g., variants in ATP8B4 have been associated with Alzheimer's disease [49,50]). The ATP9B gene product has recently been shown to function independent of the CDC50 subunit complex, a characteristic unique to the class 2 P4-ATPases, and to localize specifically to the trans-Golgi network [51]. The implied relationship between MCP-1 levels and the function of ATP9B gene products is not clear.
Table 5. Results of conditional regression analyses. For each regression equation, dependent variables are listed in the top row, independent variables in the second row. For conditional regressions (two columns to the far right) the dependent variables were the residuals from an initial regression and the independent variables were the genotypes of candidate SNPs. [Protein~Disease status] indicates that protein concentration was the dependent variable and disease status was the independent variable in the initial regression. Similarly, [Disease status~Protein] indicates that disease status was the dependent variable and protein concentration was the independent variable in the initial regression. Correlation statistics indicating the amount of variance explained by the independent variable are presented where appropriate for each regression. Results of regression analyses recapitulated the GWAS results for F7, MCP-1 and adiponectin (column one). In addition, there were significant associations between AD status and blood concentrations of F7, MCP-1 and adiponectin (column two). Conditional regression analyses suggested that candidate SNPs contributed to variation in protein concentration independent of AD status (column three). None of the results of regressions interrogating whether genotypes contributed directly to disease risk independent of protein endophenotype were significant (column four). However, due to insufficient statistical power, it was not possible to determine the true relationship between these factors.

The adipocyte-derived hormone adiponectin (also known as 30-kDa adipocyte complement-related protein; Acrp30) has been mapped to a susceptibility locus for type 2 diabetes within the AdipoQ gene [52]. AdipoQ has been shown to be dysregulated in obesity, metabolic syndrome, and cardiovascular disease [53][54][55]. Adipose tissues secrete many factors into the bloodstream, such as leptin, TNF-α, adipsin, and adiponectin. These proteins are referred to as adipocytokines, and are secreted to sensitize the tissues to insulin [52]. Levels of adiponectin in the blood are lower in individuals with diabetes, insulin resistance, and obesity [56]. Serum concentrations of adiponectin were positively associated with AD status in prior work [12].
We observed two significant signals in association with adiponectin levels. The first signal is located on chromosome 3 within the MAP3K13 gene. The MAP3K13 gene encodes a protein kinase that is expressed most strongly in the pancreas, brain, and liver, but is not detected in heart, lung, skeletal muscle, or kidney. Protein kinases are involved in a wide range of pathways; MAP3K13 specifically has been shown to be involved in the stress-activated JNK1 pathway [57]. This gene's specific involvement with adiponectin expression is unclear.
The second signal associated with adiponectin levels was located on chromosome 19 within a zinc finger gene, ZNF320. Zinc fingers are a heterogeneous class of protein structural motifs that are involved in the expression or repression of genes. Specifically, ZNF320 has been shown to be implicated in glioblastoma. The ZNF320 gene's involvement with adiponectin expression is also unclear.
Results of the current study appear to confirm the utility of an endophenotypic approach to the analysis of GWAS data from complex traits. Four novel associations were observed between genetic variants and endophenotypes for AD. Polymorphisms within the F7, ATP9B, MAP3K13 and ZNF320 genes have not been suggested previously as contributing to AD risk, despite multiple large-scale GWAS [1][58][59][60][61][62][63][64][65]. Interestingly, meta-analysis showed that two SNPs within the F7 gene (including rs6046) showed a trend for association with age at disease onset in our sample (Table 3). Unsurprisingly, given the relatively small sample, none of the SNPs were associated with diagnostic status.
In an attempt to determine the specifics of the most likely biological relationship model for AD, a series of conditional regressions were performed that assessed the relationships between diagnostic status and non-genetic factors as well as specific protein biomarkers and their associated genotypes. The majority of significant associations recapitulated either current GWAS relationships between specific SNPs and blood protein concentrations, or relationships inherent to the AD biomarker panel between blood protein concentrations and disease status. In addition, associated genotypes explained a significant amount of the variation that remained in protein concentration after disease status and covariates (sex, population substructure eigenvectors) were accounted for. It was not possible to determine conclusively whether associated SNPs also contributed directly to AD risk, independent of protein concentration.
It was interesting that several AD loci that have been reliably reported in the literature were associated with our AD endophenotypes. If these associations are confirmed in independent cohorts, they may help to explain the etiological mechanisms and functional variants that are responsible for these previously published associations.
There are a number of caveats to these results. First, although the observations reported were derived from a meta-analysis of two entirely independent cohorts of study participants, the sample sizes were small. This is particularly true in light of recent publications by the IGAP group, which were based upon an international sample of nearly 75,000 individuals. However, the analytical approach that we adopted provided much greater statistical power than would have been possible with a traditional GWAS, where diagnostic status is used as the outcome variable.
In summary, the use of endophenotypes for Alzheimer's disease in the place of diagnostic status as the outcome variable in GWAS analysis overcame sample size constraints and allowed the identification and independent replication of two putative novel genetic loci that appear to impact risk for AD. Polymorphisms in F7 and ATP9B may impact the risk or development of AD and should be studied in a larger, independent cohort.

funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. Investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. Partial support and computer resources were provided by the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill.