Alzheimer’s Disease Risk Polymorphisms Regulate Gene Expression in the ZCWPW1 and the CELF1 Loci

Late onset Alzheimer’s disease (LOAD) is a genetically complex and clinically heterogeneous disease. Recent large-scale genome wide association studies (GWAS) have identified more than twenty loci that modify risk for AD. Despite the identification of these loci, little progress has been made in identifying the functional variants that explain the association with AD risk. Thus, we sought to determine whether the novel LOAD GWAS single nucleotide polymorphisms (SNPs) alter expression of LOAD GWAS genes and whether expression of these genes is altered in AD brains. The majority of LOAD GWAS SNPs occur in gene dense regions under large linkage disequilibrium (LD) blocks, making it unclear which gene(s) are modified by the SNP. Thus, we tested for brain expression quantitative trait loci (eQTLs) between LOAD GWAS SNPs and SNPs in high LD with the LOAD GWAS SNPs in all of the genes within the GWAS loci. We found a significant eQTL between rs1476679 and PILRB and GATS, which occurs within the ZCWPW1 locus. PILRB and GATS expression levels, within the ZCWPW1 locus, were also associated with AD status. Rs7120548 was associated with MTCH2 expression, which occurs within the CELF1 locus. Additionally, expression of several genes within the CELF1 locus, including MTCH2, were highly correlated with one another and were associated with AD status. We further demonstrate that PILRB, as well as other genes within the GWAS loci, are most highly expressed in microglia. These findings together with the function of PILRB as a DAP12 receptor supports the critical role of microglia and neuroinflammation in AD risk.


Introduction
Late onset Alzheimer's disease (LOAD) is a complex, heterogeneous disease with a strong genetic component (reviewed in [1]).APOEε4 is the strongest genetic risk factor for LOAD: carrying one copy of APOEε4 increases AD risk by 3 fold and carrying two copies of APOEε4 increases AD risk by 8-10 fold (reviewed in [2]).However, only 50% of LOAD cases carry an APOEε4 allele, suggesting that other genetic factors contribute to risk for LOAD.
In the last six years, genome wide association studies (GWAS) have facilitated the analysis of millions of single nucleotide polymorphisms (SNPs) in tens of thousands of samples [3][4][5][6][7][8][9][10]. The International Genomics of Alzheimer's Project (IGAP) has recently applied this approach to LOAD case and control studies in 74,046 individuals, revealing 21 loci that modify LOAD risk: ABCA7, APOE, BIN1, CASS4, CD2AP, CD33, CELF1, CLU, CR1, EPHA1, FERMT2, HLA-DRB5/DRB1, INPP5D, MEF2C, MS4A6A, NME8, PICALM, PTK2B, SLC24A4, SORL1, and ZCWPW1 [9].The IGAP GWAS genes fall into several common pathways that have been previously implicated in AD: neural development, synapse function, endocytosis, immune response, axonal transport, and lipid metabolism (reviewed in [1]).However, the specific effects of these SNPs on gene function and the resulting impact on disease remains poorly understood [11][12][13][14].Two aspects of GWAS approaches have limited the interpretations that we can make regarding the functional impact of these SNPs on the molecular mechanisms underlying AD.First, the majority of the most significant GWAS SNPs are located in non-coding or gene-dense regions, making it challenging to identify which gene the SNP is modifying.Second, the majority of GWAS top SNPs are in high linkage disequilibrium (LD) with many SNPs, which in some cases span hundreds of kilobases, making it difficult to determine which SNP is the functional variant responsible for modifying LOAD risk.
Despite the identification of additional, novel GWAS loci that modulate LOAD risk, we still know little of the functional impact of LOAD GWAS SNPs and the role of these genes in AD pathogenesis.We sought to examine functional effects of IGAP GWAS SNPs by examining eQTLs in several human brain expression cohorts.To do this, we identified all of the genes that fell within the LD block for each IGAP GWAS locus.We then analyzed eQTLs and association with AD status.rs1476679 and rs7120548 are associated with PILRB and MTCH2 expression, respectively.Additionally, the expression of several genes within the CELF1 locus, including MTCH2, were highly correlated and were associated with AD status.Importantly, these significant eQTLs and expression differences in LOAD brains were observed in genes that occur within the IGAP GWAS loci but not the named IGAP GWAS gene.Together, our findings demonstrate that several LOAD risk variants modify expression of nearby genes and may contribute to LOAD risk.
The majority of GWAS SNPs occur in regions of high LD that span multiple genes [9].Thus, we asked whether IGAP GWAS SNPs alter expression of genes that are within the LD block rather than the genes immediately under the SNP with the highest p-value.Manhattan plots reported in Lambert et al. were used to identify all of the genes within the LD block for each IGAP GWAS SNP (Table 2) [9].Eleven of the 21 IGAP GWAS SNPs have multiple genes within the LD block.We tested whether the IGAP GWAS SNPs have functional effects on gene expression by examining all of the genes within each region.

eQTLs in AD Risk Loci
To determine whether IGAP GWAS SNPs modify expression of genes within the GWAS loci, we examined cis-eQTLs in a publically available dataset from neuropathologically confirmed normal control brains (UKBEC [20]; Table 3 and S2 Table ).Rs1476679 was significantly associated with expression of multiple PILRB transcripts in most brain regions (Table 3).Several transcripts shared between PILRB and PILRA were associated with rs1476679.However, transcripts specific to PILRA did not exhibit an eQTL with rs1476679, suggesting that the effect is driven by differences in PILRB specifically (Table 3).GATS, which is also present within the LD block of the IGAP SNP, had a single transcript that also displayed an eQTL with rs1476679 in most brain regions (Table 3).The IGAP SNP, rs9331896, was significantly associated with CLU expression in the white matter, hippocampus, temporal cortex and occipital cortex (Table 3).EQTLs were also observed between rs6656401 and CR1, CR2 and CR1L, all of which occur within the LD block for rs6656401.The IGAP SNP rs10838725, occurs in a gene dense region and several genes within this region exhibited eQTLs with the IGAP SNP: CELF1, NDUF3, KBTD4, PTPMT1, MTCH2, FNBP4, MADD and NUP160 (Table 3).IGAP SNPs rs983392, rs10792832, rs2718058, and rs7274581 exhibited eQTLs with MS4A6A, EED, GPR141, and CASS4, respectively (Table 3).Although several loci showed nominal association, only the CR1 eQTL in WHMT survived a strict multiple test correction (Bonferroni p = 3.9x10 -5 ).Our initial analyses were performed using the candidate genes manually selected from genes within the IGAP GWAS loci.We next applied an unbiased approach to determine which genes are most highly associated with the IGAP GWAS SNPs (UKBEC; S3 Table ).A subset of the HLA-DRB6, HLA-DQA1, HLA-DQB1  IGAP SNPs, rs6656401, rs9331896, rs28834970, and rs10498633, and rs190982, were associated with expression of the named IGAP gene, CR1, CLU, PTK2B, SLC24A4, and MEF2C, respectively (S3 Table ).For the majority of the IGAP SNPs, genes that occur within the LD block for the IGAP GWAS loci are among the ten most highly associated eQTLs; however, the associations failed to achieve statistical significance (S3 Table ).
In a third replication dataset containing expression and genotype information from AD and control brains (GSE15222), we were able to replicate the eQTL between rs1476679 and PILRB (p = 0.0022; Table 5; Bonferroni p = 0.003).Additionally, we replicated the eQTL observed in the UKBEC dataset between rs7120548 and MTCH2, a gene located in the CELF1 locus (p = 0.0011; Table 5) [22].In GSE15222, very few genes were present in the cleaned dataset that occur within the GWAS loci for the IGAP SNPs (e.g.CLU and CR1), making it impossible to independently replicate a subset of eQTLs (S6 Table ).
Thus, three independent datasets demonstrate that rs1476679 is associated with altered PILRB expression in multiple brain regions.Additionally, these datasets provide evidence for a much more complex picture of AD genetic risk than was previously reported in the original IGAP GWAS: (1) the majority of IGAP GWAS SNPs do not significantly affect expression of nearby genes in brain homogenates and (2) eQTLs occur in genes that are near the IGAP SNP but not that have been named as an IGAP gene.Identifying the most significant eQTL SNP within IGAP GWAS loci Because the majority of GWAS top SNPs are in high LD with many SNPs, it is difficult to determine which SNP is the functional variant responsible for modifying LOAD risk.To determine whether other SNPs within the IGAP GWAS loci more significantly contribute to eQTLs, we identified all SNPs within the IGAP GWAS loci with a p-value of 10 −5 or lower [9].We then used an unbiased approach to determine which genes are most highly associated with the SNPs within the IGAP GWAS loci (UKBEC; S7 and S8 Tables).Assuming the most stringent cut-off for multiple test correction (p = 10 −6 ) [20], we identified SNPs within the CR1, ZCWPW1, CLU, and PTK2B loci that produced significant eQTLs (S7 Table ).
To determine whether the SNPs that produce the most significant eQTL within each IGAP GWAS locus represent the same signal as the GWAS top SNP or an independent signal, we tested for the association of the IGAP GWAS top SNP with AD risk and then conditioned the analysis based on the most significant eQTL SNP within each locus (S9 Table ).Using this approach, in the ADGC subset of the IGAP dataset, we found that for each locus, the most significant eQTL SNP and the IGAP top SNP represented the same signal.Thus, while we identified SNPs within IGAP GWAS loci that produce stronger eQTLs than the IGAP top SNP, these SNPs are likely marking a single risk locus.

Expression differences in AD brains
To determine whether the named LOAD GWAS genes or genes within the GWAS loci exhibit altered expression in AD brains, we examined gene expression in a study of laser micro-dissected neurons (GSE5281; Table 6).MTCH2 expression was significantly associated with AD status, where expression levels were lower in AD cases compared with controls (p = 2.2x10 -12 and 2.9x10 -12 ; Table 6; Bonferroni p = 5x10 -4 ).PILRB expression was also associated with disease status, where PILBR expression was lower in AD cases compared with controls (p = 1.2x10 -3 ; Table 6).Expression of GATS, also within the ZCWPW1 locus, was associated with AD status in the same direction as PILRB (p = 2.1x10 -7 ; Table 6).

Cell-type specific expression of genes within the GWAS loci
Evidence from multiple, independent datasets have identified eQTLs between IGAP SNPs and PILRB and multiple genes within the CELF1 locus (including MTCH2).We have also observed altered expression levels of PILRB and GATS within the ZCWPW1 locus and MTCH2 and other genes within the CELF1 locus in AD brains.This could be due to differences in expression within a cell or to differences in the numbers of cells in which these genes are expressed.To determine whether genes within the CELF1 locus and other IGAP GWAS loci are preferentially expressed in certain cell-types in the brain, we examined a dataset containing RNAseq performed in isolated cell-types in the mouse brain (http://web.stanford.edu/group/barres_lab/brain_rnaseq.html[23]).We found that genes within the CELF1 locus are most highly expressed in non-neuronal cell-types (S11 Table ).This suggests that genes within this region act cooperatively to modify AD risk.We examined cell-type specific expression of all of the genes within the IGAP GWAS loci (S11 Table ).We found that the majority of genes within the IGAP GWAS loci are most highly expressed in microglia (37%): MEF2C, BIN1, PICALM, CD33, CSTF1, HLA-DRB1, HLA-DQA1, HLA-DQB1, RIN3, INPP5D, PILRA, SLC39A13, CASS4, and PTK2B.To a lesser extent, genes within the IGAP GWAS loci are expressed in endothelial (20%), oligodendrocytes (20%), and astrocytes (17%).Neuronally expressed genes, which have been the central focus of functional studies regarding these and other AD risk genes, only represent 11% of IGAP GWAS loci: ABCA7, MADD, CELF1, and MEF2C.These findings provide further evidence of the complex interplay between genotype, expression, and cell-type that mediates AD risk.

Discussion
Recent studies have identified novel GWAS loci that modulate LOAD risk; however, we still know little of the functional impact of LOAD GWAS SNPs and the role of these genes in AD pathogenesis.In this study, we examined functional effects of IGAP GWAS SNPs by examining eQTLs in several human brain expression cohorts.We found that rs1476679 and rs7120548 are consistently associated with PILRB and MTCH2 expression across multiple cohorts, respectively.Additionally, expression of several genes within the CELF1 locus, including MTCH2, were associated with AD status.From this study, we have generated two important findings: (1) the majority of IGAP GWAS SNPs do not significantly affect expression of nearby genes in human brain homogenates and (2) eQTLs occur in genes that are near the IGAP SNP but that are not named as an AD risk gene.
PILRB is a paired immunoglobin-like type 2 receptor that is involved in regulation of immune response [24].PILRB contains highly related activating and inhibitory receptors.PILRA is the inhibitory counterpart to PILRB.PILRB, through activation, and PILRA, through inhibition, function cooperatively to control cell signaling via SHP-1, which mediates dephosphorylation of protein tyrosine residues.PILRA and PILRB are mainly expressed by cells of the myeloid lineage [24].PILRB associates with DAP12, a signaling adaptor protein that is cleaved by γ-secretase and associates with TREM2, another AD risk gene [25][26][27][28].PILRB also contains a sialic acid binding domain, similar to the one described for CD33 [11,14,29].Rs1476679 produced an eQTL with PILRB transcripts in human brain homogenates as well as in monocytes (Table 1) [30], suggesting that this AD risk SNP may influence PILRB expression in microglia in the brain.
One hypothesis based on our observation that multiple genes within the CELF1 loci have eQTLs or are associated with AD status is that there is a key regulator within this region that is influencing the expression of many genes.MTCH2 is a mitochondrial carrier protein that induces mitochondrial depolarization [31].MTCH2 associates with truncated BID to activate apoptosis [31].MTCH2 interacts with presenilin 1 [32,33].A second mitochondrial protein that displayed some eQTL evidence and association with disease status, NDUFS3, also occurs within the CELF1 locus.NDUFS3 is a component of the NADH-ubiquinone oxidoreductase (Complex 1).NDUFS3 occurs in KEGG pathways for AD, Parkinson's disease, and Huntington's disease (KO05010, KO05012, KO05016).The third gene within this locus, NUP160, with some evidence of an eQTL and altered expression in AD brains, is a key component of the nuclear pore complex, which mediates nucleoplasmic transport.NUP160 has an extremely long half-life and is thus susceptible to oxidative and age-related damage.Age-related defects in NUP160 and the nuclear pore complex has been proposed to contribute to abnormal protein trafficking, and in turn to neurodegenerative diseases [34,35].PTPMT1 is a lipid phosphatase that dephosphorylates mitochondria proteins, which in turn regulates mitochondrial membrane integrity.PSMC3 encodes the 26S proteasomal subunit, which plays a critical role in ATP-dependent degradation of ubiquitinated proteins.MTCH2, NUP160, NDUFS3, PTPMT1, and PSMC3 expression are highly correlated in human brains and this correlation is lost in AD brains.
Our cell-type specific expression studies illustrate that the majority of the genes expressed within the IGAP GWAS loci are most highly expressed in microglia.These findings illustrate the important role of immune response and clearance in LOAD pathogenesis.This has been further supported in recent studies demonstrating that genetic variants linked with neurodegeneration are more likely to affect gene regulation in monocytes than in T cells [36].
One caveat to the study is that not all of the genes present within all of the IGAP loci were present in the cleaned expression dataset for GSE15745.Thus, we cannot exclude the possibility that other genes within the loci also have significant eQTLs with the IGAP SNPs.Most of these eQTL studies are also based on RNA extracted from brain homogenates, thus eQTLs in cells that represent a minority of cells within that tissue homogenate may not be detectable using this approach.It also remains possible that GWAS SNPs drive changes at the protein level or drive transient changes in human brains.However, our findings of several strong associations with IGAP SNPs and expression of genes that were not named as AD risk genes emphasizes that the IGAP SNPs with putative functional effects may act on genes within the GWAS loci rather than the genes immediately under the most significant IGAP SNP.

Publically available expression datasets
GSE15745.The GSE15745 dataset was obtained from control brains [21].Brains from 150 neurologically normal individuals of European descent were obtained from the Department of Neuropathology, Johns Hopkins University, Baltimore and from the Miami Brain Bank.Brain tissue was collected from the cerebellum, frontal cortex, pons and the temporal cortex.The samples were 31.3%female with a mean age of 45.8 years (range 15-101) and an average PMI of 14.3 hours.SNP genotyping was performed on DNA extracted from cerebellar tissue for each subject using Infinium HumanHap550 version 3 BeadChips.RNA expression was measured using HumanRef-8 Expression BeadChips (Illumina).To analyze RNA expression residual values were used that were log transformed and incorporated gender, age, and PMI as covariates [21].
GSE15222.The GSE15222 dataset was used to examine eQTLs [37].Neuropathologically confirmed AD (n = 176) or normal controls (n = 188) of self-identified individuals of European descent, were obtained from 20 National Alzheimer's Coordinating Center (NACC) brain banks and from the Miami Brain Bank.The 188 control brains came from one of three brain regions: 21% frontal cortex, 73% temporal cortex and 2% parietal cortex.The samples were 45% female with a mean age of 81 years (range 65-100) and an average post mortem interval (PMI) of 10 hours.The 176 LOAD brains were composed of 18% frontal cortex, 60% temporal cortex and 10% parietal cortex.The samples were 50% female with a mean age of 84 years (range 68-102) and an average PMI of 9 hours.An Affymetrix 500K chip was used to obtain genotype data, and an Illumina ref-seq 8 chip was used to obtain RNA expression data.To analyze RNA expression, residual values were used that were log transformed and then gender, APOE genotype, age, hybridization date, site, and PMI were included as covariates.
GSE5281.The GSE5281 dataset was obtained from laser microdissected neurons from AD and control brains [38].Brain samples from 47 individuals of European descent that were collected from Washington University, Duke University, and Sun Health Research Institute were included in the study.Samples were clinically and neuropathologically confirmed AD or controls.The 33 AD samples were 54.5% female with a mean age of 79.9 years (range 73-86.8)and an average PMI of 2.5 hours.The 14 control brains were 28.6% female with a mean age of 79.8 years (range 70.1-88.9).All samples were obtained from the entorhinal cortex, hippocampus, medial temporal gyrus, posterior cingulate, superior frontal gyrus, and primary visual cortex.RNA expression was measured using an Affymetrix GeneChip for gene expression.To analyze RNA expression, the log transformed expression values were analyzed with brain region, age, and gender as covariates.

IGAP LOAD GWAS
International Genomics of Alzheimer's Project (IGAP) is a large two-stage study based upon genome-wide association studies (GWAS) on individuals of European ancestry.In stage 1, IGAP used genotyped and imputed data on 7,055,881 single nucleotide polymorphisms (SNPs) to meta-analyze four previously-published GWAS datasets consisting of 17,008 Alzheimer's disease cases and 37,154 controls (The European Alzheimer's disease Initiative-EADI the Alzheimer Disease Genetics Consortium-ADGC The Cohorts for Heart and Aging Research in Genomic Epidemiology consortium-CHARGE The Genetic and Environmental Risk in AD consortium-GERAD).In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer's disease cases and 11,312 controls.Finally, a meta-analysis was performed combining results from stages 1 & 2.

Statistical analysis
Relative gene expression values were log transformed to achieve a normal distribution.To identify covariates that influence the expression of each gene, a stepwise discriminant analysis was performed using CDR, age, gender, disease status, PMI (post mortem interval), RIN (RNA integrity number), and APOE genotype.After applying the appropriate covariates to the model, analysis of covariance (ANCOVA) was used to test for association between genotypes and gene expression.SNPs were tested using an additive model.All analyses were performed using statistical analysis software (SAS).Conditional analyses were performed by adjusting for the most significant eQTL SNP within each IGAP GWAS locus to determine whether the eQTL SNP represented an independent association.Additional covariates included in the analyses were age, gender, principal components 1-3, and site.

Table 1 .
Regulatory effects of IGAP top SNPs.

Table 6 .
Expression of IGAP GWAS loci is associated with disease status in GSE5281.