Correction
11 Feb 2011: Jones L, Holmans PA, Hamshere ML, Harold D, Moskvina V, et al. (2011) Correction: Genetic Evidence Implicates the Immune System and Cholesterol Metabolism in the Aetiology of Alzheimer's Disease. PLOS ONE 6(2): 10.1371/annotation/a0bb886d-d345-4a20-a82e-adce9b047798. https://doi.org/10.1371/annotation/a0bb886d-d345-4a20-a82e-adce9b047798
Abstract
Background
Late Onset Alzheimer's disease (LOAD) is the leading cause of dementia. Recent large genome-wide association studies (GWAS) identified the first strongly supported LOAD susceptibility genes since the discovery of the involvement of APOE in the early 1990s. We have now exploited these GWAS datasets to uncover key LOAD pathophysiological processes.
Methodology
We applied a recently developed tool for mining GWAS data for biologically meaningful information to a LOAD GWAS dataset. The principal findings were then tested in an independent GWAS dataset.
Principal Findings
We found a significant overrepresentation of association signals in pathways related to cholesterol metabolism and the immune response in both of the two largest genome-wide association studies for LOAD.
Significance
Processes related to cholesterol metabolism and the innate immune response have previously been implicated by pathological and epidemiological studies of Alzheimer's disease, but it has been unclear whether those findings reflected primary aetiological events or consequences of the disease process. Our independent evidence from two large studies now demonstrates that these processes are aetiologically relevant, and suggests that they may be suitable targets for novel and existing therapeutic approaches.
Citation: Jones L, Holmans PA, Hamshere ML, Harold D, Moskvina V, Ivanov D, et al. (2010) Genetic Evidence Implicates the Immune System and Cholesterol Metabolism in the Aetiology of Alzheimer's Disease. PLoS ONE 5(11): e13950. https://doi.org/10.1371/journal.pone.0013950
Editor: Joseph El Khoury, Massachusetts General Hospital and Harvard Medical School, United States of America
Received: June 23, 2010; Accepted: October 6, 2010; Published: November 15, 2010
Copyright: © 2010 Jones et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Cardiff University was supported by the Wellcome Trust, Medical Research Council (MRC, UK), Alzheimer's Research Trust (ART) and the Welsh Assembly Government. ART supported sample collections at the Institute of Psychiatry, the South West Dementia Bank and the Universities of Cambridge, Nottingham, Manchester and Belfast. The Belfast group acknowledges support from the Alzheimer's Society, Ulster Garden Villages, Northern Ireland Research and Development Office and the Royal College of Physicians-Dunhill Medical Trust. The MRC and Mercer's Institute for Research on Ageing supported the Trinity College group. The South West Dementia Brain Bank acknowledges support from Bristol Research into Alzheimer's and Care of the Elderly. The Charles Wolfson Charitable Trust supported the Oxford Project to Investigate Memory and Ageing (OPTIMA) group. A.A-C. and C.E.S. thank the Motor Neurone Disease Association and MRC for support. D.C.R. is a Wellcome Trust Senior Clinical Research Fellow. Washington University was funded by US National Institutes of Health (NIH) grants, the Barnes Jewish Foundation and the Charles and Joanne Knight Alzheimer's Research Initiative. The Mayo GWAS was supported by NIH grants, the Robert and Clarice Smith and Abigail Van Buren AD Research Program, and the Palumbo Professorship in AD Research. Patient recruitment for the MRC Prion Unit/University College London Department of Neurodegenerative Disease collection was supported by the UCL Hospital/UCL Biomedical Centre. London and the South East Region (LASER)-AD was funded by Lundbeck. The Bonn group was supported by the German Federal Ministry of Education and Research (BMBF), Competence Network Dementia and Competence Network Degenerative Dementia, and by the Alfried Krupp von Bohlen und Halbach-Stiftung. 
The Kooperative Gesundheitsforschung in der Region Augsburg (KORA) F4 studies were financed by Helmholtz Zentrum München, the German Research Center for Environmental Health, BMBF, the German National Genome Research Network and the Munich Center of Health Sciences. The Heinz Nixdorf Recall cohort was funded by the Heinz Nixdorf Foundation (G. Schmidt, chairman) and BMBF. Coriell Cell Repositories is supported by the US National Institute of Neurological Disorders and Stroke and the Intramural Research Program (IRP) of the National Institute on Aging (NIA). Work on this sample was supported in part by the IRP of the NIA, NIH, Department of Health and Human Services; Z01 AG000950-06. The authors acknowledge use of DNA from the 1958 Birth Cohort collection, funded by the MRC and the Wellcome Trust, which was genotyped by the Wellcome Trust Case Control Consortium and the Type-1 Diabetes Genetics Consortium, sponsored by the US National Institute of Diabetes and Digestive and Kidney Diseases, National Institute of Allergy and Infectious Diseases, National Human Genome Research Institute, National Institute of Child Health and Human Development and Juvenile Diabetes Research Foundation International. The Antwerp site was supported by the VIB Genetic Service Facility, the Biobank of the Institute Born-Bunge, the Special Research Fund of the University of Antwerp, the Fund for Scientific Research-Flanders, the Foundation for Alzheimer Research and the Interuniversity Attraction Poles program P6/43 of the Belgian Federal Science Policy Office. K.S. is a postdoctoral fellow and K.B. a PhD fellow (Fund for Scientific Research-Flanders). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Profs Owen, J Williams and Dr Harold have a patent application in respect of genes identified in the GWAS Harold et al. Nature Genetics 2009;41(10):1088–93; this study provided data for this manuscript and was funded by the MRC and the Wellcome Trust. Dr Passmore has consulted for Pfizer and received compensation. Dr Hull has been funded by Wyeth and consulted for Pfizer, Wyeth and Merz; he has a patent pending for AD diagnostic tests. Dr Lynch has received travel expenses from Pfizer and Novartis. Prof Goate's work has been funded by NIH and AHAF. Prof Fox is a board member of the Alzheimer's Research Forum, has consulted for Abbott Laboratories and has received compensation for consulting for GE Healthcare. He has a patent for QA Box that did not arise from this work which contributes funds to his institution. Dr Morris is funded by NIA. Dr Livingston has received compensation from Lundbeck sa. Ms Stretton holds a CASE PhD studentship jointly funded by MRC and GSK. The remaining authors declare no potential conflict of interest. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
Introduction
Alzheimer's disease (AD) is the leading cause of dementia [1], [2] with a heritability of 56–79% [3]. It causes great social, emotional, and financial burdens to sufferers, their families and carers and there are no effective treatments that can slow or halt disease progression [4].
Genetic studies have been successful in identifying a number of causal loci (APP, PSEN1 and PSEN2) for familial early onset forms of AD and in doing so have supported the amyloid cascade hypothesis [5]. Identical amyloid pathology to that observed in early onset disease is seen in the more common late onset form of AD (LOAD), thus implying the relevance of the amyloid cascade in both forms of disease. However, genetic variation at the early onset loci has not been reliably associated with LOAD. Indeed until recently, APOE was the only genetic locus with robust support in LOAD [6]. However, two genome-wide association studies (GWAS) and their replications have recently established three novel LOAD susceptibility loci: CLU, PICALM and CR1 [7], [8], [9], [10].
Genome-wide significant SNPs in complex traits generally explain only a proportion of the heritability of that disorder [11]. Much of the residual heritability underlying common traits appears to lie in SNPs that do not achieve genome-wide significance, meaning that a substantial proportion of the associated genetic signal in current GWAS is hidden below the genome-wide significance threshold. We know that SNPs that are robustly associated with particular common disorders are not randomly distributed across all genes. Instead, the implicated genes show biologically relevant relationships with each other [12], [13], [14], [15]. This is also true for SNPs in genes for which the individual evidence for association is weaker and falls short of stringent levels of genome-wide significance, and statistical approaches have recently been developed to identify sets of functionally related genes containing genetic variants that collectively show evidence for association [14], [16]. We used the ALIGATOR algorithm [16] to examine SNPs in two AD GWAS [7], [8] for enrichment in related categories of genes. We also confirmed the results using gene set enrichment [15] and set-based analyses [17] to uncover sets of functionally related genes showing evidence for association with disease. The identification of such patterns in association datasets is likely to be crucial in moving beyond the genetic data to an understanding of function.
Materials and Methods
Data summary
The GWA studies were performed as described in Harold and colleagues [7] and Lambert and colleagues [8]. We have obtained approval to perform a genome-wide association study including 19,000 participants (Multi-centre Research Ethics Committee for Wales MREC 04/09/030; Amendment 2 and 4; approved 27th July 2007). All individuals included in these analyses have provided informed written consent to take part in genetic association studies.
Statistical analysis
Excess of SNPs passing significance thresholds.
The number N of independent SNPs in the whole genome (excluding APOE, CLU and PICALM) was estimated by the method of Moskvina & Schmidt [18], as was the observed number of independent SNPs significant at each p-value criterion. In the absence of excess association, the number of independent SNPs significant at significance level α follows a binomial distribution with parameters (N, α), with expected value Nα.
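As an illustration of the binomial comparison described above, the following Python sketch computes the upper-tail probability of observing at least a given number of significant independent SNPs. The function names and the toy numbers are assumptions made for illustration only; the estimation of N by the method of Moskvina & Schmidt [18] is not reproduced here.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def excess_p_value(observed, n_independent, alpha):
    """P(X >= observed) for X ~ Binomial(n_independent, alpha)."""
    if observed <= 0:
        return 1.0
    tail = sum(math.exp(binom_log_pmf(k, n_independent, alpha))
               for k in range(observed, n_independent + 1))
    return min(tail, 1.0)


# Toy numbers (hypothetical, not taken from Table 1).
n_independent = 10_000   # assumed number of independent SNPs
alpha = 0.001            # significance criterion
expected = n_independent * alpha
print(expected, excess_p_value(25, n_independent, alpha))
```

A small tail probability indicates more significant SNPs than chance alone would be expected to produce at that criterion.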
Pathway analyses.
ALIGATOR analysis was carried out essentially as in Holmans and colleagues [16] using gene ontology (GO) and KEGG defined functional categories [19], [20]. ALIGATOR converts a list of significant SNPs into a list of significant genes, and tests this list for enrichment within functional categories. Unlike methods designed for gene-expression data (where there is typically only one measurement per gene), ALIGATOR corrects for variable numbers of SNPs per gene. Each gene is counted once regardless of how many significant SNPs it contains, thus eliminating the influence of LD between SNPs within genes. Replicate gene lists of the same length as the original are generated by randomly sampling SNPs (thus correcting for variable gene size). The lists are used to obtain p-values for enrichment for each category and to correct these for testing multiple non-independent categories, and to test whether the number of significantly enriched categories is higher than expected. The present analysis was restricted to categories containing at least three genes: 6723 GO and 194 KEGG categories. Categories required at least two signals to be counted as enriched to remove the possibility of a small category being deemed significantly enriched based on one signal. SNPs that mapped to within 20kb of a gene (genome build 36_3) were assigned to that gene: if SNPs mapped within 20kb of more than one gene all such genes were included. Based upon the linkage disequilibrium (LD) structure of the region, 33 genes near APOE (chromosome 19: 49.6–50.6 Mb) were removed from the analysis. This was to remove the effects of genes whose evidence for association was merely a consequence of LD with the very strong APOE signal. APOE itself was included in the analysis since it is likely to be the AD susceptibility gene in this region. 
Any one SNP was not allowed to add more than one gene to any category to prevent the analysis being biased by SNPs located in multiple overlapping genes that are functionally related.
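The core resampling idea described above can be sketched in Python as follows. This is a deliberately simplified illustration and not the ALIGATOR implementation: it assumes a single chromosome, uses hypothetical gene coordinates and p-values, and omits the correction for multiple non-independent categories, the minimum-signal rule and the rule that one SNP may add only one gene to a category. All function names are invented for the example, and the add-one p-value convention is an assumption.

```python
import random


def genes_near(pos, genes, window=20_000):
    """Genes whose interval, padded by `window` bp on each side, contains pos."""
    return [name for name, (start, end) in genes.items()
            if start - window <= pos <= end + window]


def significant_genes(snps, genes, threshold=0.001):
    """Each gene is counted once, however many significant SNPs it holds."""
    hits = set()
    for pos, p_value in snps:
        if p_value < threshold:
            hits.update(genes_near(pos, genes))
    return hits


def replicate_gene_list(snps, genes, length, rng):
    """Draw SNPs at random (ignoring p-values) until `length` genes are hit."""
    chosen = set()
    order = list(snps)
    rng.shuffle(order)
    for pos, _ in order:
        for name in genes_near(pos, genes):
            if len(chosen) < length:
                chosen.add(name)
        if len(chosen) >= length:
            break
    return chosen


def enrichment_p(observed, category, snps, genes, n_rep=1000, seed=0):
    """Share of replicate lists with at least as many category genes."""
    rng = random.Random(seed)
    target = len(observed & category)
    as_extreme = 0
    for _ in range(n_rep):
        replicate = replicate_gene_list(snps, genes, len(observed), rng)
        if len(replicate & category) >= target:
            as_extreme += 1
    return (as_extreme + 1) / (n_rep + 1)  # add-one convention (assumption)


# Hypothetical toy data: gene -> (start, end) and SNP -> (position, p-value).
genes = {"GENE_A": (100_000, 150_000), "GENE_B": (300_000, 320_000),
         "GENE_C": (500_000, 900_000)}
snps = [(90_000, 0.0002), (120_000, 0.0005), (310_000, 0.4),
        (600_000, 0.3), (700_000, 0.0008), (800_000, 0.6)]
observed = significant_genes(snps, genes)
print(observed, enrichment_p(observed, {"GENE_A", "GENE_B"}, snps, genes))
```

Because replicate lists are built by sampling SNPs rather than genes, large genes with many SNPs appear in replicate lists more often, which is how the resampling accounts for gene size.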
As independent validation of the results obtained from the analysis of GO categories, we also utilised the Mouse Genome Informatics (MGI) database [21]. This contains a comprehensive catalogue of behavioural, physiological and anatomical phenotypes observed in mutant mice. Extracting phenotype data for single gene studies (excluding all transgenes), we converted mouse genes to their human orthologs using the MGI's mouse/human orthology assignment. We were able to map 5671 different phenotypic annotation terms to 6297 human genes, and the gene sets corresponding to each annotation were tested for enrichment in the Harold et al. data using ALIGATOR, as described previously.
Set-based analyses on genes and gene sets.
Two gene-wide analyses were carried out using PLINK [17]. The first was based on the most significant single-SNP p-value and the second, ‘set-based’, analysis was based on the average chi-squared statistic of all SNPs in the gene, calculated under an allelic association model. The former analysis will detect significant association in genes with a single strong signal, while the latter analysis will highlight genes with several independent signals, even if each of these is of modest significance individually. The analyses are thus complementary. Significance in each case was obtained by comparing the test statistic in the observed data to that obtained when disease status was randomly permuted among individuals, thereby accounting for inter-SNP LD. 1000 permutations were performed (10000 for genes with a gene-wide p-value<0.01). Genes without at least one SNP with p<0.05 were not analysed.
As a validation of the ALIGATOR results, set-based analysis was also performed on the set of SNPs within each of the GO processes that were significantly enriched in both GWAS datasets. 1000 permutations were used for each process. Set-based analysis is robust to LD between and within genes, as well as SNPs being in several genes.
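The two gene-wide statistics and their permutation-based significance can be sketched in Python as below. This is a minimal illustration under stated assumptions, not PLINK's implementation: genotypes are coded as minor-allele counts (0, 1 or 2), status is 1 for cases and 0 for controls, the data are invented, and the add-one p-value convention is an assumption.

```python
import random


def allelic_chi2(genotypes, status):
    """1-df chi-squared for the 2x2 table of alleles by case/control status."""
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    a = sum(g for g, s in zip(genotypes, status) if s == 1)  # minor, cases
    c = sum(g for g, s in zip(genotypes, status) if s == 0)  # minor, controls
    b = 2 * n_case - a                                       # major, cases
    d = 2 * n_ctrl - c                                       # major, controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def gene_wide_p(gene_snps, status, stat="mean", n_perm=1000, seed=0):
    """Permutation p-value for the max or mean chi-squared across a gene."""
    rng = random.Random(seed)

    def combine(values):
        return max(values) if stat == "max" else sum(values) / len(values)

    observed = combine([allelic_chi2(g, status) for g in gene_snps])
    permuted = list(status)
    hits = 0
    for _ in range(n_perm):
        # Shuffling status only keeps each individual's genotypes together,
        # so correlation (LD) between SNPs is preserved in every permutation.
        rng.shuffle(permuted)
        if combine([allelic_chi2(g, permuted) for g in gene_snps]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one convention (assumption)


# Hypothetical data: two SNPs in one gene, six individuals.
status = [1, 1, 1, 0, 0, 0]
gene_snps = [[2, 2, 2, 0, 0, 0], [2, 1, 2, 0, 1, 0]]
print(gene_wide_p(gene_snps, status, stat="max", n_perm=200),
      gene_wide_p(gene_snps, status, stat="mean", n_perm=200))
```

The "max" statistic corresponds to the test driven by the single strongest SNP, and the "mean" statistic to the test that accumulates evidence across all SNPs in the gene.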
Gene-set-enrichment (GSEA) analysis.
As a further validation of the ALIGATOR results, gene-set enrichment analysis (GSEA) was performed using the method described in Wang et al. [15]. Rather than defining a list of significant genes, GSEA ranks all genes in order of a gene-wide association statistic, and tests whether the genes in a particular gene set have higher rank overall than would be expected by chance. Following Wang et al., in order to allow for varying numbers of SNPs per gene, the gene-wide statistic used was the Simes-corrected single-SNP p-value [22]. Since apparently significant GSEA enrichments can result from a single gene that is strongly associated with disease [23], we removed the APOE region before performing the analysis.
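The Simes correction used as the gene-wide statistic can be sketched in Python as follows, together with the ranking of genes that GSEA starts from. The gene names and p-values are hypothetical, the function names are invented for the example, and the enrichment-score step of GSEA itself is not reproduced.

```python
def simes_p(p_values):
    """Simes-corrected p-value: min over i of n * p_(i) / i, p_(i) ascending."""
    ordered = sorted(p_values)
    n = len(ordered)
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))


def rank_genes(gene_snp_p):
    """Order genes from strongest to weakest gene-wide evidence."""
    gene_p = {gene: simes_p(ps) for gene, ps in gene_snp_p.items()}
    return sorted(gene_p, key=gene_p.get)


# Hypothetical single-SNP p-values per gene.
toy = {
    "GENE_A": [0.0004, 0.2, 0.6],       # small gene, one strong SNP
    "GENE_B": [0.0004] + [0.5] * 49,    # large gene, same strongest SNP
    "GENE_C": [0.03, 0.04],             # small gene, modest SNPs
}
print({gene: simes_p(ps) for gene, ps in toy.items()})
print(rank_genes(toy))
```

In this toy example the same strongest single-SNP p-value yields a weaker gene-wide statistic in the gene with more SNPs, which is the sense in which the correction allows for varying numbers of SNPs per gene.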
Results
In the GWAS study of Harold et al. [7] involving approximately 12,000 AD cases and controls, we observed a considerable excess of SNPs surpassing different thresholds of significance when compared with those expected by chance (Table 1), suggesting the existence of many LOAD susceptibility loci that were not detected at genome-wide significance. To exploit any signal arising from the excess of nominally significant SNPs in the GWAS, we used ALIGATOR [16], to identify functional categories that were enriched for association signals.
We found that in the real data, significantly more GO categories were enriched for genes containing at least one SNP surpassing varying thresholds of nominal significance compared with the simulated data (Table 2). The most significant excess in enriched GO categories was based upon a list of 589 autosomal genes defined by having at least 1 SNP with p<0.001. In that analysis, there was a significant excess of categories regardless of the threshold (p<0.05, p<0.01, p<0.001) for defining a category containing a significant excess of associated genes. This list was used to define enriched GO categories for further study [16]. However, we note that significant excesses of enriched categories were also observed for gene lists defined by other SNP association criteria and that the categories themselves were similar, suggesting the conclusions of this study are not highly sensitive to the threshold used to define nominal SNP association.
From the most significantly enriched categories in the Harold GWAS [7] (Table 3, Table S1), two main themes emerged: sterol and lipid metabolism and the immune response. Many of the top 20 categories relate to these processes and aspects of these processes are detected throughout the significant GO categories. Note that several categories show significant enrichment even after correcting for the multiple GO categories tested (study-wide p<0.05). A similar analysis was performed on the GWAS data from Lambert and colleagues, in which the same SNP threshold of p<0.001 defined a list of 423 autosomal genes. Sterol and lipid metabolism and the immune system again emerge as clear themes in the list of significantly enriched categories derived from the Lambert data (Table S2). None of the categories relating to β-amyloid (Aβ) and its processing were significant in this analysis either in the Harold (Table S1) or Lambert (Table S2) data.
In order to investigate whether we could replicate this signal we restricted enrichment analysis of the Lambert data [24] to the 173 GO processes with enrichment p<0.05 in the Harold data [7]. Of the 173 categories, twenty-five processes were also enriched for genes containing a SNP with p<0.05 in the replication dataset, a number that is significantly greater than expected (p = 0.0045). This provides evidence for a common underlying genetic association between the studies. Note that the significance of this overlap is not due to the biological areas in question being relatively well annotated since the same set of processes was tested in both the real and simulated gene lists (see Methods). Table 4 shows that these processes relate to the immune system and complement pathways and to cholesterol and lipid metabolism with one exception: cholinergic synaptic transmission. For the majority of these processes, their joint enrichment (defined as the product of the enrichment p-values in the two studies) is significant even after correction for testing multiple GO categories, thus providing strong evidence for their involvement in disease susceptibility.
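The replication filter and the joint enrichment statistic described above can be sketched in Python as follows. The category labels and p-values are hypothetical and the function name is invented for the example; assessing the significance of the overlap and correcting the joint statistic for multiple testing are not reproduced here.

```python
def replicated_categories(discovery_p, replication_p, threshold=0.05):
    """Categories below `threshold` in both studies, with the p-value product."""
    joint = {}
    for category, p_disc in discovery_p.items():
        p_rep = replication_p.get(category)
        if p_disc < threshold and p_rep is not None and p_rep < threshold:
            joint[category] = p_disc * p_rep
    # Smallest product (strongest joint enrichment) first.
    return dict(sorted(joint.items(), key=lambda item: item[1]))


# Hypothetical enrichment p-values for three categories in two studies.
discovery = {"category_1": 0.01, "category_2": 0.2, "category_3": 0.04}
replication = {"category_1": 0.02, "category_2": 0.001, "category_3": 0.3}
print(replicated_categories(discovery, replication))
```

Only categories that pass the threshold in the discovery data are carried forward, so the replication step tests a restricted, pre-specified set.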
ALIGATOR enrichment analysis was also performed on 194 KEGG [20] human pathways. Six KEGG pathways were significantly enriched (p<0.05) in both the Harold and Lambert datasets [7], [8]. This is higher than would be expected by chance (p = 1.16×10⁻³). These pathways, and their enrichment p-values, are listed in Table S3. The genes contained in the pathways, together with the p-values of the most significant SNP, are listed in Table S4. Inspection of Table S4 reveals that, in addition to CR1 and CR2 (members of pathway hsa4640: hematopoietic cell lineage), there are several genes in the HLA region contributing to the enrichment signal in both datasets. These genes may reflect the same association signal due to LD, and were therefore collapsed into one signal: when the enrichment analysis was repeated, no pathway was significantly enriched (p<0.05) in both datasets. The enrichment significance for each of the MGI mouse phenotype annotations is shown in Table S5. It can be seen that several of the most significantly enriched annotations relate to lipids, cholesterol and innate immunity, similar to the top-ranking GO categories in Tables 3 and 4.
To investigate which genes contribute to the association signals seen in the enriched GO processes identified by both GWAS, two further analyses were performed in the Harold data using PLINK [17]. First, a gene-wide correction was applied to the most significant single-SNP p-value in each gene. Second, a ‘set-based’ analysis was applied to each gene based on the average single-SNP chi-squared statistic of all SNPs in that gene. The latter analysis measures the overall association evidence across a gene, highlighting genes with multiple association signals. Results for all genes in the cholesterol-related processes listed in Table 4 are given in Table S6, and for all genes in the immune-related processes in Table S7. Gene-wide significance of genes with a SNP with p<0.001 in either study is shown for lipid-related genes in Table 5 and for immune-related genes in Table 6. As expected, most of the genes in Tables 5 and 6 show gene-wide significant association evidence (Tables S6 and S7), but other genes in these processes are also significant. Tables 5 and 6 also give the most significantly associated SNP from each gene for both studies and the r² between them. Note that the immune-related genes include both CLU, which contains a SNP showing genome-wide significant association in both GWAS, and CR1, which contains a SNP that is genome-wide significant in one study [8] and has a p-value<1×10⁻⁵ in the other [7]. It was not possible to perform gene-wide analyses on the Lambert data since individual genotypes were not available. However, the most significant p-values from the genes of interest are shown in Tables 5, S6 and S7. Similar gene-wide analyses were performed on the genes in the enriched KEGG pathways (Table S4).
Set-based and GSEA analyses were applied to each of the 25 GO processes with ALIGATOR p<0.05 in both the GWAS datasets (Table 4). GSEA analysis was applied in both Harold [7] and Lambert [8] datasets, while the set-based analysis was applied in the Harold dataset only (with the APOE region removed) since individual genotypes were not available in the Lambert dataset. Set-based analyses were also applied to the complete set of cholesterol-related genes in Table S6, and the complete set of immune-related genes in Table S7. The cholesterol-related genes gave a set-based p = 0.005, and the immune-related genes p = 0.005. After removing the SNPs giving rise to the GO signal (i.e. the most significant SNPs from the genes in Tables 5 and 6), the p-values are p = 0.009 and p = 0.007, respectively. This shows that the association signal in these genes is not restricted to a few highly-significant SNPs. GSEA analysis in the Harold dataset was significant for all of the processes except for GO:0007271 (synaptic transmission, cholinergic), with p-values very similar to those of the ALIGATOR analysis. In the Lambert dataset, all the immune-related pathways gave significant GSEA p-values, as did some of the lipid/cholesterol-related pathways. When a pathway gives significant results in ALIGATOR but not in GSEA, the likely explanation is that the genes containing SNPs with p<0.001 are large (and thus subject to a stringent Simes correction), while the remaining genes show little association evidence. In general, the set-based and GSEA analyses gave similar results to the ALIGATOR analyses, giving confidence that the results obtained by the latter reflect underlying biology.
Discussion
Our analysis of two large independent GWAS of LOAD strongly implicates genetic variation in the functions of the immune system and in lipid metabolism as causes of LOAD susceptibility. A previous analysis of the Lambert et al. data [8], [24] highlighted similar biological processes despite not showing an overall excess of enriched GO categories. It highlights potential mechanisms related to these processes that should be the subject of further detailed genetic and functional investigations. This study has implications for the interpretation of GWAS of complex disease as it demonstrates that useful biological insights may be gained from association signals below the threshold for genome-wide significance, as previously shown for the WTCCC study [16], [25] where pathways known to be related to the diseases studied were highlighted by ALIGATOR. These analyses potentially highlight non-genome-wide significant SNPs that could explain some disease heritability which current GWAS do not have the power to detect.
The power of genetic data lies in their ability to highlight primary susceptibilities to disease, that is, they illuminate aetiology. This does not mean that all genes with a nominally significant SNP in an enriched GO category are true susceptibility genes for the phenotype under consideration, rather that that category itself is likely to be relevant to aetiology since it contains an excess of nominally associated SNPs. In this context, while the Harold [7] and Lambert [8] GWAS show a remarkable overlap in processes identified by ALIGATOR [16], the signal within each category did not necessarily reflect the same set of SNPs or genes. Tables 5 and 6 show that linkage disequilibrium between the most significant SNPs from each gene in the two GWAS varies from 1 (the same SNP) to none. In a pathway analysis this is perhaps unsurprising as there are several explanations for this observation. First, although we observe an excess of associated SNPs at all significance levels (Table 1), not all SNPs that surpass nominal significance can be expected to represent true associations. Second, in a set of genes that influence disease aetiology through a common biological pathway, it is likely that a number of SNPs will be associated with disease risk and affected individuals need not have the same combination of risk alleles. Individuals may have susceptibility alleles in different genes in a pathway or multiple rare susceptibility alleles may occur in a single gene; the latter will tend to be poorly tagged in GWAS. As a consequence even fairly large studies will have modest power to detect (or replicate between studies) any one signal, as compared with the power of tests based on the whole pathway. It is therefore noteworthy that the only non-immune and non-lipid related process detected in both studies was cholinergic synaptic transmission (Table 4); boosting cholinergic transmission is the target of one of the few available therapies for AD [26].
This analysis has limitations. We used categories curated in GO and KEGG databases and phenotypes annotated in the MGI database and will not have detected signal in functional processes not represented or well annotated by those systems. We chose to use GO and KEGG to define pathways since they are publicly available in a format that enables systematic testing of all pathways simultaneously in a statistically rigorous manner. The large number of GO categories increases the chance of alignment with the unknown disease biology underlying the GWAS results and the smaller number of results provided by the KEGG analysis supports this conclusion. The power to detect enrichment is highest for well-defined processes, and is greatly reduced if biologically important gene products are incorrectly or incompletely classified, or omitted. The quality of annotation in GO is variable, since some of it is inferred electronically, although there is some evidence that the majority of such annotations are correct [27]. However, enrichment analysis of an independent set of experimentally determined annotations, the MGI mouse phenotypes, highlighted the same biological processes, thus validating the GO results. The same analysis method applied to other diseases [16] found relevant biological pathways which were different to those presented here. Thus, the significance of these results is not simply due to the immune system and lipid metabolism being relatively well annotated. Furthermore, the ALIGATOR results were validated by applying GSEA and set-based analyses to the most significantly enriched pathways. These analyses produced similar results to ALIGATOR, giving confidence that the results obtained by ALIGATOR are genuine. This is supported by a direct analysis of SNPs in lipid-pathway genes in AD [28] which showed that more SNPs in lipid pathway genes than expected showed association with AD.
There are relatively few pathways highlighted by the KEGG analysis and this is likely due to the KEGG pathways including a more restricted range of biological processes than GO: while there are KEGG pathways relating to cholesterol and bile acid biosynthesis there are no pathways relating directly to lipid efflux from and transport between cells. Lambert et al. [24] detected an enrichment with the Alzheimer's disease KEGG pathway in a GSEA analysis. However, this enrichment is likely to have been driven by the strong APOE association. We found significant enrichment of this pathway in the Lambert data when APOE was included, but not when it was removed. The KEGG pathways also tend to be large and the KEGG database does not have the hierarchical structure of the GO database that allows more specific functions to be defined. KEGG pathways with apparently similar names do not always contain similar genes to their corresponding GO categories. For example, KEGG pathway hsa4610 (complement and coagulation cascades) and GO:0006958 (complement activation, classical pathway) both relate to the complement cascade. However, hsa4610 also contains several genes that are not part of the complement cascade, making it larger than GO:0006958 (67 genes to 28) and reducing its significance in the enrichment analysis, since none of the extra genes have a SNP with p<0.001.
Cholesterol metabolism and innate immune processes have previously been implicated in AD pathogenesis [29], [30]. Epidemiological studies show that high cholesterol levels in mid-life are correlated with later dementia, and statins, which lower cholesterol levels, may have a protective effect against the development of dementia [31]. There have been trials and epidemiological surveys of the effects of anti-inflammatory treatment in AD which indicate that, although non-steroidal anti-inflammatories may have an effect on disease susceptibility, the drugs investigated so far are not a treatment for manifest disease [32]. Better targeted drugs to the parts of the immune system involved in AD susceptibility may offer new therapeutic avenues for research.
Although APOE was identified as a susceptibility factor for AD over 15 years ago [33], it is still not clear how the ε4 variant contributes to disease risk. The brain requires de novo cholesterol synthesis. This occurs in astrocytes and microglia, the cholesterol then being loaded into APOE lipoprotein particles and transported to the main cholesterol users, neurons and oligodendrocytes [34]. So while the impact of APOE is clearly of importance in AD, our data indicate that other participants in sterol metabolic processes also impact upon susceptibility. It is notable that some of these genes are not expressed in the brain, for instance LIPC, APOA1, SCARB1 and LIPG, but are important in the systemic control of sterol metabolism in the liver and blood. Some of these gene products may well be useful in providing clues for possible systemic biomarkers of disease progress.
APOE has been implicated in Aβ clearance. The lipidation state of APOE is critical to its ability to transport Aβ across the BBB, APOE4 being associated with the least efficient transport [35]. Aβ in the blood is transported in cholesterol-rich HDL particles, which have ApoA1 or ApoE as associated lipoproteins, before elimination by the liver [36]. Our data suggest that the role of APOE in cholesterol metabolism is important in AD, and may implicate the systemic clearance of Aβ-HDL through the liver, in which APOE is certainly involved, as a primary modulator of AD susceptibility [36], [37]. CLU, encoding APOJ, is associated with cholesterol transport and has been demonstrated to promote export of Aβ over the BBB [38] and thus may modulate Aβ clearance from the brain in concert with APOE.
Apart from the APOE locus, CLU, which encodes the complement activation inhibitor clusterin, and CR1, which encodes complement receptor 1, both contain genome-wide significant signals and are involved in the innate immune response [7], [8]. The set of immune-related genes remained significantly associated (set-based p-value 0.006) after the removal of CLU. Complement components have been detected in AD amyloid plaques [37] and fibrillar APP activates complement pathways. The phagocytic action of both microglia and blood-derived macrophages has been implicated in Aβ clearance [38]. However, until now, these observations have been considered to be consequences of disease pathology because activation of microglia, the resident immune cells of the brain, can result from neurodegeneration [39].
Our data suggest that the primary causes of LOAD include genetic variation in cholesterol metabolism and the innate immune system. They also indicate that common variation in genes directly related to Aβ metabolism does not underlie individual differences in susceptibility to LOAD. Nevertheless, these findings do not exclude a central role for the amyloid cascade [5] in pathogenesis; indeed, both processes highlighted by our analysis have been implicated in Aβ clearance in the brain [40]. Further work is required to determine whether the risk these processes confer is mediated solely or in part through Aβ, and whether they impact on risk via other mechanisms. Importantly, both processes represent modifiable risk factors that might be addressed by drugs already in our armoury.
Supporting Information
Table S1.
Gene ontology categories identified by ALIGATOR analysis of the AD GWA data of Harold and colleagues [7]. The 589 genes identified as having GWAS SNP signals p<0.001 were used; APOE was included in the gene list. In this analysis one SNP was not allowed to add more than one gene to any gene ontology category. “Study-wide p-value” is the probability of obtaining by chance at least one GO category with a category-specific enrichment p-value at least as significant as that observed.
https://doi.org/10.1371/journal.pone.0013950.s001
(0.11 MB PDF)
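As an illustration of the “Study-wide p-value” defined for Tables S1, S2 and S5, the following minimal Python sketch estimates it from simulated null replicates. The function name `study_wide_p` and the layout of `null_p` (one list of category-specific p-values per replicate gene list) are assumptions made for this example; they are not taken from the ALIGATOR software, and the sketch should not be read as the authors' implementation.

```python
from typing import Sequence


def study_wide_p(observed_p: float, null_p: Sequence[Sequence[float]]) -> float:
    """Estimate a study-wide p-value for one category.

    observed_p: category-specific enrichment p-value seen in the real data.
    null_p:     hypothetical null replicates; null_p[r][c] is the enrichment
                p-value of category c in replicate r.

    Returns the fraction of replicates in which at least one category is
    at least as significant as observed_p.
    """
    if not null_p:
        raise ValueError("need at least one null replicate")
    # A replicate counts as a hit if its best (smallest) category p-value
    # is no larger than the observed one.
    hits = sum(1 for replicate in null_p if min(replicate) <= observed_p)
    return hits / len(null_p)


# Toy example: four replicates, two categories each.
example_null = [[0.5, 0.01], [0.2, 0.3], [0.04, 0.9], [0.6, 0.7]]
example_result = study_wide_p(0.05, example_null)
```

In this toy input two of the four replicates contain a category with p ≤ 0.05, so the estimate is 0.5. Because the comparison uses the minimum over all categories in each replicate, the estimate accounts for the number of categories tested and for their overlap, provided the replicates preserve that overlap.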
Table S2.
Gene ontology categories identified by ALIGATOR analysis of the AD GWA data of Lambert and colleagues. The 423 genes identified as having GWAS SNP signals p<0.001 from Lambert et al. [8] were used; APOE was included in the gene list. In this analysis one SNP was not allowed to add more than one gene to any gene ontology category. “Study-wide p-value” is the probability of obtaining by chance at least one GO category with a category-specific enrichment p-value at least as significant as that observed.
https://doi.org/10.1371/journal.pone.0013950.s002
(0.08 MB PDF)
Table S3.
List of KEGG categories significantly (p<0.05) enriched in both GWAS. “Joint p” is the probability of observing by chance at least one category among the entire set of categories tested with joint enrichment (defined as the product of enrichment p-values from the two GWAS) at least as extreme as that observed in the real data. This corrects for the multiple non-independent GO categories being tested.
https://doi.org/10.1371/journal.pone.0013950.s003
(0.00 MB PDF)
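The “Joint p” defined for Table S3 can be sketched in the same way. The Python example below assumes that null replicates are available for both GWAS and are paired by index, so that `null_a[r][c]` and `null_b[r][c]` refer to the same category c in replicate r; this pairing, and the names `joint_p`, `null_a` and `null_b`, are assumptions for illustration only and are not described in the source.

```python
from typing import Sequence


def joint_p(
    p_a: float,
    p_b: float,
    null_a: Sequence[Sequence[float]],
    null_b: Sequence[Sequence[float]],
) -> float:
    """Estimate a joint p-value for one category across two GWAS.

    p_a, p_b: enrichment p-values of the category in the two real datasets.
    null_a, null_b: hypothetical paired null replicates, indexed as
                    [replicate][category].

    The joint statistic is the product of the two enrichment p-values.
    Returns the fraction of replicates in which at least one category has
    a product at least as small as the observed product.
    """
    if not null_a or len(null_a) != len(null_b):
        raise ValueError("need equal, non-zero numbers of replicates")
    observed = p_a * p_b
    hits = 0
    for rep_a, rep_b in zip(null_a, null_b):
        # Smallest product over all categories in this replicate.
        best = min(a * b for a, b in zip(rep_a, rep_b))
        if best <= observed:
            hits += 1
    return hits / len(null_a)
```

Taking the minimum product over the entire set of categories in each replicate is what provides the correction for testing multiple, non-independent categories.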
Table S4.
All genes in the KEGG immune-related categories in Table S3. “Best p (corrected)” is the significance of the best single-SNP p-value corrected for testing multiple SNPs in a gene (allowing for LD between SNPs). “Set based p” refers to a test of whether the average single-SNP chi-squared (allelic) association statistic is significantly high (again allowing for LD between SNPs).
https://doi.org/10.1371/journal.pone.0013950.s004
(0.01 MB PDF)
Table S5.
MGI mouse phenotypes identified by ALIGATOR analysis of the AD GWA data of Harold and colleagues. The 589 genes identified as having GWAS SNP signals p<0.001 were used; APOE was included in the gene list. In this analysis one SNP was not allowed to add more than one gene to any phenotype. “Study-wide p-value” is the probability of obtaining by chance at least one mouse phenotype with a phenotype-specific enrichment p-value at least as significant as that observed.
https://doi.org/10.1371/journal.pone.0013950.s005
(0.08 MB PDF)
Table S6.
All genes in the cholesterol and lipid categories in Table 5. “Best p (corrected)” is the significance of the best single-SNP p-value corrected for testing multiple SNPs in a gene (allowing for LD between SNPs). “Set based p” refers to a test of whether the average single-SNP chi-squared (allelic) association statistic is significantly high (again allowing for LD between SNPs).
https://doi.org/10.1371/journal.pone.0013950.s006
(0.08 MB PDF)
Table S7.
All genes in the immune-related categories in Table 6. “Best p (corrected)” is the significance of the best single-SNP p-value corrected for testing multiple SNPs in a gene (allowing for LD between SNPs). “Set based p” refers to a test of whether the average single-SNP chi-squared (allelic) association statistic is significantly high (again allowing for LD between SNPs).
https://doi.org/10.1371/journal.pone.0013950.s007
(0.04 MB PDF)
Acknowledgments
We thank the individuals and families who took part in this research. We thank R. Brown, J. Landers, D. Warden, D. Lehmann, N. Leigh, J. Uphill, J. Beck, T. Campbell, S. Klier, G. Adamson, J. Wyatt, M.L. Perez, T. Meitinger, P. Lichtner, G. Eckstein, N. Graff-Radford, R. Petersen, D. Dickson, G. Fischer, H. Bickel, H. Jahn, H. Kaduszkiewicz, C. Luckhaus, S. Riedel-Heller, S. Wolf, S. Weyerer, the Helmholtz Zentrum München genotyping staff, E. Reiman, the Translational Genomics Research Institute and the NIMH AD Genetics Initiative. We thank Advanced Research Computing @Cardiff (ARCCA), which facilitated data analysis.
Author Contributions
Conceived and designed the experiments: LJ PH MO MO JW. Performed the experiments: LJ PH DH RA PH RS AG NJ AS AM RG PD. Analyzed the data: LJ PH MH DH VM DI AP JSP. Contributed reagents/materials/analysis tools: PH SL JP PP MKL CB DCR MG BL AL KM KSB PP DC BM ST CH DM ADS SL PGK SM NCF MR JC WM FJ BS HvdB IH OP JK JW MD LF HH MH DR AMG JSKK CC PN JM KM GL NJB HMG AM AAC CES ABS RG TWM MN SM KHJ NK HEW ER MMC VSP SGY JH MO MO JW. Wrote the paper: LJ PH JH MO MO JW.
References
- 1. Hebert LE, Scherr PA, Bienias JL, Bennett DA, Evans DA (2003) Alzheimer disease in the US population: prevalence estimates using the 2000 census. Arch Neurol 60: 1119–1122.
- 2. Wancata J, Musalek M, Alexandrowicz R, Krautgartner M (2003) Number of dementia sufferers in Europe between the years 2000 and 2050. Eur Psychiatry 18: 306–313.
- 3. Gatz M, Reynolds CA, Fratiglioni L, Johansson B, Mortimer JA, et al. (2006) Role of genes and environments for explaining Alzheimer disease. Arch Gen Psychiatry 63: 168–174.
- 4. Blennow K, de Leon MJ, Zetterberg H (2006) Alzheimer's disease. Lancet 368: 387–403.
- 5. Hardy J (2009) The amyloid hypothesis for Alzheimer's disease: a critical reappraisal. J Neurochem 110: 1129–1134.
- 6. Bertram L (2009) Alzheimer's disease genetics: current status and future perspectives. Int Rev Neurobiol 84: 167–184.
- 7. Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, et al. (2009) Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease. Nature Genetics 41: 1088–1093.
- 8. Lambert J-C, Heath S, Even G, Campion D, Sleegers K, et al. (2009) Genome-wide association study identifies variants at CLU and CR1 associated with Alzheimer's disease. Nature Genetics 41: 1094–1099.
- 9. Corneveaux JJ, Myers AJ, Allen AN, Pruzin JJ, Ramirez M, et al. (2010) Association of CR1, CLU and PICALM with Alzheimer's disease in a cohort of clinically characterized and neuropathologically verified individuals. Hum Mol Genet.
- 10. Carrasquillo MM, Belbin O, Hunter TA, Ma L, Bisceglio GD, et al. (2010) Replication of CLU, CR1, and PICALM Associations With Alzheimer Disease. Arch Neurol.
- 11. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, et al. (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet.
- 12. Hirschhorn JN (2009) Genomewide association studies–illuminating biologic pathways. N Engl J Med 360: 1699–1701.
- 13. Baranzini SE, Galwey NW, Wang J, Khankhanian P, Lindberg R, et al. (2009) Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum Mol Genet 18: 2078–2090.
- 14. Ritchie MD (2009) Using prior knowledge and genome-wide association to identify pathways involved in multiple sclerosis. Genome Med 1: 65.
- 15. Wang K, Li M, Bucan M (2007) Pathway-Based Approaches for Analysis of Genomewide Association Studies. Am J Hum Genet 81:
- 16. Holmans P, Green EK, Pahwa JS, Ferreira MA, Purcell SM, et al. (2009) Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am J Hum Genet 85: 13–24.
- 17. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
- 18. Moskvina V, Schmidt KM (2008) On multiple-testing correction in genome-wide association studies. Genet Epidemiol 32: 567–573.
- 19. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258–261.
- 20. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, et al. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 34: D354–357.
- 21. Bult CJ, Eppig JT, Kadin JA, Richardson JE, Blake JA (2008) The Mouse Genome Database (MGD): mouse biology and model systems. Nucleic Acids Res 36: D724–728.
- 22. Simes RJ (1986) An improved Bonferroni-type procedure for multiple tests of significance. Biometrika 73: 751–754.
- 23. Hong MG, Pawitan Y, Magnusson PK, Prince JA (2009) Strategies and issues in the detection of pathway enrichment in genome-wide association studies. Hum Genet 126: 289–301.
- 24. Lambert JC, Grenier-Boley B, Chouraki V, Heath S, Zelenika D, et al. (2010) Implication of the Immune System in Alzheimer's Disease: Evidence from Genome-Wide Pathway Analysis. J Alzheimers Dis.
- 25. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661–678.
- 26. Birks J (2006) Cholinesterase inhibitors for Alzheimer's disease. Cochrane Database Syst Rev CD005593.
- 27. Camon EB, Barrell DG, Dimmer EC, Lee V, Magrane M, et al. (2005) An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics 6 Suppl 1: S17.
- 28. Reynolds CA, Hong MG, Eriksson UK, Blennow K, Wiklund F, et al. (2010) Analysis of lipid pathway genes indicates association of sequence variation near SREBF1/TOM1L2/ATPAF2 with dementia risk. Hum Mol Genet 19: 2068–2078.
- 29. Anstey KJ, Lipnicki DM, Low LF (2008) Cholesterol as a risk factor for dementia and cognitive decline: a systematic review of prospective studies with meta-analysis. Am J Geriatr Psychiatry 16: 343–354.
- 30. Wyss-Coray T (2006) Inflammation in Alzheimer disease: driving force, bystander or beneficial response? Nat Med 12: 1005–1015.
- 31. Duron E, Hanon O (2008) Vascular risk factors, cognitive decline, and dementia. Vasc Health Risk Manag 4: 363–381.
- 32. Weggen S, Rogers M, Eriksen J (2007) NSAIDs: small molecules for prevention of Alzheimer's disease or precursors for future drug development? Trends Pharmacol Sci 28: 536–543.
- 33. Corder EH, Saunders AM, Strittmatter WJ, Schmechel DE, Gaskell PC, et al. (1993) Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 261: 921–923.
- 34. Bu G (2009) Apolipoprotein E and its receptors in Alzheimer's disease: pathways, pathogenesis and therapy. Nat Rev Neurosci 10: 333–344.
- 35. Jiang Q, Lee CY, Mandrekar S, Wilkinson B, Cramer P, et al. (2008) ApoE promotes the proteolytic degradation of Abeta. Neuron 58: 681–693.
- 36. Koudinov AR, Berezov TT, Kumar A, Koudinova NV (1998) Alzheimer's amyloid beta interaction with normal human plasma high density lipoprotein: association with apolipoprotein and lipids. Clin Chim Acta 270: 75–84.
- 37. McGeer EG, McGeer PL (2001) Innate immunity in Alzheimer's disease: a model for local inflammatory reactions. Mol Interv 1: 22–29.
- 38. Blasko I, Stampfer-Kountchev M, Robatscher P, Veerhuis R, Eikelenboom P, et al. (2004) How chronic inflammation can affect the brain and support the development of Alzheimer's disease in old age: the role of microglia and astrocytes. Aging Cell 3: 169–176.
- 39. Hanisch UK, Kettenmann H (2007) Microglia: active sensor and versatile effector cells in the normal and pathologic brain. Nat Neurosci 10: 1387–1394.
- 40. Bates KA, Verdile G, Li QX, Ames D, Hudson P, et al. (2009) Clearance mechanisms of Alzheimer's amyloid-beta peptide: implications for therapeutic design and diagnostic tests. Mol Psychiatry 14: 469–486.