Genetic Evidence Implicates the Immune System and Cholesterol Metabolism in the Aetiology of Alzheimer's Disease

Background. Late Onset Alzheimer's disease (LOAD) is the leading cause of dementia. Recent large genome-wide association studies (GWAS) identified the first strongly supported LOAD susceptibility genes since the discovery of the involvement of APOE in the early 1990s. We have now exploited these GWAS datasets to uncover key LOAD pathophysiological processes.

Methodology. We applied a recently developed tool for mining GWAS data for biologically meaningful information to a LOAD GWAS dataset. The principal findings were then tested in an independent GWAS dataset.

Principal Findings. We found a significant overrepresentation of association signals in pathways related to cholesterol metabolism and the immune response in both of the two largest genome-wide association studies for LOAD.

Significance. Processes related to cholesterol metabolism and the innate immune response have previously been implicated by pathological and epidemiological studies of Alzheimer's disease, but it has been unclear whether those findings reflected primary aetiological events or consequences of the disease process. Our independent evidence from two large studies now demonstrates that these processes are aetiologically relevant, and suggests that they may be suitable targets for novel and existing therapeutic approaches.


Introduction
Alzheimer's disease (AD) is the leading cause of dementia [1,2] with a heritability of 56-79% [3]. It causes great social, emotional, and financial burdens to sufferers, their families and carers and there are no effective treatments that can slow or halt disease progression [4].
Genetic studies have been successful in identifying a number of causal loci (APP, PSEN1 and PSEN2) for familial early onset forms of AD and in doing so have supported the amyloid cascade hypothesis [5]. Identical amyloid pathology to that observed in early onset disease is seen in the more common late onset form of AD (LOAD), thus implying the relevance of the amyloid cascade in both forms of disease. However, genetic variation at the early onset loci has not been reliably associated with LOAD. Indeed until recently, APOE was the only genetic locus with robust support in LOAD [6]. However, the publication of two genome-wide association studies (GWAS) and replications have recently established three novel LOAD susceptibility loci: CLU, PICALM and CR1 [7,8,9,10].
Genome-wide significant SNPs in complex traits generally explain only a proportion of the heritability of that disorder [11]. Much of the residual heritability underlying common traits appears to lie in SNPs that do not achieve genome-wide significance, meaning that a substantial proportion of the associated genetic signal in current GWAS is hidden below the genome-wide significance threshold. We know that SNPs that are robustly associated with particular common disorders are not randomly distributed across all genes. Instead, the implicated genes show biologically relevant relationships with each other [12,13,14,15]. This is also true for SNPs in genes for which the individual evidence for association is weaker and falls short of stringent levels of genome-wide significance. Statistical approaches have recently been developed to identify sets of functionally related genes containing genetic variants that collectively show evidence for association [14,16]. We used the ALIGATOR algorithm [16] to examine SNPs in two AD GWAS [7,8] for enrichment in related categories of genes. We also confirmed the results using gene set enrichment [15] and set-based analyses [17] to uncover sets of functionally related genes showing evidence for association with disease. The identification of such patterns in association datasets is likely to be crucial in moving beyond the genetic data to an understanding of function.

Data summary
The GWA studies were performed as described in Harold and colleagues [7] and Lambert and colleagues [8]. We have obtained approval to perform a genome-wide association study including 19,000 participants (Multi-centre Research Ethics Committee for Wales MREC 04/09/030; Amendment 2 and 4; approved 27th July 2007). All individuals included in these analyses have provided informed written consent to take part in genetic association studies.

Statistical analysis
Excess of SNPs passing significance thresholds. The number N of independent SNPs in the whole genome (excluding APOE, CLU and PICALM) was estimated by the method of Moskvina & Schmidt [18], as was the observed number of independent SNPs significant at each p-value criterion. In the absence of excess association, the number of independent SNPs significant at significance level α is distributed as Binomial(N, α), with expectation Nα.
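The binomial comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the estimation of N by the Moskvina & Schmidt method is not reproduced, and the numbers in the example are invented rather than taken from Table 1.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)

def binom_upper_tail(k_obs, n, p):
    """P(X >= k_obs): chance of at least k_obs significant independent SNPs."""
    if k_obs <= 0:
        return 1.0
    total = 0.0
    for k in range(k_obs, n + 1):
        term = math.exp(log_binom_pmf(k, n, p))
        total += term
        # Beyond the mean the terms only shrink, so stop once they are negligible.
        if k > n * p and term < total * 1e-17:
            break
    return min(total, 1.0)

# Hypothetical numbers for illustration only (assumptions, not study data).
N_INDEPENDENT = 100_000   # estimated number of independent SNPs
ALPHA = 0.001             # significance level
OBSERVED = 140            # observed number of independent significant SNPs

expected = N_INDEPENDENT * ALPHA
p_excess = binom_upper_tail(OBSERVED, N_INDEPENDENT, ALPHA)
print(f"expected {expected:.0f}, observed {OBSERVED}, P(X >= observed) = {p_excess:.3g}")
```

The tail probability is summed in log space so that large values of N do not overflow.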
Pathway analyses. ALIGATOR analysis was carried out essentially as in Holmans and colleagues [16] using Gene Ontology (GO) and KEGG defined functional categories [19,20]. ALIGATOR converts a list of significant SNPs into a list of significant genes, and tests this list for enrichment within functional categories. Unlike methods designed for gene-expression data (where there is typically only one measurement per gene), ALIGATOR corrects for variable numbers of SNPs per gene. Each gene is counted once regardless of how many significant SNPs it contains, thus eliminating the influence of LD between SNPs within genes. Replicate gene lists of the same length as the original are generated by randomly sampling SNPs (thus correcting for variable gene size). The lists are used to obtain p-values for enrichment for each category, to correct these for testing multiple non-independent categories, and to test whether the number of significantly enriched categories is higher than expected. The present analysis was restricted to categories containing at least three genes: 6723 GO and 194 KEGG categories. Categories required at least two signals to be counted as enriched, to remove the possibility of a small category being deemed significantly enriched based on one signal. SNPs that mapped to within 20 kb of a gene (genome build 36_3) were assigned to that gene: if SNPs mapped within 20 kb of more than one gene all such genes were included. Based upon the linkage disequilibrium (LD) structure of the region, 33 genes near APOE (chromosome 19: 49.6-50.6 Mb) were removed from the analysis. This was to remove the effects of genes whose evidence for association was merely a consequence of LD with the very strong APOE signal. APOE itself was included in the analysis since it is likely to be the AD susceptibility gene in this region.
No single SNP was allowed to add more than one gene to any category, to prevent the analysis being biased by SNPs located in multiple overlapping genes that are functionally related.
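The resampling idea behind this enrichment test can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ALIGATOR implementation: the data layout, the function names and the toy coordinates are hypothetical, and the sketch omits the correction for multiple non-independent categories, the two-signal minimum, and the rule limiting each SNP to one gene per category.

```python
import random

WINDOW = 20_000  # SNPs within 20 kb of a gene are assigned to that gene

def genes_for_snp(chrom, pos, genes):
    """All genes whose span, extended by WINDOW on each side, contains the SNP."""
    return {name for name, (c, start, end) in genes.items()
            if c == chrom and start - WINDOW <= pos <= end + WINDOW}

def significant_genes(snps, genes, threshold):
    """Each gene is counted once, however many significant SNPs it contains."""
    hits = set()
    for chrom, pos, p in snps:
        if p < threshold:
            hits |= genes_for_snp(chrom, pos, genes)
    return hits

def replicate_gene_list(snps, genes, length, rng):
    """Draw SNPs at random and collect their genes until `length` genes are held.
    Genes covered by more SNPs are drawn more often, which is what corrects
    for variable gene size."""
    order = list(snps)
    rng.shuffle(order)
    picked = set()
    for chrom, pos, _ in order:
        for gene in sorted(genes_for_snp(chrom, pos, genes)):
            if len(picked) < length:
                picked.add(gene)
        if len(picked) >= length:
            break
    return picked

def category_enrichment(snps, genes, category, threshold, n_rep=1000, seed=0):
    """Observed count of significant genes in `category` and an empirical p-value."""
    rng = random.Random(seed)
    observed = significant_genes(snps, genes, threshold)
    obs_count = len(observed & category)
    as_extreme = 0
    for _ in range(n_rep):
        rep = replicate_gene_list(snps, genes, len(observed), rng)
        if len(rep & category) >= obs_count:
            as_extreme += 1
    return obs_count, (as_extreme + 1) / (n_rep + 1)

# Toy data (hypothetical): gene -> (chromosome, start, end); SNP -> (chromosome, position, p).
toy_genes = {"A": ("1", 100_000, 110_000), "B": ("1", 200_000, 210_000),
             "C": ("1", 300_000, 310_000)}
toy_snps = [("1", 95_000, 0.0001), ("1", 205_000, 0.5),
            ("1", 305_000, 0.2), ("1", 500_000, 0.0001)]
print(category_enrichment(toy_snps, toy_genes, {"A"}, threshold=0.001, n_rep=200))
```

Sampling SNPs rather than genes is the design choice that matters here: a replicate list built by sampling genes uniformly would ignore the fact that large genes are more likely to contain a nominally significant SNP by chance.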
As independent validation of the results obtained from the analysis of GO categories, we also utilised the Mouse Genome Informatics (MGI) database [21]. This contains a comprehensive catalogue of behavioural, physiological and anatomical phenotypes observed in mutant mice. Extracting phenotype data for single gene studies (excluding all transgenes), we converted mouse genes to their human orthologs using the MGI's mouse/human orthology assignment. We were able to map 5671 different phenotypic annotation terms to 6297 human genes, and the gene sets corresponding to each annotation were tested for enrichment in the Harold et al. data using ALIGATOR, as described previously.
Set-based analyses on genes and gene sets. Two gene-wide analyses were carried out using PLINK [17]. The first was based on the most significant single-SNP p-value and the second, 'set-based', analysis was based on the average chi-squared statistic of all SNPs in the gene, calculated under an allelic association model. The former analysis will detect significant association in genes with a single strong signal, while the latter analysis will highlight genes with several independent signals, even if each of these is of modest significance individually. The analyses are thus complementary. Significance in each case was obtained by comparing the test statistic in the observed data to that obtained when disease status was randomly permuted among individuals, thereby accounting for inter-SNP LD. 1000 permutations were performed (10000 for genes with a gene-wide p-value<0.01). Genes without at least one SNP with p<0.05 were not analysed.
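The two gene-wide statistics and their permutation-based significance can be illustrated with a short sketch. It is not PLINK's implementation (which has further options not described here); the function names and the toy genotypes are hypothetical, and genotypes are assumed to be coded as minor-allele counts.

```python
import random

def allelic_chi2(genos, status):
    """1-df allelic chi-squared for one SNP.
    genos: minor-allele counts (0, 1 or 2) per individual; status: 1 = case, 0 = control."""
    a_case = sum(g for g, s in zip(genos, status) if s == 1)
    a_ctrl = sum(g for g, s in zip(genos, status) if s == 0)
    n_case = 2 * sum(status)                  # alleles carried by cases
    n_ctrl = 2 * (len(status) - sum(status))  # alleles carried by controls
    b_case, b_ctrl = n_case - a_case, n_ctrl - a_ctrl
    denom = (a_case + a_ctrl) * (b_case + b_ctrl) * n_case * n_ctrl
    if denom == 0:
        return 0.0
    n = n_case + n_ctrl
    return n * (a_case * b_ctrl - a_ctrl * b_case) ** 2 / denom

def gene_wide_tests(snp_genotypes, status, n_perm=1000, seed=0):
    """Empirical p-values for (best single-SNP statistic, mean statistic) in one gene.
    Permuting case/control labels keeps the LD between SNPs intact."""
    rng = random.Random(seed)

    def stats(labels):
        chi2 = [allelic_chi2(g, labels) for g in snp_genotypes]
        return max(chi2), sum(chi2) / len(chi2)

    obs_best, obs_mean = stats(status)
    labels = list(status)
    n_best = n_mean = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        perm_best, perm_mean = stats(labels)
        n_best += perm_best >= obs_best
        n_mean += perm_mean >= obs_mean
    return (n_best + 1) / (n_perm + 1), (n_mean + 1) / (n_perm + 1)

# Toy gene with three SNPs typed in eight individuals (hypothetical data).
toy_status = [1, 1, 1, 1, 0, 0, 0, 0]
toy_gene = [[2, 1, 2, 1, 0, 0, 1, 0],
            [1, 1, 2, 1, 0, 1, 0, 0],
            [0, 1, 0, 2, 1, 0, 1, 1]]
print(gene_wide_tests(toy_gene, toy_status, n_perm=200))
```

Because all single-SNP tests have one degree of freedom, the largest chi-squared corresponds to the smallest single-SNP p-value, so the first returned value plays the role of the corrected best p-value and the second that of the set-based p-value.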
As a validation of the ALIGATOR results, set-based analysis was also performed on the set of SNPs within each of the GO processes that were significantly enriched in both GWAS datasets. 1000 permutations were used for each process. Set-based analysis is robust to LD between and within genes, as well as SNPs being in several genes.
Gene-set-enrichment (GSEA) analysis. As a further validation of the ALIGATOR results, gene-set enrichment analysis (GSEA) was performed using the method described in Wang et al. [15]. Rather than defining a list of significant genes, GSEA ranks all genes in order of a gene-wide association statistic, and tests whether the genes in a particular gene set have higher rank overall than would be expected by chance. Following Wang et al., in order to allow for varying numbers of SNPs per gene, the gene-wide statistic used was the Simes-corrected single-SNP p-value [22]. Since apparently significant GSEA enrichments can result from a single gene that is strongly associated with disease [23], we removed the APOE region before performing the analysis.
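The gene-wide statistic used for ranking can be made concrete with a small sketch. The Simes-corrected p-value for a gene with n SNPs and ordered p-values p(1) ≤ ... ≤ p(n) is the minimum over i of n·p(i)/i. The function names and the gene labels below are hypothetical, and the enrichment-score step of GSEA itself is not shown.

```python
def simes_p(p_values):
    """Simes-corrected p-value: min over ranks i of n * p_(i) / i."""
    ordered = sorted(p_values)
    n = len(ordered)
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))

def rank_genes(snp_p_by_gene):
    """Order genes from strongest to weakest gene-wide evidence."""
    scores = {gene: simes_p(ps) for gene, ps in snp_p_by_gene.items()}
    return sorted(scores, key=scores.get)

# Hypothetical single-SNP p-values for two genes of different size.
example = {"GENE1": [0.2, 0.5], "GENE2": [0.001, 0.9, 0.8]}
print(rank_genes(example))
```

A gene with many SNPs is penalised through the factor n, so a small best p-value in a large gene does not automatically outrank the same p-value in a small gene.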

Results
In the GWAS of Harold et al. [7] involving approximately 12,000 AD cases and controls, we observed a considerable excess of SNPs surpassing different thresholds of significance when compared with those expected by chance (Table 1), suggesting the existence of many LOAD susceptibility loci that were not detected at genome-wide significance. To exploit any signal arising from the excess of nominally significant SNPs in the GWAS, we used ALIGATOR [16] to identify functional categories that were enriched for association signals.
We found that in the real data, significantly more GO categories were enriched for genes containing at least one SNP surpassing varying thresholds of nominal significance compared with the simulated data (Table 2). The most significant excess in enriched GO categories was based upon a list of 589 autosomal genes defined by having at least 1 SNP with p<0.001. In that analysis, there was a significant excess of categories regardless of the threshold (p<0.05, p<0.01, p<0.001) for defining a category containing a significant excess of associated genes. This list was used to define enriched GO categories for further study [16]. However, we note that significant excesses of enriched categories were also observed for gene lists defined by other SNP association criteria and that the categories themselves were similar, suggesting the conclusions of this study are not highly sensitive to the threshold used to define nominal SNP association.
From the most significantly enriched categories in the Harold GWAS [7] (Table 3, Table S1), two main themes emerged: sterol and lipid metabolism and the immune response. Many of the top 20 categories relate to these processes and aspects of these processes are detected throughout the significant GO categories. Note that several categories show significant enrichment even after correcting for the multiple GO categories tested (study-wide p<0.05). A similar analysis was performed on the GWAS data from Lambert and colleagues, in which the same SNP threshold of p<0.001 defined a list of 423 autosomal genes. Sterol and lipid metabolism and the immune system again emerge as clear themes in the list of significantly enriched categories derived from the Lambert data (Table S2). None of the categories relating to β-amyloid (Aβ) and its processing were significant in this analysis either in the Harold (Table S1) or Lambert (Table S2) data.
In order to investigate whether we could replicate this signal we restricted enrichment analysis of the Lambert data [24] to the 173 GO processes with enrichment p<0.05 in the Harold data [7]. Of the 173 categories, twenty-five processes were also enriched for genes containing a SNP with p<0.05 in the replication dataset, a number that is significantly greater than expected (p = 0.0045). This provides evidence for a common underlying genetic association between the studies. Note that the significance of this overlap is not due to the biological areas in question being relatively well annotated since the same set of processes was tested in both the real and simulated gene lists (see Methods). Table 4 shows that these processes relate to the immune system and complement pathways and to cholesterol and lipid metabolism with one exception: cholinergic synaptic transmission. For the majority of these processes, their joint enrichment (defined as the product of the enrichment p-values in the two studies) is significant even after correction for testing multiple GO categories, thus providing strong evidence for their involvement in disease susceptibility.
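The joint enrichment statistic and its correction for multiple categories can be sketched as follows. This is an illustrative reading of the definition given here (product of the two enrichment p-values, compared with the most extreme product seen in each simulated replicate), not the authors' code; the data layout, function names and all numbers are hypothetical.

```python
def joint_stat(p_first, p_second):
    """Joint enrichment: product of the enrichment p-values from the two studies."""
    return p_first * p_second

def study_wide_joint_p(observed, replicates):
    """For each category, estimate the probability that at least one category in a
    simulated replicate has a joint statistic at least as extreme as the observed one.
    observed: {category: (p_study1, p_study2)}; replicates: list of the same layout."""
    null_minima = [min(joint_stat(a, b) for a, b in rep.values()) for rep in replicates]
    result = {}
    for category, (a, b) in observed.items():
        obs = joint_stat(a, b)
        n_extreme = sum(m <= obs for m in null_minima)
        result[category] = (n_extreme + 1) / (len(null_minima) + 1)
    return result

# Hypothetical enrichment p-values; real replicates would come from simulated gene lists.
observed = {"c1": (0.01, 0.02), "c2": (0.5, 0.5)}
replicates = [
    {"c1": (0.01, 0.01), "c2": (0.9, 0.9)},
    {"c1": (0.1, 0.1), "c2": (0.6, 0.7)},
    {"c1": (0.5, 0.6), "c2": (0.9, 1.0)},
]
print(study_wide_joint_p(observed, replicates))
```

Taking the minimum over all categories within each replicate is what accounts for the categories being tested together and being non-independent.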
ALIGATOR enrichment analysis was also performed on 194 KEGG [20] human pathways. Six KEGG pathways were significantly enriched (p<0.05) in both the Harold and Lambert datasets [7,8]. This is higher than would be expected by chance (p = 1.16×10⁻³). These pathways, and their enrichment p-values, are listed in Table S3. The genes contained in the pathways, together with the p-values of the most significant SNP, are listed in Table S4. Inspection of Table S4 reveals that, in addition to CR1 and CR2 (members of pathway hsa4640: hematopoietic cell lineage), there are several genes in the HLA region contributing to the enrichment signal in both datasets. These genes may reflect the same association signal due to LD, and were therefore collapsed into one signal: when the enrichment analysis was repeated, no pathway was significantly enriched (p<0.05) in both datasets. The enrichment significance for each of the MGI mouse phenotype annotations is shown in Table S5. It can be seen that several of the most significantly enriched annotations relate to lipids, cholesterol and innate immunity, similar to the top-ranking GO categories in Tables 3 and 4.

Table 2 notes. The analysis used only autosomes and was restricted to GO categories with at least two hits. SNPs that mapped to within 20 kb of a gene were assigned to that gene: if SNPs mapped within 20 kb of more than one gene all such genes were included. SNPs in the APOE region (49.6-50.6 Mb on chromosome 19, 34 genes) were removed from the analysis. Only the most significant of any GO categories containing the same list of significant genes was permitted and any one SNP was not allowed to add more than one gene to any GO category. P-values were generated using 5000 permutations of the data, except where marked *, which used 50,000 permutations. doi:10.1371/journal.pone.0013950.t002
To investigate which genes contribute to the association signals seen in the enriched GO processes identified by both GWAS, two further analyses were performed in the Harold data using PLINK [17]. First, a gene-wide correction was applied to the most significant single-SNP p-value in each gene. Second, a 'set-based' analysis was applied to each gene based on the average single-SNP chi-squared statistic of all SNPs in that gene. The latter analysis measures the overall association evidence across a gene, highlighting genes with multiple association signals. Results for all genes in the cholesterol-related processes listed in Table 4 are given in Table S6, and for all genes in the immune-related processes in Table S7. Gene-wide significance of genes with a SNP with p<0.001 in either study is shown for lipid-related genes in Table 5 and for immune-related genes in Table 6. As expected, most of the genes in Tables 5 and 6 show gene-wide significant association evidence (Tables S6 and S7), but other genes in these processes are also significant. Tables 5 and 6 also give the most significantly associated SNP from each gene for both studies and the r² between them. Note that the immune-related genes include both CLU, which contains a SNP showing genome-wide significant association in both GWAS, and CR1, which contains a SNP that is genome-wide significant in one study [8] and has a p-value<1×10⁻⁵ in the other [7]. It was not possible to perform gene-wide analyses on the Lambert data since individual genotypes were not available. However, the most significant p-values from the genes of interest are shown in Tables 5, S6 and S7. Similar gene-wide analyses were performed on the genes in the enriched KEGG pathways (Table S4).
Set-based and GSEA analyses were applied to each of the 25 GO processes with ALIGATOR p<0.05 in both the GWAS datasets (Table 4). GSEA analysis was applied in both Harold [7] and Lambert [8] datasets, while the set-based analysis was applied in the Harold dataset only (with the APOE region removed) since individual genotypes were not available in the Lambert dataset. Set-based analyses were also applied to the complete set of cholesterol-related genes in Table S6, and the complete set of immune-related genes in Table S7. The cholesterol-related genes gave a set-based p = 0.005, and the immune-related genes p = 0.005. After removing the SNPs giving rise to the GO signal (i.e. the most significant SNPs from the genes in Tables 5 and 6), the p-values are p = 0.009 and p = 0.007, respectively. This shows that the association signal in these genes is not restricted to a few highly-significant SNPs. GSEA analysis in the Harold dataset was significant for all of the processes except for GO:0007271 (synaptic

Table notes. The 589 genes identified as having GWAS SNP signals p<0.001 were used: APOE was included in the gene list. In this analysis one SNP was not allowed to add more than one gene to any gene ontology category. "Study-wide p-value" is the probability of obtaining by chance at least one GO category with a category-specific enrichment p-value at least as significant as that observed.

Table notes. There are genes in the pathways that are in close proximity and that are both included because of the same significant SNP in both genes, as genes were associated with a SNP if it mapped within 20 kb of a given gene: details of these genes are in Tables 5 and 6.

Discussion
Our analysis of two large independent GWAS of LOAD strongly implicates genetic variation in the functions of the immune system and in lipid metabolism as causes of LOAD susceptibility. A previous analysis of the Lambert et al. data [8,24] highlighted similar biological processes despite not showing an overall excess of enriched GO categories. Our analysis highlights potential mechanisms related to these processes that should be the subject of further detailed genetic and functional investigations. This study has implications for the interpretation of GWAS of complex disease as it demonstrates that useful biological insights may be gained from association signals below the threshold for genome-wide significance, as previously shown for the WTCCC study [16,25] where pathways known to be related to the diseases studied were highlighted by ALIGATOR. These analyses potentially highlight non-genome-wide significant SNPs that could explain some disease heritability which current GWAS do not have the power to detect.
The power of genetic data lies in their ability to highlight primary susceptibilities to disease, that is, they illuminate aetiology. This does not mean that all genes with a nominally significant SNP in an enriched GO category are true susceptibility genes for the phenotype under consideration, but rather that the category itself is likely to be relevant to aetiology since it contains an excess of nominally associated SNPs. In this context, while the Harold [7] and Lambert [8] GWAS show a remarkable overlap in processes identified by ALIGATOR [16], the signal within each category did not necessarily reflect the same set of SNPs or genes. Tables 5 and 6 show that linkage disequilibrium between the most significant SNPs from each gene in the two GWAS varies from 1 (the same SNP) to none. In a pathway analysis this is perhaps unsurprising as there are several explanations for this observation. First, although we observe an excess of associated SNPs at all significance levels (Table 1), not all SNPs that surpass nominal significance can be expected to represent true associations. Second, in a set of genes that influence disease aetiology through a common biological pathway, it is likely that a number of SNPs will be associated with disease risk and affected individuals need not have the same combination of risk alleles. Individuals may have susceptibility alleles in different genes in a pathway or multiple rare susceptibility alleles may occur in a single gene; the latter will tend to be poorly tagged in GWAS. As a consequence even fairly large studies will have modest power to detect (or replicate between studies) any one signal, as compared with the power of tests based on the whole pathway. It is therefore noteworthy that the only non-immune and non-lipid related process detected in both studies was cholinergic synaptic transmission (Table 4); boosting cholinergic transmission is the target of one of the few available therapies for AD [26].
This analysis has limitations. We used categories curated in the GO and KEGG databases and phenotypes annotated in the MGI database, and will not have detected signal in functional processes not represented or well annotated by those systems. We chose to use GO and KEGG to define pathways since they are publicly available in a format that enables systematic testing of all pathways simultaneously in a statistically rigorous manner. The large number of GO categories increases the chance of alignment with the unknown disease biology underlying the GWAS results and the smaller number of results provided by the KEGG analysis supports this conclusion. The power to detect enrichment is highest for well-defined processes, and is greatly reduced if biologically important gene products are incorrectly or incompletely classified, or omitted. The quality of annotation in GO is variable, since some of it is inferred electronically, although there is some evidence that the majority of such annotations are correct [27]. However, enrichment analysis of an independent set of experimentally determined annotations, the MGI mouse phenotypes, highlighted the same biological processes, thus validating the GO results. The same analysis method applied to other diseases [16] found relevant biological pathways which were different to those presented here. Thus, the significance of these results is not simply due to the immune system and lipid metabolism being relatively well annotated. Furthermore, the ALIGATOR results were validated by applying GSEA and set-based analyses to the most significantly enriched pathways. These analyses produced similar results to ALIGATOR, giving confidence that the results obtained by ALIGATOR are genuine. This is supported by a direct analysis of SNPs in lipid-pathway genes in AD [28] which showed that more SNPs in lipid pathway genes than expected showed association with AD.

Table 5. Genes with a SNP with p<0.001 in cholesterol and lipid-related processes that are significantly enriched in both GWAS.
Columns: Gene Symbol; Chr location (Mb); No. of SNPs (Harold); Best p-value (Harold); No. of SNPs (Lambert); Best p-value (Lambert); Best SNP (Harold); Best SNP (Lambert); r² (Harold)
Genes included are those that have a SNP with p<0.001 in the Harold GWAS, and are in the lipid-related processes significantly enriched in both GWAS (Table 4). APOC1, APOC2 and APOC4 are not included in the enrichment analysis (Tables 3 and 4) since they are in LD with APOE. APOA1 and APOA4 share the same best SNP and are therefore counted as the same gene in the enrichment analyses. Two genes, CLU and APOA4, are found in both cholesterol and immune-related GO processes. The category-wide set-based analysis allows for such dependence between genes. Genes contributing to the enrichment signal from Harold

Table 6. Genes with a SNP with p<0.001 in immune-related processes that are significantly enriched in both GWAS.
Columns: Gene Symbol; Chr location (Mb); No. of SNPs (Harold); Best p-value (Harold); No. of SNPs (Lambert); Best p-value (Lambert); Best SNP (Harold); Best SNP (Lambert); r² (Harold)
Genes included are those that have a SNP with p<0.001 in the Harold GWAS, and are in the immune-related processes significantly enriched in both GWAS (Table 4). BCL3 is not included in the enrichment analysis (Tables 3 and 4) since it is in LD with APOE. Two genes, CLU and APOA4, are found in both cholesterol and immune-related GO processes. CR1 and CR2 are at the same locus, as are IL18RAP, IL18R1 and IL1RL1 (see Table 3). Although they do not share the same best SNP, they may be tagging the same signal.
There are relatively few pathways highlighted by the KEGG analysis and this is likely due to the KEGG pathways including a more restricted range of biological processes than GO: while there are KEGG pathways relating to cholesterol and bile acid biosynthesis there are no pathways relating directly to lipid efflux from and transport between cells. Lambert et al. [24] detected an enrichment with the Alzheimer's disease KEGG pathway in a GSEA analysis. However, this enrichment is likely to have been driven by the strong APOE association. We found significant enrichment of this pathway in the Lambert data when APOE was included, but not when it was removed. The KEGG pathways also tend to be large and the KEGG database does not have the hierarchical structure of the GO database that allows more specific functions to be defined. KEGG pathways with apparently similar names do not always contain similar genes to their corresponding GO categories. For example, KEGG pathway hsa4610 (complement and coagulation cascades) and GO:0006958 (complement activation, classical pathway) both relate to the complement cascade. However, hsa4610 also contains several genes that are not part of the complement cascade, making it larger than GO:0006958 (67 genes to 28) and reducing its significance in the enrichment analysis, since none of the extra genes have a SNP with p<0.001.
Cholesterol metabolism and innate immune processes have previously been implicated in AD pathogenesis [29,30]. Epidemiological studies show that high cholesterol levels in mid-life are correlated with later dementia, and statins, which lower cholesterol levels, may have a protective effect against the development of dementia [31]. There have been trials and epidemiological surveys of the effects of anti-inflammatory treatment in AD which indicate that, although non-steroidal anti-inflammatories may have an effect on disease susceptibility, the drugs investigated so far are not a treatment for manifest disease [32]. Better targeted drugs to the parts of the immune system involved in AD susceptibility may offer new therapeutic avenues for research.
Although APOE was identified as a susceptibility factor for AD over 15 years ago [33], it is still not clear how the ε4 variant contributes to disease risk. The brain requires de novo cholesterol synthesis. This occurs in astrocytes and microglia, the cholesterol then being loaded into APOE lipoprotein particles and transported to the main cholesterol users, neurons and oligodendrocytes [34]. So while the impact of APOE is clearly of importance in AD, our data indicate that other participants in sterol metabolic processes also impact upon susceptibility. It is notable that some of these genes are not expressed in the brain, for instance LIPC, APOA1, SCARB1 and LIPG, but are important in the systemic control of sterol metabolism in the liver and blood. Some of these gene products may well be useful in providing clues for possible systemic biomarkers of disease progress.
APOE has been implicated in Aβ clearance. The lipidation state of APOE is critical to its ability to transport Aβ across the blood-brain barrier (BBB), APOE4 being associated with the least efficient transport [35]. Aβ in the blood is transported in cholesterol-rich HDL particles, which have ApoA1 or ApoE as associated lipoproteins, before elimination by the liver [36]. Our data suggest that the role of APOE in cholesterol metabolism is important in AD, and may implicate the systemic clearance of Aβ-HDL through the liver, in which APOE is certainly involved, as a primary modulator of AD susceptibility [36,37]. CLU, encoding APOJ, is associated with cholesterol transport and has been demonstrated to promote export of Aβ over the BBB [38] and thus may modulate Aβ clearance from the brain in concert with APOE.
Apart from the APOE locus, CLU, which encodes the complement activation inhibitor clusterin, and CR1, which encodes complement receptor 1, both contain genome-wide significant signals and are involved in the innate immune response [7,8]. The set of immune-related genes remained significantly associated (set-based p-value 0.006) after the removal of CLU. Complement components have been detected in AD amyloid plaques [37] and fibrillar APP activates complement pathways. The phagocytotic action of both microglia and blood-derived macrophages has been implicated in Aβ clearance [38]. However, until now, these observations have been considered to be consequences of disease pathology because activation of microglia, the resident immune cells of the brain, can result from neurodegeneration [39].
Our data suggest that the primary causes of LOAD include genetic variation in cholesterol metabolism and the innate immune system. They also indicate that common variation in genes directly related to Aβ metabolism does not underlie individual differences in susceptibility to LOAD. Nevertheless these findings do not exclude a central role for the amyloid cascade [5] in pathogenesis. Indeed, both processes highlighted by our analysis have been implicated in Aβ clearance in the brain [40], though further work is required to determine whether the risk these processes confer is mediated solely or in part through Aβ and whether they impact on risk via other mechanisms. Importantly, both processes represent modifiable risk factors that might be addressed by drugs already in our armoury.

Supporting Information
Table S1
Gene ontology categories identified by ALIGATOR analysis of the AD GWA data of Harold and colleagues [7]. The 589 genes identified as having GWAS SNP signals p<0.001 were used: APOE was included in the gene list. In this analysis one SNP was not allowed to add more than one gene to any gene ontology category. "Study-wide p-value" is the probability of obtaining by chance at least one GO category with a category-specific enrichment p-value at least as significant as that observed.
Found at: doi:10.1371/journal.pone.0013950.s001 (0.11 MB PDF)

Table S2
Gene ontology categories identified by ALIGATOR analysis of the AD GWA data of Lambert and colleagues. The 423 genes identified as having GWAS SNP signals p<0.001 from Lambert et al. [8] were used: APOE was included in the gene list. In this analysis one SNP was not allowed to add more than one gene to any gene ontology category. "Study-wide p-value" is the probability of obtaining by chance at least one GO category with a category-specific enrichment p-value at least as significant as that observed.
Found at: doi:10.1371/journal.pone.0013950.s002 (0.08 MB PDF)

Table S3
List of KEGG categories significantly (p<0.05) enriched in both GWAS. "Joint p" is the probability of observing by chance at least one category among the entire set of categories tested with joint enrichment (defined as the product of enrichment p-values from the two GWAS) at least as extreme as that observed in the real data. This corrects for the multiple non-independent GO categories being tested.
Found at: doi:10.1371/journal.pone.0013950.s003 (0.00 MB PDF)

Table S4
All genes in the KEGG immune-related categories in Table S3. "Best p (corrected)" is the significance of the best single-SNP p-value corrected for testing multiple SNPs in a gene (allowing for LD between SNPs). "Set based p" refers to a test of whether the average single-SNP chi-squared (allelic) association statistic is significantly high (again allowing for LD between SNPs).
Found at: doi:10.1371/journal.pone.0013950.s004 (0.01 MB PDF)

Table S5
MGI mouse phenotypes identified by ALIGATOR analysis of the AD GWA data of Harold and colleagues. The 589 genes identified as having GWAS SNP signals p<0.001 were used: APOE was included in the gene list. In this analysis one SNP was not allowed to add more than one gene to any phenotype. "Study-wide p-value" is the probability of obtaining by chance at least one mouse phenotype with a phenotype-specific enrichment p-value at least as significant as that observed.
Found at: doi:10.1371/journal.pone.0013950.s005 (0.08 MB PDF)

Table S6
All genes in the cholesterol and lipid categories in Table 5. "Best p (corrected)" is the significance of the best single-SNP p-value corrected for testing multiple SNPs in a gene (allowing for LD between SNPs). "Set based p" refers to a test of whether the average single-SNP chi-squared (allelic) association statistic is significantly high (again allowing for LD between SNPs).
Found at: doi:10.1371/journal.pone.0013950.s006 (0.08 MB PDF)

Table S7
All genes in the immune-related categories in Table 6. "Best p (corrected)" is the significance of the best single-SNP p-value corrected for testing multiple SNPs in a gene (allowing for LD between SNPs). "Set based p" refers to a test of whether the average single-SNP chi-squared (allelic) association statistic is significantly high (again allowing for LD between SNPs).
Found at: doi:10.1371/journal.pone.0013950.s007 (0.04 MB PDF)