Overrepresentation of Glutamate Signaling in Alzheimer's Disease: Network-Based Pathway Enrichment Using Meta-Analysis of Genome-Wide Association Studies

Genome-wide association studies (GWAS) have successfully identified several risk loci for Alzheimer's disease (AD). Nonetheless, these loci do not explain the entire susceptibility of the disease, suggesting that other genetic contributions remain to be identified. Here, we performed a meta-analysis combining data of 4,569 individuals (2,540 cases and 2,029 healthy controls) derived from three publicly available GWAS in AD and replicated a broad genomic region (>248,000 bp) associated with the disease near the APOE/TOMM40 locus in chromosome 19. To detect minor effect size contributions that could help to explain the remaining genetic risk, we conducted network-based pathway analyses either by extracting gene-wise p-values (GW), defined as the single strongest association signal within a gene, or calculated a more stringent gene-based association p-value using the extended Simes (GATES) procedure. Comparison of these strategies revealed that ontological sub-networks (SNs) involved in glutamate signaling were significantly overrepresented in AD (p<2.7×10−11, p<1.9×10−11; GW and GATES, respectively). Notably, glutamate signaling SNs were also found to be significantly overrepresented (p<5.1×10−8) in the Alzheimer's disease Neuroimaging Initiative (ADNI) study, which was used as a targeted replication sample. Interestingly, components of the glutamate signaling SNs are coordinately expressed in disease-related tissues, which are tightly related to known pathological hallmarks of AD. Our findings suggest that genetic variation within glutamate signaling contributes to the remaining genetic risk of AD and support the notion that functional biological networks should be targeted in future therapies aimed to prevent or treat this devastating neurological disorder.


Introduction
Alzheimer's disease (AD [MIM 104300]) is the most common neurodegenerative disorder in the human population [1]. Clinically, AD is characterized by a progressive loss of cognitive abilities and memory impairment. At a biological level, it is thought that the presence of extracellular deposits of the b-amyloid peptide (Ab) and intracellular neurofibrillary tangles composed of hyperphosphorylated Tau protein leads to synaptic loss and neuronal death [1,2]. Genetically, AD is complex and heterogeneous. [3,4] A small percentage of AD cases (1-2% of all cases) have an early-onset familial form of presentation, with symptoms appearing before 65 years of age, and most cases are late-onset or ''sporadic'' with no apparent familial recurrence of the disease [4]. While familial-AD has been associated with mutations in the genes coding for the amyloid precursor protein (APP) and the presenilins (PSEN1 and PSEN2) proteins, the only genetic factor extensively replicated for sporadic AD is the apolipoprotein E-e4 (APOE-e4) allele [4][5][6], which is present in ca. 60% of the cases [1,[7][8][9]. However, the APOE-e4 allele is not causative, since it has been found in individuals that would not develop the disease, suggesting that other genetic contributions remain to be identified.
During the past decade, the scientific efforts focused in identifying these genetic hallmarks reported more than 2,900 Single Nucleotide Polymorphisms (SNP) within , 4,700 genes associated with AD [10] (see also AlzGene.org). More recently, the use of high density DNA genotyping microarrays in genome-wide associations studies (GWAS), combined with powerful statistical procedures, have expanded the search for novel susceptibility loci for the disease [11]. Nevertheless, these genetic approaches currently exhibit some limitations. First, they present a high rate of false positives and require major sample sizes in order to be replicated. In fact, different simulations have shown that authentic associations have only a 26% chance of falling into the top 1,000 p-values in a GWAS [12]. Second, they only examine the association of a single genetic variant at a time, therefore failing in the detection of minor associations that can still be present and confer risk in a cumulative way. Finally, current top hits associated with AD, including the bridging integrator 1 (BIN1) [13], clusterin (CLU) [14,15], the ATP-binding cassette sub-family A member 7 (ABCA7) [16], the complement component (3b/4b) receptor 1 (CR1) [15], and the phosphatidylinositol binding clathrin assembly protein (PICALM) [14], do not account for the entire genetic contribution of the disease or surpass the risk conferred solely by the APOE-e4 allele.
Therefore, considering that the etiology of complex diseases might depend on functional protein-protein interaction networks [17,18], here we performed a meta-analysis followed by networkbased pathway analyses on publicly available GWAS in AD and used significant genetic information to identify glutamate signaling as a key ontological pathway of the disease.

GWAS datasets included in the analysis
We selected three publicly available GWAS in AD performed on unrelated case-control and familial samples from Europeandescent populations ( Table 1). The GWAS datasets are: i) the Translational Genomics Research Institute (TGen1) study on AD [19], including 829 AD cases and 535 control individuals; ii) the National Institute on Aging -Late Onset Alzheimer's Disease and the National Cell Repository for Alzheimer's Disease (NIA-LOAD/NCRAD) [20,21], including 5,220 subjects from which we only considered for analysis a subset of 3,689 individuals (1,837 cases and 1,852 controls) that were self-declared non-Hispanic European Americans, passed principal components analyses and had non-missing phenotypes. Given that this subset is composed by familial data, using the provided family trees, we excluded all related controls and kept only one case per family giving a final number of 978 AD cases and 702 controls individuals that were eligible for our study; and iii) the Pfizer Pharmaceutical Company (Pfizer) study on AD [13], including 733 AD cases and 792 control individuals, available only at summary level. The TGen1 data was downloaded from the TGen website (https://www.tgen.org/ research/research-divisions/neurogenomics.aspx) and the NIA-LOAD/NCRAD data was retrieved through the database of Genotypes and Phenotypes (dbGaP; http://www.ncbi.nlm.nih. gov/gap) [22] of the National Institute of Health (NIH), under the accession number phs000168.v1.p1. The Pfizer data was gathered from the supplementary information accompanying the original publication [13]. Detailed information regarding recruitment, diagnosis, affection status and age at the time of enrollment can be found in the original studies [13,19,21]. Written informed consent was obtained for all participants and prior Institutional Review Board approval was obtained at each participating institution. Additionally, data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu) as a targeted replication sample. The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California -San Francisco. ADNI is the result of efforts of many coinvestigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 subjects but ADNI has been followed by ADNI-GO and ADNI-2. To date these three protocols have recruited over 1500 adults, ages 55 to 90, to participate in the research, consisting of cognitively normal older individuals, people with early or late MCI, and people with early AD. The follow up duration of each group is specified in the protocols for ADNI-1, ADNI-2 and ADNI-GO. Subjects originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2. After QC procedures (See below; Association analysis) the final ADNI sample consisted in a total of 693 individuals, 449 cases (161 AD and 338 MCI Cases) and 194 unrelated controls. A total of 524,993 SNPs were genotyped under the Illumina 610 Quad platform (Table 1).

Imputation
In order to maximize information on linkage disequilibrium (LD) structure between the studies, the TGen1 and the NIA-LOAD/NCRAD datasets were imputed by comparison with the CEU reference panel (unrelated individuals) from the HapMap III phased data (release 2) [23]. Imputation was carried out using the Markov Chain Haplotyping method implemented in MACH 1.0 following author recommendations [24].

Association analysis
Quality control (QC) procedures such as minor allele frequency (MAF), Hardy-Weinberg equilibrium (HWE), missing rate per individuals (MIND) and per SNPs (GENO) were performed on the TGen1 and the NIA-LOAD/NCRAD dataset using PLINK v.1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) [25] with threshold values of 0.05, 1610 26 , 0.05 and 0.02, respectively. We applied a logistic regression analysis, using an additive model on the imputed datasets data with MACH2DAT [24]. SNPs with r 2 values less than 0.29 were removed from further analysis. Similarly, in the Pfizer dataset, the standard error (SE) per SNP was estimated from the p-values reported in the study [13]. Briefly, p-values were transformed into the corresponding Z score with the INVNORMAL function implemented in STATA v.10 (Stata-Corp, College Station, TX), and then the SE was calculated taking the log of the odds ratios (OR) divided by the corresponding Z score. In the ADNI replication dataset, we performed QC and association analysis based on a quantitative trait locus (QTL) method as described previously [26]. MAF, HWE, MIND and GENO QC values of 0.05, 1610 26 , 0.1 and 0.1 were applied, respectively. In order to control for population stratification we conducted Principal Component Analysis (PCA) with EIGEN-STRAT [27]. After this step, 63 individuals were excluded from further analysis, leaving a total 693 individuals. The QTL association analysis was carried out using an additive genetic linear regression model with PLINK using different co-variables including age at baseline visit, education, gender and APOE status (e4 allele present or not). Finally, the results for each dataset was assessed for genomic inflation and visualized in Quantile-Quantile (Q-Q) plots using the statistical R (www.r-project.org) [28].

Meta-analysis
Meta-analysis was performed using the inverse variance method implemented in PLINK v.1.07 [25]. We checked that all statistics values (p-values, OR and SE) for each dataset prior the metaanalysis were computed for the same allele. Annotation of the results was done with the RefSeq Genes for the human genome assembly Build 36.3/Hg18 available at the UCSC Table Browser [29] using own Perl scripts (available upon request). In order to consider a SNP inside a gene we defined a threshold of +/2 5,000 bp relative to the transcription start and end sites. The annotated output of the meta-analysis used for the pathway approach is available in Table S1. PRISMA guidelines were followed (showed in Checklist S1) [30].

Single-gene p-value generation
Genetic association values with a cut-off threshold of p,0.05, from either the Meta-analysis or the ADNI replication dataset, were transformed into single-gene associations using two independent approaches: i) Selection of a gene-wise p-value (GW) [17], defined as the single strongest association inside a gene; and ii) Calculation of a gene-based association p-value using the extended Simes procedure (GATES), which extract a gene-level association from the combination of the SNPs p-values within a gene. This approach does not relies on genotype or phenotypic data and has been shown to correct type 1 error rates in both simulated and permuted datasets, regardless of the gene size or LD structure [31].

Functional Protein Association Network (FPAN)
To evaluate single-gene associations in a network context the complete human functional protein association network (FPAN) was retrieved from the STRING 9.0 database (Search Tool for the Retrieval of Interacting Genes/Proteins; http://string-db.org/) [32]. FPAN contains highly curated known and predicted interactions emerging from different evidence channels such as: genomic context, co-expression and curated literature. Raw textformatted protein-protein functional interactions were downloaded from STRING. To avoid redundancy and false positives, alternative proteins and their interactions were consolidated into one gene using own Perl scripts (available upon request). We kept only the interactions with a combined score .0.7 (provided by STRING), which stand for high confidence interactions. The final FPAN generated was composed by 14,793 nodes (genes) and 229,357 edges (interactions) and was introduced as an input to Cytoscape [33], which is an open source platform for visualizing complex networks that not only allows the integration of additional attribute data (i.e. gene annotations, expression profiles and interactions source and confidence), but also provides a comprehensive set of tools to perform integrated pathways analysis. Thus, the p-values of GW and GATES procedures were introduced as a floating-point attribute into the FPAN (Table S2).

Sub-networks search (SNs)
SNs search was carried out with the Cytoscape JActive Modules Plug-in [34] with a gene overlap threshold of 50%. JActive modules is designed to detect if a certain group of connected nodes are significantly enriched with a statistical parameter such as the single gene p-value, which in our study comes from either the meta-analysis or the ADNI replication dataset. Briefly, starting from one node a sub-network grows to its connected genes by computing an aggregated score (S) derived from the conversion of the single-gene p-value (if present) into their corresponding z-score (with the inverse normal cumulative distribution function). This score is compared internally with a background distribution created from the scores of 10,000 random modules of the same size in a Monte Carlo procedure. If the aggregated score cease growing above the expected by chance, the algorithm stops and the growing sub-network is reported as a result. As in the original publication, modules with S.3 (3 standard deviation above the mean of randomized scores) and with a size below 50 were considered significant [17]. To acquire a mean S score and standard deviation (SD) for each resulting SN and to confirm that the SN structure (gene members and interactions) and significance remained consistent and replicable, the search was performed 10 times for each analysis (Meta-GW, Meta-GATES, ADNI-GW and ADNI-GATES). Finally, the same procedure was conducted with their corresponding permuted p-values over the entire genes present in the FPAN (Permuted analysis) and without genome wide significant results (p-values ,10 28 ), in this case with real and permuted data, respectively (WGW analysis). Statistical differences between permuted and non-permuted analyses were assessed through two-sided t-test.

Gene Ontology (GO) and KEGG pathway enrichment Analysis
To examine if the structure of significant sub-networks obtained in the Meta or the ADNI replication dataset were biologically meaningful, gene lists of the first 10 significant modules were tested for pathway enrichment using information from Gene Ontology (GO; http://geneontology.org/) [35] and the Kyoto Encyclopedia of Genes and Genomes database (KEGG; http://www.genome. jp/kegg/) [36]. We initially used the ontology structure and annotations using the package Ontologizer [37], only considering categories with less than 500 members to avoid associations to major categories that are less informative (i.e. signaling) and excluding the ones "Inferred by Electronic Annotation" (IEA), from "Reviewed Computational Analysis" (RCA) and with "No biological Data available" (ND), which are characterized by a high rate of false positives [38]. In this case, we used the parent-childunion algorithm to call for overrepresentation adjusting the pvalues with a Westfall-Young Single-Step multiple test correction, to avoid additional false positives [39], and considering a GO term significantly over-represented when the adjusted p-value was below 0.01. Similarly, to determine overrepresentation, KEGG pathway enrichment was assessed in the complete set of pathways and components [40], using an hypergeometric test with the phyper function contained in the R statistical package [41].

Gene expression heatmaps and cluster analysis
To evaluate the expression pattern of genes of interest from the network-based analysis, human gene expression profiles were downloaded from the Allen Brain Atlas (ABA) website (http:// www.brain-map.org) [42]. We used the Gene Search web-tool to enter a list of genes arising from the intersection of sub-networks and analyzed their expression profiles through 27 brain regions. We averaged the expression levels from 6 brain donor individuals (ids. H0351.2001, H0351.2002, H0351.1009, H0351.1012, H0351.1015 and H0351.1016) and used the collapseRows R script [43] to generate a gene-wise expression dataset. Expression heatmaps and hierarchical clusters were analyzed using Cluster v.3.0 (http://www.geo.vu.nl/,huik/cluster.htm) and visualized with the aid of JavaTreeView v.1.1.6r2 (http://jtreeview. sourceforge.net) [44]. Identification of genes co-expressed and correlation analyses were performed with Cluster v.3.0, using Euclidean distance in conjunction with centroid linkage algorithms, and a correlation coefficient cutoff of r.0.7 to denote highly correlated gene clusters.

Meta-Analysis
The complete strategy implemented in the present study is shown in Figure 1. General features of the datasets used for the meta-analysis (TGen1, NIA-LOAD/NCRAD and Pfizer) and for the targeted replication sample (ADNI) are described in Table 1. The genetic information of 4,569 individuals (2,540 AD cases and 2,029 controls) was considered after passing QC thresholds based on MAF, HWE, MIND and GENO parameters calculated in PLINK. Additionally, we imputed a total of 1,231,704 and 1,245,964 QC-passing SNPs for the TGen1 and NIA-LOAD/ NCRAD datasets, respectively. To account for bias still present after QC procedures, SNP association p-values were further assessed for genomic inflation, which is represented in Q-Q plots ( Figure S1). All the datasets yielded an inflation factor (l) between the acknowledged margins of 0.9 to 1.1, where the contribution of population structure to the genome-wide association is negligible [45]. Taking into account these considerations, we performed a meta-analysis using the inverse variance method, selecting p-values and ORs from the fixed effects model, assuming that these studies have been conducted under similar conditions and subjects [46][47][48]. The combined analysis showed a normal distribution of the pvalues with an excess of significant signals seen only at the end of the curve, indicating likely true association events (l = 1.05, Figure  S1).
Whole-genome meta-analysis results are depicted as a Manhattan plot (Figure 2), with a significance threshold defined above log 10 (5610 28 ), which marks the beginning of genome-wide significant values [49]. In agreement with current reports, the strongest associations were located in a broad genomic region (.250,000 bp) in the vicinity of the APOE locus in chromosome 19 (Table 2). In particular, highly significant genome-wide associations signals were observed in the coding region of the translocase of the mitochondrial outer membrane gene (TOMM40: rs2075650, p = 8.54610 2116 , OR = 4.48; rs157580, p = 9.6610 235 , OR = 0.51 and rs8106922, p = 1.17610 225 , OR = 0.57), upstream of the apolipoprotein C-I gene (APOC1: rs439401, p = 8.82610 229 , OR = 0.54), inside the poliovirus receptor related 2 isoform delta gene (PVRL2: rs6859, Figure 1. Study strategy. Genotype imputation was carried out in the Tgen1 and NIA-LOAD/NCRAD datasets (asterisk). The meta-analysis was conducted using the inverse variance method in PLINK, after pruning bad genotypes and samples using standard quality control (QC) tests. The ADNI dataset was included for replication following similar QC procedures at this step. Meta-analysis (dark grey arrows) and ADNI associations (light grey arrows) results were annotated and single gene p-values calculated using the Gene-Wise (GW) method or the more stringent GATES procedure (threshold p,0.05). We next introduced this information to FPAN (from STRING database). Module search was performed 10 times, side by side with the permuted data and without genome-wide-significant (WGW) results, which served as internal controls. Significant sub-networks (white squares) were compared and assayed for gene ontology (GO) term and KEGG pathways enrichment to obtain the final overrepresented pathways associated with AD, inside each sub-network. Equal results between Meta and WGW analysis ('' = '') that could not be obtained with the permuted control (''?'') were expected. doi:10.1371/journal.pone.0095413.g001 p = 7.87610 228 , OR = 1.7 and rs3852861 p = 5.32610 211 , OR = 0.64) and between TOMM40 and the APOE gene (rs405509, p = 2.29610 227 , OR = 0.57). In addition, in the same chromosomal region we observed genome-wide association, downstream of the basal cell adhesion molecule isoform 1 gene (BCAM: rs10402271, p = 1.98610 217 , OR = 1.46 and rs10405693, p = 2.83610 212 , OR = 1.49) and in the region of the B-cell CLL/ lymphoma 3 gene (BCL3: rs8103315, p = 1.87610 28 , OR = 1.5). We note that the strength of the association signal for some SNPs is derived from the Pfizer and NIA-LOAD/NCRAD datasets only, since these were either not genotyped in the TGen1 sample or poorly imputed due to the intrinsic array density in that particular region of chromosome 19 in the Affimetrix GeneChip 500 K (See "N" column, Table 2

Pathway Analysis
To test the hypothesis that highly-connected sub-networks (SNs) enriched with minor associations might be significantly overrepresented in AD, we performed a gene-oriented pathway analysis by loading meta-analysis results into a high confidence functional protein association network (FPAN), gathered from the STRING database [32]. First, to avoid noise, the analysis was restricted to SNPs with p-value ,0.05 (Table S1). Second, we calculated whole-gene association values using two alternative approaches: (i) the extraction of a gene-wise p-value, corresponding to the strongest association signal within a gene (Meta-GW) [17]; and (ii) the derivation of a more stringent gene-based p-value using the extended Simes test (Meta-GATES), which combines all association signals within a gene and controls for the bias that could be generated by gene-size or LD structure among markers [31]. Thus, we observed 66,204 SNP association p-values ,0.05 tagging 7,527 genes. Of these, we loaded only 4,891 and 4,647 gene pvalues (from GW and GATES procedures, respectively) into FPAN (Table S2), which was composed by 14,793 genes having 229,357 non-redundant high-confidence interactions (note that not all genes are informative in the FPAN). Third, we conducted the search for significant SNs in AD using the information from GW and GATES procedures in comparison with an expected background distribution among the FPAN created from 10,000 permutations (See Methods). Fourth, we repeated the search 10 times in order to explore whether the SNs structure (gene members), interactions and significance was consistent across iterations and not the result of the Monte Carlo procedure. As an example, we observed that the top SN1 had a 97.6% of concordance in gene structure and interactions (see also Table  S3). Fifth, the above results were controlled by 2 further module searches, this time including permuted data or results without genome wide (WGW) significance ( Figure 1). While in the former search, the p-values were permuted 10 times over the entire FPAN to determine if the SN structure and its score could be obtained by chance (Meta-GW-Permuted, Meta-GATES-Permuted); in the latter, we discarded the possibility that the SNs could be the result   of bias due to the strong genome-wide significant p-values within the APOE locus. The result of the global search for overrepresented modules in AD is presented in Figure 3. First, the total number of significant SNs obtained in the Meta-GW analysis (average number = 32.3, SD = 6.14) was significantly higher (p = 2.51610 27 ), than those arising from chance ( Figure 3A; Meta-GW Permuted; average = 13.7, SD = 4.05). Likewise, the total number of significant SNs in Meta-GATES was similar to Meta-GW (average number = 34.9, SD = 0.99) and significantly higher (p = 1.35610 29 ) than those obtained by chance ( Figure 3A; Meta-GATES Permuted; average = 6.1, SD = 7.99). Second, the SN scores in each procedure were always significantly higher among real vs. permuted data ( Figure 3B). Third, the number and scores of the modules obtained in the WGW control was similar to the ones obtained with the whole set of associations, indicating that the strongest associations of the meta-analysis did not influence the present observations ( Figure 3A and B). Fourth, the modules obtained in the Meta-GW and Meta-GATES searches, remained consistent in significance and structure across iterations, changing  only in their respective rank/score. Remarkably, the most significant SN detected in each approach (Meta-GW SN1: S = 6.14, p-value = 8.16610 28 ; and Meta-GATES SN1: S = 5.62, p-value = 1.77610 25 ; Figure 3B and Table 3) was identical in structure differing only in the presence of the guanine nucleotide binding protein (G protein) alpha z polypeptide gene (GNAZ), which was absent in Meta-GATES SN1 (Table S4). Finally, we note that the list of genes contained in each SN was not replicated in the permutation analyses and that only 2 out of 688 permuted genes were also seen either in Meta-GW SN1 or Meta-GATES SN1 ( Figure S2). Altogether, these results indicate that the quantity, significance and structure of the modules identified could not be reached by chance, strongly suggesting that these subnetworks could be biologically meaningful in the etiology of AD.

Glutamate signaling is overrepresented in AD
To examine the above-mentioned hypothesis, gene ontology (GO) term enrichment was assessed in the top 10 SNs identified using the package Ontologizer (see Methods). Table 3 presents the top 3 Meta-GW and Meta-GATES SNs as a function of biological process (BP), cellular component (CC) and molecular function (MF) categories. Meta-GW results indicated that: SN1 was heavily composed by genes acting at the synapse (21/461, p = 2.87610 218 ), participating in the glutamate receptor signaling pathway (11/46, 2.67610 211 ) and specifically related to glutamate receptor activity (13/24, p = 1.21610 218 ); SN2 was mostly overrepresented by genes belonging to the axon guidance biological process (33/341, p = 1.20610 29 ), located mostly at the cell leading edge (12/245, p = 9.35610 211 ); and SN3 was over-represented by the roundabout signaling pathway (3/3, p = 1.6610 27 ). On the other hand, the results with the alternative and more stringent Meta-GATES procedure showed that SN1 had identical ontological enrichment patterns as observed for Meta-GW SN1, being glutamate receptor activity the most significant category (13/24, 1.69610 218 ). Interestingly, Meta-GATES SN2 contained several genes involved in lipid metabolism including categories such as steroid metabolic process and lipid localization (11/269, p = 2.13610 29 and 13/221, p = 3.81610 29 , respectively). Finally, Meta-GATES SN3 was composed of genes participating in transmembrane receptor protein kinase activity (12/82, p = 1.18610 212 ), growth factor binding (9/99, p = 3.42610 212 ), protein autophosphorylation (16/174, 2.30610 212 ) and located mainly at the synapse (7/461, 3.28610 26 ). Specific SNs features and components are described in Table S4. The complete set of ontologies overrepresented in the first top 10 SNs (SN1-SN10) is provided in Table S5.

Replication of glutamate signaling in the ADNI dataset
Considering that glutamate signaling pathway components were consistently present in significant SNs enriched with minor associations to AD, both in the Meta-GW and Meta-GATES analyses, and since both procedures were originated from a single set of SNP associations, we next interrogated the ADNI dataset under the same pipeline (Figure 1), as an attempt to replicate the results in an independent sample of AD individuals. This additional dataset was composed of 693 subjects of which 499 were cases and 194 were controls (Table 1). Genetic association values were calculated replicating the quantitative trait locus (QTL) method reported in the original study [26], which is based on the composite memory score, a measure of the level of memory impairment, reported for each patient (see Methods). Although, the ADNI case cohort includes subjects with mild cognitive impairment (MCI), the phenotype is considered a transitional state with significant risk of progression to clinically diagnostic AD [50], Table 3. Cont. which validates their inclusion. After the corresponding QC procedures, the ADNI dataset showed no significant genomic inflation (l = 1.02, Figure S1). According to was described in the original publication, our results indicate that QTL testing yielded 25,785 SNP associations (p-value ,0.05), tagging 4,915 genes (Table S6), did not reach genome wide significant levels. Marginal associations were observed within the dual specificity phosphatase 23 (DUSP23) gene in chromosome 1 (rs1129923, p = 1.07610 26 ), the 3'-phosphoadenosine 5'-phosphosulfate synthase 1 (PAPSS1) gene (rs9569, p = 6.84610 26 ) and in the phosphatidylinositol-4phosphate 3-kinase, catalytic subunit type 2 gamma (PIK3C2G) gene (rs10841025, p = 9.01610 26 ), as well as association signals in intergenic regions on chromosome 17 and 3 (rs9890008, p = 4.25610 26 and rs4857008, p = 5.88610 26 , respectively). We introduced 3,244 and 3,113 p-values, ADNI-GW and ADNI-GATES respectively (Table S2), into the same FPAN used for the Meta-analysis and module search was carried out with their respective null datasets (ADNI-GW-Permuted, ADNI-GATES-Permuted), since in the absence of genome wide significant results, the WGW control was not necessary. In general agreement with the meta-analysis data, global results indicated that the number of significant modules obtained in either ADNI-GW (average number = 30.3, SD = 2.00) or ADNI-GATES (average number = 24.4, SD = 1.7), was significantly higher (p = 7.01610 23 and p = 1.80610 25 , respectively), than those obtained by chance (Permuted; Figure 3C). Interestingly, when comparing the scores of the first 10 SNs only the ones belonging to the ADNI-GW analysis remained significantly above their respective permuted ones ( Figure 3D) and thus were considered for further analysis (Table S4).
GO term enrichment in ADNI indicated that genes belonging to categories such as voltage-gated calcium channel complex and ion channel complex were significantly overrepresented in AD (p = 1.24610 28 and p = 9.24610 27 , respectively; see also Table  S4 and Table S5 for complete ontological results). Moreover, we replicated multiple modules enriched with glutamate signaling genes (Table 4), including modules SN3 (S = 5.23, p = 5.09610 28 ), SN4 (S = 4.94, p = 7.14610 28 ) and SN7 (S = 4.38, p-value = 3.40610 28 ). Individual sub-network structure is presented in Figure S3.
KEGG pathway enrichment KEGG provided information regarding 280 pathways involving 6,733 genes. Although in comparison with the GO database the amount of information provided by KEGG is substantially reduced, the fact that each annotation is manually curated makes any association much more reliable [51]. Notably, throughout this analysis we detected that glutamate signaling was again the main overrepresented biological process in both Meta-GW SN1 (p = 1.10610 228 ) and Meta-GATES SN1 (p = 5.94610 229 ) subnetworks (Table S7). Glutamatergic synapse, as a KEGG pathway category, was also significantly associated in ADNI sub-networks SN3, SN4 and SN7 (p = 1.20610 206 , p = 1.32610 206 and p = 8.18610 210 , respectively; Table S7). Finally, logical and structural relationships of all sub-networks enriched with glutamate signaling genes from the meta-analysis and the ADNIreplication dataset allowed us to define a list of genes of interest, which were shared at least by 3 SNs (Figure 4). The list was composed by 20 signaling components, including membraneanchored ionotropic (GRIN2A, GRIN2B, GRID2, GRIA1 and GRIA2) and metabotropic glutamate receptors (GRM1, GRM3, GRM7 and GRM8), intracellular downstream effectors CAMK2A and AKAP5, as well as scaffold proteins SHANK1 and SHANK2, which are required for proper formation and function of neuronal synapses [52]. The functional relationship of these signaling components in the context of a glutamatergic synapse is shown in Figure 5.

Expression of glutamate-signaling genes in the human brain
At the physiological level, to explore if there was a transcriptional relationship among the glutamate signaling genes previously identified, we examined their expression profiles in 27 normal human brain regions, using the information from the Allen Brain Atlas [42], as a reference. Clearly, the expression pattern of these components clustered in brain regions tightly related to AD pathology, such as the hippocampal formation, hypothalamus and white matter [53][54][55] (Figure 6). While it is known that glutamate signaling is active in these brain domains [56][57][58], it was interesting to find out clusters with high-and low-expression levels. For instance, GRIA1, GRIA2, CAMK2A, GRIN2B and GRM7 were found among highly expressed gene clusters in the hippocampal formation (r = 0.94), particularly in the CA2-CA3 and CA4 region ( Figure 6A), while GRM3, LPHN3, GRID2 and SLC9A9 were grouped in a low-expression cluster (r = 0.87). Lowexpression clusters were also observed in the hypothalamus (LSP1, GRM3, GRM1, GRIN2A, ITPR1, AKAP5 and ATP2B2; Figure 6B), the dorsal thalamus (SHANK2, SHANK1, BAI3, PIAS1, GRIA1, GRID2 and GRIA2; Figure 6C) and also distinguished in the white matter (GRIA2, AKAP5, GRIN2A, GRIN2B, CAMK2A, SHANK2, GRM8, GRM1, GRM7, BAI3, LPHN3, ITPR1, SHANK1 and ATP2B2; Figure 6D). Interestingly, there was an inverse relationship in the expression pattern of a subset of these genes, since components highly expressed in the hippocampus were found in low-expression clusters in the white matter, and vice versa (i.e. GRIA2, CAMK2 and GRIN2B vs. GRM3 and SLC9A9). Altogether these results indicate that glutamate signaling components are differentially expressed in restricted brain domains for proper neuronal or glial functional activity (see also Figure 5).

Discussion
In agreement with previous GWAS in AD [8,[13][14][15][16]19] our meta-analysis detected strong genome-wide association signals in a 250 kb window of chromosome 19, centered in the coding/ regulatory region of the TOMM40 gene, in close proximity to the APOE locus, and that also included significant signals in the PVRL2, APOC1, BCAM and BCL3 genes. While it has been suggested that the association of such extended region may reflect that other variants in LD with APOE may be of pathogenic importance, particularly a poly-T track in the TOMM40 gene [59,60], recent studies have shown that APOE alleles account for essentially all the inherited risk of AD associated in this region Genes shared by at least 2 SNs are located at the center in bold font and cross interactions between genes inside each module are denoted by light grey edges. Node color represents the OR behavior in a gradient from green to red values (i.e. green: OR,1; red OR.1; white: OR = 1), denoting protection and risk, respectively. Similarly, node size is proportional to the -log10 p-value obtained from the meta-analysis (if absent, node size is the minimum). Triangle shaped nodes marks genes belonging to the glutamate signaling pathway GO term (GO:0007215); Diamond shaped nodes denotes genes belonging to KEGG Glutamatergic synapse pathway (hsa04724); Squares Square shaped nodes denotes genes belonging to both gene ontology term GO:0007215 and KEGG hsa04724 pathway. doi:10.1371/journal.pone.0095413.g004 [61]. Besides the signal in chromosome 19, we detected marginal associations of 2 novel SNPs in the MS4A3 gene, located in a wide LD region containing a cluster of SNPs in the MS4A6A/MS4A4E loci in the long arm of chromosome 11 (i.e. rs610932, rs670139, rs1562990, rs4938933 and rs983392), which reached genomewide significant levels by other recent studies [16,62,63].
Assuming that the APOE locus is the major genetic hallmark associated with the disease and that it does not explain the entire susceptibility of AD [1,[7][8][9], we conducted a network-based pathway analysis with our meta-analysis results to explore the biology behind variants with minor effect size. Initially, to integrate the whole genetic contribution from the meta-analysis we used a gene-wise p-value (GW, single min p-value method) that has been widely applied in detecting novel associations using GWAS data [17,64]. Although this approach has a certain bias for pathways enriched with larger genes and does not consider intergenic associations or LD structure, we note that the random permutation of p-values yielded a distribution of results expected by chance from where the actual data could be compared. Likewise, with the search for significant sub-networks with real and permuted data, and additionally with the WGW control, we believe that the actual contribution of the aforementioned problems to the final result is strongly surpassed by the combination of true minor effect size variants. Still, we considered appropriated the introduction of the GATES procedure that is specifically designed to directly address the gene size and LD structure issues and thus we ended up with a more stringent geneoriented p-value. Notably, through this approach we replicated essentially the same SN (i.e. GW and GATES SN1), which was populated by genes related to glutamate signaling, differing only in the absence of GNAZ gene whose only association in the metaanalysis (Table S1; rs4820537, p = 0.02096) was found not informative in the GATES procedure. Glutamate signaling was further replicated in the ADNI dataset and this time it reached significant association levels in three SNs (ADNI-GW SN3, SN4 and SN7).
From a biological point of view, the relationship of these genetic observations with current knowledge about AD is straightforward. Glutamate signaling has been reported to regulate multiple biological processes, including fast excitatory synaptic transmission, neuronal growth and differentiation, synaptic plasticity, learning and memory [65,66]. Degenerating neurons and synapses in AD brains are usually located within regions that project to or from areas displaying high densities of Ab plaques and tangles [67] and in this regard, glutamatergic neurons located in the hippocampus, as well as in other areas of the brain, are severely affected by these neurotoxic insults [65,67]. Likewise, it has been established that there is a relationship between glutamate receptor signaling and soluble Ab oligomers in the hippocampus, affecting their expression and recycling, which leads to long term depression, synaptic loss and ultimately to cognitive deficit [68,69]. Moreover, sustained activation of glutamate receptors at  the synapse rise Ca 2+ influxes and second messenger levels activating neuronal nitric oxide synthase (NO), increasing reactive nitrogen and oxygen species, thus contributing to neuronal damage independently of the presence of Ab oligomers [70]. Alternatively, and from a genomic perspective, here we provide strong evidence that common genetic variants within a complete set of genes acting as ionotropic/metabotropic glutamate receptors and its downstream effectors are associated in a network context with AD. Accordingly, it has been recently reported that pathways related to neurotransmitter receptormediated calcium signaling and long-term potentiation are similarly associated with mild cognitive impairment and AD [26]. In addition we have found that the expression pattern of these glutamate signaling genes cluster in specific brain regions, which are affected during the development of the disease, such as the hippocampal formation and the hypothalamus. Therefore, it will be interesting to learn if the genes identified through our network-based pathway approach are spatially and coordinately modulated at the transcriptional or post-transcriptional level as a result of various trophic or toxic stimuli. Finally, our data extends the notion that the remaining genetic risk for complex traits, such as AD, is likely explained by the accumulation of functional genetic variants inside an entire pathway, rather than by punctual independent mutations.

Supporting Information
Figure S1 Quantile-Quantile (Q-Q) plots for GWAS datasets and combined meta-analysis. Comparison of the association results for each SNP (black dots) with those expected by chance (red line) in TGen1 (A), NIA-LOAD/NCRAD (B), Pfizer (C) the final meta-analysis (D) and in the ADNI replication dataset. In each dataset, the genomic inflation factor (l) is shown. Values of l between 0.9 and 1.1 are considered unbiased by the population structure. (TIF) Figure S2 Gene structure comparison between modules detected with real and permuted data. Gene coincidences between Meta-GW SN1 (49 genes, light grey circle) and Meta-GATES SN1 (48 genes, dark grey circle) are shown in a Venn diagram and compared with the total number of genes in the first 10 modules of each permuted analysis: Meta-GW SN1 to SN10 (397 genes, light grey circle) and Meta-GATES SN1 to SN10 (354 genes, dark grey circle). (TIF) Figure S3 Glutamate signaling SNs overrepresented in AD. Meta-GW SN1 in conjunction with Meta-GATES SN1, and ADNI-GW SN3, ADNI-GW SN4, ADNI-GW SN7 sub-networks are shown in A through D, respectively. Nodes represent genes and edges their corresponding interactions extracted from FPAN based upon the information in the STRING database. Network legend is provided at the bottom panel: the node color represents the OR behavior in a gradient from green to red values (i.e. green: OR,1; red OR.1; white: OR = 1), denoting protection and risk, respectively. Similarly, node size and edge thickness are proportional to the -log10 p-value obtained in the meta-analysis (if absent, node size is the minimum) and the combined score of interaction. Asterisk in GNAZ gene is a reminder that this gene is only present in Meta-GW SN1. (TIF)     Checklist S1 PRISMA Checklist. (PDF) implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: ?http://adni.loni.usc.edu/wp-content/uploads/how_to_ apply/ADNI_Acknowledgement_List.pdf.