Evidence for More than One Parkinson's Disease-Associated Variant within the HLA Region

Parkinson's disease (PD) was recently found to be associated with HLA in a genome-wide association study (GWAS). Follow-up GWAS's replicated the PD-HLA association but their top hits differ. Do the different hits tag the same locus or is there more than one PD-associated variant within HLA? We show that the top GWAS hits are not correlated with each other (0.00≤r2≤0.15). Using our GWAS (2000 cases, 1986 controls) we conducted step-wise conditional analysis on 107 SNPs with P<10−3 for PD-association; 103 dropped-out, four remained significant. Each SNP, when conditioned on the other three, yielded PSNP1 = 5×10−4, PSNP2 = 5×10−4, PSNP3 = 4×10−3 and PSNP4 = 0.025. The four SNPs were not correlated (0.01≤r2≤0.20). Haplotype analysis (excluding rare SNP2) revealed increasing PD risk with increasing risk alleles from OR = 1.27, P = 5×10−3 for one risk allele to OR = 1.65, P = 4×10−8 for three. Using additional 843 cases and 856 controls we replicated the independent effects of SNP1 (Pconditioned-on-SNP4 = 0.04) and SNP4 (Pconditioned-on-SNP1 = 0.04); SNP2 and SNP3 could not be replicated. In pooled GWAS and replication, SNP1 had ORconditioned-on-SNP4 = 1.23, Pconditioned-on-SNP4 = 6×10−7; SNP4 had ORconditioned-on-SNP1 = 1.18, Pconditioned-on-SNP1 = 3×10−3; and the haplotype with both risk alleles had OR = 1.48, P = 2×10−12. Genotypic OR increased with the number of risk alleles an individual possessed up to OR = 1.94, P = 2×10−11 for individuals who were homozygous for the risk allele at both SNP1 and SNP4. SNP1 is a variant in HLA-DRA and is associated with HLA-DRA, DRB5 and DQA2 gene expression. SNP4 is correlated (r2 = 0.95) with variants that are associated with HLA-DQA2 expression, and with the top HLA SNP from the IPDGC GWAS (r2 = 0.60). Our findings suggest more than one PD-HLA association; either different alleles of the same gene, or separate loci.


Introduction
The recent discovery of an association between PD and HLA was made in a hypothesis neutral genome-wide association study (GWAS) [1]. Historically, HLA-disease associations have been conducted with the highly polymorphic ''classical'' HLA loci; i.e., those that encode the diversity for antigen recognition; whereas, the association peak in our GWAS was in HLA-DRA which is practically monomorphic and hence not normally investigated for disease associations. The finding that genetic variants in immune response affect risk of developing PD, firmly grounds, at the DNA level, the long held notion that the immune system and inflammation play a significant role in PD [2].
Our original GWAS that uncovered an association between PD and HLA was performed with the NeuroGenetic Research Consortium (NGRC) data. NGRC is a single data set (2000 persons with PD, 1986 control volunteers) which was collected using uniform protocols for all study procedures including subject selection and diagnosis, data collection, genotyping and data analysis [1]. The NGRC GWAS revealed a spike at HLA for association with PD that reached genome-wide significance. The association peak was at rs3129882, a SNP in intron 1 of HLA-DRA which had previously been shown to associate with variation in expression of HLA-DRA, DRB5 and DQA2 [3,4]. Association of rs3129882 with PD had an odds ratio (OR) = 1.31, and P = 3610 28 in discovery and was replicated in independent datasets in the same study [1]. The HLA region spike in the NGRC data included 107 SNPs that achieved P values of 10 23 to 3610 28 . A subsequent GWAS conducted in the Dutch population (772 cases, 2024 controls) confirmed the association of PD with HLA [5]. Their most significant SNP was rs4248166: OR = 1.36, P = 4610 25 which also maps to the HLA class II region. The involvement of the HLA region in PD was also confirmed by the International Parkinson Disease Genomics Consortium (IPDGC) meta-analysis [6], which identified chr6:32588205 in the HLA class II region as the most significant SNP in their discovery sample (5333 cases, 12019 controls) with OR = 0.70, P = 3610 28 and in their replication sample (7053 cases, 9007 controls) OR = 0.80, P = 9610 28 .
It is not unexpected that different GWAS's are identifying different HLA SNPs; arrays with different SNPs were used.
However, do the different top SNPs from various studies all tag the same locus or could there be more than one PD-associated susceptibility variant in HLA? The aim of this study was to explore this question.

HLA hits in three GWAS's
The most significant HLA SNPs from the three GWAS's including and subsequent to our original report are shown in Table 1. They span a ,100 kb region in the HLA class II region. We expected to find strong linkage disequilibrium (LD) (measured by r 2 , where r is the correlation coefficient) among them, assuming all three are tagging the same PD-susceptibility locus. Surprisingly, there was very little correlation between the SNPs as evidenced by pair wise r 2 = 0.00, 0.09 and 0.15 ( Figure 1).

Conditional analysis in NGRC GWAS
To gain a better understanding of the HLA association with PD, we performed step-wise conditional analysis in the NGRC GWAS. Conditional analysis allows testing all the SNPs in the region (or a chosen subset, here according to statistical significance), to identify the most significant one, and repeating the analysis conditioned on the most significant SNP to see if others are significant in addition to the top SNP [7,8,9]. The process is repeated, each time conditioning on all SNPs that emerged as most significant in prior rounds, until all SNPs whose significance is dependent on other SNPs are identified and removed. We performed conditional analysis on 107 HLA SNPs that had achieved P,0.001 for PD-association in the NGRC GWAS [1], using PLINK v1.07 software [10] (Table S1). In round 1, when analysis was performed conditioned on the single most significant SNP (SNP1, rs3129882), 90 of 106 SNPs lost significance while 16 SNPs remained significant at P,0.05. The SNP with the lowest P value (rs3993757, P = 0.002) was marked SNP2. When the analysis was repeated (round two) conditioned on both SNP1 and SNP2 for the 15 SNPs that survived round one, 13 were associated with PD with P,0.05, and the most significant SNP of this analysis (rs2844505, P = 0.006) was marked SNP3. When the analysis was repeated with the 12 remaining SNPs conditioning now on SNP1 and SNP2 and SNP3 (round three), only one SNP had P,0.05 (rs9268515, P = 0.025) and it was marked SNP4. Summary of the results are shown in Table 2. For the full analysis see Table S1.
Step-wise conditional analysis revealed four HLA SNPs with seemingly independent effects on PD. We re-tested association of each of the four SNPs with PD conditioning on the other three. We obtained P = 5610 24 for SNP1 conditioned on SNP2 and SNP3 and SNP4, P = 5610 24 for SNP2 conditioned on SNP1 and SNP3 and SNP4, P = 4610 23 for SNP3 conditioned on SNP1 and SNP2 and SNP4, and P = 0.025 for SNP4 conditioned on SNP1 and SNP2 and SNP3.

Interaction among SNPs 1-4 in NGRC
We tested for and did not find significant evidence for interaction among the four SNPs. The full model testing all pair-wise interactions among the four SNPs compared with a model with no interactions yielded P = 0.9. Testing each pair-wise interaction, with all other SNPs in the model as covariates, yielded P = 0.2-0.7. Lack of evidence for interaction could have been due to insufficient power (discussed further under Replication).

Linkage Disequilibrium
We examined LD between the four SNPs that withstood conditional analysis. LD was measured as D9 and r 2 (correlation coefficient) [11]. In the context of disease association, r 2 is commonly used to assess correlation among SNPs (see for example [8]). Using the NGRC data for estimating LD, pair-wise D9 for the four NGRC SNPs ranged from 0.17 to 0.75 and r 2 ranged from 0.00 to 0.09. Using the 1000 Genomes Project data for estimating LD (to allow inclusion of IPDGC SNP) the NGRC SNPs 1-4 had D9 = 0 to 0.88 and r 2 = 0.01 to 0.20 ( Figure 2). In relation to the top SNPs from other GWAS's ( Figure 2), SNP1, SNP2, and SNP3 showed little or no correlation with them (0.00#r 2 #0.15), whereas SNP4 was moderately correlated with the top SNP from IPDGC (r 2 = 0.60).

Haplotype analysis
We performed haplotype analysis for the SNPs that had emerged from NGRC conditional analysis (SNP1, SNP3 and SNP4; SNP2 was not included because its minor allele was rare and none of the haplotypes carrying the minor allele of SNP2 had Figure 1. Linkage disequilibrium (LD) among top HLA associations from three PD GWAS's. LD was measured as D9 (panel A) and as r 2 (panel B) among rs3129882 from the NGRC GWAS [1], rs4248166 from Dutch GWAS [5], and 6-32588205 from IPDGC meta analysis [6].

Replication
We attempted to replicate the following observations: (a) SNPs 1-4 are associated with PD; (b) association of each SNP with PD remains significant when conditioned on the other three SNPs; (c) haplotype analyses will reveal similar pattern of increasing risk with increasing risk alleles as seen in our NGRC results. We performed the replication in an independent dataset (843 cases and 856 controls) that is published [12] and publicly available on the NIH database of Genotypes and Phenotypes (dbGaP, accession number phs000126.v1.p1).
We performed haplotype analysis for SNP1 and SNP4. Since the NGRC haplotype analysis was performed with three SNPs, for a direct comparison, we repeated the NGRC haplotype analysis with SNP1 and SNP4, leaving out SNP3. The NGRC and replication haplotype analyses yielded similar results ( Table 5), with the most significant effect for the haplotype GG with two risk alleles (OR NGRC = 1.51, P NGRC = 1610 29 , OR Replication = 1.41, P Replication = 2610 24 , OR Pooled = 1.48, P Pooled = 2610 212 ), followed by AG (OR NGRC = 1.23, P NGRC = 3610 23 , OR Replication = 1.28, P Replication = 0.01, OR Pooled = 1.25, P Pooled = 2610 24 ); the results for GC are less reliable due to the smaller sample size ( Table 5).
Test of interaction between SNP1 and SNP4 in the combined NGRC and replication data yielded OR Interaction = 1.12, P Interaction = 0.19. If interaction exists, it is weak (OR interaction ,1.12) and would require a larger sample size than available here to achieve significance. Our power to detect OR interaction = 1.12 with P = 0.05 was 42%. We had 80% power to detect OR interaction $1.20, and 90% power to detect OR interaction $1.23, using the combined NGRC and replication data. Therefore, we should have been able to detect interactions with moderate or high magnitude.
The effects of SNP1 and SNP4 appeared to be additive, as suggested by the OR for genotypic combinations, increasing incrementally with the number of risk alleles an individual possessed up to OR = 1.94, P = 2610 211 for individuals who were homozygous for the risk allele at both SNP1 and SNP4 ( Table 6).

Discussion
It is intriguing that different GWAS's have identified different SNPs as their top PD associated SNP in the HLA region. We questioned whether they tag the same susceptibility locus. In this study, we demonstrated that the top hits from different studies are not strongly correlated with each other. Low correlation among PD-associated SNPs does not rule out the possibility that they tag the same locus, but it does raise the possibility that there may be more than one PD signal in HLA. Using step-wise conditional analyses on the NGRC data we uncovered four seemingly independent signals for PD within the HLA region. There was no correlation and no detectable interaction among the four NGRC SNPs. Two of the four NGRC SNPs, rs3129882 (SNP1) and rs9268515 (SNP4), were replicated in an independent dataset. We have therefore demonstrated that there are at least two HLA-PD associations that cannot be explained by LD. These signals  (Table S1) 16 with P,0.05 13 with P,0.05 1 with P,0.05 Step-wise conditional analysis was performed for 107 SNPs in the HLA region that achieved P,10 23 in GWAS. The full analysis is shown in Table S1. Here, we show the summary results for the four SNPs that remained significant after conditioning on the other significant SNPs. For consistency, we show all odds ratios (OR) on the positive side (i.e., testing risk allele against the alternate allele). The risk allele at each SNP is shown in bold. All association tests were adjusted for age at enrollment, sex, and PC1 and PC2 (principal components that define Jewish/non-Jewish origin and the European country of ancestry). Once the four SNPs that retain conditioned P,0.05 were identified, we re-tested association of each of the SNPs with PD conditioning on the other three. We obtained P = 5610 24 for SNP1 conditioned on SNP2 and SNP3 and SNP4, P = 5610 24 for SNP2 conditioned on SNP1 and SNP3 and SNP4, P = 4610 23 for SNP3 conditioned on SNP1 and SNP2 and SNP4, and P = 0.025 for SNP4 conditioned on SNP1 and SNP2 and SNP3. BP = base pair position of the SNP on chromosome 6. Minor/major allele = the two alternative nucleotides at the SNP, the one with higher frequency denoted as major allele. MAF = minor allele frequency. HWE P = P value for the test of Hardy-Weinberg Equilibrium. doi:10.1371/journal.pone.0027109.t002 may indicate different PD-associated alleles at a single susceptibility gene, or different PD susceptibility loci. In support of the different alleles in the same gene hypothesis, there are known examples of multiple disease-associated alleles at the same gene for HLAassociated diseases including type1 diabetes [13] and multiple sclerosis [14], as well as multiple association signals for PD within one gene, notably, SNCA [8,15]. Alternatively, the signals may represent different genes. SNP1, rs3129882, the original genome-wide significant finding in NGRC, is an expression quantitative trait locus (eQTL). It is in intron 1 of HLA-DRA and is associated with expression levels of HLA-DRA, DRB5 and DQA2 genes [3,4]. SNP4, rs9268515, is highly correlated (r 2 = 0.95) with three eQTL SNPs (rs3793127, rs3763309, rs3763312) that are associated with expression levels of HLA-DQA2 [4]. SNP4 also correlates with the top SNP of IPDGC (r 2 = 0.60), which maps between DRA and DRB5, and could be tagging any of the classical HLA genes in the DR-DQ region due to their high LD. This raises the intriguing possibility that PD may be associated with both regulatory elements that influence HLA class II gene expression, and with a classical HLA class II allele. The allele frequency for SNP1 varies significantly in Caucasian Americans according to the European country from which their ancestors immigrated to the US [1]. Furthermore, SNP1 is more strongly associated with sporadic PD than familial PD [1]. Therefore, depending on subject selection, a study may find a positive association (as reported in NGRC), no association, or even an inverse association between SNP1 and PD. We used principal components to correct for ethnic and geographic variability. The subpopulations that differed were the Jewish and the Irish (defined by self-report and verified by principal components [1]). SNP1 and SNP4 remained significant when we excluded the Jewish and the Irish individuals. A recent study showed an inverse association between SNP1 and PD in a mixed Irish and Polish population [16]. It is noteworthy that in the original NGRC GWAS report [1], the Americans of Irish-descent also showed an inverse association between SNP1 and PD, while all other Europeandescent subpopulations showed a positive association (number of individuals of Polish-descent were few). The Irish/Polish study found the association using a recessive model, as compared to additive model used in NGRC. This is also in agreement with the NGRC data, because the recessive model projects a larger effect size than additive model because it compares individuals who are homozygous for the risk allele to all others.
SNP2 and SNP3 are more than 800 kb away from SNP1 and SNP4. SNP2 had a low minor allele frequency and thus could not be studied in any detail. SNP3 indicated a potential third signal in the NGRC data, but it did not replicate. Much larger sample sizes will be required to determine if and how SNP2 and SNP3 affect susceptibility to PD.
SNP4 was the last of the four seemingly independent SNPs to show up in the conditional analysis of the NGRC data with a modest P = 0.025; and it was the only one of the four NGRC SNPs that showed any correlation with the top HLA hit of IPDGC. The direction of the SNP4 effect was the same as the IPDGC SNP; they reported reduced risk with the minor allele, we report increased risk with the major allele (we used the risk alleles throughout the text and the main tables to keep the ORs consistently in the positive direction). We used a liberal P,0.05 given the exploratory nature of the study and that we would follow with a replication study. That our last significant HLA SNP tagged the most significant HLA SNP in IPDGC gives credence to SNP4 being a true association rather than a false positive signal due to a relaxed significance threshold. SNP4 was genotyped in NGRC and imputed in the replication study. The Impute Information score was 0.95, suggesting relatively high reliability. Cases and controls were imputed together under the same conditions, using the same method, and independently of NGRC, and the results showed significantly different allele frequencies between cases and controls for SNP4, in line with the original observation in NGRC. Thus it is unlikely that the results for replication were severely skewed by the fact that SNP4 was imputed; however, replication by other studies will help clarify further. We do not know if HLA plays a larger role for PD risk for certain individuals and perhaps little or no role for others. HLA may be involved in a subtype of PD; for example, some cases of PD may be due to infection [17] or autoimmunity [18]. Alternatively, HLA may have a ubiquitous role in all PD perhaps via inflammatory response to a variety of causes [19]. The two possibilities are not mutually exclusive; i.e., it is possible that HLA and the immune system affect PD pathogenesis in more than one way.
Our results illustrate the utility of conditional analyses and examination of LD structure in replication studies. These exploratory studies can be used to investigate the possibility of more than one disease association in a region. They can also help understand the differences in the results across association studies. We acknowledge the exploratory nature of this study and the difficulty of deciphering HLA-disease association due to the complex structure of the region. At this point, we have probably identified markers and not the true risk alleles. Not only is it critical to understand the nature of the association(s) of PD with HLA, it is equally important to understand the interrelationship between PD and other HLAassociated disorders, particularly multiple sclerosis which like PD is a neurodegenerative disorder [14,20,21,22]. Solving this evolving story will take open collaboration to amass large datasets with genotype, sequence, expression and epigenetic data.

Human Subjects
This study was approved by the Institutional Review Boards of the New York State Department of Health, Emory University and Atlanta VA Medical Center, VA Puget Sound Heath Care System and University of Washington, and Albany Medical College. All participants gave consent. The majority was signed written consent. A subset of participants who preferred to remain anonymous gave verbal consent. Both written and verbal consent procedures were approved by the IRBs and documented in each participant's data file. A GWAS conducted with 2000 cases and 1986 controls from the NGRC, which had previously identified HLA as a PD-associated gene [1] was examined as the primary dataset here. Persons with PD had been diagnosed using standard criteria [23]. Cases and controls were unrelated, non-Hispanic Caucasians from the United States. (See below for Replication dataset.) Genotyping DNA was obtained from whole blood for NGRC subjects, and from blood, cell lines, or whole genome amplified DNA for the replication dataset. NGRC individuals were genotyped using the Illumina HumanOmni1-Quad_v1-0_B array, with a call rate of 99.92% and 99.99% reproducibility. Details of the GWAS genotyping and statistical quality control (QC) have been previously published [1]. 9,232 SNPs were genotyped in the HLA region in NGRC (from 29-33 Mb on chromosome 6; genome build 36).

Quality Control for NGRC GWAS
NGRC genome-wide genotypes had previously been filtered based on standard QC criteria [1]. Two principal components (PC1 and PC2) were found to associate significantly with PD, representing Jewish/non-Jewish ancestry and European countries of ancestral origin. Sex also significantly associated with case/control status because PD is more prevalent in men than in women, and the population used in this study reflects that disparity. Furthermore, while controls in NGRC were intentionally selected to be older than patients' age at onset, age effects are not completely corrected by the study design. Thus, all association tests with NGRC data included age at enrollment, sex, PC1, and PC2 as covariates.
Step-wise conditional Analysis The step-wise conditioning followed a previously described and commonly used protocol [7], with the modification that here we used logistic regression instead of chi-square and adjusted for covariates (sex, age, PC1, PC2). We used 107 HLA SNPS that reached P,0.001 in GWAS. For round 1 of the conditional analysis, all 107 SNPs were ranked by P value. The SNP with lowest P value was marked as SNP1. The remaining 106 SNPs were tested for association with PD risk conditioned on SNP1. The SNP with the lowest P value was marked SNP2, and analysis was repeated now conditioning on SNP1 and SNP2. The process was repeated until no remaining SNPs had P,0.05 for association with PD. We chose to use P,0.05, rather than a more stringent threshold, because the study was exploratory and would be followed with replication.

Linkage Disequilibrium
We used Haploview [24] to construct haplotypes, visualize haploblocks, and estimate LD and r 2 . Since not all of the SNPs Haplotypes were composed of SNP1, SNP3 and SNP4, shown in that order. These SNPs and SNP2 survived conditional analysis at P,0.05. SNP2 had low frequency and any haplotype with minor allele of SNP2 was too infrequent (,0.01) to be included in haplotype association tests. For each SNP, the allele that was associated with higher risk for PD (see Table 2) is shown in bold. OR and P values were calculated for each haplotype relative to AAC haplotype which has the lowest risk for PD. Analyses were adjusted for age at enrollment, sex, PC1, and PC2 (principal components that define Jewish/non-Jewish origin and the European country of ancestry  In the replication dataset, SNP1 and SNP3 were genotyped, SNP2 was not genotyped and did not impute, SNP4 was imputed (info score = 0.95). We used imputed genotype probabilities in the R software, adjusting for age and sex for replication, and adjusting for age, sex and study for pooled data. Replication P values are one sided [28] given the directionality of the hypothesis; P values for pooled NGRC and replication are two sided. doi:10.1371/journal.pone.0027109.t004 reported by other studies were genotyped or imputed in NGRC, LD was calculated using genotypes from the 1000 Genomes Project [25]. SNP genotypes were downloaded from the 1000 Genomes Project website (www.1000genomes.org; CEU low coverage, July 2010 release).

Haplotype Analysis
We used HAPSTAT-3.0 [26] to construct the haplotypes, estimate haplotype frequencies for the top associating SNPs and test haplotype association with PD, while adjusting for age, sex, PC1 and PC2. Only haplotypes with frequencies of $0.01 in cases and controls combined were included.

Replication
The dataset for replication was downloaded from dbGaP with IRB approval and Data Use Certification from the National Institute of Neurological Disorders and Stroke (NINDS) ((http://www.ncbi.nlm. nih.gov/sites/entrez?db = gap) accession number phs000126.v1.p1). The replication dataset was a single GWAS with 843 persons with PD collected by the PROGENI and GenePD studies and 856 neuronormal controls from the NINDS Repository [12]. Cases were familial PD (one individual per family). Diagnosis was made using standard criteria. Cases and controls were Caucasian and unrelated. SNP1 (rs3129882) and SNP3 (rs2844505) were genotyped. We imputed SNP4 (rs9268515) with information score = 0.95. SNP2 was not genotyped and did not impute. IMPUTE v2 [27] was used with HapMap 3 and 1000 Genomes Project genotypes as reference data. Genotype probabilities (dose 2-0) were analyzed in R software http:// www.r-project.org/. Haplotype analyses were performed as described above. P values were one-sided for replication given the directional hypotheses [28]. The pooled analyses of NGRC and replication had two-sided P values and were adjusted for study as well as age and sex.