Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10−8 is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10−8), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development.

respectively, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section. Infrastructure for the CHARGE Consortium is supported in part by the National Heart, Lung, and Blood Institute grant R01HL105756. ARIC is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute (NHLBI) contracts HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C, R01HL087641, R01HL59367 and R01HL086694; National Human Genome Research Institute contract U01HG004402; and National Institutes of Health contract HHSN268200625226C.
13], including mostly common SNPs with a minor allele frequency (MAF) of over 5%. HapMap imputation made it possible to interrogate most common SNPs, even when meta-analyzing studies that used different genotyping arrays with low overlap [1]. However, low-frequency and rare variants are not well covered in the HapMap panel [14]. In addition, genetic variants other than SNPs, such as small insertions/deletions (indels) and large structural variants, are not included in HapMap-based imputation projects, and may be possible sources of missing explained heritability.
In contrast, the more recently released Phase 1 version 3 of the 1000 Genomes Project (1000G) is based on a larger set of individuals [15], and comprises nearly 40 million variants, including 1.4 million indels. 1000G allows the interrogation of most common and low-frequency variants (MAF > 1%), and rare variants (MAF < 1%) that were previously not covered [16]. In general, improving reference panels can lead to the identification of additional significant loci both through the addition of new variants and the improved imputation of known variants. 1000G imputation may thus have several advantages, but given that the denser 1000G imputation comes at the cost of an increased computational and analytical burden, it is important to estimate the observed benefits in practice. Furthermore, such empirical data is needed to make informed decisions in the future on the use of newer reference panels such as UK10K, and the Haplotype Reference Consortium [17,18]. While several GWA studies using 1000G imputation have been published or are in progress, their sample size differs from the previous

Results
Baseline characteristics of the participants for each of the included studies are shown in S1 [19]. Using a genome-wide significance threshold of 5×10−8, a total of 1,210 SNPs across 30 loci were associated with circulating fibrinogen concentration in the HapMap imputed GWA study compared with 4,096 variants across 35 loci in the 1000G imputed GWA study (S1 Fig and S2 Fig). These loci are described in further detail in S3 Table. Of these loci, six were associated only in the 1000G GWA study and one was associated only in the HapMap GWA study, while 29 were overlapping (Fig 1A). The HapMap and 1000G lead variants of non-overlapping loci are described in Table 1, and lead variants of overlapping loci are described in Table 2. Among significant loci, the correlation coefficients across cohorts of the beta coefficients, P-values, and imputation quality scores of HapMap and 1000G lead variants were 0.925, 0.998, and 0.435, respectively (S3 Fig).

Non-overlapping loci
The lead variants for the seven non-overlapping loci always differed between the HapMap and 1000G GWA studies, and all P-value differences were greater than one order of magnitude. For four of the six loci that were significant only in the 1000G GWA study, the correlation r2 between allelic dosages of the most associated variants imputed using HapMap and 1000G was less than 0.8 (S4 Table). None of the 1000G lead variants among these four loci were included in the HapMap GWA study, and neither were any good proxies (S5 Table).
A regional plot of the 6p21.3 locus, which was significant only in the HapMap GWA study, is shown in Fig 4. The most significant P-value at the locus was 8.5×10−9 in the HapMap GWA study compared to 7.9×10−6 in the 1000G GWA study. The correlation r2 between imputed dosages of the HapMap and 1000G lead variants was low (0.07). The HapMap lead SNP was included in the 1000G GWA study under a different name, rs114339898, but the imputation quality was only sufficient for inclusion in seven of the studies (S5 Table).

Overlapping loci
Regional plots of the 29 overlapping loci are shown in S4 Table. The lead variants of eight of the 29 overlapping loci were the same for the HapMap and 1000G GWA studies. P-value differences between the HapMap and 1000G GWA studies were often small: they were smaller than or equal to one order of magnitude for 22 loci. P-values differed by more than one order of magnitude for seven loci. Five of these loci were more significant in the 1000G GWA study (2q37.3, 4q31.3, 10q21.3, 12q24.12, and 21q22.2), while two of these loci were more significant in the HapMap GWA study (5q31.1 and 8q24.3).
Among the five overlapping loci with lower P-values in the 1000G GWA study, the correlation r2 between imputed dosages of lead variants from HapMap and 1000G was higher than 0.8 for 4 loci, but was 0.68 for the 12q24.12 locus (S4 Table). There was no good proxy of the 1000G lead variant at the 12q24.12 locus included in the HapMap GWA study.
The 5q31.1 and 8q24.3 loci had lower P-values in the HapMap GWA study. The correlation r2 between imputed dosages from HapMap and 1000G was almost perfect for 5q31.1, but was 0.75 for 8q24.3. The HapMap lead variant of the 8q24.3 locus was also included in the 1000G GWA study. These differences between HapMap and 1000G imputation for the 29 overlapping loci are summarized in

Sensitivity analyses
Because more independent variants are included in the 1000G GWA study [20,21], using the conventional genome-wide significance threshold of 5×10−8 may result in an increased type I error rate. When we used a more stringent genome-wide significance threshold of 2.5×10−8 for the 1000G GWA study as suggested by Huang et al. [20], there were 4 loci significant only in the HapMap GWA study, 5 loci significant only in the 1000G GWA study, and 26 overlapping loci (Fig 1B). Three loci that were significant using both HapMap and 1000G imputation thus became non-significant when the stricter significance threshold was applied to the 1000G results.
Genomic inflation factors used for the genomic control correction were calculated separately for the HapMap and 1000G analyses of each study. Thus, differences in the genomic inflation factors could explain some of the differences between the HapMap and 1000G results. When we repeated the HapMap and 1000G GWA studies without applying genomic control corrections, 2 loci were associated with circulating fibrinogen concentration only in the HapMap GWA study, 6 were associated only in the 1000G GWA study, and 30 were associated in both GWA studies (Fig 1C and S6 Table). For practical reasons, not all of the studies used the same imputation software, analysis software, or covariates for the HapMap and 1000G analyses. Specifically, fewer studies used principal components in the HapMap GWA study. When we restricted the analysis to those studies that used the same imputation software, analysis software, and covariates in the HapMap and 1000G GWA studies (S7 Table and S8 Table), 3 loci were associated only in the 1000G GWA study, and 6 were associated in both the HapMap and the 1000G GWA studies (Fig 1D and S9 Table). No loci were associated only in the HapMap GWA study.

Discussion
In our fibrinogen GWA study of 91,953 individuals, using 1000G instead of HapMap imputation led to the identification of six additional fibrinogen loci, suggesting an improvement in the detection of associated signals. Nevertheless, there was also one locus that was only identified when using HapMap imputation, and the advantage of 1000G imputation was attenuated when using a more stringent Bonferroni correction for the 1000G GWA study. The inclusion of indels in the 1000G GWA study did not lead to the identification of any new loci. Only one locus in our 1000G GWA study was led by an indel, and it was in strong linkage disequilibrium with a SNP present in HapMap.
While this is the first study of the impact of HapMap and 1000G imputation on genome-wide associations using exactly the same individuals in a large-scale consortium setting, four previous studies have addressed this question on a smaller scale. In the Wellcome Trust Case Control Consortium, consisting of 2000 cases for each of seven diseases (bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type 1 and 2 diabetes) and 3000 shared controls, Huang et al. re-analyzed GWA studies of these seven diseases with 1000G imputation, and found two novel loci: one for type 1 diabetes and one for type 2 diabetes [20]. A more conservative genome-wide significance threshold of 2.5×10−8 was used in the 1000G GWA studies, while the MAF inclusion threshold was the same at 1%. The second study was a 1000G imputed GWA study of around 2000 cases of venous thrombosis and 2400 controls [22]. Using a conservative P-value threshold of 7.4×10−9, but no MAF threshold, Germain et al. identified an uncommon variant at a novel locus that was not identified in the HapMap GWA study [22]. Third, the National Cancer Institute Breast and Prostate Cancer Cohort Consortium found no new loci by applying 1000G imputation to their existing dataset of 2800 cases and 4500 controls [23,24]. The conventional genome-wide significance threshold of 5×10−8 was used, but no MAF threshold was used. Fourth, Wood et al. compared HapMap and 1000G imputation for a total of 93 quantitative traits in 1210 individuals from the InCHIANTI study [25]. Using a significance threshold of 5×10−8 for both the HapMap and 1000G GWA studies, they found 20 overlapping associations, 13 associations that were only significant using 1000G imputation, and one association that was only significant using HapMap imputation. For the association significant only in HapMap, the P-value difference between HapMap and 1000G lead variants was less than one order of magnitude. When the authors lowered their significance threshold to 5×10−11 to reflect the number of tests being done in analyzing multiple traits, 9 associations remained significant based on HapMap imputation and 11 associations remained significant based on 1000G imputation.
All four of these comparison studies used an earlier 1000 Genomes reference panel. The present study adds to the literature as it is based on the widely implemented Phase 1 Version 3 of 1000G. Crucially, the large sample size allowed us to examine differences at many non-overlapping and overlapping loci, and improved the generalizability of our results, as ongoing GWA studies are often conducted in large consortia. Two further studies with different approaches also provide insights. First, Springelkamp et al. found a novel locus using 1000G imputation even though the sample size was smaller than the previous HapMap GWA study [26,27]. The same genome-wide significance (5×10−8) and MAF (1%) thresholds were used. The lowest P-value at the locus was 1.9×10−8. Because different individuals were included in these GWA studies, the difference between HapMap and 1000G may partially be explained by sampling variability. Second, Shin et al. identified 299 SNP-metabolite associations based on HapMap imputation, and reexamined the associated loci using 1000G imputation in the same individuals [28]. They found that HapMap and 1000G imputation yielded similar P-values and variance explained for all but one locus. For that locus, the 1000G imputation based association was considerably stronger: the explained variance increased from 10% to 16%, and the P-value decreased from 8.8×10−113 to 7.7×10−244. Although Shin et al. did not compare loci identified using HapMap and 1000G, their results do support our finding that large differences in association strengths are possible, albeit not at every locus. All these studies, along with the current study, suggest that additional signals not previously identified in HapMap GWA studies can be found using 1000G imputation at the same sample size.
In the current study we demonstrate that, although 1000G imputation was overall more effective at identifying associated loci, HapMap imputation may outperform 1000G imputation for specific loci. The 6p21.3 locus, corresponding to the major histocompatibility complex (MHC), was significant in the HapMap GWA study but not in the 1000G GWA study. The MHC locus is highly polymorphic and hosts many repetitive sequences, rendering it difficult to genotype and sequence [29-31]. The HapMap reference panel was based largely on the genotyping of variants that were known at that time, whereas the 1000G reference panel is based entirely on low-coverage sequencing. This may explain the rather large discrepancy between HapMap and 1000G at this locus.
Differences in associations when GWA studies are based on different participants can be explained by sampling variability, even with the same sample size. Hence, by using exactly the same participants in the HapMap and 1000G comparisons in the present project, we rule out both statistical power and sampling variability as possible explanations for differences between the HapMap and 1000G GWA studies. Several real differences between the HapMap and 1000G reference panels may underlie the net benefit of 1000G imputation. The HapMap reference panel was largely based on genotypes of known variants, whereas the 1000G reference panel was primarily based on low-pass whole genome sequencing, enhancing the inclusion of novel variants. Additionally, most studies used only a small number of European-ancestry participants for HapMap imputation, whereas they used a larger number of participants of all available ancestries for 1000G imputation, introducing further haplotypes into the imputation process. Nevertheless, some analytical differences between the HapMap and 1000G analyses were not controlled for in our main analysis and therefore remain as potential alternative explanations. First, genomic control corrections were applied to the results of each of the studies before meta-analysis, separately for the HapMap and 1000G GWA studies. As a result, for any given study, there could be differences between the correction applied to the HapMap GWA analysis and to the 1000G GWA analysis. However, the results do not appear to differ materially when these corrections are omitted: there were 6 loci only significant in the 1000G GWA study compared to 2 loci only significant in the HapMap GWA study. The second difference between the HapMap and 1000G GWA studies that may explain our findings is that in the 1000G GWA study more studies were adjusted for ancestry-informative principal components. This difference reflects common practice, as population stratification is suspected to have a stronger influence on variants with lower MAF, and 1000G includes more of these [32]. However, the adjustments are applied to variants across the spectrum of minor allele frequencies, which may have influenced our results.
Thirdly, some studies used different software for HapMap and 1000G imputation (S1 Table). The imputation quality metrics used by IMPUTE and MACH differ, and this has traditionally been dealt with by applying different imputation quality thresholds: > 0.3 for MACH and > 0.4 for IMPUTE [5,33]. In studies that used different imputation software for the HapMap and 1000G GWA studies, the filtering of variants can therefore differ. There may, additionally, be real differences in imputation quality. Finally, some studies used different analysis software (S3 Table). When we restricted our analysis to only those studies that used the same covariates, analysis software, and imputation software for the HapMap and 1000G GWA studies, 3 loci were only significant in the 1000G GWA study, while all loci significant in the HapMap GWA study were also significant in the 1000G GWA study. This suggests that differences in imputation software, analysis software, and covariates do not fully explain the observed difference between the HapMap and 1000G GWA studies, and that there are real differences resulting from choice of reference panel.
1000G GWA studies include more independent statistical tests than HapMap GWA studies [20,21]. Thus, while a P-value threshold of 5×10−8, correcting for 1 million independent tests, maintains the type I error rate at 5% for HapMap GWA studies, this may not be the case for 1000G GWA studies. Using 1000G pilot data, Huang et al. estimated that 2 million independent tests were being done, and thus suggested a P-value threshold of 2.5×10−8 [20]. In our study we used a P-value threshold of 5×10−8 for both the HapMap and 1000G GWA studies, in accordance with the majority of published 1000G GWA studies [26,34-37]. When we used the threshold of 2.5×10−8 in the 1000G imputed GWA study, the difference between the HapMap and 1000G GWA studies became smaller. Thus, while we expect that applying 1000G imputation may lead to novel findings using the conventional genome-wide significance threshold, this expectation may not be met when using stricter, and perhaps more appropriate, thresholds. In other words, using the traditional significance threshold for 1000G may increase the type I error rate, which may account for some additional significant loci detected in 1000G GWA studies.
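The arithmetic behind the two thresholds can be sketched as follows. This is a minimal illustration, assuming a plain Bonferroni correction at a 5% family-wise error rate; the function name `bonferroni_threshold` is hypothetical, and the test counts are the approximate figures quoted above rather than values computed here.

```python
def bonferroni_threshold(alpha: float, n_independent_tests: int) -> float:
    """Per-test P-value threshold that keeps the family-wise type I error rate at alpha."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return alpha / n_independent_tests


# Approximate numbers of independent tests quoted in the text (assumed, not estimated here).
HAPMAP_INDEPENDENT_TESTS = 1_000_000
KG_INDEPENDENT_TESTS = 2_000_000  # estimate attributed to Huang et al. [20]

hapmap_threshold = bonferroni_threshold(0.05, HAPMAP_INDEPENDENT_TESTS)  # about 5e-8
kg_threshold = bonferroni_threshold(0.05, KG_INDEPENDENT_TESTS)          # about 2.5e-8

print(f"HapMap threshold: {hapmap_threshold:.1e}")
print(f"1000G threshold:  {kg_threshold:.1e}")
```

Doubling the assumed number of independent tests halves the threshold, which is why loci with P-values between 2.5×10−8 and 5×10−8 change status under the stricter correction.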
In this study we only examined variants with a MAF of greater than 1%. This restriction was common practice for HapMap GWA studies, but given the improved coverage of rare variants in 1000G, this may not remain the case for 1000G GWA studies. Different MAF thresholds have been used in published 1000G GWA studies, although many have used 1% [20,22,23,26,27,34-40]. Therefore, an advantage of 1000G not illustrated by this study may be the identification of rare variants, at new loci or as secondary signals at known loci. The advantage of 1000G imputation will then in part depend on the importance and impact of rare variants in the trait being studied, as well as the distribution of these variants. Rare and uncommon variants are often clustered in genes with previously associated common variants, limiting the new biology revealed through their identification [41,42]. This appears to be the case for fibrinogen concentration as well [43,44].
In conclusion, we show that the reference panel used in GWA studies can have an impact on the identification of common variants, although our results do not support the expectation that 1000G imputation always outperforms HapMap imputation, as we found one locus that appeared to be better covered in HapMap. This suggests that GWA studies will continue to be more successful as newer reference panels such as the Haplotype Reference Consortium are adopted. Nevertheless, our results also suggest that the benefits of 1000G are considerably reduced when the additional independent tests introduced by 1000G imputation are corrected for. Given that the bulk of the new information provided by 1000G imputation relates to low-frequency variants, we expect the penalty of the increased multiple testing burden to become less relevant in future studies as the power to examine these low-frequency variants increases with larger sample sizes and enhanced imputation quality. Imputation using the Haplotype Reference Consortium reference panel improves the imputation quality of low-frequency variants when compared to 1000G, and future reference panels based on the wealth of whole-genome sequencing data currently being generated by efforts such as TOPMed are likely to continue this trend [45].
the Medical Ethics Committee of the University of Greifswald. The TwinsUK study was approved by the NRES Committee London-Westminster (formerly St Thomas' Ethics Committee). The WGHS was approved by Brigham and Women's Hospital IRB.

Genotyping and imputation
Genotyping and pre-imputation quality control methods for each study are shown in S7 Table. Studies imputed dosages of genetic variants using reference panels from the 1000 Genomes Project with MACH [47,48] or IMPUTE [49]. Studies imputed variant dosages using Phase 2 reference panels from the HapMap Project with MACH [47,48], IMPUTE [49], or BIMBAM [50]. We excluded variants with MACH imputation quality < 0.3, IMPUTE/BIMBAM imputation quality < 0.4, or MAF < 0.01 from each study.
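The exclusion rule above can be sketched as a per-variant filter. This is an illustrative sketch only: the `Variant` record and the `passes_filters` function are hypothetical names, and real imputation output would need to be parsed into this form first.

```python
from dataclasses import dataclass

# Software-specific imputation quality cut-offs taken from the text:
# variants below these values are excluded.
QUALITY_THRESHOLDS = {"MACH": 0.3, "IMPUTE": 0.4, "BIMBAM": 0.4}
MAF_THRESHOLD = 0.01


@dataclass
class Variant:
    """Hypothetical per-study summary record for one imputed variant."""
    name: str
    imputation_quality: float
    maf: float


def passes_filters(variant: Variant, software: str) -> bool:
    """Return True if the variant is kept for the given imputation software."""
    quality_cutoff = QUALITY_THRESHOLDS[software]
    return (variant.imputation_quality >= quality_cutoff
            and variant.maf >= MAF_THRESHOLD)


example = Variant(name="example_variant", imputation_quality=0.35, maf=0.05)
print(passes_filters(example, "MACH"))    # kept: quality is at least 0.3
print(passes_filters(example, "IMPUTE"))  # excluded: quality is below 0.4
```

The same variant can therefore be retained in one study and dropped in another purely because the two studies used different imputation software.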

Fibrinogen measurement
Fibrinogen concentration was measured in citrated or EDTA plasma samples using a variety of methods including the Clauss method, immunonephelometric methods, immunoturbidimetric methods, and other functional methods. Fibrinogen concentration was measured in g/L and natural log transformed. Details about the fibrinogen measurement are shown in S10 Table.

Genome-wide association analysis
All analyses were adjusted for age and sex, and study-specific covariates such as center or case/control status. In family studies, linear mixed models were used to account for family structure. Some studies adjusted the analysis for principal components to account for population structure and cryptic relatedness. Some studies used a different number of principal components in the HapMap and 1000G analyses. The adjustments and analysis software used by each study are shown in S8 Table. We applied a genomic control correction to the results of each of the studies before meta-analysis to remove any remaining genomic inflation. The genomic inflation factor used in this correction was calculated separately in the HapMap and 1000G analyses for each study. We meta-analyzed the results using an inverse-variance model with fixed effects implemented in METAL [51]. Loci were defined as the 500 kb area on either side of each lead variant (the variant with the smallest P-value). Build 36 positions of HapMap SNPs were converted to build 37 using the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgLiftOver). Variants were annotated to genes using ANNOVAR version 2013Mar07. At the meta-analysis level, the imputation quality of each variant was defined as the sample-size weighted mean imputation quality across the studies, not including studies where the variant was filtered out.
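The two core steps described above, genomic control correction followed by fixed-effects inverse-variance meta-analysis, can be sketched for a single variant as follows. This is a simplified illustration and not a reproduction of METAL; all function names are hypothetical, and leaving standard errors unchanged when the inflation factor is below 1 is an assumption of this sketch.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(betas, ses):
    """Lambda: median observed chi-square statistic divided by its expected median."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return median(chi2) / CHI2_1DF_MEDIAN


def apply_genomic_control(ses, lam):
    """Inflate standard errors by sqrt(lambda); no change if lambda <= 1 (assumption)."""
    scale = math.sqrt(max(lam, 1.0))
    return [s * scale for s in ses]


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate for one variant across studies."""
    weights = [1.0 / s ** 2 for s in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return beta, se, p_value


# Toy example: one variant observed in two studies with equal precision.
beta, se, p_value = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
print(beta, se, p_value)
```

In practice the inflation factor is computed per study across all variants in that study, the corrected standard errors are passed on, and the pooled estimate is then computed variant by variant.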

Comparison of HapMap and 1000G
When a locus was significant in both the HapMap and 1000G GWA studies we defined it as an overlapping locus. When a locus was significant in only one of the two analyses we defined it as a non-overlapping locus. To compare the strength of association in the HapMap and 1000G GWA studies, we identified loci with P-value differences of 1 order of magnitude or greater (for example, 5×10−8 compared to 5×10−9 or less).
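These definitions can be expressed as a small sketch. The function names are hypothetical, the strict inequality for significance and the small floating-point tolerance in the order-of-magnitude comparison are assumptions of this illustration, and the example P-values are those reported for the 6p21.3 locus.

```python
import math


def classify_locus(p_hapmap, p_1000g, threshold_hapmap=5e-8, threshold_1000g=5e-8):
    """Label a locus by whether its lead P-value is significant in each GWA study."""
    significant_hapmap = p_hapmap < threshold_hapmap
    significant_1000g = p_1000g < threshold_1000g
    if significant_hapmap and significant_1000g:
        return "overlapping"
    if significant_hapmap:
        return "HapMap only"
    if significant_1000g:
        return "1000G only"
    return "not significant"


def differs_by_order_of_magnitude(p_a, p_b, tolerance=1e-9):
    """True if the two P-values differ by a factor of 10 or more."""
    # The tolerance guards against floating-point rounding at exactly one order.
    return abs(math.log10(p_a) - math.log10(p_b)) >= 1.0 - tolerance


# 6p21.3: lead P-values of 8.5e-9 (HapMap) and 7.9e-6 (1000G).
print(classify_locus(8.5e-9, 7.9e-6))                 # HapMap only
print(differs_by_order_of_magnitude(8.5e-9, 7.9e-6))  # True
```

Passing `threshold_1000g=2.5e-8` reproduces the stricter setting used in the sensitivity analysis.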
For each significant locus we used two approaches to assess the relationship between lead variants from HapMap and 1000G. First, we determined whether or not the more significant of the two lead variants or a good proxy (linkage disequilibrium r2 > 0.8) was included in the analysis of the other reference panel. If so, we examined its association in the other reference panel. Thus, if a locus was more significant in the 1000G GWA study, we checked whether the 1000G lead variant or a proxy was included in the HapMap GWA study. Second, we examined the correlation r2 between HapMap and 1000G lead variants in the form of imputed genotype dosages. This was performed for 5966 individuals from the Rotterdam Study (see study description in S1 Text) [52].
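The second approach can be illustrated with a short sketch that computes the squared Pearson correlation between two vectors of imputed dosages. The function names and the toy dosage values are hypothetical; reading dosages from imputation output is outside the scope of this example.

```python
def dosage_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two equally long dosage vectors."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 values")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("dosages must vary across individuals")
    return cov * cov / (var_a * var_b)


def is_good_proxy(r2, threshold=0.8):
    """Proxy criterion from the text: r2 greater than 0.8."""
    return r2 > threshold


# Toy dosages (0 to 2 copies of the coded allele) for five individuals.
hapmap_lead = [0.0, 1.0, 2.0, 1.1, 0.2]
kg_lead = [0.1, 0.9, 1.8, 1.0, 0.0]
r2 = dosage_r2(hapmap_lead, kg_lead)
print(round(r2, 3), is_good_proxy(r2))
```

Because the correlation is squared, two lead variants coded on opposite alleles still yield a high r2 when they tag the same signal.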

Sensitivity analysis
First, we compared the results of the HapMap and 1000G GWA studies when applying a stricter Bonferroni-corrected P-value threshold of 2.5×10−8 to the 1000G GWA study. This threshold was suggested by Huang et al. to keep the type I error rate at 5% when using 1000G data [20]. Second, we repeated the analysis without using genomic control corrections. Third, we repeated the analysis in 34,098 participants using only the 10 studies that used the same imputation and analysis software as well as the same covariates for the HapMap and 1000G GWA studies.

Supporting Information
Table. Loci that were significant in either the HapMap or 1000G GWAS excluding studies that did not use the same imputation software, analysis software, or covariates. (XLSX)
S10 Table. Sample and array type used for the fibrinogen measurement in each of the included studies. (XLSX)
S1 Text. Supplementary Methods. (DOCX)

script. The authors thank the staff and participants of the ARIC study for their important contributions. We would like to thank the University of Minnesota Supercomputing Institute for use of the calhoun supercomputers. A full list of principal CHS investigators and institutions can be found at CHS-NHLBI.org. The analyses reflect intellectual input and resource development from the Framingham Heart Study investigators participating in the SNP Health Association Resource (SHARe) project. The authors would like to thank the men and women participating in the HCS as well as The University of Newcastle, Vincent Fairfax Family Foundation and The Hunter Medical Research Institute. We thank the LBC1936 and LBC1921 participants and research team members. We thank the nurses and staff at the Wellcome Trust Clinical Research Facility, where subjects were tested and the genotyping was performed. We thank the LURIC study team who were either temporarily or permanently involved in patient recruitment as well as sample and data handling, in addition to the laboratory staff at the Ludwigshafen General Hospital and the Universities of Freiburg and Ulm, Germany. This work was performed as part of an ongoing collaboration of the PROSPER study group in the universities of Leiden, Glasgow and Cork. The authors are grateful to the study participants, the staff from the Rotterdam Study and the participating general practitioners and pharmacists. We thank Pascal Arp, Mila Jhamai, Marijn Verkerk, Lizbeth Herrera, Marjolein Peters and Carolina Medina-Gomez for their help in creating the GWAS database, and Karol Estrada and Carolina Medina-Gomez for the creation and analysis of imputed data. We thank the many individuals who generously participated in this study, the Mayors and citizens of the Sardinian towns involved, the head of the Public Health Unit ASL4, and the province of Ogliastra for their volunteerism and cooperation. In addition, we are grateful to the Mayor and the administration in Lanusei for providing and furnishing the clinic site. We are grateful to the physicians Angelo Scuteri, Marco Orrù, Maria Grazia Pilia, Liana Ferreli, Francesco Loi, nurses Paola Loi, Monica Lai and Anna Cau who carried out participant physical exams; the recruitment personnel Susanna Murino; Mariano Dei, Sandra Lai, Andrea Maschio, Fabio Busonero for genotyping; Maria Grazia Piras and Monia Lobina for fibrinogen phenotyping.
Steno Diabetes Center and Synlab Holding Deutschland GmbH provided support in the form of salaries for authors T.S.A. and W.M., respectively, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section. Infrastructure for the CHARGE Consortium is supported in part by the National Heart, Lung, and Blood Institute grant R01HL105756. ARIC is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute (NHLBI) contracts HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C,
SNP Genotyping was performed by The Wellcome Trust Sanger Institute and National Eye Institute via NIH/CIDR. The WGHS is supported by HL043851 and HL080467 from the National Heart, Lung, and Blood Institute and CA047988 from the National Cancer Institute, the Donald W. Reynolds Foundation and the Fondation Leducq, with collaborative scientific support and funding for genotyping provided by Amgen.