Conceived and designed the experiments: JK MRB GB MEW. Performed the experiments: JK. Analyzed the data: JK. Wrote the paper: JK MEW.
MRB is an employee of GlaxoSmithKline. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
A genome wide association study (GWAS) typically results in a few highly
significant ‘hits’ and a much larger set of suggestive signals
(‘near-hits’). The latter group are expected to be a mixture of true
and false associations. One promising strategy to help separate these is to use
functional annotations for prioritisation of variants for follow-up. A key task
is to determine which annotations might prove most valuable. We address this
question by examining the functional annotations of previously published GWAS
hits. We explore three annotation categories: non-synonymous SNPs (nsSNPs),
promoter SNPs and
New clues about the aetiology of complex genetic diseases have been provided by
genome-wide association studies (GWAS)
Several lines of evidence suggest that these near hits do indeed contain some real
association signals. Firstly, quantile-quantile plots of GWAS association p-values
often show a departure from null expectation that extends into the ranked SNPs below
the genomewide significance threshold
Prioritization of near hits for follow-up may be more effective if functional
information is combined with the GWAS p-values. There is already evidence that
causative SNPs for a wide range of traits are enriched for certain functional
categories
One way to arrive at an objective, empirically based weighting scheme is to use the
observed preponderance of functional annotations in established GWAS hits as a guide
to weighting of ‘near hit’ GWAS SNPs. GWAS data are more appropriate for
this purpose than candidate gene genotyping data, as the SNPs typed in the latter
type of study are often selected on the basis of annotation and therefore could
produce biased results. Two recently published databases of GWAS hits (
It is not clear how dataset-specific these previous findings might be. In this paper,
we compare and contrast two GWAS hit datasets and perform sensitivity analysis to
gauge the robustness of annotation enrichment under different conditions. We focus
on three annotations from three different categories, non-synonymous SNPs (nsSNPs),
promoter SNPs and
We used two published GWAS datasets: ‘Hindorff’ (
We adopted a sensitivity analysis approach in which we contrasted results obtained under two very different scenarios, representing two extreme endpoints of possible average GWAS panel SNP composition. In the first, we assumed that all GWASs had used the Affymetrix Mapping 500K panel (hereafter ‘Affy500’); in the second, that all GWASs had used the Illumina HumanHap 550K panel (hereafter ‘Illu550’). Both panels have been widely used in GWASs to date, but they reflect different strategies for marker selection: Illumina selected tagging SNPs, whereas Affymetrix selected SNPs based on assay availability and minor allele frequency (MAF). The proportion of SNPs with a MAF below 0.1 is 22% on the Illu550 and 34% on the Affy500. In addition to these two extreme approaches, we considered a compromise GWAS panel set comprising the union of the two panels (hereafter ‘Affy500+Illu550’).
We chose three annotation categories: non-synonymous SNPs (nsSNPs), expression quantitative trait loci (eQTLs) and promoter region SNPs. Non-synonymous SNPs alter the amino acid sequence of a gene product; we downloaded these from the UCSC browser, selecting nonsense (premature termination codon) and missense mutations from the dbSNP version 130 table.
eQTLs are excellent candidates for GWAS hits as they are thought to be causally
involved in complex traits, and gene expression may be more closely correlated with the genotype
than the complex trait itself. We defined and selected eQTLs from a study of
global gene expression in lymphoblast cell lines (LCLs)
We used the First Exon Finder (FirstEF) program to identify putative promoter
regions, defined as the 570 bp immediately upstream of the first exon
Our approach to testing for annotation enrichment was to compare the proportion of annotated SNPs in the GWAS hit SNP sets with the GWAS panel SNP sets. We determined standard error bars and statistical significance based on expected binomial variation in the GWAS hits (as the number of SNPs in different annotation classes in the GWAS panel sets was large enough to result in negligible error by comparison).
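A minimal sketch of such a comparison for one annotation class follows. The text states only that error bars and significance were based on expected binomial variation in the hit set; the normal approximation used here (a one-sample z-test against a fixed panel proportion) and the function names are assumptions for illustration, not necessarily the authors' exact procedure. The star thresholds follow the Bonferroni correction for 36 tests given in the figure caption.

```python
import math

N_TESTS = 36  # number of comparisons in the Bonferroni correction used for the stars


def enrichment_test(hits_annotated: int, hits_total: int, panel_proportion: float):
    """Compare the annotated proportion among GWAS hits with the panel proportion.

    The panel proportion is treated as fixed (its sampling error is negligible
    with ~10^6 panel SNPs); only binomial variation in the hit set is modelled.
    Returns (hit proportion, standard error of hit proportion, z, two-sided p).
    """
    p_hit = hits_annotated / hits_total
    # Binomial standard error of the observed hit proportion (for error bars).
    se_hit = math.sqrt(p_hit * (1 - p_hit) / hits_total)
    # Under the null, hit SNPs carry the annotation at the panel rate.
    se_null = math.sqrt(panel_proportion * (1 - panel_proportion) / hits_total)
    z = (p_hit - panel_proportion) / se_null
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return p_hit, se_hit, z, p_value


def significance_stars(p_value: float, n_tests: int = N_TESTS) -> str:
    """Bonferroni-corrected star labels: ***, **, * or empty string."""
    for alpha, stars in ((0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p_value < alpha / n_tests:
            return stars
    return ""


# Hindorff nsSNPs (166 of 1219 hits) against the Affy500+Illu550 panel (37856 of 961605).
p_hit, se_hit, z, p = enrichment_test(166, 1219, 37856 / 961605)
print(f"hit proportion {p_hit:.3f} +/- {se_hit:.3f}, z = {z:.1f}, {significance_stars(p)}")
```

With these counts the z statistic is about 17, so the comparison is labelled "***".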
GWAS panels do not include every SNP in the genome, and it is expected that many
GWAS hits will only be markers for true causal variants, lying outside the GWAS
panel, that are associated via linkage disequilibrium or ‘tagging’.
We address this issue by annotating our GWAS SNPs (both ‘hits’ and
‘nulls’) via LD-proxy. A SNP was defined to be LD-proxy-annotated if
it was in linkage disequilibrium with an annotated SNP with
r² ≥ 0.8. We used the SNAP web-tool
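The LD-proxy rule can be sketched as below, assuming pairwise r² values (for example, exported from an LD lookup tool) have already been collected into a dictionary keyed by SNP pairs. The SNP identifiers, the data and the function name are hypothetical. A SNP counts as annotated if it carries the annotation itself or has a directly annotated LD partner at or above the r² threshold.

```python
from typing import Dict, Iterable, Set, Tuple


def ld_proxy_annotated(
    snps: Iterable[str],
    annotated: Set[str],
    ld_r2: Dict[Tuple[str, str], float],
    r2_threshold: float = 0.8,
) -> Set[str]:
    """Return the subset of `snps` annotated directly or via an LD proxy."""
    # Index LD partners whose r^2 meets the threshold (pairs treated as unordered).
    partners: Dict[str, Set[str]] = {}
    for (a, b), r2 in ld_r2.items():
        if r2 >= r2_threshold:
            partners.setdefault(a, set()).add(b)
            partners.setdefault(b, set()).add(a)

    result: Set[str] = set()
    for snp in snps:
        # Direct annotation, or at least one directly annotated partner in strong LD.
        if snp in annotated or partners.get(snp, set()) & annotated:
            result.add(snp)
    return result


# Hypothetical toy data: snp1 is annotated; snp2 and snp3 are in LD with it.
panel = ["snp1", "snp2", "snp3", "snp4"]
annotated = {"snp1"}
ld_r2 = {("snp1", "snp2"): 0.85, ("snp1", "snp3"): 0.50, ("snp2", "snp4"): 0.90}
print(sorted(ld_proxy_annotated(panel, annotated, ld_r2)))       # ['snp1', 'snp2']
print(sorted(ld_proxy_annotated(panel, annotated, ld_r2, 1.0)))  # ['snp1']
```

Proxy annotation here is one step only: snp4 is in strong LD with snp2, which is itself only a proxy, so snp4 is not counted. Passing `r2_threshold=1.0` mirrors the stricter sensitivity setting, and passing an empty LD dictionary corresponds to using no proxies.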
We note that eQTL annotations already have an element of linkage disequilibrium ‘built in’, as any SNP labelled an eQTL may itself be only tagging a nearby causal SNP. However, our eQTL dataset derives from a smaller GWAS panel (Illumina 300k), making further extension via LD-proxy necessary.
Bayesian analysis provides the most suitable framework for combining annotation
information with evidence from an association study
Note that our definition of ‘true association’ includes the possibility of indirect association via linkage disequilibrium. To account for this, we import annotation data from other SNPs in LD, as we describe above. We also note that BFassoc will typically refer to a hypothesis of causality for a specific phenotype, whereas the BFannot values that we consider below refer to a hypothesis of causality for any phenotype that has been tested in a GWAS. Our method is therefore motivated by the idea that the BFannot values obtained under a general-phenotype definition of causality are a reasonable guide to the BFannot values one would obtain for the specific phenotype in question.
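The posterior odds combine the three quantities used in this section: the prior odds Oprior, BFassoc and BFannot. Assuming that the association data $D$ and a SNP's annotation status $A$ are conditionally independent given whether the SNP is truly associated ($H_1$) or not ($H_0$), which is our reading rather than an explicit statement in the text, Bayes' theorem gives the product form below. The empirical BFannot described later in this section estimates the ratio $P(A \mid H_1)/P(A \mid H_0)$.

```latex
\begin{aligned}
O_{\mathrm{post}}
  &= \frac{P(H_1 \mid D, A)}{P(H_0 \mid D, A)}
   = \frac{P(H_1)}{P(H_0)}
     \cdot \frac{P(D \mid H_1)}{P(D \mid H_0)}
     \cdot \frac{P(A \mid H_1)}{P(A \mid H_0)} \\
  &= O_{\mathrm{prior}} \times BF_{\mathrm{assoc}} \times BF_{\mathrm{annot}}
\end{aligned}
```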
The prior odds, Oprior, are set in advance, and are usually set to
reflect a low prior belief that any one given SNP in the human genome is
causally related to the phenotype in question (as indeed reflected by the small
number of GWAS hits found so far for most complex traits). For example,
Oprior = 10⁻⁵ was used by
the Wellcome Trust Case Control Consortium
The Bayes Factor for association, BFassoc, is calculable from GWAS
data either via direct computation of the relevant integral
The Bayes Factor for annotation, BFannot, is estimated empirically from the GWAS hit data. The estimated value is the proportion of a given annotation class seen in the set of hit SNPs divided by the proportion seen in the set of non-hit SNPs. Since hit SNPs make up only a small fraction of all SNPs, we use the annotation proportion seen in unselected GWAS panel sets for this latter quantity.
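A minimal sketch of this estimate, and of combining it with BFassoc and the prior odds as the product Oprior × BFassoc × BFannot, is given below. The function names are hypothetical, the multiplicative combination assumes conditional independence of association data and annotation, and the BFassoc value of 2000 is invented purely for illustration. The counts are the Hindorff nsSNP and Affy500+Illu550 panel figures from the annotation table.

```python
def bf_annot(hit_annotated: int, hit_total: int,
             panel_annotated: int, panel_total: int) -> float:
    """Empirical BFannot: annotated proportion in hits / annotated proportion in panel."""
    if panel_annotated == 0:
        raise ValueError("annotation class absent from panel; BFannot undefined")
    return (hit_annotated / hit_total) / (panel_annotated / panel_total)


def posterior_odds(bf_assoc: float, bf_annotation: float,
                   prior_odds: float = 1e-5) -> float:
    """Posterior odds = prior odds x BFassoc x BFannot (assumes the factors multiply)."""
    return prior_odds * bf_assoc * bf_annotation


def odds_to_probability(odds: float) -> float:
    """Convert odds to a posterior probability of true association."""
    return odds / (1.0 + odds)


# Hindorff nsSNPs: 166 of 1219 hits; panel: 37856 of 961605 SNPs.
bf_ns = bf_annot(166, 1219, 37856, 961605)
print(round(bf_ns, 2))  # 3.46

# Hypothetical nsSNP with BFassoc = 2000 (illustrative value only).
odds = posterior_odds(2000, bf_ns)
print(odds, odds_to_probability(odds))
```

If only a ranking of SNPs is needed, the prior odds are a common factor and can be dropped, which is why step (1) below is needed only for absolute posterior odds.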
Application of our method to real data would require the following steps: (1)
decide on prior odds (if absolute rather than relative Opost values
are required); (2) calculate BFassoc from GWAS data; (3) calculate
BFannot from GWAS hit database data; (4) calculate posterior odds
using the formula given above. To facilitate use of our method, we have made available
software for calculating BFassoc from PLINK output files, and have
created a file containing BFannot values for all the SNPs on the
Affy500 and the Illu550 panels, indicating their annotation status for the three
categories under study as well as BFannot in the range that we
recommend using. These resources are available from our website:
We tested our method on a real dataset. We compared the rank of the
BFassoc with the rank of the product of the BFassoc
and the BFannot in the WTCCC1 Crohn’s data
A higher proportion of SNPs have functional annotation in the GWAS hit datasets
compared to the GWAS panel SNPs (
The proportion of annotation is shown for three different categories (cis
eQTL in open chromatin, nsSNPs and promoter SNPs).
“***” indicates p-values <2.8×10⁻⁵ (= 0.001/36); “**” indicates
p-values <2.8×10⁻⁴ (= 0.01/36); “*” indicates p-values <1.4×10⁻³
(= 0.05/36). These thresholds were chosen to
reflect a Bonferroni correction of the 36 comparison tests implicit in
Annotation category, n (%) | Hindorff | Johnson | Affy500+Illu550 |
Total SNPs (hit sets: P<10⁻⁶) | 1219 | 1576 | 961605 |
cis eQTLs in open chromatin | 46 (3.8) | 39 (2.5) | 7791 (0.8) |
nsSNPs | 166 (13.6) | 181 (11.5) | 37856 (3.9) |
promoter SNPs | 97 (8.0) | 89 (5.6) | 30516 (3.2) |
No annotation | 1853 (87.6) | 2380 (88.3) | 908537 (94.5) |
For the GWAS hit SNP datasets, the number of SNPs with p-values <10⁻⁶ that fall into each annotation category is presented, with percentages in parentheses. SNPs in each annotation category include annotated SNPs and their linkage disequilibrium proxies.
We observed that 13.6% of SNPs in the Hindorff dataset had MAF < 0.1,
compared with 29.8% in the Johnson dataset and a similar figure (27.9%) in
the Affy500+Illu550 panel. This bias may be due to the fact that the Hindorff
dataset often contains only the most significant SNP in a region. Such SNPs
are likely to have a relatively high MAF compared with others in the region,
as it is hard for SNPs with very low MAF to attain small p-values. To test the
results for robustness against these differences in MAF distributions, the
proportion of annotation was compared for SNPs with MAF < 0.1 and SNPs with
MAF ≥ 0.1 (
The proportion of annotation is shown for three different categories (cis
eQTL in open chromatin, nsSNPs and promoter SNPs). Significance levels
and error bars are defined as in
To establish whether the results from the three categories were independent, we
removed all SNPs that had multiple annotations or were a proxy for any SNP in
another annotation category. The patterns remained consistent (
To investigate whether the results were specific to the chosen ‘null’
GWAS panel set, we compared the annotation proportions seen in the Affy500-only
and Illu550-only GWAS panels. Since different SNP selection strategies were
adopted by Affymetrix and Illumina in constructing their panels, and in
particular in the SNP tagging approach used by Illumina, splitting the GWAS
panel dataset in this way allowed us to perform a sensitivity analysis with
respect to the different SNP selection strategies and their effect on GWAS panel
composition. We found consistently lower proportions of annotation in all three
GWAS panel sets than in either GWAS hit set (
We suggest the use of Bayes Factors of 0.93 for SNPs without annotation and
within the range of 3.1–4.7 for
The Bayes Factors are shown for three different categories (cis eQTL in open chromatin, nsSNPs and promoter SNPs). Panel A shows results derived using the Hindorff dataset and Panel B results from the Johnson dataset. The GWAS panel set comprises the union of the Affymetrix 500k and Illumina 550k panels.
We use ‘LD-proxy-annotations’ (see
We performed most analyses using proxies with r² ≥ 0.8 and tested the
effect of this cut-off by repeating the analysis using proxies with r² = 1
and with no proxies at all. The choice of threshold had little impact on
the results (
The Bayes Factors are shown for three different categories (cis eQTL in open chromatin, nsSNPs and promoter SNPs). Panel A shows results derived using the Hindorff dataset and Panel B results from the Johnson dataset. The GWAS panel set comprises the union of the Affymetrix 500k and Illumina 550k panels.
In our preliminary analysis we investigated
The GWAS panel set comprises the union of the Affymetrix 500k and Illumina 550k panels.
In each direct comparison the SNPs in open chromatin had the greater Bayes
Factor. The most highly significant cis eQTL category had the greatest Bayes
Factor. The increase in stringency and selection of only
The rank of BFassoc × BFannot was on average 10322 places higher than the rank of BFassoc alone for the Crohn’s hits, and 205 places lower for the null hits. Furthermore, 21 of the Crohn’s hits moved up in rank, whereas the average number that moved up in the null set was only 4.
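The rank comparison can be sketched as follows: rank SNPs by BFassoc, re-rank them by BFassoc × BFannot, and summarise the rank changes for a chosen subset of SNPs (for example, known hits or a null set). The SNP names and Bayes Factor values are invented, with 0.93 and 4.0 merely standing in for unannotated and annotated SNPs; the ranking conventions (rank 1 = strongest evidence, ties kept in input order) are assumptions rather than details given in the text.

```python
from typing import Dict, Iterable, Tuple


def rank_by(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = largest score; ties keep input order because sorted() is stable."""
    ordered = sorted(scores, key=lambda snp: scores[snp], reverse=True)
    return {snp: rank for rank, snp in enumerate(ordered, start=1)}


def rank_shift(bf_assoc: Dict[str, float],
               bf_annot: Dict[str, float]) -> Dict[str, int]:
    """Rank change per SNP; positive means the SNP moved up after weighting."""
    before = rank_by(bf_assoc)
    after = rank_by({snp: bf_assoc[snp] * bf_annot[snp] for snp in bf_assoc})
    return {snp: before[snp] - after[snp] for snp in bf_assoc}


def summarise(shift: Dict[str, int], snps: Iterable[str]) -> Tuple[float, int]:
    """Mean rank change and number of SNPs that moved up, for a subset of SNPs."""
    values = [shift[snp] for snp in snps]
    return sum(values) / len(values), sum(1 for v in values if v > 0)


# Hypothetical values: snpB carries an annotation, snpA and snpC do not.
bf_assoc = {"snpA": 100.0, "snpB": 80.0, "snpC": 50.0}
bf_annot = {"snpA": 0.93, "snpB": 4.0, "snpC": 0.93}
shift = rank_shift(bf_assoc, bf_annot)
print(shift)                       # {'snpA': -1, 'snpB': 1, 'snpC': 0}
print(summarise(shift, ["snpB"]))  # (1.0, 1)
```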
Our study confirms the hypothesis that there are differences in the proportion of functional annotation between GWAS hits and the background of GWAS panel SNPs. This trend is robust to differences in GWAS panel SNP sets, different GWAS hit lists and SNP allele frequency. The patterns are also seen independently in each annotation category. This provides us with reassurance, given the problems experienced both with accurately capturing all GWAS hits and with defining a fully appropriate comparative GWAS panel set. Our study highlights three categories of functional annotation that appear to provide reliable enrichment in GWAS data and that can be used to empirically estimate Bayes Factors for Bayesian analysis. Furthermore, when applied to real data, our technique increased the rank of SNPs that were later shown to be hits.
In order to produce hit SNP sets with reasonably large numbers of SNPs, our definition of a GWAS ‘hit’ includes SNPs with p-values above the threshold typically considered genomewide significant. We accept that this increases the proportion of false positives in our hit sets. However, our sensitivity analyses show that annotation enrichment remains evident when hits are defined using a more stringent (lower) p-value threshold. We also note that the overall effect of false positives in the set of GWAS hits will be to shrink BFannot values towards 1, so it will have a conservative effect on the use of annotation information in combination with association data.
The robustness of these results across the datasets and indeed the different ways of
defining annotations and GWAS hits is striking, particularly in relation to the
eQTLs. The eQTLs in our study and Nicolae et al.’s
While the patterns of enrichment are broadly consistent, our study also reveals some
differences. The annotation proportions, and derived Bayes Factors, from the
Hindorff dataset are almost always higher than from the Johnson dataset. There is
also a difference in the ranking of the three categories: in the Hindorff dataset
It is not straightforward to arrive at an appropriate ‘null’ set of GWAS SNPs against which the annotation properties of a hit set can be compared. For example, consider combining the results of one GWAS that used the Affymetrix GeneChip Human Mapping 500K panel with another that used the Illumina HumanHap 550K panel. These panels share about 15% of SNPs. Should the annotation information for the SNPs held in common be counted twice (summation approach) or only once (union approach)? Our null hypothesis is not that all these GWAS hits are false (in fact, we assume that most are true), but rather that their location is independent of any annotation information that may be attached to them. The summation approach is appropriate if we assume that the GWAS hits in the second study are independent of those in the first (e.g. unconnected diseases with no common causal genetic mechanisms), while the union approach is appropriate if the same hits are to be expected (e.g. the same or a very similar disease, with both studies well powered). Given that both datasets contain several GWASs on the same or similar phenotypes, and given the growing evidence for some causal effects spanning many diseases, the best approach would lie somewhere between the two. In addition to this theoretical uncertainty, there is also considerable practical uncertainty in ascertaining exactly which panels were used in each study, especially where more than one panel was used. Even if the panels are known, the set of SNPs remaining after QC may not be. The panel composition of each GWAS is important because panels differ in the criteria used to select SNPs, such as minor allele frequency, linkage disequilibrium and location (e.g. genic vs intergenic), and all of these may affect the annotation proportions. Again, the consistency of results across panels supports the validity of the approach despite these problems.
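The difference between the two pooling choices can be made concrete with a toy example. The panels, SNP names and annotations below are hypothetical, and the function name is ours: under the union approach a SNP shared by two panels contributes once, while under the summation approach it contributes once per panel, so shared annotated SNPs carry extra weight.

```python
from typing import List, Set


def pooled_annotation_proportion(panels: List[Set[str]], annotated: Set[str],
                                 approach: str = "union") -> float:
    """Annotated fraction of a pooled 'null' panel set.

    union: each SNP is counted once, however many panels contain it.
    summation: each panel contributes all of its SNPs, so shared SNPs count repeatedly.
    """
    if approach == "union":
        pooled = set().union(*panels)
        return len(pooled & annotated) / len(pooled)
    if approach == "summation":
        total = sum(len(panel) for panel in panels)
        n_annotated = sum(len(panel & annotated) for panel in panels)
        return n_annotated / total
    raise ValueError(f"unknown approach: {approach!r}")


# Hypothetical panels sharing snp3 and snp4; snp3 and snp5 are annotated.
panel_a = {"snp1", "snp2", "snp3", "snp4"}
panel_b = {"snp3", "snp4", "snp5", "snp6"}
annotated = {"snp3", "snp5"}
union_prop = pooled_annotation_proportion([panel_a, panel_b], annotated, "union")
sum_prop = pooled_annotation_proportion([panel_a, panel_b], annotated, "summation")
print(union_prop)  # 2 of 6 distinct SNPs
print(sum_prop)    # 3 of 8 panel entries = 0.375
```

An intermediate choice, as argued above, would weight shared SNPs by something between one and the number of panels containing them.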
We accept that it will be difficult to determine exact values for empirically derived
Bayes Factors. However, there is sufficient consistency in our study for us to
suggest the use of Bayes Factors within the range of 3.1–4.7 for
We allowed GWAS panel and GWAS hit SNPs to acquire “annotation-via-LD-proxy”, primarily because GWAS panels are designed to detect association signals via tagging. In addition, the use of proxies increases the size of the datasets that we are working with. An alternative approach would have been to expand the set of SNPs to include all LD proxies of all GWAS panel SNPs, and to identify “hits-by-proxy” and “nulls-by-proxy”. However, under this approach it is not clear what to do with SNPs that are simultaneously “hits-by-proxy” and “nulls-by-proxy”, a problem that our approach avoids.
In this study we have not differentiated GWAS hits by phenotype, both because we are
interested in general determinants of causality and because stratifying the GWAS
hits in this way decreases the power to identify differences in the distributions of
the annotation between the datasets. However, we note that using their alternative
approach of defining eQTLs, Nicolae et al
Due to advances in next generation sequencing technology
The enrichment signal found in this study for different functional annotation categories in GWAS hits is sufficiently consistent, and the size of the enrichment sufficiently large, to justify its use in Bayesian association analyses. More work is needed to define the size of the signals in other annotation categories, and to refine how rare variants identified by next generation sequencing differ from common variants identified in GWAS data.