The performance of a new local false discovery rate method on tests of association between coronary artery disease (CAD) and genome-wide genetic variants

The maximum entropy (ME) method is a recently-developed approach for estimating local false discovery rates (LFDR) that incorporates external information allowing assignment of a subset of tests to a category with a different prior probability of following the null hypothesis. Using this ME method, we have reanalyzed the findings from a recent large genome-wide association study of coronary artery disease (CAD), incorporating biologic annotations. Our revised LFDR estimates show many large reductions in LFDR, particularly among the genetic variants belonging to annotation categories that were known to be of particular interest for CAD. However, among SNPs with rare minor allele frequencies, the reductions in LFDR were modest in size.


Introduction
Current technologies for measuring genome-wide genetic variation easily capture millions of variants across the genome. High dimensional genotyping arrays already commonly include several million variants. Through direct sequencing or by imputation against large previouslysequenced reference panels where sequencing has been performed [1][2][3], the number of assayed genetic variants may increase substantially. Hence, when performing an association study to identify genetic variation associated with a phenotype of interest, the number of variants to be tested may easily include many millions of variants [4], most of which will be single nucleotide polymorphisms (SNPs).
Stringent genome-wide significance thresholds of 5x10 -8 , established to control the familywise error rate (FWER) at approximately 5% for genome-wide testing, have been in standard PLOS  usage in the field of human genetics for many years [5]. Recently, this threshold has been refined downwards to account for the much larger number of variants tested in sequenced or imputed data [6]. Certainly, strict application of these thresholds has led to substantially increased reproducibility of identified genetic associations [7][8][9]. However, although few positive results are now published when associations meet genomewide significance thresholds controlling FWER, power can be severely compromised by the use of these necessarily very small significance thresholds. Substantial missing heritability has been seen for many traits and diseases, yet many true loci of small effect may not be identified due to low power studies.
In order to improve power, many research groups are increasing their sample sizes through collaborations and meta-analyses (e.g. [3]). Other groups are pursuing analytic alternatives that relax the genome-wide significance threshold. In particular, strategies that control the false discovery rate (FDR) instead of the family-wise error rate (FWER) often allow testing at much more liberal significance thresholds [10,11]. The argument can be made that when performing millions of tests, that to control the probability of at least one false positive result at 5% is unnecessarily strict, and that it makes sense to control merely the proportion of false results among the set of significant results, i.e. the FDR.
Despite the potential benefits of using FDR-defined significance thresholds instead of the FWER-defined thresholds, control of type 1 errors through the use of a chosen, fixed FDR threshold may be suboptimal. In fact, the use fixed FDR thresholds incurs a bias and tends to allow a higher proportion of false positives than indicated by the selected FDR [12]. Therefore, a better strategy may be to rely on the local FDR (LFDR), which is the probability of a test result being a false positive, given the exact value of the test statistic [10] (Fig 1).
Power remains low even when FDR methods (or LFDR methods) are used, due not only to the chosen significance thresholds, but also because the effect sizes of most genetic variants tend to be small. Furthermore, power is particularly low for variants that have lower minor allele frequencies-i.e. variants that are uncommon in the population-since the standard errors associated with the estimated effects are large due to the small number of individuals carrying the uncommon variants. Additional strategies for increasing power are, therefore, of great interest. Approaches motivated along these lines include finding subsets of genetic variants that are of particular interest, and performing significance threshold adjustments that prioritize these subsets. It has been shown that judicious use of good external annotation about the genetic variants can increase the statistical power to identify associations with low prior power, and that the significance rankings can be improved [13,14]. For example, recent studies have shown that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. [15,16] An approach has been developed, using mixed linear models, to systematically leverage annotation information together with genome-wide genotype data to identify subset of SNPs that show significant heritability-enrichment [17,18].
In this paper, we focus on obtaining improved false discovery rate estimates for coronary artery disease (CAD) through the use of an LFDR method that incorporates external information about the genetic variants, leading to a posterior probability of non-association that varies with the annotations. Similar approaches for the FDR, i.e. methods that stratify or modify the FDR as a function of external information, have been shown to be effective in reducing the overall type 1 error rates [13,19,20]. Specifically, we implement a new LFDR estimation method recently developed by David Bickel's group at University of Ottawa: the ME method [21], that optimally combines LFDR estimates from a small class of test statistics (for some genetic variants) with the larger set of all tests [21]. The theoretical framework underlying the ME LFDR method has been provided in the Appendix. Here, the small class is defined by external annotation categories that showed significant enrichment of CAD heritability.
We modelled the p-values arising from the CARDIOGRAMplusC4D genome-wide association (GWA) consortium [3] to explore the performance of these new false discovery rate estimators. We selected nine functional categories where heritability of CAD is known to be enhanced to define high risk subsets of SNPs [22] and we compare LFDR results with and without the use of external annotation information to demonstrate the potential benefits.

CARDIOGRAM
The p-values from the CARDIOGRAMplusC4D Consortium (http://www.cardiogramplusc4d. org/data-downloads/) meta-analysis GWA study of coronary artery disease [3] (CAD) were extracted for our investigations here. CARDIOGRAMplusC4D included 60,801 cases and 123,504 controls from 48 studies, and tested for association at 9,455,779 variants. We summarize briefly here the methods used to calculate these p-values; detail can be found in the primary publication of CARDIOGRAMplusC4D [3]. Imputation was based on the 1000 Genomes phase 1 version 3 training set with 38 million variants of which over half are low frequency (MAF < 0.005) and one-fifth are common (MAF > 0.05) variants. After selecting variants that surpassed allele frequency (MAF > 0.005) and imputation quality control criteria in at least 29 (>60%) of the studies, 8.6 million SNPs and 836K (9%) indels were included in the meta-analysis; of these, 2.7 million (29%) were low frequency variants (0.005 < MAF < 0.05). The tests of significance arising from the meta-analysis, after application of genomic control, were used for our investigations of LFDR.

LFDR estimation with maximum entropy method
There are several well-known approaches to estimate LFDR [23][24][25][26][27][28]. In this paper, we used the recently-developed ME estimation method [21]. In this method, we assume we have several categories of SNPs where the categorization is obtained from external annotation. Each SNP may be a member in more than one category or reference set. The ME procedure first calculates the LFDR in each of the reference sets. For the SNPs in the intersection of two reference sets, there will therefore be two estimates of LFDR, and the concept of maximum entropy is then used to obtain a single estimate.
To decide whether a separate or a combined reference class should be used, the ME method bases the LFDR estimate mostly on the separate reference class if it has enough SNPs for reliable estimation and otherwise uses the combined reference class alone [21,29]. ME is so-called because it minimizes the relative entropy function over a confidence interval or likelihood interval constructed based on the separate reference class. If the interval is sufficiently narrow, the separate reference class has enough SNPs to derive reliable estimates of LFDR. In this case, the ME method chooses the separate reference class to estimate the LFDR, and the LFDR estimate is the limit of the interval that is closest to the estimate based on the combined reference class. On the other hand, if the constructed interval is so wide that it includes the estimate of the LFDR based on the combined reference class, then the separate-class estimate is considered unreliable. In that case, the ME method estimates the LFDR based solely on the combined reference class.

Construction of reference sets
SNPs were categorized into 53 overlapping functional categories based on the annotation data from Finucane et al. [15] and their polygenic contributions to heritability of CAD were estimated using mixed linear models [15,17,22]. Nine functional categories showed significant heritability enrichment (Bonferroni corrected P<0.01). These categories were used to illustrate the performance of the LFDR method. For ease of presentation, results for three categories (Hoffman enhancers, H3 lysine 9 acetylation (H3K9ac), and fetal DNAse I hypersensitivity (DHS) mark (H3K27ac)) are highlighted in the main paper and additional results are in the Supplement.

Results
The results from CARDIOGRAMplusC4D can be seen in their primary publication [3]. Here, we used the p-values obtained at 9.45 million variants to estimate LFDR. We define three different levels of "significance" for use in our explorations. Firstly, there are 1,836 SNPs that met the most stringent threshold of p-values less than 1x10 -8 (Table 1) [6], appropriate for genome-wide sequencing studies and MAFs down to 0.005. There are 2,213 SNPs with p-values less than 5x10 -8 , and 32,508 that are in the tail of the QQ plot that deviates from the null distribution. For this latter definition, we refer to these SNPs as "P deviated" in Table 1, and the significance threshold is 0.001 (i.e 3 on the -log 10 scale). The gene names (if available), p-values, parameter estimates and MAF for each of the p-deviated SNPs are provided in S1 Table. Table 1 also shows the proportion of the significant SNPs-by each definition of significance-that fall into different MAF bins. SNPs with MAF!0.05 account for 71% of all analyzed SNPs, but they account for 93.6% of the p-deviated set. To examine these data from another perspective, SNPs with a frequency less than 1% account for 2.5% of analyzed SNPs but only 0.32% of deviated SNPs. Therefore, the data indicate that there may be too few SNPs showing statistical evidence of association at small MAFs, probably due to low power.
To demonstrate the potential of the ME LFDR method, we applied it to nine annotation categories (22) known to significantly contribute to CAD heritability ( Table 2). Changes of LFDR estimates for all nine categories are shown with violin plots in Fig 2. For any of the annotations, the largest decreases in the LFDR estimates can be found among the set of

Annotation Category h 2 obs (SE) h 2 exp P-value (adjusted) (1) # of SNPs (2) KS-test D measure (a,b,c)
Enhancer_Hoffman. extend.500 (3)  (1) Adjusted p-value for enrichment, using a Bonferroni correction (2) The number of SNPs used for the adjusted p-value (3) "extend.500" implies that a 500 base pair window around the category was included with the annotation to minimize inflation of heritability from flanking regions [22] https://doi.org/10.1371/journal.pone.0185174.t002 which was the least significant LD-score enriched category in Table 2, LFDR decreased by at least 10% at only 0.66% 4785/722,377 of the SNPs when we used the ME method. However, SNPs that showed substantial decreases in LFDR were more common for some of the other annotations. For example, we found a 20% decrease in LFDR for 0.85% of Hoffman enhancer SNPs (4,530 / 533,446), and a 30% decrease in LFDR for 0.99% of H3K9ac histone modifications (3200 /322,804). Fig 3 displays the LFDR estimates versus the p-values after using the ME method, and demonstrates that the benefit associated with the ME method is strongest for the H3K9ac category, whether with the extended window, or with post-processing following [30]. In agreement with Fig 2, it can be seen that the largest LFDR estimates are found for the Fetal DHS annotation. In Fig 4, the changes in LFDR for the Enhancer Hoffman (extended 500bp) annotation are shown as a function of MAF. Although the differences are not very discernible to the eye, the distribution of changes in LFDR is more left-skewed when the MAFs are smaller. Kolmogorov-Smirnov (KS) tests were used to compare the LFDR distributions between MAF groups for all nine functional categories ( Table 2). All tests were highly statistically significant (p<10 −16 ) indicating differences between the distributions. Table 2 shows that the magnitudes of the distances between distributions in different MAF subgroups, as measured by the D-statistic of the KS tests, are quite consistent across the annotations. The general pattern is that LFDR values tend to be smaller for the SNPs in the groups with smaller MAFs, i.e. the LFDR empirical cumulative distribution (ECDF) for SNPs with MAFs in (0.005-0.01) is shifted to the right (i.e. lower values) than the ECDF for SNPs with MAFs in (0.01-0.05) or SNPs with MAF>0.05. We note that since the KS test is rank based, identical test results are obtained when using p-values or LFDR estimates. Finally, in Fig 5, we focus on SNPs with small MAF (<0.10) and with small LFDR estimates (LFDR-ME<0.10) for H3K9ac, where only 93 SNPs are selected by this filter. For comparison, in S1 and S2 Figs, similar results are shown for Enhancer Hoffman and Fetal DHS, where 67 SNPs were selected by the filter for Enhancer Hoffman, and 63 for Fetal DHS). Three-dimensional scatterplots show the relationships between LFDR, MAF and the LFDR change in these subsets of SNPs. In fact, the SNPs with the larger decreases in their LFDR estimates tend to be those with larger MAF, and furthermore, the SNPs with the smallest local false discovery rates tend to have MAF closer to the upper bound of 0.10 and to show have only small decreases in their LFDR with the new method.
Among the selected subset of 93 SNPs from the H3K9ac annotation, rs41423244 on chromosome 12 showed the greatest LFDR decrease of 14.7%, from 24.7% to 9.998%, with a raw p-value of 1.35x10 -4 . This SNP lies in the CS gene (citrate synthase), and the gene has been previously associated with psoriasis, height and celiac disease. Similarly, the SNP with the largest LFDR decrease among the 67 SNPs highlighted in S1 Fig (Enhancer Hoffman) is rs61877912, which is located on chromosome 11 in an H3K27ac mark in gene DENND5A. The naïve p-value is 8.76x10 -5 ; LFDR falls from 18.5% to 9.5% with the ME method. Although no previous GWAS associations have been reported with this SNP, the gene DENND5A has been associated with Beta2-glycoprotein plasma levels. Finally, the SNP whose LFDR was most influenced by use of the fetal DHS mark is rs75274818 on chromosome 12 (naïve p = 6.1x10 -5 ; LFDR.ME 9.9%; original LFDR 14.5%), located in SLC39A5, a zinc transporter. Again no previous GWAS associations have been reported with this SNP, but this gene has been associated with inflammatory skin disease and height.

Discussion
These data showed many associations with CAD as has been previously reported [3]. However, since power to detect associations with rare genetic variants is usually low, our goal here was to Benefits of new local false discovery rate method in GWAS of coronary artery disease investigate the potential improvements in power associated with a new LFDR method that incorporates external SNP annotation.
Although the LFDR estimates changed for many SNPs, the LFDR estimates that changed most due to the use of the ME method were not those that were particularly rare. When we restricted SNPs to those with MAF <0.1 and LFDR < 0.1, it was always the SNPs near the upper MAF bound that had the largest LFDR decreases (Fig 5). It seems that the ME method is improving the power to detect SNPs with p-values that are small; however, among SNPs that demonstrated genome-wide significance, there were few with small MAFs.
Here, we explored the effect on LFDR estimates by partitioning SNPs into reference sets using functional annotation categories pointing towards excess risk for CAD that were obtained from LD-score regression [15]. The approach that led to this categorization of the SNPs leverages linkage disequilibrium patterns and known regulatory features, and then partitions the heritability for GWAS summary statistics while accounting for linkage disequilibrium patterns. The process or strategy for determining the best reference class of SNPs is an example of what is known as the reference class problem; see [31] for references. In general, the potentially-greater relevance of smaller reference classes must be balanced against their greater variability, as Efron [32] discussed (see also Section 10.4 of [33]). The maximum entropy method is an attempt to automatically achieve that balance. The ME method can be applied to many other ways of defining reference classes. For example, a reference set could be derived from prior evidence for association in the region [13,19], or many different kinds of functional annotation. For coding variants, reference sets could be based on whether amino acids are likely to be affected by a nucleotide change [34,35]. Annotation can also be based on whether the SNPs in the set are themselves located in regulatory features or demonstrate conservation across species [36]. Here, the annotation definition depends not only on whether the SNPs themselves are annotated, but also whether the SNPs are in linkage disequilibrium with an interesting annotation, which allows enlargement of the featured reference class. However, in more generality, LFDR estimates can depend on many kinds of additional information that allow definition of classes of variants, and the ME method, in particular, is applicable when there is uncertainty about which reference class should be used.
The general concept of giving different priority to different subsets of hypotheses has been previously approached in several ways. Stratified FDR estimates can be obtained by separately calculating the FDR in different classes, and then combining the results [13,19]. The weighted FDR method assigns an externally-chosen weight to each test [37], and the prioritized subset method [38] identifies a subset of SNPs expected to show stronger significance when calculating FDR. Unlike these methods, ME is designed for estimation of LDFR. Due to the balancing built into the ME method, if the chosen subsets are not optimal, then the LFDR estimate will be obtained from the larger reference class, hence providing protection against a poor choice of annotation.
Some substantial decreases in LFDR were seen in our work-as large as a drop of 0.4 in the LFDR estimate. Inevitably, these very large changes tended to occur for SNPs where the original LFDR estimate was large, and hence these SNPs may not be of great interest. Nevertheless, the ME LFDR method has the potential to increase the level of interest for pursuing a SNP for further investigations by using external annotation in a statistically principled way, and we saw larger reductions in the LFDR. Since all the LFDR estimates calculated here are relevant for functional annotations shown to be significantly associated with CAD, SNPs associated with substantially reduced LFDR estimates may be worth further investigation. Therefore, we have provided a spreadsheet (S1 Table) for all deviated SNPs indicating the LFDR estimates for each of the nine annotation categories.
Supporting information S1 Fig. Scatter plot of the LFDR-ME estimates by minor allele frequency and the decrease in LFDR estimates using the ME method, when using the Enhancer Hoffman annotation. (TIF) S2 Fig. Scatter plot of the LFDR-ME estimates by minor allele frequency and the decrease in LFDR estimates using the ME method, when using the Fetal DHS annotation. (TIF) S1 Table. Local false discovery rate estimates using the maximum entropy method for nine annotation categories. Columns include the SNP id (legendrs), chromosome (chr), position (pos), minor allele frequency (maf), slope coefficient (beta) and p-value (p_dgc) for association with CAD from the consortium, z-squared (z_sq), and then various LFDR estimates. They are named for the set of SNPs used (LFDR.ME for the ME method, LFDR.Big for LFDR estimated from the large set of SNPs, and LFDR.Small for LFDR from the small annotated category) as well as for which annotation category was used (EH_ext for Enhancer Hoffman extend 500, H3K9_Try for H3K9ac Trynka, H3K9_Try_ext for H3K9ac Trynka extend 500, EH for Enhancer Hoffman, H3K27_ext for H3K27ac PGC2 extend 500, H3K4_Try_ext for H3K4me3 Trynka extend 500, H3K27 for H3K27ac PGC2, H3K9 for H3K9ac peaks Trynka and FDHS for Fetal DHS Trynka). Differences between overall LFDR and maximum entropy LFDR are also provided (Diff). The gene name is provided if the SNP is in a gene. (XLSX)