Incorporating regulatory interactions into gene-set analyses for GWAS data: A controlled analysis with the MAGMA tool
Fig 2
Larger SNV-to-gene mappings (by coverage) yield more significant genes.
(A) The number of significant genes detected for diverse SNV-to-gene mappings as a function of their coverage, that is, the number of non-redundant base pairs covered by all features (gene bodies, flanks, and regulatory elements) defining a gene, summed across all genes. The solid blue line shows the trend (loess smooth) for flank-based mappings (including Gene Body + U0D0). For each phenotype, we report Spearman’s rank correlation coefficient (ρ) and its associated p-value (p) for a test of positive correlation based on all mappings (including gene bodies with 250kb, 500kb, and 1000kb flanks, which are not depicted on the graphs but are provided in Table B in S2 Table). (B) Gene scores of individual genes (circles), compared between a mapping with large flanks (100kb flanks) and a mapping with small flanks (10kb flanks, the baseline model). The solid black line shows the identity line. The dashed red line shows the significance cut-off (α = 0.05). We used a binomial test to assess whether there were more novel genes (N; genes significant only with large flanks) than lost genes (L; genes significant only with small flanks), against the null hypothesis that both outcomes are equally likely or that losing is more likely. Counts (N:L) and the FDR-adjusted p-value (p) of each test (FDR-adjusted within each phenotype across all the mappings reported in S3 Table) are reported. (A) and (B) Flanks were defined as regions extending from gene bodies; specifically, as UX (U; upstream from the transcription start site) and DY (D; downstream from the transcription end site), where X and Y are flank sizes in kb. Phenotype abbreviations: C-Artery Disease (coronary-artery disease); Mac. Degeneration (macular degeneration).
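The novel-versus-lost comparison in panel (B) can be illustrated with a minimal sketch. It assumes the test is a one-sided exact binomial test with success probability 0.5 on N out of N + L genes, and that the FDR adjustment is the Benjamini-Hochberg procedure; the caption says only "FDR-adjusted", so the choice of procedure is an assumption. The function names and the counts in the usage example are hypothetical and are not taken from the study.

```python
from math import comb


def binomial_p_greater(novel: int, lost: int) -> float:
    """One-sided exact binomial p-value for 'novel is more likely than lost'.

    Under the boundary of the null (both outcomes equally likely), the number
    of novel genes follows Binomial(novel + lost, 0.5).
    """
    n = novel + lost
    if n == 0:
        return 1.0  # no discordant genes, so no evidence either way
    # P(X >= novel) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(novel, n + 1))
    return tail / 2 ** n


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical N:L counts for three mappings within one phenotype.
counts = [(12, 3), (5, 5), (8, 1)]
raw = [binomial_p_greater(n, l) for n, l in counts]
adj = benjamini_hochberg(raw)
for (n, l), p in zip(counts, adj):
    print(f"{n}:{l}  FDR-adjusted p = {p:.4f}")
```

Adjusting within one phenotype across its mappings, as in the loop above, mirrors the grouping described in the caption; the actual set of mappings is the one listed in S3 Table.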