Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics
a) We compute gene scores by aggregating SNP p-values from a GWAS meta-analysis, without requiring individual-level genotypes, while correcting for linkage disequilibrium (LD) structure. To this end, we use numerical and analytic solutions to compute gene p-values efficiently and accurately, given LD information from a reference population (e.g. one provided by the 1000 Genomes Project). Two options are available: the maximum and the sum of chi-squared statistics, which are based on the most significant SNP and the average association signal across the region, respectively. b) We use external databases to define a gene set for each reported pathway. We then compute pathway scores by combining the scores of the genes that belong to the same pathway, i.e. gene set. The fast gene scoring method allows us to dynamically recalculate gene scores by aggregating SNP p-values across pathway genes that are in LD and thus cannot be treated as independent. This amounts to fusing those genes and computing a new score that takes the full LD structure of the corresponding locus into account. We evaluate pathway enrichment of high-scoring (possibly fused) genes using one of two parameter-free procedures (chi-squared or empirical score), avoiding the p-value thresholds inherent to standard binary enrichment tests.
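As an illustration of the gene scoring step in (a), the following Python sketch converts SNP p-values into 1-df chi-squared statistics and estimates gene p-values for the sum and max statistics by Monte Carlo sampling. It is not the numerical or analytic solution referred to above, only a slow stand-in. It assumes that the SNP z-scores are multivariate normal under the null with the LD matrix as their correlation matrix, that the SNP p-values are two-sided, and that the LD matrix is positive definite. The function names (`cholesky`, `pvalues_to_chi2`, `gene_scores`), the toy p-values and the toy LD matrices are hypothetical and chosen for the example only.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def pvalues_to_chi2(pvalues):
    """Convert two-sided SNP p-values to 1-df chi-squared statistics (z squared)."""
    normal = NormalDist()
    return [normal.inv_cdf(1.0 - p / 2.0) ** 2 for p in pvalues]


def gene_scores(pvalues, ld, n_draws=20000, seed=0):
    """Monte Carlo gene p-values for the sum and max statistics under LD.

    Assumption (not from the source): the null is simulated by drawing
    z ~ N(0, ld) and comparing simulated statistics with the observed ones.
    """
    rng = random.Random(seed)
    chi2 = pvalues_to_chi2(pvalues)
    observed_sum, observed_max = sum(chi2), max(chi2)
    lower = cholesky(ld)
    n = len(pvalues)
    exceed_sum = 0
    exceed_max = 0
    for _ in range(n_draws):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z = L * noise has covariance L L^T, i.e. the LD matrix.
        z2 = [
            sum(lower[i][k] * noise[k] for k in range(i + 1)) ** 2
            for i in range(n)
        ]
        if sum(z2) >= observed_sum:
            exceed_sum += 1
        if max(z2) >= observed_max:
            exceed_max += 1
    # The add-one correction keeps the estimated p-values strictly positive.
    return {
        "sum": (exceed_sum + 1) / (n_draws + 1),
        "max": (exceed_max + 1) / (n_draws + 1),
    }


# Toy gene with four SNPs, each with p = 0.01 (hypothetical data).
toy_pvalues = [0.01, 0.01, 0.01, 0.01]
# Strong LD: every pair of SNPs has correlation 0.9.
ld_block = [[1.0 if i == j else 0.9 for j in range(4)] for i in range(4)]
# No LD: the SNPs are treated as independent.
ld_independent = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

print("strong LD:  ", gene_scores(toy_pvalues, ld_block))
print("independent:", gene_scores(toy_pvalues, ld_independent))
```

In this toy setting the sum statistic gives a much larger (less significant) gene p-value under strong LD than under independence, because the four correlated SNPs carry largely the same signal. The Monte Carlo resolution is limited to roughly `1 / n_draws`, which is one reason an analytic or numerical solution is preferable for very small gene p-values.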