Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics
As input we used 100 simulated instances of a random Gaussian phenotype together with genotype data for 379 individuals from the EUR-1KG panel. Using the Pascal pipeline with sum gene scores and the chi-squared pathway integration strategy, we computed p-values for 1,077 pathways from our pathway library (results for max gene scores are similar; see S4 Fig). Panel (a) shows the p-value distributions without merging of neighbouring genes, and panel (b) with merging of neighbouring genes (gene-fusion strategy). P-value distributions are represented by QQ-plots (upper panels) and histograms (lower panels). Results are colour-coded according to the fraction of genes in a given pathway that have a neighbouring gene in the same pathway, i.e. a gene located nearby on the genome (distance < 300 kb). (a) Without correction, p-values of pathways that contain genes in LD are strongly inflated. (b) The gene-fusion approach provides well-calibrated p-values independently of the number of pathway genes in LD.
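
The two quantities displayed in the figure, the per-pathway fraction of genes with a nearby neighbour and the QQ-plot coordinates of the pathway p-values, can be illustrated with a minimal Python sketch. This is not the Pascal implementation: the function names `neighbour_fraction` and `qq_points`, the representation of a gene as a `(chromosome, start, end)` tuple, the definition of distance as the gap between gene intervals, and the `(rank - 0.5) / n` plotting positions are all assumptions made here for illustration.

```python
import math


def neighbour_fraction(genes, max_dist=300_000):
    """Fraction of genes that have at least one other gene on the same
    chromosome less than max_dist base pairs away.

    genes: list of (chromosome, start, end) tuples (hypothetical format).
    Distance is taken as the gap between intervals, 0 if they overlap
    (an assumption; the caption does not define it).
    """
    if not genes:
        return 0.0
    with_neighbour = 0
    for i, (chrom_i, start_i, end_i) in enumerate(genes):
        for j, (chrom_j, start_j, end_j) in enumerate(genes):
            if i == j or chrom_i != chrom_j:
                continue
            gap = max(0, max(start_i, start_j) - min(end_i, end_j))
            if gap < max_dist:
                with_neighbour += 1
                break  # one neighbour is enough
    return with_neighbour / len(genes)


def qq_points(pvalues):
    """Return (expected, observed) pairs of -log10 p-values.

    Expected values are quantiles of the uniform distribution, which is
    the null distribution of a well-calibrated p-value.
    """
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    expected = [-math.log10((rank - 0.5) / n) for rank in range(1, n + 1)]
    return list(zip(expected, observed))


# Toy pathway: two genes 248 kb apart, one distant gene, one on another chromosome.
pathway = [("1", 1_000, 2_000), ("1", 250_000, 260_000),
           ("1", 1_000_000, 1_010_000), ("2", 1_500, 2_500)]
fraction = neighbour_fraction(pathway)
points = qq_points([0.5, 0.1])
```

For well-calibrated p-values the points returned by `qq_points` lie close to the diagonal; inflation of the kind described for panel (a) appears as observed values lying above the expected ones.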