Leveraging Prior Information to Detect Causal Variants via Multi-Variant Regression

doi:10.1371/journal.pcbi.1003093

Figure 1.

Workflow of the simulation study.

Before carrying out these steps, a large pool of haplotypes (n = 15,000) was simulated. Given the genotype relative risk (GRR) and minor allele frequency (MAF) of the causal variants, cases and controls were simulated by randomly choosing pairs of haplotypes and calculating each individual's risk, which was then used to assign phenotype probabilistically.
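The sampling step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the multiplicative risk model (baseline risk times GRR raised to the number of causal alleles carried), the baseline risk value, the toy haplotype pool, and all function names are assumptions introduced here.

```python
import random

def make_toy_pool(n_haplotypes, n_sites, maf, rng):
    """Hypothetical stand-in for the simulated haplotype pool: each site
    carries the minor allele (1) independently with probability `maf`."""
    return [[1 if rng.random() < maf else 0 for _ in range(n_sites)]
            for _ in range(n_haplotypes)]

def simulate_case_control(haplotypes, causal_sites, grr, baseline_risk,
                          n_cases, n_controls, rng):
    """Draw haplotype pairs until the requested numbers of cases and
    controls have been collected. Requires baseline_risk > 0."""
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # One individual = two distinct haplotypes drawn from the pool.
        h1, h2 = rng.sample(haplotypes, 2)
        # Count causal alleles carried by this individual.
        n_risk = sum(h1[s] + h2[s] for s in causal_sites)
        # ASSUMPTION: multiplicative model, capped at probability 1.
        risk = min(1.0, baseline_risk * grr ** n_risk)
        affected = rng.random() < risk
        genotype = [a + b for a, b in zip(h1, h2)]  # 0, 1 or 2 per site
        if affected and len(cases) < n_cases:
            cases.append(genotype)
        elif not affected and len(controls) < n_controls:
            controls.append(genotype)
    return cases, controls

# Toy usage: a pool far smaller than the 15,000 haplotypes in the study.
rng = random.Random(1)
pool = make_toy_pool(n_haplotypes=500, n_sites=10, maf=0.05, rng=rng)
cases, controls = simulate_case_control(
    pool, causal_sites=[0, 3], grr=3.0, baseline_risk=0.1,
    n_cases=20, n_controls=20, rng=rng)
```

Individuals drawn after a group is already full are discarded, which keeps the case and control counts fixed regardless of disease prevalence.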

Table 1.

Power of different methods in the simulation analysis.

Figure 2.

Distributions of three informative weights (r, phastCons and r×phastCons) for causal variants and non-causal variants on the causal and null chromosomes in the simulation study.

In each MAF range, weights were collected from 200 replicates, and the weights in each replicate were scaled by dividing each weight by that replicate's maximal value, so that the final weights are bounded between 0 and 1.
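The per-replicate scaling described here amounts to dividing every weight by the largest weight in that replicate. The sketch below assumes non-negative weights; the function names and the example values are hypothetical.

```python
def product_weights(r_values, phastcons_values):
    """Combine two per-variant weights into r x phastCons."""
    return [r * p for r, p in zip(r_values, phastcons_values)]

def scale_by_max(weights):
    """Scale non-negative weights from one replicate into [0, 1]
    by dividing each by the replicate's maximal value."""
    max_w = max(weights)
    if max_w <= 0:
        # ASSUMPTION: if every weight is zero, leave them all at zero.
        return [0.0 for _ in weights]
    return [w / max_w for w in weights]

# Made-up values for three variants in one replicate.
r = [0.2, 0.5, 1.0]
phastcons = [0.5, 1.0, 0.25]
scaled = scale_by_max(product_weights(r, phastcons))
print(scaled)  # [0.2, 1.0, 0.5]
```

Because scaling is done within each replicate, the largest weight in every replicate equals 1, which makes distributions comparable across replicates.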

Table 2.

NOD2 and ITPA causal variants in the exome sequencing data.

Figure 3.

Causal variant detection in the exome sequencing data analysis.

(A): NOD2 data; (B): ITPA data. The two top panels are from one replicate of the simulation. For the single-variant test, SNP effect size is represented by −log10 of the p value from a logistic regression model; for the Bayesian liability model, it is represented by the standardized effect estimated at each SNP. Red dots indicate the two causal variants (see Table 2 for more information). Blue vertical bars show the values of the SNP weights (r × phastCons). The horizontal dashed line indicates the effect size at the significance threshold (permutation p value = 0.01). The bottom panel shows the proportion of simulations in which a variant was detected (i.e., significant at the permutation p = 0.01 level). Causal variants are marked in red.
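The two quantities plotted in this figure, the −log10 p value and the per-variant detection proportion, can be illustrated with a short sketch. The input layout (one dictionary of permutation p values per replicate), the variant labels, and the choice of `<=` for "significant at the 0.01 level" are assumptions made for illustration only.

```python
import math

def neg_log10_p(p_value):
    """Effect-size scale used for the single-variant test panel."""
    return -math.log10(p_value)

def detection_proportion(perm_p_by_replicate, alpha=0.01):
    """Fraction of replicates in which each variant is significant.

    `perm_p_by_replicate` is a list with one dict per replicate,
    mapping a variant label to its permutation p value.
    """
    n_reps = len(perm_p_by_replicate)
    counts = {}
    for replicate in perm_p_by_replicate:
        for variant, p in replicate.items():
            counts.setdefault(variant, 0)
            # ASSUMPTION: a variant counts as detected when p <= alpha.
            if p <= alpha:
                counts[variant] += 1
    return {variant: c / n_reps for variant, c in counts.items()}

# Hypothetical p values for two variants across four replicates.
replicates = [
    {"var_a": 0.001, "var_b": 0.20},
    {"var_a": 0.008, "var_b": 0.004},
    {"var_a": 0.030, "var_b": 0.50},
    {"var_a": 0.002, "var_b": 0.09},
]
proportions = detection_proportion(replicates)
print(proportions)  # {'var_a': 0.75, 'var_b': 0.25}
```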
