Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits

doi:10.1371/journal.pgen.1006869

Fig 1.

Calibration of p-values produced by MAPIT via QQ-plots.

QQ-plots from applying MAPIT to 100 simulated null datasets with sample sizes of (A) 1,000, (B) 1,750, and (C) 2,500. Blue dots are p-values produced under the normal test (or z-test), while black dots are p-values computed using the Davies method via a mixture of chi-square distributions. The 95% confidence intervals for the null hypothesis of no association are shown in grey.
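As a rough illustration of how the points in such a QQ-plot are obtained, the sketch below pairs each sorted observed p-value with its expected value under a uniform null. The function name `qq_points`, the plotting position rank/(n + 1), and the `floor` parameter are assumptions made for this example, not details taken from the paper.

```python
import math

def qq_points(p_values, floor=1e-300):
    """Return (expected, observed) pairs on the -log10 scale.

    Assumption: the expected value for the i-th smallest of n p-values
    is i / (n + 1), a common plotting position under a uniform null.
    `floor` guards against log10(0) for p-values reported as exactly 0.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("p_values must be non-empty")
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = rank / (n + 1)
        observed = max(p, floor)
        points.append((-math.log10(expected), -math.log10(observed)))
    return points

# Hypothetical p-values that match their expected quantiles exactly,
# so every point falls on the diagonal of the QQ-plot.
example = qq_points([0.5, 0.25, 0.75])
```

Well-calibrated p-values should track the diagonal, with deviations judged against the confidence band.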


Table 1.

Empirical type I error estimates of MAPIT.

Each entry is a type I error rate estimate, computed as the proportion of p-values below α under the null hypothesis, based on 100 simulated continuous phenotypes for the normal test (or z-test) and the Davies method. These results are based on 100 simulated data sets using simulation model (i). Empirical size was assessed at significance thresholds of α = 0.05, 0.01, and 0.001. Sample sizes were set to 1,000, 1,750, and 2,500. Values in parentheses are the standard deviations of the estimates.
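The estimate described above can be sketched in a few lines. The function name `empirical_type1_error` is hypothetical, and the uniform random draws are stand-ins for calibrated null p-values, not MAPIT output.

```python
import random

def empirical_type1_error(p_values, alpha):
    """Proportion of null p-values that fall below the threshold alpha."""
    if not p_values:
        raise ValueError("p_values must be non-empty")
    return sum(p < alpha for p in p_values) / len(p_values)

# Illustrative data only: uniform draws mimic well-calibrated null p-values.
rng = random.Random(0)
null_p = [rng.random() for _ in range(10_000)]

# One estimate per significance threshold used in the table.
rates = {alpha: empirical_type1_error(null_p, alpha)
         for alpha in (0.05, 0.01, 0.001)}
```

For a calibrated test, each estimate should be close to its nominal α.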


Fig 2.

Empirical power to detect simulated causal interacting markers and to estimate their marginal PVE.

Group 1 and group 2 causal markers are colored in light red and light blue, respectively. These figures are based on a broad-sense heritability level of H² = 0.6 and parameter ρ = 0.8, estimated with 100 replicates. Here, ρ = 0.8 was used to determine the portion of broad-sense heritability contributed by interaction effects. (A) shows the power of MAPIT to identify SNPs in each causal group under significance level α = 0.05. The lines represent 95% variability due to resampling error. (B) shows boxplots of the marginal PVE estimates for the group 1 and group 2 causal SNPs from MAPIT for the four simulation scenarios. The true PVEs per causal SNP (0.012 for the group 1 SNPs; 0.012, 0.006, 0.0024, and 0.0012 for the group 2 SNPs) are shown as dashed grey horizontal lines.


Fig 3.

Power analysis for detecting group 1 and group 2 causal SNPs.

We compare the mapping abilities of MAPIT (solid line) to the exhaustive search procedure in PLINK (dotted line) in scenarios I (A), II (B), III (C), and IV (D), under broad-sense heritability level H² = 0.6 and ρ = 0.8. Here, ρ = 0.8 was used to determine the portion of broad-sense heritability contributed by interaction effects. Group 1 causal SNPs are shown in light red and group 2 causal SNPs in light blue. The x-axis shows the false positive rate, while the y-axis gives the rate at which true causal variants were identified. Results are based on 100 replicates in each case.


Fig 4.

Empirical power of exhaustive search procedures to detect epistatic pairs.

Here, the effectiveness of MAPIT (green) as an initial step in a pairwise detection filtration process is compared against the more conventional single-SNP testing procedure, which is carried out via GEMMA (purple). In both cases, the search for epistatic pairs is carried out among the top 100 marginally associated SNPs. We use the fully exhaustive search model in PLINK (orange) as a baseline comparison. We compare the three methods in all scenarios (x-axis), under broad-sense heritability level H² = 0.6. Here, ρ = 0.8 was used to determine the portion of broad-sense heritability contributed by interaction effects. The y-axis gives the rate at which true causal epistatic pairs were identified. Results are based on 100 replicates in each case. The lines represent 95% variability due to resampling error.


Fig 5.

Comparison of epistatic filtration methods with MAPIT and GEMMA on the GEUVADIS data set.

All of these results are based on using MAPIT with genetic relatedness matrix K_cis. (A) shows a histogram of the MAPIT p-values for all variants in the GEUVADIS data set. The horizontal red line corresponds to a uniform distribution of p-values. (B) shows the number of significant pairwise interactions (y-axis) identified by MAPIT (green) and GEMMA (purple) when searching between the top {1000, 2500, 5000, 7500, 10000, 15000, 20000} marginally associated variants (x-axis). We use the number of significant pairs identified by the fully exhaustive search model in PLINK as a baseline comparison (orange dotted line). This image shows the distributions of genome-wide significant epistatic pairs as found by each method. An interaction for MAPIT and GEMMA was deemed significant if it had a joint p-value below the threshold P = 0.05/(∑_i q_i(q_i − 1)/2), where q_i is the number of top variants located in the cis-window of gene i. In the case of PLINK, we consider two variants to be a significantly associated epistatic pair if they have a joint p-value below the threshold P = 1.09 × 10⁻¹⁰, which corresponds to the Bonferroni correction that would be used if we examined all possible genome-wide SNP pairs across all genes in the final data set. Overall, PLINK detected 7,361 significant epistatic pairs.
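To make the filtered-search threshold concrete, the following sketch evaluates P = 0.05/(∑_i q_i(q_i − 1)/2) for a list of per-gene counts. The function name `pairwise_bonferroni_threshold` and the example counts are hypothetical and are not drawn from the GEUVADIS analysis.

```python
def pairwise_bonferroni_threshold(top_variants_per_gene, alpha=0.05):
    """Bonferroni threshold for pairwise tests restricted to cis-windows.

    top_variants_per_gene holds q_i, the number of top variants that
    fall in the cis-window of gene i. Each gene contributes
    q_i * (q_i - 1) / 2 candidate pairs.
    """
    n_pairs = sum(q * (q - 1) // 2 for q in top_variants_per_gene)
    if n_pairs == 0:
        raise ValueError("no variant pairs to test")
    return alpha / n_pairs

# Hypothetical counts for three genes: 3 + 6 + 1 = 10 pairs in total.
threshold = pairwise_bonferroni_threshold([3, 4, 2])
```

Because the number of tested pairs grows with the number of top variants retained, the threshold becomes stricter as the search set along the x-axis in panel (B) increases.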


Fig 6.

Enrichment of eQTL and mepiQTL SNPs in GEUVADIS data set.

Shown here is the distribution of locations for significant SNPs, relative to the 5′ most gene transcription start site (TSS) and the 3′ most gene transcription end site (TES). (A) displays the marginally epistatic QTL (mepiQTL) detected by MAPIT using genetic relatedness matrix K_cis. (B) corresponds to the expression QTL (eQTL) identified by single-SNP testing via GEMMA. The x-axis of each plot divides a typical cis-candidate region into a series of bins. The y-axis plots the number of SNPs in each bin that have a p-value less than a gene-specific Bonferroni-corrected significance threshold P = 0.05/∑_i s_i, where s_i is the number of cis-SNPs for gene i, divided by the total number of SNPs in that bin. Bars in green denote the region bounded by the TSS and TES, with gene lengths divided into 20 bins for visibility. Because the gene body is thus artificially enlarged, SNP density within genes cannot be directly compared with SNP density outside of genes.
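The y-axis quantity described above can be sketched as follows. Both function names, the integer bin labels, and the example values are assumptions made for illustration; how SNPs are assigned to bins is not specified here and is left to the caller.

```python
from collections import defaultdict

def cis_bonferroni_threshold(cis_snps_per_gene, alpha=0.05):
    """Threshold P = alpha / sum_i s_i, with s_i the cis-SNP count of gene i."""
    total = sum(cis_snps_per_gene)
    if total == 0:
        raise ValueError("no cis-SNPs to test")
    return alpha / total

def significant_fraction_per_bin(bin_ids, p_values, threshold):
    """For each bin, the number of significant SNPs over all SNPs in the bin."""
    if len(bin_ids) != len(p_values):
        raise ValueError("bin_ids and p_values must have equal length")
    totals = defaultdict(int)
    hits = defaultdict(int)
    for b, p in zip(bin_ids, p_values):
        totals[b] += 1
        if p < threshold:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in totals}

# Hypothetical example: two genes with 10 and 15 cis-SNPs, five SNPs in two bins.
thr = cis_bonferroni_threshold([10, 15])
fractions = significant_fraction_per_bin(
    [0, 0, 1, 1, 1], [0.001, 0.5, 0.0001, 0.0005, 0.3], thr
)
```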


Fig 7.

Select chromosome-wide scans for epistatic effects in GEUVADIS data set.

Depicted are the −log₁₀(P) transformed MAPIT p-values of quality-control-positive cis-SNPs plotted against their genomic position in chromosomes (A) 1, (B) 6, and (C) 22, respectively. Note that MAPIT was implemented with K_cis. Genes with epistatic associations are labeled in blue. The red horizontal line indicates a genome-wide significance threshold (P = 1.828 × 10⁻⁸). Note that all panels are truncated at −log₁₀(P) = 10 for consistency and presentation, although for some genes there are markers with strong marginal epistatic associations and p-values P ≈ 0.
