Improving the coverage of credible sets in Bayesian genetic fine-mapping

doi:10.1371/journal.pcbi.1007829

Fig 1.

Investigating the error of conditional coverage estimates in Bayesian single causal variant fine-mapping.

(A) Calibration of posterior probabilities for simulations where (left) β ∼ N(0, 0.2²) and (right) β = log(1.05) or β = log(1.2). Posterior probabilities from 13000 simulations were binned into 10 equally sized bins. The ‘empirical probability the CV is contained’ is the proportion of causal variants (CVs) in each posterior probability bin, shown with ±1.96 × standard error. (B) Distribution of sampled β values in simulations where β ∼ N(0, 0.2²) for (left) each P value bin or (right) each sample size bin in simulations where P_min < 10⁻⁸. The thick black curve is the density function for N(0, 0.2²). The black dashed line shows β = log(1.05) and the red dotted line shows β = log(1.2). (C) Box plots (median and IQR, with the mean marked by a black diamond) of error in (left) threshold and (middle) claimed coverage estimates when averaged across all 5000 simulations or (right) when simulations have been binned by the minimum P value in the region. Error is defined as estimated conditional coverage − empirical conditional coverage, where empirical conditional coverage is the proportion of 5000 replicate credible sets that contain the causal variant. The two simulations where β = log(1.05) that were in the (10⁻¹², 0] bin were manually removed as a box plot could not be formed.
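The calibration check in panel (A) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function name `calibration_table`, the use of equal-width bins on [0, 1], and the binomial standard error formula are all assumptions made for the example.

```python
from math import sqrt


def calibration_table(pps, is_causal, n_bins=10):
    """Bin posterior probabilities and report the proportion of causal variants per bin.

    Assumptions (not taken from the caption): bins are equal-width on [0, 1],
    and the standard error is the binomial one, sqrt(p * (1 - p) / n).
    """
    bins = [[] for _ in range(n_bins)]
    for pp, causal in zip(pps, is_causal):
        # A posterior probability of exactly 1 is placed in the last bin.
        b = min(int(pp * n_bins), n_bins - 1)
        bins[b].append((pp, bool(causal)))

    rows = []
    for b, members in enumerate(bins):
        if not members:
            continue  # skip empty bins rather than divide by zero
        n = len(members)
        p_hat = sum(c for _, c in members) / n
        se = sqrt(p_hat * (1 - p_hat) / n)
        rows.append({
            "bin": b,
            "n": n,
            "mean_pp": sum(pp for pp, _ in members) / n,
            "empirical": p_hat,
            "lower": p_hat - 1.96 * se,
            "upper": p_hat + 1.96 * se,
        })
    return rows


# Toy input, invented for illustration only.
rows = calibration_table([0.05, 0.05, 0.95, 0.95, 0.95, 0.95], [0, 0, 1, 1, 1, 0])
```

Well-calibrated posterior probabilities would give an `empirical` value close to `mean_pp` in every bin.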


Fig 2.

Error of conditional coverage estimates for 90% credible sets, including when a reference panel is used to approximate MAFs and SNP correlations.

Box plots (median and IQR, with the mean marked by a black diamond) of error in conditional coverage estimates from 5000 simulated 90% credible sets. Error is calculated as estimated conditional coverage − empirical conditional coverage, where empirical conditional coverage is the proportion of 5000 additional simulated 90% credible sets that contain the causal variant. The two simulations with β = log(1.05) that were in the (10⁻¹², 0] bin were manually removed as a box plot could not be formed. (A) Claimed coverage estimate (the sum of the posterior probabilities of causality for the variants in the credible set). (B) Adjusted coverage estimate using MAFs and SNP correlations from the original (1000 Genomes) data. (C) Adjusted coverage estimate using UK10K data to approximate MAFs and SNP correlations. (D) Graphical display of SNP correlations in 1000 Genomes data. (E) Graphical display of the estimated SNP correlations using UK10K data.


Fig 3.

Error of conditional coverage estimates for 90% credible sets in regions with 2 causal variants.

Error is calculated as estimated conditional coverage − empirical conditional coverage, where empirical conditional coverage is the proportion of 5000 additional simulated 90% credible sets that contain at least one of the 2 causal variants, and estimated conditional coverage is the claimed or adjusted coverage estimate as defined in the text. Shown are the median error and interquartile range of claimed and adjusted coverage estimates of 90% credible sets from 5000 simulated regions with 2 causal variants that are (A) in low LD (r² < 0.01) or (B) in high LD (r² > 0.7). Panels are faceted by the odds ratio values at the causal variants.
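The error definition used in this figure can be written out as a small sketch. The function names and the toy variant identifiers are hypothetical, and credible sets are assumed to be plain lists of variant identifiers.

```python
def empirical_coverage(replicate_sets, causal_variants):
    """Proportion of replicate credible sets containing at least one causal variant."""
    causal = set(causal_variants)
    hits = sum(1 for credible_set in replicate_sets if causal & set(credible_set))
    return hits / len(replicate_sets)


def coverage_error(estimated_coverage, replicate_sets, causal_variants):
    """Error = estimated conditional coverage - empirical conditional coverage."""
    return estimated_coverage - empirical_coverage(replicate_sets, causal_variants)


# Toy example with 4 replicate credible sets and 2 causal variants (IDs 2 and 7).
toy_sets = [[1, 2], [3], [2, 5], [7]]
toy_empirical = empirical_coverage(toy_sets, [2, 7])
toy_error = coverage_error(0.9, toy_sets, [2, 7])
```

With a single causal variant, the same functions reduce to the definition used in Figs 1 and 2. A positive error means the estimate overstates the empirical coverage, and a negative error means it understates it.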


Fig 4.

A simple example to illustrate the results of the adjustment method.

(A) The absolute Z scores of the SNPs. (B) The PPs of the SNPs. (C) As in the fine-mapping procedure, variants are sorted into descending order of PP and summed. Starting with the SNP with the largest PP (far right) the cumulative sum (size) of the credible set is plotted as each SNP is added to the set. Red SNPs are those in the adjusted 90% credible set and blue SNPs are those that only appear in the original 90% credible set. The credible set formed of the red SNPs has an adjusted coverage estimate of 0.905 and the credible set formed of both the blue and red SNPs has an adjusted coverage estimate of 0.969.
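The sort-and-sum step described in panel (C) can be sketched as below. This covers only the standard credible set construction and its claimed coverage (the sum of the PPs in the set); it does not implement the adjustment method. The function name and the toy PP values are assumptions for illustration.

```python
def credible_set(pps, threshold=0.9):
    """Build a credible set by adding variants in descending PP order.

    Returns the indices of the selected variants and the sum of their PPs,
    which is the claimed coverage of the set.
    """
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += pps[i]
        if total >= threshold:
            break  # stop as soon as the cumulative PP reaches the threshold
    return chosen, total


# Toy PPs, invented for illustration; they sum to 1.
toy_chosen, toy_claimed = credible_set([0.05, 0.6, 0.25, 0.1], threshold=0.9)
```

In this toy example the first two variants sum to 0.85, so a third variant is needed and the claimed coverage exceeds the 0.9 threshold.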


Fig 5.

Summary of adjusted coverage estimates and adjusted credible sets in the T1D data set.

Top panel: The decrease in size of the credible set after adjustment. Bottom panel: The adjusted coverage estimates of 95% Bayesian credible sets for T1D-associated genomic regions. Black points represent regions where the credible set changed after the adjustment, and the “-” values for the circled points represent the decrease in the number of variants from the standard to the adjusted 95% credible set. Blue points represent regions where the credible set did not change after the adjustment, and grey points represent regions where the credible set did not need to be adjusted, either because the threshold (0.95) was contained in the 95% confidence interval of the conditional coverage estimate or because the credible set already contained only a single variant.
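The colour coding in the bottom panel amounts to a simple decision rule, sketched here. The function name, argument names, and the representation of credible sets as lists of variant identifiers are hypothetical. The confidence interval is taken as an input, because the caption does not say how it is computed.

```python
def point_category(threshold, ci_lower, ci_upper, original_set, adjusted_set):
    """Classify a region as 'grey', 'blue' or 'black' following the caption.

    grey:  no adjustment needed (threshold lies inside the confidence interval
           of the conditional coverage estimate, or the set has a single variant)
    blue:  adjustment attempted but the credible set did not change
    black: the credible set changed after adjustment
    """
    if len(original_set) == 1 or ci_lower <= threshold <= ci_upper:
        return "grey"
    if set(adjusted_set) == set(original_set):
        return "blue"
    return "black"


# Toy region: the interval excludes 0.95 and the adjusted set drops one variant.
toy_category = point_category(0.95, 0.97, 0.99, ["a", "b", "c"], ["a", "b"])
```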
