eQTL mapping using allele-specific count data is computationally feasible, powerful, and provides individual-specific estimates of genetic effects

doi:10.1371/journal.pgen.1010076

Fig 1.

Overview of our pipeline and geoP method.

(a) A workflow that starts from raw data on the cloud, extracts gene expression information, and then performs eQTL mapping using TReCASE. (b) Quantification of ASE by counting allele-specific reads. The table on the right shows the count for each SNP, their sum (SNP total), and the total count at the haplotype level (ASE count); the latter avoids double counting reads that overlap more than one SNP. (c-e) Comparison of permutation p-values estimated by eigenMT or geoP versus “true” values generated by 10,000 permutations, using the eQTL data of 14,566 genes from the Geuvadis dataset [13]. (c) The number of false negatives or false positives at each permutation p-value cutoff labeled in the legend. A gene is considered a false negative (positive) at a cutoff α if its estimated permutation p-value is larger (smaller) than α while the “true” value from 10,000 permutations is equal to or smaller than (larger than) α. (d-e) Scatter plots of -log₁₀(permutation p-value) estimated by 10,000 permutations (x-axis) versus the estimates by eigenMT or geoP (y-axis).
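
The two counting rules in panels (b) and (c) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's code: each read is assumed to be represented as a hypothetical `(haplotype, set_of_SNP_ids)` pair, and the helper names `snp_counts`, `snp_total`, `ase_count`, and `fn_fp` are invented for this example. The contrast shows why summing per-SNP counts inflates the total when one read covers several heterozygous SNPs, and how the false-negative/false-positive definition from panel (c) translates into comparisons.

```python
from collections import defaultdict

# Hypothetical toy representation: (haplotype label, SNP ids the read overlaps).
Read = tuple[str, frozenset[str]]

def snp_counts(reads: list[Read]) -> dict[tuple[str, str], int]:
    """Count reads overlapping each (SNP, haplotype) pair, as in the per-SNP table."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for hap, snps in reads:
        for snp in snps:
            counts[(snp, hap)] += 1
    return dict(counts)

def snp_total(reads: list[Read], hap: str) -> int:
    """Sum of per-SNP counts; a read spanning k SNPs is counted k times."""
    return sum(c for (_, h), c in snp_counts(reads).items() if h == hap)

def ase_count(reads: list[Read], hap: str) -> int:
    """Haplotype-level count; each read overlapping at least one SNP counts once."""
    return sum(1 for h, snps in reads if h == hap and snps)

def fn_fp(estimated: list[float], true: list[float], alpha: float) -> tuple[int, int]:
    """False negatives and false positives at cutoff alpha, following panel (c)."""
    fn = sum(1 for e, t in zip(estimated, true) if e > alpha and t <= alpha)
    fp = sum(1 for e, t in zip(estimated, true) if e < alpha and t > alpha)
    return fn, fp

reads = [("A", frozenset({"s1", "s2"})), ("A", frozenset({"s2"})), ("B", frozenset({"s1"}))]
print(snp_total(reads, "A"))  # 3: the first read is counted at both s1 and s2
print(ase_count(reads, "A"))  # 2: each read counted once
print(fn_fp([0.02, 0.005, 0.2], [0.008, 0.02, 0.3], alpha=0.01))  # (1, 1)
```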

Fig 2.

Compare the number of eGenes identified by different methods using the GTEx data [1] or the Geuvadis data [13].

(a)-(c) Comparison of the number of eGenes (at permutation p-value 0.01) identified by MatrixEQTL, TReC, and TReCASE, as well as the number reported in the GTEx publication [1]. Each point represents a tissue in the GTEx study, and its size is proportional to the sample size of that tissue. A black circle is drawn around a few of the smallest points to improve their visibility. The red dotted line is the reference line y = x. (d) The percentage of additional eGenes identified by TReCASE relative to MatrixEQTL, with a piecewise linear fit added to show the trend. (e) Among all the eGenes identified by either TReCASE or MatrixEQTL, the percentage reported by only one method. Two fitted lines were added to show the trend. (f) The number of eGenes identified from the Geuvadis dataset [13], with sub-sampling to study the power at different sample sizes.
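
The overlap statistic in panel (e) reduces to set arithmetic on the two eGene lists. The sketch below assumes each method's eGenes are available as a set of gene IDs (the IDs shown are hypothetical). The caption does not define "additional eGenes" in panel (d) precisely, so `pct_additional` uses one plausible reading, the relative increase in eGene count, and should be treated as an assumption.

```python
def pct_only_one_method(egenes_a: set[str], egenes_b: set[str]) -> float:
    """Among eGenes found by either method, the percentage found by exactly one."""
    union = egenes_a | egenes_b
    if not union:
        return 0.0
    return 100.0 * len(egenes_a ^ egenes_b) / len(union)  # ^ is symmetric difference

def pct_additional(egenes_new: set[str], egenes_ref: set[str]) -> float:
    """Assumed definition: relative increase in eGene count over a reference method."""
    return 100.0 * (len(egenes_new) - len(egenes_ref)) / len(egenes_ref)

trecase = {"g1", "g2", "g3", "g4"}  # hypothetical gene IDs
matrixeqtl = {"g1", "g2", "g5"}
print(pct_only_one_method(trecase, matrixeqtl))  # 60.0
print(pct_additional(trecase, matrixeqtl))
```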

Fig 3.

Enrichment of eQTLs in functional categories using the eQTL results from 28 GTEx tissues.

In panels (a)-(c) and (e)-(f), a dot indicates the point estimate and a line indicates the 95% confidence interval. (a) Enrichment evaluated by torus [17] using all the SNPs, based on the eQTL results from MatrixEQTL, TReC, or TReCASE. (b) Enrichment of the top eQTL per gene for the eGenes identified by both MatrixEQTL and TReCASE (permutation p-value < 0.01). The top eQTLs of these eGenes are divided into three groups: those reported by both methods and those reported by only one of the two methods. (c) Enrichment of the top eQTL per gene for the eGenes reported by either MatrixEQTL or TReCASE, but not both. (d) The percentage of significant eQTLs (top eQTL per eGene with permutation p-value < 0.01) that fall in at least one functional category, versus sample size, in all 28 GTEx tissues. Each point is a tissue, and the color coding is shown at the bottom of Fig 2. Panels (e) and (f) are analogous to panels (b) and (c), but focus only on enhancers in five tissues and compare the generic enhancers used in the GTEx study with tissue-specific enhancers from EnhancerAtlas [16].
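
The quantity in panel (d), the share of top eQTLs lying in at least one functional category, can be illustrated with a simple interval-overlap check. This is a sketch under stated assumptions, not the study's annotation workflow: annotations are assumed to be a hypothetical nested dict `{category: {chromosome: [(start, end), ...]}}` with half-open `[start, end)` coordinates, and each top eQTL is a `(chromosome, position)` pair. All categories are pooled and merged so that each SNP needs only one binary search.

```python
from bisect import bisect_right

# Half-open [start, end) intervals are an assumed coordinate convention.
Interval = tuple[int, int]

def build_index(annotations: dict[str, dict[str, list[Interval]]]) -> dict[str, tuple[list[int], list[int]]]:
    """Pool every category's intervals and merge them into sorted runs per chromosome."""
    pooled: dict[str, list[Interval]] = {}
    for per_chrom in annotations.values():  # one entry per functional category
        for chrom, ivs in per_chrom.items():
            pooled.setdefault(chrom, []).extend(ivs)
    index = {}
    for chrom, ivs in pooled.items():
        starts: list[int] = []
        ends: list[int] = []
        for s, e in sorted(ivs):
            if ends and s <= ends[-1]:  # overlaps or touches the previous run
                ends[-1] = max(ends[-1], e)
            else:
                starts.append(s)
                ends.append(e)
        index[chrom] = (starts, ends)
    return index

def in_any_category(index, chrom: str, pos: int) -> bool:
    """True if pos lies inside at least one annotated interval on chrom."""
    starts, ends = index.get(chrom, ([], []))
    i = bisect_right(starts, pos) - 1  # last merged run starting at or before pos
    return i >= 0 and pos < ends[i]

def pct_in_any_category(top_eqtls: list[tuple[str, int]], index) -> float:
    """Percentage of (chrom, pos) top eQTLs overlapping any functional category."""
    if not top_eqtls:
        return 0.0
    hits = sum(in_any_category(index, c, p) for c, p in top_eqtls)
    return 100.0 * hits / len(top_eqtls)

annotations = {  # hypothetical toy annotation
    "enhancer": {"chr1": [(100, 200)]},
    "promoter": {"chr1": [(150, 300)], "chr2": [(10, 20)]},
}
index = build_index(annotations)
print(pct_in_any_category([("chr1", 120), ("chr1", 299), ("chr1", 300), ("chr2", 5)], index))  # 50.0
```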

Fig 4.

Dynamic eQTLs.

(a)-(c) The number of dynamic eQTLs identified (q-value < 0.1) using the short model (without any additional covariates) versus the long model (with 7 additional covariates: the top 5 PEER factors and the top 2 genotype PCs). The x-axis is on a log10 scale. Each point is a tissue, and the color scheme is illustrated at the bottom of Fig 2. Tissues with a large number of dynamic eQTLs under the short model (> 100 for age or CTCF and > 200 for TP53) are labeled. (d) Association between the first PEER factor from the GTEx study and the proportion of neutrophils in whole blood. (e) Association between the neutrophil proportion in whole blood and age. (f) An example of a dynamic eQTL (q < 0.1 in the long model) whose eQTL effect size varies with age.
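
A dynamic eQTL is one whose genotype effect changes with a context variable such as age, which is commonly tested through a genotype-by-context interaction term. The sketch below is a plain least-squares stand-in, not necessarily the count-based model used in the study: it fits `y ~ 1 + g + x + g:x` for the short model and adds a hypothetical `(n, 7)` covariate matrix (standing in for 5 PEER factors and 2 genotype PCs) for the long model. The simulated data and the function name `interaction_effect` are assumptions for illustration only.

```python
import numpy as np

def interaction_effect(y, g, x, covariates=None):
    """Least-squares fit of y ~ 1 + g + x + g:x (+ covariates).

    Returns the g:x coefficient and its t statistic. y, g, x are length-n
    arrays; covariates, if given, is an (n, k) array.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(y), g, x, g * x]
    if covariates is not None:
        cols.extend(np.asarray(covariates, dtype=float).T)  # one column per covariate
    X = np.column_stack(cols)
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - rank)  # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[3], beta[3] / np.sqrt(cov_beta[3, 3])

rng = np.random.default_rng(0)
n = 500
g = rng.integers(0, 3, size=n)    # genotype coded 0/1/2
age = rng.normal(size=n)          # standardized context variable
extra = rng.normal(size=(n, 7))   # hypothetical stand-in for 5 PEER factors + 2 genotype PCs
y = 1 + 0.5 * g + 0.2 * age + 0.8 * g * age + rng.normal(scale=0.1, size=n)

short_beta, short_t = interaction_effect(y, g, age)                    # short model
long_beta, long_t = interaction_effect(y, g, age, covariates=extra)   # long model
print(short_beta, long_beta)
```

In practice the interaction p-values across all tested gene-SNP pairs would then be converted to q-values before applying the q < 0.1 threshold.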

Fig 5.

Comparison of TReCASE and RASQUAL.

(a) Comparison of the number of significant findings (q-value < 0.05) between TReCASE and RASQUAL for different numbers of feature SNPs (fSNPs), using Geuvadis data with a sample size of 280. (b) The number of significant findings (p-value < 0.05) after permuting SNP genotypes, which provides an empirical estimate of the type I error. The results in panels (c)-(f) are from simulations with 10,000 replicates. (c) Evaluation of the type I error of TReCASE and TReCASE-RL when over-dispersion is smaller within a sample and larger across samples. We assume there are two heterozygous fSNPs per gene and per sample. Total read counts were simulated from a negative binomial distribution with over-dispersion 0.5. The results in (d)-(f) assume there is no over-dispersion across SNPs within an individual. (d) Type I error when the over-dispersion of the negative binomial (NB) and beta-binomial (BB) distributions is the same. (e) Effect of double counting. We assume 15% double counting and simulate the data with NB over-dispersion 0.5. (f) Power analysis when the over-dispersion is 0.5 for both NB and BB.
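
The simulation design in panels (c)-(f) pairs negative binomial total counts with beta-binomial allele-specific counts. The sketch below shows one common way to draw such data with NumPy; it is not the authors' simulation code. The parameterizations are assumptions: NB with mean `mu` and variance `mu + phi * mu**2`, and BB whose allelic proportion follows `Beta(pi/theta, (1-pi)/theta)`. The 20% share of allele-informative reads is likewise hypothetical. `empirical_type1` computes the fraction of null replicates with p-value below 0.05, the quantity plotted as type I error.

```python
import numpy as np

def simulate_nb(mu, phi, size, rng):
    """Negative binomial draws with mean mu and variance mu + phi * mu**2."""
    r = 1.0 / phi  # NumPy's "n" parameter; non-integer values are allowed
    return rng.negative_binomial(r, r / (r + mu), size=size)

def simulate_bb(total, pi, theta, rng):
    """Beta-binomial draws: allelic proportion ~ Beta(pi/theta, (1 - pi)/theta)."""
    total = np.asarray(total)
    p = rng.beta(pi / theta, (1 - pi) / theta, size=total.shape)
    return rng.binomial(total, p)

def empirical_type1(pvalues, alpha=0.05):
    """Fraction of null replicates with p-value below alpha."""
    return float(np.mean(np.asarray(pvalues) < alpha))

rng = np.random.default_rng(1)
totals = simulate_nb(mu=50, phi=0.5, size=10_000, rng=rng)  # total read counts
ase_totals = rng.binomial(totals, 0.2)                      # assumed 20% allele-informative
allelic = simulate_bb(ase_totals, pi=0.5, theta=0.5, rng=rng)  # reads from one haplotype (null: pi = 0.5)
```

Under this parameterization, `theta` controls the extra-binomial spread of the allelic counts, so setting `phi` and `theta` to the same value corresponds to the equal-over-dispersion setting described for panel (d).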