Reader Comments

Misinterpretation of eQTL data

Posted by johnhardy1 on 23 Jul 2012 at 17:43 GMT

Adaikalavan Ramasamy1,2, Daniah Trabzuni2,3, Mina Ryten2, Michael E. Weale1, John Hardy2*

1 Department of Medical & Molecular Genetics, King's College London, 8th Floor, Tower Wing, Guy's Hospital, London SE1 9RT

2 Reta Lila Weston Institute and Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK

3 Department of Genetics, King Faisal Specialist Hospital
and Research Centre, PO Box 3354, Riyadh 11211, Saudi Arabia
* correspondence at

The article by Zou et al. [1] on expression quantitative trait loci (eQTL) seeks to understand how the genetic regulation of gene expression relates to the influence of genetic variability on disease. While this is clearly an important goal, the report has, in our view, a serious methodological flaw.

Polymorphisms such as SNPs and indels within a probe sequence affect the binding affinity of the target mRNA, since the probes are designed against a single reference genome. Thus an eQTL analysis between a SNP and an expression profile is suspect when there is a documented “polymorphism-in-probe”. Furthermore, an association involving any SNP located outside the probe sequence but in high linkage disequilibrium (LD) with the polymorphism within the probe sequence is similarly suspect.
We checked the 94 unique associations (SNP – expression probe) involving 77 SNPs and 79 probes listed in Tables 1 – 5 of the report for the polymorphism-in-probe problem. We summarize our findings in Table 1. Three of the probes (ILMN_1651745, ILMN_1689177, ILMN_2130441) do not hybridize uniquely when re-aligned to the hg19 build according to the ReMOAT tool [2]. Of the remaining 76 probes, we found that a significant proportion (36 probes or 47.3%) contained common polymorphisms (SNPs or indels with frequency > 1% in Europeans) according to the latest version of the 1000 Genomes Project (March 2012: Integrated Phase I haplotype release version 3) [3].
We found that four of the eQTL associations presented involved a hit SNP located within the probe sequence itself. A further 17 eQTL associations involved hit SNPs that were in high LD (r2 > 0.8) with the polymorphism in the probe sequence; seven involved hit SNPs in moderately high LD (0.5 < r2 < 0.8) and 13 involved hit SNPs in low LD (r2 < 0.5).
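
The LD bins used above can be written down as a small helper. This is a minimal sketch rather than the procedure used for this comment: the function names are hypothetical, r2 is approximated as the squared Pearson correlation of 0/1/2 genotype dosages, the moderate bin is read as r2 between 0.5 and 0.8, and placing the exact boundary values in that bin is an assumption of the sketch.

```python
from statistics import mean


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Assumption: this is used as an approximation of LD r2 for unphased data.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A variant with no variation in the sample carries no LD information.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def ld_category(r2):
    """Bin an r2 value: high (> 0.8), moderate (0.5 to 0.8), low (< 0.5)."""
    if r2 > 0.8:
        return "high"
    if r2 >= 0.5:
        return "moderate"
    return "low"
```

For instance, the r2 of 0.9431 quoted below for the MAPT probe falls in the "high" bin.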

Additionally, we checked the European American panel of the NHLBI Exome Sequencing Project for additional novel variants and identified that ILMN_1730477 (TAS2R43) contained five SNPs. Therefore the association from this probe should also be treated as highly suspect. In summary, only half of the findings presented in the paper are clear of the polymorphism-in-probe problem in the current build of the human genome.

A clear example of a false positive reported here is the association between hit SNP rs1981997 and the MAPT probe ILMN_1710903. This probe contains a common 2 base pair deletion (rs67759530, also known as chr17:44102741:D, MAF = 23% in Europeans) that is in very high LD (r2 = 0.9431) with the hit SNP. Removal of this probe from analysis subverted the interpretation of the MAPT eQTL [4]. We suspect that similar analyses would subvert the interpretation of a very high proportion of the reported hits.

Although a large proportion of the eQTLs appear suspect, the problem can be dealt with analytically. Three general ways of dealing with the polymorphism-in-probe problem in the HT12 array are: a) discarding all associations from polymorphism-containing probes, b) removing associations if the hit SNP is correlated (above an arbitrarily defined r2 threshold) with the polymorphism in the probe sequence, or c) re-analyzing the suspect eQTLs by conditioning on the polymorphism in the probe. Our own work suggests that the most satisfactory solution to this problem for the HT12 array is the conditional analysis.
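
Option (c) can be illustrated with a short sketch. It assumes additive 0/1/2 genotype coding and ordinary least squares, and it is not the pipeline used in our own work; the function names are hypothetical. The slope is obtained by residualizing both expression and the hit SNP on the probe variant, which gives the same coefficient as a regression of expression on both variants together.

```python
from statistics import mean


def _residuals(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        # x is constant, so only the intercept is removed.
        return [b - my for b in y]
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


def conditional_effect(expression, hit_snp, probe_variant, tol=1e-12):
    """Effect of hit_snp on expression, conditioned on probe_variant.

    Returns None when the hit SNP is fully explained by the probe variant,
    in which case the conditional effect cannot be estimated.
    """
    e_res = _residuals(expression, probe_variant)
    s_res = _residuals(hit_snp, probe_variant)
    denom = sum(s * s for s in s_res)
    if denom < tol:
        return None
    return sum(s * e for s, e in zip(s_res, e_res)) / denom


# Toy data (hypothetical): the signal is driven entirely by the probe variant.
probe_variant = [0, 1, 2, 0, 1, 2]
hit_snp = [0, 1, 2, 0, 1, 1]  # in LD with the probe variant, but not identical
artifact_expression = [10 - 2 * p for p in probe_variant]
artifact_effect = conditional_effect(artifact_expression, hit_snp, probe_variant)
```

In this toy case the hit SNP shows a marginal association with expression, but its conditional effect is zero, which is the signature of a hybridization artifact rather than a genuine eQTL.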

Eliminating false positive results such as these is important to avoid misguided research effort.

1. Zou F, Chai HS, Younkin CS, Allen M, Crook J, et al. (2012) Brain Expression Genome-Wide Association Study (eGWAS) Identifies Human Disease-Associated Variants. PLoS Genet 8: e1002707.
2. Barbosa-Morais NL, Dunning MJ, Samarajiwa SA, Darot JF, Ritchie ME, et al. (2010) A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data. Nucleic Acids Res 38: e17.
3. The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061-1073.
4. Trabzuni D, Wray S, Vandrovcova J, Ramasamy A, Walker R, et al. (2012) MAPT expression and splicing is differentially regulated by brain region: relation to genotype and implication for tauopathies. Hum Mol Genet.


No competing interests declared.

RE: Misinterpretation of eQTL data

PLOS_Genetics replied to johnhardy1 on 31 Aug 2012 at 11:40 GMT

Ramasamy and colleagues have provided their above comment, "Misinterpretation of eQTL data", in a PDF file that includes a table not shown in the original comment. It is available at the following link: http://www.plosgenetics.o...

No competing interests declared.

Caution regarding false positives and false negatives in eQTL studies

net04 replied to PLOS_Genetics on 10 Sep 2012 at 01:55 GMT

Fanggeng Zou1, High Seng Chai2, Curtis S. Younkin1, Mariet Allen1, Julia Crook3, V. Shane Pankratz2, Minerva M. Carrasquillo1, Steven G. Younkin1, Nilüfer Ertekin-Taner1,4,#.

1Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA. 2Department of Biostatistics, Mayo Clinic, Rochester, Minnesota, USA. 3Department of Biostatistics, Mayo Clinic, Jacksonville, Florida, USA. 4Department of Neurology, Mayo Clinic, Jacksonville, Florida, USA.

#: Corresponding author
Contact information: Mayo Clinic Florida, Departments of Neurology and Neuroscience, 4500 San Pablo Road, Birdsall 3, Jacksonville, FL 32224.

In their commentary on our article, entitled “Brain Expression Genome-Wide Association Study (eGWAS) Identifies Human Disease-Associated Variants”[1], Ramasamy et al. raise concern regarding false positives in probe hybridization-based expression studies, arising from variants potentially residing within probes. The authors attempt to address this concern by focusing on only a subset of our results presented in Tables 1-5 and then conclude that “only half of the findings presented in the paper are clear of the polymorphism-in-probe problem”. In the Zou et al. paper[1], we had, in fact, performed a thorough investigation of all probes for the presence of variants and reported the results in a section dedicated to this issue in our Supplementary Text. We determined that 8% of all probes and 17% of the top probes have polymorphisms corresponding to their sequence, similar to rates reported by other groups[2]. Additionally, Ramasamy et al. give the MAPT probe ILMN_1710903 as a “clear example of false positive”. We investigated this problem not only for the MAPT ILMN_1710903 probe but also for another MAPT probe, ILMN_2298727, both of which have significant cisSNP/transcript associations in our study. We addressed this issue by assessing the population in our eGWAS, where these cisSNP/transcript associations were identified, rather than relying solely on variant databases, which can be misleading. We already reported in our manuscript[1] that while the MAPT ILMN_1710903 cisSNP/transcript associations may be suspect, the MAPT ILMN_2298727 associations are devoid of this artifact in our population. Although we agree with Ramasamy et al. that investigating the presence of polymorphisms within the probe sequences is important to identify potential false positive SNP/transcript associations, their conclusions, based on a small proportion of our results and omitting the thorough analysis we already conducted and presented in our manuscript, are misleading.
Herein we provide a brief overview of our published eGWAS results[1], a comparison of some of our significant findings to the literature where there is confirmation of our results by others using alternative approaches and importantly a summary of our comprehensive investigations on the potential “polymorphism-in-probe” problem that was not mentioned in the Ramasamy et al. commentary.

In our article[1], we completed a comprehensive analysis of gene expression in 773 frozen brain samples from cerebellum (n=197) and temporal cortex (n=202) of subjects with pathologic Alzheimer’s disease (AD) and those with other brain pathologies (non-AD, cerebellar n=177, temporal cortex n=197), by measuring expression levels of 24,526 transcripts which we tested for associations against 213,528 cisSNPs residing within ±100 kb of the tested transcript. The transcript measurements were conducted using the Whole Genome DASL assay (Illumina, San Diego, CA)[3], which is specifically designed to measure RNA obtained from archival tissue such as frozen brain samples. Utilizing stringent analytic criteria requiring detection rates for the expression probes in ≥75% of tested subjects and corrections for multiple testing by false discovery rate based q values, we identified 2,980 cerebellar cisSNP/transcript level associations (2,596 unique cisSNPs) significant in both ADs and non-ADs, of which 2,087 were also significant in the temporal cortex. Amongst the cisSNP/transcript level associations we identified, there are many that have also been detected by other groups using alternative methods. For example, the CLU expression level association we detected in our eGWAS for the AD risk-associating SNP rs11136000[4] was also identified using quantitative PCR (qPCR) measurements in human brain tissue[5]. Another example is GSTM3, an antioxidant gene with strong cisSNP/transcript associations in our eGWAS, the brain levels of which also associate with an AD risk variant in another study (r2 and D’ with SNPs in our study are 0.1-0.6 and 1, respectively), which also measured gene expression levels using qPCR[6]. GSTM3 was also amongst the list of antioxidant genes enriched for significant cisSNPs in a pathway analysis of our eGWAS findings[7]. The most significant pathway we identified in that report[7] includes GSTO2, also previously detected by others[8]. In the Zou et al. paper, we also performed a comparison of our eGWAS to published liver[9] and brain[8,10] eGWAS. Using results from our HapMap2 imputed eGWAS, all tested probes and a p value threshold of <1.0E-4, we detect 24-32% overlap with the other eGWAS, despite differences in the analytic approaches, expression and genotyping arrays. This overlap frequency is similar to or better than that seen in other eGWAS comparisons[9],[10].

While these results provide additional support for the authenticity of our findings, there is still an inherent concern regarding hybridization-based expression studies, arising from variants potentially residing within probes. In their commentary, Ramasamy et al. attempted to address this concern by first checking 79 unique probes from our Tables 1-5 (there are in fact 80 such probes) for re-alignment to the hg19 build according to the ReMOAT tool[11], and then against the 1000 Genomes[12] and NHLBI Exome Sequencing projects for variants that reside within the sequence of the probes. The authors detect lack of alignment for 3 probes and one or more potential variants within the sequence of 37 additional probes.
In our comprehensive assessment of this concern in the Zou et al. paper[1], we reported the results in a section dedicated to this issue in our Supplementary Text. We summarized the results of this comparison by including a column depicting the presence of a variant in probe in all of the Supplementary Tables where cisSNP/transcript associations are reported (Supplementary Tables S3 and S6-S26). As reported in our paper, we annotated all of the probes by comparing their positions according to NCBI Ref Seq, Build 36.3 to those of all variants within dbSNP131 and identified the list of probes which have ≥1 variants within their sequence. Among the 17,121 unique probes tested in the cerebellar eGWAS, there were 1,441 unique probes that harbor a polymorphism (8%) in their sequence. Within the top 2,980 cerebellar cisSNP/transcript associations from 746 unique probes, there are 124 unique probes with ≥1 polymorphisms in them (17%). Based on this, although there was some excess of probes with variants amongst the top probes compared to all tested probes, the majority of the probes did not have this potential problem. These rates are very similar to those reported by other groups[2]. It is likely that additional variants may be identified in the probe regions with new variant discovery efforts in the human genome, though our findings underline the importance of a thorough comparison of all probes against the significant probes to detect the accurate effect of the “variant-in-probe” problem for bias in detecting significant eQTLs.
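
The annotation step described above amounts to a positional overlap check between probe intervals and variant positions. The following is a minimal sketch, not the code used for the paper: the probe identifiers and coordinates are hypothetical, variants are treated as single positions (so an indel is represented only by its start), and probe intervals are taken as inclusive. The last two lines simply recompute the rounded percentages quoted above from the reported counts.

```python
from bisect import bisect_left


def probes_with_variants(probes, variants):
    """Return the IDs of probes whose interval contains at least one variant.

    probes:   {probe_id: (chrom, start, end)} with inclusive coordinates
    variants: iterable of (chrom, position)
    """
    positions_by_chrom = {}
    for chrom, pos in variants:
        positions_by_chrom.setdefault(chrom, []).append(pos)
    for positions in positions_by_chrom.values():
        positions.sort()

    flagged = set()
    for probe_id, (chrom, start, end) in probes.items():
        positions = positions_by_chrom.get(chrom, [])
        # First variant at or after the probe start; flag if it is inside the probe.
        i = bisect_left(positions, start)
        if i < len(positions) and positions[i] <= end:
            flagged.add(probe_id)
    return flagged


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Hypothetical 50-base probes and variant positions, for illustration only.
example_probes = {
    "probe_A": ("chr1", 100, 149),
    "probe_B": ("chr1", 500, 549),
    "probe_C": ("chr2", 100, 149),
}
example_variants = [("chr1", 120), ("chr1", 600), ("chr2", 150)]
example_flagged = probes_with_variants(example_probes, example_variants)

# Counts reported in the text: 1,441 of 17,121 probes and 124 of 746 top probes.
all_probe_rate = percent(1441, 17121)
top_probe_rate = percent(124, 746)
```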

It is also important to emphasize that while identification of a variant within a probe sequence by database searches raises concern regarding cisSNP/transcript associations detected with that probe, it is critical to check the tested population for the presence of these probe variants. We have done this in the Zou et al. paper for MAPT probes ILMN_1710903 and ILMN_2298727, both of which yield significant cisSNP/transcript associations in our study and were also reported by others[10]. We genotyped the variants rs67759530 and rs66561280, annotated to be within the sequence of ILMN_1710903, as well as rs73314997, identified within ILMN_2298727, and found that while the variants within ILMN_1710903 are polymorphic in our eGWAS population and also in linkage disequilibrium with the associating cisSNPs, rs73314997 is essentially monomorphic. Thus, we have already stated in our article[1] the potential concern regarding MAPT ILMN_1710903, but maintain that the MAPT ILMN_2298727 associations are unlikely to suffer from this potential artifact.
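
Checking whether a database-annotated probe variant is actually polymorphic in the study population reduces to a minor allele frequency calculation on the genotyped samples. The sketch below is illustrative only: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 alternate-allele counts with None for missing calls, and the 1% cut-off used for "essentially monomorphic" is an assumption of the sketch, not a threshold stated in the paper.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 allele counts; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def is_essentially_monomorphic(genotypes, maf_threshold=0.01):
    """True when the variant is (nearly) invariant in this sample.

    maf_threshold is an assumed cut-off chosen for illustration.
    """
    return minor_allele_frequency(genotypes) < maf_threshold
```

A probe variant that is essentially monomorphic in the tested samples cannot drive a hybridization artifact in that population, whatever its frequency in reference databases.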

In conclusion, and as reported in the Zou et al. paper, though caution needs to be exercised with respect to the potential hybridization artifacts arising from variants within probe regions, our thorough investigation of this potential problem in our paper and the replication of many significant results by others utilizing alternative strategies suggest that this potential artifact is unlikely to account for the majority of the significant findings reported by our group[1] and others[2]. Depicting all significant results arising from probes with “reported variants” in their sequence as false positives, without a thorough analysis of the population in which the association is reported, could lead to the elimination of valid results and an undesirable false negative problem in eQTL studies. A true test of the impact of probe variants on SNP/transcript associations awaits either sequencing of the probe regions or of the entire transcriptome in the eGWAS populations. Until then, eGWAS studies should continue to be utilized as a guide to pursue potential regulatory loci that may have implications in human disease and traits[13,14].


1. Zou F, Chai HS, Younkin CS, Allen M, Crook J, et al. (2012) Brain expression genome-wide association study (eGWAS) identifies human disease-associated variants. PLoS Genet 8: e1002707.
2. Doss S, Schadt EE, Drake TA, Lusis AJ (2005) Cis-acting expression quantitative trait loci in mice. Genome Res 15: 681-691.
3. Waddell N, Cocciardi S, Johnson J, Healey S, Marsh A, et al. (2010) Gene expression profiling of formalin-fixed, paraffin-embedded familial breast tumours using the whole genome-DASL assay. J Pathol 221: 452-461.
4. Allen M, Zou F, Chai HS, Younkin CS, Crook J, et al. (2012) Novel late-onset Alzheimer disease loci variants associate with brain gene expression. Neurology 79: 221-228.
5. Ling IF, Bhongsatiern J, Simpson JF, Fardo DW, Estus S (2012) Genetics of clusterin isoform expression and Alzheimer's disease risk. PLoS One 7: e33923.
6. Maes OC, Schipper HM, Chong G, Chertkow HM, Wang E (2010) A GSTM3 polymorphism associated with an etiopathogenetic mechanism in Alzheimer disease. Neurobiol Aging 31: 34-45.
7. Allen M, Zou F, Chai HS, Younkin CS, Miles R, et al. (2012) Glutathione S-transferase omega genes in Alzheimer and Parkinson disease risk, age-at-diagnosis and brain gene expression: an association study with mechanistic implications. Mol Neurodegener 7: 13.
8. Webster JA, Gibbs JR, Clarke J, Ray M, Zhang W, et al. (2009) Genetic control of human brain transcript expression in Alzheimer disease. Am J Hum Genet 84: 445-458.
9. Schadt EE, Molony C, Chudin E, Hao K, Yang X, et al. (2008) Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6: e107.
10. Myers AJ, Gibbs JR, Webster JA, Rohrer K, Zhao A, et al. (2007) A survey of genetic human cortical gene expression. Nat Genet 39: 1494-1499.
11. Barbosa-Morais NL, Dunning MJ, Samarajiwa SA, Darot JF, Ritchie ME, et al. (2010) A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data. Nucleic Acids Res 38: e17.
12. The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061-1073.
13. Nica AC, Dermitzakis ET (2008) Using gene expression to investigate the genetic basis of complex disorders. Hum Mol Genet 17: R129-134.
14. Ertekin-Taner N (2011) Gene expression endophenotypes: a novel approach for gene discovery in Alzheimer's disease. Mol Neurodegener 6: 31.

No competing interests declared.