Conceived and designed the experiments: MX. Analyzed the data: XW HD LL YZ GP. Contributed reagents/materials/analysis tools: JDR. Wrote the paper: MX.
The authors have declared that no competing interests exist.
Although great progress has been made in genome-wide association studies (GWAS), the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how the missing heritability can be found. There is increasing interest in genome-wide interaction analysis as a possible source of the heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. Applied to two independent studies of psoriasis, the statistic identified 44 and 211 pairs of SNPs showing significant evidence of interaction with FDR<0.001 and 0.001<FDR<0.003, respectively, in both studies. These included five interacting pairs of SNPs in the genes LST1/NCR3, CXCR5/BCL9L, and GLS2, some of which were located in the target sites of miR-324-3p, miR-433, and miR-382, as well as 15 pairs of interacting SNPs that had nonsynonymous substitutions. Our results demonstrate that genome-wide interaction analysis is a valuable tool for finding the heritability unexplained by current GWAS, and that the developed statistic is able to detect significant interactions between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.
Genome-wide interaction analysis is expected to be a possible source of the heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we develop a novel statistic for testing interaction between two loci (either linked or unlinked) and validate the null distribution and the type I error rates of the new statistic through simulations. Through extensive power studies we show that the developed statistic has much higher power to detect interaction than classical logistic regression. To provide evidence of gene-gene interactions as a possible source of the missing heritability unexplained by current GWAS, we performed genome-wide interaction analysis of psoriasis in two independent studies. The preliminary results identified 44 and 211 pairs of SNPs showing significant evidence of interaction with FDR<0.001 and 0.001<FDR<0.003, respectively, which were common to the two independent studies. These included five interacting pairs of SNPs in the genes LST1/NCR3, CXCR5/BCL9L, and GLS2, some of which were located in the target sites of miR-324-3p, miR-433, and miR-382, and 15 pairs of interacting SNPs that had nonsynonymous substitutions.
In the past three years, about 400 genome-wide association studies (GWAS) that
focused largely on individually testing the associations of single SNPs with diseases
have been conducted.
The previous GWAS are mainly based on the common disease, common variant hypothesis. However, in addition to single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) greater than 1%, there are other classes of human genetic variation including: (a) rare variants that are defined as mutations with a MAF of less than 1% and (b) structural variants including copy number variants (CNVs) and copy neutral variation such as inversions and translocations. Common diseases can also be caused by multiple rare mutations, each with a low marginal genetic effect. A more realistic model is that the entire spectrum of genetic variants ranging from rare to common contributes to disease susceptibility.
Most current GWAS have focused on SNP analysis in which each variant is tested for association individually. However, common disease often arises from the combined effect of multiple loci within a gene or from the interaction of multiple genes within a pathway. If we consider only the most significant SNPs, genetic variants that jointly have a significant impact on risk, but individually make only a small contribution, will be missed.
The power of the widely used statistics for detecting gene-gene and gene-environment interactions is low. Many interacting SNPs have not been identified.
Another way to discover the missing heritability of complex diseases is to
investigate gene-gene and gene-environment interactions. Disease development is a
dynamic process of gene-gene and gene-environment interactions within a complex
biological system that is organized into interacting networks.
GWAS, in which several hundred thousand or even a million SNPs are typed in thousands of individuals, provide unprecedented opportunities for systematic exploration of the universe of variants and interactions in the entire genome, but they also raise several serious challenges for genome-wide interaction analysis. The first challenge is multiple testing. Even for pairwise interaction, the total number of tests between all possible pairs of SNPs across the genome is extremely large, so the Bonferroni-corrected P-value threshold needed to ensure a genome-wide significance level of 0.05 is too small to reach in practice. The second challenge is the need for computationally simple statistics. The simplest way to search for interactions between two loci is to test all possible two-locus interactions; because this exhaustive search demands a very large amount of computation, each two-locus interaction test should take little time. The third challenge is power. To reach genome-wide significance, the statistics should have high power to detect interaction. Developing simple and efficient analytic methods for evaluating gene-gene interactions is critical to the success of genome-wide gene-gene interaction analysis. Finally, the fourth challenge is replication of such interaction findings in independent studies.
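To make the scale of the multiple-testing problem concrete, the following minimal sketch counts the pairwise tests and the resulting Bonferroni threshold. The panel size of 500,000 SNPs is a hypothetical value chosen only for illustration; it is not a figure taken from the studies analyzed here.

```python
from math import comb


def bonferroni_pairwise_threshold(n_snps, alpha=0.05):
    """Return (number of SNP pairs, Bonferroni per-test threshold)."""
    n_tests = comb(n_snps, 2)  # every unordered pair of SNPs is tested once
    return n_tests, alpha / n_tests


# Hypothetical panel size, for illustration only.
n_tests, threshold = bonferroni_pairwise_threshold(500_000)
print(f"{n_tests:,} pairwise tests; per-test threshold = {threshold:.2e}")
```

With these assumed inputs the exhaustive search involves roughly 1.25 x 10^11 tests, so a single pair would need a P-value near 4 x 10^-13 to pass a Bonferroni correction.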
This report will attempt to meet these challenges, at least in part. To achieve this,
we should first define a good measure of gene-gene interaction. Despite current
enthusiasm for investigating gene-gene interactions, published results that
document these interactions in humans are limited, and the essential issue of how to
define and detect gene-gene interactions remains unresolved. Over the last three
decades, epidemiologists have debated intensely about how to define and measure
interaction in epidemiologic studies.
To demonstrate that the statistic based on the pseudohaplotype odds-ratio
interaction measure will not cause false positive problems when detecting
interaction between two loci, we then investigate type I error rates. To reveal the
merits and limitations of this statistic for detecting interaction, we will compare
its power for detecting interaction with
that of traditional logistic regression and “fast-epistasis” in PLINK.
Although nearly 400 GWAS have been documented, few genome-wide interaction analyses
have been performed and few findings of significant interaction have been reported.
A case-control study design for detecting interaction between two loci (SNPs), where the two loci can be either linked or unlinked, was considered. Statistics for testing interaction are usually motivated by a measure of interaction. The widely used logistic regression methods for detecting gene-gene interaction are based on the odds-ratio measure of interaction. Traditional additive and multiplicative odds-ratio measures of interaction are defined in terms of genotypes at two loci. In this report, a novel statistic for testing interaction between two loci is based on multiplicative odds-ratio measures defined in terms of pseudohaplotypes. For convenience of presentation, we first briefly introduce the odds-ratio interaction measure in terms of genotypes and alleles, and then present the odds-ratio measure in terms of pseudohaplotypes.
Consider two loci: G and H. Assume that the codes
The value of the odds ratio defined in terms of genotypes depends on how the
indicator variables G and H are coded. Suppose that alleles
Similar to the odds ratio for genotypes, we can define the odds ratio in terms of
alleles. Let
Suppose that the locus G has two alleles
Similar to genotypes, we can compute a multiplicative interaction measure in
terms of the log odds ratio for haplotypes as
To gain an understanding of the multiplicative odds-ratio interaction measure, we study several special cases.
One of the two loci is a marker. If we assume that the locus H is a marker and is
not associated with disease, then we have
Logistic regression interpretation.
We define two indicator variables:
Haplotype  G  H
G_{1}H_{1}  1  1 
G_{1}H_{2}  1  0 
G_{2}H_{1}  0  1 
G_{2}H_{2}  0  0 
It follows from the logistic regression model in equation (1)
that
In the previous section we defined the haplotype multiplicative odds-ratio
interaction measure, which can be estimated by haplotype frequencies in cases
and controls. By the delta method, we can obtain the variance of the estimator
of the haplotype odds-ratio interaction measure
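The idea can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that haplotype counts (G1H1, G1H2, G2H1, G2H2) are directly available, whereas the paper works with pseudohaplotypes; it takes G2H2 as the baseline haplotype, as in the indicator coding above, so that the multiplicative interaction measure is the difference between the haplotype log odds ratio in cases and in controls; and it uses the standard delta-method variance of a log odds ratio (the sum of reciprocal counts). The function name and the counts in the usage example are hypothetical.

```python
from math import erfc, log, sqrt


def haplotype_interaction_test(case_counts, control_counts):
    """Wald-type test of a haplotype odds-ratio interaction measure.

    Each argument holds haplotype counts in the order
    (G1H1, G1H2, G2H1, G2H2). Returns (measure, statistic, p_value).
    """
    def log_or_and_variance(counts):
        n11, n12, n21, n22 = counts
        if min(counts) <= 0:
            raise ValueError("all four haplotype counts must be positive")
        log_or = log(n11 * n22 / (n12 * n21))
        # Delta-method variance of a log odds ratio from multinomial counts.
        variance = sum(1.0 / n for n in counts)
        return log_or, variance

    lor_case, var_case = log_or_and_variance(case_counts)
    lor_ctrl, var_ctrl = log_or_and_variance(control_counts)
    measure = lor_case - lor_ctrl
    statistic = measure ** 2 / (var_case + var_ctrl)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return measure, statistic, p_value


# Hypothetical counts: the loci are associated in cases but not in controls.
measure, statistic, p_value = haplotype_interaction_test(
    (300, 200, 200, 300), (250, 250, 250, 250)
)
print(f"measure={measure:.3f}, statistic={statistic:.2f}, p={p_value:.2e}")
```

When cases and controls show the same allelic association between the two loci, the measure is zero and the sketch returns a P-value of 1.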
For unlinked loci, we can use a case-only design
In the previous sections, we have shown that when the sample size is large enough
to apply large sample theory, the distribution of the statistic
Sample Size  Nominal levels
300  0.04790  0.00995  0.00080 
400  0.04815  0.00820  0.00080 
500  0.04745  0.00930  0.00085 
600  0.04880  0.00850  0.00095 
700  0.05060  0.00920  0.00075 
800  0.05120  0.01015  0.00100 
900  0.04935  0.00805  0.00090 
1000  0.04860  0.00880  0.00090 
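A type I error check of this kind can be sketched with a small simulation. The sketch makes several assumptions that are not taken from the paper: haplotypes are sampled directly rather than reconstructed as pseudohaplotypes, cases and controls share the same haplotype frequencies (one special case of the null hypothesis of no interaction, with allelic association present in both groups), and the frequencies, sample size, and number of replicates are hypothetical. The test statistic is the Wald-type statistic built from the difference in haplotype log odds ratios.

```python
import random
from math import erfc, log, sqrt


def interaction_p_value(case_counts, control_counts):
    """P-value of the Wald-type haplotype log odds-ratio difference test."""
    def log_or_and_variance(c):
        return log(c[0] * c[3] / (c[1] * c[2])), sum(1.0 / n for n in c)

    lor_case, var_case = log_or_and_variance(case_counts)
    lor_ctrl, var_ctrl = log_or_and_variance(control_counts)
    statistic = (lor_case - lor_ctrl) ** 2 / (var_case + var_ctrl)
    return erfc(sqrt(statistic / 2.0))  # chi-square, 1 degree of freedom


def sample_counts(rng, freqs, n_haplotypes):
    """Draw multinomial counts for the haplotypes G1H1, G1H2, G2H1, G2H2."""
    draws = rng.choices(range(4), weights=freqs, k=n_haplotypes)
    return [draws.count(i) for i in range(4)]


def empirical_type1_error(freqs, n_individuals, alpha, n_reps, seed=1):
    """Fraction of null replicates in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        # Two haplotypes per individual; same frequencies in both groups.
        cases = sample_counts(rng, freqs, 2 * n_individuals)
        controls = sample_counts(rng, freqs, 2 * n_individuals)
        if interaction_p_value(cases, controls) < alpha:
            rejections += 1
    return rejections / n_reps


# Hypothetical haplotype frequencies with allelic association between loci.
rate = empirical_type1_error([0.35, 0.15, 0.15, 0.35], 300, 0.05, 1000)
print(f"empirical type I error at nominal 0.05: {rate:.3f}")
```

With a well-calibrated statistic the estimated rate should lie close to the nominal level, up to Monte Carlo error, which is the pattern the table above reports for the authors' statistic.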
(A) Quantile-quantile plots for the test statistic
Sample Size  Nominal levels
300  0.04990  0.00945  0.00120 
400  0.04995  0.01030  0.00085 
500  0.05170  0.01065  0.00080 
600  0.05070  0.00980  0.00100 
700  0.04725  0.00965  0.00113 
800  0.04945  0.00895  0.00075 
900  0.04830  0.00950  0.00080 
1000  0.04920  0.00975  0.00110 
Recessive 

Locus 1\2  D_{2}D_{2}  D_{2}d_{2}  d_{2}d_{2} 
D_{1}D_{1} 



D_{1}d_{1} 



d_{1}d_{1} 



Dominant 

Locus 1\2  D_{2}D_{2}  D_{2}d_{2}  d_{2}d_{2} 
D_{1}D_{1} 



D_{1}d_{1} 



d_{1}d_{1} 



Additive 

Locus 1\2  D_{2}D_{2}  D_{2}d_{2}  d_{2}d_{2} 
D_{1}D_{1} 



D_{1}d_{1} 



d_{1}d_{1} 



To evaluate the performance of the statistic
(A) The power of the test statistic
(A) The power of the test statistic
(A) The power of the test statistic
SNP1 (rs)  Gene 1  SNP2 (rs)  Gene 2  Dataset 1 P-value  Dataset 1 FDR  Dataset 2 P-value  Dataset 2 FDR  Nonsynonymous mutation  Protein residue
10837771  OR51B4  16973321  RYR3  1.20E-07  9.00E-04  2.28E-07  1.34E-03  rs10837771  T
7671095  GRID2  10839659  OR2D3  2.00E-08  3.97E-04  2.82E-08  5.29E-04  rs10839659  S
1545133  POLR1B  8064077  MYH11  6.71E-07  1.97E-03  5.88E-07  2.06E-03  rs1545133  L
1958715  OR4L1  3844750  EFNA5  2.15E-08  4.10E-04  1.25E-07  1.03E-03  rs1958715  N
1958716  OR4L1  3844750  EFNA5  4.48E-08  5.73E-04  1.22E-07  1.02E-03  rs1958716  V
2227956  HSPA1L  3135392  HLA-DRA  3.20E-10  6.02E-05  7.82E-10  1.05E-04  rs2227956  M
2227956  HSPA1L  3134929  NOTCH4  7.76E-09  2.57E-04  2.36E-11  2.07E-05  rs2227956  M
1799964  LTA/TNF  2227956  HSPA1L  7.52E-09  2.53E-04  2.98E-08  5.42E-04  rs2227956  M
1052248  LST1/NCR3  2227956  HSPA1L  8.24E-07  2.17E-03  1.87E-08  4.40E-04  rs2227956  M
35258  PDE4D  2230793  IKBKAP  7.50E-08  7.24E-04  7.85E-07  2.34E-03  rs2230793  L
2254524  LSS  10860869  IGF1  8.58E-08  7.70E-04  6.35E-09  2.71E-04  rs2254524  V
327325  NRG1  3742290  UTP14C  7.47E-07  2.07E-03  5.90E-07  2.06E-03  rs3742290  A
4253211  ERCC6  10435892  GABBR2  5.60E-07  1.81E-03  1.11E-09  1.23E-04  rs4253211  P
940389  STON1  10745676  PLXNC1  2.20E-08  4.14E-04  7.02E-07  2.23E-03  rs940389  T
676925  CXCR5  999890  PIP5K3  3.07E-07  1.38E-03  2.93E-07  1.51E-03  rs999890  A
When two loci are unlinked where we do not observe the allelic association
between two loci in the population as a whole, our results also hold. We assumed
the following allele and haplotype frequencies in the population:
(A) The power of the test statistic
To evaluate its performance for detection of interaction between two loci, the
proposed test statistic
Since testing for all possible two-locus interactions across the genome in
genome-wide interaction analysis requires an extremely large amount of computation, we
conducted pathway-based genome-wide interaction analysis. We assembled 501
pathways from KEGG
In total, 44 pairs of SNPs showed significant evidence of interaction with
FDR<0.001, which roughly corresponds to the P-value <
Each pathway is represented by a numbered ellipse. SNPs are represented by nodes placed inside the pathways in which they are located. Each SNP is labeled with its RS number and the name of the gene in which it is located. A pathway and the SNPs it harbors are shown in the same color. Interacting SNPs are connected by solid light green lines.
SNP1 (rs)  Gene 1  SNP2 (rs)  Gene 2  Dataset 1 P-value  Dataset 1 FDR  Dataset 2 P-value  Dataset 2 FDR  MicroRNA binding site
1052248  LST1/NCR3  2227956  HSPA1L  8.24E-07  2.17E-03  1.87E-08  4.40E-04  rs1052248 (miR-324-3p)
1052248  LST1/NCR3  3131636  MICB  5.56E-13  3.03E-06  7.76E-10  1.04E-04  rs1052248 (miR-324-3p)
676925  CXCR5/BCL9L  999890  PIP5K3  3.07E-07  1.38E-03  2.93E-07  1.51E-03  rs676925 (miR-382)
163274  ACSM1  2638315  GLS2  8.16E-07  2.16E-03  9.14E-07  2.51E-03  rs2638315 (miR-433)
2072619  MYH11  3822711  GALNT10  1.83E-07  1.09E-03  3.77E-08  6.03E-04  rs3822711 (miR-324-3p)
SNP1 (rs)  SNP1 association P-value (Dataset 1)  SNP1 association P-value (Dataset 2)  Gene 1  SNP2 (rs)  SNP2 association P-value (Dataset 1)  SNP2 association P-value (Dataset 2)  Gene 2  Interaction P-value (Dataset 1)  Interaction FDR (Dataset 1)  Interaction P-value (Dataset 2)  Interaction FDR (Dataset 2)
626072  0.227074  0.053394  LAMA1  6121989  0.862496  0.311346  LAMA5  1.11E-15  1.41E-07  5.67E-07  2.03E-03
626072  0.227074  0.053394  LAMA1  4925386  0.935809  0.264641  LAMA5  1.47E-13  1.73E-06  9.81E-07  2.59E-03
1052248  1.28E-05  0.002907  LST1/NCR3  3131636  0.012961  0.0006472  MICB  5.56E-13  3.03E-06  7.76E-10  1.04E-04
1052248  1.28E-05  0.002907  LST1/NCR3  3132468  0.014008  0.0005969  MICB  8.41E-13  3.70E-06  8.96E-10  1.12E-04
443198  0.000703  2.35E-11  NOTCH4  3131636  0.012961  0.0006472  MICB  1.13E-11  1.24E-05  3.98E-08  6.18E-04
443198  0.000703  2.35E-11  NOTCH4  3132468  0.014008  0.0005969  MICB  1.19E-10  3.76E-05  6.55E-08  7.72E-04
1799964  0.001104  0.009606  LTA/TNF  3131636  0.012961  0.0006472  MICB  1.62E-10  4.35E-05  1.36E-09  1.34E-04
1799964  0.001104  0.009606  LTA/TNF  3132468  0.014008  0.0005969  MICB  2.94E-10  5.80E-05  2.51E-09  1.78E-04
4766587  0.813376  0.391864  ACACB  4807055  0.530091  0.0742653  NDUFA11  3.07E-10  5.90E-05  6.61E-07  2.17E-03
2227956  0.001216  0.000149  HSPA1L  3135392  0.581239  0.75373  HLA-DRA  3.20E-10  6.02E-05  7.82E-10  1.05E-04
1060856  0.824965  0.751351  ALDH7A1  2711288  0.258241  0.0910624  PRKCE  3.57E-10  6.33E-05  2.56E-07  1.42E-03
326346  0.979881  0.212752  CD47  11081513  0.229512  0.79174  VAPA  4.24E-10  6.84E-05  2.81E-07  1.48E-03
1932067  0.043627  0.970441  PAFAH2  13203100  0.208767  0.598145  TIAM2  5.64E-10  7.81E-05  3.04E-08  5.47E-04
2012359  0.369854  0.40799  PARP4  10823333  0.614239  0.882698  HK1  5.65E-10  7.82E-05  7.20E-07  2.25E-03
9311951  0.131357  0.719726  MAGI1  11195879  0.361463  0.072601  NRG3  8.53E-10  9.50E-05  8.25E-07  2.40E-03
3768650  0.318227  0.611621  STAM2  11993811  0.732675  0.862927  FGF20  9.20E-10  9.81E-05  3.25E-08  5.64E-04
785915  0.290127  0.406203  GCNT1  11713331  0.752161  0.621571  PRICKLE2  1.30E-09  1.15E-04  2.95E-07  1.51E-03
785916  0.274961  0.307539  GCNT1  11713331  0.752161  0.621571  PRICKLE2  1.37E-09  1.17E-04  5.51E-07  2.00E-03
1202674  0.254783  0.978976  RPS6KA2  6061796  0.952187  0.932697  CDH4  2.46E-09  1.52E-04  1.63E-08  4.14E-04
1048471  0.414631  0.566854  ST3GAL1  2830096  0.754145  0.728396  APP  2.95E-09  1.65E-04  9.63E-07  2.57E-03
Some researchers suggest that in genome-wide interaction analysis only SNPs with
large or mild marginal genetic effects should be tested for interaction. To
examine whether this strategy would miss interacting SNPs, we showed
in
P-values for testing interaction
rs1  Gene 1  rs2  Gene 2  Dataset 1 T_{IH}  Dataset 1 PLINK  Dataset 1 logistic regression (recessive)  Dataset 1 logistic regression (additive)  Dataset 1 logistic regression (dominant)  Dataset 2 T_{IH}  Dataset 2 PLINK  Dataset 2 logistic regression (recessive)  Dataset 2 logistic regression (additive)  Dataset 2 logistic regression (dominant)
626072  LAMA1  6121989  LAMA5  1.11E-15  1.63E-07  2.61E-02  7.82E-08  3.14E-06  5.67E-07  0.001854  3.65E-02  1.40E-03  1.10E-01
626072  LAMA1  4925386  LAMA5  1.47E-13  1.01E-06  7.64E-02  5.82E-07  1.05E-05  9.81E-07  0.002709  5.07E-02  1.88E-03  1.27E-01
1052248  LST1/NCR3  3131636  MICB  5.56E-13  2.86E-09  8.61E-03  1.14E-09  1.11E-05  7.76E-10  1.76E-05  4.47E-01  9.67E-06  9.75E-06
1052248  LST1/NCR3  3132468  MICB  8.41E-13  4.28E-09  1.16E-02  1.72E-09  1.10E-05  8.96E-10  2.03E-05  4.75E-01  9.97E-06  7.73E-06
443198  NOTCH4  3131636  MICB  1.13E-11  6.14E-08  3.29E-01  2.95E-08  4.53E-05  3.98E-08  6.33E-05  2.93E-02  3.63E-05  7.83E-04
443198  NOTCH4  3132468  MICB  1.19E-10  2.81E-07  3.21E-01  1.52E-07  1.57E-04  6.55E-08  6.32E-05  2.81E-02  4.65E-05  1.39E-03
1799964  LTA/TNF  3131636  MICB  1.62E-10  3.08E-07  7.09E-02  1.52E-07  3.68E-02  1.36E-09  7.26E-05  4.19E-01  3.25E-05  2.59E-06
1799964  LTA/TNF  3132468  MICB  2.94E-10  7.26E-07  9.57E-02  3.80E-07  3.80E-02  2.51E-09  8.30E-05  4.51E-01  4.33E-05  2.76E-06
4766587  ACACB  4807055  NDUFA11  3.07E-10  6.25E-05  3.13E-04  4.84E-05  4.51E-01  6.61E-07  0.000855  3.27E-03  7.74E-04  9.85E-01
2227956  HSPA1L  3135392  HLA-DRA  3.20E-10  2.46E-06  1.14E-04  1.49E-06  2.63E-02  7.82E-10  9.60E-06  7.29E-05  5.17E-06  2.14E-01
1060856  ALDH7A1  2711288  PRKCE  3.57E-10  3.01E-05  1.93E-07  1.71E-05  8.81E-01  2.56E-07  0.000292  7.50E-01  2.09E-04  2.26E-04
2012359  PARP4  10823333  HK1  5.65E-10  3.84E-05  1.26E-04  2.70E-05  1.37E-01  7.20E-07  0.000148  2.06E-04  7.68E-05  1.00E+00
9311951  MAGI1  11195879  NRG3  8.53E-10  6.22E-05  1.00E-03  3.29E-05  3.40E-03  8.25E-07  0.001808  3.81E-02  1.09E-03  1.71E-02
3768650  STAM2  11993811  FGF20  9.20E-10  4.53E-06  8.71E-05  2.52E-06  1.18E-01  3.25E-08  0.000115  1.25E-03  9.91E-05  8.50E-01
785915  GCNT1  11713331  PRICKLE2  1.30E-09  5.34E-05  5.01E-01  3.64E-05  3.37E-04  2.95E-07  3.82E-05  7.13E-05  4.26E-05  3.70E-02
785916  GCNT1  11713331  PRICKLE2  1.37E-09  6.08E-05  5.08E-01  4.21E-05  4.32E-04  5.51E-07  6.98E-05  1.31E-04  8.08E-05  6.61E-02
1202674  RPS6KA2  6061796  CDH4  2.46E-09  4.19E-05  3.78E-01  3.11E-05  1.12E-05  1.63E-08  0.000516  6.48E-03  3.18E-04  7.53E-04
1048471  ST3GAL1  2830096  APP  2.95E-09  2.43E-05  4.55E-04  1.84E-05  1.00E-01  9.63E-07  0.001105  1.11E-01  1.09E-03  3.65E-03
1025951  GALNT13  17568302  FMO2  3.40E-09  3.39E-05  3.16E-03  2.36E-05  2.26E-01  9.29E-07  0.002402  3.30E-03  1.94E-03  1.39E-02
6954  KIAA0467  4773873  ABCC4  3.67E-09  3.30E-06  1.24E-04  2.25E-06  3.11E-03  4.74E-07  0.003416  5.42E-01  1.59E-03  1.36E-02
To further evaluate the performance of the proposed statistic
SNP1 (rs)  Gene 1  Chrom 1  Position 1  SNP2 (rs)  Gene 2  Chrom 2  Position 2  P-value (Dataset 1)  P-value (Dataset 2)
1052248  LST1/NCR3  6  31664560  3131636  MICB  6  31584073  5.56E-13  7.76E-10
1052248  LST1/NCR3  6  31664560  3132468  MICB  6  31583465  8.41E-13  8.96E-10
443198  NOTCH4  6  32298384  3131636  MICB  6  31584073  1.13E-11  3.98E-08
626072  LAMA1  18  6941189  6121989  LAMA5  20  60350108  1.11E-15  5.67E-07
626072  LAMA1  18  6941189  4925386  LAMA5  20  60354439  1.47E-13  9.81E-07
7113099  NCAM1  11  112409545  10025210  SCD5  4  83858485  2.05E-11  1.00E-05
802509  CNTNAP2  7  145603003  1462140  HPSE2  10  100355999  2.29E-11  1.96E-05
832504  PLXNC1  12  93197019  13222291  KDELR2  7  6483965  2.55E-12  2.19E-05
2227956  HSPA1L  6  31886251  3134929  NOTCH4  6  32300085  7.76E-09  2.36E-11
3129869  HLA-DRA  6  32513649  3177928  HLA-DRA  6  32520413  3.77E-08  5.65E-14
3177928  HLA-DRA  6  32520413  9269080  HLA-DRB4  6  32548947  1.96E-07  3.70E-11
3129882  HLA-DRA  6  32517508  3177928  HLA-DRA  6  32520413  6.96E-07  2.71E-14
2620452  CNTNAP2  7  146644926  16982241  FUT2  19  53894671  1.50E-06  3.48E-12
1479838  CNTNAP2  7  146638597  16982241  FUT2  19  53894671  1.85E-06  1.77E-12
3134929  NOTCH4  6  32300085  3177928  HLA-DRA  6  32520413  2.81E-06  <1.00E-17
2856993  TAP2  6  32899381  9269080  HLA-DRB4  6  32548947  8.99E-06  1.71E-13
6498575  MYH11  16  15795817  9364864  RPS6KA2  6  166984655  1.14E-05  1.83E-12
935672  PRKCE  2  45899463  2744600  ALDH5A1  6  24641411  2.09E-05  1.88E-11
Eighteen significantly interacting SNPs identified by Bonferroni correction were
listed in
The development of most diseases is a dynamic process of gene-gene and gene-environment interactions within a complex biological system. We expect that genome-wide interaction analysis will uncover part of the heritability left unexplained by current GWAS, which test associations one SNP at a time. In practice, however, very few genome-wide interaction analyses have been conducted and few significant interaction results have been reported. Our aim is to develop statistical methods and computational algorithms for genome-wide interaction analysis that can be implemented in practice and that provide evidence of gene-gene interaction. The purpose of this report is to address several issues that must be resolved to achieve this goal.
The first issue is how to define and measure interaction. The odds ratio is a widely used
measure of interaction for the case-control design. The odds-ratio-based measure of
interaction between two loci is often defined as a departure from additive or
multiplicative odds ratios of both loci defined by genotypes. The genotype-based
odds ratio does not exploit the allelic association between two loci that is
generated in the cases by interaction between them. Any statistic based on the
genotype-defined odds ratio will therefore often have low power to detect interaction. To
overcome this limitation, we extended the genotype definition of the odds ratio to
haplotypes and revealed relationships between the haplotype-defined odds ratio and
the haplotype formulation of logistic regression. To further examine the validity of
this concept, we studied the distribution of the test statistic under the null
hypothesis of no interaction between two loci, either linked or unlinked. Through
extensive simulation (assuming allelic association in the controls), we show that
the distribution of the haplotype odds-ratio-based statistic is close to a central
The second issue is the power of the test statistic for genome-wide interaction analysis. Genome-wide interaction analysis requires testing billions of pairs of SNPs for interaction, so the P-values needed to ensure genome-wide significance are very small. Developing statistics with high power to detect interaction is therefore essential to the success of genome-wide interaction analysis. As an alternative to logistic regression and “fast-epistasis” in PLINK, we presented a haplotype odds-ratio-based statistic for detecting interaction between two loci and illustrated its power by extensive simulations. The power of the haplotype odds-ratio-based statistic is a function of the measure of interaction, and the statistic had much higher power to detect interaction than “fast-epistasis” in PLINK and logistic regression.
The third issue is whether interactions exist in the absence of marginal association and how often they might occur in practice. Our data demonstrated that the majority of the significantly interacting SNPs showed no marginal association. Surprisingly, in the two studies 75% of the interacting SNPs had P-values for marginal association larger than 0.2, and 44% had P-values for marginal association larger than 0.5. This strongly suggests that testing interaction only for SNPs with strong or mild marginal association will miss the majority of interactions.
The fourth issue is replication of the results. Genome-wide interaction analysis involves testing billions of pairs of SNPs. Even after correction for multiple testing, the number of false positive results might still be high. To increase confidence in interaction test results, replication of interaction findings in independent studies is often sought. To date, very few results of genome-wide interaction analysis have been replicated. This raises the question of whether significant interactions can be replicated in independent studies. In this report, we show that interaction findings can be replicated in two independent studies.
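The replication criterion used in the tables can be illustrated with a short sketch. The four rows below are copied from the interaction table in this report; the two thresholds follow the FDR tiers quoted in the text (0.001 and 0.003). The function name and the tier labels are hypothetical and serve only to illustrate requiring significance in both datasets.

```python
# (SNP1, SNP2, interaction FDR in Dataset 1, interaction FDR in Dataset 2)
pairs = [
    ("rs626072", "rs6121989", 1.41e-07, 2.03e-03),
    ("rs1052248", "rs3131636", 3.03e-06, 1.04e-04),
    ("rs443198", "rs3131636", 1.24e-05, 6.18e-04),
    ("rs4766587", "rs4807055", 5.90e-05, 2.17e-03),
]


def replication_tier(fdr1, fdr2, strict=0.001, relaxed=0.003):
    """Classify a SNP pair by the weaker of its two FDR values."""
    worst = max(fdr1, fdr2)  # the pair must pass in BOTH datasets
    if worst < strict:
        return "FDR<0.001 in both"
    if worst < relaxed:
        return "FDR<0.003 in both"
    return "not replicated"


tiers = {(s1, s2): replication_tier(f1, f2) for s1, s2, f1, f2 in pairs}
for pair, tier in tiers.items():
    print(pair, tier)
```

Taking the larger of the two FDR values enforces the requirement that a pair be significant in each study separately, rather than in a pooled analysis.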
The fifth issue is correction for multiple testing. Genome-wide interaction analysis
often involves billions of tests, which would require an extremely small
Bonferroni-corrected P-value to ensure a genome-wide significance level of 0.05.
Replication of findings at such small P-values in independent studies is often
extremely difficult. However, Bonferroni correction assumes that the tests are
independent, yet many interaction tests are highly correlated. Correlations in the
interaction tests come from two levels
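As an alternative to Bonferroni correction, the results in this report are stated in terms of the false discovery rate (FDR). The text does not specify which FDR procedure was used, so the sketch below assumes the Benjamini-Hochberg step-up procedure purely for illustration; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending ranks
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(example)
print([round(q, 5) for q in adjusted])
```

A pair would then be declared significant when its adjusted value falls below the chosen FDR level, for example 0.001 or 0.003 as in the tables above.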
Although our data show that interactions can partially account for the heritability of complex diseases missed by current GWAS, the results are still preliminary. Because of the extremely intensive computation demanded by genome-wide interaction analysis, we tested on a PC only the interactions among a small set of SNPs located in the genes of the 501 assembled pathways. A truly whole-genome interaction analysis, which would test for interaction between all possible pairs of SNPs across the genome, has not been conducted. Gene-gene interaction is an important, though complex, concept. Statistical interactions are scale dependent, and there are a number of ways to define gene-gene interaction. How to define gene-gene interaction and how to develop efficient statistical methods and computational algorithms for genome-wide interaction analysis remain great challenges. The main purpose of this report is to stimulate discussion about the optimal strategies for genome-wide interaction analysis. We expect that in the coming years genome-wide interaction analysis will be one of the major tasks in searching for the heritability that remains unexplained by the current GWAS approach.
A total of 44 pairs of SNPs showing significant interaction with FDR less than 0.001 in two independent studies.
(0.04 MB XLS)
A total of 211 pairs of interacting SNPs with FDR less than 0.003 in two studies.
(0.07 MB XLS)
P-values for testing association of single SNPs and interaction between two SNPs.
(0.08 MB XLS)
P-values for testing interaction calculated by T_{IH}, PLINK, and logistic regression using genotype coding.
(0.10 MB XLS)
Appendices.
(0.06 MB DOC)
The dataset(s) used for the analyses described in this manuscript were obtained from
the database of Genotype and Phenotype (dbGaP) found at