
Testing gene-environment interactions for rare and/or common variants in sequencing association studies

  • Zihan Zhao,

    Roles Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Texas Academy of Mathematics & Science, University of North Texas, Denton, TX, United States of America

  • Jianjun Zhang,

    Roles Formal analysis, Writing – review & editing

    Affiliation Department of Mathematics, University of North Texas, Denton, TX, United States of America

  • Qiuying Sha,

    Roles Visualization, Writing – review & editing

    Affiliation Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America

  • Han Hao

    Roles Methodology, Writing – original draft, Writing – review & editing

    Han.Hao@unt.edu

    Affiliation Department of Mathematics, University of North Texas, Denton, TX, United States of America

Abstract

The risk of many complex diseases is determined by an interplay of genetic and environmental factors. Advanced next-generation sequencing technology makes it possible to identify gene-environment (GE) interactions for both common and rare variants. However, most existing methods focus on testing the main effects of common and/or rare genetic variants. Few methods have been developed to test the effects of GE interactions for rare variants only, or for rare and common variants simultaneously. In this study, we develop novel approaches to test the effects of GE interactions of rare and/or common risk and/or protective variants in sequencing association studies. We propose two approaches: 1) testing the effects of an optimally weighted combination of GE interactions for rare variants (TOW-GE); 2) testing the effects of a weighted combination of GE interactions for both rare and common variants (variable weight TOW-GE, VW-TOW-GE). Extensive simulation studies based on the Genetic Analysis Workshop 17 data show that the type I error rates of the proposed methods are well controlled. Compared to the existing interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are GE interaction effects for rare risk and/or protective variants; VW-TOW-GE is more powerful when there are GE interaction effects for both rare and common risk and protective variants. Both TOW-GE and VW-TOW-GE are robust to the directions of effects of causal GE interactions. We demonstrate the applications of TOW-GE and VW-TOW-GE using imputed data from the COPDGene Study.

Introduction

The etiology of many diseases is characterized by the interplay between genetic and environmental factors. For example, anthracyclines are one of the most effective classes of chemotherapeutic agents currently available for cancer treatment. The therapeutic potential of anthracyclines, however, is limited because of their strong dose-dependent relation with progressive and irreversible cardiomyopathy leading to congestive heart failure. Both gene hyaluronan synthase 3 (HAS3) and gene CUGBP Elav-like family member 4 (CELF4) modify the effect of anthracycline exposure on the risk of anthracycline-related cardiomyopathy [1, 2]. A genome-wide gene-environment interaction analysis indicates that gene EBF1 interacts with stress in its association with cardiovascular disease. Additionally, gene EBF1 not only shows a gene-by-stress interaction effect for hip circumference but also shows gene-by-stress interaction effects for waist circumference, body mass index (BMI), fasting glucose, type II diabetes, and common carotid intimal medial thickness (CCIMT) [3].

To date, most of the successful findings on gene-environment (GE) interactions are for common genetic variants. There has been very limited success in finding GE interactions for rare variants. This is often attributed to study design issues, such as sample size or population heterogeneity [4]. The lack of statistical methodology for rare-variant GE interactions also contributes to this limitation.

Rare variants, which are usually defined as genetic variants with minor allele frequency (MAF) less than 5% (or 1%), may play an important role in studying the etiology of complex human diseases. Numerous statistical methods have been developed for testing the main effects of rare variants, such as the sequence kernel association test (SKAT) [5], the combined multivariate and collapsing (CMC) method [6], the weighted sum statistic (WSS) [7], and Testing the effect of an Optimally Weighted combination of variants (TOW) [8].

To our knowledge, few methods have been developed for testing GE interactions in sequencing association studies. Existing methods for assessing common variants by environment interactions, such as the gene-environment set association test (GESAT) [9], are less powerful when naively applied to rare variants [10]. To test rare variants by environment interactions, Lin et al. [10] developed the interaction sequence kernel association test (ISKAT). ISKAT uses the weights Beta(MAF; 1, 25), the beta distribution density function with parameters 1 and 25 evaluated at the sample MAF, which are the recommended weights when there is no prior information. As a consequence, ISKAT may lose power when the MAFs of causal variants are not in the range (0.01, 0.035) [11].
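To make the Beta(MAF; 1, 25) weighting concrete, the following minimal sketch evaluates the Beta(1, 25) density at a few MAF values. It only uses the closed form of that density, 25(1 − MAF)^24; the function name and the example MAF values are illustrative assumptions and are not taken from ISKAT itself.

```python
def beta_1_25_density(maf: float) -> float:
    """Density of the Beta(1, 25) distribution evaluated at maf.

    For shape parameters (1, 25) the normalizing constant is 25,
    so the density reduces to 25 * (1 - maf) ** 24 on [0, 1].
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")
    return 25.0 * (1.0 - maf) ** 24


# Illustrative MAF values (assumed for this example only).
for maf in (0.001, 0.01, 0.035, 0.05, 0.2):
    print(f"MAF = {maf:<5}  weight = {beta_1_25_density(maf):.3f}")
```

The density decreases monotonically in the MAF, so a fixed weight of this form emphasizes the rarest variants regardless of which variants are actually causal.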

In this article, to test interactions between rare and/or common variants and the environment in sequencing association studies, we develop two novel methods: 1) Testing the Optimally Weighted combination of GE interactions for rare variants (TOW-GE); 2) testing the effects of a weighted combination of GE interactions for both rare and common variants (variable weight TOW-GE, referred to as VW-TOW-GE). Both TOW-GE and VW-TOW-GE are robust to the directions of effects of causal GE interactions. We evaluate the performance of the proposed methods via simulation studies and real data analysis using the imputed sequencing data from the COPDGene Study.

Methods

Consider $n$ unrelated individuals sequenced in a testing region with $m$ genetic variants. In the testing region, we are interested in testing the effects of $p$ rare variants ($p < m$) by environment interactions on a trait, which can be a quantitative or a qualitative trait. For ease of presentation, we only consider a single environmental factor. The method can be easily extended to the case when there are multiple environmental factors. For individuals $i = 1, \dots, n$, let $y_i$ denote the trait, $X_i = (x_{i1}, \dots, x_{iq})^T$ denote the $q$ covariates, $G_i = (g_{i1}, \dots, g_{ip})^T$ denote the genotypes for the $p$ rare variants in a genomic region (a gene or a pathway), and $E_i$ denote the environmental factor. Let $S_i = (E_i g_{i1}, \dots, E_i g_{ip})^T$ be the vector of variant by environment interaction terms for the $i$th individual.

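We use the generalized linear model (GLM) to model the relationship between the trait values $y_i$ and covariates $X_i$, genotypes $G_i$, environmental factor $E_i$ and GE interactions $S_i$:

$$g\big(E(y_i \mid X_i, E_i, G_i)\big) = X_i^T \alpha_1 + \alpha_2 E_i + G_i^T \alpha_3 + S_i^T \beta, \tag{1}$$

where $g(\cdot)$ is a canonical link function. Two commonly used models under the generalized linear model framework are the linear model with the identity link for a continuous or quantitative trait, and the logistic regression model with the logit link for a binary trait. $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\beta$ are the $q \times 1$ coefficient vector of the covariates, the coefficient of the environmental factor, the $p \times 1$ coefficient vector of the genotypes, and the $p \times 1$ coefficient vector of the GE interactions, respectively. Let $\tilde{X}_i = (X_i^T, E_i, G_i^T)^T$ and $\alpha = (\alpha_1^T, \alpha_2, \alpha_3^T)^T$. Testing the association between the trait and the rare variants by environment interactions is equivalent to testing the null hypothesis $H_0: \beta = 0$.

We develop a score test by treating $\alpha$ as nuisance parameters and then adjust both the trait value $y_i$ and $S_i$ for the covariates $X_i$, the genotypic score $G_i$, and the environmental variable $E_i$ by applying linear regression. Denote $\tilde{y}_i$ as the residual of $y_i$ and $\tilde{S}_i = (\tilde{s}_{i1}, \dots, \tilde{s}_{ip})^T$ as the residual of $S_i$, regressed on $\tilde{X}_i$. Then, the relationship between $\tilde{y}_i$ and $\tilde{S}_i$ can be modeled by the GLM:

$$g\big(E(\tilde{y}_i \mid \tilde{S}_i)\big) = \tilde{S}_i^T \beta^*. \tag{2}$$

Testing $H_0: \beta = 0$ in (1) is equivalent to testing $H_0: \beta^* = 0$ in (2) (Sha et al. [8]). Here, we apply the weight selection scheme proposed by Sha et al. [8] to our new model to test the effect of a weighted combination of GE interactions, $s_i^w = \sum_{j=1}^{p} w_j \tilde{s}_{ij}$. Following Sha et al. [12], we propose the following score test statistic under the generalized linear model:

$$S(w_1, \dots, w_p) = n \, \frac{\left[\sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(s_i^w - \bar{s}^w)\right]^2}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{y})^2 \, \sum_{i=1}^{n} (s_i^w - \bar{s}^w)^2},$$

where $\bar{y}$ and $\bar{s}^w$ are the sample means of $\tilde{y}_i$ and $s_i^w$, respectively.

Because GE interactions for rare variants are essentially independent, we have:

$$\sum_{i=1}^{n} (s_i^w - \bar{s}^w)^2 \approx \sum_{j=1}^{p} w_j^2 \sum_{i=1}^{n} (\tilde{s}_{ij} - \bar{s}_j)^2,$$

where $\bar{s}_j$ is the sample mean of $\tilde{s}_{ij}$ over individuals.

Thus, as a function of $(w_1, \dots, w_p)$, the score test statistic $S(w_1, \dots, w_p)$ reaches its maximum when $w_j = w_j^o$ for $j = 1, \dots, p$, where

$$w_j^o = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(\tilde{s}_{ij} - \bar{s}_j)}{\sum_{i=1}^{n} (\tilde{s}_{ij} - \bar{s}_j)^2}.$$

Similarly, we define the statistic to Test the effect of the Optimally Weighted combination of GE interactions (TOW-GE), $T_{\text{TOW-GE}}$, as:

$$T_{\text{TOW-GE}} = \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(s_i^o - \bar{s}^o), \qquad s_i^o = \sum_{j=1}^{p} w_j^o \tilde{s}_{ij}, \tag{3}$$

which is equivalent to $S(w_1^o, \dots, w_p^o) = n \, T_{\text{TOW-GE}} / \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})^2$, where $\sum_{i=1}^{n} (\tilde{y}_i - \bar{y})^2$ can be viewed as a constant when we use a permutation test to evaluate p-values.

The optimal weight $w_j^o$ is equivalent, up to the constant factor $\sqrt{\sum_{i} (\tilde{y}_i - \bar{y})^2}$, to $\rho_j / \sqrt{\sum_{i} (\tilde{s}_{ij} - \bar{s}_j)^2}$, where $\rho_j$ is the correlation coefficient between $\tilde{y}_i$ and $\tilde{s}_{ij}$. From the expression of $w_j^o$, we can see that it is proportional to $\rho_j$ and thus puts large weights on the GE interactions that have strong associations with the trait and also adjusts for the direction of the association. Simultaneously, $w_j^o$ is proportional to $1 / \sqrt{\sum_{i} (\tilde{s}_{ij} - \bar{s}_j)^2}$ and puts large weights on GE interactions with small variation, which is typical of GE interactions for rare variants.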
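As an illustration of the construction described above, the following sketch computes a TOW-GE-style statistic and a permutation p-value from inputs that are assumed to be already residualized (the trait and each GE interaction regressed on covariates, genotypes, and the environmental factor). It assumes, following the TOW construction of Sha et al. [8], that the optimal weight of each interaction is the cross-product between the centered residual trait and the centered residual interaction divided by the sum of squares of that interaction. The function names, the toy data, and the choice to shuffle the residual trait values are assumptions made for this example; this is not the authors' R program.

```python
import random


def center(values):
    """Subtract the sample mean from every entry."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def tow_ge_statistic(y_res, s_res):
    """TOW-GE-style statistic from residualized inputs.

    y_res: list of n residual trait values.
    s_res: list of p columns, each a list of n residual GE interaction values.
    """
    yc = center(y_res)
    stat = 0.0
    for column in s_res:
        sc = center(column)
        sum_sq = sum(v * v for v in sc)
        if sum_sq == 0.0:
            continue  # a constant column carries no information
        cross = sum(a * b for a, b in zip(yc, sc))
        # Assumed optimal weight is cross / sum_sq, so the weighted
        # contribution of this interaction is cross**2 / sum_sq.
        stat += cross * cross / sum_sq
    return stat


def permutation_p_value(y_res, s_res, n_perm=1000, seed=0):
    """Shuffle the residual trait values and count statistics >= observed."""
    rng = random.Random(seed)
    observed = tow_ge_statistic(y_res, s_res)
    shuffled = list(y_res)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if tow_ge_statistic(shuffled, s_res) >= observed:
            hits += 1
    return hits / n_perm


# Toy data (hypothetical): 8 individuals, 2 residualized interaction columns.
y = [0.9, -1.1, 0.4, -0.2, 1.3, -0.8, 0.1, -0.6]
s = [[0.5, -0.5, 0.0, 0.0, 0.5, -0.5, 0.0, 0.0],
     [0.0, 0.3, -0.3, 0.0, 0.0, 0.3, -0.3, 0.0]]
print(tow_ge_statistic(y, s), permutation_p_value(y, s))
```

Because each interaction contributes a squared cross-product, flipping the sign of any interaction column leaves the statistic unchanged, which is the sense in which the test is robust to the direction of effects.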

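TOW-GE focuses primarily on rare variants by environment interactions and it may lose power because of the small weights on common variants by environment interactions. Thus, to test the GE interaction effects of both rare and common variants, we propose the following variable weight TOW-GE, denoted as VW-TOW-GE. We first divide the GE interactions into two parts based on rare or common variants and then apply TOW-GE to the two parts separately. Let

$$T_\lambda = \lambda \, \frac{T_r}{\sqrt{\operatorname{var}(T_r)}} + (1 - \lambda) \, \frac{T_c}{\sqrt{\operatorname{var}(T_c)}},$$

where $T_r$ and $T_c$ denote the test statistics of TOW-GE for GE interaction effects of rare and common variants, respectively, and $\lambda$ is a tuning parameter. Denote $p_\lambda$ as the p-value of $T_\lambda$; then the test statistic of VW-TOW-GE is defined as $T_{\text{VW-TOW-GE}} = \min_{0 \le \lambda \le 1} p_\lambda$. In this study, we use a simple grid search to choose the tuning parameter $\lambda$ and minimize the p-value. Divide the interval $[0, 1]$ into $K$ subintervals of equal length. Let $\lambda_k = k/K$ for $k = 0, 1, \dots, K$. Then, $T_{\text{VW-TOW-GE}} = \min_{0 \le k \le K} p_{\lambda_k}$.

The p-value of $T_{\text{VW-TOW-GE}}$ can be evaluated by permutation tests, following the permutation procedure for variable weight TOW (VW-TOW) proposed by [8]. Suppose that we perform $B$ permutations. In each permutation, we randomly shuffle the trait values. Let $T_r^{(b)}$ and $T_c^{(b)}$ denote the values of $T_r$ and $T_c$, respectively, based on the $b$th permuted data, where $b = 0$ represents the original data. Based on $T_r^{(b)}$ and $T_c^{(b)}$ ($b = 0, 1, 2, \dots, B$), we can calculate

$$T_{\lambda_k}^{(b)} = \lambda_k \, \frac{T_r^{(b)}}{\sqrt{\operatorname{var}(T_r)}} + (1 - \lambda_k) \, \frac{T_c^{(b)}}{\sqrt{\operatorname{var}(T_c)}}$$

for $b = 0, 1, 2, \dots, B$ and $k = 0, 1, 2, \dots, K$, where $\operatorname{var}(T_r)$ and $\operatorname{var}(T_c)$ are estimated using $T_r^{(b)}$ and $T_c^{(b)}$ ($b = 0, 1, 2, \dots, B$). Then, we transfer $T_{\lambda_k}^{(b)}$ to $p_{\lambda_k}^{(b)}$ by

$$p_{\lambda_k}^{(b)} = \frac{\#\{d : T_{\lambda_k}^{(d)} > T_{\lambda_k}^{(b)}, \; d = 0, 1, \dots, B\}}{B}.$$

Let $p^{(b)} = \min_{0 \le k \le K} p_{\lambda_k}^{(b)}$. Then, the p-value of $T_{\text{VW-TOW-GE}}$ is given by

$$\frac{\#\{b : p^{(b)} < p^{(0)}, \; b = 1, 2, \dots, B\}}{B}.$$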
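The grid search and permutation steps described above can be sketched as follows. The sketch assumes that the rare and common TOW-GE statistics have already been computed for the original data (index 0) and for B permuted data sets, that each statistic is standardized by its standard deviation estimated across all B + 1 values, and that the counts use strict inequalities as in the VW-TOW procedure of [8]. The function names and the default K = 10 are assumptions for illustration.

```python
import bisect


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def vw_tow_ge_p_value(t_r, t_c, K=10):
    """Permutation p-value of the minimum-p statistic over a lambda grid.

    t_r[b], t_c[b]: TOW-GE statistics of the rare and common parts for the
    b-th data set; index 0 is the original data, 1..B are permutations.
    """
    B = len(t_r) - 1
    sd_r = sample_variance(t_r) ** 0.5
    sd_c = sample_variance(t_c) ** 0.5
    if sd_r == 0.0 or sd_c == 0.0:
        raise ValueError("statistics must vary across permutations")
    min_p = [1.0] * (B + 1)  # running minimum of p_lambda over the grid
    for k in range(K + 1):
        lam = k / K
        t_lam = [lam * r / sd_r + (1.0 - lam) * c / sd_c
                 for r, c in zip(t_r, t_c)]
        ordered = sorted(t_lam)
        for b, t in enumerate(t_lam):
            # number of data sets whose statistic is strictly larger
            n_greater = (B + 1) - bisect.bisect_right(ordered, t)
            min_p[b] = min(min_p[b], n_greater / B)
    # fraction of permutations whose minimum p is strictly below the observed one
    return sum(1 for b in range(1, B + 1) if min_p[b] < min_p[0]) / B
```

Taking the minimum over the grid for every permuted data set, and not only for the original data, is what accounts for the selection of the tuning parameter in the final p-value.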

Simulation

We compared the performance of our proposed methods with the interaction sequence kernel association test (ISKAT) [10], the modified WSS for testing the effects of GE interactions [7] and the modified CMC method for testing the effects of GE interactions [6]. In this study, the rank sum test used by WSS and the T2 test used by CMC were replaced with the score test based on the residuals of the trait and of the GE interactions. The empirical Mini-Exome genotype data provided by the Genetic Analysis Workshop 17 (GAW17) are used for the simulation studies. The dataset contains genotypes of 697 unrelated individuals on 3,205 genes. Because gene ELAVL4 in GAW17 was used to simulate GE interaction effects on the quantitative trait Q1, which follows a normal distribution, we chose gene ELAVL4 in our simulation study. Gene ELAVL4 has 10 variants: 8 rare variants and 2 common variants. Rare variants in the simulation are defined as those with MAF < 0.05.

To evaluate type I error, we generate trait values independent of GE interactions (i.e., β1 = 0 and βc = 0) by using the model: where ϵ1 follows a normal distribution with mean as 0 and variances as ; α1 = 0.015; S is GE interactions for rare variants and Sc is GE interaction for a common variant. We consider two covariates: a standard normal covariate X1 and a binary covariate X2 with P(X2 = 1) = 0.5. The environmental factor E is assumed to be continuous and to follow a standard normal distribution.

For type I error evaluation, we consider two different cases: 1) testing the effects of GE interactions for rare variants; 2) testing the effects of GE interactions for both rare and common variants. For each case, we consider two scenarios: (a) with main effects; (b) without main effects in the model. When main effects exist, we set the magnitude of each element of the vector α2 to 0.3 and the sign of each coefficient is randomly sampled from {−1, 1}. When main effects do not exist, we set α2 = 0.

For power comparisons, the phenotype is generated using settings similar to those of the type I error evaluation, except that GE interaction effects are present. We compare the power of TOW-GE, ISKAT, WSS and CMC to test rare variant GE interaction effects, considering two scenarios: (a) including main effects, α2 ≠ 0 for rare variants; (b) no main effects, α2 = 0 for rare variants. We vary the number of non-zero elements in the vector βi, the proportion of non-zero elements in βi that are positive, and the magnitudes of the non-zero βij. We set the magnitudes of the non-zero βij’s as |βij| = c, and increase c from 0.1 to 0.5. In each simulation scenario, p-values are estimated using 10,000 permutations, and 1,000 replicated samples are generated.
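The way the interaction coefficients are varied across scenarios can be sketched as below. The helper builds a coefficient vector with a given number of non-zero entries, a given proportion of positive entries among them, and a common magnitude c. Which variants receive the non-zero coefficients is chosen at random here; that choice, the function name, and the seed argument are assumptions for illustration, since the text does not specify them.

```python
import random


def make_interaction_effects(p, n_nonzero, prop_positive, c, seed=0):
    """Return a length-p list with n_nonzero entries equal to +c or -c.

    prop_positive is the fraction of the non-zero entries that are positive.
    Positions of the non-zero entries are drawn at random (an assumption).
    """
    rng = random.Random(seed)
    n_positive = round(n_nonzero * prop_positive)
    signs = [1.0] * n_positive + [-1.0] * (n_nonzero - n_positive)
    positions = rng.sample(range(p), n_nonzero)
    beta = [0.0] * p
    for position, sign in zip(positions, signs):
        beta[position] = sign * c
    return beta


# Example: 8 rare variants, 6 causal interactions, half of them positive, c = 0.3.
print(make_interaction_effects(p=8, n_nonzero=6, prop_positive=0.5, c=0.3))
```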

Simulation results

The empirical type I error rates are shown in Tables 1 and 2. For 10,000 replicated samples, the 95% confidence intervals for type I error rates at the nominal levels 0.05, 0.01 and 0.001 are (0.046, 0.054), (0.008, 0.012) and (0.0004, 0.0016), respectively. When there are (a) main effects, i.e., α2 ≠ 0, TOW-GE, VW-TOW-GE, ISKAT and WSS control type I error rates well, whereas the burden test CMC tends to have very conservative type I error rates (top panel of Tables 1 and 2). When there are (b) no main effects, i.e., α2 = 0, all methods control type I error rates well (bottom panel of Tables 1 and 2).
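The confidence intervals quoted above are consistent with the usual normal approximation to a binomial proportion, nominal level ± 1.96 × sqrt(level × (1 − level) / number of replicates). The short check below reproduces them under that assumption; the function name is hypothetical.

```python
from math import sqrt


def type1_error_interval(alpha, n_replicates, z=1.96):
    """Normal-approximation 95% interval for an empirical type I error rate."""
    half_width = z * sqrt(alpha * (1.0 - alpha) / n_replicates)
    return alpha - half_width, alpha + half_width


for alpha in (0.05, 0.01, 0.001):
    low, high = type1_error_interval(alpha, 10_000)
    print(f"nominal level {alpha}: ({low:.4f}, {high:.4f})")
```

An empirical type I error rate that falls inside the corresponding interval is compatible with the nominal level at this number of replicates.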

Table 1. Type I error rates for testing the effects of GE interactions of rare variants in the presence of main effects (top panel) and in the absence of main effects (bottom panel) (n = 2000).

https://doi.org/10.1371/journal.pone.0229217.t001

Table 2. Type I error rates for testing the effects of GE interactions for both rare and common variants in the presence of main effects (top panel) and in the absence of main effects (bottom panel) (n = 2000).

https://doi.org/10.1371/journal.pone.0229217.t002

The results for testing the effects of GE interactions of rare variants with and without main effects are given in Figs 1 and 2, respectively. In both scenarios, the sample size is 2000 and there is no GE interaction of a common variant. We do not apply VW-TOW-GE here because it is designed for settings in which GE interaction effects exist for both common and rare variants. The top, middle, and bottom panels in Figs 1 and 2 provide results for three cases, i.e., when there are 2, 6 and 8 non-zero βij’s, respectively. The left and right panels of Figs 1 and 2 present two cases, i.e., 50% of the βij are positive and 100% of the βij are positive, respectively. For each plot, we vary c, the magnitude of the non-zero βij. As shown in the plots for the case when 50% of the βij are positive, TOW-GE is more powerful than the other three tests. For the case when 100% of the βij are positive, WSS is somewhat more powerful than TOW-GE because all the GE interactions have the same direction of effect, and TOW-GE is more powerful than the other two tests. However, WSS is very sensitive to the directions of effects because it aggregates GE interactions directly. Among the four tests (TOW-GE, ISKAT, WSS and CMC) in the two different cases, CMC is the least powerful test. CMC loses power because it gives GE interactions of common variants large weights, and thus GE interactions of common neutral variants introduce large noise.

Fig 1. Power comparisons of the four tests (TOW-GE, ISKAT, WSS and CMC) for testing GE interaction effects for rare variants on a continuous outcome when there are main effects (n = 2000 and the significance level of α = 0.05).

https://doi.org/10.1371/journal.pone.0229217.g001

Fig 2. Power comparisons of the four tests (TOW-GE, ISKAT, WSS and CMC) for testing GE interaction effects of rare variants on a continuous outcome when there are no main effects (n = 2000, significance level of α = 0.05).

https://doi.org/10.1371/journal.pone.0229217.g002

Power comparisons of the five tests (TOW-GE, VW-TOW-GE, ISKAT, WSS and CMC) for testing GE interaction effects for both rare and common variants are given in Fig 3. For each scenario in Fig 3, we vary c from 0.02 to 0.1 and set 50% of the βij as positive. In addition, we set the coefficient βc of the common variant by environment interaction to be positive, with a magnitude twice that of βij, the coefficient of a rare variant by environment interaction. From Fig 3, we can see that VW-TOW-GE is the most powerful test. CMC is the second most powerful test because it puts large weights on GE interactions of common variants and gains power when the GE interaction of a common variant plays an important causal role. WSS is the least powerful test; it loses power because it puts very small weight on the GE interaction of the common variant.

Fig 3. Power comparisons of the five tests (TOW-GE, ISKAT, WSS, CMC and VW-TOW-GE) for testing GE interaction effects for both rare and common variants on a continuous outcome (n = 2000 and the significance level of α = 0.05).

Left panel: With main effect; Right panel: With no main effect.

https://doi.org/10.1371/journal.pone.0229217.g003

TOW-GE, VW-TOW-GE, and ISKAT can all be considered quadratic statistics, which have reasonable power across a wide range of alternative hypotheses. The three methods are robust to the different directions of the GE interaction effects. We therefore perform a further comparison of the three methods, with the results shown in Fig 4. When there are causal effects of GE interactions for both common and rare variants, VW-TOW-GE outperforms TOW-GE and ISKAT. TOW-GE is more powerful than ISKAT except when the magnitude of the GE interactions is less than 0.04.

Fig 4. Power comparisons of the three quadratic tests (TOW-GE, ISKAT, and VW-TOW-GE) for testing GE interaction effects of both rare and common variants on a continuous outcome (n = 2000, significance level of α = 0.05).

Left panel: With main effect; Right panel: Without main effect.

https://doi.org/10.1371/journal.pone.0229217.g004

Real data analysis

Chronic obstructive pulmonary disease (COPD) is one of the most common lung diseases, characterized by long-term poor airflow, and is a major public health problem [13]. It is a complex disease influenced by genetic factors, environmental factors, and genotype-environment interactions. Cigarette smoking is known to be the major environmental determinant of COPD [14]. Several genes have been suggested to play a role in the presence of a gene-by-smoking interaction term. Specifically, Hersh et al. [15] reported that the 30-repeat allele of HMOX1 was associated with COPD in the presence of a gene-by-smoking (pack-years) interaction term. Sandford and Silverman [14] reported that the GSTM1 gene was associated with severe chronic bronchitis in heavy smokers and that an association of the TNF −308A allele with COPD was found in a Taiwanese population. Hersh et al. [15] reported that the SFTPB Thr131Ile polymorphism was associated with COPD, but only in the presence of a gene-by-environment interaction. The SNP rs2292566 in gene EPHX1 was associated with COPD only in the presence of a gene-by-smoking (pack-years) interaction. Celedon et al. [16] showed that two SNPs in the promoter region of TGFB1 (rs2241712 and rs1800469) and one SNP in exon 1 of TGFB1 (rs1982073) were significantly associated with COPD among smokers in a COPD case-control study.

The COPDGene Study is a multi-center genetic and epidemiologic investigation to study COPD [17]. Participants in the COPDGene Study gave consent for the use of data collected during the study in downstream analyses. This study is sufficiently large and appropriately designed for analysis of COPD. In this study, we consider more than 5,000 non-Hispanic White (NHW) participants who have completed a detailed protocol, including questionnaires, pre- and post-bronchodilator spirometry, high-resolution CT scanning of the chest, exercise capacity (assessed by six-minute walk distance), and blood samples for genotyping. The participants were genotyped using the Illumina OmniExpress platform. The genotype data have gone through standard quality-control procedures for genome-wide association analysis detailed at http://www.copdgene.org/sites/default/files/GWAS_QC_Methodology_20121115.pdf. We imputed the COPD genotype data using the EUR haplotypes from the 1000 Genomes Project as references.

Based on the COPD literature [18, 19], we selected 7 key quantitative COPD-related phenotypes, namely FEV1 (% predicted FEV1), Emphysema (Emph), Emphysema Distribution (EmphDist), Gas Trapping (GasTrap), Airway Wall Area (Pi10), Exacerbation frequency (ExacerFreq), and Six-minute walk distance (6MWD), and one qualitative phenotype (case-control disease status, denoted as COPD in the following tables). Three covariates (BMI, age, and sex) and one environmental factor (pack-years) were considered in our analysis. EmphDist is the ratio of emphysema at -950 HU in the upper 1/3 of lung fields to that in the lower 1/3 of lung fields; following [18], we applied a log transformation to EmphDist in the following analysis. In the analysis, participants with missing data in any of these phenotypes were excluded.

To evaluate the performance of our proposed methods on a real data set, we applied all 5 methods (TOW-GE, ISKAT, WSS, CMC, and VW-TOW-GE) to six COPD-associated genes (HMOX1, GSTM1, TGFB1, TNF, SFTPB, and EPHX1) to test for interactions with cigarette smoking. In the analysis, we removed extremely rare SNPs (MAF < 0.001) and individuals with missing values in any of the 7 phenotypes and 3 covariates. We considered three different scenarios: (1) main effect; (2) gene-by-smoking interaction with main effect; and (3) gene-by-smoking interaction without main effect. When we considered only the main effect, we used five existing methods (TOW, SKAT, WSS, CMC, and VW-TOW), which are specifically designed for testing the main effect of a gene. We used 10^4 permutations for our methods and 0.05 as the significance level.

The results for testing association between COPD and genes HMOX1 and GSTM1 are summarized in Tables 3 and 4, respectively. The results for testing association between COPD and genes TGFB1, TNF, SFTPB, and EPHX1 are summarized in S1–S4 Tables. At gene HMOX1, both TOW-GE and modified WSS verified significant GE interaction effects without main effect for two traits, Emph and Pi10. ISKAT and VW-TOW-GE verified significant GE interaction effects without main effect for trait Emph. At gene GSTM1, TOW-GE, VW-TOW-GE and ISKAT verified a GE interaction effect without main effect for trait EmphDist, while all other methods failed to do so. At gene TGFB1, TOW-GE, VW-TOW-GE and ISKAT verified a GE interaction effect without main effect for trait ExacerFreq (S1 Table). Gene TNF was only identified by the modified CMC method and the modified WSS method for gene-by-smoking interaction with main effect (S2 Table). Gene EPHX1 was only identified by the modified WSS method for gene-by-smoking interaction with main effect (S4 Table). Four genes with gene-by-smoking interaction effects (GSTM1, HMOX1, SFTPB, and TGFB1) were identified by our methods (S1 and S3 Tables, Tables 3 and 4).

Table 3. Summary results of association analysis for HMOX1 based on the COPD dataset.

The p-values are shown for testing the gene’s main effect (top panel), gene-by-smoking interaction with main effect (middle panel), gene-by-smoking interaction without main effect (bottom panel).

https://doi.org/10.1371/journal.pone.0229217.t003

Table 4. Summary results of association analysis for GSTM1 based on the COPD dataset.

The p-values are shown for testing the gene’s main effect (top panel), gene-by-smoking interaction with main effect (middle panel), gene-by-smoking interaction without main effect (bottom panel).

https://doi.org/10.1371/journal.pone.0229217.t004

Discussion

Recent evidence shows that gene-environment interactions of rare variants may play an important role in explaining the etiology of a complex disease. However, few methods can be employed to test the effects of GE interactions for rare variants. In this study, we propose two new methods for testing GE interactions for rare variants only or for both rare and common variants. We employ a generalized linear model to model the relationship between the trait and the GE interactions. Our model focuses on GE interactions by first adjusting for genetic main effects, environmental main effects, and possible covariates. The two methods are designed for different scenarios through specific weight-selection mechanisms. TOW-GE assigns the majority of the weight to rare variants by environment interactions. VW-TOW-GE balances common and rare variants by performing weight assignments separately for common variants by environment interactions and rare variants by environment interactions. Both methods achieve the best possible power with an adaptive weight selection procedure.

In the application, we tested genetic association for 7 traits of COPD. Our proposed methods verified the most significant GE interactions, especially for gene-by-smoking interactions without main effect, and performed the best compared to the other methods. In simulation studies, we also demonstrated that our proposed methods perform better in different scenarios, both with and without main effects. Our results show that the proposed methods TOW-GE and VW-TOW-GE demonstrate better power in most cases compared with competing methods.

The power of a test varies according to the number of GE interactions of rare or common variants, the effect directions of GE interactions, and the MAFs of variants. When a substantial proportion of GE interactions have opposite directions of effects, the quadratic statistics TOW-GE, VW-TOW-GE, and ISKAT are powerful. When effects of GE interactions of common variants play a primary role, CMC is more powerful than ISKAT and WSS and has similar power to VW-TOW-GE.

In our proposed method, the optimal weights of TOW-GE are derived analytically; thus the computational cost is relatively small. In addition, TOW-GE is flexible and allows prior biological information to be incorporated through the weights, such as weights derived from expression quantitative trait loci (eQTL), which may further improve the power of TOW-GE. TOW-GE also allows for adjustment of covariates. The covariates could be demographic variables, environmental variables, clinical variables, and/or principal components of genotype scores. The adjustment of covariates makes TOW-GE able not only to eliminate the effect of confounders but also to correct for possible population stratification in admixed populations. One possible advantage of TOW-GE compared to ISKAT is that TOW-GE utilizes the residuals of both the trait value and the GE interactions, each obtained by adjusting for covariates in a linear regression model, while ISKAT utilizes only the residual of the trait value.

The proposed test statistic TOW-GE does not have a known asymptotic distribution, and a permutation procedure is needed to estimate its p-value, which is time consuming compared to methods with asymptotic distributions. To save time when applying the proposed methods in genetic association studies, we can use the “step-up” procedure [20, 21] to determine the number of permutations. This procedure first screens for evidence of association using a small number of permutations (e.g., 1,000), and then uses a large number of permutations to test the selected potentially significant genes. Specifically, the computation time of p-value estimation of TOW-GE and VW-TOW-GE for a gene in the real data analysis was about 30 seconds using our R program on 6 Dell PowerEdge C6320 servers. Each server has two 2.4GHz Intel Xeon E5-2680 v4 fourteen-core processors and 600 MB average memory. We have uploaded the R program onto GitHub at https://github.com/Jianjun-CN/Single-GE.
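A two-stage version of the step-up idea can be sketched as follows. The sketch assumes a user-supplied function that returns the test statistic for one random permutation, a first stage of 1,000 permutations, a second stage that tops the total up to 10,000, and a screening cut-off of 0.1 on the first-stage p-value. The cut-off, the pooling of first-stage permutations into the final estimate, and all names are assumptions for illustration; they are not taken from [20, 21] or from the authors' R program.

```python
import random


def step_up_p_value(perm_stat, observed, first=1_000, full=10_000,
                    screen=0.1, seed=0):
    """Two-stage permutation p-value.

    perm_stat(rng) must return the test statistic for one random permutation.
    Returns (p_value, number_of_permutations_used).
    """
    rng = random.Random(seed)
    hits = sum(perm_stat(rng) >= observed for _ in range(first))
    p_first = hits / first
    if p_first > screen:
        # Clearly non-significant: stop after the cheap first stage.
        return p_first, first
    # Potentially significant: refine with additional permutations.
    hits += sum(perm_stat(rng) >= observed for _ in range(full - first))
    return hits / full, full


# Toy example with a stand-in null statistic drawn uniformly from [0, 1).
print(step_up_p_value(lambda rng: rng.random(), observed=0.5))
print(step_up_p_value(lambda rng: rng.random(), observed=0.999))
```

Genes that stop at the first stage cost one tenth of the permutations under these settings, so most of the computing time is spent on the genes that pass the screen.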

Acknowledgments

The high-performance computing infrastructure at the University of North Texas was used in obtaining the results presented in this publication.

References

  1. Wang X, Liu W, Sun C, Armenian SH, Hakonarson H, Hageman L, et al. Hyaluronan synthase 3 variant and anthracycline-related cardiomyopathy: a report from the children’s oncology group. Journal of Clinical Oncology. 2014;32(7):647–653. pmid:24470002
  2. Wang X, Sun C, Quiñones-Lombraña A, Singh P, Landier W, Hageman L, et al. CELF4 variant and anthracycline-related cardiomyopathy: a children’s oncology group genome-wide association study. Journal of Clinical Oncology. 2016;34(8):863–870. pmid:26811534
  3. Singh A, Babyak MA, Nolan DK, Brummett BH, Jiang R, Siegler IC, et al. Gene by stress genome-wide interaction analysis and path analysis identify EBF1 as a cardiovascular and metabolic risk gene. European Journal of Human Genetics. 2015;23(6):854–862. pmid:25271088
  4. Thomas D. Gene-environment-wide association studies: emerging approaches. Nature Reviews Genetics. 2010;11(4):259–272. pmid:20212493
  5. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. The American Journal of Human Genetics. 2011;89(1):82–93. pmid:21737059
  6. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. The American Journal of Human Genetics. 2008;83(3):311–321. pmid:18691683
  7. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genetics. 2009;5(2):e1000384. pmid:19214210
  8. Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genetic Epidemiology. 2012;36(6):561–571. pmid:22714994
  9. Lin X, Lee S, Christiani DC, Lin X. Test for interactions between a genetic marker set and environment in generalized linear models. Biostatistics. 2013;14(4):667–681. pmid:23462021
  10. Lin X, Lee S, Wu MC, Wang C, Chen H, Li Z, et al. Test for rare variants by environment interactions in sequencing association studies. Biometrics. 2016;72(1):156–164. pmid:26229047
  11. Yang X, Wang S, Zhang S, Sha Q. Detecting association of rare and common variants based on cross-validation prediction error. Genetic Epidemiology. 2017;41(3):233–243. pmid:28176359
  12. Sha Q, Zhang Z, Zhang S. An improved score test for genetic association studies. Genetic Epidemiology. 2011;35(5):350–359. pmid:21484862
  13. Murphy TF, Sethi S. Chronic obstructive pulmonary disease. Aging. 2002;19(10):761–775.
  14. Sandford AJ, Silverman EK. Chronic obstructive pulmonary disease 1: Susceptibility factors for COPD the genotype environment interaction. Thorax. 2002;57(8):736–741.
  15. Hersh CP, DeMeo DL, Lange C, Litonjua AA, Reilly JJ, Kwiatkowski D, et al. Attempted replication of reported chronic obstructive pulmonary disease candidate gene associations. American Journal of Respiratory Cell and Molecular Biology. 2005;33(1):71–78. pmid:15817713
  16. Celedon JC, Lange C, Raby BA, Litonjua AA, Palmer LJ, DeMeo DL, et al. The transforming growth factor-β1 (TGFB1) gene is associated with chronic obstructive pulmonary disease (COPD). Human Molecular Genetics. 2004;13(15):1649–1656. pmid:15175276
  17. Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD: Journal of Chronic Obstructive Pulmonary Disease. 2011;7(1):32–43.
  18. Chu JH, Hersh CP, Castaldi PJ, Cho MH, Raby BA, Laird N, et al. Analyzing networks of phenotypes in complex diseases: methodology and applications in COPD. BMC Systems Biology. 2014;8(1):78. pmid:24964944
  19. Han MK, Kazerooni EA, Lynch DA, Liu LX, Murray S, Curtis JL, et al. Chronic obstructive pulmonary disease exacerbations in the COPDGene study: associated radiologic phenotypes. Radiology. 2011;26(1):274–282.
  20. Barnett I, Mukherjee R, Lin X. The generalized higher criticism for testing SNP-set effects in genetic association studies. Journal of the American Statistical Association. 2017;112(517):64–76. pmid:28736464
  21. Pan W, Kim J, Zhang Y, Shen X, Wei P. A powerful and adaptive association test for rare variants. Genetics. 2014;197(4):1081–1095. pmid:24831820