
Kernel-based gene–environment interaction tests for rare variants with multiple quantitative phenotypes

  • Xiaoqin Jin,

    Roles Conceptualization, Formal analysis, Methodology, Software, Supervision, Validation, Writing – original draft

    Affiliation State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, Shaanxi, China

  • Gang Shi

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing

    gshi@xidian.edu.cn

    Affiliation State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, Shaanxi, China

Abstract

Previous studies have suggested that gene–environment interactions (GEIs) between a common variant and an environmental factor can influence multiple correlated phenotypes simultaneously, that is, GEI pleiotropy, and that analyzing multiple phenotypes jointly is more powerful than analyzing phenotypes separately by using single-phenotype GEI tests. Methods to test the GEI for rare variants with multiple phenotypes are, however, lacking. In our work, we model the correlation among the GEI effects of a variant on multiple quantitative phenotypes through four kernels and propose four multiphenotype GEI tests for rare variants, which are a test with a homogeneous kernel (Hom-GEI), a test with a heterogeneous kernel (Het-GEI), a test with a projection phenotype kernel (PPK-GEI) and a test with a linear phenotype kernel (LPK-GEI). Through numerical simulations, we show that correlation among phenotypes can enhance the statistical power except for LPK-GEI, which simply combines statistics from single-phenotype GEI tests and ignores the phenotypic correlations. Among almost all considered scenarios, Het-GEI and PPK-GEI are more powerful than Hom-GEI and LPK-GEI. We apply Het-GEI and PPK-GEI in the genome-wide GEI analysis of systolic blood pressure (SBP) and diastolic blood pressure (DBP) in the UK Biobank. We analyze 18,101 genes and find that LEUTX is associated with SBP and DBP (p = 2.20×10⁻⁶) through its interaction with hemoglobin. The single-phenotype GEI test and our multiphenotype GEI tests Het-GEI and PPK-GEI are also used to evaluate the gene–hemoglobin interactions for 22 genes that were previously reported to be associated with SBP or DBP in a meta-analysis of genetic main effects. MYO1C shows nominal significance (p < 0.05) by the Het-GEI test. NOS3 shows nominal significance in DBP and MYO1C in both SBP and DBP by the single-phenotype GEI test.

Introduction

Genome-wide association studies (GWASs) have identified numerous common variants associated with common diseases or phenotypes [1]. Nevertheless, only a small portion of the heritability can be explained by the discovered common variants [2, 3]. Sequencing studies showed that some of the “missing heritability” was attributable to rare variants [4, 5]. Complex diseases are usually influenced by genetic factors, environmental factors and the interplay between them. Wang et al. showed that the interactions between SMC5 variants and alcohol consumption are associated with fasting plasma lipid levels [6]. Yang et al. demonstrated that the interactions between PDE3B variants and smoking are associated with pulmonary function [7]. Johansson et al. revealed that the interactions between NFE2L2 variants and second-hand smoke are associated with pediatric asthma risk [8]. Gene–environment interactions (GEIs) have long been expected to explain some of the “missing heritability” and to shed light on the genetic etiology of complex diseases [9].

Existing studies suggest that the interaction between a common variant and an environmental factor may be associated with multiple correlated phenotypes, which is called GEI pleiotropy [10]. Kilpeläinen et al. identified four loci in or near CLASP1, LHX1, SNTA1 and CNTNAP2 that are associated with three blood lipid levels: low density lipoprotein, high density lipoprotein and triglycerides through their interactions with physical activity [11]. Novel gene-sleep interactions were also identified for known lipid loci, including LPL and PCSK9 [12]. To date, all the reported GEI pleiotropies are with common variants. From a methodological perspective, Majumdar et al. showed that statistical power to detect GEI effects can be improved by analyzing multiple phenotypes jointly [10]. However, multiphenotype methods for testing GEIs with rare variants are lacking.

To the best of our knowledge, there is only one method currently available for testing GEIs with rare variants and multiple phenotypes [13]. The method consists of three steps: (i) remove the correlation among multiple phenotypes by using principal component analysis or another linear transformation; (ii) obtain a p-value for each transformed phenotype by testing the effects of an optimally weighted combination of GEIs for rare variants (TOW-GE) [14]; and (iii) employ Fisher’s combination test (FCT) to combine the p-values of the multiple phenotypes. We denote the method as TOWGE-FCT in this paper. The degrees of freedom of TOWGE-FCT grow with the number of phenotypes, which might limit the statistical power of the test.
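To make the degrees-of-freedom remark concrete, the following is a minimal sketch of Fisher's combination of K independent p-values, whose statistic is chi-square with 2K degrees of freedom under the null. It is not the TOWGE-FCT implementation (which obtains the per-phenotype p-values by permutation); the function name `fisher_combination` is hypothetical.

```python
import math

def fisher_combination(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, degrees of freedom, combined p-value). The statistic
    -2 * sum(log p) is chi-square with 2K degrees of freedom under the null,
    where K = len(p_values).
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Closed-form chi-square survival function for even df = 2K:
    # P(X > stat) = exp(-half) * sum_{i=0}^{K-1} half**i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, 2 * k, math.exp(-half) * total
```

With a single informative p-value of 0.001 and the remaining phenotypes at p = 0.5, the combined p-value grows as phenotypes are added, which illustrates how extra degrees of freedom can dilute a signal carried by only one phenotype.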

In this work, we model the correlations among the GEI effects of a variant on multiple phenotypes by assuming four different kernel matrices, similar to those for multiphenotype tests of genetic main effects [15]. We extend the single-phenotype GEI test [16] and propose four multiphenotype GEI tests for rare variants, which are the test with homogeneous kernel (Hom-GEI), the test with heterogeneous kernel (Het-GEI), the test with projection phenotype kernel (PPK-GEI) and the test with linear phenotype kernel (LPK-GEI). We conduct simulation studies to examine the empirical distributions of the four test statistics under the null hypothesis and compare their statistical power under different scenarios. In the analysis of systolic blood pressure (SBP) and diastolic blood pressure (DBP) in the UK Biobank, we chose hemoglobin (Hb) as the environmental variable, which is known to be associated with both SBP and DBP [17, 18]. With the whole-exome sequencing data in 200,643 samples, we applied Het-GEI and PPK-GEI in the genome-wide analyses of gene-Hb interactions. We also carried out single-phenotype and multiphenotype GEI tests to evaluate the gene-Hb interactions for 22 genes that were previously reported to be associated with SBP or DBP in a meta-analysis of main genetic effects [19].

Methods

Single-phenotype GEI test

Assume that $n$ unrelated individuals are sequenced in a gene or region with $m$ rare variants and that $K$ quantitative phenotypes are measured. For the $k$-th phenotype, $y_k = (y_{1k}, y_{2k}, \cdots, y_{nk})^T$ denotes an $n \times 1$ phenotype vector, and $X = (X_1, X_2, \cdots, X_{q+1})$ is an $n \times (q+1)$ matrix comprising the intercept and covariate vectors, with $X_t = (X_{1t}, X_{2t}, \cdots, X_{nt})^T$, $t = 1, 2, \ldots, q+1$. The first vector $X_1$ is the intercept, with elements $X_{i1} = 1$ for $i = 1, 2, \cdots, n$; the other $q$ vectors are the covariates. Let $G = (G_1, G_2, \cdots, G_m)$ be an $n \times m$ genotype matrix, in which $G_j = (G_{1j}, G_{2j}, \cdots, G_{nj})^T$, $j = 1, 2, \ldots, m$, and $G_{ij}$ is the number of minor alleles. $E = \mathrm{diag}\{E_i\}$ denotes an $n \times n$ diagonal matrix of environmental measurements; $E_i$ is centered and also included in $X$ as a covariate to adjust for the environmental main effect. Following the single-phenotype GEI test for rare variants in rareGE [16], we consider the linear mixed model

$$y_k = X\alpha_k + GW\beta_k + EGW\gamma_k + \varepsilon_k, \tag{1}$$

where $\alpha_k = (\alpha_{k1}, \alpha_{k2}, \cdots, \alpha_{k(q+1)})^T$ is a $(q+1) \times 1$ vector of covariate effects for the $k$-th phenotype and $W = \mathrm{diag}\{w_j\}$ is an $m \times m$ weight matrix for the $m$ variants. The weight of the $j$-th variant is $w_j = \mathrm{Beta}(\mathrm{MAF}_j, 1, 25)$ [20], where $\mathrm{MAF}_j$ is the minor allele frequency (MAF) of the $j$-th variant. In addition, $\beta_k = (\beta_{1k}, \beta_{2k}, \cdots, \beta_{mk})^T$ is an $m \times 1$ vector of genetic main effects for the $k$-th phenotype, and $\gamma_k = (\gamma_{1k}, \gamma_{2k}, \cdots, \gamma_{mk})^T$ is an $m \times 1$ vector of interaction effects. The main genetic effects $\beta_k$ are assumed to be fixed and the interaction effects $\gamma_k$ to be random, $\gamma_k \sim \mathrm{MVN}(0, \sigma^2 I_m)$. Finally, $\varepsilon_k = (\varepsilon_{1k}, \varepsilon_{2k}, \cdots, \varepsilon_{nk})^T$ denotes an $n \times 1$ error vector, and $\varepsilon_k \sim \mathrm{MVN}(0, \sigma_k^2 I_n)$. The null hypothesis for testing the GEI effects is $H_0: \sigma^2 = 0$. The model under the null hypothesis is

$$y_k = X\alpha_k + GW\beta_k + \varepsilon_k. \tag{2}$$

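Here, $\alpha_k$, $\beta_k$ and $\sigma_k^2$ can be estimated by linear regression, and the estimated mean and variance-covariance matrix of $y_k$ are $\hat{\mu}_k = X\hat{\alpha}_k + GW\hat{\beta}_k$ and $\hat{V}_k = \hat{\sigma}_k^2 I_n$, where $(\hat{\alpha}_k^T, \hat{\beta}_k^T)^T = (Z^TZ)^{-1}Z^Ty_k$, with $Z = (X, GW)$. The score statistic for testing the GEI effects is

$$Q_k = (y_k - \hat{\mu}_k)^T \hat{V}_k^{-1} EGWWG^TE \, \hat{V}_k^{-1} (y_k - \hat{\mu}_k), \tag{3}$$

which is mathematically equivalent to

$$Q_k = \sum_{j=1}^{m} S_{jk}^2. \tag{4}$$

Here, $S_{jk} = w_j G_j^T E (y_k - \hat{\mu}_k) / \hat{\sigma}_k^2$ is the score statistic for the $j$-th variant.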

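Under $H_0$, $Q_k$ follows a mixture of chi-square distributions, $\sum_j \lambda_j \chi^2_{1,j}$, where the $\chi^2_{1,j}$ are independent chi-square variables with 1 degree of freedom and the $\lambda_j$ are the nonzero eigenvalues of the regional relationship matrix

$$\frac{1}{\hat{\sigma}_k^2} \, P \, EGWWG^TE \, P, \tag{5}$$

with $P = I_n - Z(Z^TZ)^{-1}Z^T$.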

The p-value can be computed by using Kuonen’s saddlepoint approximation method [16, 21]. In the same spirit, we extend the single-phenotype GEI test to multiple phenotypes.
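As an illustration of how a tail probability of a weighted sum of 1-degree-of-freedom chi-square variables can be approximated, here is a minimal sketch of a saddlepoint approximation in the Lugannani–Rice form. It is a sketch under assumptions, not the rareGE code: the function name `saddlepoint_pvalue` is hypothetical, all weights are assumed positive, and the root is found by plain bisection.

```python
import math

def saddlepoint_pvalue(q, lambdas):
    """Approximate P(sum_j lambda_j * X_j > q) for independent X_j ~ chi-square(1).

    Assumes positive weights, as for the nonzero eigenvalues of a positive
    semidefinite matrix.
    """
    lam = [l for l in lambdas if l > 0.0]
    if q <= 0.0:
        return 1.0

    def k0(z):   # cumulant generating function of the mixture
        return -0.5 * sum(math.log(1.0 - 2.0 * z * l) for l in lam)

    def k1(z):   # first derivative (equals the mean at z = 0)
        return sum(l / (1.0 - 2.0 * z * l) for l in lam)

    def k2(z):   # second derivative
        return 2.0 * sum((l / (1.0 - 2.0 * z * l)) ** 2 for l in lam)

    # Solve k1(z) = q by bisection; k1 increases on (-inf, 1 / (2 * max lambda)).
    hi = (1.0 - 1e-12) / (2.0 * max(lam))
    lo = -1.0
    while k1(lo) > q:
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k1(mid) < q:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)

    w_sq = 2.0 * (z * q - k0(z))
    if w_sq < 1e-12:
        return float("nan")  # formula is singular when q equals the mean

    w = math.copysign(math.sqrt(w_sq), z)
    v = z * math.sqrt(k2(z))
    # Upper tail of the standard normal evaluated at w + log(v / w) / w.
    return 0.5 * math.erfc((w + math.log(v / w) / w) / math.sqrt(2.0))
```

For a single unit weight the mixture is an ordinary chi-square with 1 degree of freedom, so `saddlepoint_pvalue(3.841, [1.0])` should land close to 0.05; a production implementation would also need a fallback near the mean, where the formula is singular.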

Kernel-based multiphenotype GEI tests

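Denote $y = (y_1, y_2, \cdots, y_K)$ as the $n \times K$ matrix of $K$ phenotypes and $A = (\alpha_1, \alpha_2, \cdots, \alpha_K)$ as the $(q+1) \times K$ matrix of covariate effects. Let $B = (\beta_1, \beta_2, \cdots, \beta_K)$ be the $m \times K$ matrix of genetic main effects and $\Gamma = (\gamma_1, \gamma_2, \cdots, \gamma_K)$ be the $m \times K$ matrix of GEI effects. In addition, $\varepsilon = (\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_K)$ is the $n \times K$ error matrix. In light of the correlation among phenotypes, we assume that the rows of $\varepsilon$ are independent, with $(\varepsilon_{i1}, \varepsilon_{i2}, \cdots, \varepsilon_{iK})^T \sim \mathrm{MVN}(0, \Sigma)$, $i = 1, 2, \cdots, n$. Then, the mixed model for multiple phenotypes can be formulated in matrix form as

$$y = XA + GWB + EGW\Gamma + \varepsilon. \tag{6}$$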

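Stack the columns of the phenotype matrix $y$ into a vector $\mathrm{vec}(y) = (y_1^T, y_2^T, \cdots, y_K^T)^T$ and the columns of the error matrix $\varepsilon$ into $\mathrm{vec}(\varepsilon) = (\varepsilon_1^T, \varepsilon_2^T, \cdots, \varepsilon_K^T)^T$. We have $\mathrm{vec}(\varepsilon) \sim \mathrm{MVN}(0, \Sigma \otimes I_n)$, where $\otimes$ is the Kronecker product [22]. We rewrite model (6) in vector form as

$$\mathrm{vec}(y) = (I_K \otimes X)\,\mathrm{vec}(A) + (I_K \otimes GW)\,\mathrm{vec}(B) + (I_K \otimes EGW)\,\mathrm{vec}(\Gamma) + \mathrm{vec}(\varepsilon). \tag{7}$$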

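Assume $\mathrm{vec}(\Gamma) \sim \mathrm{MVN}(0, \sigma^2 \Sigma_P \otimes I_m)$, where $\Sigma_P$ is a $K \times K$ kernel in the phenotype space that models the correlation among the GEI effects of a variant on multiple phenotypes. As a result, $\mathrm{vec}(y) \sim \mathrm{MVN}(\mathrm{vec}(\mu), H)$, where $\mu = XA + GWB$ and $H = \sigma^2 (\Sigma_P \otimes EGWWG^TE) + \Sigma \otimes I_n$. The null hypothesis for testing the GEI effects with multiple phenotypes is $H_0: \sigma^2 = 0$, and the score statistic is

$$Q = (\mathrm{vec}(y) - \mathrm{vec}(\hat{\mu}))^T \hat{V}^{-1} (\Sigma_P \otimes EGWWG^TE) \hat{V}^{-1} (\mathrm{vec}(y) - \mathrm{vec}(\hat{\mu})), \tag{8}$$

where $\mathrm{vec}(\hat{\mu}) = \mathrm{vec}(X\hat{A} + GW\hat{B})$ and $\hat{V} = \hat{\Sigma} \otimes I_n$ are the estimated mean and variance-covariance matrix, respectively.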

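The score statistic $Q$ asymptotically follows a mixture of chi-square distributions with 1 degree of freedom, $\sum_j \lambda_j \chi^2_{1,j}$, where the $\lambda_j$ are the nonzero eigenvalues of

$$(\hat{\Sigma}^{-1/2} \Sigma_P \hat{\Sigma}^{-1/2}) \otimes (P \, EGWWG^TE \, P), \tag{9}$$

where $P = I_n - Z(Z^TZ)^{-1}Z^T$ and $Z$ is the same as in the single-phenotype GEI test. The corresponding p-values can be computed via Kuonen’s saddlepoint method [21].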
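The multiphenotype score statistic is a quadratic form in the stacked null-model residuals, with the phenotype kernel and the GEI kernel EGWWGᵀE combined through a Kronecker product. The sketch below, using NumPy, computes it two ways: with the explicit nK × nK Kronecker matrices and with an equivalent K × K trace expression that avoids forming them. All function names are hypothetical, and estimating the residual covariance with divisor n − rank(Z) is an assumption rather than something the text specifies.

```python
import numpy as np

def null_residuals(Y, X, G, w):
    """Residuals of every phenotype after regressing on Z = (X, GW)."""
    Z = np.hstack([X, G * w])                   # G * w scales column j by w_j
    P = np.eye(len(Y)) - Z @ np.linalg.pinv(Z)  # projects out the columns of Z
    R = P @ Y                                   # n x K residual matrix
    dof = len(Y) - np.linalg.matrix_rank(Z)     # assumed divisor
    return R, R.T @ R / dof                     # residuals, K x K covariance

def gei_kernel(G, w, e):
    """n x n kernel E G W W G^T E for environment values e (length n)."""
    EGW = e[:, None] * (G * w)
    return EGW @ EGW.T

def score_stat_kron(R, kern, sigma_hat, sigma_p):
    """Quadratic form with explicit Kronecker products (small n only)."""
    n = R.shape[0]
    r = R.flatten(order="F")                    # stack columns: vec(R)
    v_inv = np.kron(np.linalg.inv(sigma_hat), np.eye(n))
    return float(r @ v_inv @ np.kron(sigma_p, kern) @ v_inv @ r)

def score_stat_trace(R, kern, sigma_hat, sigma_p):
    """Same value via tr(S^-1 Sigma_P S^-1 R^T K R), using K x K products."""
    s_inv = np.linalg.inv(sigma_hat)
    return float(np.trace(s_inv @ sigma_p @ s_inv @ R.T @ kern @ R))
```

The trace form is the one that scales: with thousands of individuals the Kronecker matrices would not fit in memory, whereas RᵀKR is only K × K.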

As can be seen in (8), our proposed tests depend on the kernel matrix ΣP. Similar to [15], we use four types of kernel matrices to model the correlation among the GEI effects on multiple phenotypes.

Homogeneous kernel.

Assume that the GEI effects of a variant on multiple different phenotypes are homogeneous, implying that $\gamma_{j1} = \gamma_{j2} = \cdots = \gamma_{jK}$. The kernel is constructed as $\Sigma_{Hom} = 1_K 1_K^T$, where $1_K = (1, 1, \cdots, 1)^T$ is a $K \times 1$ vector. $\Sigma_{Hom}$ indicates that the GEI effects of a variant on multiple phenotypes are the same.

Heterogeneous kernel.

Assuming that the GEI effect sizes of a variant on multiple phenotypes are heterogeneous, the kernel is $\Sigma_{Het} = I_K$, where $I_K$ is a $K \times K$ identity matrix. Here, $\Sigma_{Het}$ implies that the GEI effects of a variant on multiple phenotypes are independent.

Projection phenotype kernel.

Assume that the correlation among the GEI effects of a variant on multiple phenotypes can be depicted by the correlation among the phenotypes. That is, $\Sigma_{PPK} = \hat{\Sigma}$, where $\hat{\Sigma}$ is the estimated variance-covariance matrix of the phenotypes.

Linear phenotype kernel.

Assume that the covariance among the GEI effects of a variant on multiple phenotypes equals the square of the estimated variance-covariance matrix of the phenotypes. That is, $\Sigma_{LPK} = \hat{\Sigma}^2$.

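Similar to the proof in [22], with this kernel the score statistic (8) becomes

$$Q_{LPK} = (\mathrm{vec}(y) - \mathrm{vec}(\hat{\mu}))^T (I_K \otimes EGWWG^TE) (\mathrm{vec}(y) - \mathrm{vec}(\hat{\mu})), \tag{10}$$

which can be rewritten as

$$Q_{LPK} = \sum_{k=1}^{K} (y_k - \hat{\mu}_k)^T EGWWG^TE \, (y_k - \hat{\mu}_k). \tag{11}$$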

Therefore, the LPK-GEI test simply combines statistics of single-phenotype GEI tests across multiple phenotypes.

Based on different choices of the kernel matrix ΣP, we propose four multiphenotype GEI tests, which are named Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI.
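The four kernel choices can be compared side by side in a short NumPy sketch. It assumes the kernels take the forms 1ₖ1ₖᵀ (homogeneous), Iₖ (heterogeneous), Σ̂ (projection phenotype) and Σ̂² (linear phenotype), and it evaluates the score statistic through the trace identity tr(Σ̂⁻¹ Σ_P Σ̂⁻¹ Rᵀ K R), where R holds the null-model residuals and K is the n × n GEI kernel. The function names are hypothetical.

```python
import numpy as np

def phenotype_kernels(sigma_hat):
    """Return the four K x K phenotype kernels keyed by test name."""
    k = sigma_hat.shape[0]
    ones = np.ones((k, 1))
    return {
        "Hom-GEI": ones @ ones.T,          # identical effects across phenotypes
        "Het-GEI": np.eye(k),              # independent effects
        "PPK-GEI": sigma_hat,              # effects track phenotype covariance
        "LPK-GEI": sigma_hat @ sigma_hat,  # squared phenotype covariance
    }

def score_stat(R, kern, sigma_hat, sigma_p):
    """Score statistic for residuals R (n x K) and GEI kernel kern (n x n)."""
    s_inv = np.linalg.inv(sigma_hat)
    return float(np.trace(s_inv @ sigma_p @ s_inv @ R.T @ kern @ R))

def all_tests(R, kern, sigma_hat):
    """Evaluate the statistic under each of the four kernels."""
    return {name: score_stat(R, kern, sigma_hat, sp)
            for name, sp in phenotype_kernels(sigma_hat).items()}
```

Under the linear phenotype kernel the two factors of Σ̂⁻¹ cancel against Σ̂², so the statistic reduces to the sum over phenotypes of rₖᵀ K rₖ, which is why LPK-GEI carries no information about the correlation between phenotypes.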

Results

Numerical simulations

To evaluate the null distributions and statistical power of the four proposed tests, we carried out extensive simulation studies. Using the calibrated coalescent model implemented in COSI [23], we generated 10,000 haplotypes in a 200 kb genomic region. Parameters in the coalescent model were used to mimic the linkage disequilibrium pattern, local recombination rate and demographic history for the population of European ancestry. We randomly paired these haplotypes to form diploid genotype data of 10,000 individuals and randomly selected 5000 out of the 10,000 individuals. A subregion length of 3 kb was randomly selected from the 200 kb region to obtain the genotype data of the 5000 samples for each replicate, and 1000 replicates of genotype data were generated. Variants with MAF ≤ 0.01 were considered to be rare and used for simulations.

To evaluate null distributions of our proposed tests, four phenotypes of the 5000 unrelated individuals under the null hypothesis were generated. For the sake of simplicity, the phenotypes shared the same covariate sets and were generated according to model (12), where $y_{i1}, y_{i2}, y_{i3}, y_{i4}$ are the four phenotypes for the $i$-th individual ($i = 1, 2, \ldots, n$). For the $i$-th individual, $sex_i$ is a binary covariate following a Bernoulli distribution with probability 0.5, namely, $sex_i \sim \mathrm{Bernoulli}(0.5)$. Both $age_i$ and $bmi_i$ are continuous covariates: $age_i \sim N(50, 25)$, $bmi_i \sim N(50, 25)$. $G_{ij}$ ($j = 1, 2, \cdots, m$) are the coded genotypes of the simulated causal variants for individual $i$. Here, we assumed a proportion of causal variants $\theta = 0.1, 0.2, 0.3$. In addition, $w_j$ ($j = 1, 2, \cdots, m$) is the weight of variant $j$; $\beta_{j1}$, $\beta_{j2}$, $\beta_{j3}$ and $\beta_{j4}$ are the main genetic effects for phenotype 1, phenotype 2, phenotype 3 and phenotype 4, respectively, with $\beta_{j1} = 0.1$, $\beta_{j2} = 0.2$, $\beta_{j3} = 0.1$ and $\beta_{j4} = 0.2$; and $l_1$, $l_2$, $l_3$ and $l_4$ are indicator variables, with $l_k = 1$ when phenotype $k$ is associated with genetic variants and $l_k = 0$ otherwise. Since not all of the phenotypes may be associated with the rare variants [22, 24], we considered scenarios in which pleiotropy exists or does not exist. Specifically, we assumed that the first $t$ phenotypes were associated with the rare variants, namely, $l_1 = \cdots = l_t = 1$ and $l_{t+1} = \cdots = l_4 = 0$, $t = 1, 2, 3, 4$. $\varepsilon_{ik}$ is a random error for the $i$-th individual and the $k$-th phenotype, $\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \varepsilon_{i3}, \varepsilon_{i4})^T \sim \mathrm{MVN}(0, \Sigma)$, where $\Sigma$ is the $4 \times 4$ error covariance matrix and $\rho$ represents the correlation among different phenotypes. Three levels of correlation strength were considered: weak correlation with $\rho = 0.25$, moderate correlation with $\rho = 0.5$ and strong correlation with $\rho = 0.75$. For each simulation setup, 20 replicates of phenotypes and covariates were simulated based on one genotype dataset; thus, a total of 20,000 replicates of phenotypes and covariates were simulated.
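The covariate and error draws described above can be sketched with NumPy as follows. Two readings are assumptions and are flagged in the comments: N(50, 25) is taken to mean variance 25 (standard deviation 5), and Σ is taken to be exchangeable with unit variances and a common off-diagonal correlation ρ. The function name `simulate_null_inputs` is hypothetical, and the regression coefficients of the phenotype model are deliberately left out because they are not stated here.

```python
import numpy as np

def simulate_null_inputs(n, rho, n_pheno=4, seed=0):
    """Draw covariates and correlated errors for n individuals."""
    rng = np.random.default_rng(seed)
    sex = rng.binomial(1, 0.5, size=n)       # Bernoulli(0.5)
    age = rng.normal(50.0, 5.0, size=n)      # assumption: N(50, 25) has variance 25
    bmi = rng.normal(50.0, 5.0, size=n)      # assumption: same reading as age
    # Assumption: exchangeable covariance, unit variances, correlation rho.
    sigma = np.full((n_pheno, n_pheno), rho)
    np.fill_diagonal(sigma, 1.0)
    eps = rng.multivariate_normal(np.zeros(n_pheno), sigma, size=n)
    return sex, age, bmi, eps
```

With n = 5000 the sample correlation between any two error columns should sit close to the chosen ρ, which is a quick sanity check before building phenotypes on top of the draws.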

To evaluate the statistical power of our proposed four multiphenotype GEI tests, we simulated the four correlated phenotypes for 5000 independent individuals under the alternative hypothesis. For each of the genotype datasets, one set of phenotypes and covariates was simulated according to model (13), where $sex_i$, $age_i$, $bmi_i$, $G_{ij}$, $w_j$, $\beta_{jk}$ ($k = 1, 2, 3, 4$), $l_k$ ($k = 1, 2, 3, 4$) and $\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \varepsilon_{i3}, \varepsilon_{i4})^T$ are the same as described in model (12). The body mass index (BMI) was centered and used as the environmental variable $E_i$. Here, $\gamma_{jk}$ is the gene–BMI interaction effect of the $j$-th causal rare variant on the $k$-th phenotype, with $\gamma_{jk} \sim N(0, 0.05^2)$. Since the interaction effects of a variant on each phenotype were simulated independently, the gene–BMI interaction effects of a variant on multiple phenotypes are heterogeneous.

In all simulations and the analyses of the simulated data, variant weights were given by the density function of the beta distribution with shape parameters 1 and 25, evaluated at the MAF of each rare variant [20], as described in the single-phenotype GEI test. We considered the gene–BMI interaction to be significant if its p-value was less than 2.5×10⁻⁶, corresponding to a correction for multiple testing in genome-wide studies of 20,000 genes. Empirical power was the proportion of significant results in 1000 replicates.
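The weights, the significance threshold and the empirical power are all simple to compute; the following sketch shows one way to do so with the Python standard library. The function names are hypothetical, and the threshold assumes a Bonferroni-style correction of a 0.05 level over 20,000 genes, which reproduces the 2.5×10⁻⁶ value.

```python
import math

def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at the minor allele frequency."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1.0) * (1.0 - maf) ** (b - 1.0)

def gene_level_threshold(n_genes=20000, alpha=0.05):
    """Per-gene significance threshold after correcting for n_genes tests."""
    return alpha / n_genes

def empirical_power(p_values, threshold):
    """Fraction of replicates whose p-value falls below the threshold."""
    return sum(p < threshold for p in p_values) / len(p_values)
```

With shape parameters 1 and 25 the density simplifies to 25(1 − MAF)²⁴, so rarer variants receive larger weights: a variant with MAF 0.001 is weighted more heavily than one with MAF 0.01.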

Null distributions.

We examined null distributions of test statistics for Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI with causal rare variant proportion θ = 0.2 and all phenotypes associated with genetic variants. We first estimated means and residuals by performing phenotype-specific regression analyses. The variance-covariance matrix Σ was estimated by residuals from all phenotypes. Test statistics of the four tests were computed as in (8) with corresponding kernels for Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI. Using Kuonen’s saddlepoint approximation method [21], the p-values of the four test statistics were computed. Finally, we compared distributions of empirical p-values with the expected uniform distribution between 0 and 1.
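The comparison of empirical p-values against the uniform distribution is what a Q–Q plot on the −log10 scale displays. A minimal sketch of the plotted coordinates follows; the plotting position (i − 0.5)/n for the expected quantiles is an assumption, since other conventions such as i/(n + 1) are also common, and the function name `qq_points` is hypothetical.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p-values for a Q-Q plot."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected -log10 quantiles of Uniform(0, 1) at positions (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))
```

Points lying along the diagonal indicate that the test is well calibrated under the null; systematic departure above the diagonal would indicate inflation.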

The quantile-quantile (Q–Q) plots of the four multiphenotype statistics under weak (ρ = 0.25), moderate (ρ = 0.5) and strong (ρ = 0.75) correlations among phenotypes are shown in Figs 1–3, respectively. The empirical distributions of the four test statistics are aligned with their theoretical distributions, as expected.

Fig 1. Q–Q plots of the test statistics under the null hypothesis with weak among-phenotype correlation ρ = 0.25.

The horizontal and vertical axes represent the negative log10 of the expected p-values and the negative log10 of the observed p-values, respectively. A: Hom-GEI; B: Het-GEI; C: PPK-GEI; D: LPK-GEI.

https://doi.org/10.1371/journal.pone.0275929.g001

Fig 2. Q–Q plots of the test statistics under the null hypothesis with moderate among-phenotype correlation ρ = 0.5.

The horizontal and vertical axes represent the negative log10 of the expected p-values and the negative log10 of the observed p-values, respectively. A: Hom-GEI; B: Het-GEI; C: PPK-GEI; D: LPK-GEI.

https://doi.org/10.1371/journal.pone.0275929.g002

Fig 3. Q–Q plots of the test statistics under the null hypothesis with strong among-phenotype correlation ρ = 0.75.

The horizontal and vertical axes represent the negative log10 of the expected p-values and the negative log10 of the observed p-values, respectively. A: Hom-GEI; B: Het-GEI; C: PPK-GEI; D: LPK-GEI.

https://doi.org/10.1371/journal.pone.0275929.g003

Statistical power.

The statistical power of Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI under weak, moderate and strong correlations among phenotypes is shown in Figs 4–6, respectively. Each figure presents the power for three different proportions of causal variants and different numbers of associated phenotypes. All four tests show improved power as the proportion of causal variants increases. This is because an increased proportion of causal variants leads to larger interaction effects under the test. Taking Fig 5D as an example, for the causal rare variant proportion θ = 0.1, Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI have powers of 0.067, 0.424, 0.431 and 0.199, respectively. For θ = 0.2, the corresponding powers are 0.145, 0.72, 0.725 and 0.428, respectively. For θ = 0.3, the power increases further to 0.278, 0.867, 0.869 and 0.608.

Fig 4. Statistical power of Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI under weak among-phenotype correlation ρ = 0.25.

The horizontal and vertical axes represent the proportion of causal rare variants and the statistical power, respectively. A: Power when only the first phenotype is associated with interactions; B: Power when the first two phenotypes are associated with interactions; C: Power when the first three phenotypes are associated with interactions; D: Power when all four phenotypes are associated with interactions.

https://doi.org/10.1371/journal.pone.0275929.g004

Fig 5. Statistical power of Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI under moderate among-phenotype correlation ρ = 0.5.

The horizontal and vertical axes represent the proportion of causal rare variants and the statistical power, respectively. A: Power when only the first phenotype is associated with the interactions; B: Power when the first two phenotypes are associated with the interactions; C: Power when the first three phenotypes are associated with the interactions; D: Power when all four phenotypes are associated with the interactions.

https://doi.org/10.1371/journal.pone.0275929.g005

Fig 6. Statistical power of Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI under strong among-phenotype correlation ρ = 0.75.

The horizontal and vertical axes represent the proportion of causal rare variants and the statistical power, respectively. A: Power when only the first phenotype is associated with the interactions; B: Power when the first two phenotypes are associated with the interactions; C: Power when the first three phenotypes are associated with the interactions; D: Power when all four phenotypes are associated with the interactions.

https://doi.org/10.1371/journal.pone.0275929.g006

From Figs 4–6, we can see that the four tests provide improved power with more phenotypes associated with the interactions, which suggests that our multiphenotype analyses can exploit GEI pleiotropy effectively. For instance, in Fig 6, with the causal proportion θ = 0.2, when only the first phenotype is associated with the interactions, the powers of Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI are 0.013, 0.206, 0.292 and 0.045, respectively, as shown in Fig 6A. When the first two phenotypes are associated with the interactions, the corresponding powers improve to 0.092, 0.607, 0.629 and 0.152, as shown in Fig 6B. With three phenotypes associated with the interactions, the powers become even larger, at 0.173, 0.766, 0.767 and 0.252, as shown in Fig 6C. When all phenotypes are associated with the interactions, the powers further increase to 0.236, 0.827, 0.828 and 0.345, as shown in Fig 6D.

We can also see from Figs 4–6 that Hom-GEI, Het-GEI and PPK-GEI have enhanced power as the correlation among phenotypes becomes stronger, but LPK-GEI suffers power loss. For instance, we observe the power of the four tests with causal proportion θ = 0.2. When the correlation among phenotypes is weak, the power values of Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI are 0.105, 0.508, 0.523 and 0.370, respectively, in Fig 4C. When the correlation is moderate, the power values are 0.104, 0.597, 0.598 and 0.311, as shown in Fig 5C. With a strong correlation, the power values are 0.173, 0.766, 0.767 and 0.252, as shown in Fig 6C. This demonstrates that Hom-GEI, Het-GEI and PPK-GEI can benefit from the increased correlations among phenotypes. However, since LPK-GEI directly combines statistics from single-phenotype GEI tests and ignores the phenotypic correlations, the increased correlation among phenotypes leads to a substantial power loss.

Among almost all of the considered scenarios in Figs 4–6, PPK-GEI has approximately the same or slightly larger power than Het-GEI, and the two tests outperform Hom-GEI and LPK-GEI. Hom-GEI shows the poorest power performance among all the proposed tests. This is because the phenotypes were simulated based on heterogeneous interaction effects, violating the assumption that Hom-GEI is based upon. Because the GEI effects of a variant on multiple phenotypes can hardly be homogeneous in reality, Hom-GEI may not be a good choice for real data analysis. Therefore, we choose Het-GEI and PPK-GEI for our genome-wide interaction analysis in UK Biobank.

With the proportion of causal rare variants θ = 0.1 and the among-phenotype correlation ρ = 0.5, we compared the power of our tests with that of TOWGE-FCT under different numbers of phenotypes associated with the interactions; the results are shown in Fig 7. Because TOWGE-FCT is a permutation-based method, which is computationally very expensive, the power was evaluated at the significance level of 0.05. Fig 7 shows that all five tests provide enhanced power as more phenotypes are associated with the interactions. Het-GEI, PPK-GEI and LPK-GEI have higher power than TOWGE-FCT, whereas Hom-GEI has lower power than TOWGE-FCT. We can also see that Het-GEI and PPK-GEI outperform the other tests, further indicating that they are two powerful tests. For instance, when only the first two phenotypes are associated with the interactions, the power values for Hom-GEI, Het-GEI, PPK-GEI, LPK-GEI and TOWGE-FCT are 0.177, 0.507, 0.517, 0.403 and 0.304, respectively.

Fig 7. Statistical power of Hom-GEI, Het-GEI, PPK-GEI, LPK-GEI and TOWGE-FCT with different correlated phenotypes, ρ = 0.5, θ = 0.1.

The horizontal and vertical axes represent the number of phenotypes associated with interactions and the statistical power, respectively. The significance level is 0.05.

https://doi.org/10.1371/journal.pone.0275929.g007

Gene-Hb interaction analysis of blood pressure phenotypes in UK Biobank

UK Biobank is a prospective study that recruited approximately 500,000 volunteers aged between 40 and 69 years in the United Kingdom and collected extensive genetic and phenotypic data [25, 26]. We used the whole-exome sequencing data released by UK Biobank with a total of 200,643 samples. Individuals who had withdrawn and one member of each pair with kinship larger than 0.25 measured via KING [27] were removed. We considered SBP and DBP as the blood pressure (BP) phenotypes and Hb as the environmental factor, which is known to be associated with both SBP and DBP [17, 18]. Covariates included age, age², sex, BMI and 20 principal components to adjust for population stratification. SBP and DBP averaged over multiple measurements at baseline were used. For individuals taking BP-lowering medications, 10 mm Hg and 5 mm Hg were added to the SBP and DBP, respectively [28, 29]. Phenotype and covariate values more than 5 standard deviations away from their respective means were defined as outliers. Outliers and individuals with missing phenotypes or missing covariates were removed. As a result, 157,514 individuals, including 71,501 males (45.4%) and 86,013 females (54.6%), were included in our gene-Hb interaction analyses. BP phenotypes, age, BMI and Hb were standardized before the analysis.
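Two of the preprocessing rules above, the medication adjustment and the 5-standard-deviation outlier rule, can be sketched as follows. This is an illustration rather than the pipeline actually used: the function names are hypothetical, and applying the outlier rule in a single pass with the sample standard deviation is an assumption.

```python
import statistics

def adjust_bp(sbp, dbp, on_medication):
    """Add 10 mm Hg to SBP and 5 mm Hg to DBP for treated individuals."""
    if on_medication:
        return sbp + 10.0, dbp + 5.0
    return sbp, dbp

def within_sd(values, n_sd=5.0):
    """Flag each value as True when it lies within n_sd SDs of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD, single pass
    return [abs(v - mean) <= n_sd * sd for v in values]
```

An individual would be kept only if every phenotype and covariate is flagged `True` and none is missing.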

We carried out genome-wide analysis on 18,101 genes from 22 autosomal chromosomes. Variants in the genotype dataset were annotated via VEP [30]. We restricted the analysis to variants annotated as stop_lost, missense_variant, start_lost, splice_donor_variant, inframe_deletion, frameshift_variant, splice_acceptor_variant, stop_gained or inframe_insertion with PolyPhen scores larger than 0.15 and SIFT scores less than 0.05 [31]. Variants with MAFs less than 3% were extracted via PLINK [32]. Genotypes were further transformed into numeric values using fcGENE [33].

For each of the 18,101 genes, we performed Het-GEI and PPK-GEI analyses of the gene-Hb interactions on the SBP and DBP phenotypes. Manhattan plots of p-values from the two tests are presented in Fig 8, and Q–Q plots of the Het-GEI and PPK-GEI tests are shown in Fig 9. With a genome-wide significance level of 2.5×10⁻⁶, only LEUTX is significant according to the Het-GEI test (p-value = 2.2×10⁻⁶), and its p-value according to the PPK-GEI test is 7.43×10⁻⁶. At a suggestive significance level of 1×10⁻⁴, twelve genes pass the threshold; their details are presented in Table 1.

Fig 8. Manhattan plot of genome-wide multiphenotype analysis of gene-Hb interactions in BPs.

The horizontal and vertical axes represent the genomic position and the negative log10 of the p-values, respectively. A: Het-GEI; B: PPK-GEI.

https://doi.org/10.1371/journal.pone.0275929.g008

Fig 9. Q–Q plots of genome-wide multiphenotype GEI analysis of gene-Hb interactions in BPs.

The horizontal and vertical axes represent the negative log10 of the expected p-values and the negative log10 of the observed p-values, respectively. A: Het-GEI; B: PPK-GEI.

https://doi.org/10.1371/journal.pone.0275929.g009

Table 1. Genes showing suggestive evidence (p-values < 1 × 10⁻⁴) of gene-Hb interactions in the Het-GEI or PPK-GEI tests.

https://doi.org/10.1371/journal.pone.0275929.t001

Recently, Surendran et al. reported 22 genes associated with SBP or DBP in a meta-analysis of 1.3 million samples from multiple cohorts, including UK Biobank, the Million Veterans Program and deCODE [19]. For the 22 genes, we looked up our genome-wide results for the possible interactions with Hb. For comparison, we also conducted single-phenotype GEI tests using the INT-FIX function from the rareGE R package [16] for the two BP phenotypes separately. The number of rare variants involved in the analysis and p-values from the multiphenotype and single-phenotype GEI tests are provided in Table 2.

Table 2. Multiphenotypic analyses and single-phenotype analyses of gene-Hb interactions in BP-associated genes.

https://doi.org/10.1371/journal.pone.0275929.t002

There are no significant results after correcting for multiple testing. At the nominal significance level of 0.05, only one gene, MYO1C, shows interactions with Hb for BP phenotypes by the PPK-GEI test (p-value = 0.038). With the single-phenotype GEI test, NOS3 has a p-value of 0.026 for DBP, and MYO1C has p-values of 0.018 and 0.011 for SBP and DBP, respectively.

Discussion

In this paper, we propose four statistical tests, Hom-GEI, Het-GEI, PPK-GEI and LPK-GEI, to test GEI effects with rare variants for multiple correlated quantitative phenotypes. Through simulation studies, the statistical power of the tests was investigated in terms of the proportion of causal variants, the number of phenotypes associated with interactions and the correlation strength among phenotypes. Simulation results show that all tests demonstrate improved statistical power when the proportion of causal variants or the number of associated phenotypes increases. Hom-GEI, Het-GEI and PPK-GEI benefit from correlation among phenotypes; however, the LPK-GEI test suffers power loss, especially when correlation among phenotypes is strong. This is because LPK-GEI directly combines statistics from single-phenotype GEI tests and ignores phenotype dependence. In addition, among almost all of the considered scenarios, Het-GEI and PPK-GEI have almost the same power and outperform the other two tests. Hom-GEI shows the poorest power due to its unrealistic assumption. In summary, Het-GEI and PPK-GEI are two powerful tests for investigating GEI with multiple quantitative phenotypes.

We applied Het-GEI and PPK-GEI in the genome-wide analysis of SBP and DBP in order to detect possible gene-Hb interactions in UK Biobank. We analyzed 18,101 genes and identified LEUTX to be associated with BP phenotypes through its interaction with Hb via the Het-GEI test. At the suggestive significance level, twelve genes were identified to be associated with BP phenotypes through their interactions with Hb. LEUTX was previously reported to play a central role in embryo genome activation [34], but its role in BP regulation is unclear. A recent study of rare variants suggests that BP-associated variants are enriched in active chromatin regions of fetal tissue and potentially link fetal development to BP regulation in later life [19]. Thus, LEUTX might be associated with the BP phenotypes in a similar manner. In the analysis of 454,787 UK Biobank participants, the genetic main effect of LEUTX was not associated with BP phenotypes at the nominal significance level of 0.05 [35]. However, our tests identified LEUTX as interacting with Hb in BP phenotypes at the genome-wide significance level of 2.5×10⁻⁶ in the sample of 157,514 UK Biobank participants. We also conducted a single-phenotype GEI test and multiphenotype GEI tests to evaluate the gene-Hb interactions for 22 genes that were previously reported to be associated with SBP or DBP in a meta-analysis of genetic main effects. MYO1C shows nominal significance by the Het-GEI test. NOS3 shows nominal significance in DBP and MYO1C in SBP and DBP by the single-phenotype GEI test.

Our proposed multiphenotype GEI tests are an extension of the single-phenotype GEI test in rareGE [16]. The tests retain the desirable properties of the single-phenotype GEI test, which allows for covariate adjustment and is powerful when the GEI effects of variants act in different directions on phenotypes. Our proposed multiphenotype GEI tests, except for Hom-GEI, are more powerful than the existing multiphenotype GEI test TOWGE-FCT. Moreover, our methods are computationally less expensive: TOWGE-FCT employs permutations to evaluate the p value for each transferred phenotype, whereas our tests are based on asymptotic distributions, so their p values can be computed analytically.
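To give a sense of why permutation-based p values become costly at genome-wide significance levels, the sketch below counts how many permutations are needed before a permutation p value can fall below a given threshold. It assumes the common convention that the smallest attainable permutation p value with B permutations is 1/(B + 1), and it assumes one permutation run per gene; how TOWGE-FCT actually schedules its permutations is not described here, so the numbers are illustrative only. The function name `min_permutations` is hypothetical.

```python
import math
from fractions import Fraction


def min_permutations(alpha):
    """Smallest B whose minimum attainable permutation p value,
    1 / (B + 1), is strictly below alpha (assumed convention)."""
    # 1 / (B + 1) < alpha  <=>  B > 1 / alpha - 1,
    # so the smallest such integer B is floor(1 / alpha).
    return math.floor(1 / Fraction(alpha))


# Genome-wide level from the text, written exactly as 2.5e-6
# to avoid floating-point rounding in the division.
GENOME_WIDE = Fraction(5, 2_000_000)
N_GENES = 18_101  # genes analyzed in the text

per_gene = min_permutations(GENOME_WIDE)
# Assumption: every gene is permuted the full number of times.
total = per_gene * N_GENES

print(f"Permutations needed per gene: {per_gene:,}")
print(f"Statistic evaluations for {N_GENES:,} genes: {total:,}")
```

An analytic p value, by contrast, requires a single evaluation of the test statistic and its asymptotic distribution per gene, which is the source of the computational advantage noted above.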

Our proposed multiphenotype GEI tests have the following limitations. First, our tests make specific assumptions about the correlation structure of the GEI effects among multiple phenotypes; violating these assumptions would lead to a substantial loss of power. Second, our tests assume that individuals are unrelated. Familial correlation is not considered, and related samples must be excluded, which may substantially reduce the sample size and result in a loss of power. Third, our proposed GEI tests are for rare variants only, while both common and rare variants may contribute to complex diseases [36–38]. In future work, we plan to consider a unified test to address these issues. Finally, while we identified LEUTX as interacting with Hb in BP phenotypes, the result lacks independent validation. Thus, it should be considered preliminary, and further experiments are necessary.

Conclusion

In this paper, we modeled the correlation among the GEI effects of a variant on multiple phenotypes by using four kernels. Based on these kernels, we proposed four multiphenotype GEI tests for rare variants. The four tests retain the desirable properties of the single-phenotype GEI test and provide enhanced statistical power by analyzing multiple phenotypes simultaneously. We applied Het-GEI and PPK-GEI to test gene-Hb interactions for 18,101 genes in SBP and DBP in the UK Biobank. LEUTX was associated with BP phenotypes through its interaction with Hb via the Het-GEI test. At the suggestive significance level, twelve genes were reported. Our proposed tests can be readily used to test GEIs in a variety of correlated phenotypes and may contribute to genetic studies of complex diseases.

References

  1. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47(D1):D1005–D1012. pmid:30445434.
  2. Nolte IM, van der Most PJ, Alizadeh BZ, de Bakker PI, Boezen HM, Bruinenberg M, et al. Missing heritability: is the gap closing? An analysis of 32 complex traits in the Lifelines Cohort Study. Eur J Hum Genet. 2017;25(7):877–885. pmid:28401901.
  3. López-Cortegano E, Caballero A. Inferring the Nature of Missing Heritability in Human Traits Using Data from the GWAS Catalog. Genetics. 2019;212(3):891–904. pmid:31123044.
  4. Geddes L. Genetic study homes in on height’s heritability mystery. Nature. 2019;568(7753):444–445. pmid:31015700.
  5. Yu C, Arcos-Burgos M, Baune BT, Arolt V, Dannlowski U, Wong ML, et al. Low-frequency and rare variants may contribute to elucidate the genetics of major depressive disorder. Transl Psychiatry. 2018;8(1):70. pmid:29581422.
  6. Wang Z, Chen H, Bartz TM, Bielak LF, Chasman DI, Feitosa MF, et al. Role of Rare and Low-Frequency Variants in Gene-Alcohol Interactions on Plasma Lipid Levels. Circ Genom Precis Med. 2020;13(4):e002772. pmid:32510982.
  7. Yang T, Jackson VE, Smith AV, Chen H, Bartz TM, Sitlani CM, et al. Rare and low-frequency exonic variants and gene-by-smoking interactions in pulmonary function. Sci Rep. 2021;11(1):19365. pmid:34588469.
  8. Johansson E, Martin LJ, He H, Chen X, Weirauch MT, Kroner JW, et al. Second-hand smoke and NFE2L2 genotype interaction increases paediatric asthma risk and severity. Clin Exp Allergy. 2021;51(6):801–810. pmid:33382170.
  9. Dahl A, Nguyen K, Cai N, Gandal MJ, Flint J, Zaitlen N. A Robust Method Uncovers Significant Context-Specific Heritability in Diverse Complex Traits. Am J Hum Genet. 2020;106(1):71–91. pmid:31901249.
  10. Majumdar A, Burch KS, Haldar T, Sankararaman S, Pasaniuc B, Gauderman WJ, et al. A two-step approach to testing overall effect of gene-environment interaction for multiple phenotypes. Bioinformatics. 2021:btaa1083. pmid:33453114.
  11. Kilpeläinen TO, Bentley AR, Noordam R, Sung YJ, Schwander K, Winkler TW, et al. Multi-ancestry study of blood lipid levels identifies four loci interacting with physical activity. Nat Commun. 2019;10(1):376. pmid:30670697.
  12. Noordam R, Bos MM, Wang H, Winkler TW, Bentley AR, Kilpeläinen TO, et al. Multi-ancestry sleep-by-SNP interaction analysis in 126,926 individuals reveals lipid loci stratified by sleep duration. Nat Commun. 2019;10(1):5121. pmid:31719535.
  13. Zhang J, Sha Q, Hao H, Zhang S, Gao XR, Wang X. Test Gene-Environment Interactions for Multiple Traits in Sequencing Association Studies. Hum Hered. 2019;84(4–5):170–196. pmid:32417835.
  14. Zhao Z, Zhang J, Sha Q, Hao H. Testing gene-environment interactions for rare and/or common variants in sequencing association studies. PLoS One. 2020;15(3):e0229217. pmid:32155162.
  15. Dutta D, Scott L, Boehnke M, Lee S. Multi-SKAT: General framework to test for rare-variant association with multiple phenotypes. Genet Epidemiol. 2019;43(1):4–23. pmid:30298564.
  16. Chen H, Meigs JB, Dupuis J. Incorporating gene-environment interaction in testing for association with rare genetic variants. Hum Hered. 2014;78(2):81–90. pmid:25060534.
  17. Atsma F, Veldhuizen I, de Kort W, van Kraaij M, Pasker-de Jong P, Deinum J. Hemoglobin level is positively associated with blood pressure in a large cohort of healthy individuals. Hypertension. 2012;60(4):936–941. pmid:22949533.
  18. Wang TJ, Gona P, Larson MG, Levy D, Benjamin EJ, Tofler GH, et al. Multiple biomarkers and the risk of incident hypertension. Hypertension. 2007;49(3):432–438. pmid:17242302.
  19. Surendran P, Feofanova EV, Lahrouchi N, Ntalla I, Karthikeyan S, Cook J, et al. Discovery of rare variants associated with blood pressure regulation through meta-analysis of 1.3 million individuals. Nat Genet. 2020;52(12):1314–1332. pmid:33230300.
  20. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89(1):82–93. pmid:21737059.
  21. Kuonen D. Saddlepoint approximations for distributions of quadratic forms in normal variables. Biometrika. 1999;86(4):929–935.
  22. Wu B, Pankow JS. Sequence Kernel Association Test of Multiple Continuous Phenotypes. Genet Epidemiol. 2016;40(2):91–100. pmid:26782911.
  23. Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005;15(11):1576–1583. pmid:16251467.
  24. Broadaway KA, Cutler DJ, Duncan R, Moore JL, Ware EB, Jhun MA, et al. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants. Am J Hum Genet. 2016;98(3):525–540. pmid:26942286.
  25. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–209. pmid:30305743.
  26. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12(3):e1001779. pmid:25826379.
  27. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867–2873. pmid:20926424.
  28. Cui JS, Hopper JL, Harrap SB. Antihypertensive treatments obscure familial contributions to blood pressure variation. Hypertension. 2003;41(2):207–210. pmid:12574083.
  29. Tobin MD, Sheehan NA, Scurrah KJ, Burton PR. Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med. 2005;24(19):2911–2935. pmid:16152135.
  30. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122. pmid:27268795.
  31. Cirulli ET, White S, Read RW, Elhanan G, Metcalf WJ, Tanudjaja F, et al. Genome-wide rare variant analysis for thousands of phenotypes in over 70,000 exomes from two cohorts. Nat Commun. 2020;11(1):542. pmid:31992710.
  32. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. pmid:25722852.
  33. Roshyara NR, Scholz M. fcGENE: a versatile tool for processing and transforming SNP datasets. PLoS One. 2014;9(7):e97589. pmid:25050709.
  34. Jouhilahti EM, Madissoon E, Vesterlund L, Töhönen V, Krjutškov K, Plaza Reyes A, et al. The human PRD-like homeobox gene LEUTX has a central role in embryo genome activation. Development. 2016;143(19):3459–3469. pmid:27578796.
  35. Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599(7886):628–634. pmid:34662886.
  36. Ionita-Laza I, Lee S, Makarov V, Buxbaum JD, Lin X. Sequence kernel association tests for the combined effect of rare and common variants. Am J Hum Genet. 2013;92(6):841–853. pmid:23684009.
  37. Jiao S, Hsu L, Bézieau S, Brenner H, Chan AT, Chang-Claude J, et al. SBERIA: set-based gene-environment interaction test for rare and common variants in complex diseases. Genet Epidemiol. 2013;37(5):452–464. pmid:23720162.
  38. Zhao G, Marceau R, Zhang D, Tzeng JY. Assessing gene-environment interactions for common and rare variants with binary traits using gene-trait similarity regression. Genetics. 2015;199(3):695–710. pmid:25585620.