The geometry of G × E: How scaling and endogenous treatment effects shape interaction direction

Michal Sadowski; Andy W. Dahl; Noah Zaitlen; Richard Border

doi:10.1371/journal.pgen.1012073

Abstract

Gene-environment interaction (G × E) studies hold promise for identifying genetic loci mediating the effects of environmental risk on disease. However, interpretation of G × E effects is often confounded by two fundamental issues: the dependence of interaction estimates on outcome scale and the presence of endogenous treatment effects, in which genetic liability influences environmental exposure. These factors can induce apparent G × E signals—even when genetic and environmental contributions are purely additive on an unobserved scale. In this work, we demonstrate that any monotone convex transformation of an outcome induces sign-consistent G × E effects: the sign of the interaction term aligns with the sign of the corresponding main genetic effect. Convex transformations are a broad class of functions that include many commonly used data transformations, such as exponential and logarithmic functions, the square root, and other power transformations. We further show that endogenous treatment effects, modeled as threshold-based interventions, generate G × E effects with a similar directional signature. Exploiting this property, we propose a simple diagnostic: sign consistency across G × E estimates can signal when interactions are driven by outcome scaling or exposure endogeneity. We validate our framework in the UK Biobank using transcriptome-wide interaction studies (TxEWAS) across multiple trait–environment pairs, observing widespread sign consistency in some settings—suggesting confounding by scaling or treatment bias. Our results provide both a theoretical foundation and a practical tool for interpreting G × E findings, enabling researchers to assess whether the observed G × E signal may depend substantially on outcome scaling or be influenced by exposure endogeneity.

Author summary

Gene-environment interaction (G × E) studies examine the extent to which genetic differences modulate environmental impacts on individuals’ health outcomes. However, their results depend on how these outcomes are measured or modeled, and are often confounded by endogenous treatment effects, where exposure to an environment depends on the health outcome itself (for example, individuals with high blood pressure are more likely to receive blood-pressure-lowering medication). We demonstrate that both a wide class of scaling functions and endogenous treatment effects induce sign-consistent G × E: the direction of the interaction aligns with the direction of the main genetic effect. This property can be used as a diagnostic to assess when an apparent G × E signal could be driven by outcome scaling or exposure endogeneity.

Citation: Sadowski M, Dahl AW, Zaitlen N, Border R (2026) The geometry of G × E: How scaling and endogenous treatment effects shape interaction direction. PLoS Genet 22(4): e1012073. https://doi.org/10.1371/journal.pgen.1012073

Editor: Xiang Zhou, Yale University, UNITED STATES OF AMERICA

Received: August 4, 2025; Accepted: February 27, 2026; Published: April 1, 2026

Copyright: © 2026 Sadowski et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The UK Biobank data underlying the results presented in this study were accessed under application 33127 and cannot be further distributed in accordance with UK Biobank policies. Researchers may obtain access to these data by submitting an application directly to the UK Biobank: https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access. All data and code necessary to reproduce the figures and results presented in this study are publicly available in a GitHub repository at https://github.com/michalsad/gxe_sign.

Funding: This work was funded by the National Institutes of Health grants R01MH130581, U01MH126798, R01MH122688, R01HG006399, R01HG011345, and R01GM142112 (NZ, RB, MS); L30HG013856 (RB); and R35GM150822 and K25HL157603 (AWD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Individuals exhibit substantial phenotypic heterogeneity in response to environmental perturbations. Part of this heterogeneity arises from individual differences in genetic background and is referred to as gene-environment interaction (G × E). Several interactions identified to date have important implications for human health. For example: (1) dietary treatment prevents symptoms of phenylketonuria, a genetic disorder caused by mutations in the PAH gene [1]; (2) physical activity blunts the effects of obesity risk variants in the fat mass and obesity-associated gene, FTO [2]; (3) a variant of the NAT2 gene elevates the risk of bladder cancer in smokers [3]; and (4) many gene polymorphisms have been shown to affect drug response or toxicity [4–8]. These examples showcase the potential of G × E discovery to enhance disease prevention and management, to enable the design of safer and more effective individualized treatments, and, more generally, to advance our understanding of disease etiology. To unlock this potential, many methods for G × E detection have been developed [9–12], and they continue to be refined. The most recent approaches enable genome-wide G × E screens in large-scale studies of human populations [13–15].

Standard statistical methods assess G × E by comparing additive and non-additive models of genetic and environmental effects. Although valuable for exploratory analysis, this statistical definition of interaction depends on modeling assumptions and might not reflect the biological mechanisms of interaction [16,17]. Here, we focus on two fundamental issues that complicate the interpretation of current G × E approaches: (1) dependence on phenotype scale and (2) endogenous treatment effects. In case (1), whether an interaction effect is detected, and its direction, depend on the scale on which the outcome is measured or to which it is transformed [18–21]. For example, even when a genetic variant G and an environmental factor E affect an outcome Y additively, an interaction test performed on log-transformed Y (e.g., as part of quality control processing) can yield a highly significant G × E effect (Fig 1). More generally, many interaction effects can be induced or removed by monotonic non-linear transformations of the data. In case (2), exposure and genetic liability are causally intertwined. For example, suppose that a treatment is administered to reduce the level of a heritable phenotype when it crosses some threshold (e.g., statins may be prescribed to lower low-density lipoprotein (LDL) cholesterol levels). In this case, exposure to the intervention is related to the genetic factors influencing the phenotype. Such endogenous treatment effects can result in apparent G × E, even when gene and environment act additively on the observed scale. As a result, most G × E findings require the caveat that they are specific to a particular measurement scale or may be a consequence of endogenous treatment effects [22–25].


Fig 1. G × E effect induced by log-transformation of outcome Y.

A: Depiction of the effects of the genetic variant G (with reference allele A and alternative allele B, MAF = 0.4), the environmental factor E (drawn from a standard normal distribution), and the interaction between the two (G × E) on the outcome Y (generated from an additive model of G and E for 1,000 samples). The p-value for the G × E effect (P_GxE) is given in the lower right corner. B: Depiction of the same effects on the log-transformed Y. The G × E effect is not detectable for Y (A), but it is detectable for the log-transformed Y (B).

https://doi.org/10.1371/journal.pgen.1012073.g001
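The qualitative point of Fig 1 can be reproduced with a short simulation. This is a sketch, not the authors' code. Only three settings follow the caption: MAF = 0.4, a standard normal E, and n = 1,000. The rest of the generative model is a hypothetical choice of ours: an offset of 20, main effects of 3 for both G and E, standard normal noise, and the random seed. The sketch fits an ordinary least-squares interaction model and reports the z-score of the G × E coefficient.

```python
import numpy as np

def interaction_z(y, g, e):
    """OLS fit of y on [1, G, E, G*E]; returns the z-score of the G*E coefficient."""
    X = np.column_stack([np.ones_like(y), g, e, g * e])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)          # homoskedastic residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical OLS covariance matrix
    return beta[3] / np.sqrt(cov[3, 3])

rng = np.random.default_rng(1)
n = 1000
g = rng.binomial(2, 0.4, size=n).astype(float)   # diploid genotype, MAF = 0.4
e = rng.standard_normal(n)                        # E ~ N(0, 1)
# Hypothetical additive model; the offset of 20 keeps Y positive so log(Y) is defined.
y = 20.0 + 3.0 * g + 3.0 * e + rng.standard_normal(n)

z_raw = interaction_z(y, g, e)
z_log = interaction_z(np.log(y), g, e)
print(f"G x E z-score on Y: {z_raw:+.2f}; on log(Y): {z_log:+.2f}")
```

With these settings, the z-score on Y is consistent with no interaction. The z-score on log(Y) is strongly negative, because log is increasing and convex up and both main effects are positive.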

Not all observed interactions require these caveats. For instance, there is an extreme form of interaction, also known as a crossover effect, in which the direction of an association (rather than only its magnitude) depends on a moderating factor. This type of interaction cannot be eliminated by a monotonic transformation [16] and, when sufficiently large, has a relatively straightforward interpretation [17]. Other statistical interactions, however, can in principle be removed by a monotonic transformation—meaning that, after transforming the outcome, the data are adequately described by an additive model.

Here, we demonstrate that monotone convex transformations of an outcome induce sign-consistent G × E, in which the direction of the interaction effects is determined by the sign of the corresponding main effects. We further show that endogenous treatment effects, modeled as threshold-based interventions, also generate sign-consistent G × E. Finally, we discuss examples of non-convex transformations, such as the logistic function, showing why and under what circumstances they induce this particular type of interaction effect. Our results indicate that a simple examination of sign consistency across detected G × E effects can rule out the possibility that all interactions have been induced by a monotone convex scaling of the outcome or by endogenous treatment effects. Another consequence of this result is that if a G × E signal is not sign-consistent, it cannot be eliminated by a monotone convex transformation.

Monotone convex transformations describe a broad class of functions that include many of the most commonly used data transformations in genetic studies and data analysis in general. For example, Box–Cox transformations, which are widely used to reduce departures from normality, are convex. Examples of transformations that are convex down include the square function, power transformations with even exponents, and the exponential function. Transformations that are convex up include the logarithm, the square root, and power transformations with exponents between zero and one. Many more commonly used transformations are locally convex; for instance, the logistic function and the hyperbolic tangent are convex down on one half of their domain and convex up on the other. We restrict our attention to monotone transformations as they preserve (or completely reverse) the order of data values.
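This article uses "convex down" for a positive second derivative and "convex up" for a negative one. The sketch below is our own illustration, not part of the original analysis. It classifies a few of the transformations listed above by the sign of a central finite-difference estimate of the second derivative. The estimate is evaluated on a positive grid, where log and the square root are defined. The step size h is an arbitrary choice.

```python
import math

def curvature_sign(f, x, h=1e-3):
    """Sign of f'' at x from a central second difference:
    +1 = 'convex down' (f'' > 0), -1 = 'convex up' (f'' < 0), 0 = neither."""
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2
    return (d2 > 0) - (d2 < 0)

transforms = {
    "square": lambda x: x**2,
    "exp": math.exp,
    "log": math.log,
    "sqrt": math.sqrt,
    "logistic": lambda x: 1.0 / (1.0 + math.exp(-x)),
}
for name, f in transforms.items():
    # positive grid, where every listed function is defined
    print(name, [curvature_sign(f, x) for x in (0.5, 1.0, 2.0, 4.0)])

# The logistic function changes curvature at its midpoint (0 here).
print("logistic at -2 and 2:",
      curvature_sign(transforms["logistic"], -2.0),
      curvature_sign(transforms["logistic"], 2.0))
```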

We demonstrate the usefulness of examining sign consistency in real data: our analysis of this property indicates that statin use induces false-positive gene-age interaction effects on LDL cholesterol levels.

Results

Sign-consistent interaction property

We investigate the relationship between the signs of regression coefficients estimated in the G × E model across multiple genetic variants. More concretely, consider testing two haploid variants, G₁ and G₂, for an interaction with a binary environmental exposure E on phenotype Y in two single-variant regressions:

where the coefficients are estimated in regression i and the residual term represents homoskedastic residual variation. Critically, neither of these models needs to describe causal relationships accurately; we assume only that these regression models can be fitted and have well-behaved errors.

Next, consider the same two tests performed for the same phenotype Y measured on a different scale:

where the transformation maps the scale of the former measurement of Y to the scale of the latter, and the corresponding coefficients and residuals are marked with a superscript denoting this transformation.

We demonstrate that the signs of the interaction effects induced by a monotone convex transformation in these two regressions depend on the signs of the corresponding main genetic effects. The interaction effects satisfy a precise sign rule:

Theorem 1 (Sign-consistent interaction property). Assume that measurement Y has homogeneous variance and exhibits no G × E effects on the original scale, with G and E independent, mean-centered, and having finite variance. If the transformation is monotone convex, then the G × E effects estimated on the transformed scale in the two regressions satisfy:

(1)

where the sign factor is the sign of the second derivative of the transformation (positive for convex down, negative for convex up).

Note that if we assume that the alleles of G₁ and G₂ are encoded so that their main effects have the same direction, the above property becomes:

(2)

which we call sign consistency. By contraposition, if the remaining assumptions of Theorem 1 hold but property (2) does not, then the non-zero G × E effects in the two regressions cannot both have been induced by a monotone convex scaling of Y.

Although we focus on the simple case of haploid genotypes and binary environmental exposures in the main text, these results also apply to diploid genotypes and continuous environmental exposures.

Corollary 1 (Sign consistency for diploid genotypes). The sign rule (1) extends to diploid genotypes under Hardy-Weinberg equilibrium, with E any random variable satisfying the independence and moment conditions above.

We further generalize this result to allow for moderate G-E correlations:

Theorem 2 (Sign consistency for correlated G and E). When G and E are correlated, the induced interaction effect includes an additional correlation term:

where the additional term arises from the non-orthogonality of the predictors. The sign rule (1) holds when the transformation effect dominates this correlation term.

Corollary 2 (Dominance of transformation effect). For small gene-environment correlations or strong curvature of the transformation, the sign rule approximately holds. In practice, if observed interactions are strongly sign-consistent across many variants, the transformation effect likely dominates any correlation-induced deviations.

Finally, we show that endogenous treatment effects—where treatment is assigned based on a phenotype threshold—also induce sign-consistent interactions:

Theorem 3 (Opposite-sign rule for endogenous treatment). Consider an environmental exposure E assigned when phenotype Y exceeds a threshold t, with a fixed treatment effect on Y. For a genetic variant with a given effect on Y, if the treatment threshold exceeds the reference mean, the induced interaction has the sign opposite to that of the genetic effect:

We prove these results rigorously in S1 Appendix.

More generally, beyond this two-variant example, we show the following. Suppose there is a scale on which phenotype Y has homogeneous variance across values of environmental factor E and genetic variant G, and on which these factors have only additive effects on Y. Then the direction of the G × E effect estimated for a measurement that is a monotone convex transformation of Y depends on the direction of the corresponding main genetic effect.

We therefore propose examining observed G × E effects for the sign consistency property, as it provides a means to exclude the family of monotone convex transformations as the sole source of these effects.

Sketch of argument

Suppose that phenotype Y has homogeneous variance across values of environmental factor E and genotype G:

(3)

where the distribution is symmetric, with mean ν and a variance that is the same for all values of E and G. We fit the following linear regression model to test the effect of an interaction between G and E on this phenotype:

(4)

where the intercept, main-effect, and interaction coefficients are estimated from the data and ε is the error. We consider a simplified example in which E and G are binary variables, corresponding to a haploid genetic variant and a binary environmental exposure. Importantly, our reasoning does not depend on whether this model is correct, only on the homoskedasticity condition (3) holding. The coefficients in (4) can be related to the empirical conditional expectations:

where P_a,b denotes the empirical mean of Y among individuals with E = a and G = b. For example, P_0,1 is the average value of phenotype Y in individuals with genotype G = 1 who are not exposed to environmental factor E (i.e., E = 0).
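With binary E and G, the regression in (4) is saturated, so its four coefficients are exact contrasts of the four cell means P_a,b:

- the intercept is P_0,0;
- the main effect of E is P_1,0 − P_0,0;
- the main effect of G is P_0,1 − P_0,0;
- the G × E effect is (P_1,1 − P_1,0) − (P_0,1 − P_0,0).

The sketch below is our own numerical check on an arbitrary simulated phenotype. It is not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
e = rng.integers(0, 2, size=n)   # binary exposure E
g = rng.integers(0, 2, size=n)   # haploid genotype G (0 = reference, 1 = alternative)
# Hypothetical phenotype with arbitrary cell effects plus noise
y = 1.0 + 0.5 * e + 0.3 * g + 0.2 * e * g + rng.normal(size=n)

# Least-squares fit of y on [1, E, G, G*E]
X = np.column_stack([np.ones(n), e, g, e * g])
b0, bE, bG, bGE = np.linalg.lstsq(X, y, rcond=None)[0]

# Empirical conditional means: P[a, b] = mean of y among E = a, G = b
P = np.array([[y[(e == a) & (g == b)].mean() for b in (0, 1)] for a in (0, 1)])

matches = np.allclose(
    [b0, bE, bG, bGE],
    [P[0, 0], P[1, 0] - P[0, 0], P[0, 1] - P[0, 0],
     (P[1, 1] - P[1, 0]) - (P[0, 1] - P[0, 0])],
)
print(matches)  # True
```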

Without loss of generality, suppose further that the estimated main effects of E and G are positive and that the estimate of the G × E effect is zero, as depicted on the x-axis of Fig 2A. Consider now a regression similar to (4), but performed on phenotype Y measured on a scale different from the original one:

(5)


Fig 2. Monotone convex transformations of the outcome induce sign-consistent G × E effects in interaction tests.

A: The x-axis shows the intercept, the main effect of E, the main effect of G, and the G × E effect estimated by regressing phenotype Y on E, G and GE. Suppose, as assumed here, that the main effects of E and G are positive and the G × E effect is null. Then a similar regression on this phenotype transformed with an increasing convex down function will yield a main effect of G and a G × E effect that are both positive. The sign of the G × E effect can be calculated from the transformed points P_a,b, as in (7). B: Similar to A, but shows the signs of the transformed main genetic effect and G × E effect when the main effect of G is negative. C: Similar to A, but shows the signs of these effects when the transformation is increasing convex up. The colored segments on the x-axis indicate the sign of the corresponding difference on the original scale: red if positive and blue if negative. Orange and cyan segments on the y-axis denote, respectively, positive and negative values of whichever of the two transformed differences has the smaller magnitude. The green segment represents the difference between these two differences, which may be positive or negative depending on the transformation.

https://doi.org/10.1371/journal.pgen.1012073.g002

where the transformation maps the original scale of Y to the new scale.

If we assume that the transformation is increasing and convex down (Fig 2A), then the signs of the transformed main genetic effect and G × E effect can be related to the points P_a,b as:

(6)

(7)

That is, with the above assumptions, the direction of the effects estimated for the scaled phenotype can be expressed using quantities P_a,b defined on the original scale of Y (see Methods for a derivation of this fact). Looking at Fig 2A, it is easy to see that in our example:

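The transformed genotype differences among exposed individuals and among unexposed individuals have the same sign and, by (6), follow the sign of the main genetic effect.
The transformed genotype difference among exposed individuals is larger in magnitude than the transformed genotype difference among unexposed individuals.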

From (7) and these two facts, it follows that the transformed G × E effect is positive. Note that this holds for any genetic variant whose main effect tested in (5) is positive. If, on the other hand, this main effect is negative, the G × E effect will be negative (Fig 2B). In general, under these assumptions, the G × E effect has the same sign as the main genetic effect.

Applying a similar argument, it can be shown that when the transformation is increasing and convex up, the opposite relation holds: the G × E effect has the sign opposite to that of the main genetic effect (Fig 2C). In general, the direction of this relation depends on whether the transformation is convex down or convex up, and on the sign of the estimated main effect of environmental factor E (Table 1). A complete proof discussing these cases is given in Methods.
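The curvature dependence can be seen from a short derivation. The notation is ours:

- h is the transformation, assumed twice differentiable with a second derivative of constant sign;
- β_E and β_G are the original-scale main effects;
- f is a noise density shared by all (E, G) cells;
- arrows denote the large-sample values of the estimates.

The G × E coefficient on the transformed scale is a mixed second difference of the noise-averaged transformation, which turns into a double integral of its second derivative. This is a sketch of the reasoning, not the proof given in Methods.

```latex
\begin{aligned}
&\text{Cell model: } Y = P_{a,b} + \varepsilon \ \text{ for } E = a,\ G = b, \qquad \varepsilon \sim f \text{ in every cell},\\
&\bar h(x) = \int h(x+u)\, f(u)\, du, \qquad \bar h''(x) = \int h''(x+u)\, f(u)\, du,\\
&\hat\beta^{h}_{G} \to \bar h(P_{0,1}) - \bar h(P_{0,0}), \qquad
 \hat\beta^{h}_{GE} \to \bigl[\bar h(P_{1,1}) - \bar h(P_{1,0})\bigr] - \bigl[\bar h(P_{0,1}) - \bar h(P_{0,0})\bigr],\\
&\text{additivity: } P_{1,0} = P_{0,0} + \beta_E, \quad P_{0,1} = P_{0,0} + \beta_G, \quad P_{1,1} = P_{0,0} + \beta_E + \beta_G,\\
&\Rightarrow\ \hat\beta^{h}_{GE} \to \int_{0}^{\beta_E}\!\!\int_{0}^{\beta_G} \bar h''(P_{0,0} + s + r)\, dr\, ds,\\
&\Rightarrow\ \operatorname{sign}\hat\beta^{h}_{GE}
 = \operatorname{sign}(h'')\,\operatorname{sign}(\beta_E)\,\operatorname{sign}(\beta_G)
 = \operatorname{sign}(h'')\,\operatorname{sign}(h')\,\operatorname{sign}(\beta_E)\,\operatorname{sign}(\hat\beta^{h}_{G}).
\end{aligned}
```

Consider an increasing (sign h′ = +1), convex down (sign h″ = +1) transformation and a positive main effect of E. The last line then gives the same-sign relation of Fig 2A. Flipping the curvature gives the opposite relation of Fig 2C.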


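Table 1. Monotone convex transformations induce G × E effects whose directions are consistent with the directions of the observed main effects. In general, the direction of this relation depends on whether the transformation is convex down or convex up and on the sign of the main effect of E. Note that some of these functions must be defined on appropriate domains (e.g., x > 0).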

https://doi.org/10.1371/journal.pgen.1012073.t001
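The sign pattern can be checked numerically under the noise-free simplification. This is our illustration, not a reproduction of Table 1's exact layout. The baseline cell mean of 3 and the effect magnitudes of 0.5 are arbitrary. The code covers four monotone transformations, one for each combination of increasing/decreasing and convex down/convex up. For each one, and for each combination of original-scale signs of the main effects of E and G, it prints the signs of the transformed main genetic effect and G × E effect.

```python
import math
from itertools import product

def transformed_effects(h, p00, beta_e, beta_g):
    """Noise-free sketch: main genetic and G x E effects after applying h
    to additive (no-interaction) cell means on the original scale."""
    p10, p01, p11 = p00 + beta_e, p00 + beta_g, p00 + beta_e + beta_g
    beta_g_h = h(p01) - h(p00)
    beta_ge_h = (h(p11) - h(p10)) - beta_g_h
    return beta_g_h, beta_ge_h

def sign(v):
    return (v > 0) - (v < 0)

transforms = {
    "exp (increasing, convex down)": math.exp,
    "log (increasing, convex up)": math.log,
    "exp(-x) (decreasing, convex down)": lambda x: math.exp(-x),
    "-x**2 on x > 0 (decreasing, convex up)": lambda x: -x**2,
}
for (name, h), s_e, s_g in product(transforms.items(), (1, -1), (1, -1)):
    bg, bge = transformed_effects(h, p00=3.0, beta_e=0.5 * s_e, beta_g=0.5 * s_g)
    print(f"{name:40s} sign(beta_E)={s_e:+d} "
          f"sign(beta_G^h)={sign(bg):+d} sign(beta_GE^h)={sign(bge):+d}")
```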

Endogenous treatment effects

Suppose that the environmental factor E tested in the G × E regression model is a treatment, administered to reduce the level of a heritable phenotype when it crosses some threshold (e.g., statin therapy for individuals with high LDL cholesterol). In this case, exposure to the intervention is related to the genetic factors influencing the phenotype, which we refer to as endogenous treatment effects. As shown in [8], endogenous treatment effects can cause false discoveries when the observed levels of the phenotype (subjected to treatment) are tested for G × E. In the following, we prove that the G × E effects induced in a simple model of endogenous treatment effects are sign-consistent.

Consider the following model of phenotype Y:

where, again, G_i indicates the presence of an alternative allele at haploid variant i, the corresponding coefficient is the effect of this allele on Y, and ε is the environmental noise, which we assume to be homoskedastic.

Suppose that if the level of Y is high, an individual is administered treatment E:

where t is some threshold. When applied, treatment E changes the level of Y by a fixed amount:

(8)

Claim 1 (Endogenous treatment effect interaction sign property). Suppose that we observe the treated phenotype and test the effect of the interaction between variant G_j and environmental factor E on it:

(9)

Then the direction of the estimated G × E effect is opposite to the direction of the main genetic effect.

As in Sketch of argument, we define quantities P₀ and P₁ on the scale of Y. Once transformed, these quantities can be used to compute the main genetic effect and the G × E effect on the scale of the observed (treated) phenotype. In this case, not only the signs but also the values of these effects can be expressed in terms of P₀ and P₁ transformed by a certain function, so we study the properties of this function to determine the properties of the two effects. Specifically, we show (Methods) that the main genetic effect and the G × E effect estimated in (9) can be related to these points as:

where the function involved is the inverse Mills ratio, φ(x)/Φ(x), with φ and Φ the standard normal probability density function and cumulative distribution function, respectively. Importantly, this function is decreasing and strictly convex down [26].
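For intuition, the four cell means entering regression (9) can be written out directly. This is our own derivation, in our own notation. It assumes:

- a single haploid variant with effect β;
- standard normal noise;
- treatment assigned when Y > t;
- a treatment shift τ;
- Y* for the observed phenotype;
- m(x) = φ(x)/Φ(x), the decreasing, convex inverse Mills ratio described above.

```latex
\begin{aligned}
&\mathbb{E}[Y^{*}\mid E=0, G=0] = -m(t), &&
 \mathbb{E}[Y^{*}\mid E=0, G=1] = \beta - m(t-\beta),\\
&\mathbb{E}[Y^{*}\mid E=1, G=0] = \tau + m(-t), &&
 \mathbb{E}[Y^{*}\mid E=1, G=1] = \tau + \beta + m(\beta - t),\\
&\hat\beta_{G} \to \beta - m(t-\beta) + m(t), &&
 \hat\beta_{GE} \to g(t-\beta) - g(t), \quad g(x) = m(x) + m(-x).
\end{aligned}
```

The shift τ cancels from the G × E term because it enters both treated cells equally. Since g is even and convex, it increases with |x|. The sign of the G × E term is therefore set by whether |t − β| is smaller or larger than |t|.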

Suppose that . Then, looking at Fig 3, we see that:

The difference has the opposite sign to difference .
The magnitude of is larger than the magnitude of .
The difference has the same sign as difference .
The magnitude of is greater than the magnitude of .


Fig 3. Endogenous treatment effects induce sign-consistent G × E effects.

The x-axis shows the quantities P₀ and P₁ defined for phenotype Y affected by the haploid genetic variant G. Treatment E is applied to reduce levels of Y that exceed threshold t. After treatment, the main effect of G and the effect of GE on phenotype Y can be expressed as functions of P₀ and P₁ and their images under the inverse Mills ratio. The signs of these functions therefore depend on each other.

https://doi.org/10.1371/journal.pgen.1012073.g003

From our assumption that and facts 1 and 2 above, it follows that the sign of is negative. From the same assumption and facts 1, 3 and 4, it follows that the sign of is positive. Note that once the sign of difference is established, the signs of and can be determined based on the properties of . In our example, is negative, which makes negative and positive. On the other hand, when is positive, is positive and is negative. Therefore, we have shown that G × E effects induced by endogenous treatment effects (modeled as in (8)) have opposite directions to the corresponding main genetic effects. That is, for any variant G_j, the estimated G × E effect and the main genetic effect have opposite signs.

The same property holds when treatment E is administered whenever the level of Y is below threshold t (Methods).
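A small simulation can illustrate the threshold-treatment setting. This is our sketch rather than the authors' code. It assumes one haploid variant with allele frequency 0.5, standard normal noise, and a treatment shift τ = −1. Treatment is assigned when Y > t = 1, which is above both genotype means, in line with the threshold condition of Theorem 3. All values are hypothetical. The code compares the simulated G × E coefficient with the Gaussian-case population value g(t − β) − g(t), where g(x) = m(x) + m(−x) and m(x) = φ(x)/Φ(x). We derived this formula ourselves for this setting.

```python
import math
import numpy as np

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def mills(x):
    """Inverse Mills ratio phi(x) / Phi(x): decreasing and convex."""
    return norm_pdf(x) / norm_cdf(x)

def analytic_gxe(beta, t):
    """Population G x E coefficient under threshold treatment (noise ~ N(0, 1))."""
    def sym(x):
        return mills(x) + mills(-x)
    return sym(t - beta) - sym(t)

def simulate_gxe(beta, t=1.0, tau=-1.0, n=200_000, seed=0):
    """Simulate threshold-based treatment; return OLS (main G, G x E) on the treated phenotype."""
    rng = np.random.default_rng(seed)
    g = rng.integers(0, 2, size=n).astype(float)   # haploid variant
    y = beta * g + rng.standard_normal(n)          # untreated phenotype, additive
    e = (y > t).astype(float)                      # treated when Y exceeds threshold t
    y_obs = y + tau * e                            # treatment shifts Y by tau
    X = np.column_stack([np.ones(n), e, g, e * g])
    coef = np.linalg.lstsq(X, y_obs, rcond=None)[0]
    return coef[2], coef[3]

for beta in (0.3, -0.3):
    b_g, b_ge = simulate_gxe(beta)
    print(f"beta = {beta:+.1f}: main G = {b_g:+.3f}, G x E = {b_ge:+.3f}, "
          f"analytic G x E = {analytic_gxe(beta, 1.0):+.3f}")
```

Changing tau leaves the G × E coefficient unchanged, because the shift enters only through the E column. The main effect of G in this saturated model is the genotype difference among untreated individuals.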

Non-convex transformations

An arbitrary non-linear scaling of an outcome may or may not induce sign-consistent G × E. To provide more intuition on this, we discuss properties of G × E that can be produced by two examples of non-convex scaling commonly used in genetic analyses: (1) the logistic function and (2) the inverse normal transformation (INT).

Specifically, we are interested in the relationship between the signs of the main effect of genetic variant G and the effect of the interaction between G and environmental factor E on the transformed phenotype Y in the linear regression:

(10)

Following Sketch of argument, we assume that E and G are binary and that Y (on the original scale) has homogeneous variance and does not exhibit G × E effects, meaning that the linear regression:

yields a G × E effect estimate of zero.

As demonstrated earlier, given these assumptions, the main genetic effect and the G × E effect on the transformed scale can be expressed in terms of the empirical conditional means of untransformed Y:

(11)

(12)

where f is the PDF of ε.

Case study: the logistic function.

Let the transformation be a logistic function:

(13)

Using definitions (11) and (12), it is easy to see that the sign of the G × E effect induced by the logistic scaling depends not only on the relative positions of points P_0,0 and P_1,0 (as was the case for monotone convex transformations), but also on their values and on the width of the distribution of ε (Fig 4A). More specifically, the values of P_0,0 and P_1,0 determine, up to noise, the relation between the magnitudes of the two transformed differences in (12). Since these values differ between genetic variants, the relationship between the signs of the main genetic effect and the G × E effect can be variant-specific.


Fig 4. Logistic transformation of the outcome induces G × E effects that may or may not be sign-consistent.

A: Depiction of the effect that the logistic transformation of the outcome may have on the regression-based G × E test. Compare with Fig 2. B: An example of two genetic variants (green and orange) with positive effects on the phenotype that after transforming this phenotype with the logistic function exhibit G × E effects of opposite directions. Compare with Fig 2.

https://doi.org/10.1371/journal.pgen.1012073.g004

Consider an illustrative example of two genetic variants G₁ and G₂. For simplicity, we ignore the noise term ε, which simplifies (11) and (12) to:

For G₁ we assume P_0,0 = 5, P_1,0 = 8, and a genetic effect of 0.5. That is, in unexposed individuals carrying the reference allele at G₁ the average value of phenotype Y is 5, in exposed individuals carrying the same allele it is 8, and the effect of G₁ is 0.5 in both the unexposed and exposed groups. For variant G₂, on the other hand, we assume P_0,0 = 5.7, P_1,0 = 8.7, and a genetic effect of 1. These correspond to average values of 5.7 and 8.7 in unexposed and exposed non-carriers, respectively (see x-axis of Fig 4B). Now imagine that we convert phenotype Y to a risk scale on which the value of 7 corresponds to a risk of 50% (x₀ = 7 in (13)) and perform regression (10). For variant G₁, this regression yields a positive main genetic effect and a positive G × E effect. For variant G₂, it produces a positive main effect but a negative G × E effect (Fig 4B). Thus, in this example, the logistic transformation induces G × E effects that are not sign-consistent.
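The two-variant example can be checked directly. In this sketch the logistic slope is set to 1, which is our assumption; the text above gives only the 50% point at 7. The noise term is ignored, so the transformed effects are simply differences of transformed cell means.

```python
import math

def logistic(x, x0=7.0, k=1.0):
    """Risk scale with 50% risk at x0; the slope k = 1 is an assumption."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def transformed_effects(p00, p10, beta_g, h=logistic):
    """Noise-free main genetic and G x E effects after transforming additive cell means."""
    p01, p11 = p00 + beta_g, p10 + beta_g
    beta_g_h = h(p01) - h(p00)
    beta_ge_h = (h(p11) - h(p10)) - beta_g_h
    return beta_g_h, beta_ge_h

g1 = transformed_effects(p00=5.0, p10=8.0, beta_g=0.5)   # variant G1
g2 = transformed_effects(p00=5.7, p10=8.7, beta_g=1.0)   # variant G2
print("G1: beta_G = %+.4f, beta_GE = %+.4f" % g1)
print("G2: beta_G = %+.4f, beta_GE = %+.4f" % g2)
```

With these values, G₁ has positive main and G × E effects. G₂ has a positive main effect and a negative G × E effect, matching the example. The same signs also hold for a slope of k = 2.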

There are, however, scenarios in which the logistic scaling induces sign-consistent G × E. A plausible example of such a scenario in healthcare data occurs when x₀ in (13) is large (meaning that cases are called at high phenotype values) and the environmental effect and individual genetic effects on (untransformed) Y are relatively small. In that case, all points P_a,b for all considered genetic variants are smaller than x₀. Since the logistic function is convex down on the domain x < x₀, transforming points P_a,b with this function yields the sign relation of an increasing convex down transformation (Table 1). In general, the sign of a G × E effect induced by the logistic scaling depends on the positions of P_0,0, P_1,0, P_0,1 and P_1,1 relative to x₀; all possible cases are detailed in S1 File.

Case study: the inverse normal transformation.

Another data transformation commonly used in genetic analyses is the INT, which matches quantiles of the data distribution with quantiles of the standard normal distribution. Because the transformation depends on the data’s distribution, its consequences cannot be characterized in general. More specifically, INT preserves the order of data points, but not the distances between them. In particular, the relationship between the two transformed differences of conditional means in (12) depends not only on the ordering of these conditional means but also on their magnitudes. Consequently, this relationship need not be the same across variants whose original effects have the same direction. As a result, INT can induce G × E effects in either direction relative to a main genetic effect of a given sign.
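For concreteness, a minimal rank-based INT is sketched below. The Blom-type offset c = 3/8 and the simple tie handling are our choices, not necessarily those used in the analyses discussed here. The toy input shows that the order of values is preserved while the spacing is not: the largest gap in the data (8 to 100) maps to the same spacing as the smallest one (1 to 2).

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation with a Blom-type offset c (our choice).
    Ties are broken by order of appearance; real pipelines usually average tied ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()  # standard normal
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

y = [1.0, 2.0, 4.0, 8.0, 100.0]
z = rank_inverse_normal(y)
print([round(v, 3) for v in z])
```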

Previously published interaction results exhibit sign consistency property

We have examined sign consistency for several G × E studies, selecting E-outcome pairs for which interactions have previously been found (Fig 5A). More specifically, we performed TxEWAS [8,27] in the UK Biobank [28] population of unrelated white British individuals (Methods). TxEWAS tests the effect of the interaction between predicted expression of a gene G and environmental exposure E on phenotype Y using the following linear regression model:


Fig 5. Sign consistency between the main and interaction effects in TxEWAS for select E’s and outcomes.

A: Main vs interaction effects for identified genes. For each gene, we plot the estimates corresponding to the tissue with the strongest interaction p-value; the main environmental effect is also indicated. B: The fraction of G × E effects that are sign-consistent, calculated among interacting genes (called at hFDR < 10%) whose main effects were nominally significant at 5%. For each gene, the tissue with the strongest interaction p-value was considered.

https://doi.org/10.1371/journal.pgen.1012073.g005

where C_i is the i-th additional environmental covariate included in the model, Greek letters represent effect sizes, and ε is the residual error. The additional covariates included age, sex, birth date, Townsend deprivation index, and the first 16 genetic principal components (PCs) [29], each included only if not already used as E. In a single study, we performed multiple tests for a single gene, corresponding to the multiple tissues in which this gene was expressed. We then used the hierarchical FDR (hFDR) correction to call significant interactions from the aggregated results [8]. Sign consistency was examined using, for each gene, the tissue in which its interaction was strongest.
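For reference, a model of the form described above can be written out as follows. The symbol names (α, β_G, β_E, β_GE, γ_i) are our own labels for the effect sizes. G stands for the predicted expression of a gene in one tissue, and no particular error distribution is implied.

```latex
Y = \alpha + \beta_G\, G + \beta_E\, E + \beta_{GE}\, (G \times E) + \sum_{i} \gamma_i\, C_i + \varepsilon
```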

We investigated several sets of interaction effects:

- gene-sex interaction effects on the primary male sex hormone, testosterone, and on urate, the end product of purine metabolism;
- gene-smoking interaction effects on body mass index (BMI);
- gene-age interaction effects on LDL cholesterol levels;
- gene-statin interaction effects on LDL cholesterol, the primary target of statins, and on blood glucose and hemoglobin A1c, two phenotypes related to the potential side effect of statins on diabetes risk [30,31].

See Methods for phenotype definitions and preprocessing details. We observed moderate to strong evidence for sign-consistent G × E effects across these traits. More specifically, the fraction of sign-consistent G × E effects was moderate for the sex-testosterone E-outcome pair, high for the sex-urate and statin-hemoglobin A1c pairs, and maximal for the remaining studies (Fig 5B).

Such a high degree of sign consistency calls for careful interpretation, as many of these interaction effects may have been induced by the scaling of the outcome measurement or by endogeneity. Monotone convex transformations systematically amplify G × E effects whose sign is consistent with that of the corresponding main genetic effect, while attenuating interactions with the opposite sign. As a result, the degree of sign consistency after such a transformation can be high even when G × E with the opposite sign pattern is present on the untransformed scale. For example, consider an increasing convex up transformation and a positive main environmental effect. G × E effects opposing the main genetic effects are then amplified, whereas those aligned with the main genetic effects are reduced or eliminated (Fig 6A and 6B). Consistent with this intuition, our simulations show high sign consistency rates after applying monotone convex transformations to outcomes with randomly directed G × E effects (Fig 6 and S1 Fig). Even when the phenotypic variance explained by interaction effects exceeds that of the additive effects, commonly used transformations such as the logarithm or the square yield sign consistency rates exceeding 75% (Fig 6C and S1B Fig). This rate depends on the directionality and size of the interaction effects on the untransformed scale, and on the specific transformation applied. Consequently, there is no universal threshold indicating when scaling and endogenous treatment effects should be identified as major drivers of the observed G × E signal. Nevertheless, a predominance of sign-consistent interactions indicates that the results should be interpreted with care and may deserve closer examination. For comparison, our simulations show that the inverse normal transformation, which is not convex, does not alter the sign consistency rate relative to the original scale (S2 Fig).


Fig 6. Sign consistency after log-transformation of a simulated outcome with randomly directed G × E effects (Methods; see also S1 and S2 Figs).

A: Z-scores for main genetic (G) and interaction (G × E) effects estimated for the outcome before (left) and after (right) transformation. G × E effects were simulated using . B: Number of detected G × E effects for outcomes on the original and transformed scales as a function of the variance of the simulated G × E effects. C: Estimated rate of sign consistency for outcomes on the original and transformed scales as a function of the variance of the simulated G × E effects. The sign consistency rate was defined as the proportion of G × E effects exhibiting the more prevalent sign relationship with their corresponding main effects. Under this definition, the rate cannot fall below 0.5, and it may exceed 0.5 even for the untransformed outcome.

https://doi.org/10.1371/journal.pgen.1012073.g006

As a concrete example, we hypothesized that the interaction effects detected in the age-LDL cholesterol study were a consequence of endogenous treatment effects between statin use and LDL cholesterol levels. This is because the probability of taking statins, which are prescribed at high LDL cholesterol levels, increases with age, so that genetic variation associated with LDL cholesterol levels also becomes associated with age-dependent statin use. Indeed, when we included statin use in our model as a covariate, the G × E effects disappeared.

Sign consistency alone cannot determine the extent to which an observed G × E signal is driven by endogenous treatment effects. For instance, many gene-statin interactions for LDL cholesterol identified by TxEWAS were replicated in a retrospective longitudinal pharmacogenomic study [8], in which major sources of endogeneity were controlled. To determine the mechanisms underpinning such observed associations, additional analyses and experiments are necessary.

Discussion

We have demonstrated that if there is a scale on which an outcome has homogeneous variance across values of environmental factor E and genetic variant G, and these factors have only additive effects on this outcome, then the direction of the G × E effect estimated on the scale that is a monotone convex transformation of the original outcome scale is determined by the direction of the main effect of G. In addition, we have shown that endogenous treatment effects, modeled as threshold-based interventions, can only produce G × E effects with the same sign property.

A consequence of our result is that if G × E effects in both directions with respect to the main genetic effects are observed, no monotone convex transformation can eliminate all of them. Furthermore, they cannot all have been induced by endogenous treatment effects. Our results are related to prior work on conditions under which outcome scaling can eliminate interaction effects [20], especially to prior results bounding interaction effect sizes as a function of the curvature of the scaling function [32].

Our argument assumes a null interaction effect on some scale to assess the properties of a signal fully attributable to an outcome transformation. Our heuristic examines whether observed interactions are consistent with this hypothesis at a large number of loci. Although it is unreasonable in general to imagine that all variants interact in the same way relative to an environmental moderator, a monotone convex transformation of the outcome results in a high sign consistency rate even if this null hypothesis is not true for every locus. Thus, a predominance of sign-consistent interactions provides a meaningful indication that the results should be interpreted with care and may deserve closer examination.

Despite apparent similarities between endogenous treatment effects and gene-environment (G-E) correlation, the two phenomena differ. In the considered model of outcome-dependent treatment allocation, genetic factors that influence the outcome become associated with treatment status, and, critically, treatment status becomes correlated with the error term. A correlation between genetic factors and exposure alone does not induce statistical G × E effects. Although G-E correlation can produce spurious G × E signals when genetic markers such as tag SNPs are analyzed instead of the true causal variants [33], this arises from a different data-generating process and is likely a much weaker source of misleading G × E findings [34].

Sign consistency of G × E effects does not imply that they are induced by endogenous treatment effects, nor does the fact that they can be eliminated by an outcome transformation imply that the outcome should be analyzed on the transformed scale. Whenever possible, the outcome scale should be chosen to ensure that results are interpretable and practically meaningful. For example, the relevant scale may be determined by the specific mechanistic model of a biological phenomenon under study or by the public health intervention being evaluated. However, as it is generally unclear what the correct scale is for a given phenotype, examining sign consistency across observed G × E effects can help assess the extent to which a particular type of outcome transformation may alter the results. Moreover, such examination can rule out the possibility that all observed interactions can be attributed to endogenous treatment effects in studies where such effects may be present and the underlying causal mechanisms are unknown. Our analysis of real data sets demonstrates that our approach can help identify potential confounding.

To reduce inaccuracy in assessing sign consistency, we recommend assessing it among genome-wide significant interactions (or, at a minimum, among interactions passing a significance threshold set a priori), as done in the analyses presented in this paper.

We note that the homoskedasticity assumption made in our proofs is also an assumption of the linear regression model. Violation of this assumption results in a biased test for the interaction effect [8,35]. In observed data, it is particularly common for the variance of the outcome to differ across strata defined by the environmental factor [36]. Because this bias is important and incompletely characterized, we examine conditional heteroskedasticity bias comprehensively in S1 Supporting Information, where we analytically describe the conditions under which it is expected to arise and the direction of its effect. It has been established that, in the presence of heteroskedasticity, G × E should be modeled using a double generalized linear model or a standard linear model with robust standard errors [8,35].
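As a generic illustration of the robust-standard-error option (not the implementation used in [8,35]), the NumPy sketch below computes classical and HC0 (White) sandwich standard errors for the G × E term. The simulated outcome is hypothetical: it has no interaction, but its residual standard deviation doubles in a less common exposure stratum (20% exposed).

```python
import numpy as np

def ols_with_hc0(X, y):
    """OLS coefficients with classical and heteroskedasticity-robust (HC0)
    standard errors. X must already contain an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Classical SE: one pooled residual variance for all observations.
    sigma2 = (resid @ resid) / (n - p)
    se_classical = np.sqrt(np.diag(XtX_inv) * sigma2)
    # HC0 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1
    meat = X.T @ (X * (resid ** 2)[:, None])
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_classical, se_hc0

rng = np.random.default_rng(0)
n = 20_000
E = (rng.random(n) < 0.2).astype(float)          # 20% exposed (hypothetical)
G = rng.binomial(2, 0.3, n).astype(float)        # diploid genotype
y = 0.2 * G + 0.5 * E + rng.normal(0.0, 1.0 + E, n)  # residual SD 1 (E=0) or 2 (E=1)
X = np.column_stack([np.ones(n), E, G, G * E])
beta, se_cl, se_rob = ols_with_hc0(X, y)
print(f"G x E: {beta[3]:+.4f}  classical SE: {se_cl[3]:.4f}  HC0 SE: {se_rob[3]:.4f}")
```

In this configuration the classical standard error of the G × E term is noticeably smaller than the HC0 one, which is the direction of bias that inflates the false-positive rate of interaction tests.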

Methods

Sign consistency of G × E effects under monotone convex transformations of the outcome

Here we provide geometric intuition for the sign-consistent interaction property; a direct algebraic derivation is given in S1 Appendix.

Suppose that there is a scale on which a phenotype exhibits no G × E and has homogeneous variance. We show that any monotone convex transformation of this phenotype can only induce sign-consistent G × E effects (Theorem 1). The implication is that if G × E effects in both directions with respect to the main genetic effects are observed, no such transformation can eliminate all of them.

Specifically, consider phenotype Y that has homogeneous variance across values of binary environmental factor E and haploid genotype G:

where is a symmetric distribution with mean ν and variance . Consider further fitting the following linear regression model to Y:

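$$Y = \hat\beta_0 + \hat\beta_E E + \hat\beta_G G + \hat\beta_{GE}\, GE + \varepsilon \qquad (14)$$

where $\hat\beta_0$ is the intercept, $\hat\beta_E$, $\hat\beta_G$, and $\hat\beta_{GE}$ are the estimated coefficients of E, G, and GE, and ε is the error. We assume, in particular, that $\hat\beta_{GE} = 0$, that is, that Y exhibits no G × E on this scale.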

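The coefficients in (14) can be related to the empirical conditional means of Y, which we denote by points $P_{a,b}$, the mean of Y among individuals with $E = a$ and $G = b$ ($a, b \in \{0, 1\}$). Because (14) is saturated for binary E and G, its intercept equals $P_{0,0}$, the coefficient of E equals $P_{1,0} - P_{0,0}$, the coefficient of G equals $P_{0,1} - P_{0,0}$, and the coefficient of GE equals $P_{1,1} - P_{1,0} - P_{0,1} + P_{0,0}$.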

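The order of $P_{0,0}$ and $P_{0,1}$, $P_{1,0}$ and $P_{1,1}$, $P_{0,0}$ and $P_{1,0}$, and $P_{0,1}$ and $P_{1,1}$ is determined by the signs of the estimated main effects of E and G in (14). To see this, note that, because (14) is saturated, the coefficient of G equals $P_{0,1} - P_{0,0}$: if it is positive, then $P_{0,1} > P_{0,0}$, and if it is negative, then $P_{0,1} < P_{0,0}$. The same argument applied to the coefficient of E, which equals $P_{1,0} - P_{0,0}$, orders $P_{0,0}$ and $P_{1,0}$. Furthermore, by the assumption that the interaction coefficient is null, $P_{1,1} - P_{1,0} = P_{0,1} - P_{0,0}$ and $P_{1,1} - P_{0,1} = P_{1,0} - P_{0,0}$, so the remaining two pairs are ordered in the same way (Fig 7).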
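The correspondence between the coefficients of (14) and the cell means can be checked numerically. The sketch below, with hypothetical effect sizes and a simulated additive outcome, fits (14) by least squares and compares the estimates with the cell-mean contrasts; because the model is saturated for binary E and G, they agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
E = rng.integers(0, 2, n)          # binary environment
G = rng.integers(0, 2, n)          # haploid genotype (0/1)
Y = 1.0 + 0.7 * E - 0.4 * G + rng.normal(0.0, 1.0, n)   # hypothetical additive outcome

# Fit regression (14) by ordinary least squares.
X = np.column_stack([np.ones(n), E, G, G * E])
b0, bE, bG, bGE = np.linalg.lstsq(X, Y, rcond=None)[0]

# Empirical cell means P[a, b] = mean of Y given E = a, G = b.
P = np.array([[Y[(E == a) & (G == b)].mean() for b in (0, 1)] for a in (0, 1)])

# Saturated model: each coefficient is an exact contrast of the cell means.
assert np.isclose(b0, P[0, 0])
assert np.isclose(bE, P[1, 0] - P[0, 0])
assert np.isclose(bG, P[0, 1] - P[0, 0])
assert np.isclose(bGE, P[1, 1] - P[1, 0] - P[0, 1] + P[0, 0])
print("estimated G x E on the original scale:", bGE)
```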


Fig 7. The order of points P_a,b.

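When the interaction coefficient in (14) is zero, the signs of the coefficients of E and G in (14) determine the order of $P_{0,0}$ and $P_{0,1}$, $P_{1,0}$ and $P_{1,1}$, $P_{0,0}$ and $P_{1,0}$, and $P_{0,1}$ and $P_{1,1}$. Here, $P_{a,b}$ denotes the mean of Y among individuals with $E = a$ and $G = b$.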

https://doi.org/10.1371/journal.pgen.1012073.g007

We will use this fact to show that a regression similar to (14) on a monotone convex transformation of Y yields G × E effects whose directions depend on the directions of the corresponding main genetic effects.

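Consider Y transformed by a monotone convex function h, and a linear regression of the transformed outcome h(Y) on E, G, and GE:

$$h(Y) = \hat\gamma_0 + \hat\gamma_E E + \hat\gamma_G G + \hat\gamma_{GE}\, GE + \varepsilon_h \qquad (15)$$

We can relate the coefficients in (15) to the points $P_{0,0}$, $P_{0,1}$, $P_{1,0}$, and $P_{1,1}$ that we have defined on the original scale of Y:

$$\hat\gamma_G = \int \left[h(P_{0,1} + x) - h(P_{0,0} + x)\right] f(x)\,dx,$$

and likewise for $\hat\gamma_{GE}$:

$$\hat\gamma_{GE} = \int \left\{\left[h(P_{1,1} + x) - h(P_{1,0} + x)\right] - \left[h(P_{0,1} + x) - h(P_{0,0} + x)\right]\right\} f(x)\,dx,$$

where f is the PDF of the error ε in (14), whose distribution is the same in all four cells. Note that each point $P_{a,b}$ above is always shifted by the same value x, and that the signs of the above integrands are invariant to this shift if h is monotone convex:

$$\operatorname{sign}\left[h(P_{a,1} + x) - h(P_{a,0} + x)\right] = \operatorname{sign}\left[h(P_{a,1}) - h(P_{a,0})\right], \quad a \in \{0, 1\}, \qquad (16)$$

$$\operatorname{sign}\left\{\left[h(P_{1,1} + x) - h(P_{1,0} + x)\right] - \left[h(P_{0,1} + x) - h(P_{0,0} + x)\right]\right\} = \operatorname{sign}\left\{\left[h(P_{1,1}) - h(P_{1,0})\right] - \left[h(P_{0,1}) - h(P_{0,0})\right]\right\}. \qquad (17)$$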

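Without loss of generality, suppose that the transformation h is increasing convex down. (In the terminology used here, a convex down function, such as the exponential, has a positive second derivative, whereas a convex up function, such as the logarithm, has a negative one.) To determine the sign of the G × E coefficient in (15), we need to know the signs of the differences $h(P_{1,1}) - h(P_{1,0})$ and $h(P_{0,1}) - h(P_{0,0})$ and the relation between their magnitudes. Since $P_{1,0}$ and $P_{1,1}$ are shifted from $P_{0,0}$ and $P_{0,1}$ by the same value, the main effect of E in (14), the two differences have the same sign, which, by (16), follows the sign of the main effect of G in (14) (Fig 8A). The relation between their magnitudes depends on the sign of the main effect of E. If it is positive, the magnitude of $h(P_{1,1}) - h(P_{1,0})$ is greater than that of $h(P_{0,1}) - h(P_{0,0})$, and the opposite is true if it is negative.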
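A compact way to see why the magnitude comparison depends only on the curvature of h and on the signs of the two main effects is the following identity, which we add as an illustration under the extra assumption that h is twice differentiable. Write $b_G = P_{0,1} - P_{0,0}$ and $b_E = P_{1,0} - P_{0,0}$ for the main effects of G and E in (14), so that the null-interaction condition gives $P_{1,1} = P_{0,0} + b_E + b_G$. Then

$$\left[h(P_{1,1}) - h(P_{1,0})\right] - \left[h(P_{0,1}) - h(P_{0,0})\right] = \int_{0}^{b_E}\!\!\int_{0}^{b_G} h''(P_{0,0} + u + v)\,dv\,du,$$

with the integrals taken as oriented (negative limits reverse the sign). If $h'' > 0$ everywhere (convex down), the right-hand side has the sign of $b_E\, b_G$; if $h'' < 0$ everywhere (convex up), it has the opposite sign. Replacing $P_{0,0}$ by $P_{0,0} + x$ does not change this conclusion, which is why the sign in (17) is unaffected by the shift.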


Fig 8. Increasing convex transformations of the outcome induce G × E effects whose direction is determined by the direction of the main genetic effects, if the untransformed outcome exhibits no G × E and has homogeneous variance in regression (14).

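A: The relation between the signs of the main genetic effect and the G × E effect on the transformed scale when the main effects of E and G in (14) are positive and the transformation is increasing convex down. B: Similar to A, but when the main effect of G is negative. C: Similar to A, but when the transformation is increasing convex up. D: Similar to A, but when the main effect of G is negative and the transformation is increasing convex up.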

https://doi.org/10.1371/journal.pgen.1012073.g008

If two genetic variants, G₁ and G₂, are each regressed like G in (15), the assumptions of model (14) are met in both cases, and the estimated main effect of E has the same sign in both regressions, then the sign of the G × E coefficient differs between these regressions only if the sign of the main genetic effect differs.

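For example, when the transformation h is increasing convex down and the main effect of E in (14) is positive:

a positive main effect of G implies a positive G × E coefficient in (15), because both $h(P_{1,1}) - h(P_{1,0})$ and $h(P_{0,1}) - h(P_{0,0})$ are positive and the former is larger in magnitude (Fig 8A);
a negative main effect of G implies a negative G × E coefficient, because both differences are negative and the former is again larger in magnitude (Fig 8B).

Therefore, in this example, the G × E coefficient has the same sign as the main genetic effect.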

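Similarly, when the main effect of E is positive but h is increasing convex up:

a positive main effect of G implies a negative G × E coefficient, because both differences are positive and $h(P_{1,1}) - h(P_{1,0})$ is smaller in magnitude than $h(P_{0,1}) - h(P_{0,0})$ (Fig 8C);
a negative main effect of G implies a positive G × E coefficient, because both differences are negative and the former is again smaller in magnitude (Fig 8D).

Therefore, in this example, the G × E coefficient has the sign opposite to that of the main genetic effect.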
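The case analysis in this section, including the cases obtained by negating h or the main effect of E, can be checked numerically. The Python sketch below is our own illustration: it assumes Gaussian noise, uses exponential-type transformations (chosen because they are defined on the whole real line), computes the coefficients of the saturated regression of h(Y) on E, G, and GE by Gauss-Hermite quadrature, and compares the sign of the G × E coefficient with the product of the sign of h'' and the signs of the main effects of G and E on the transformed scale. The effect sizes and noise level are hypothetical.

```python
import itertools
import math
import numpy as np

# Probabilists' Gauss-Hermite rule: E[g(Z)] ~= sum(w * g(x)) / sqrt(2*pi), Z ~ N(0, 1).
nodes, weights = np.polynomial.hermite_e.hermegauss(60)

def transformed_coefficients(h, beta_e, beta_g, sd=0.5):
    """Main effects of E and G and the G x E coefficient for h(Y), where Y is additive
    with cell means a*beta_e + b*beta_g and Gaussian noise with standard deviation sd."""
    def m(a, b):  # E[h(Y) | E = a, G = b]
        p = a * beta_e + b * beta_g
        return float(np.sum(weights * h(p + sd * nodes)) / math.sqrt(2.0 * math.pi))
    g_e = m(1, 0) - m(0, 0)
    g_g = m(0, 1) - m(0, 0)
    g_ge = m(1, 1) - m(1, 0) - m(0, 1) + m(0, 0)
    return g_e, g_g, g_ge

# transformation -> sign of its second derivative (+1 convex down, -1 convex up)
transforms = {
    "exp(y)": (np.exp, +1),                   # increasing convex down
    "-exp(-y)": (lambda y: -np.exp(-y), -1),  # increasing convex up
    "-exp(y)": (lambda y: -np.exp(y), -1),    # decreasing convex up
    "exp(-y)": (lambda y: np.exp(-y), +1),    # decreasing convex down
}

for (name, (h, curvature)), be, bg in itertools.product(
        transforms.items(), (0.8, -0.8), (0.3, -0.3)):
    g_e, g_g, g_ge = transformed_coefficients(h, be, bg)
    predicted = curvature * np.sign(g_g) * np.sign(g_e)
    print(f"{name:9s} beta_E={be:+.1f} beta_G={bg:+.1f} "
          f"sign(GxE)={np.sign(g_ge):+.0f} predicted={predicted:+.0f}")
```

Quadrature is used instead of simulation so that the check is deterministic; with Gaussian noise and exponential-type h, the expectations could also be written in closed form.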

Note that if h is increasing convex down, −h is decreasing convex up, and if h is increasing convex up, −h is decreasing convex down. Negating the transformation inverts the directions of both the main genetic effect and the G × E effect in (15) (see (16) and (17)). Thus, each of these pairs of transformations induces the same relation between the signs of the two effects.

Finally, changing the sign of the main effect of E inverts the relation between the magnitudes of the differences $h(P_{1,1}) - h(P_{1,0})$ and $h(P_{0,1}) - h(P_{0,0})$, which, for a given transformation, inverts the relation between the signs of the main genetic effect and the G × E effect. We summarize all possible cases in Table 2.


Table 2. Monotone convex transformations induce G × E effects whose directions are consistent with the directions of the observed main effects.

https://doi.org/10.1371/journal.pgen.1012073.t002
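The case analysis above can be collected into a small table. The following is our own summary of that argument, not a reproduction of Table 2: each entry states whether the G × E coefficient in (15) has the same sign as, or the opposite sign to, the main genetic effect in (15), assuming model (14) with a null interaction and using the convex up/convex down terminology of the text.

$$
\begin{array}{lcc}
\text{transformation } h & \text{main effect of } E \text{ in (14)} > 0 & \text{main effect of } E \text{ in (14)} < 0 \\
\hline
\text{increasing convex down} & \text{same} & \text{opposite} \\
\text{increasing convex up} & \text{opposite} & \text{same} \\
\text{decreasing convex up} & \text{same} & \text{opposite} \\
\text{decreasing convex down} & \text{opposite} & \text{same}
\end{array}
$$

Each pair of rows related by negation ($h \mapsto -h$) shares a pattern, and flipping the sign of the main effect of E swaps the two columns, as described in the text.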

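In S1 Appendix, we show that the distinction between increasing and decreasing transformations is absorbed into the observed coefficients $\hat\gamma_G$ and $\hat\gamma_E$ of (15), yielding the unified sign rule $\operatorname{sign}(\hat\gamma_{GE}) = \kappa \cdot \operatorname{sign}(\hat\gamma_G) \cdot \operatorname{sign}(\hat\gamma_E)$, where κ = +1 for convex down and κ = −1 for convex up transformations. In practice, this formula can be applied directly using the estimated coefficients without needing to determine whether the transformation is increasing or decreasing.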

Sign consistency of G × E effects under a threshold-based model of endogenous treatment effects

Consider the following model of phenotype Y:

$$Y = \sum_i \beta_i G_i + \varepsilon,$$

where G_i indicates the presence of an alternative allele at variant i, β_i is the effect of this allele on Y, and ε is the environmental noise. We assume that the genotypes are independent, $G_i \perp G_k$ for $i \neq k$, and that the environmental noise is homoskedastic, $\operatorname{Var}(\varepsilon \mid G_1, G_2, \ldots) = \sigma_\varepsilon^2$. Note that, unlike in our previous derivation, where no specific generating model is assumed, here we assume that this is the actual generating process. Without loss of generality, let .

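Suppose that if the level of Y is high, treatment E is administered:

$$E = \mathbb{1}(Y > t),$$

where t is some threshold. When applied, treatment E changes the level of Y by δ:

$$\tilde{Y} = Y + \delta E.$$

Suppose further that we observe phenotype $\tilde{Y}$ and test the effect of the interaction between variant G_j and environmental factor E on this phenotype:

$$\tilde{Y} = \hat\beta_0 + \hat\beta_E E + \hat\beta_G G_j + \hat\beta_{GE}\, G_j E + \varepsilon' \qquad (18)$$

We prove that the sign of $\hat\beta_{GE}$ is determined by the sign of the main effect of G_j (Claim 1).

Note that the coefficients $\hat\beta_G$ and $\hat\beta_{GE}$ can be related to empirical conditional expectations of $\tilde{Y}$:

$$\hat\beta_G = \mathbb{E}[\tilde{Y} \mid E = 0, G_j = 1] - \mathbb{E}[\tilde{Y} \mid E = 0, G_j = 0],$$
$$\hat\beta_{GE} = \mathbb{E}[\tilde{Y} \mid E = 1, G_j = 1] - \mathbb{E}[\tilde{Y} \mid E = 1, G_j = 0] - \hat\beta_G.$$

Because δ is added to every treated individual, it cancels from both expressions, so $\tilde{Y}$ can be replaced by Y. Furthermore, note that phenotype Y conditioned on the value of E (and on G_j) has a truncated normal distribution, and its conditional mean is given by:

$$\mathbb{E}[Y \mid E = 0] = \mu - \sigma\,\frac{\phi(\alpha)}{\Phi(\alpha)}, \qquad \mathbb{E}[Y \mid E = 1] = \mu + \sigma\,\frac{\phi(\alpha)}{1 - \Phi(\alpha)}, \qquad \alpha = \frac{t - \mu}{\sigma},$$

where φ is the probability density function and Φ is the cumulative distribution function of the standard normal distribution, μ is the mean of Y, and σ is its standard deviation. Furthermore, the value of μ depends on the genotype G_j:

$$\mu_g = \mathbb{E}[Y \mid G_j = g] = \mu_0 + \beta_j g, \qquad g \in \{0, 1\},$$

where β_j is the effect of G_j on Y; because the genotypes are independent and the noise is homoskedastic, σ does not depend on G_j.

To simplify the above expressions, we denote the inverse Mills ratio $\lambda(x) = \phi(x)/\Phi(x)$, and note that $\phi(x)/(1 - \Phi(x)) = \lambda(-x)$, because φ is even and $1 - \Phi(x) = \Phi(-x)$. Furthermore, we define the points $P_0 = (t - \mu_0)/\sigma$ and $P_1 = (t - \mu_1)/\sigma$, and express the estimated effects $\hat\beta_G$ and $\hat\beta_{GE}$ in (18) as functions of these points:

$$\hat\beta_G = \sigma\left[(P_0 - P_1) - \left(\lambda(P_1) - \lambda(P_0)\right)\right] \qquad (19)$$

$$\hat\beta_{GE} = \sigma\left[\left(\lambda(-P_1) - \lambda(-P_0)\right) + \left(\lambda(P_1) - \lambda(P_0)\right)\right] \qquad (20)$$

The function λ is decreasing and strictly convex down [26] (Fig 9). As a result, the order of points P₀ and P₁ determines the signs of $\hat\beta_G$ and $\hat\beta_{GE}$. Without loss of generality, let t > 0. Since t distinguishes “high” from “normal” levels of phenotype Y, it is reasonable to assume that the means μ₀ and μ₁ are smaller than t (note that variants G_i are independent), so that P₀ and P₁ are positive; that is, no individual SNP results in a level of Y high enough to receive the treatment, as is likely for any polygenic trait. There are therefore two possible cases:

β_j > 0, which imposes the following order on the points used in definitions (19) and (20): $-P_0 < -P_1 < P_1 < P_0$ (Fig 9A). Given this order and the properties of λ, we have: 1) $\lambda(-P_0) > \lambda(-P_1)$, $\lambda(P_1) > \lambda(P_0)$, and $\lambda(-P_0) - \lambda(-P_1) > \lambda(P_1) - \lambda(P_0)$, because the two intervals have equal length and the decreasing convex λ is steeper on the left, which implies that $\hat\beta_{GE}$ is negative; and 2) $\lambda(P_1) - \lambda(P_0) < P_0 - P_1$, because the slope of λ lies between −1 and 0, which implies that $\hat\beta_G$ is positive.
β_j < 0, which results in: $-P_1 < -P_0 < P_0 < P_1$ (Fig 9B). Given this order and the properties of λ, we have: 1) $\lambda(-P_1) > \lambda(-P_0)$, $\lambda(P_0) > \lambda(P_1)$, and $\lambda(-P_1) - \lambda(-P_0) > \lambda(P_0) - \lambda(P_1)$, which implies that $\hat\beta_{GE}$ is positive; and 2) $\lambda(P_0) - \lambda(P_1) < P_1 - P_0$, which implies that $\hat\beta_G$ is negative.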
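The two coefficients can be evaluated directly from the truncated-normal means. The sketch below is our own illustration: it uses $\lambda(x) = \phi(x)/\Phi(x)$, $P_0 = (t - \mu_0)/\sigma$, and $P_1 = (t - \mu_0 - \beta_j)/\sigma$, and computes the large-sample main effect $\sigma[(P_0 - P_1) - (\lambda(P_1) - \lambda(P_0))]$ and interaction $\sigma[(\lambda(-P_1) - \lambda(-P_0)) + (\lambda(P_1) - \lambda(P_0))]$ for hypothetical values of $\mu_0$, $\beta_j$, and t, using only the standard library.

```python
import math

def lam(x):
    """Inverse Mills ratio in the form phi(x) / Phi(x) (decreasing and convex)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-x / math.sqrt(2.0))
    return phi / Phi

def threshold_model_coefficients(mu0, beta_j, t, sigma=1.0):
    """Large-sample main effect of G_j and G_j x E coefficient when treatment is
    given to individuals with Y > t and Y given G_j is normal with SD sigma."""
    p0 = (t - mu0) / sigma
    p1 = (t - (mu0 + beta_j)) / sigma
    beta_g = sigma * ((p0 - p1) - (lam(p1) - lam(p0)))
    beta_ge = sigma * ((lam(-p1) - lam(-p0)) + (lam(p1) - lam(p0)))
    return beta_g, beta_ge

for beta_j in (0.3, -0.3):
    b_g, b_ge = threshold_model_coefficients(mu0=0.0, beta_j=beta_j, t=1.5)
    print(f"beta_j={beta_j:+.1f}: main effect {b_g:+.4f}, G x E {b_ge:+.4f}")
```

For either sign of β_j, the main effect keeps the sign of β_j and is attenuated toward zero, while the interaction takes the opposite sign.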


Fig 9. Endogenous treatment effects induce sign-consistent G × E effects.

A: The x-axis shows the quantities P₀ and P₁ (and their negatives) defined for phenotype Y affected by the haploid genetic variant G_j, when the effect of G_j on Y is positive. The main effect of G_j and the effect of G_jE on phenotype Y, after treatment E is applied to reduce levels of Y that exceed threshold t, can be expressed as functions of P₀ and P₁ and their images under the inverse Mills ratio. The signs of these functions are linked. B: Similar to A, but when the effect of G_j on Y is negative.

https://doi.org/10.1371/journal.pgen.1012073.g009

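We have, therefore, shown that:

$$\operatorname{sign}(\hat\beta_{GE}) = -\operatorname{sign}(\hat\beta_G), \qquad (21)$$

which means that the estimated main effect of G_j, $\hat\beta_G$, and the estimated G_j × E effect, $\hat\beta_{GE}$, in regression (18) have opposite directions. It can be shown analogously that relation (21) also holds when treatment E is administered whenever the level of phenotype Y is below threshold t.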
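The threshold mechanism can also be reproduced by Monte Carlo simulation. In the sketch below, which is our own illustration with hypothetical parameter values, a single standard normal term stands in for the polygenic background plus environmental noise (normality is what the truncated-normal argument requires), treatment is given whenever Y exceeds t, and regression (18) is fitted by least squares to the treated phenotype.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 400_000
beta_j, t, delta = 0.3, 1.5, -1.0        # hypothetical effect, threshold, treatment effect

G_j = rng.integers(0, 2, n)              # haploid variant of interest
background = rng.normal(0.0, 1.0, n)     # other variants plus environmental noise
Y = beta_j * G_j + background            # untreated phenotype
E = (Y > t).astype(float)                # treatment given when Y exceeds t
Y_obs = Y + delta * E                    # observed, treated phenotype

# Regression (18): Y_obs on E, G_j and G_j * E by ordinary least squares.
X = np.column_stack([np.ones(n), E, G_j, G_j * E])
b0, bE, bG, bGE = np.linalg.lstsq(X, Y_obs, rcond=None)[0]
print(f"main effect of G_j: {bG:+.3f}, G_j x E: {bGE:+.3f}")
```

Although G_j has a purely additive effect on Y, the fitted interaction is negative while the main effect is positive; changing delta does not affect either coefficient, because the treatment shift cancels within the treated stratum.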

Simulation details

We conducted simulations to assess the sign consistency of estimated G × E effects after applying monotone convex transformations to outcomes that exhibit G × E on the original scale. In each simulation, outcomes were generated from a model with additive genetic effects and G × E effects, in which E is a binary environmental exposure, X is a matrix of 200 independent diploid SNPs, and the interaction term is the element-wise product of E with each column of X. Additive genetic effects were drawn from a standard normal distribution. We randomly selected 100 SNPs to also have interaction effects, half of them aligned in sign and half opposed in sign relative to their corresponding main effects. Interaction effects for these 100 SNPs were drawn from a Gaussian distribution with mean zero and a variance that was varied across simulation settings. The residual variance was scaled to achieve a heritability of 0.5 among samples with E = 0. A total of 10,000 samples were simulated (5,000 per environmental condition). Statistical significance of G × E effects was assessed at a 5% false positive rate using single-SNP regressions applied to the original outcome Y and to the transformed outcome.
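A scaled-down sketch of this design is given below. It is not the authors' simulation code, and several details are our assumptions and are marked as such in the code: an allele frequency of 0.5, a hypothetical main effect of E (`beta_env`), the way aligned and opposed signs are assigned, and a shift applied before the logarithm so that its argument is positive. The function returns the number of detected G × E effects and their sign consistency rate for one replicate.

```python
import math
import numpy as np

def simulate_sign_consistency(var_gxe, transform, n=10_000, m=200, n_gxe=100,
                              beta_env=10.0, seed=0):
    """One replicate of a scaled-down version of the simulation design.
    Returns (number of detected G x E effects, sign consistency rate)."""
    rng = np.random.default_rng(seed)
    E = np.repeat([0.0, 1.0], n // 2)                          # equal group sizes
    X = rng.binomial(2, 0.5, size=(E.size, m)).astype(float)   # allele frequency 0.5: assumption
    beta = rng.normal(0.0, 1.0, m)                             # additive effects ~ N(0, 1)
    gamma = np.zeros(m)
    idx = rng.choice(m, n_gxe, replace=False)                  # SNPs that also interact with E
    magnitude = np.abs(rng.normal(0.0, math.sqrt(var_gxe), n_gxe))
    direction = np.where(np.arange(n_gxe) < n_gxe // 2, 1.0, -1.0)  # half aligned, half opposed
    gamma[idx] = direction * np.sign(beta[idx]) * magnitude
    genetic = X @ beta
    resid_sd = math.sqrt(np.var(genetic[E == 0]))              # heritability 0.5 when E = 0
    # beta_env (main effect of E) is a hypothetical choice, not taken from the paper.
    y = beta_env * E + genetic + E * (X @ gamma) + rng.normal(0.0, resid_sd, E.size)
    y = transform(y)

    mains, gxes = [], []
    for j in range(m):                                         # single-SNP regressions
        D = np.column_stack([np.ones(E.size), E, X[:, j], E * X[:, j]])
        coef = np.linalg.lstsq(D, y, rcond=None)[0]
        res = y - D @ coef
        var_coef = np.linalg.inv(D.T @ D) * (res @ res) / (E.size - D.shape[1])
        z = coef[3] / math.sqrt(var_coef[3, 3])
        if math.erfc(abs(z) / math.sqrt(2.0)) < 0.05:          # two-sided p-value < 0.05
            mains.append(coef[2])
            gxes.append(coef[3])
    if not gxes:
        return 0, float("nan")
    same = float(np.mean(np.sign(mains) == np.sign(gxes)))
    return len(gxes), max(same, 1.0 - same)

# Hypothetical usage: original scale vs. a shifted log transform.
shifted_log = lambda y: np.log(y - y.min() + 1.0)
print(simulate_sign_consistency(1.0, lambda y: y))
print(simulate_sign_consistency(1.0, shifted_log))
```

Repeating the call over seeds and over a grid of `var_gxe` values, and summarizing with the mean and standard deviation, mirrors the replicate structure described in the next paragraph.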

To estimate the number of detected G × E effects and the sign consistency rate, 100 independent replicates were performed for each value of the G × E effect variance, and results were summarized using the mean and standard deviation.

TxEWAS in the UK Biobank

The TxEWAS presented in this work were performed following the protocol of Sadowski et al. [27]. The studied UK Biobank population of 342,257 unrelated white British individuals was identified by following the steps described by Sadowski et al. [8].

We imputed gene expression into the UK Biobank using eQTL weights trained in 48 tissues of the Genotype-Tissue Expression (GTEx v7) project, which are linked from the TxEWAS protocol [27]. Hierarchical FDR (hFDR < 10%) was used to account for multiple hypothesis testing across genes and tissues [37,38].
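For readers unfamiliar with hierarchical error control, the sketch below shows a two-level procedure in the spirit of [37,38]; it is not necessarily the exact configuration used here. Genes are first selected by Benjamini-Hochberg (BH) applied to Simes-combined tissue p-values, and tissues within each selected gene are then tested by BH at the Benjamini-Bogomolov adjusted level q·R/M, where R genes are selected out of M. The gene names and p-values in the usage example are hypothetical.

```python
import numpy as np

def bh_reject(pvals, q):
    """Benjamini-Hochberg: boolean mask of hypotheses rejected at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

def simes(pvals):
    """Simes combination of the tissue-level p-values of one gene."""
    p = np.sort(np.asarray(pvals, dtype=float))
    return float(np.min(p.size * p / np.arange(1, p.size + 1)))

def hierarchical_fdr(gene_tissue_pvals, q=0.10):
    """Select genes by BH on Simes p-values, then tissues within each selected
    gene by BH at the adjusted level q * R / M."""
    genes = list(gene_tissue_pvals)
    selected = bh_reject([simes(gene_tissue_pvals[g]) for g in genes], q)
    R, M = int(selected.sum()), len(genes)
    return {g: bh_reject(gene_tissue_pvals[g], q * R / M)
            for g, keep in zip(genes, selected) if keep}

# Hypothetical gene -> per-tissue interaction p-values.
pvals = {"GENE_A": [1e-6, 0.2, 0.03], "GENE_B": [0.5, 0.8], "GENE_C": [0.004, 0.01]}
for gene, tissues in hierarchical_fdr(pvals).items():
    print(gene, tissues.tolist())
```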

Individuals who took statins were identified by codes 1140861958, 1140861970, 1141146138, 1140888594, 1140888648, 1140910632, 1140910654, 1141146234, 1141192410, 1141192414, 1141188146, 1140881748, and 1140864592 in UK Biobank field 20003 (entries 20003-0.0 through 20003-0.47). Smoking status was derived from UK Biobank field 20116-0.0 by encoding the “current” category as 1 and the “never” and “previous” categories as 0.

For all tested outcomes except testosterone, we discarded measurements more than five standard deviations from the mean, assuming that such extreme levels resulted from circumstances not captured by our model. The distribution of testosterone levels was bimodal, but the sign consistency pattern for this phenotype presented in Fig 5A remained similar after an inverse normal transformation.

We included age, sex, birth date, Townsend deprivation index, and the first 16 genetic PCs [29] as covariates in our studies. All non-binary covariates were standardized (transformed to mean zero, variance one) before calculating interaction variables.
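A minimal NumPy sketch of these preprocessing steps follows. The function names are hypothetical helpers, NaN marks a discarded measurement, a covariate is treated as binary when it takes at most two distinct values, and the last function assumes, for illustration, that covariate-by-exposure products are among the interaction variables.

```python
import numpy as np

def drop_extreme(values, k=5.0):
    """Set measurements more than k standard deviations from the mean to NaN."""
    v = np.asarray(values, dtype=float)
    z = (v - np.nanmean(v)) / np.nanstd(v)
    return np.where(np.abs(z) > k, np.nan, v)

def standardize_non_binary(covariates):
    """Standardize (mean 0, variance 1) every column with more than two distinct values."""
    C = np.asarray(covariates, dtype=float).copy()
    for j in range(C.shape[1]):
        if np.unique(C[:, j]).size > 2:
            C[:, j] = (C[:, j] - C[:, j].mean()) / C[:, j].std()
    return C

def interaction_terms(covariates, exposure):
    """Products of each (already standardized) covariate column with the exposure."""
    return np.asarray(covariates, dtype=float) * np.asarray(exposure, dtype=float)[:, None]

# Toy data: one binary covariate (e.g. sex) and one continuous covariate (e.g. age).
C = np.column_stack([np.tile([0.0, 1.0], 50), np.arange(100.0)])
C_std = standardize_non_binary(C)
E = np.tile([1.0, 0.0, 0.0, 1.0], 25)
CxE = interaction_terms(C_std, E)
```

Standardizing before forming the products keeps the main-effect coefficients interpretable at the covariate means and avoids unnecessary collinearity between each covariate and its interaction term.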

Supporting information

S1 Appendix. Derivation of the sign-consistent interaction property.

https://doi.org/10.1371/journal.pgen.1012073.s001

(PDF)

S1 File. Supporting Information.

Supplementary notes.

https://doi.org/10.1371/journal.pgen.1012073.s002

(PDF)

S1 Fig. Sign consistency after square-transformation of a simulated outcome with randomly directed G × E effects (Methods).

A: Number of detected G × E effects for outcomes on the original and transformed scales as a function of the variance of the simulated G × E effects. B: Estimated rate of sign consistency for outcomes on the original and transformed scales as a function of the variance of the simulated G × E effects. The sign consistency rate was defined as the proportion of G × E effects exhibiting the more prevalent sign relationship with their corresponding main effects. Under this definition, the rate cannot fall below 0.5, and it may exceed 0.5 even for the untransformed outcome.

https://doi.org/10.1371/journal.pgen.1012073.s003

(PDF)

S2 Fig. Sign consistency after inverse normal transformation of a simulated outcome with randomly directed G × E effects (Methods).

A: Number of detected G × E effects for outcomes on the original and transformed scales as a function of the variance of the simulated G × E effects. B: Estimated rate of sign consistency for outcomes on the original and transformed scales as a function of the variance of the simulated G × E effects. The sign consistency rate was defined as the proportion of G × E effects exhibiting the more prevalent sign relationship with their corresponding main effects. Under this definition, the rate cannot fall below 0.5, and it may exceed 0.5 even for the untransformed outcome.

https://doi.org/10.1371/journal.pgen.1012073.s004

(PDF)

References

1. Hillert A, Anikster Y, Belanger-Quintana A, Burlina A, Burton BK, Carducci C, et al. The Genetic Landscape and Epidemiology of Phenylketonuria. Am J Hum Genet. 2020;107(2):234–50.
2. Rampersaud E, Mitchell BD, Pollin TI, Fu M, Shen H, O’Connell JR, et al. Physical activity and the association of common FTO gene variants with body mass index and obesity. Arch Intern Med. 2008;168(16):1791–7. pmid:18779467
3. Freedman ND, Silverman DT, Hollenbeck AR, Schatzkin A, Abnet CC. Association between smoking and risk of bladder cancer among men and women. JAMA. 2011;306(7):737–45. pmid:21846855
4. Pirmohamed M. Pharmacogenomics: current status and future perspectives. Nat Rev Genet. 2023;24(6):350–62.
5. Franczyk B, Rysz J, Gluba-Brzózka A. Pharmacogenetics of Drugs Used in the Treatment of Cancers. Genes (Basel). 2022;13(2):311. pmid:35205356
6. Wang C-W, Preclaro IAC, Lin W-H, Chung W-H. An Updated Review of Genetic Associations With Severe Adverse Drug Reactions: Translation and Implementation of Pharmacogenomic Testing in Clinical Practice. Front Pharmacol. 2022;13:886377. pmid:35548363
7. Takeuchi F, McGinnis R, Bourgeois S, Barnes C, Eriksson N, Soranzo N, et al. A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose. PLoS Genet. 2009;5(3):e1000433. pmid:19300499
8. Sadowski M, Thompson M, Mefford J, Haldar T, Oni-Orisan A, Border R, et al. Characterizing the genetic architecture of drug response using gene-context interaction methods. Cell Genom. 2024;4(12):100722. pmid:39637863
9. Marderstein AR, Kulm S, Peng C, Tamimi R, Clark AG, Elemento O. A polygenic-score-based approach for identification of gene-drug interactions stratifying breast cancer risk. Am J Hum Genet. 2021;108(9):1752–64. pmid:34363748
10. Miao J, Lin Y, Wu Y, Zheng B, Schmitz LL, Fletcher JM, et al. A quantile integral linear model to quantify genetic effects on phenotypic variability. Proc Natl Acad Sci U S A. 2022;119(39):e2212959119. pmid:36122202
11. Zhu C, Ming MJ, Cole JM, Edge MD, Kirkpatrick M, Harpak A. Amplification is the primary mode of gene-by-sex interaction in complex human traits. Cell Genom. 2023;3(5):100297. pmid:37228747
12. Durvasula A, Price AL. Distinct explanations underlie gene-environment interactions in the UK Biobank. Am J Hum Genet. 2025;112(3):644–58. pmid:39965571
13. Pazokitoroudi A, Liu Z, Dahl A, Zaitlen N, Rosset S, Sankararaman S. A scalable and robust variance components method reveals insights into the architecture of gene-environment interactions underlying complex traits. Am J Hum Genet. 2024;111(7):1462–80. pmid:38866020
14. Zhu X, Yang Y, Lorincz-Comi N, Li G, Bentley AR, de Vries PS, et al. An approach to identify gene-environment interactions and reveal new biological insight in complex traits. Nat Commun. 2024;15(1):3385. pmid:38649715
15. Di Scipio M, Khan M, Mao S, Chong M, Judge C, Pathan N, et al. A versatile, fast and unbiased method for estimation of gene-by-environment interaction effects on biobank-scale datasets. Nat Commun. 2023;14(1):5196. pmid:37626057
16. Wang X, Elston RC, Zhu X. The meaning of interaction. Hum Hered. 2010;70(4):269–77. pmid:21150212
17. Thompson WD. Effect modification and the limits of biological inference from epidemiologic data. J Clin Epidemiol. 1991;44(3):221–32. pmid:1999681
18. Greenland S. Interactions in epidemiology: relevance, identification, and estimation. Epidemiology. 2009;20(1):14–7. pmid:19234397
19. Gauderman WJ, Mukherjee B, Aschard H, Hsu L, Lewinger JP, Patel CJ, et al. Update on the State of the Science for Analytical Methods for Gene-Environment Interactions. Am J Epidemiol. 2017;186(7):762–70. pmid:28978192
20. Sverdlov S, Thompson EA. The epistasis boundary: Linear vs. nonlinear genotype-phenotype relationships. bioRxiv. 2018;2018:503466.
21. Sverdlov S, Thompson E. Combinatorial Methods for Epistasis and Dominance. J Comput Biol. 2017;24(4):267–79.
22. Ottman R. Gene-environment interaction: definitions and study designs. Prev Med. 1996;25(6):764–70. pmid:8936580
23. Dick DM. Gene-environment interaction in psychological traits and disorders. Annu Rev Clin Psychol. 2011;7:383–409. pmid:21219196
24. Barcellos SH, Carvalho LS, Turley P. Education can reduce health differences related to genetic risk of obesity. Proc Natl Acad Sci U S A. 2018;115(42):E9765–72. pmid:30279179
25. Westerman KE, Sofer T. Many roads to a gene-environment interaction. Am J Hum Genet. 2024;111(4):626–35. pmid:38579668
26. Sampford MR. Some Inequalities on Mill’s Ratio and Related Functions. Ann Math Statist. 1953;24(1):130–2.
27. Sadowski M, Dahl AW, Zaitlen N. Protocol to estimate the heritability of drug response with GxEMM and identify gene-drug interactions with TxEWAS. STAR Protoc. 2025;6(2):103780. pmid:40249708
28. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–9. pmid:30305743
29. Privé F, Aschard H, Carmi S, Folkersen L, Hoggart C, O’Reilly PF, et al. Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am J Hum Genet. 2022;109(1):12–23. pmid:34995502
30. Preiss D, Welsh P, Murphy SA, Ho JE, Waters DD, Demicco DA. Risk of incident diabetes with intensive-dose. JAMA. 2011;305(24):2556–64.
31. Collins R, Reith C, Emberson J, Armitage J, Baigent C, Blackwell L, et al. Interpretation of the evidence for the efficacy and safety of statin therapy. Lancet. 2016;388(10059):2532–61. pmid:27616593
32. Sheppard B, Rappoport N, Loh P-R, Sanders SJ, Zaitlen N, Dahl A. A model and test for coordinated polygenic epistasis in complex traits. Proc Natl Acad Sci U S A. 2021;118(15):e1922305118. pmid:33833052
33. Dudbridge F, Fletcher O. Gene-environment dependence creates spurious gene-environment interaction. Am J Hum Genet. 2014;95(3):301–7. pmid:25152454
34. Dahl A, Nguyen K, Cai N, Gandal MJ, Flint J, Zaitlen N. A Robust Method Uncovers Significant Context-Specific Heritability in Diverse Complex Traits. Am J Hum Genet. 2020;106(1):71–91. pmid:31901249
35. Almli LM, Duncan R, Feng H, Ghosh D, Binder EB, Bradley B, et al. Correcting systematic inflation in genetic association tests that consider interaction effects: application to a genome-wide association study of posttraumatic stress disorder. JAMA Psychiatry. 2014;71(12):1392–9. pmid:25354142
36. Mefford J, Smullen M, Zhang F, Sadowski M, Border R, Dahl A, et al. Beyond predictive R2: Quantile regression and non-equivalence tests reveal complex relationships of traits and polygenic scores. Am J Hum Genet. 2025;112(6):1363–75. pmid:40480198
37. Peterson CB, Bogomolov M, Benjamini Y, Sabatti C. TreeQTL: hierarchical error control for eQTL findings. Bioinformatics. 2016;32(16):2556–8. pmid:27153635
38. Peterson CB, Bogomolov M, Benjamini Y, Sabatti C. Many Phenotypes Without Many False Discoveries: Error Controlling Strategies for Multitrait Association Studies. Genet Epidemiol. 2016;40(1):45–56. pmid:26626037

[ref1] 1. Hillert A, Anikster Y, Belanger-Quintana A, Burlina A, Burton BK, Carducci C, et al. The Genetic Landscape and Epidemiology of Phenylketonuria. American Journal of Human Genetics. 2020;107(2):234–50.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Rampersaud E, Mitchell BD, Pollin TI, Fu M, Shen H, O’Connell JR, et al. Physical activity and the association of common FTO gene variants with body mass index and obesity. Arch Intern Med. 2008;168(16):1791–7. pmid:18779467
View Article
PubMed/NCBI
Google Scholar

[5] View Article

[6] PubMed/NCBI

[7] Google Scholar

[ref3] 3. Freedman ND, Silverman DT, Hollenbeck AR, Schatzkin A, Abnet CC. Association between smoking and risk of bladder cancer among men and women. JAMA. 2011;306(7):737–45. pmid:21846855
View Article
PubMed/NCBI
Google Scholar

[9] View Article

[10] PubMed/NCBI

[11] Google Scholar

[ref4] 4. Pirmohamed M. Pharmacogenomics: current status and future perspectives. Nature Reviews Genetics. 2023;24(6):350–62.
View Article
Google Scholar

[13] View Article

[14] Google Scholar

[ref5] 5. Franczyk B, Rysz J, Gluba-Brzózka A. Pharmacogenetics of Drugs Used in the Treatment of Cancers. Genes (Basel). 2022;13(2):311. pmid:35205356
View Article
PubMed/NCBI
Google Scholar

[16] View Article

[17] PubMed/NCBI

[18] Google Scholar

[ref6] 6. Wang C-W, Preclaro IAC, Lin W-H, Chung W-H. An Updated Review of Genetic Associations With Severe Adverse Drug Reactions: Translation and Implementation of Pharmacogenomic Testing in Clinical Practice. Front Pharmacol. 2022;13:886377. pmid:35548363
View Article
PubMed/NCBI
Google Scholar

[20] View Article

[21] PubMed/NCBI

[22] Google Scholar

[ref7] 7. Takeuchi F, McGinnis R, Bourgeois S, Barnes C, Eriksson N, Soranzo N, et al. A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose. PLoS Genet. 2009;5(3):e1000433. pmid:19300499
View Article
PubMed/NCBI
Google Scholar

[24] View Article

[25] PubMed/NCBI

[26] Google Scholar

[ref8] 8. Sadowski M, Thompson M, Mefford J, Haldar T, Oni-Orisan A, Border R, et al. Characterizing the genetic architecture of drug response using gene-context interaction methods. Cell Genom. 2024;4(12):100722. pmid:39637863
View Article
PubMed/NCBI
Google Scholar

[28] View Article

[29] PubMed/NCBI

[30] Google Scholar

[ref9] 9. Marderstein AR, Kulm S, Peng C, Tamimi R, Clark AG, Elemento O. A polygenic-score-based approach for identification of gene-drug interactions stratifying breast cancer risk. Am J Hum Genet. 2021;108(9):1752–64. pmid:34363748
View Article
PubMed/NCBI
Google Scholar

[32] View Article

[33] PubMed/NCBI

[34] Google Scholar

[ref10] 10. Miao J, Lin Y, Wu Y, Zheng B, Schmitz LL, Fletcher JM, et al. A quantile integral linear model to quantify genetic effects on phenotypic variability. Proc Natl Acad Sci U S A. 2022;119(39):e2212959119. pmid:36122202
View Article
PubMed/NCBI
Google Scholar

[36] View Article

[37] PubMed/NCBI

[38] Google Scholar

[ref11] 11. Zhu C, Ming MJ, Cole JM, Edge MD, Kirkpatrick M, Harpak A. Amplification is the primary mode of gene-by-sex interaction in complex human traits. Cell Genom. 2023;3(5):100297. pmid:37228747

12. Durvasula A, Price AL. Distinct explanations underlie gene-environment interactions in the UK Biobank. Am J Hum Genet. 2025;112(3):644–58. pmid:39965571

13. Pazokitoroudi A, Liu Z, Dahl A, Zaitlen N, Rosset S, Sankararaman S. A scalable and robust variance components method reveals insights into the architecture of gene-environment interactions underlying complex traits. Am J Hum Genet. 2024;111(7):1462–80. pmid:38866020

14. Zhu X, Yang Y, Lorincz-Comi N, Li G, Bentley AR, de Vries PS, et al. An approach to identify gene-environment interactions and reveal new biological insight in complex traits. Nat Commun. 2024;15(1):3385. pmid:38649715

15. Di Scipio M, Khan M, Mao S, Chong M, Judge C, Pathan N, et al. A versatile, fast and unbiased method for estimation of gene-by-environment interaction effects on biobank-scale datasets. Nat Commun. 2023;14(1):5196. pmid:37626057

16. Wang X, Elston RC, Zhu X. The meaning of interaction. Hum Hered. 2010;70(4):269–77. pmid:21150212

17. Thompson WD. Effect modification and the limits of biological inference from epidemiologic data. J Clin Epidemiol. 1991;44(3):221–32. pmid:1999681

18. Greenland S. Interactions in epidemiology: relevance, identification, and estimation. Epidemiology. 2009;20(1):14–7. pmid:19234397

19. Gauderman WJ, Mukherjee B, Aschard H, Hsu L, Lewinger JP, Patel CJ, et al. Update on the State of the Science for Analytical Methods for Gene-Environment Interactions. Am J Epidemiol. 2017;186(7):762–70. pmid:28978192

20. Sverdlov S, Thompson EA. The epistasis boundary: Linear vs. nonlinear genotype-phenotype relationships. bioRxiv. 2018;2018:503466.

21. Sverdlov S, Thompson E. Combinatorial Methods for Epistasis and Dominance. J Comput Biol. 2017;24(4):267–79.

22. Ottman R. Gene-environment interaction: definitions and study designs. Prev Med. 1996;25(6):764–70. pmid:8936580

23. Dick DM. Gene-environment interaction in psychological traits and disorders. Annu Rev Clin Psychol. 2011;7:383–409. pmid:21219196

24. Barcellos SH, Carvalho LS, Turley P. Education can reduce health differences related to genetic risk of obesity. Proc Natl Acad Sci U S A. 2018;115(42):E9765–72. pmid:30279179

25. Westerman KE, Sofer T. Many roads to a gene-environment interaction. Am J Hum Genet. 2024;111(4):626–35. pmid:38579668

26. Sampford MR. Some Inequalities on Mill’s Ratio and Related Functions. Ann Math Statist. 1953;24(1):130–2.

27. Sadowski M, Dahl AW, Zaitlen N. Protocol to estimate the heritability of drug response with GxEMM and identify gene-drug interactions with TxEWAS. STAR Protoc. 2025;6(2):103780. pmid:40249708

28. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–9. pmid:30305743

29. Privé F, Aschard H, Carmi S, Folkersen L, Hoggart C, O’Reilly PF, et al. Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am J Hum Genet. 2022;109(1):12–23. pmid:34995502

30. Preiss D, Welsh P, Murphy SA, Ho JE, Waters DD, Demicco DA. Risk of incident diabetes with intensive-dose. JAMA. 2011;305(24):2556–64.

31. Collins R, Reith C, Emberson J, Armitage J, Baigent C, Blackwell L, et al. Interpretation of the evidence for the efficacy and safety of statin therapy. Lancet. 2016;388(10059):2532–61. pmid:27616593

32. Sheppard B, Rappoport N, Loh P-R, Sanders SJ, Zaitlen N, Dahl A. A model and test for coordinated polygenic epistasis in complex traits. Proc Natl Acad Sci U S A. 2021;118(15):e1922305118. pmid:33833052

33. Dudbridge F, Fletcher O. Gene-environment dependence creates spurious gene-environment interaction. Am J Hum Genet. 2014;95(3):301–7. pmid:25152454

34. Dahl A, Nguyen K, Cai N, Gandal MJ, Flint J, Zaitlen N. A Robust Method Uncovers Significant Context-Specific Heritability in Diverse Complex Traits. Am J Hum Genet. 2020;106(1):71–91. pmid:31901249

35. Almli LM, Duncan R, Feng H, Ghosh D, Binder EB, Bradley B, et al. Correcting systematic inflation in genetic association tests that consider interaction effects: application to a genome-wide association study of posttraumatic stress disorder. JAMA Psychiatry. 2014;71(12):1392–9. pmid:25354142

36. Mefford J, Smullen M, Zhang F, Sadowski M, Border R, Dahl A, et al. Beyond predictive R2: Quantile regression and non-equivalence tests reveal complex relationships of traits and polygenic scores. Am J Hum Genet. 2025;112(6):1363–75. pmid:40480198

37. Peterson CB, Bogomolov M, Benjamini Y, Sabatti C. TreeQTL: hierarchical error control for eQTL findings. Bioinformatics. 2016;32(16):2556–8. pmid:27153635

38. Peterson CB, Bogomolov M, Benjamini Y, Sabatti C. Many Phenotypes Without Many False Discoveries: Error Controlling Strategies for Multitrait Association Studies. Genet Epidemiol. 2016;40(1):45–56. pmid:26626037

Supporting information

S1 Appendix. Derivation of the sign-consistent interaction property.

S1 File. Supporting Information.

S1 Fig. Sign consistency after square-transformation of a simulated outcome with randomly directed G × E effects (Methods).

S2 Fig. Sign consistency after inverse normal transformation of a simulated outcome with randomly directed G × E effects (Methods).