Interaction-based Mendelian randomization with measured and unmeasured gene-by-covariate interactions

Studies leveraging gene-environment (GxE) interactions within Mendelian randomization (MR) analyses have prompted the emergence of two similar methodologies: MR-GxE and MR-GENIUS. Such methods are attractive in allowing for pleiotropic bias to be corrected when using individual instruments. Specifically, MR-GxE requires an interaction to be explicitly identified, while MR-GENIUS does not. We critically examine the assumptions of MR-GxE and MR-GENIUS in the absence of a pre-defined covariate, and propose sensitivity analyses to evaluate their performance. Finally, we explore the effect of body mass index (BMI) upon systolic blood pressure (SBP) using data from the UK Biobank, finding evidence of a positive effect of BMI on SBP. We find both approaches share similar assumptions, though differences between the approaches lend themselves to differing research settings. Where a suitable gene-by-covariate interaction is observed MR-GxE can produce unbiased causal effect estimates. MR-GENIUS can circumvent the need to identify interactions, but as a consequence relies on either the MR-GxE assumptions holding globally, or additional information with respect to the distribution of pleiotropic effects in the absence of an explicitly defined interaction covariate.


Introduction
Mendelian randomization (MR) is an epidemiological approach applied to observational data, wherein genetic variants are used as instrumental variables (IVs) to estimate the effect of a modifiable exposure on a downstream outcome [1]. MR encompasses a wide range of statistical methods, and typically relies upon three assumptions to test for causality. A suitable genetic IV is strongly associated with the exposure of interest (IV1), independent of confounders of the exposure and outcome as well as confounders of the genetic IV and outcome (IV2), and independent of the outcome when conditioning on the exposure (IV3) [1,2]. Violation of assumptions IV2-3 can introduce bias into MR effect estimates, and as a consequence methods for identifying and correcting for such bias have formed a central theme within the MR methods literature [2][3][4]. In many cases, such methods focus upon correcting for bias resulting from associations between a genetic IV and an outcome which are unrelated to the exposure of interest, defined as horizontal pleiotropic pathways [5]. Pleiotropy robust methods frequently use heterogeneity in causal effect estimates across multiple genetic IVs as an indicator of horizontal pleiotropy, though such approaches are less feasible as the number of available genetic IVs decreases [6,7].
One solution to the problem of limited available genetic IVs is to leverage variation in instrument strength across one or more covariates within a target population, representing a gene-by-covariate interaction [8][9][10]. Intuitively, were it possible to identify a population subgroup for which a genetic IV and exposure are independent (i.e., a 'no-relevance group'), it follows that, in the absence of horizontal pleiotropy, the genetic IV and outcome should also be independent. A non-zero instrument-outcome association for such a group would therefore be indicative of pleiotropic bias [8,11,12]. It is, however, rare that no-relevance groups of sufficient size are observed in practice.
MR approaches utilising gene-by-covariate interactions, here referred to as interaction-MR, overcome this limitation by using statistical assumptions to extrapolate back to a hypothetical no-relevance group. Two such methods are MR using Gene-by-Environment interactions (MR-GxE) and MR G-Estimation under No Interaction with Unmeasured Selection (MR-GE-NIUS) [8,13]. MR-GxE uses an explicitly defined gene-by-covariate interaction to estimate causal effects, and has previously been framed within a summary-level data context [8]. In contrast, MR-GENIUS accommodates both observed and unobserved interactions, provided they induce a dependence between the genetic IV and exposure variance [13]. MR-GENIUS has the advantage of circumventing the need to explicitly identify gene-by-covariate interactions, though the relative strengths and limitations of the approach compared to MR-GxE have previously been unclear.
In this paper we outline the implementation of MR-GxE in the individual level data setting, and critically evaluate the performance of MR-GxE and MR-GENIUS. Specifically, we focus upon the application of MR-GENIUS in the absence of a pre-defined interaction covariate, which is not possible using MR-GxE. Through simulation we demonstrate how both approaches share similar underlying assumptions, and highlight how implicitly leveraging all potential gene-by-covariate interactions using MR-GENIUS can imply more stringent assumptions with respect to the distribution of pleiotropic effects. Throughout we also propose sensitivity analyses to test the assumptions of the MR-GxE. Finally, we conduct applied analyses using MR-GxE and MR-GENIUS to estimate the effect of body mass index (BMI) on systolic blood pressure (SBP). For both approaches we find evidence of a positive causal effect using data from the UK Biobank, comparing results to conventional MR and observational methods.

The data generating model
Interaction-MR approaches use differences in instrument strength across one or more covariates to estimate and correct horizontal pleiotropic bias [8]. For i 2 {1, 2, . . ., N} observations, let G i denote a single genetic IV for an exposure X i , and let Y i represent the outcome of interest. Further, assume there exists an unmeasured confounder U i of X i and Y i , and a set of interaction covariates Z i 2 {Z 1 , . . ., Z K } across which the instrument-exposure association varies. In order to make our ideas concrete, we now define an underlying data generating model for a continuous exposure and outcome, which are themselves a function of G i , Z i and U i .
In Eqs 1-4, the � (.i) terms represent independent error terms, and relationships with reference to a single interaction covariate Z ki are illustrated in Fig 1 wherein G i , Z ki , and U i are assumed independent for clarity.
subgroups of a genetic IV (G i ). In this paper we illustrate how both approaches are reliant upon three assumptions, summarised as assumptions GxE1-3 below. A suitable interaction (G i Z ki ) is: 1. Strongly associated with the exposure of interest (GxE1).
2. Independent of confounders of the exposure and outcome (GxE2).
3. Not directly associated with the outcome of interest (GxE3).
MR-GxE was originally implemented using an approach analogous to MR-Egger regression in two-sample summary MR [7,8]. Initially sets of instrument-exposure and instrument-outcome associations are obtained across strata of a pre-specified interaction covariate, after which the instrument-outcome associations are regressed upon the instrument-exposure associations including an intercept [8]. While in principle the approach can be performed using publicly available data from genome-wide association studies (GWAS), the summary level MR-GxE approach has two notable limitations. First, summary MR-GxE does not readily provide a means of evaluating interaction strength, relying on observed heterogeneity across gene-exposure associations across interaction covariate strata [8]. Second, ambiguities surrounding the optimal number of interaction covariate strata can have a substantial impact of effect estimates [8]. To address these issues, we propose an individual-level form of MR-GxE within a two-stage least squares (TSLS) framework.
Individual level MR-GxE is implemented by using a gene-by-covariate interaction as an instrument within a TSLS regression model. In the first-stage model (Eq 5), the exposure is regressed upon the genetic IV and observed interaction covariate including an interaction term (γ 3 ). The second-stage model (Eq 5) then regresses the outcome upon the genetic IV, interaction covariate, and fitted values for the exposure (X i ) obtained using the first-stage model.X This returns a causal effect estimate (b 1 ), as well as a horizontal pleiotropic effect estimate as the coefficient of the genetic IV (b 2 ) in the second-stage model. To define the MR-GxE estimand, a reduced form model for Y i given G i and Z ki incorporating Eqs 5 and 6 can initially be written as Using Eqs 5 and 6 the MR-GxE estimand is then defined as Note that a G i × Z ki term is omitted from the second-stage model given in Eq 6 due to its role as an instrument, whilst the inclusion of G i allows for estimation of a horizontal pleiotropic effect on the outcome, denoted by β 2 .
The MR-GENIUS approach is an adapted form of Robins' G-estimation which is robust to additive confounding and pleiotropic bias [13][14][15]. This essentially involves leveraging differences in the variance of a given exposure X i across subgroups of a genetic instrument G i , which are likely the consequence of one or more gene-by-covariate interactions. In the case of a binary instrument and exposure, and using notation from Eqs 1-4, the MR-GENIUS estimator can be written as:b where [13]. MR-GENIUS is implemented by first regressing X i upon G i and obtaining a set of residualŝ ∊ Xi . These residuals are then used to create an instrument ðG i À � GÞ∊ Xi which is incorporated within a TSLS model as a single instrument for X i [13]. Estimates ofb 1 remain unbiased, provided the instrument G i is associated with the exposure of interest, the effect does not change across values of the unmeasured confounders, and the MR-GENIUS model is identified such that the change in variance across levels of the instrument is non-zero [13].
In the binary exposure case, the MR-GENIUS model is identified when cov(G i , var(X i |G i )) 6 ¼ 0, and for a continuous exposure when the residual error � X is heteroskedastic, that is, not constant across levels of G i [13]. This can be evaluated using a Breusch-Pagan test for heteroskedasticity, and these conditions also restrict the degree of joint effect modifiers of both X i and Y i [13,16].
Importantly, it should be noted that the interaction covariate need not be explicitly identified using MR-GENIUS, illustrated by the absence of Z ki in Eq 9. However, identification of the MR-GENIUS model implicitly relies upon the presence of one or more gene-by-covariate interactions to induce the desired dependence between G i and var(X|G). In the absence of a predefined interaction covariate, MR-GENIUS estimates the total effect of X upon Y, without adjusting for the interaction covariate Z k . This contrasts with MR-GxE, which estimates the direct effect of X upon Y adjusting for the interaction covariate in the second stage model.

GxE1: Interaction strength
The MR-GxE estimator can be viewed as an extension of the Wald ratio, including an adjustment for the direct effects of G i and Z ki . Thus, in the special case where G i and Z ki are marginally independent of the exposure and outcome (but their interaction via a single covariate Z ki is not), the MR-GxE estimator simplifies to: From Eq 10 MR-GxE is clearly reliant upon a strong first-stage interaction, such that γ k3 6 ¼ 0 in order to make the denominator non-zero (GxE1). When individual-level data are available, the first-stage F-statistic for the gene-by-covariate interaction can be used to quantify instrument strength, though several aspects of this approach warrant consideration. First, when using a single interaction the F-statistic cannot be related to the magnitude of relative bias towards the observational estimate in a one-sample setting and null in a two-sample setting. This is because such a relationship between instrument strength and the direction of bias only holds when multiple instruments, in this case interactions, are used. Therefore, while an F-statistic of 10 may satisfy the standard threshold for sufficient instrument strength, it would not be possible to relate this to a 10% relative bias towards the observational estimate obtained by regressing the outcome on the exposure without incorporating additional interaction covariates. Second, interaction strength does not mitigate bias from violations of assumptions GxE2-3, just as is the case in conventional MR analyses. Finally, where possible candidate interactions should be identified in separate samples to avoid issues related to Winner's curse, where instruments, in this case interactions, are selected using spurious associations which may be sample population specific [17].
The reliance of MR-GxE upon explicitly defined interactions also invites two potential interaction-specific issues: scale dependency and non-linear interactions. First, as interactions are scale dependent it is possible that applying transformations can create spurious associations [18]. Such spurious associations can exist as an artefact of the data, and consequently estimates leveraging such information are unlikely to be reliable. Gene-by-covariate interactions may also be non-linear, which could potentially be considered by fitting more flexible models (e.g., fractional polynomial models, which include varying exponents with respect to G i Z ki ) to allow for non-linear interactions to be identified. It is, however, important to take care to avoid issues of over-fitting [19].
As MR-GENIUS does not require gene-by-covariate interactions to be identified, testing for identification is performed globally by evaluating heteroskedasticity with respect to the residuals � Xi . Specifically, MR-GENIUS relies upon the residual error in a regression of the exposure upon the genetic IV to be heteroskedastic, evaluated using a Breusch-Pagan test for heteroskedasticity [13].
As a means of identifying candidate gene-by-covariate interactions for MR-GxE we propose using the first-stage F-statistic for the interaction term in the first stage, in a similar fashion to utilising GWASs to identify genetic variants associated with a phenotype of interest. Interactions of sufficient strength can be identified by fitting the first-stage MR-GxE model for each candidate interaction covariate Z ki and calculating the F-statistic with respect to G i Z ki (see Eq 5) [10]. Applying a Bonferroni multiple testing correction, and plotting the −log 10 (p − value) for the F-statistic then allows for instrument strength to be effectively visualised using a scatter plot, following a similar intuition to the use of Manhattan plots in the presentation of results from GWAS [20]. Note that as it is often the case that multiple independent genetic variants are associated, it is often appropriate to use a polygenic risk score as an instrument to maximise instrument strength.

GxE2: Interaction exogeneity
In previous work we show how assumption GxE2 is potentially violated when certain confounding structures exist, specifically, where G i and Z ki are simultaneously downstream of a confounder U i or where there is an open path between the two variables through U i [8]. To briefly recapitulate how such associations can induce bias, consider the path diagram shown in Fig 2. In this case, the interaction covariate Z ki is independent of X i and Y i , and determined by a confounder U i . Further, U i is downstream associated with the genetic instrument G i .
In Fig 2U i serves not only as a confounder of X i and Y i , but also of Z i and Y i . As the MR-GxE model only instruments G i , it is likely estimates for the effect of Z i on Y i will exhibit bias. When G i is not independent of U i , however, the resulting induced association between Z i and Y i from failing to control for U i mimics a pleiotropic association, such that a pathway from G i to Y i is created through U i and Z i . Importantly, such associations do not necessarily bias estimates of the effect of X i on Y i , but can inflate type-I error rates when evaluating instrument validity.
Assumption GxE2 can also be violated when a gene-by-covariate interaction is simultaneously associated with the exposure and one or more confounders of the exposure and outcome, as depicted in Fig 3. In Fig 3 a bidirectional arrow is included to highlight that any direction of association between GZ k and U can potentially introduce bias into MR-GxE estimates. Where GZ k is upstream of U this can be viewed as instrument strength varying across levels of the confounder, with pleiotropic effects being associated with interaction strength. This issue can be viewed as analogous to the INstrument Strength Independent of Direct Effect (InSIDE) assumption in two-sample summary MR [7]. An association from U to GZ k would suggest that a three-way interaction may be present, such that interaction strength γ 3 k varies across levels of U. This can bias effect estimates by inducing an association between GZ k and Y, violating the constant pleiotropy assumption GxE3.

PLOS ONE
Interaction-based Mendelian randomization with measured and unmeasured gene-by-covariate interactions To understand how an association between GZ k and U can induce bias into MR-GxE estimates, we can extend the MR-GxE estimand (Eq 8) to incorporate violation of GxE2, by including covariance terms between U i and (Z k i, G i Z k i)U i , such that were it possible to include U i in the TSLS model, the resulting estimate could be written as: where each b � � indicates a multivariable regression estimate pertaining to the second subscript variable when regressed upon the first, including the unmeasured confounder U i . As it is not possible to directly measure and adjust for U i , the independence U i and G i Z ki is relied upon for Eq 11 to be equivalent to the MR-GxE estimator in Eq 8.
A further consideration is the introduction of collider bias when estimating fitted valuesX i in the first-stage MR-GxE model. As shown in Eq 5, it is necessary to include the interaction covariate in the first-stage model. However, in cases where G i and U i are both simultaneously upstream associated with Z ki , conditioning on Z ki will induce collider bias in the first-stage MR-GxE model, such that the estimate of pleiotropic effectb 2 and subsequent adjustment will be inaccurate. This case is illustrated in Fig 4. Relating assumption GxE2 to the MR-GENIUS approach, associations violating GxE2 would imply associations vary across values of the unmeasured confounders violating the second MR-GENIUS assumption [13]. However, this problem can be mitigated by incorporating additional interaction covariates within the MR-GENIUS model, as described in Eric Tchetgen et al., 2021 [13]. This would necessitate the inclusion of specific interaction covariates within the MR-GENIUS model, such that differences in the variance of X would be evaluated across subgroups of G i , conditional on one or more interaction covariates Z k .
For MR-GxE we present two strategies for addressing GxE2 violation. To evaluate the possibility of collider bias in the first-stage model estimating the correlation between G i and Z ki could serve as an initial test for GxE2 violation. Intuitively, if G i and Z ki are independent, then conditioning on Z ki would not induce an association between G and U. However, it is important to emphasise that independence cannot necessarily be interpreted as GxE2 being satisfied. This would primarily be the case where a three-way interaction exists between the instrument G i , interaction covariate Z ki , and one or more confounders U i . Rather than removing the possibility, an observed correlation between G i and Z ki can highlight a potential issue in the analysis which warrants further consideration.
A potentially more robust approach would be to adopt a genetic proxy variable for the interaction covariate Z ki , as this would share the same benefits with regard to causal direction as G i with respect to environmental confounders. For example, when estimating the effect of alcohol consumption on SBP using education as an interaction covariate, adopting a polygenic risk score (PRS) for education would in principle utilise the explained variation in education excluding environmental confounders such as socio-economic status.

GxE3: Constant pleiotropy
The third MR-GxE assumption requires pleiotropic effects of G i upon Y i to remain constant across values of Z ki , with the gene-by-covariate interaction being independent of Y i when conditioning on X i (i.e. β 4 = 0). Where this is not the case estimates of causal effect will exhibit bias in the direction of β 4 in a similar fashion to horizontal pleiotropic bias in univariate MR analyses, equal to: By reframing MR-GxE within a TSLS framework, it is possible to apply tests of over-identification to evaluate the constant pleiotropy assumption, though this is not possible where only one instrument is available, for example, a single genetic variant. In cases where the single instrument is comprised of many instruments, such as a PRS, it is possible to examine different configurations of instruments iteratively using MR-GxE and assess heterogeneity in the set of MR-GxE estimates obtained from each iteration. These subsets of instruments are hereafter referred to as sub-instruments.
In this scenario, a Sargan test can be used to compare different MR-GxE estimates of the same causal parameter (the coefficient of X i in Eq 6-i.e., β 1 ), assuming we have more instruments than we need to consistently estimate the parameter [21]. However, it is important to note that in applying this test it is crucial for each of the sub-instruments to be sufficiently strong to overcome weak instrument bias, though practically the test can be applied where weak interactions are present if assessing the strength of individual instruments of interest.
To illustrate how over-identification tests can be applied in the context of MR-GxE, consider an extension of Eqs 5 and 6 to include an arbitrary number of sub-instruments, wherein a single instrument G i is comprised of m 2 {1, 2, . . ., M} sub-instruments. Where G mi denotes the m th sub-instrument in G i , we can define a corresponding data generating model as: A Sargan test can be applied by fitting multiple sub-instruments G mi in the same TSLS model. Alternatively, a heterogeneity test such as Cochran's Q-statistic could be used to evaluate heterogeneity in MR-GxE effect estimates using all sets of non-overlapping subinstruments.

An illustration of assumptions GxE1-3 through simulation
To illustrate the importance of assumptions GxE1-3 with respect to MR-GxE and MR-GE-NIUS we present six simulation studies, categorised by assumption, using the data generating model presented in Eqs 1-4. Throughout, we demonstrate the utility of the sensitivity analyses proposed, and highlight the relative performance of both MR-GxE and MR-GENIUS. In each simulation gene-by-covariate first-stage effects (γ 3k ) are generated to be positive, to avoid the possibility that the combined effects of all candidate interactions have a mean of zero. This would potentially invalidate the MR-GENIUS approach in the unlikely event that leveraged candidate interactions have effects such that cov(G i , var(X i |G i )) � 0. Code for performing each simulation study and further information is available at https://github.com/WSpiller/GxE_ Simulation.
Simulation set 1: Interaction selection and strength. As an illustration of how gene-bycovariate interactions can be identified through the evaluation of their first-stage F-statistics, we generated 1, 000 independent data sets, containing 100, 000 observations for a single instrument G, exposure X, outcome Y, and 100 candidate interaction covariates Z k (Simulation 1). All variables were treated as continuous, with observations of exogenous variables randomly sampled from a normal distribution with mean 0 and standard deviation 1. Endogenous variables, determined by one or more additional covariates, were generated following the population models defined in Eqs 1-4, with error terms randomly sampled from a normal distribution with mean 0 and standard deviation 1. The effect of X upon Y was defined as β 1 = 1. Of the 100 interaction covariates, 10 were designated to have a non-zero first-stage interaction, assigning a value for γ 3k sampled from a normal distribution with mean 2 and standard deviation 2, ensuring that all coefficients for non-zero first stage interactions were greater than 1. The complete set of interaction covariates Z K were also generated so as to be independent of G i , such that π k1 = 0. Fig 5A shows how a scatter plot can be constructed in a similar fashion to a Manhattan plot in GWAS analyses. Each value on the scatter plot represents the mean −log 10 (p − value) value for the first-stage F-statistic corresponding to each candidate

PLOS ONE
Interaction-based Mendelian randomization with measured and unmeasured gene-by-covariate interactions interaction across the set of 1, 000 simulated data sets. A Bonferroni multiple testing correction is shown using a solid horizontal line.
In Fig 5A the 10 defined non-zero gene-by-covariate interactions have been identified, with super-imposed numbers indicating the identity of each interaction covariate Z k . The corresponding estimates for the 10 identified interactions show no evidence of apparent bias, with a mean MR-GxE estimate of 1.000 (95% CI = 0.996, 1.004) and a mean F-statistic of 3616.15 (p − value < 0.001). Individual mean estimates for each interaction are provided in the Supplementry material (see S1 Table). Using MR-GENIUS resulted in mean estimate of 1.000 (95% CI = 0.994, 1.006), producing an estimate comparable to MR-GxE without the need to explicitly identify an interaction covariate. Performing a Breusch-Pagan test for identification in the MR-GENIUS model yielded a mean value of 1041.74 (p − value < 0.001), suggesting MR-GE-NIUS estimates are sufficiently strong so as to overcome weak instrument bias.
To further investigate the impact of weak instrument bias using MR-GxE we perform additional simulations, evaluating the performance of MR-GxE using a single non-zero interaction covariate of varying strength (simulation 2). Specifically, first-stage F-statistics of approximately 1, 5, 10, 25, 50, and 100 are considered, generating 1, 000 data sets for each F-statistic value and presenting mean MR-GxE effect estimates and 95% confidence intervals. In each case the genetic instrument G is generated so as to satisfy assumption IV1 (γ 1 = 1), with a causal effect of X on Y again equal to 1 (β 1 = 1). Fig 5B shows a forest plot including the mean MR-GxE effect estimate and 95% confidence interval for each interaction covariate with a mean F-statistic as indicated on the y-axis. The precision of MR-GxE increases substantially as the mean F-statistic increases, and there does not appear to be evidence of directional bias using weak interactions.
In simulation 1 MR-GENIUS appears to perform well when many gene-by-covariate interactions are present, with the potential to outperform MR-GxE when individual stronger interactions are not observed (see S1 Table). To explore the extent to which MR-GENIUS is reliant on a global non-zero mean first-stage interaction, an additional simulation is conducted varying the proportion of non-zero interactions present in the data (simulation 3). A total of K = 100 interactions were generated, such that the number of non-zero interactions represent 1%, 5%, 10%, 50%, and 100% of all candidate interaction covariates. For each predefined proportion, 1,000 independent data sets were generated, using previous parameter definitions from simulation 1. MR-GxE effect estimates were obtained using a single randomly sampled non-zero interaction covariate, while MR-GENIUS estimates do not specify an observed interaction covariate. The mean MR-GxE and MR-GENIUS estimates for each proportion are presented in Table 1.
From Table 1 it appears the precision of MR-GENIUS estimates improves as the mean interaction strength across all leveraged instruments ( � g K ) increases in magnitude, indicated by the increase in mean F-statistic across the set of candidate interaction covariates. This suggests that in cases where few gene-by-covariate interactions of moderate strength are available,

PLOS ONE
Interaction-based Mendelian randomization with measured and unmeasured gene-by-covariate interactions MR-GxE can furnish more precise estimates than MR-GENIUS, though MR-GENIUS can outperform MR-GxE as the number of non-zero gene-by-covariate interactions increases. It is also important to highlight that, just as was the case for MR-GxE, weak instrument strength for MR-GENIUS does not appear to induce observed directional bias. This reliance of MR-GENIUS upon mean interaction strength across candidate interactions has two important implications. First, the precision of MR-GENIUS estimates is a function of both interaction strength and the number of candidate non-zero interactions present, and consequently it is possible for MR-GENIUS to outperform MR-GxE as the number of strong gene-by-covariate interactions increases. Second, it is possible that interactions of similar magnitude acting in opposite directions can counteract each other, such that cov(G i , var(X i |G i )) � 0. This scenario is unlikely to occur beyond a simulation setting, and motivates using a data generating model where first-stage interactions are generated so as to be in the same direction, to ensure � g K 6 ¼ 0.
Simulation set 2: Interaction exogeneity. The MR-GxE approach relies upon the selected gene-by-covariate interaction being independent of all confounders U of the exposure X and outcome Y. As previously discussed, this is most likely to be the case where either the association between G and U varies across levels of Z k , or the association between Z k and U varies across levels of G. To demonstrate how such associations can introduce bias into both MR-GxE and MR-GENIUS effect estimates, we present a simulation with a similar structure to simulation 3, in this case varying the proportion of interactions for which θ 3 k 6 ¼ 0 (simulation 4). Specifically, proportions of 0%, 1%, 5%, 10%, 50%, and 99% are considered for which θ 3 k = 1, generating 1, 000 independent data sets for each proportion. For each data set, the mean MR-GENIUS and MR-GxE estimates were obtained across all interaction covariates. Additionally, a mean estimate using a randomly sampled interaction for which θ 3 k 6 ¼ 0 is also presented, to illustrate how assumption GxE2 is interaction covariate specific for MR-GxE. The simulation results are given in Table 2.
From Table 2 both MR-GxE and MR-GENIUS exhibit bias in the direction of the interaction coefficient θ 3k . MR-GENIUS appears to be more robust to GxE2 violation compared to MR-GxE, though estimates decrease in precision as the magnitude of θ 3k increases. When selecting a single interaction covariate for which θ 3k = 0, MR-GxE provides an unbiased causal effect estimate, in contrast to MR-GENIUS. This can be explained by MR-GENIUS implicitly relying upon � y K � 0 when an interaction covariate is not specified. As a consequence, MR-GxE appears to be capable of producing results with markedly less bias using a single GxE2 satisfying interaction, compared to MR-GENIUS where � y K ¼ 0. Simulation set 3: Constant pleiotropy. To demonstrate the impact of GxE3 violation, as well as the utility of employing an adapted Sargan test as a sensitivity analysis, we present a two

PLOS ONE
Interaction-based Mendelian randomization with measured and unmeasured gene-by-covariate interactions simulated examples. Initially, we generated data as in simulations 3-4, instead varying the proportion of candidate interactions with non-zero second stage interactions (β 4k 6 ¼ 0) (simulation 5). The first-stage interaction coefficient across all interaction was set to γ 3k = 1, with designated invalid interactions having a second stage interaction(β 4k = 1). For each proportion the mean MR-GENIUS estimate was obtained, as well as the mean MR-GxE estimate across all candidate interactions. In addition, a mean MR-GxE estimate using a single randomly sampled interaction for which β 4 = 0 is also provided, with results presented in Table 3.
In this scenario it can be seen that MR-GxE and MR-GENIUS exhibit bias in the direction of GxE2 violation, while utilising a single interaction for which GxE3 is satisfied provides unbiased estimates. This would suggest that MR-GENIUS is reliant upon � b 4k ¼ 0 across all implicitly leveraged interactions.
To further demonstrate the impact of GxE3 violation, as well as the utility of employing an adapted Sargan test as a sensitivity analysis, we present a further simulation shown in Table 4 (simulation 6). In this case, a score analogous to a PRS was used as a single IV, comprised of 1, 000 individual sub-instruments of approximately equal strength. Mirroring the previous simulated example, the true causal effect was defined as β 1 = 1 with a horizontal pleiotropic effect β 2 = 0.05. Sub-instruments violating assumption GxE3 were estimated to have a value β 4 = 0.2, varying the proportion of invalid sub-instruments. The mean F-statistic across all iterations simulations was 98.80 (Breusch-Pagan 31.80, p − value = 0.013), and MR-GENIUS estimates are presented for comparison.
As shown in Table 4, both MR-GxE and MR-GENIUS produce biased causal effect estimates when the constant pleiotropy assumption is violated. Violation of the constant pleiotropy assumption is also detected by applying a Sargan test, provided all sub-instruments do

Estimating the effect of adiposity on systolic blood pressure within the UK Biobank
To demonstrate each of the sensitivity analyses previously described, we performed MR analyses estimating the causal effect of adiposity (measured using BMI) on SBP using data from the UK Biobank. The UK Biobank obtained written consent from all participants, and received ethical approval from the Research Ethics Committee (REC reference for UK Biobank is 11/ NW/0382). This analysis was approved by the UK Biobank access committee as part of project 8786. Consent was sought by UK Biobank as part of the recruitment process. This serves as a re-examination of the original applied example in Spiller et al. (2019) who first proposed the MR-GxE model [8]. The UK Biobank has approval from the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB) approval, and consequently separate ethical clearance was not required for this project which was conducted under the RTB approval. In this study we evaluate each underlying assumption using the diagnostic tools described above, and contrasting the results with MR-GENIUS [8,13]. After performing quality control, removing participants with missing data, and restricting the sample to unrelated individuals of European ancestry, a total of 358, 928 participants were included in the analyses. MR-GxE was implemented by constructing a weighted PRS informed using genetic variants previously identified from the GIANT consortium [22]. As the GIANT consortium represents a subset of the most recent UK Biobank release, subsequent analyses have been conducted in a one-sample framework. A total of 95 independent genetic variants were used after performing linkage disequilibrium (LD) pruning, and removing tri-allelic or palindromic variants. Finally, we standardized BMI, SBP, and the weighted PRS using a z-score transformation prior to performing analyses. In previous work we found evidence of a positive association between BMI and SBP using OLS and TSLS regression approaches [8,[23][24][25].
Initially, a discovery subset (N = 100, 000) was randomly sampled from the UK Biobank data for use in identifying interactions for MR-GxE analyses. Causal effect estimates and sensitivity analyses were performed using the remaining data. Candidate gene-by-covariate interactions were detected by estimating the first-stage F-statistic for 576 candidate interaction covariates within the UK Biobank. After applying a multiple testing correction, the 20 interaction covariates with the strongest association were selected and utilised in subsequent analyses. Table 5 shows MR-GxE estimates of causal effect and corresponding sensitivity analyses with respect to each interaction covariate. The strength of each interaction across the set of candidate interaction covariates is illustrated in Fig 6, where annotations give the UK Biobank field ID for each interaction covariate.
To assess assumption GxE3, we created 9 sub-instruments sampling from the 95 SNPs used to create the initial PRS instrument. Fitting the MR-GxE model using multiple sub-instruments allows for over identification tests to be performed, testing the extent to which causal effect estimates differ when using individual sub-instruments. In each case, a failure to reject the null can be considered to be evidence of interaction exogeneity as previously outlined. To implement this approach, the set of SNPs were randomly assorted into 9 sub-instruments of approximately equal strength, quantified using the F-statistic with respect to BMI. Repeating this procedure using sub-instruments containing differing SNPs yielded similar results. We also present the mean F-statistic across the set of sub-instruments to emphasise their strength. As shown in Table 5, there exists substantial disagreement across the range of selected interaction covariates, suggesting that one or more violate underlying assumptions of the MR-GxE approach.
Considering assumption GxE2, several of the identified gene-by-covariate interactions are proxy measures of adiposity, specifically waist circumference, weight in kilograms, and comparative body size at age 10. Such interaction covariates are often problematic, as associations between the genetic variants and the interaction can result in collider bias where the interaction covariate is downstream of the exposure (see Materials and methods). In this case, higher estimates of ρ(G, Z k ) for these variables supports this interpretation, and their subsequent exclusion from further analyses. A similar argument can also be made with respect to interaction covariates downstream of BMI, including diabetes diagnosis, vascular/heart problem diagnosis, and diastolic blood pressure (DBP).
By applying Sargan tests, a number of interaction covariates related to cognition, physical activity, and age appear to violate assumption GxE3. This could be explained by the gene-bycovariate interactions relating to one or more underlying risk factors, which are not adjusted for in the corresponding MR-GxE models.
After applying sensitivity analyses, three interaction covariates can be identified as appropriate choices for estimation using MR-GxE. This selection was made using Sargan test and correlation p-value thresholds of p-value<0.0025, applying a multiple testing correction. Selected covariates include alcohol intake frequency and physical activity, both days walked and moderate levels of exercise. Considering alcohol intake and physical activity, the lack of a substantial correlation between each interaction covariate and the PRS suggests that violation of GxE2 is unlikely.
In previous work Townsend deprivation index (TDI) was selected as an interaction covariate in a summary MR-GxE analysis and returned estimates in agreement with both alcohol consumption and physical activity measures identified above. However, it is important to note that TDI shows evidence of a non-zero instrument-interaction covariate correlation, potentially highlighting a violation of assumption GxE2. This can be explained by TDI being plausibly downstream of both BMI and the instrument, representing situation in which the correlation does not invalidate estimates of causal effect.
Crucially, adopting alcohol and physical activity as interaction covariates yields causal effect estimates which appear biologically plausible, and support evidence from both observational and MR studies suggesting a positive association between BMI and SBP. Estimates using each interaction covariate are presented in Fig 7. As a final analysis, we implemented MR-GENIUS using the PRS, BMI, and SBP measures from UK Biobank. This resulted in a more precise estimate in comparison to MR-GxE, however, the effect estimate appears to strongly disagree with evidence from MR-GxE and alternate approaches. Given MR-GENIUS implicitly relies upon analogous assumptions to MR-GxE, it seems reasonable to assume that such a discrepancy could arise from bias due to violations stemming from one or more unmeasured interactions. This is further supported by MR-GxE estimates of similar direction and magnitude which appear to show evidence of bias, such as vigorous physical activity which shows evidence of GxE3 violation.

Discussion
In this paper we examine two related interaction-based MR approaches: MR-GxE and MR-GENIUS. Both MR-GxE and MR-GENIUS rely upon similar underlying assumptions,  Table 5. Observation f.4253.0.5 has been omitted for clarity. Red points indicate analyses for which assumptions may likely be violated, while blue points show potentially valid interaction covariates using accompanying sensitivity analyses. Observational, two-stage least squares (TSLS), and MR-GENIUS estimates are also shown as black points.
https://doi.org/10.1371/journal.pone.0271933.g007 whilst differing based on whether a gene-by-covariate interaction needs to be explicitly incorporated within the estimation model. Specifically, MR-GxE relies upon at least a single measured gene-by-covariate interaction which satisfies assumptions GxE 1-3, whilst MR-GENIUS does not require such an interaction to be observed. However, as a consequence of implicitly leveraging multiple underlying interactions, the MR-GENIUS approach requires assumptions GxE 1-3 to hold globally. Essentially, stronger assumptions are required to mitigate the absence of an observed gene-by-covariate interaction. It should be emphasised however, that evaluation of MR-GENIUS in this paper does not consider the inclusion of observed gene-bycovariate interactions.
Through an examination of the MR-GxE assumptions, several approaches aiming to evaluate assumptions GxE 1-3 have been outlined. Interaction strength (GxE1) can be evaluated using the first-stage F-statistic for the interaction term, analogous to evaluating instrument strength in conventional MR. The corresponding global test for interaction strength using MR-GENIUS and a continuous exposure is the Breusch-Pagan test for heteroskedasticity [13][14][15][16].
Assumption GxE2 can initially be evaluated by estimating the correlation between Z i and both G i and X i respectively. Where Z i is observed to be correlated with G i , it is possible that a confounding relationship exists violating assumption GxE2. Further, the simultaneous association of Z i with G i and X i can result in bias where Z i is downstream of X i . However, as the existence of such correlations does not necessarily imply that this assumption is violated, a more promising approach may be to adopt an interaction covariate Z i which is highly likely to be exogenous (see Materials and methods). For example, one could employ genetic variants which instrument a likely interaction covariate. Future work will explore this possibility.
The constant pleiotropy assumption (GxE3) can be tested in cases where the initial instrument G i is a composite instrument, that is, comprised of multiple sub-instruments such as genetic variants within a PRS. Heterogeneity in effect estimates obtained using sub-instruments can be considered as evidence of violation of the constant pleiotropy assumption, analogous to heterogeneity in two-sample summary MR [7,26]. In principle, a similar approach can be applied using sub instruments with MR-GENIUS, though such an examination is beyond the scope of this paper. A summary of the MR-GxE assumptions and proposed tests is given in Table 6.
In the applied analysis the effect of BMI on SBP was estimated using MR-GENIUS and a range of interaction covariates in conjunction with MR-GxE. We identified three suitable interaction covariates, which suggest a positive effect of BMI upon SBP in agreement with previous observational and MR analyses. Importantly, we highlight interaction covariates which Table 6. A summary of the MR-GxE assumptions and proposed sensitivity analyses.

Assumption
Description Consequence of violation Tool to assess plausibility

Interaction strength (GxE1)
An observed gene-by-covariate interaction GZ should be selected, such that the association between the instrument G and exposure X varies across levels of Z Insufficient precision to detect causal effects and directional bias when multiple interactions are used.
Estimating the first stage F-statistic for GZ and adopting an interaction covariate such that F � 10.
Interaction exogeneity (GxE2) The gene-by-covariate interaction GZ should be independent of confounders of the exposure X and outcome Y.
Inflated type-I error rates when evaluating instrument validity and biased effect estimates for the effect of X on Y.
Estimating the association between the instrument G and interaction covariate Z, selecting an interaction such that G and Z are independent.
Constant pleiotropy (GxE3) The direct effect of an instrument G on the outcome Y should remain constant across levels of the interaction covariate Z.
Estimates of the effect of the exposure X on Y will be biased in the direction of the effect of GZ on Y.
Using a Sargan test when sub-instruments can be constructed from a composite instrument G. https://doi.org/10.1371/journal.pone.0271933.t006 violate the MR-GxE assumptions and link these issues to the possibly biased effect estimates obtained using MR-GENIUS. Several limitations remain with respect to MR-GxE which warrant further explanation. Firstly, reliance upon an observed gene-by-covariate interaction limits the extent to which the method can be applied in contrast to MR-GENIUS. We advocate the use of MR-GENIUS in cases where no interaction covariate is available, though care needs to be taken in justifying the more stringent assumptions MR-GENIUS entails if an interaction covariate is not specified. Second, evaluating GxE2 using the correlation of between Z i and G i does not provide a clear indication of whether the assumptions hold. It is possible that GxE2 can be violated when Z i and G i appear to be independent, and assuming the direction of effect between Z i and X i relies upon a priori knowledge regarding the direction of association. It is therefore critical to identify plausible biological mechanisms underpinning the observed relationships in the MR-GxE model.
Finally, whilst an overidentification test has been presented for evaluating GxE3, there is not at present a method aiming to correct for violation of the constant pleiotropy assumption. It is likely that pleiotropy robust methods, such as median or modal regression, could be utilised to correct for resulting bias, and the application of such methods will be fully explored in future work.

Conclusion
MR-GxE and MR-GENIUS are two interaction-based MR approaches which leverage geneby-covariate interactions to estimate causal associations, while correcting for instrument invalidity. MR-GxE can be adapted to the individual level data setting and allows for the underlying assumptions of the approach to be tested provided a gene-by-covariate interaction is explicitly identified. In contrast, MR-GENIUS does not require such an interaction to be identified, but instead relies upon a more stringent set of assumptions analogous to MR-GxE. The use of each method should therefore reflect the specific research questions considered, as each approach is especially suited to particular research contexts. However, it is essential that the strengths and limitations of each approach are given sufficient consideration prior to their application.
Supporting information S1 Table. Simulated results and effect estimates for subset of interaction (denoted Z) identified from Fig 5A (simulation 1). (PDF)