A Comparison of Three Methods of Mendelian Randomization when the Genetic Instrument, the Risk Factor and the Outcome Are All Binary

The method of instrumental variables (referred to as Mendelian randomization when the instrument is a genetic variant) was initially developed to infer a causal effect of a risk factor on some outcome of interest in a linear model. Adapting this method to nonlinear models, however, is known to be problematic. In this paper, we consider the simple case where the genetic instrument, the risk factor, and the outcome are all binary. We compare via simulations the usual two-stage estimate of a causal odds-ratio and its adjusted version with a recently proposed estimate developed in the context of a clinical trial with noncompliance. In contrast to the former two, we confirm that the latter is (under some conditions) a valid estimate of a causal odds-ratio defined in the subpopulation of compliers, and we propose its use in the context of Mendelian randomization. By analogy with a clinical trial with noncompliance, compliers are those individuals for whom the presence/absence of the risk factor X is determined by the presence/absence of the genetic variant Z (i.e., for whom we would observe X = Z whatever the alleles randomly received at conception). We also recall and illustrate the huge variability of instrumental variable estimates when the instrument is weak (i.e., with a low percentage of compliers, as is typically the case with genetic instruments, for which this proportion is frequently smaller than 10%), where the inter-quartile range of our simulated estimates was up to 18 times higher than with a conventional (e.g., intention-to-treat) approach. We thus conclude that the need to find stronger instruments is probably as important as the need to develop a methodology for consistently estimating a causal odds-ratio.


Introduction
The method of instrumental variables was introduced nearly a century ago in econometrics [1]. It can be used for estimating a causal effect of a risk factor (predictor, phenotype) X on some outcome Y in observational studies in epidemiology, where unknown and unmeasured confounding effects U are often unavoidable. It can also be used to correct for noncompliance in clinical trials [3]. The method uses an ''instrument'' Z which needs to be (i) correlated with X; (ii) independent of U; and (iii) conditionally independent of Y given X and U [3,4], as illustrated in Figure 1. In general, conditions (ii) and (iii) are the problematic ones, since they cannot be verified from the data and must be justified based on subject-matter knowledge. Examples of instruments are the random group assignment in a clinical trial, or a genetic variant associated with the risk factor of interest in an observational study. In the latter case, the method of instrumental variables is often referred to as Mendelian randomization [5].
The method of instrumental variables was devised to provide a consistent estimate of a causal effect of X on Y when the relationship is linear, and thus typically applies when the outcome is continuous. It also applies to a binary outcome if the causal effect can be expressed as a risk difference. For a binary outcome, however, a relationship is usually described via an odds-ratio, not a risk difference. Some adaptations of the method of instrumental variables have been proposed to estimate a causal odds-ratio, such as the downloadable qvf function [6] implemented in Stata (Stata Corp, College Station, Texas), or its adjusted version proposed by Nagelkerke et al. [7] and by Palmer et al. [8]. However, it is not yet totally clear in which situations and to what extent these adaptations are valid. In their review, Bochud and Rousson [9] identified 37 observational studies which used the method of Mendelian randomization between 2004 and 2010, of which 23 (i.e., about 60%) considered a binary outcome. They concluded their review stating that ''Considering the clear interest for epidemiologists to apply this concept for dichotomous outcomes such as diseases, it would be important, and even urgent, to clarify the issues of the validity of the instrumental variable approach in this context''. Some recent clarifications in this regard have been made in Didelez, Meng and Sheehan [10], in Vansteelandt et al. [11] and in Palmer et al. [12].
One conclusion of Palmer et al. [12] was that the above adaptations of the method of instrumental variables should not be used for estimating a causal odds-ratio when Z, X and Y are all binary. However, another estimate of a causal odds-ratio, which also uses an instrumental variable, has recently been proposed by Lui and Chang [13] in the context of a clinical trial with noncompliance. In the present paper, we compare via simulations the usual qvf method and its adjusted version with the method of Lui and Chang [13], confirming that the latter method provides an approximately unbiased estimate of a causal odds-ratio defined in the subpopulation of ''compliers'', while illustrating the bias of the former two methods. Thus, we suggest using the latter method rather than the former two in the context of Mendelian randomization when the genetic instrument Z, the risk factor X and the outcome Y are all binary, and we illustrate its use with an applied example.
The paper is organized as follows. In the Methods section, we recall how the method of instrumental variables can be used to estimate a causal risk difference, and we present the qvf method and its adjusted version for estimating a causal odds-ratio, as well as the method of Lui and Chang [13], which is derived in some detail. Our simulations are presented in the Results section, where we also give an example and provide further comparison with other possible estimates of a causal odds-ratio, in particular the logistic structural mean model estimate proposed by Vansteelandt and Goetghebeur [14]. Some concluding remarks are given in the Discussion section.

Estimating a Causal Risk Difference
In this subsection, we recall how it is possible to estimate a causal risk difference using the method of instrumental variables. Let X be a risk factor, Y an outcome, and Z an instrument satisfying the conditions (i), (ii) and (iii) outlined in the Introduction. In this paper, we consider the case where X, Y and Z are all binary (with possible values 0 and 1). Although we shall later switch to the problem of Mendelian randomization, we first consider the case of a randomized clinical trial comparing two treatments with respect to a binary outcome (for which the method of Lui and Chang [13] was originally derived). There, Z denotes the random group assignment, while X denotes the treatment which is actually taken by the participants. The variables X and Z may differ for some individuals if noncompliance occurs. In what follows, we consider a sample of n individuals, where $n_{ijk}$ denotes the number of individuals with $Z = i$, $X = j$ and $Y = k$ for $i, j, k = 0, 1$. We thus have the situation presented in Table 1.
We first review in this subsection how it is possible to estimate the causal effect of the treatment X on the outcome Y defined via a risk difference. A naive estimate, in what follows the ''as-treated'' estimate, would simply compare the empirical proportions of $Y = 1$ in the groups $X = 1$ and $X = 0$ as follows:

$\hat{\delta}_{AT} = \frac{n_{011} + n_{111}}{n_{010} + n_{110} + n_{011} + n_{111}} - \frac{n_{001} + n_{101}}{n_{000} + n_{100} + n_{001} + n_{101}}.$

On the other hand, another well-known estimate, the ''intention-to-treat'' estimate, compares the empirical proportions of $Y = 1$ in the groups $Z = 1$ and $Z = 0$ as follows:

$\hat{\delta}_{ITT} = \frac{n_{101} + n_{111}}{n_{100} + n_{110} + n_{101} + n_{111}} - \frac{n_{001} + n_{011}}{n_{000} + n_{010} + n_{001} + n_{011}}.$

Note that $\hat{\delta}_{AT}$ and $\hat{\delta}_{ITT}$ can also be defined as the estimated slopes in a linear regression of Y on X, respectively of Y on Z. On the other hand, the ''instrumental variable'' estimate is a ratio of two slope estimates, from a linear regression of Y on Z and from a linear regression of X on Z. The numerator is therefore the intention-to-treat estimate, while the denominator compares the empirical proportions of $X = 1$ in the groups $Z = 1$ and $Z = 0$. The instrumental variable estimate is thus given by

$\hat{\delta}_{IV} = \hat{\delta}_{ITT} \Big/ \left( \frac{n_{110} + n_{111}}{n_{100} + n_{110} + n_{101} + n_{111}} - \frac{n_{010} + n_{011}}{n_{000} + n_{010} + n_{001} + n_{011}} \right).$
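The three estimates above can be computed directly from the eight cell counts of Table 1. The following sketch is our own illustration (the function name and the nested-list convention `n[z][x][y]` are ours, not the paper's):

```python
# Hypothetical sketch: the three risk-difference estimates computed from the
# 2x2x2 counts n[z][x][y] of Table 1. Naming is ours, not the paper's.

def risk_differences(n):
    """n[i][j][k] = number of individuals with Z=i, X=j, Y=k."""
    # As-treated: compare P(Y=1 | X=1) with P(Y=1 | X=0).
    d_at = ((n[0][1][1] + n[1][1][1])
            / (n[0][1][0] + n[1][1][0] + n[0][1][1] + n[1][1][1])
            - (n[0][0][1] + n[1][0][1])
            / (n[0][0][0] + n[1][0][0] + n[0][0][1] + n[1][0][1]))
    nz1 = sum(n[1][j][k] for j in (0, 1) for k in (0, 1))
    nz0 = sum(n[0][j][k] for j in (0, 1) for k in (0, 1))
    # Intention-to-treat: compare P(Y=1 | Z=1) with P(Y=1 | Z=0).
    d_itt = ((n[1][0][1] + n[1][1][1]) / nz1
             - (n[0][0][1] + n[0][1][1]) / nz0)
    # First-stage slope: compare P(X=1 | Z=1) with P(X=1 | Z=0).
    slope = ((n[1][1][0] + n[1][1][1]) / nz1
             - (n[0][1][0] + n[0][1][1]) / nz0)
    d_iv = d_itt / slope
    return d_at, d_itt, d_iv
```

With full compliance (X = Z for everyone), the three estimates coincide, as stated in the text below.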
To see which population parameter is hence estimated, we shall distinguish among four categories of individuals, as done in Angrist, Imbens and Rubin [15]. The ''compliers'' are those individuals for whom a random assignment $Z = 0$ would imply $X = 0$ and a random assignment $Z = 1$ would imply $X = 1$. Noncompliers include the ''always-takers'' (for whom $X = 1$ whatever the value of Z), the ''never-takers'' (for whom $X = 0$ whatever the value of Z) and the ''defiers'' (for whom $Z = 0$ would imply $X = 1$ and $Z = 1$ would imply $X = 0$). Given the data of a clinical trial, however, it is not possible to tell which of these four categories an individual belongs to, since one cannot infer from the data what he/she would have done if he/she had been assigned to the other group. In the absence of noncompliance, the three above estimates are identical. When noncompliance occurs, however, these estimates usually differ from each other and converge (as sample size increases) towards different population parameters. Let $\omega_C$, $\omega_A$, $\omega_N$ and $\omega_D$ be the proportions of compliers, always-takers, never-takers and defiers in the target population (such that $\omega_C + \omega_A + \omega_N + \omega_D = 1$). Let $p_{0C}$, $p_{0A}$, $p_{0N}$ and $p_{0D}$ be the proportions of $Y = 1$ for compliers, always-takers, never-takers and defiers in the group $Z = 0$, and similarly let $p_{1C}$, $p_{1A}$, $p_{1N}$ and $p_{1D}$ be the corresponding proportions in the group $Z = 1$. Finally, let r be the proportion of individuals with $Z = 1$ (e.g., $r = 0.5$ in a clinical trial comparing two groups of equal size). Using condition (ii) in the Introduction section, one can easily derive the following results (in a similar spirit as was done e.g. in Greenland [2]):

• The estimate $\hat{\delta}_{AT}$ converges towards a population parameter comparing a group formed of compliers, always-takers and defiers (those observed with $X = 1$) with a group formed of compliers, never-takers and defiers (those observed with $X = 0$).
Since always-takers may have quite different characteristics from never-takers, it is not obvious to provide this parameter with any useful interpretation.
• The estimate $\hat{\delta}_{ITT}$ converges towards the population parameter

$\delta_{ITT} = \omega_C (p_{1C} - p_{0C}) + \omega_A (p_{1A} - p_{0A}) + \omega_N (p_{1N} - p_{0N}) + \omega_D (p_{1D} - p_{0D}).$

This parameter can be interpreted as the average causal effect of Z on Y, which can be interesting to assess the effect of a public health policy, noncompliance in the sample mimicking the fact that not every person in the target population will strictly follow the official recommendations, for example.
• The estimate $\hat{\delta}_{IV}$ converges towards the limit of $\hat{\delta}_{ITT}$ divided by $\omega_C - \omega_D$. To calculate this denominator, note that the empirical proportion of $X = 1$ in the group $Z = 1$ is an estimate of $\omega_C + \omega_A$, while the empirical proportion of $X = 1$ in the group $Z = 0$ is an estimate of $\omega_D + \omega_A$, the difference being hence $\omega_C - \omega_D$.
To get a causal interpretation for the latter estimate, note first that condition (iii) in the Introduction section implies $p_{1A} = p_{0A}$ and $p_{1N} = p_{0N}$ (the outcome Y for always-takers and never-takers is not influenced in any respect by the value of Z), whereas condition (i) ensures that its denominator does not converge towards zero. One may then make the additional assumption that

(A1) there are no defiers ($\omega_D = 0$).

In the terminology of Angrist, Imbens and Rubin [15], (A1) is the ''monotonicity assumption''. Under this additional assumption, the estimate $\hat{\delta}_{IV}$ thus converges towards the population parameter

$\delta = p_{1C} - p_{0C},$

which can be interpreted as the average causal effect of X on Y in the subpopulation of compliers [15].
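This convergence can be checked numerically. The following Monte Carlo sketch uses our own illustrative parameter values (50% compliers, high-risk always-takers, low-risk never-takers, a true complier risk difference of 0.3), not the paper's simulation design:

```python
# Monte Carlo sketch (our own illustration): under monotonicity and the
# exclusion restriction, the instrumental variable estimate should approach
# p_1C - p_0C = 0.6 - 0.3 = 0.3, even though the always-/never-takers
# introduce confounding.
import random

random.seed(1)
N = 200_000
counts = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]  # counts[z][x][y]
for _ in range(N):
    u = random.random()
    group = "C" if u < 0.5 else ("A" if u < 0.75 else "N")  # 50% compliers
    z = random.randint(0, 1)
    x = z if group == "C" else (1 if group == "A" else 0)
    # outcome probabilities: only compliers are affected by X
    if group == "C":
        p = 0.6 if x == 1 else 0.3     # true complier effect: 0.3
    elif group == "A":
        p = 0.8                        # high-risk always-takers (confounding)
    else:
        p = 0.1                        # low-risk never-takers
    y = 1 if random.random() < p else 0
    counts[z][x][y] += 1

nz1 = sum(counts[1][j][k] for j in (0, 1) for k in (0, 1))
nz0 = N - nz1
d_itt = ((counts[1][0][1] + counts[1][1][1]) / nz1
         - (counts[0][0][1] + counts[0][1][1]) / nz0)
d_first = ((counts[1][1][0] + counts[1][1][1]) / nz1
           - (counts[0][1][0] + counts[0][1][1]) / nz0)
d_iv = d_itt / d_first   # should be close to 0.3
```

Here the intention-to-treat estimate approaches $\omega_C (p_{1C} - p_{0C}) = 0.15$ and the first-stage slope approaches $\omega_C = 0.5$, so their ratio recovers the complier risk difference.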
In the context of Mendelian randomization, the instrument Z will be a genetic variant associated with a risk factor X, and the causal parameter $\delta$ which is estimated using the method of instrumental variables can be interpreted as the risk difference that one would get if one could intervene and change the risk factor X from 0 to 1 in the subpopulation of compliers. By analogy with a clinical trial with noncompliance, a complier is an individual for whom the presence/absence of the risk factor X is determined by the presence/absence of the genetic variant Z, i.e. for whom we would observe $X = Z$ whatever the alleles randomly received at conception. This definition of a complier actually refers to a causal link between Z and X, and we shall also make this assumption in what follows.

Estimating a Causal Odds-ratio
It is however more common to define the effect of a binary risk factor X on a binary outcome Y via an odds-ratio rather than a risk difference. Restricting our attention to the subpopulation of compliers, the parameter of interest would thus become

$\psi = \frac{p_{1C} (1 - p_{0C})}{p_{0C} (1 - p_{1C})}.$

Again, in the context of a Mendelian randomization approach, this causal parameter $\psi$ can be interpreted as the odds-ratio which one would get if one could intervene and change X from 0 to 1 in the subpopulation of compliers. The naive, or as-treated, estimate of $\psi$ could be expressed as the odds of having $Y = 1$ in the group $X = 1$ divided by the odds of having $Y = 1$ in the group $X = 0$, yielding

$\hat{\psi}_{AT} = \frac{(n_{011} + n_{111})(n_{000} + n_{100})}{(n_{001} + n_{101})(n_{010} + n_{110})},$

while the intention-to-treat estimate could be expressed as the odds of having $Y = 1$ in the group $Z = 1$ divided by the odds of having $Y = 1$ in the group $Z = 0$, yielding

$\hat{\psi}_{ITT} = \frac{(n_{101} + n_{111})(n_{000} + n_{010})}{(n_{001} + n_{011})(n_{100} + n_{110})}.$

In general, neither estimate is consistent for $\psi$ if noncompliance occurs. On the other hand, estimating the parameter $\psi$ with the classical method of instrumental variables is not obvious. In the qvf function of Stata, one estimates $\log(\psi)$ as the ratio of two slope estimates, from a logistic regression of Y on Z and from a linear regression of X on Z, yielding

$\log(\hat{\psi}_{IV}) = \log(\hat{\psi}_{ITT}) \Big/ \left( \frac{n_{110} + n_{111}}{n_{100} + n_{110} + n_{101} + n_{111}} - \frac{n_{010} + n_{011}}{n_{000} + n_{010} + n_{001} + n_{011}} \right).$

Alternatively, $\log(\hat{\psi}_{IV})$ is the estimated slope in a ''second stage'' logistic regression of Y on $\hat{X}$, where $\hat{X}$ represents the fitted values calculated from a ''first stage'' linear regression of X on Z, that is $\hat{X} = (n_{110} + n_{111})/(n_{100} + n_{110} + n_{101} + n_{111})$ for those individuals with $Z = 1$ and $\hat{X} = (n_{010} + n_{011})/(n_{000} + n_{010} + n_{001} + n_{011})$ for those individuals with $Z = 0$. Thus, the $\hat{\psi}_{IV}$ estimate is sometimes referred to as a two-stage estimate. Nagelkerke et al. [7], as well as Palmer et al. [8], proposed to improve this estimate by considering in the second stage a logistic regression with Y as the response and with two explanatory variables: $\hat{X}$ and $R = X - \hat{X}$ (or equivalently, X and R).
The estimated slope associated with $\hat{X}$ (respectively with X) in this second stage regression is another estimate of $\log(\psi)$, yielding by exponentiation the adjusted instrumental variable estimate, in what follows $\hat{\psi}_{ADJ}$. In the econometrics literature, this estimate is known as the ''control function estimate''. Note that Palmer et al. [8] considered this estimate with a continuous X in the context of Mendelian randomization. Nagelkerke et al. [7] used this estimate $\hat{\psi}_{ADJ}$ with a binary X in the context of a clinical trial with noncompliance, and interpreted it as an estimate of a causal odds-ratio in the subpopulation of compliers, i.e. as an estimate of $\psi$.
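Since Z is binary, both slopes entering the qvf-style ratio have closed forms, so $\hat{\psi}_{IV}$ can be computed without any regression fit. A minimal sketch with our own naming ($\hat{\psi}_{ADJ}$, by contrast, requires actually fitting the second-stage logistic regression and is not shown):

```python
# Hypothetical sketch: closed-form two-stage (qvf-style) estimate for binary Z.
# n[z][x][y] are the 2x2x2 counts of Table 1; naming is ours.
import math

def psi_iv(n):
    """log odds-ratio of Y on Z divided by the linear slope of X on Z,
    exponentiated."""
    nz1 = sum(n[1][j][k] for j in (0, 1) for k in (0, 1))
    nz0 = sum(n[0][j][k] for j in (0, 1) for k in (0, 1))
    # logistic slope of Y on binary Z = log of the intention-to-treat odds-ratio
    log_itt = math.log((n[1][0][1] + n[1][1][1]) * (n[0][0][0] + n[0][1][0])
                       / ((n[0][0][1] + n[0][1][1]) * (n[1][0][0] + n[1][1][0])))
    # linear slope of X on Z = difference of the two compliance proportions
    slope_xz = ((n[1][1][0] + n[1][1][1]) / nz1
                - (n[0][1][0] + n[0][1][1]) / nz0)
    return math.exp(log_itt / slope_xz)
```

With full compliance the first-stage slope is 1, so $\hat{\psi}_{IV}$ reduces to $\hat{\psi}_{ITT}$.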
There is however another method to estimate this causal odds-ratio $\psi$, as recently proposed by Lui and Chang [13] and as explained in what follows. While it is not possible to know for each person whether he/she is a complier, an always-taker or a never-taker, note that the $n_{00} = n_{000} + n_{001}$ individuals in the first column of Table 1 include compliers and never-takers, the $n_{01} = n_{010} + n_{011}$ individuals in the second column are always-takers, the $n_{10} = n_{100} + n_{101}$ individuals in the third column are never-takers, and the $n_{11} = n_{110} + n_{111}$ individuals in the last column include compliers and always-takers (recall that we assume no defiers). Since Z is an instrumental variable (which is independent of all possible confounding variables), we expect the same proportions of compliers, always-takers and never-takers in both groups ($Z = 0$ and $Z = 1$). In the group $Z = 0$, it is hence possible to estimate $\omega_A$ without bias using $\hat{\omega}_A = n_{01}/(n_{00} + n_{01})$. In the group $Z = 1$, it is possible to estimate $\omega_N$ without bias using $\hat{\omega}_N = n_{10}/(n_{10} + n_{11})$.
This also yields $\hat{\omega}_C = 1 - \hat{\omega}_A - \hat{\omega}_N$ as an estimate of the proportion of compliers. Let $p_{1CA}$ be the proportion of $Y = 1$ expected in the last column ($Z = X = 1$), which is equal to

$p_{1CA} = \frac{\omega_C p_{1C} + \omega_A p_{1A}}{\omega_C + \omega_A}.$

This implies

$p_{1C} = \frac{p_{1CA} (\omega_C + \omega_A) - \omega_A p_{1A}}{\omega_C}.$

Similarly, let $p_{0CN}$ be the proportion of $Y = 1$ expected in the first column ($Z = X = 0$), which is equal to

$p_{0CN} = \frac{\omega_C p_{0C} + \omega_N p_{0N}}{\omega_C + \omega_N}.$

This implies

$p_{0C} = \frac{p_{0CN} (\omega_C + \omega_N) - \omega_N p_{0N}}{\omega_C}.$

It is then possible to estimate without bias $p_{1CA}$ and $p_{0CN}$ using the data from the last and the first columns, respectively yielding $\hat{p}_{1CA} = n_{111}/n_{11}$ and $\hat{p}_{0CN} = n_{001}/n_{00}$. It is also possible to estimate without bias $p_{1A}$ and $p_{0N}$ using the data from the second and the third columns, respectively yielding $\hat{p}_{1A} = n_{011}/n_{01}$ and $\hat{p}_{0N} = n_{101}/n_{10}$. It follows that consistent estimates of $p_{1C}$ and $p_{0C}$ are given by

$\hat{p}_{1C} = \frac{\hat{p}_{1CA} (\hat{\omega}_C + \hat{\omega}_A) - \hat{\omega}_A \hat{p}_{1A}}{\hat{\omega}_C} \quad \text{and} \quad \hat{p}_{0C} = \frac{\hat{p}_{0CN} (\hat{\omega}_C + \hat{\omega}_N) - \hat{\omega}_N \hat{p}_{0N}}{\hat{\omega}_C},$

yielding the estimate $\hat{\psi}_{LC} = \hat{p}_{1C} (1 - \hat{p}_{0C}) / \{\hat{p}_{0C} (1 - \hat{p}_{1C})\}$, which can also be expressed as

$\hat{\psi}_{LC} = \frac{\{(n_{00} + n_{01}) n_{111} - (n_{10} + n_{11}) n_{011}\} \{(n_{10} + n_{11}) n_{000} - (n_{00} + n_{01}) n_{100}\}}{\{(n_{00} + n_{01}) n_{110} - (n_{10} + n_{11}) n_{010}\} \{(n_{10} + n_{11}) n_{001} - (n_{00} + n_{01}) n_{101}\}}.$

This estimate has been proposed by Lui and Chang [13], without all the details about the intermediate estimates given here, which also provide useful information, as illustrated in our example below. In the special case where noncompliance occurs only in one group, this estimate coincides with the estimate proposed by Sommer and Zeger [16]. In a more general context involving a multinomial outcome, Baker [17] showed that the estimates $\hat{p}_{1C}$ and $\hat{p}_{0C}$ above are the maximum likelihood estimates of $p_{1C}$ and $p_{0C}$ if they lie between 0 and 1.
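The derivation above translates directly into code. The following sketch (our own naming, assuming no defiers) follows the intermediate-estimate route, which also exposes the quantities $\hat{\omega}_C$, $\hat{p}_{1C}$ and $\hat{p}_{0C}$ used later in the example:

```python
# Sketch of the Lui-Chang complier odds-ratio estimate from the 2x2x2 counts
# n[z][x][y], assuming no defiers. Naming is ours, not the paper's.

def lui_chang(n):
    n00 = n[0][0][0] + n[0][0][1]   # Z=0, X=0: compliers + never-takers
    n01 = n[0][1][0] + n[0][1][1]   # Z=0, X=1: always-takers
    n10 = n[1][0][0] + n[1][0][1]   # Z=1, X=0: never-takers
    n11 = n[1][1][0] + n[1][1][1]   # Z=1, X=1: compliers + always-takers
    w_a = n01 / (n00 + n01)         # proportion of always-takers
    w_n = n10 / (n10 + n11)         # proportion of never-takers
    w_c = 1 - w_a - w_n             # proportion of compliers
    p_1ca = n[1][1][1] / n11        # P(Y=1) in the mixed column Z=X=1
    p_0cn = n[0][0][1] / n00        # P(Y=1) in the mixed column Z=X=0
    p_1a = n[0][1][1] / n01         # P(Y=1) for always-takers
    p_0n = n[1][0][1] / n10         # P(Y=1) for never-takers
    # unmix the two mixed columns to recover the complier proportions
    p_1c = (p_1ca * (w_c + w_a) - w_a * p_1a) / w_c
    p_0c = (p_0cn * (w_c + w_n) - w_n * p_0n) / w_c
    return p_1c * (1 - p_0c) / (p_0c * (1 - p_1c))
```

Multiplying numerator and denominator by $\hat{\omega}_C^2$ and clearing fractions recovers the closed-form expression in counts given above.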

Simulations
In this subsection, we present the results of simulations which were run to assess the performance of $\hat{\psi}_{IV}$, $\hat{\psi}_{ADJ}$ and $\hat{\psi}_{LC}$ as estimates of the causal parameter $\psi$ above. The estimates $\hat{\psi}_{AT}$ and $\hat{\psi}_{ITT}$ were also included in the comparison. In our simulation design, we considered all possible combinations of the following five factors:

1. Proportion of compliers (low, middle, high): the proportion of compliers was set to $\omega_C = 0.1$, 0.5 or 0.9, while we took equal proportions of always-takers and never-takers, i.e. $\omega_A = \omega_N = (1 - \omega_C)/2$.

For each of the 72 possible combinations of levels, we ran 2000 replications. In each replication, the five estimates were calculated. All simulations were performed using R (version 2.11.1) [18].
We encountered the following technical problems when calculating $\hat{\psi}_{LC}$. When the true proportion of compliers was low ($\omega_C = 0.1$), the estimated proportion of compliers $\hat{\omega}_C$ was smaller than zero in 7.7% of the replications when $n = 200$ (the problem never happened when $n = 2000$). In those cases, we set $\hat{\psi}_{LC} = 1$ since no effect can be estimated without compliers. Another technical problem was that $\hat{p}_{0C}$ and $\hat{p}_{1C}$ were sometimes outside the range of possible values (0, 1). In those cases, values smaller than 0 were set to 0 and values larger than 1 were set to 1, yielding estimated odds-ratios of zero or plus infinity. This situation arose in about 60% of the replications when $\omega_C = 0.1$ and $n = 200$. This problem remained in about 15% of the replications when increasing the sample size to $n = 2000$, but disappeared when increasing the proportion of compliers to $\omega_C = 0.5$.
In each replication, the F-statistic from the first stage regression in the instrumental variable approach (the linear regression of X on Z) was also calculated. According to Stock, Wright and Yogo [19], a value of $F < 10$ suggests a weak instrument, for which the validity of the inference is not guaranteed. For $\omega_C = 0.1$ and $n = 200$, this happened in 95% of the replications. Since the method of instrumental variables is not valid in that setting, we shall not present those results in what follows, leaving us with 60 combinations of levels (this also removed most of the technical problems mentioned above). Note that we still had a value of $F < 10$ in about 10% of the replications for $\omega_C = 0.1$ and $n = 2000$, but these replications were kept to avoid a possible selection bias, as explained in Burgess and Thompson [20]. Results are shown in Table 2. To get a robust estimate of the bias and to cope with the estimated odds-ratios of zero or infinity, we report in this table, for each combination of levels and for each estimate, the median of the 2000 estimates divided by the true odds-ratio (i.e. $\mathrm{median}(\hat{\psi})/\psi$). This ratio should be approximately 1 for an unbiased method. In addition, Spearman correlations among the three estimates of main interest $\hat{\psi}_{IV}$, $\hat{\psi}_{ADJ}$ and $\hat{\psi}_{LC}$, calculated over the 2000 estimates, are also reported.
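One cell of such a simulation can be sketched as follows. The parameter values are our own illustrative choices (not necessarily those used in the paper): data sets with 50% compliers and confounded always-/never-takers are generated repeatedly, and the median of the Lui-Chang estimates is compared with the true complier odds-ratio of 3.5:

```python
# Condensed sketch of one cell of a simulation design in the spirit of the
# paper's (parameter values are our own illustrative choices).
import random
import statistics

def simulate_counts(n, w_c=0.5, seed=0):
    """Draw one data set: w_c compliers, the rest split equally between
    high-risk always-takers and low-risk never-takers (confounding)."""
    rng = random.Random(seed)
    c = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]   # c[z][x][y]
    for _ in range(n):
        u = rng.random()
        g = "C" if u < w_c else ("A" if u < w_c + (1 - w_c) / 2 else "N")
        z = rng.randint(0, 1)
        x = z if g == "C" else int(g == "A")
        # p_1C = 0.6, p_0C = 0.3, so the true complier odds-ratio is 3.5
        p = {"C": (0.3, 0.6)[x], "A": 0.8, "N": 0.1}[g]
        c[z][x][1 if rng.random() < p else 0] += 1
    return c

def lc(c):
    """Closed-form Lui-Chang estimate from the 2x2x2 counts."""
    m0 = sum(c[0][j][k] for j in (0, 1) for k in (0, 1))   # group Z = 0
    m1 = sum(c[1][j][k] for j in (0, 1) for k in (0, 1))   # group Z = 1
    num = ((m0 * c[1][1][1] - m1 * c[0][1][1])
           * (m1 * c[0][0][0] - m0 * c[1][0][0]))
    den = ((m0 * c[1][1][0] - m1 * c[0][1][0])
           * (m1 * c[0][0][1] - m0 * c[1][0][1]))
    return num / den

med = statistics.median(lc(simulate_counts(2000, seed=s)) for s in range(200))
# med should land near the true complier odds-ratio of 3.5
```

With 50% compliers and $n = 2000$ the four bracketed terms stay safely positive, so no truncation of the kind described above is needed in this cell.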
As is well known, the intention-to-treat estimate consistently underestimated the true odds-ratio (even in some situations with 90% of compliers), whereas the as-treated estimate could be biased in either direction, including in cases with no effect. Among the three estimates of main interest, we first notice that $\hat{\psi}_{IV}$ and $\hat{\psi}_{ADJ}$ did not differ much, $\hat{\psi}_{ADJ}$ being usually slightly higher than $\hat{\psi}_{IV}$ and the correlation between the two estimates being most of the time above 0.98. Both methods were often biased, sometimes downwards and sometimes upwards. The bias was usually larger with higher odds-ratios, larger confounding effects and a smaller proportion of compliers. Importantly, the situation did not improve with a larger sample size. By contrast, the bias of the $\hat{\psi}_{LC}$ estimate was quite small, the above ratio lying between 0.95 and 1.19 in all considered situations, and the bias would be further reduced by increasing the sample size.
Besides the bias, we also investigated the variability of the estimates. Figure 2 shows how the inter-quartile range (IQR) calculated from the 2000 estimates of the log odds-ratios depends on the proportion of compliers in the case $n = 2000$ and for different combinations of levels. For this, additional simulations were carried out with $\omega_C = 0.2$, 0.3 and 0.4 (in addition to $\omega_C = 0.1$, 0.5 and 0.9). The IQRs of the different estimates were divided by the IQR achieved by $\hat{\psi}_{AT}$, which is the reference method in this figure (it would be represented by a horizontal line drawn at the value 1). The three instrumental variable approaches showed a much higher variability than the as-treated and intention-to-treat estimates, especially when the proportion of compliers was small. For $\omega_C = 0.1$, the IQRs of $\hat{\psi}_{IV}$ and $\hat{\psi}_{ADJ}$ were up to 10 times higher than the IQR of $\hat{\psi}_{AT}$, whereas the IQR of $\hat{\psi}_{LC}$ was up to 18 times higher in the case of a small prevalence. For a medium baseline prevalence, the IQRs of the three estimates $\hat{\psi}_{IV}$, $\hat{\psi}_{ADJ}$ and $\hat{\psi}_{LC}$ became more comparable with each other. Increasing the level of the odds-ratio or of the confounding effect did not change the results much. The complete results on the IQRs of the different estimates are available from the first author upon request. Figure 3 illustrates both the bias and the variability of the different methods with boxplots of the 2000 estimates of the log odds-ratios calculated under various settings in the case $n = 2000$ and $\omega_C = 0.5$, from which our conclusions above can be retrieved. We also performed simulations using other combinations of levels, e.g. where the proportion of $Y = 1$ was taken higher for compliers than for always-takers and never-takers, and similar conclusions could be drawn (apart from the direction of the bias for $\hat{\psi}_{AT}$, $\hat{\psi}_{IV}$ and $\hat{\psi}_{ADJ}$).

Example
In this subsection, we illustrate how the method of Lui and Chang [13] can be used in a context of Mendelian randomization using a partly fictitious example. We consider the effect of alcohol consumption X on hypertension Y. In what follows, $X = 1$ refers to individuals who drink alcohol, and $Y = 1$ refers to people with hypertension. It is well known that the aldehyde dehydrogenase 2 (ALDH2) genotype is strongly associated with alcohol consumption, since it encodes an enzyme involved in alcohol metabolism, and this relationship might reasonably be assumed to be causal [21]. The presence of a protective allele in one of the markers of the ALDH2 gene has been used as an instrument Z, since it is supposed to be responsible for a decrease in alcohol consumption. In what follows, $Z = 0$ refers to individuals with this protective allele. In this context, a complier is an individual whose phenotype would be determined by his/her genotype (i.e. no alcohol consumption if the protective allele were present ($Z = X = 0$), and consumption if the protective allele were absent ($Z = X = 1$)). In contrast, an always-taker is an individual who would drink alcohol, and a never-taker is an individual who would not drink alcohol, whatever his/her genotype. Recall also that we assume no defiers, i.e. there is no one who would drink alcohol if and only if the protective allele were present (i.e. with $X = 1$ if $Z = 0$ and with $X = 0$ if $Z = 1$), which seems to us tenable (it is in fact not obvious to imagine a subpopulation in which a causal gene would systematically produce the contrary of what is expected, although this cannot be verified from the data).
In the study analyzed by Amamoto et al. [22], 51.2% of individuals carry the protective allele (i.e. with $Z = 0$). Among persons with $Z = 0$, 38.3% suffer from hypertension, whereas this proportion is 48.2% in the group $Z = 1$. According to the population studied by Yamada et al. [23], the proportion of individuals who drink alcohol is 90.8% in the group $Z = 1$, while it is 71.1% in the group $Z = 0$. These proportions allow us to calculate the margins of a $2 \times 2 \times 2$ table summarizing the distribution of (Z, X, Y). Unfortunately, we did not find comparable data allowing us to complete all cells of the table. For the sake of illustration, we completed it by fixing the prevalence of hypertension at 39% in the group with $X = 1$ and $Z = 0$, and at 48.5% in the group with $X = 1$ and $Z = 1$. Considering a total sample size of $n = 2000$ (to match our simulations), this leads to the fictitious data summarized in Table 3. We note in particular that $\hat{p}_{1C}$ is much higher than $\hat{p}_{1A}$, which is informative on the importance of confounding. Finally, the odds-ratio measuring the causal effect of X on Y for compliers is estimated as $\hat{\psi}_{LC}$ with 95% confidence interval [3.128; 21.887]. The confidence interval associated with $\hat{\psi}_{IV}$ is obtained using the qvf function in Stata, while the confidence interval associated with $\hat{\psi}_{ADJ}$ is here computed from 10000 bootstrap replications. Consistent with our simulations, these confidence intervals are somewhat narrower than the confidence interval associated with $\hat{\psi}_{LC}$, but one cannot infer anything from this because of the unknown bias of these methods. We also note that the upper bound of the narrow confidence interval associated with the intention-to-treat estimate, whose bias is known to be in the conservative direction, is still smaller than the lower bound of the confidence interval associated with $\hat{\psi}_{LC}$.
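Table 3 is not reproduced here, but its cells can be approximately reconstructed from the marginal percentages quoted above (51.2% with Z = 0; 38.3% and 48.2% hypertension in the groups Z = 0 and Z = 1; 71.1% and 90.8% drinkers; 39% and 48.5% hypertension among drinkers with Z = 0 and Z = 1; n = 2000). The following sketch uses these reconstructed counts, which may differ slightly from the paper's exact table because of rounding:

```python
# Counts reconstructed (by us) from the marginal percentages quoted in the
# text; the paper's Table 3 may differ slightly due to rounding.
# Index order: n[z][x][y].
n = [[[188, 108], [444, 284]],   # Z = 0 (protective allele): 1024 individuals
     [[50, 40], [456, 430]]]     # Z = 1: 976 individuals

n00 = n[0][0][0] + n[0][0][1]    # compliers + never-takers
n01 = n[0][1][0] + n[0][1][1]    # always-takers
n10 = n[1][0][0] + n[1][0][1]    # never-takers
n11 = n[1][1][0] + n[1][1][1]    # compliers + always-takers
w_a = n01 / (n00 + n01)          # drinkers whatever the genotype
w_n = n10 / (n10 + n11)          # abstainers whatever the genotype
w_c = 1 - w_a - w_n              # about 0.20: the compliers
p_1a = n[0][1][1] / n01          # hypertension among always-takers
p_0n = n[1][0][1] / n10          # hypertension among never-takers
p_1ca = n[1][1][1] / n11
p_0cn = n[0][0][1] / n00
p_1c = (p_1ca * (w_c + w_a) - w_a * p_1a) / w_c
p_0c = (p_0cn * (w_c + w_n) - w_n * p_0n) / w_c
psi_lc = p_1c * (1 - p_0c) / (p_0c * (1 - p_1c))
```

Under this reconstruction one gets $\hat{\omega}_C \approx 0.20$ (the roughly 20% of compliers mentioned in the Discussion) and $\hat{p}_{1C} \approx 0.83$, much higher than $\hat{p}_{1A} \approx 0.39$, illustrating the confounding; $\hat{\psi}_{LC}$ then comes out close to 10, inside the reported confidence interval [3.128; 21.887].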

Estimating Another Causal Odds-ratio
We have considered so far as target parameter the causal odds-ratio $\psi$ for the subpopulation of compliers. Besides not being identifiable, this subpopulation might admittedly be difficult to apprehend in the context of Mendelian randomization, and it will depend on the chosen genetic instrument. Other causal odds-ratios have thus been considered as target parameters in the statistical literature.
In particular, the logistic structural mean model (LSMM) estimate described in Vansteelandt and Goetghebeur [14] has been introduced to estimate a causal odds-ratio $\psi^*$ for the subpopulation of individuals being at risk, i.e. for whom one would naturally observe $X = 1$. There, the assumption (A1) is replaced by another one:

(A2) the causal odds-ratio is the same for individuals with $X = 1$ and $Z = 0$ as for individuals with $X = 1$ and $Z = 1$.

Although the LSMM approach does not rely on (A1), we further assume in what follows that there are no defiers, to allow some interesting comparisons between the different estimates in that case. According to the terminology employed here, individuals with $X = 1$ and $Z = 0$ are representative of the always-takers, whereas individuals with $X = 1$ and $Z = 1$ are representative of a subpopulation composed of both the compliers and the always-takers. Thus, using this approach, one is estimating $\psi^*$ assuming that the odds-ratio

$\frac{p_{1A} (1 - p^*_{0A})}{p^*_{0A} (1 - p_{1A})}$

for the always-takers is also equal to $\psi^*$, where $p^*_{0A}$ denotes the proportion of $Y = 1$ for always-takers that one would get if one could intervene and set $X = 0$. To calculate the LSMM estimate, one may first calculate the estimate $\hat{p}^*_{0A}$ of $p^*_{0A}$ as the value satisfying this assumption. As far as we know, this is an original formulation of the LSMM estimate in the context considered here. One can check that it provides the same result as the equivalent explicit formulation for $\hat{\psi}_{LSMM}$ given in the Appendix of Vansteelandt et al. [11]. Applied to our example from the previous subsection, one gets $\hat{p}^*_{0A} = 0.116$ and $\hat{\psi}_{LSMM} = 4.893$. We note however that the assumption of having the same causal odds-ratio in the subpopulation of always-takers and in the subpopulation of both compliers and always-takers is a very special one (and its validity will also depend on the chosen genetic instrument). Because of the non-collapsibility of the odds-ratio (when pooling two subpopulations with the same odds-ratio, one does not in general obtain the same odds-ratio, see e.g. Greenland, Robins and Pearl [24]), it does not even imply that the causal odds-ratio is the same for compliers and for always-takers.
Actually, one could alternatively assume that

(A3) the causal odds-ratio is the same for compliers, always-takers and never-takers,

and hence that

$\psi = \frac{p_{1A} (1 - p^*_{0A})}{p^*_{0A} (1 - p_{1A})} = \frac{p^*_{1N} (1 - p_{0N})}{p_{0N} (1 - p^*_{1N})},$

where $p^*_{1N}$ denotes the proportion of $Y = 1$ for never-takers that one would get if one could intervene and set $X = 1$. Making this latter assumption, it would then become possible to estimate the causal odds-ratio in the entire population, which we shall denote by $\psi^+$. Similarly to the LSMM estimate above, one would first calculate estimates $\hat{p}^+_{0A}$ and $\hat{p}^+_{1N}$ of $p^*_{0A}$ and $p^*_{1N}$ as the values satisfying

$\hat{\psi}_{LC} = \frac{\hat{p}_{1A} (1 - \hat{p}^+_{0A})}{\hat{p}^+_{0A} (1 - \hat{p}_{1A})} = \frac{\hat{p}^+_{1N} (1 - \hat{p}_{0N})}{\hat{p}_{0N} (1 - \hat{p}^+_{1N})},$

which are given by $\hat{p}^+_{0A} = \hat{p}_{1A} / \{\hat{p}_{1A} + \hat{\psi}_{LC} (1 - \hat{p}_{1A})\}$ and $\hat{p}^+_{1N} = \hat{\psi}_{LC} \hat{p}_{0N} / \{\hat{\psi}_{LC} \hat{p}_{0N} + 1 - \hat{p}_{0N}\}$. The proportion $p^+_1$ of $Y = 1$ in the entire population that one would get if one could intervene and set $X = 1$ is then estimated as

$\hat{p}^+_1 = \hat{\omega}_C \hat{p}_{1C} + \hat{\omega}_A \hat{p}_{1A} + \hat{\omega}_N \hat{p}^+_{1N},$

whereas the proportion $p^+_0$ of $Y = 1$ in the whole population that one would get if one could intervene and set $X = 0$ is then estimated as

$\hat{p}^+_0 = \hat{\omega}_C \hat{p}_{0C} + \hat{\omega}_A \hat{p}^+_{0A} + \hat{\omega}_N \hat{p}_{0N}.$

An estimate $\hat{\psi}^+$ of $\psi^+$ is then simply obtained by

$\hat{\psi}^+ = \frac{\hat{p}^+_1 (1 - \hat{p}^+_0)}{\hat{p}^+_0 (1 - \hat{p}^+_1)}.$

We did not find mention of such an estimate in the literature, and it might be interesting to study its statistical properties (although it would rely on both assumptions (A1) and (A3), instead of only either (A1) or (A2)). Applied to our example from the previous subsection, one gets $\hat{p}^+_{0A} = 0.060$, $\hat{p}^+_{1N} = 0.889$, $\hat{p}^+_1 = 0.523$, $\hat{p}^+_0 = 0.148$ and $\hat{\psi}^+ = 6.282$. Interestingly, Balke and Pearl [25] have derived bounds for $p^+_1$ and $p^+_0$ given the (observed) distribution of (Z, X, Y), which can then be used to derive bounds for $\psi^+$. Note also that $\psi^+$ will necessarily be smaller in magnitude than $\psi$ because of the non-collapsibility of the odds-ratio.
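The quantities entering $\hat{\psi}^+$ can be computed from the same counts reconstructed (by us) from the example's marginal percentages; under this reconstruction the results come out close to the values quoted in the text:

```python
# Whole-population odds-ratio estimate under (A1)+(A3), using counts
# reconstructed (by us) from the example's marginal percentages; the paper's
# exact Table 3 may differ slightly due to rounding. Index order: n[z][x][y].
n = [[[188, 108], [444, 284]], [[50, 40], [456, 430]]]

n00, n01 = 188 + 108, 444 + 284
n10, n11 = 50 + 40, 456 + 430
w_a, w_n = n01 / (n00 + n01), n10 / (n10 + n11)
w_c = 1 - w_a - w_n
p_1a, p_0n = n[0][1][1] / n01, n[1][0][1] / n10
p_1c = (n[1][1][1] / n11 * (w_c + w_a) - w_a * p_1a) / w_c
p_0c = (n[0][0][1] / n00 * (w_c + w_n) - w_n * p_0n) / w_c
psi_lc = p_1c * (1 - p_0c) / (p_0c * (1 - p_1c))

# counterfactual proportions solving the equal-odds-ratio conditions (A3)
p0a_plus = p_1a / (p_1a + psi_lc * (1 - p_1a))
p1n_plus = psi_lc * p_0n / (psi_lc * p_0n + 1 - p_0n)
# mix the three subpopulations back together under each intervention
p1_plus = w_c * p_1c + w_a * p_1a + w_n * p1n_plus
p0_plus = w_c * p_0c + w_a * p0a_plus + w_n * p_0n
psi_plus = p1_plus * (1 - p0_plus) / (p0_plus * (1 - p1_plus))
# close to the values quoted in the text (0.060, 0.889, 0.523, 0.148, 6.282)
```

As noted in the text, $\hat{\psi}^+$ is smaller than $\hat{\psi}_{LC}$, consistent with the non-collapsibility of the odds-ratio.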

Discussion
In this paper, we have considered the problem of estimating a causal odds-ratio for assessing the effect of a risk factor X on an outcome Y using Mendelian randomization with a genetic instrument Z in the special case where X, Y and Z are all binary. We have confirmed via simulations that the usual adaptations of the method of instrumental variables, such as the qvf function of Stata or the adjusted version considered by Nagelkerke et al. [7] and by Palmer et al. [8], are not valid estimates of the causal odds-ratio in the subpopulation of compliers, since a large bias may occur even with a large sample size. Palmer et al. [12] also recognized that these estimates are not consistent for any causal odds-ratio in this context. By contrast, the method recently proposed by Lui and Chang [13], while being more variable than the two methods above, does not suffer from this bias. While Palmer et al. [12] noted that ''estimation of complier causal effects on the odds-ratio scale is more problematic'', it is hence encouraging to have a valid solution in the simple case considered here (i.e. binary X, Y and Z). Further work is needed to establish whether and how this solution may be extended to more complicated cases.
We have also recalled and illustrated that an instrumental variable approach with a weak instrument (in our context, a low proportion of compliers) might not be very useful because of the huge variability of the estimate. With 10% of compliers, as in many examples from the literature, the variability of the estimate of Lui and Chang [13] (measured via the inter-quartile range) can be up to 18 times higher than that of the conventional as-treated or intention-to-treat estimates. With 30% of compliers, the variability can still be up to 5 times higher. In our example, we had about 20% of compliers, and the confidence interval obtained for the causal odds-ratio was still rather wide even with a sample size of n = 2000. Thus, the need to find stronger instruments is probably as important as the need to develop a methodology allowing consistent estimation of a causal odds-ratio.
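The weak-instrument phenomenon can be illustrated numerically with a minimal Monte Carlo sketch. As a simplified stand-in for the odds-ratio setting of our simulations, the code below works on the risk-difference scale and compares the spread of a conventional intention-to-treat estimate with that of a simple Wald-type instrumental variable estimate; all parameter values (10% compliers, stratum-specific baseline risks, a complier risk difference of 0.2) are illustrative assumptions of ours, not those of our simulation study:

```python
import random
import statistics

def iqr(values):
    """Inter-quartile range of a list of estimates."""
    q = statistics.quantiles(values, n=4)
    return q[2] - q[0]

def one_replicate(n, pi_c, rng):
    """Simulate one dataset of size n with a proportion pi_c of compliers,
    and return (ITT estimate, Wald IV estimate) on the risk-difference scale."""
    y1 = y0 = x1 = x0 = n1 = n0 = 0
    for _ in range(n):
        z = rng.random() < 0.5                # binary instrument
        u = rng.random()                      # draw the principal stratum
        if u < pi_c:                          # complier: X follows Z
            x, base = z, 0.2
        elif u < pi_c + (1 - pi_c) / 2:       # always-taker (confounded: high risk)
            x, base = True, 0.4
        else:                                 # never-taker (confounded: low risk)
            x, base = False, 0.1
        y = rng.random() < base + 0.2 * x     # true complier risk difference = 0.2
        if z:
            n1 += 1; x1 += x; y1 += y
        else:
            n0 += 1; x0 += x; y0 += y
    itt = y1 / n1 - y0 / n0                   # intention-to-treat contrast
    denom = x1 / n1 - x0 / n0                 # estimated proportion of compliers
    return itt, itt / denom                   # Wald IV estimate divides by denom

rng = random.Random(42)
itt_est, iv_est = zip(*(one_replicate(1000, 0.1, rng) for _ in range(500)))
print(iqr(itt_est), iqr(iv_est))
```

With this seed, the inter-quartile range of the IV estimate comes out several times larger than that of the intention-to-treat estimate, because the small and noisily estimated complier proportion sits in the denominator.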
Another limitation of the considered approach is that the subpopulation of compliers on which we restrict our attention is, in the context of Mendelian randomization, "at the least unnatural and a lot harder to grasp" than in the context of a clinical trial with noncompliance, as noted by one anonymous reviewer. While we agree with this statement, the question is whether there really is a viable alternative. If one does not restrict one's attention to the compliers, one has to consider always-takers and never-takers as well. As it is by definition not possible to observe what the outcome of an always-taker would be if he/she had X = 0, or what the outcome of a never-taker would be if he/she had X = 1, one has to make some speculative assumption in this regard. For example, when using the logistic structural mean model estimate of Vansteelandt and Goetghebeur [14], for which the target parameter is the causal odds-ratio in the subpopulation of persons at risk, one assumes that the effect of the risk factor on the outcome is the same in the subpopulation containing the always-takers (and the defiers, if any) as in the subpopulation containing the compliers and the always-takers. While such an assumption might be defendable in a context where the effect is assessed via a risk difference, it seems to us much more questionable in a context where the effect is measured via an odds-ratio (because of the non-collapsibility of the odds-ratio, and unless it is equal to one, it would be quite special to have the same odds-ratio in two subpopulations which partly, but not exactly, coincide). This is why we would personally prefer to assume instead that there are no defiers and to use the estimate of Lui and Chang [13], even if the no-defiers assumption is certainly also questionable; we look forward to hearing more opinions from geneticists about situations where this assumption might be verified and situations where it might not.
In conclusion, we suggest that the approach of Lui and Chang [13] might be a valuable solution for estimating a causal odds-ratio between a binary risk factor and a binary outcome in the context of a Mendelian randomization with a binary instrument, provided we are ready to assume that there are no defiers and despite having to restrict our attention to the compliers. Regarding this latter restriction, we believe that having a valid estimate of the causal effect in a subpopulation of human beings is of scientific interest. Most physiological phenomena have indeed been discovered in a restricted set of people and have usually proved widely applicable to larger sets of people. That the set of compliers is not an identifiable one should not invalidate this principle.