
Mendelian Randomisation and Causal Inference in Observational Epidemiology

The Problem of Inferring Causality in Epidemiology

The notion of risk is central to epidemiological research, both in its original context of studying conditions thought to be caused by a particular factor and, more broadly, in predicting the probability of a condition for prognostic purposes. For prognostic research, all factors associated with the outcome are of interest, whether they are causal or not. In aetiological research, on the other hand, causality is meaningful. Here, the focus is often on assessing the effect of some modifiable exposure on a disease with a view to informing health interventions at the individual or population level, or health advice for particular risk groups. For such intervention or advice to be effective, it is important to verify that the observed association between the exposure and disease means that the exposure is in fact causal for the disease. For example, once the relationship between periconceptual maternal folate supplementation and risk of neural tube defects was established [1,2], the United States, Canada, and Chile implemented mandatory fortification of cereal flour and related foods with folic acid and reported reductions in neural tube defect incidence between 27% and just over 50% [3]. However, observational research has had several high-profile failures when exposures that seemed to affect disease risk were later shown to be non-causal in follow-up randomised controlled trials (RCTs). For instance, observational evidence that seemed to suggest that vitamin E is protective for cardiovascular disease, beta-carotene for cancer, and, more recently, oestrogen for dementia, has now been refuted [4]. Since only candidate causes with the strongest observational support tend to be followed up in RCTs when these are possible, it is likely that many more reported observational findings are not actually causal [5].

Five Key Papers in the Field

Chen et al., 2008 [9] A recent application of the method that combines information from several studies and uses a genetic variant as a proxy for an exposure that is difficult to measure.

Hernán and Robins, 2006 [8] A recent overview of what can and what cannot be done in epidemiological studies with instrumental variables.

Davey Smith et al., 2005 [5] A comment on the wider picture of where genetic epidemiology can contribute to public health research.

Davey Smith and Ebrahim, 2003 [7] The first main paper detailing the relevance of the method to epidemiological research and providing many examples.

Katan, 1986 [10] This briefly outlines the original idea behind the method of Mendelian randomisation as it is commonly used now.

Inferring causality from observational data is problematic as it is not always clear which of two associated variables is the cause and which the effect, or whether both are common effects of a third unobserved variable, or confounder (see Glossary). The direction of causality can sometimes be determined by temporal criteria (e.g., the cause must precede the effect) or from knowledge of the underlying biology. Confounding is more difficult to deal with because it is mainly due to social, behavioural, or physiological factors that are difficult to measure and control for. In practice, one can never be sure that the relevant confounders have been identified and accounted for. Besides the fact that RCTs are not feasible or ethical for many exposures of public health relevance, such as toxins, physical activity, or complex nutritional regimes, observational studies also have some advantages over RCTs; for example, the subjects in the latter are not always representative of the population for which an intervention is being considered [6]. “Mendelian randomisation” provides an alternative way of dealing with the problems of observational studies [6–9], especially for the case where confounding is believed to be present but cannot be controlled for because it is not fully understood.

Mendelian Randomisation

We outline the idea now known as “Mendelian randomisation” using the example provided by Katan [10] in his early description of the concept in 1986, although the first implementation of this basic idea in an epidemiological setting under the flag of “Mendelian randomisation” was more recent [11]. Details of the derivation of the approach and its nomenclature are provided in a recent review [12].

In the mid-1980s, there was considerable debate over the hypothesis that low serum cholesterol levels might directly increase the risk of cancer. Alternative explanations for the observed association were that cholesterol levels were lowered by the presence of latent tumours in future cancer patients (reverse causation), or that both cancer risk and cholesterol levels might be affected by confounding factors like diet and smoking. The observation that individuals with abetalipoproteinaemia, and hence negligible levels of serum cholesterol, did not seem to be predisposed to cancer led Katan to the idea of finding a larger group of individuals genetically inclined towards lower cholesterol levels. The apolipoprotein E (ApoE) gene was known to affect serum cholesterol, the ApoE2 variant being associated with lower levels. Katan's idea was that many individuals will carry the ApoE2 variant and thus will naturally have lower cholesterol levels from birth. Crucially, since alleles are randomly assigned during meiosis (which gives rise to the name "Mendelian randomisation"), these ApoE2 carriers will not be systematically different from carriers of the other ApoE alleles in any other respect, and in consequence there should be no confounding. Only if low serum cholesterol is really causal for the disease should cancer patients have more ApoE2 alleles than controls. Otherwise the distribution of ApoE alleles should be similar in both groups. This can easily be checked from the observed distributions.
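Katan's logic can be illustrated with a small simulation. The sketch below is not taken from Katan's paper or from any real dataset: the allele frequency, the effect sizes, the confounder and the threshold model for cancer are all hypothetical. An unobserved confounder lowers cholesterol and raises cancer risk, so cases have lower mean cholesterol even when cholesterol has no causal effect; the frequency of the ApoE2-like allele, however, should differ between cases and controls only when low cholesterol is made causal.

```python
import random


def simulate_katan(n, effect_of_low_cholesterol, seed=2008):
    """Hypothetical case-control simulation of Katan's argument.

    All numbers are illustrative assumptions, not estimates from real data.
    genotype    : number of ApoE2-like alleles (0, 1 or 2), assigned at random
    confounder  : unobserved factor that lowers cholesterol and raises cancer risk
    cholesterol : depends on genotype and confounder
    cancer      : risk score above a threshold; cholesterol enters the score only
                  when effect_of_low_cholesterol is non-zero
    """
    rng = random.Random(seed)
    e2_alleles = {True: 0, False: 0}    # ApoE2 allele counts, keyed by case status
    people = {True: 0, False: 0}
    chol_total = {True: 0.0, False: 0.0}
    for _ in range(n):
        genotype = (rng.random() < 0.1) + (rng.random() < 0.1)
        confounder = rng.gauss(0.0, 1.0)
        cholesterol = 5.0 - 0.3 * genotype - 0.5 * confounder + rng.gauss(0.0, 0.5)
        # Lower cholesterol raises the score only if the causal effect is switched on.
        score = (0.8 * confounder
                 - effect_of_low_cholesterol * (cholesterol - 5.0)
                 + rng.gauss(0.0, 1.0))
        is_case = score > 1.5
        e2_alleles[is_case] += genotype
        people[is_case] += 1
        chol_total[is_case] += cholesterol
    return {
        "e2_freq_cases": e2_alleles[True] / (2 * people[True]),
        "e2_freq_controls": e2_alleles[False] / (2 * people[False]),
        "mean_chol_cases": chol_total[True] / people[True],
        "mean_chol_controls": chol_total[False] / people[False],
    }


if __name__ == "__main__":
    for effect in (0.0, 1.0):
        print(effect, simulate_katan(200_000, effect))
```

In both settings one would expect cases to have lower mean cholesterol than controls (the confounded observational association), whereas the allele frequencies should separate only in the causal setting.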

Katan's reasoning corresponds exactly to what is known as an instrumental variable method in econometrics [13–16]. The genetic variant acts as a so-called instrumental variable (or instrument) and helps to disentangle the confounded causal relationship between intermediate phenotype and disease. Once this theoretical connection had been made, epidemiologists were able to learn from and adapt the methods that were so well known in econometrics [7,17].

The three key assumptions for Katan's idea to work, and hence for a genetic variant to qualify as an instrumental variable, are illustrated graphically in Figure 1 and interpreted as follows.

  1. The genetic variant is unrelated to (independent of) the typical confounding factors, i.e., the graph has no arrow (in either direction) connecting ApoE with the confounders.
  2. The genetic variant is (reliably) associated with the exposure, i.e., there is an arrow connecting ApoE to serum cholesterol and we can accurately quantify the relationship this represents.
  3. Conditional on the exposure (cholesterol level) and on the confounders (were they observable), the genetic variant is independent of the outcome; that is, ApoE provides no additional information for the prediction of cancer once these two variables are known. An equivalent way of expressing this, which is less precise but perhaps more intuitive, is to say that genotype has no direct effect on disease (no single arrow between ApoE and cancer) and no mediated effect other than through the exposure of interest (no other routes in the graph between ApoE and cancer).
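The role of the assumptions can be seen in a hypothetical linear simulation; the allele frequency and effect sizes are illustrative assumptions. The exposure never affects the outcome, so with a valid instrument the genotype–outcome covariance should be close to zero. Adding a direct effect of genotype on outcome, which breaks assumption 3, produces a genotype–outcome association that would wrongly be read as evidence of a causal effect of the exposure.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n, direct_effect, seed=1):
    """Hypothetical data in which the exposure has NO causal effect on the outcome.

    A non-zero direct_effect adds an arrow from genotype straight to the outcome,
    which violates key assumption 3.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Assumption 1: genotype is generated independently of the confounder.
        genotype = (rng.random() < 0.3) + (rng.random() < 0.3)
        confounder = rng.gauss(0.0, 1.0)
        # Assumption 2: genotype shifts the exposure.
        exposure = 0.5 * genotype + confounder + rng.gauss(0.0, 1.0)
        outcome = direct_effect * genotype + confounder + rng.gauss(0.0, 1.0)
        g.append(genotype)
        x.append(exposure)
        y.append(outcome)
    return {
        "genotype_exposure": covariance(g, x),
        "exposure_outcome": covariance(x, y),
        "genotype_outcome": covariance(g, y),
    }


if __name__ == "__main__":
    print("valid instrument:", simulate(100_000, direct_effect=0.0))
    print("assumption 3 violated:", simulate(100_000, direct_effect=0.5))
```

In both runs the exposure and outcome are associated purely through the confounder; only the run with a direct genotype effect shows a genotype–outcome association.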

Figure 1. The ApoE Genotype as an Instrumental Variable in a Mendelian Randomisation Application

The arrows can be thought of as representing causal relationships, but this is not what matters here. What is essential is the absence of an arrow between ApoE and the confounders and between ApoE and cancer, as detailed in the three key assumptions in the text.

Note that these assumptions have to be justified from background knowledge of the underlying biology. Neither the first nor the third assumption can be tested statistically since they depend on the confounding factors, which, by definition, are unobserved. The first assumption means that you must have reasonable belief that your genetic variant is unaffected by the sort of confounding that might generally be expected of such an exposure–disease relationship. Fortunately, the very basis of Mendelian randomisation rests on the knowledge that alleles are randomly assigned from parental alleles at meiosis (see above), and this implies that, across the population, genetic effects are relatively robust, although not immune to confounding [7,18]. Furthermore, the type of information needed to explore this assumption is often available in practice, as it is usually well-studied genetic variants that are proposed as instruments. Assumption 3 demands a comprehensive understanding of the underlying biological and clinical science, and may appropriately be considered in a sensitivity analysis. Unlike the first and third, the second assumption can formally be tested using the observed data, and the method works better the stronger the association between gene and exposure.
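Because assumption 2 involves only observed variables, it can be checked directly, for example with the F statistic from regressing the exposure on genotype. The sketch below shows one simple way to do this for a single genetic variant; the simulated data and per-allele effects are hypothetical. In the econometrics literature on weak instruments, an F statistic below about 10 is often treated as a warning sign, although this is a rule of thumb rather than a formal criterion.

```python
import random


def first_stage_f(genotypes, exposures):
    """F statistic for the regression of exposure on a single genetic variant.

    With one predictor, F = R^2 (n - 2) / (1 - R^2), the square of the slope's t statistic.
    """
    n = len(genotypes)
    mg, mx = sum(genotypes) / n, sum(exposures) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    sxx = sum((x - mx) ** 2 for x in exposures)
    sgx = sum((g - mg) * (x - mx) for g, x in zip(genotypes, exposures))
    r2 = sgx * sgx / (sgg * sxx)
    return r2 * (n - 2) / (1 - r2)


def simulate_exposure(n, per_allele_effect, allele_freq=0.3, seed=7):
    """Hypothetical genotype (0, 1 or 2 copies) and an exposure with Gaussian noise."""
    rng = random.Random(seed)
    genotypes = [(rng.random() < allele_freq) + (rng.random() < allele_freq)
                 for _ in range(n)]
    exposures = [per_allele_effect * g + rng.gauss(0.0, 1.0) for g in genotypes]
    return genotypes, exposures


if __name__ == "__main__":
    for effect in (0.3, 0.02):  # a stronger and a much weaker hypothetical instrument
        print(effect, first_stage_f(*simulate_exposure(5000, effect)))
```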

If the three assumptions seem reasonable (i.e., Figure 1 is believable), then it can be shown that, as Katan originally hypothesised, a simple statistical test of association between the ApoE genotype and cancer amounts to a test for causal effect of cholesterol levels on cancer [19].
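As one example of such a test, a case–control comparison of genotype counts can use the Cochran–Armitage test for trend, scoring genotypes 0, 1 and 2 by the number of copies of the allele of interest. The sketch below assumes this particular test and uses made-up genotype counts; any test appropriate to the data would serve the same purpose.

```python
import math


def armitage_trend_test(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage test for trend in a 2 x k table of genotype counts.

    Returns the z statistic and a two-sided p-value from the normal approximation.
    """
    R = sum(case_counts)
    S = sum(control_counts)
    N = R + S
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    t_stat = sum(t * (r * S - s * R)
                 for t, r, s in zip(scores, case_counts, control_counts))
    sum_tn = sum(t * n for t, n in zip(scores, totals))
    sum_t2n = sum(t * t * n for t, n in zip(scores, totals))
    variance = R * S / N * (N * sum_t2n - sum_tn ** 2)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts of 0, 1 and 2 copies of the allele in 1000 cases and 1000 controls.
    print(armitage_trend_test([820, 170, 10], [815, 175, 10]))  # similar distributions
    print(armitage_trend_test([700, 270, 30], [820, 170, 10]))  # allele enriched in cases
```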

The idea of using a gene as an instrument to test for a causal effect of an intermediate phenotype on a disease has been used for a range of other traits, some of which are summarised in Table 1 [9,20–28]. For example, raised plasma fibrinogen levels have been associated with an elevated risk of coronary heart disease (CHD) in large-scale prospective studies, prompting suggestions that methods to reduce fibrinogen levels should be sought [29]. If the fibrinogen–CHD relationship were causal, then such interventions could have considerable clinical and public health benefits. However, interventions to lower plasma fibrinogen levels would not be warranted if the association were explained by confounding or reverse causation. Doubts about a causal link between fibrinogen and CHD have been raised by evidence that the association is considerably attenuated by adjustment for smoking, body mass index, and plasma apolipoprotein B/A1 ratio [20], and that there are many known correlates of fibrinogen, only some of which are typically measured and adjusted for in individual studies [30]. Furthermore, bezafibrate was found to reduce plasma fibrinogen in randomised controlled trials, but it did not have a greater effect on CHD risk than could already be explained by its cholesterol-lowering effect [31].

Additional light can be cast on this relationship from relevant genetic studies. A recent large meta-analysis of genetic association studies of fibrinogen promoter region polymorphisms (G-455A and C-148T) showed that there was a mean increase in fibrinogen of 0.12 g/l (95% confidence interval [CI] 0.09 to 0.14) per copy of the A or T allele. However, these same alleles were not associated with CHD risk: the odds ratio per allele was 0.98 (95% CI 0.92 to 1.04) [21]. Since the 95% confidence interval includes the null hypothesis value of 1, we cannot reject the null hypothesis at the 5% level and hence conclude that the data provide little or no evidence for a causal effect of fibrinogen on CHD. This could be due to random error or lack of power of the statistical test, which is a problem with genetic association studies when relatively small effects are being sought. The findings are also consistent with the hypothesis that the associations shown previously in observational studies are partially or wholly explained by reverse causation or confounding. Of course, as with any test, the fact that an exposure appears to be non-causal does not necessarily mean that it is not clinically useful. Clearly, it would be dangerous to stop investigating the role of fibrinogen in CHD risk because of such an outcome. What is implied, however, is that more investigation is required before making any great investment in intervening on fibrinogen levels.
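The reasoning from the confidence interval can be made explicit by recovering an approximate z statistic from the reported odds ratio. The sketch assumes that the interval is a Wald interval, symmetric on the log-odds scale; because the published limits are rounded, the result is only approximate.

```python
import math


def z_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate SE (log scale), z statistic and two-sided p-value for a ratio
    estimate, assuming its CI was built as exp(log(estimate) +/- z_crit * SE)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


if __name__ == "__main__":
    # Per-allele odds ratio for CHD and its 95% CI, as reported in the text.
    print(z_from_ratio_ci(0.98, 0.92, 1.04))
```

For the fibrinogen figures this gives a standard error of roughly 0.031 on the log scale, z of about -0.65 and a two-sided p-value of about 0.52, in line with the interval including 1.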

Mendelian randomisation can also be applied when the exposure of interest is a modifiable behaviour rather than an intermediate phenotype. For example, Chen et al. [9] consider the causal effect of alcohol intake on blood pressure. An RCT would be problematic here, and measurement of alcohol intake is prone to error. Hence, observational data have to be considered in a setting where the causal relationship of interest is known to be heavily confounded. In some populations, a particular variant (*2) of the ALDH2 gene is quite common. The *2 variant is associated with accumulation of acetaldehyde, and therefore unpleasant symptoms, after drinking alcohol. Carriers of this variant tend to limit their alcohol consumption, and alleles at the ALDH2 locus can hence be used as a surrogate or proxy for alcohol intake [9]. Based on this assumption, a Mendelian randomisation meta-analysis approach, combining evidence from several studies, indicated that previous observational evidence on the beneficial effects of moderate drinking on blood pressure was possibly misleading. Exploring biological complexity is another important application of the method, although we have not focused on this aspect here. Li et al. [32], for example, use a Mendelian randomisation approach to infer parts of biological causal pathways.

Problems and Limitations

The limitations of Mendelian randomisation fall into two main categories. Firstly, the key assumptions for a genotype to be an instrument (see above) may not be plausible, in which case any inference about the causal effect will typically be biased. Such limitations include the presence of linkage disequilibrium, genetic heterogeneity, pleiotropy, population stratification, canalisation, or lack of knowledge about the confounding factors. These limitations have received a lot of attention in the literature [6,7,33]. However, graphs can be used as a visual check, and some apparent violations may not actually be problems in practice [19].

For example, Figure 2 addresses the case where the chosen instrument, Gene1, is in linkage disequilibrium with another gene, Gene2, which has not been observed. Here, Gene2 directly affects the disease level or risk, and hence Gene1 is not an instrument due to violation of the third key assumption. However, if Gene2 only affects the disease via its effect on the same intermediate exposure, as shown in Figure 3, there is no such violation and Gene1 can be used as an instrument in a Mendelian randomisation analysis. Note that Gene1 would also qualify as an instrument if its association with the exposure was only via its association with Gene2 (Figure 4). Hence, it does not really matter whether Gene1 or Gene2 is the causal variant for the exposure when they are in linkage disequilibrium, as either one qualifies as an instrumental variable in this case.
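The three linkage disequilibrium scenarios can be compared in a hypothetical simulation in which the exposure has no causal effect on the disease. The haplotype model (the allele at locus 2 copies the allele at locus 1 with probability 0.8), the allele frequencies and the effect sizes are illustrative assumptions. Under Figure 2, Gene1 is associated with the outcome through Gene2 and would give a false signal; under Figures 3 and 4 it shows no association with the outcome, while under Figure 4 it still predicts the exposure through its association with Gene2.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate_ld(n, scenario, seed=3):
    """Hypothetical data for Figures 2-4; the exposure never affects the outcome.

    'fig2': Gene1 -> exposure, Gene2 -> outcome directly
    'fig3': Gene1 and Gene2 -> exposure only
    'fig4': only Gene2 -> exposure; Gene1 is linked to it by LD
    """
    if scenario not in ("fig2", "fig3", "fig4"):
        raise ValueError(f"unknown scenario: {scenario}")
    rng = random.Random(seed)
    g1s, xs, ys = [], [], []
    for _ in range(n):
        gene1 = gene2 = 0
        for _ in range(2):  # two haplotypes per person
            a1 = rng.random() < 0.3
            # Linkage disequilibrium: locus 2 copies locus 1 with probability 0.8.
            a2 = a1 if rng.random() < 0.8 else rng.random() < 0.3
            gene1 += a1
            gene2 += a2
        confounder = rng.gauss(0.0, 1.0)
        to_exposure = {"fig2": 0.5 * gene1,
                       "fig3": 0.5 * gene1 + 0.5 * gene2,
                       "fig4": 0.5 * gene2}[scenario]
        direct = 0.5 * gene2 if scenario == "fig2" else 0.0
        exposure = to_exposure + confounder + rng.gauss(0.0, 1.0)
        outcome = direct + confounder + rng.gauss(0.0, 1.0)
        g1s.append(gene1)
        xs.append(exposure)
        ys.append(outcome)
    return {"gene1_exposure": covariance(g1s, xs),
            "gene1_outcome": covariance(g1s, ys)}


if __name__ == "__main__":
    for fig in ("fig2", "fig3", "fig4"):
        print(fig, simulate_ld(50_000, fig))
```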

Figure 2. A Mendelian Randomisation Study Where the Chosen Instrument Is in Linkage Disequilibrium with a Variant Associated With, or Causal For, the Disease

Note that the direction of the arrow depicting the statistical association between the two genes is interchangeable.

Figure 3. A Mendelian Randomisation Study Where the Chosen Instrument Is in Linkage Disequilibrium with a Variant That Is Also Causal for the Intermediate Exposure

Figure 4. A Mendelian Randomisation Study Where the Chosen Instrument Is Not Directly Causal for the Exposure, But Is in Linkage Disequilibrium with Another Variant That Is

A similar check for violations can be applied to the situation described in Lawlor et al. [34], where the hypothesised causal effect of maternal adiposity on offspring adiposity is investigated using maternal FTO genotype as an instrument. The reason that one must also adjust for offspring FTO genotype in the relevant regression in order to perform a Mendelian randomisation analysis can be illustrated quite simply by the graph in Figure 5. Without adjusting for (conditioning on) offspring FTO, key assumption 3 would be violated due to the existence of an alternative path to the outcome via this genotype. Note that this situation is specific to the graph in Figure 5, which assumes that there is no other confounder of offspring FTO and offspring adiposity (such as paternal FTO).
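The need to condition on offspring FTO can be checked in a hypothetical simulation in which maternal adiposity has no effect on offspring adiposity. Offspring inherit one randomly chosen allele from each parent, and only the offspring's own genotype affects offspring adiposity; the allele frequency and effect size are assumptions, and, matching Figure 5, there is deliberately no other confounder of offspring FTO and offspring adiposity. Adjustment here removes within-genotype means from both variables, which gives the same coefficient as including offspring FTO as a categorical covariate in a linear regression.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_fto(n, allele_freq=0.4, per_allele=0.3, seed=11):
    """Hypothetical mother-offspring data; maternal adiposity has NO effect on the child.

    Each row is (maternal FTO count, offspring FTO count, offspring adiposity).
    Maternal adiposity is not generated because, in this null scenario, nothing
    depends on it.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        mother = [rng.random() < allele_freq, rng.random() < allele_freq]
        father = [rng.random() < allele_freq, rng.random() < allele_freq]
        maternal_fto = sum(mother)
        # Mendelian transmission: one randomly chosen allele from each parent.
        offspring_fto = int(rng.choice(mother)) + int(rng.choice(father))
        offspring_adiposity = per_allele * offspring_fto + rng.gauss(0.0, 1.0)
        rows.append((maternal_fto, offspring_fto, offspring_adiposity))
    return rows


def maternal_slopes(rows):
    """Slope of offspring adiposity on maternal FTO: (unadjusted, adjusted).

    Adjustment subtracts offspring-FTO stratum means from both variables, which
    (Frisch-Waugh-Lovell) matches regressing on maternal FTO plus offspring FTO
    entered as a categorical covariate.
    """
    m = [r[0] for r in rows]
    o = [r[1] for r in rows]
    y = [r[2] for r in rows]
    strata = {}
    for mi, oi, yi in rows:
        strata.setdefault(oi, []).append((mi, yi))
    means = {k: (sum(a for a, _ in v) / len(v), sum(b for _, b in v) / len(v))
             for k, v in strata.items()}
    m_res = [mi - means[oi][0] for mi, oi in zip(m, o)]
    y_res = [yi - means[oi][1] for yi, oi in zip(y, o)]
    return slope(m, y), slope(m_res, y_res)


if __name__ == "__main__":
    print(maternal_slopes(simulate_fto(50_000)))
```

The unadjusted slope is expected to be clearly positive (maternal FTO reaches offspring adiposity through offspring FTO), while the adjusted slope should be close to zero.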

Figure 5. Maternal FTO Only Qualifies as an Instrument Conditional On Offspring FTO

If the three key assumptions of an instrumental variable are satisfied by the genetic variant, testing for a causal effect of phenotype on disease by testing for an association between genotype and disease is straightforward for most practical purposes. Any statistical test that is appropriate for the variables being considered will suffice. However, calculation of the magnitude of the causal effect requires additional strong assumptions, such as linearity of all relationships (e.g., constant increase of disease with exposure) and no interactions. If these assumptions are satisfied, we can obtain an estimate of the causal effect from a mathematically simple combination of the observed genotype–disease and genotype–exposure associations [13]. The second class of limitations of Mendelian randomisation concerns the validity of such additional assumptions. These limitations have not generated as much discussion to date, although in many observational studies the outcome is a binary variable, and, under the mathematical models that are typically applied (e.g., logistic or probit regression), conventional linearity is not satisfied [19]. In consequence, the estimate that is valid in the all-linear case should not really be applied to binary outcome data, although it has sometimes been advocated [17,26]. Generalisations of the instrumental variable method to the non-linear case can be found in the literature [8,15,35–39], but are typically aimed at very different kinds of applications. Their usefulness in the context of Mendelian randomisation has yet to be investigated. It is, perhaps, important to stress that these extra distributional assumptions are only an issue for estimation of the magnitude of the causal effect and not for testing for the presence of such an effect.
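In the all-linear case, the simple combination referred to above is the ratio of the genotype–disease association to the genotype–exposure association, sometimes called the Wald or ratio estimator. The sketch below computes it on hypothetical simulated data with a continuous outcome, where linearity and the absence of interactions hold by construction, and compares it with the confounded regression of outcome on exposure. It is not intended for binary outcomes, for the reasons given in the text.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def ratio_estimate(g, x, y):
    """Genotype-outcome slope divided by genotype-exposure slope.

    Both slopes share the denominator var(g), so this reduces to cov(g, y) / cov(g, x).
    """
    return covariance(g, y) / covariance(g, x)


def simulate_linear(n, causal_effect=0.4, seed=5):
    """Hypothetical all-linear model with an unobserved confounder and no interactions."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        genotype = (rng.random() < 0.3) + (rng.random() < 0.3)
        confounder = rng.gauss(0.0, 1.0)
        exposure = 0.5 * genotype + confounder + rng.gauss(0.0, 1.0)
        outcome = causal_effect * exposure + confounder + rng.gauss(0.0, 1.0)
        g.append(genotype)
        x.append(exposure)
        y.append(outcome)
    return g, x, y


if __name__ == "__main__":
    g, x, y = simulate_linear(100_000)
    print("ratio estimate:", ratio_estimate(g, x, y))               # near the true 0.4
    print("naive regression:", covariance(x, y) / covariance(x, x))  # biased upwards
```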

The Future for Mendelian Randomisation

A Mendelian randomisation analysis does not aim to identify genetic factors that are causal for disease risk in order to target individuals on the basis of their genotype. On the contrary, the focus is on the causal association between an exposure and a disease with a view to informing the potential impact of non-genetic interventions on that exposure. To that end, such analysis exploits a well-studied genetic factor for its known relationship with the exposure.

In order to widen the applicability of the approach, more general methods for the common, but statistically nonstandard, case with a binary disease outcome need to be developed. In particular, the relevance to observational epidemiology of related methods in other areas, especially in terms of the particular assumptions required, is currently being investigated. We should also stress the importance of obtaining good estimates from genetic association studies, in particular ensuring sufficiently large sample sizes with adequate power to detect the typically modest effects one might expect for the determinants of common multifactorial diseases [6,20,40]. Formally combining information from different sources, such as the large biobanks that are currently being set up worldwide, is also essential [41].

Mendelian randomisation has received its fair share of criticism (e.g., [42]). One objection is that good genetic instruments are not easy to find, but recent rapid advances in genetic epidemiology are addressing this issue [5]. Most criticisms concern violations of the key assumptions implicit in Figure 1. Confounding of the genotype–disease relationship is one such violation that has received some attention. However, it has recently been re-emphasised that this violation may not be as serious as it might at first appear because, as outlined above, Mendelian randomisation analyses are fundamentally less susceptible to confounding than conventional epidemiological analyses [18].


It is often unavoidable (and sometimes desirable) to use observational data to infer causality, but it may then be difficult to disentangle causation from association, especially in the presence of confounding. We would argue that some of the confusion and misleading interpretations of results from observational studies are partly due to the lack of a clear formal approach to distinguish between association and causation. Causal terminology is often used loosely in the medical literature. It is intended to convey more than a simple association between potential risk factors and their effects, but this is rarely made explicit. More formal approaches are based on the idea of a hypothetical intervention [43,44], which seems particularly suited to the present context where we have potential health interventions in mind. These formal approaches highlight the usefulness of Mendelian randomisation studies for inferring causality and enable precise specification of the key assumptions (as depicted in Figure 1) necessary for the method to be valid.


Genetic Terms

Alleles are the different variants of a gene at a locus. They are sometimes called polymorphisms.

Canalisation refers to developmental compensation that can buffer the effects of disruptive environmental or genetic influences.

Genetic heterogeneity refers to the situation where a phenotype is influenced by several alleles, usually at different genetic loci.

Linkage disequilibrium refers to (statistical) association between alleles at different loci. One reason for such an association is that the relevant genetic loci are physically close on the chromosome, and the alleles tend to be inherited together.

Pleiotropy refers to the situation where a genetic variant has more than one specific phenotypic effect.

Population stratification occurs when allele frequencies and disease rates, or allele frequencies and exposure rates, vary widely between different subgroups of the population and cause an association between the two at the overall population level.

Statistical Terms

Conditional independence: For variables X, Y, and Z, we say that X and Y are conditionally independent given Z if knowledge of X (or alternatively Y) does not improve our prediction of Y (alternatively X) once we actually know Z.

Confounding: The effect of a variable X on another variable Y is said to be confounded if the observed association between X and Y does not correspond to the causal effect. Confounding is often due to the existence of another cause of Y that is also associated with X.

Interaction: Variables X1 and X2 interact in their association with Y if the association of X1 with Y varies for different values or levels of X2.

Linear relationship: The relationship between variables X and Y is linear if the change in Y caused by a unit change in X is constant for all values or levels of X. Any departure from this criterion is a nonlinear relationship.
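The definition of conditional independence can be checked exactly on a small discrete example. In the hypothetical joint distribution below, X and Y both depend on Z but not on each other, so they are associated marginally yet independent given Z. Key assumption 3 is a statement of this form, with the genotype as X, the outcome as Y, and the exposure together with the confounders as Z.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution P(x, y, z) = P(z) P(x | z) P(y | z) over binary
# variables. This factorisation makes X and Y conditionally independent given Z.
P_Z = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_X_GIVEN_Z = {0: {0: Fraction(4, 5), 1: Fraction(1, 5)},
               1: {0: Fraction(1, 5), 1: Fraction(4, 5)}}
P_Y_GIVEN_Z = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
               1: {0: Fraction(1, 4), 1: Fraction(3, 4)}}

JOINT = {(x, y, z): P_Z[z] * P_X_GIVEN_Z[z][x] * P_Y_GIVEN_Z[z][y]
         for x, y, z in product((0, 1), repeat=3)}


def marginal(keep):
    """Marginal distribution over the coordinates listed in keep (0=x, 1=y, 2=z)."""
    out = {}
    for key, p in JOINT.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, Fraction(0)) + p
    return out


def marginally_independent():
    """True if P(x, y) = P(x) P(y) for every x, y."""
    p_xy, p_x, p_y = marginal((0, 1)), marginal((0,)), marginal((1,))
    return all(p_xy[(x, y)] == p_x[(x,)] * p_y[(y,)]
               for x, y in product((0, 1), repeat=2))


def conditionally_independent():
    """True if P(x, y, z) P(z) = P(x, z) P(y, z), i.e. P(x, y | z) = P(x | z) P(y | z)."""
    p_z, p_xz, p_yz = marginal((2,)), marginal((0, 2)), marginal((1, 2))
    return all(JOINT[(x, y, z)] * p_z[(z,)] == p_xz[(x, z)] * p_yz[(y, z)]
               for x, y, z in product((0, 1), repeat=3))


if __name__ == "__main__":
    print(marginally_independent(), conditionally_independent())  # False True
```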

Given the tendency of high-profile findings to persist in the literature, and influence public health and clinical policy, long after they have been formally refuted by RCT analyses [4], and given the expense and the scientific and ethical constraints of RCTs, it is fortunate that advances in biology, biotechnology, and epidemiology have provided us with an alternative tool, in the shape of Mendelian randomisation, that can help us to formally assess causality based on observational data. But the approach demands a sound understanding both of the underlying biomedicine and of the statistical assumptions invoked in its application. If it is used wisely, Mendelian randomisation could make a major contribution to our understanding of the aetiological architecture of complex diseases; but if it is used unthinkingly, it could sow seeds of confusion and set back progress in bioscience. This short article is aimed at encouraging the former and avoiding the latter.


  1. MRC Vitamin Study Research Group (1991) Prevention of neural tube defects: Results of the Medical Research Council vitamin study. Lancet 338: 131–137.
  2. Scholl TO, Johnson WG (2000) Folic acid: Influence on the outcome of pregnancy. Am J Clin Nutr 71(Suppl): 1295S–1303S.
  3. Scientific Advisory Committee on Nutrition (SACN) (2006) Folate and disease prevention. London: The Stationery Office (TSO).
  4. Tatsioni A, Bonitsis NG, Ioannidis JPA (2007) Persistence of contradicted claims in the literature. JAMA 298: 2517–2526.
  5. Davey Smith G, Ebrahim S, Lewis S, Hansell AL, Palmer LJ, et al. (2005) (a) Genetic epidemiology and public health: Hope, hype, and future prospects. Lancet 366: 1484–1498.
  6. Lawlor DA, Harbord RM, Sterne JAC, Timpson N, Davey Smith G (2008) Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Stat Med 27: 1133–1163.
  7. Davey Smith G, Ebrahim S (2003) Mendelian randomization: Can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 32: 1–22.
  8. Hernán MA, Robins JM (2006) Instruments for causal inference: An epidemiologist's dream? Epidemiology 17: 360–372.
  9. Chen L, Davey Smith G, Harbord RM, Lewis SJ (2008) Alcohol intake and blood pressure: A systematic review implementing a Mendelian randomization approach. PLoS Med 5: e52.
  10. Katan MB (1986) Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet 1: 507–508.
  11. Youngman L, Keavney B (2000) Plasma fibrinogen and fibrinogen genotypes in 4685 cases of myocardial infarction and in 6002 controls: Test of causality by "Mendelian randomisation". Circulation 102(Suppl II): 31–32.
  12. Davey Smith G (2007) Capitalizing on Mendelian randomization to assess the effects of treatments. J R Soc Med 100: 432–435.
  13. Bowden RJ, Turkington DA (1984) Instrumental variables. Cambridge (UK): Cambridge University Press.
  14. Angrist JD, Imbens GW, Rubin DB (1996) Identification of causal effects using instrumental variables. J Am Stat Assoc 91: 444–455.
  15. Greenland S (2000) An introduction to instrumental variables for epidemiologists. Int J Epidemiol 29: 722–729.
  16. Pearl J (2000) Causality. Cambridge (UK): Cambridge University Press.
  17. Thomas DC, Conti DV (2004) Commentary: The concept of "Mendelian randomization". Int J Epidemiol 33: 21–25.
  18. Davey Smith G, Lawlor DA, Harbord R, Timpson N, Day I, et al. (2007) Clustered environments and randomized genes: A fundamental distinction between conventional and genetic epidemiology. PLoS Med 4: e352.
  19. Didelez V, Sheehan NA (2007) Mendelian randomisation as an instrumental variable approach to causal inference. Stat Meth Med Res 16: 309–330.
  20. Keavney BD, Danesh J, Parish S, Palmer A, Clark S, et al. (2006) Fibrinogen and coronary heart disease: Test of causality by 'Mendelian randomization'. Int J Epidemiol 35: 935–943.
  21. Davey Smith G, Harbord R, Milton JM, Ebrahim S, Sterne JA (2005) Does elevated plasma fibrinogen increase the risk of coronary heart disease? Evidence from a meta-analysis of genetic association studies. Arterioscler Thromb Vasc Biol 25: 2228–2233.
  22. Casas JP, Bautista LE, Smeeth L, Sharma P, Hingorani AD (2005) Homocysteine and stroke: Evidence on a causal link from Mendelian randomisation. Lancet 365: 224–232.
  23. Kivimaki M, Lawlor DA, Eklund C, Smith GD, Hurme M, et al. (2007) Mendelian randomization suggests no causal association between C-reactive protein and carotid intima-media thickness in the young Finns study. Arterioscler Thromb Vasc Biol 27: 978–979.
  24. Casas JP, Shah T, Cooper J, Hawe E, McMahon AD, et al. (2006) Insight into the nature of the CRP-coronary event association using Mendelian randomization. Int J Epidemiol 35: 922–931.
  25. Timpson NJ, Lawlor DA, Harbord RM, Gaunt TR, Day INM, et al. (2005) C-reactive protein and its role in metabolic syndrome: A Mendelian randomisation study. Lancet 366: 1954–1959.
  26. Davey Smith G, Lawlor DA, Harbord R, Rumley A, Lowe GDO, et al. (2005) (b) Association of C-reactive protein with blood pressure and hypertension: Life course confounding and Mendelian randomisation tests of causality. Arterioscler Thromb Vasc Biol 25: 1051–1056.
  27. Herder C, Klopp N, Baumert J, Muller M, Khuseyinova N, et al. (2008) Effect of macrophage migration inhibitory factor (MIF) gene variants and MIF serum concentrations on the risk of type 2 diabetes: Results from the MONICA/KORA Augsburg Case-Cohort Study, 1984–2002. Diabetologia 51: 276–284.
  28. Frayling TM, Rafiq S, Murray A, Hurst AJ, Weedon MN, et al. (2007) An interleukin-18 polymorphism is associated with reduced serum concentrations and better physical functioning in older people. J Gerontol A Biol Sci Med Sci 62: 73–78.
  29. Kamath S, Lip GY (2003) Fibrinogen: Biochemistry, epidemiology and determinants. QJM 96: 711–729.
  30. The Fibrinogen Studies Collaboration (2007) Associations of plasma fibrinogen levels with established cardiovascular disease risk factors, inflammatory markers, and other characteristics: Individual participant meta-analysis of 154,211 adults in 31 prospective studies: The fibrinogen studies collaboration. Am J Epidemiol 166: 867–879.
  31. Meade T, Zuhrie R, Cook C, Cooper J (2002) Bezafibrate in men with lower extremity arterial disease: Randomised controlled trial. BMJ 325: 1139.
  32. Li W, Wang M, Irigoyen P, Gregersen PK (2006) Inferring causal relationships among intermediate phenotypes and biomarkers: A case study of rheumatoid arthritis. Bioinformatics 22: 1503–1507.
  33. Nitsch D, Molokhia M, Smeeth L, De Stavola BL, Whittaker JC, et al. (2006) Limits to causal inference based on Mendelian randomization: A comparison with randomised controlled trials. Am J Epidemiol 163: 397–403.
  34. Lawlor DA, Timpson NJ, Harbord RM, Leary S, Ness A, et al. (2008) Exploring the developmental overnutrition hypothesis using parental–offspring associations and FTO as an instrumental variable. PLoS Med 5: e33.
  35. Mullahy J (1997) Instrumental variable estimation of Poisson regression models: Application to models of cigarette smoking behaviour. Rev Econ Stat 79: 586–593.
  36. Rosner B, Spiegelman D, Willett WC (1990) Correction of logistic regression relative risk estimates and confidence intervals for measurement error: The case of multiple covariates measured with error. Am J Epidemiol 132: 734–745.
  37. (1994) Counterfactual probabilities: Computational methods, bounds and applications. In: Mantaras RL, Poole D, editors. Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence; 29–31 July 1994; Seattle, Washington, United States. pp. 46–54.
  38. (2000) Causal inference from graphical models. In: Barndorff-Nielsen OE, Cox DR, Kluppelberg C, editors. Complex stochastic systems. Boca Raton (FL): Chapman & Hall. Chapter 2, pp. 63–107.
  39. (2003) Causal inference using influence diagrams: The problem of partial compliance. In: Green PJ, Hjort NL, Richardson S, editors. Highly structured stochastic systems. Oxford: Oxford University Press. pp. 45–81.
  40. Minelli C, Thompson JR, Tobin MD, Abrams KR (2004) An integrated approach to the meta-analysis of genetic association studies using Mendelian randomization. Am J Epidemiol 160: 445–452.
  41. (2008) Biobanks and biobank harmonization. In: Davey Smith G, Burton PR, Palmer LJ, editors. An introduction to genetic epidemiology. Bristol: Policy Press. In press.
  42. Bochud M, Chiolero A, Elston RC, Paccaud F (2007) A cautionary note on the use of Mendelian randomization to infer causation in observational epidemiology. Int J Epidemiol 37: 414–416.
  43. Pearl J (1995) Causal diagrams for empirical research. Biometrika 82: 669–710.
  44. (2007) Mendelian randomisation: Why epidemiology needs a formal language for causality. In: Russo F, Williamson J, editors. Causality and probability in the sciences. Texts in Philosophy, Volume 5. London: College Publications. pp. 263–292.