
Exploration of a Polygenic Risk Score for Alcohol Consumption: A Longitudinal Analysis from the ALSPAC Cohort

  • Michelle Taylor,

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol (IEU), University of Bristol, Bristol, United Kingdom, School of Social and Community Medicine, Faculty of Medicine and Dentistry, University of Bristol, Bristol, United Kingdom, UK Centre for Tobacco and Alcohol Studies, School of Experimental Psychology, University of Bristol, Bristol, United Kingdom


  • Andrew J. Simpkin,

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol (IEU), University of Bristol, Bristol, United Kingdom, School of Social and Community Medicine, Faculty of Medicine and Dentistry, University of Bristol, Bristol, United Kingdom

  • Philip C. Haycock,

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol (IEU), University of Bristol, Bristol, United Kingdom, School of Social and Community Medicine, Faculty of Medicine and Dentistry, University of Bristol, Bristol, United Kingdom

  • Frank Dudbridge,

    Affiliation Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom

  • Luisa Zuccolo

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol (IEU), University of Bristol, Bristol, United Kingdom, School of Social and Community Medicine, Faculty of Medicine and Dentistry, University of Bristol, Bristol, United Kingdom




Uncertainty remains about the extent to which alcohol consumption causes a number of health outcomes. Genetic variants, or combinations of variants built into a polygenic risk score (PGRS), can be used in an instrumental variable framework to assess causality between a phenotype and disease outcome of interest, a method known as Mendelian randomisation (MR). We aimed to identify genetic variants involved in the aetiology of alcohol consumption, and to develop a PGRS for alcohol.


Repeated measures of alcohol consumption from mothers and their offspring were collected as part of the Avon Longitudinal Study of Parents and Children. We tested the association between 89 SNPs (identified from either published GWAS data or from functional literature) and repeated measures of alcohol consumption, separately in mothers (from ages 28–48) and offspring (from ages 15–21) who had ever reported drinking. We modelled log units of alcohol using a linear mixed model and calculated beta coefficients for each SNP separately. Cross-validation was used to determine an allelic score for alcohol consumption, and the AVENGEME algorithm employed to estimate variance of the trait explained.


Following correction for multiple testing, one SNP (rs1229984) showed evidence for association with alcohol consumption (β = -0.177, SE = 0.042, p < 0.0001) in the mothers. No SNPs showed evidence for association in the offspring after correcting for multiple testing. The optimal allelic score was generated using p-value cut-offs of 0.5 and 0.05 for the mothers and offspring, respectively. These scores explained 0.3% and 0.7% of the variance.


Our PGRS explains a modest amount of the variance in alcohol consumption and larger sample sizes would be required to use our PGRS in an MR framework.


Alcohol is a leading preventable cause of ill health in Europe [1], with Europeans accounting for more than a quarter of the total worldwide alcohol consumption (despite making up 15% of the global population) [2]. Despite this, uncertainty remains about the extent to which alcohol consumption in the general population causes a number of health outcomes, including type 2 diabetes [3] and cardiovascular disease [4], mainly because of bias in conventional epidemiological studies.

Genetic variants can be used in an instrumental variable framework to improve evidence on causality between an exposure and disease outcome of interest, a method known as Mendelian randomisation (MR) [5]. The rationale and assumptions of MR have been discussed in detail elsewhere [6]. In brief, the allocation of genetic variants is random at conception; therefore, the frequency of those variants associated with an exposure of interest should be approximately the same in groups of individuals with different confounding factors. Furthermore, as genotype is determined at conception, it cannot be susceptible to reverse causation [5, 7], which is particularly problematic when studying long-term effects of alcohol use (“sick-quitter effect” [3, 8]). Holmes et al. (2014) [9] used rs1229984 (a genetic variant in ADH1B) in an MR framework to examine the causal impact of alcohol on cardiovascular disease in European populations, while other examples can be found in East Asian populations [10–12]. Polygenic risk scores (PGRS) can also be used in an MR framework, accounting for a greater proportion of the variance in the exposure phenotype of interest, thus increasing power. Use of PGRS in MR can avoid or alleviate weak instrument bias [13], which is a common problem in MR. Furthermore, if researchers wish to use a PGRS in a large cohort that has genotyping availability but no GWAS data, a finite number of SNPs from the PGRS can be easily and cost-effectively genotyped for that purpose.

There are clear advantages of using PGRS in MR studies of alcohol consumption; however, to date there are no known variants robustly associated with alcohol drinking in populations of European origin, other than the relatively rare ADH1B variant used by Holmes et al. [9]. This is despite estimates of heritability for alcohol use disorders and consumption reaching approximately 50% at their peak [14–17], and linkage and genome wide association studies (GWAS) suggesting a variety of potential loci that might be implicated [18–20]. The majority of GWAS for alcohol phenotypes focus on dependence rather than heaviness of use [21–24], and among the top findings are often alcohol dehydrogenase genes (ADH) and aldehyde dehydrogenase (ALDH2) [25–27], which have also been reported in candidate gene studies of metabolic reactions following ingestion [28].

We therefore aimed to identify genetic variants likely to play a role in the aetiology of alcohol consumption. Our end goal was to develop a polygenic risk score for average alcohol consumption in the general population, which could be specific to consumption and explain a larger proportion of the variance than the known ADH1B variant. We used a multi-step approach, by [a] Identifying genetic variants that could plausibly be associated with alcohol consumption from genome wide association studies (GWAS) and the functional literature, [b] Estimating their association with alcohol consumption in mothers (heritability is estimated to be higher after college years) and offspring from the Avon Longitudinal Study of Parents and Children (ALSPAC), and [c] Creating PGRSs (based on the initial set of SNPs) and estimating the proportion of variance explained for both mothers and offspring’s phenotypes. We fitted both cross-sectional and longitudinal models, thus taking advantage of the repeated measures of alcohol consumption available at different time points in life to minimise noise in the definition/reporting of alcohol use.


Study Population

Data were taken from the Avon Longitudinal Study of Parents and Children (ALSPAC), a longitudinal study situated in South West England. ALSPAC recruited 14,541 pregnant women between 1991 and 1992, with 14,062 live births resulting from these pregnancies. Comparison with the 1991 census shows the sample was broadly representative of the British population [29]. Both mothers and offspring have been followed up with a series of questionnaires, clinics and lab-based assessments over the past 25 years, which has allowed for a wide range of phenotypic and biological measures to be collected. Ethical approval was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Further information on the recruitment process is available elsewhere [29–31]. The study website contains details of all data through a searchable data dictionary [32].

We used data from 10 postal questionnaires completed by the mothers in the cohort over 18 years (ranging from a mean age of 28 at baseline to a mean age of 48 at 18 years post pregnancy) and questionnaires completed by ALSPAC offspring over 6 years (ranging from age 15 to 21 years). All individuals who had available genetic data (outlined below) and answered alcohol related questions at these time points were included in the analysis (Mother N = 1609 to 3912; Offspring N = 2604 to 7989, Table 1).

Table 1. Summary information of weekly alcohol consumption across all time points.

Phenotypic Measures

Weekly alcohol consumption (units).

In all alcohol related questions used in this research, participants were informed that “one drink referred to ½ pint of beer/cider, a small (125ml) glass of wine or a single (25ml) measure of spirit”, with each of these drinks containing approximately one UK unit of alcohol. Weekly alcohol consumption was treated as a continuous measure of UK units.

Mothers’ weekly alcohol consumption between zero and three years post pregnancy and at six years post pregnancy was calculated using the question “How often have you drunk alcoholic drinks”. Participants selected one of the following responses: “Never”; “Less than 1 glass a week”; “At least 1 glass a week”; “1–2 glasses every day”; “At least 1–9 glasses every day”; “At least 10 glasses every day”. At zero years post pregnancy, this question related to the amount of alcohol consumed before the current pregnancy. At four years post pregnancy and between seven and 12 years post pregnancy, weekly alcohol consumption was calculated from self-reported beers/ciders, wines, spirits, other alcohol or low alcohol beverages consumed on each day of the previous week. Weekly alcohol consumption at 18 years post pregnancy was calculated by multiplying the number of days the mother generally drank by the number of drinks consumed on a typical drinking day (S1 Table).

Offspring’s weekly alcohol consumption between ages 15 and 21 years was calculated by multiplying the frequency at which the child drank by the number of drinks consumed on a normal drinking day (S2 Table).

Potential covariates.

Mothers’ covariates included: Cigarettes per day treated as a continuous measure at one, two, three and six years post pregnancy; age in years; social class (III manual skilled, IV and V unskilled manual or casual workers or those who rely on state for their income/I and II professional occupations and managerial and technical occupations and III non-manual skilled workers); highest level of education (certificate of secondary education/vocational qualification/O level/A level/Degree); cannabis, antidepressant, amphetamine and opiate use in the past year (No/Yes) at one, two, three and six years post pregnancy.

Offspring’s covariates included: Sex; ethnicity (white/other); DSM-IV classification of anxiety, depression, conduct disorder and ADHD at ages 7, 10, 13 and 15 years; binge eating in the past year at ages 13, 14, 16 and 18 years; antisocial behaviour at ages 11, 13, 14, 15, 18, 19 and 21 years (all coded No/Yes); and mother’s highest level of education (certificate of secondary education/vocational qualification/O level/A level/Degree).

Genetic Measures

ALSPAC offspring were genotyped using the Illumina HumanHap550 quad chip genotyping platforms by 23andMe subcontracting the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, US. ALSPAC mothers were genotyped using the Illumina human660W-quad array at Centre National de Génotypage (CNG). Following quality control (individual call rate > 0.97, SNP call rate >0.95, MAF > 0.01, HWE > 1E-7, cryptic relatedness within mothers and within offspring IBD < 0.1, non-European clustering individuals removed) 9237 offspring and 8196 mothers were retained with 477482 SNP genotypes in common between them. SNPs were flipped to forward strand and haplotypes were estimated on the combined sample using ShapeIT (v2.r644). Imputation was performed using Impute V2.2.2 against all 2186 reference haplotypes (including non-Europeans) in the December 2013 release of the 1000 genomes reference haplotypes (Version 1 Phase 3). This gave 8237 eligible offspring and 8196 eligible mothers with available genotype data.

To identify SNPs with some a priori evidence of association with alcohol consumption, we searched the NHGRI-EBI GWAS catalogue [33] for published GWAS of alcohol-related phenotypes, using the following search terms: “Alcohol consumption”; “Alcohol dependence”; and “Alcoholism”. In total, 68 SNPs were associated with at least one of these phenotypes at a P value < 1.0 × 10⁻⁵. We supplemented this list with a search for candidate gene or functional studies of alcohol consumption, identifying a further 23 SNPs and bringing the total number of SNPs to 91 (note that three SNPs identified from functional studies had already been identified by the GWAS search). We then extracted genotype data for the 91 SNPs from ALSPAC. Of these, 31 SNPs were directly genotyped. For the remaining 60 SNPs, imputed genotypes with imputation r² > 0.8 were available for all but two, which were excluded (imputation r² < 0.8). A total of 89 SNPs were therefore included in the analyses. A full summary of selected SNPs is provided in S3 Table.

Statistical Analysis

All analyses were conducted using Stata 13 [34]. We tested for association between weekly alcohol consumption and the 89 SNPs in mothers and offspring separately. For alcohol consumption, an abundance of zeros was expected, since no consumption would be reported both by those offspring and mothers who never drink, and by offspring and mothers who had not drunk in the preceding week. To analyse cross-sectional and repeated measures of these data we log-transformed units and focussed analysis only on those who had ever reported drinking (i.e. dropping non-drinkers). We modelled log units using a linear mixed model and calculated the beta coefficient of each SNP as a function of the number of copies of the minor allele. The outcome was also assessed cross-sectionally using a log-linear regression at each time point separately. In each model, we adjusted for age and controlled for population stratification using the first 10 principal components. To test for pleiotropic effects, we examined the associations between the 89 SNPs and 48 potential confounders detailed above.

We used the Bonferroni method to correct for multiple testing. Evidence for association was taken at p = 0.00056 (0.05/89) for repeated measures analyses and p = 0.000037 (0.05/(number of time points × 89)) for the cross-sectional analyses.
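The two thresholds can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the figure of 15 time points is an inference (10 maternal plus 5 offspring questionnaires), chosen because it reproduces the stated cross-sectional threshold.

```python
ALPHA = 0.05
N_SNPS = 89          # number of SNPs tested, as stated in the text
N_TIME_POINTS = 15   # assumption: 10 maternal + 5 offspring questionnaires


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# One test per SNP in the repeated measures analysis.
repeated_measures = bonferroni_threshold(ALPHA, N_SNPS)
# One test per SNP per time point in the cross-sectional analyses.
cross_sectional = bonferroni_threshold(ALPHA, N_TIME_POINTS * N_SNPS)

print(f"{repeated_measures:.5f}")  # 0.00056
print(f"{cross_sectional:.6f}")    # 0.000037
```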

To determine a PGRS for alcohol consumption, we randomly separated the individuals into 80% training and 20% discovery sets. Repeated measures of log units were modelled in the training set, separately for each of the 89 SNPs with beta coefficients, their standard error and corresponding p-values recorded. Using p-value thresholds of 0.01, 0.05, 0.1, 0.2, 0.4 and 0.5 for inclusion, we then created a weighted PGRS for each threshold. These scores were used to predict repeated measures of log-units in the 20% discovery set, with R-squared recorded for the score corresponding to each p-value threshold. The process was repeated five times, and the p-value threshold with the highest R-squared was taken as optimal. This was done independently for mothers and offspring data. Finally, the AVENGEME [35] algorithm was employed to estimate variance of the trait (i.e. alcohol units consumed in a week) explained by the PGRSs.
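The threshold-selection step can be sketched as follows. This is a simplified illustration, not the analysis code used in the study: it scores a single phenotype measurement per person with an ordinary squared correlation, whereas the analysis described above modelled repeated measures in a mixed model. All function names are hypothetical.

```python
from statistics import mean

# p-value cut-offs for SNP inclusion, as listed in the text
THRESHOLDS = (0.01, 0.05, 0.1, 0.2, 0.4, 0.5)


def weighted_score(dosages, betas, pvalues, threshold):
    """Sum of minor-allele counts weighted by training-set betas,
    restricted to SNPs whose training p-value is below the threshold."""
    return sum(b * d for d, b, p in zip(dosages, betas, pvalues) if p < threshold)


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R-squared of a simple linear regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant score or phenotype explains no variance
    return sxy ** 2 / (sxx * syy)


def best_threshold(genotypes, phenotype, betas, pvalues, thresholds=THRESHOLDS):
    """Return the cut-off whose score gives the highest R-squared in the
    held-out set, together with the R-squared for every cut-off."""
    results = {}
    for t in thresholds:
        scores = [weighted_score(g, betas, pvalues, t) for g in genotypes]
        results[t] = r_squared(scores, phenotype)
    return max(results, key=results.get), results
```

In a full cross-validation, `betas` and `pvalues` would be re-estimated in each 80% split and the R-squared values averaged over the five repeats before choosing the cut-off.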

To demonstrate a possible use of the PGRS, we tested the association between our PGRSs and proxy measures for cardiovascular disease [9] which were available both in mothers and in their offspring, namely HDL cholesterol, systolic blood pressure and diastolic blood pressure. These were modelled against the two PGRSs while controlling for age at measurement.

Sensitivity Analysis

The above statistical methods were used to conduct the following sensitivity analyses using the mothers’ data: (a) excluding individuals who were pregnant at completion of the questionnaire; and (b) excluding weekly alcohol consumption measures at four, seven, eight and 12 years post pregnancy in the mothers, as these questions were phrased differently from the other time points (S1 Table).


ALSPAC Mothers

For the ALSPAC mothers, there were 67988 questionnaire responses to units consumed, between the ages of 28 to 48 years. Alcohol consumption increased over the course of the questionnaires and over age (Table 1).

In the repeated measures analysis, six SNPs had a p value < 0.05. Following correction for multiple testing, one SNP (rs1229984) showed evidence for association with alcohol consumption (β = -0.177, SE = 0.042, p = 0.00002) (Table 2 and S4 Table). In the cross-sectional analyses, 27 SNPs had a p value < 0.05 at a minimum of one time point. The top ranked SNP was rs1229984 with the alcohol consumption variable measured at 12 years post pregnancy (difference in units per week for each additional copy of the minor allele = -0.326, SE = 0.084, P = 0.0001). This SNP also showed evidence for association (multiple testing threshold) at baseline (difference in units per week for each additional copy of the minor allele = -0.19, SE = 0.051, P = 0.0002) and was consistently negatively associated with alcohol consumption across all time points (Fig 1, S5 Table). None of the mothers’ covariates showed evidence for association with any of the individual SNPs following correction for multiple testing (S6 Table).

Fig 1. Association between rs1229984 and alcohol consumption over time.

Table 2. Top 20 SNPs for alcohol consumption (ranked by p value) for both mothers and offspring.

ALSPAC Offspring

Units of alcohol consumed were measured 17998 times over 5 questionnaires during adolescence for the ALSPAC offspring, from age 15 to 20.5. The average number of units peaked at 13 per week at age 18 years, dropping to 8 per week by age 21 years (Table 1).

In the repeated measures analysis, six SNPs had a p value < 0.05. Following correction for multiple testing, no SNPs showed evidence for association (Table 2 and S4 Table). In cross-sectional analyses, 28 SNPs had a p value < 0.05 at a minimum of one time point. The top ranked SNP in the offspring was rs2228093 with the alcohol consumption variable measured at age 18 years (difference in units per week for each additional copy of the minor allele = -0.105, SE = 0.036, P = 0.004); however, this did not meet the p value threshold for multiple testing (S5 Table). In contrast to the ALSPAC mothers, rs1229984 did not pass multiple testing thresholds for alcohol consumption in repeated measures analysis and did not show a consistent pattern across all time points tested (Fig 1). None of the offspring’s covariates showed evidence for association with any of the individual SNPs following correction for multiple testing (S6 Table).

Sensitivity Analysis

When excluding pregnant women, or data from questionnaires at four, seven, eight and 12 years post pregnancy, rs1229984 remained the only SNP associated with weekly alcohol consumption (excluding pregnant women: difference in units per week for each additional copy of the minor allele = -0.159, SE = 0.042, P = 0.0001; excluding questionnaires: difference in units per week for each additional copy of the minor allele = -0.149, SE = 0.039, P = 0.0001) (S7 Table).

Polygenic Risk Score

The best allelic score was generated using p value cut offs of 0.5 and 0.05 for the mothers and offspring, respectively, which resulted in including a total of 42 SNPs (out of 89) for mothers and 6 SNPs (out of 89) for offspring. For the ALSPAC mothers, the variance in units of alcohol per week explained was 0.3% (95% CI 0.13% to 0.76%). For the ALSPAC offspring the variance explained was 0.66% (95% CI 0.22% to 1.3%). Neither of the PGRS showed evidence for association with any of the confounders, i.e. there was no evidence to suggest that the PGRS could be violating the second assumption of instrumental variables (that the instrument is independent of the confounders of the original exposure-outcome association) and therefore being invalid as an instrumental variable to proxy for alcohol intake (S6 Table).

When modelling the effect of our PGRS on cardiovascular disease risk factors, there was evidence for association between the offspring PGRS and offspring diastolic blood pressure in the expected direction (beta = 1.61, 95% CI 0.25 to 2.97, p = 0.020). However, our estimates of the association between the offspring PGRS and both HDL cholesterol and systolic blood pressure provided no strong evidence of association (HDL cholesterol: beta = 0.06, 95% CI -0.03 to 0.14, p = 0.177; systolic blood pressure: beta = 1.44, 95% CI -0.84 to 3.72, p = 0.216). Similarly, there was no statistical evidence for association between the mothers’ PGRS and any of the cardiovascular disease risk factors (HDL cholesterol: beta = 0.39, 95% CI -0.10 to 0.17, p = 0.565; systolic blood pressure: beta = -0.68, 95% CI -7.05 to 5.69, p = 0.835; diastolic blood pressure: beta = 2.14, 95% CI -2.34 to 6.62, p = 0.350).


The aim of this study was to develop a polygenic risk score for alcohol consumption, with a view to using this to assess the causal impact of alcohol on health related outcomes such as cardiovascular disease. Literature searches of published GWAS and functional studies identified 89 candidate SNPs that had previously shown some evidence of association with alcohol-related phenotypes. Using repeated measures analysis of alcohol behaviour over the course of a 20 year period, we found strong evidence that rs1229984 plays a role in alcohol consumption, confirming previous results [9]. This SNP was associated with a decrease of 0.84 units of alcohol per week, on average. It was found to be associated in cross-sectional analyses of questionnaires measured 20 years apart and effect estimates were stronger in the repeated measures analysis, strengthening the evidence that it relates to alcohol consumption throughout the life course. The PGRS derived through cross-validation explained only a modest proportion of the variance in alcohol consumption (0.3% for mothers and 0.66% for the offspring in our sample).

The score could in principle be used to conduct MR analyses, for example in the field of cardiovascular disease (CVD), although large sample sizes would be required. The British Heart Foundation estimates that there are 7 million people living with CVD in the UK (~10% of the population) [36]. If the odds ratio for alcohol consumption on CVD incidence was 0.75 (OR for CVD mortality used) [37], we would require a sample size of 126,500 (assuming a 1:1 ratio, 80% power, alpha = 0.05 and R2 = 0.3% from the mothers’ PGRS result), or 54,200 (assuming a 1:1 ratio, 80% power, alpha = 0.05 and R2 = 0.7% from the offspring’s PGRS result) [38]. Similarly, the incidence odds ratio for coronary heart disease (CHD) is 0.71 [37], with 2.3 million people in the UK living with CHD. To perform an MR analysis examining the effect of alcohol consumption on CHD, we would need 89,300 individuals using the mothers’ PGRS and 38,300 individuals using the offspring’s PGRS. The required sample sizes are much larger than those in our sample; therefore, our tests of association between the PGRSs and proxy measures of cardiovascular disease are underpowered. As such, we cannot be certain that the lack of association represents a true null result.
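To make the scale of these figures concrete, the sketch below applies one commonly used approximation for the sample size of an MR analysis with a binary outcome: N = (z for alpha + z for power)² / (R² × ln(OR)² × p(1 − p)), where p is the proportion of cases. The formula is an assumption here, since the text cites [38] without stating the calculation, and the function name is hypothetical. It further assumes the odds ratio is expressed per standard deviation of the exposure.

```python
from math import ceil, log
from statistics import NormalDist


def mr_sample_size(odds_ratio, r2, case_fraction=0.5, alpha=0.05, power=0.80):
    """Approximate total sample size for an MR analysis of a binary outcome.

    odds_ratio    : assumed causal OR per SD of the exposure
    r2            : proportion of exposure variance explained by the instrument
    case_fraction : proportion of cases (0.5 corresponds to a 1:1 ratio)
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    return z_total ** 2 / (
        r2 * log(odds_ratio) ** 2 * case_fraction * (1 - case_fraction)
    )


for outcome, odds_ratio in (("CVD", 0.75), ("CHD", 0.71)):
    for score, r2 in (("mothers", 0.003), ("offspring", 0.007)):
        n = mr_sample_size(odds_ratio, r2)
        # Round up to the next hundred for reporting.
        print(outcome, score, ceil(n / 100) * 100)
```

Under these assumptions the approximation gives values in line with the sample sizes quoted above. Because R² sits in the denominator, the required sample size scales inversely with the variance explained: moving from 0.3% to 0.7% cuts it by more than half.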

Strengths and limitations

ALSPAC is a well characterised birth cohort with repeated measures of alcohol use, which have been used in several other studies of substance use [39–41]. Moreover, the available data were collected on mothers and their offspring over the course of 20 years. As such, they are an excellent resource to investigate alcohol behaviour over time. The nature of this dataset allowed for the use of repeated measures to strengthen the phenotype. Furthermore, the wealth of additional data allowed for detailed sensitivity analyses and examination of a wide range of potential covariates to test for pleiotropic effects of the alcohol variants and the derived PGRSs. An additional strength comes from the way our PGRS was constructed. By taking a limited number of SNPs that have previously shown some evidence of association with alcohol behaviours (either from GWAS or functional literature) we have developed a PGRS that has a reduced number of SNPs compared to the numbers that might be required if using p value cut-offs from whole GWAS. The advantage of this approach is that the resulting PGRS is less likely to have pleiotropic effects than one from a deep GWAS list. Furthermore, this finite number of SNPs would be more cost effective to genotype and could therefore feasibly be used in a study that does not have access to genome wide data.

There are also some limitations which need to be considered when interpreting the results of this study. First, the set of SNPs identified through searches came from GWAS analyses of alcohol dependence [21–24] and so might not show an association with our phenotype (units of alcohol per week). Meanwhile, the functional literature reports the role of genetic variants in metabolism, but does not extend the effects of these genes as far as alcohol consumption. Second, alcohol consumption questions were not uniform over time; however, sensitivity analyses excluding data from a different version of the questionnaire returned similar results. Third, our data on alcohol consumption are based on self-report and so may be subject to misclassification. However, there are currently no reliable biological alternatives for alcohol use in a general population sample [42], with current biomarkers only being able to identify long term heavy use [43, 44]. One might expect to find that the direction of bias differs in the two populations of mothers and offspring, as mothers might underreport their use (negatively impacting estimates), while offspring might over-report their consumption (positively impacting estimates) [45–47]. Fourth, there was loss to follow up, with greater proportions of missing data in later questionnaires, which reduced statistical power and could lead to selection bias if alcohol consumption is related to the loss to follow up. This drop in sample size also meant that stratifying the offspring analysis by gender would reduce the power; however, we did adjust for gender in this analysis. Furthermore, we were unable to examine the association between SNPs and alcohol consumption in adult males (i.e. fathers) as their genetic data were not available. Finally, we found no suitable independent cohort study with life course alcohol consumption data for testing PGRS performance and hence we used cross-validation in ALSPAC. However, it has previously been reported that sample sizes such as those used in this analysis are adequate when using two separate ‘training’ and ‘testing’ samples [48].

Findings in relation to other research

Burgess and colleagues suggested that variants with known biology are better for use in MR studies [49]. The underlying biology of some of the SNPs (those selected from functional literature) included in our analysis is known, and linked to changes in alcohol metabolism, and as such, would be better for use in a PGRS MR analysis. One SNP (rs1229984 in ADH1B) was consistently identified as being associated with alcohol consumption in ALSPAC mothers. This SNP has previously been used in an MR framework, after Holmes and colleagues validated it as a genetic instrument by providing solid evidence for association with various alcohol phenotypes (including units of alcohol per week) in a sample of >200,000 participants [9]. In their estimate, carriers of the minor allele consumed 17.2% fewer units per week than non-carriers, which is very similar to our result of 0.177 fewer log units per week (equivalent to 16.2% fewer units per week).
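The conversion from a coefficient on the log scale to a percentage difference can be checked directly. The sketch below assumes the outcome was modelled on the natural-log scale, which the text does not state explicitly; the function name is hypothetical.

```python
from math import exp


def percent_change_from_log_beta(beta):
    """Percentage change in the untransformed outcome implied by a
    coefficient estimated on the natural-log scale."""
    return (exp(beta) - 1) * 100


# Per-allele coefficient for rs1229984 from the repeated measures analysis.
print(round(percent_change_from_log_beta(-0.177), 1))  # -16.2
```

Equivalently, each additional copy of the minor allele multiplies weekly consumption by exp(-0.177), which is approximately 0.84.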

The sets of SNPs included in the mothers’ PGRS and the offspring’s PGRS were different, possibly due to age and gender effects. Previous literature has suggested that the heritability of alcohol consumption changes across the life course. Estimates start to increase at the age of 15 years and peak in the mid-20s [14–17]. It is therefore possible that the offspring are so young that their genetic potential to abuse or avoid alcohol is not yet fully expressed. Conversely, the age of the ALSPAC mothers at baseline ranged from 14 to 46 years (mean = 28 years), with these individuals being followed up for 20 years. Since the mothers’ longitudinal analyses cover a wide range of ages across the life course, it is not possible to make assumptions about the impact of age on the PGRS composition or the proportion of the variance it explains, in relation to the offspring’s PGRS. Additionally, gender differences may also have a role if there are systematic differences in alcohol consumption by gender. However, stratifying by gender here would reduce power in the analysis.

PGRS have previously been used in an MR framework to evaluate the causal effect of a number of traits/exposures, with proportions of the variance explained in the trait in the range of 1.5–3% (e.g. BMI: 1.5–2.5% [50–52], type 2 diabetes: ~2% [53], schizophrenia: ~3% [54]). However, the proportion of the variance explained by our PGRS is comparable to the variance of age of onset of alcohol consumption explained by a previously reported PGRS [55]. In our analysis, the variation explained by the initial 89 SNPs selected was estimated to be between 0.13% and 0.76% for the ALSPAC mothers. A previous PGRS for tobacco (cigarettes smoked per day) was shown to explain 0.4–0.5% of the variance in glasses of alcohol per week [56], which is comparable in magnitude to the variance explained by our PGRS for alcohol consumption. This provides additional evidence that some genetic risk factors are shared between substances, suggesting that incorrect effect estimates could be introduced through pleiotropy. However, the lack of evidence for association between our PGRSs and potential confounders (including tobacco and other drug use) suggests minimal evidence for pleiotropy. These comparisons are limited by design differences (i.e. genome wide analysis in previous literature compared to the candidate gene approach here). However, in theory, our selection process identified a priori candidates and we would therefore expect a higher proportion of the variance to be explained in this analysis. This highlights how little we know about the genetic contribution to alcohol consumption.


The PGRSs developed in our analyses explained a modest proportion of the variance in alcohol consumption for both ALSPAC mothers and offspring. For future MR analyses examining the causal effects of drinking alcohol in the general population, the mothers’ PGRS reported here is most likely a more suitable genetic proxy as it is based on a breadth of ages, although one limitation to this discovery sample is the inclusion of women only. Very large sample sizes, such as those from multi-study consortia, would be required if these PGRSs were to be used as genetic instruments in MR analyses.

Supporting Information

S1 Table. Mothers questionnaire information–Alcohol consumption.


S2 Table. Offspring questionnaire information–Alcohol consumption.


S4 Table. All results for repeated measures alcohol consumption in ALSPAC mothers and offspring.


S5 Table. All results for cross-sectional alcohol consumption, mothers (time points M0 to M18) and offspring (time points C15 to C21).


S6 Table. Associations between PGRS, individual SNPs and potential confounders (Bonferroni corrected p-value = 0.00011).


S7 Table. Sensitivity analysis for repeated measures analysis in ALSPAC mothers.



Acknowledgments

We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The UK Medical Research Council, the Wellcome Trust and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors and all authors will serve as guarantors for the contents of this paper. GWAS data were generated by Sample Logistics and Genotyping Facilities at the Wellcome Trust Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe.

Author Contributions

  1. Conceptualization: LZ PH.
  2. Data curation: MT AS.
  3. Formal analysis: AS MT FD.
  4. Investigation: AS MT FD.
  5. Methodology: AS MT FD.
  6. Project administration: LZ.
  7. Resources: MT LZ PC AS FD.
  8. Supervision: LZ.
  9. Validation: MT LZ PC FD AS.
  10. Visualization: AS MT.
  11. Writing – original draft: MT.
  12. Writing – review & editing: MT AS LZ FD PC.


References

  1. Rehm J, Shield K, Rehm M, Gmel G, Frick U. Alcohol consumption, alcohol dependence and attributable burden of disease in Europe. Centre for Addiction and Mental Health. 2012.
  2. World Health Organization. Global status report on alcohol and health, 2014. 2014.
  3. Koppes LL, Dekker JM, Hendriks HF, Bouter LM, Heine RJ. Moderate Alcohol Consumption Lowers the Risk of Type 2 Diabetes: A meta-analysis of prospective observational studies. Diabetes care. 2005;28(3):719–25. pmid:15735217
  4. Glymour M. Alcohol and cardiovascular disease. BMJ. 2014;349:g4334. pmid:25011451
  5. Lawlor DA, Harbord RM, Sterne JA, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Statistics in medicine. 2008;27(8):1133–63. pmid:17886233
  6. Smith GD, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human molecular genetics. 2014;23(R1):R89–R98. pmid:25064373
  7. Smith GD, Lawlor DA, Harbord R, Timpson N, Day I, Ebrahim S. Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology. PLoS medicine. 2007;4(12):e352. PubMed Central PMCID: PMC2121108. pmid:18076282
  8. Rehm J, Irving H, Ye Y, Kerr WC, Bond J, Greenfield TK. Are lifetime abstainers the best control group in alcohol epidemiology? On the stability and validity of reported lifetime abstention. American journal of epidemiology. 2008;168(8):866–71. pmid:18701442
  9. Holmes MV, Dale CE, Zuccolo L, Silverwood RJ, Guo Y, Ye Z, et al. Association between alcohol and cardiovascular disease: Mendelian randomisation analysis based on individual participant data. BMJ. 2014;349:g4164. pmid:25011450
  10. Irons DE, McGue M, Iacono WG, Oetting WS. Mendelian randomization: a novel test of the gateway hypothesis and models of gene-environment interplay. Development and psychopathology. 2007;19(4):1181–95. pmid:17931442
  11. Yeung SLA, Jiang C, Cheng KK, Liu B, Zhang W, Lam TH, et al. Is aldehyde dehydrogenase 2 a credible genetic instrument for alcohol use in Mendelian randomization analysis in Southern Chinese men? International journal of epidemiology. 2013;42(1):318–28. pmid:23243119
  12. Taylor AE, Lu F, Carslake D, Hu Z, Qian Y, Liu S, et al. Exploring causal associations of alcohol with cardiovascular and metabolic risk factors in a Chinese population using Mendelian randomization analysis. Scientific reports. 2015;5.
  13. Burgess S, Thompson SG. Use of allele scores as instrumental variables for Mendelian randomization. International journal of epidemiology. 2013;42(4):1134–44. pmid:24062299
  14. Bevilacqua L, Goldman D. Genes and addictions. Clinical pharmacology and therapeutics. 2009;85(4):359. pmid:19295534
  15. Verhulst B, Neale M, Kendler K. The heritability of alcohol use disorders: a meta-analysis of twin and adoption studies. Psychological medicine. 2015;45(05):1061–72.
  16. Bergen SE, Gardner CO, Kendler KS. Age-related changes in heritability of behavioral phenotypes over adolescence and young adulthood: a meta-analysis. Twin Research and Human Genetics. 2007;10(03):423–33.
  17. Swan GE, Carmelli D, Rosenman RH, Fabsitz RR, Christian JC. Smoking and alcohol consumption in adult male twins: genetic heritability and shared environmental influences. J Subst Abuse. 1990;2(1):39–50. pmid:2136102
  18. Li MD, Burmeister M. New insights into the genetics of addiction. Nature Reviews Genetics. 2009;10(4):225–31. pmid:19238175
  19. Edenberg HJ. The genetics of alcohol metabolism: role of alcohol dehydrogenase and aldehyde dehydrogenase variants. Alcohol Research & Health. 2007;30(1):5–14.
  20. Prescott C, Sullivan P, Kuo P, Webb B, Vittum J, Patterson DE, et al. Genomewide linkage study in the Irish affected sib pair study of alcohol dependence: evidence for a susceptibility region for symptoms of alcohol dependence on chromosome 4. Molecular psychiatry. 2006;11(6):603–11. pmid:16534506
  21. Gelernter J, Kranzler H, Sherva R, Almasy L, Koesterer R, Smith A, et al. Genome-wide association study of alcohol dependence: significant findings in African-and European-Americans including novel risk loci. Molecular psychiatry. 2014;19(1):41–9. pmid:24166409
  22. Bierut LJ, Agrawal A, Bucholz KK, Doheny KF, Laurie C, Pugh E, et al. A genome-wide association study of alcohol dependence. Proceedings of the National Academy of Sciences. 2010;107(11):5082–7.
  23. Treutlein J, Cichon S, Ridinger M, Wodarz N, Soyka M, Zill P, et al. Genome-wide association study of alcohol dependence. Archives of general psychiatry. 2009;66(7):773–84. pmid:19581569
  24. Heath AC, Whitfield JB, Martin NG, Pergadia ML, Goate AM, Lind PA, et al. A quantitative-trait genome-wide association study of alcoholism risk in the community: findings and implications. Biological psychiatry. 2011;70(6):513–8. pmid:21529783
  25. Park BL, Kim JW, Cheong HS, Kim LH, Lee BC, Seo CH, et al. Extended genetic effects of ADH cluster genes on the risk of alcohol dependence: from GWAS to replication. Human genetics. 2013;132(6):657–68. pmid:23456092
  26. Dick DM, Foroud T. Candidate genes for alcohol dependence: a review of genetic evidence from human studies. Alcoholism: Clinical and Experimental Research. 2003;27(5):868–79.
  27. Treutlein J, Rietschel M. Genome-wide association studies of alcohol dependence and substance use disorders. Current psychiatry reports. 2011;13(2):147–55. pmid:21253885
  28. Birley AJ, James MR, Dickson PA, Montgomery GW, Heath AC, Martin NG, et al. ADH single nucleotide polymorphism associations with alcohol metabolism in vivo. Human molecular genetics. 2009;18(8):1533–42. pmid:19193628
  29. Golding J, Pembrey M, Jones R, ALSPAC Study Team. ALSPAC—the Avon Longitudinal Study of Parents and Children. I. Study methodology. Paediatr Perinat Epidemiol. 2001;15(1):74–87. pmid:11237119
  30. Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, Henderson J, et al. Cohort Profile: the 'children of the 90s'—the index offspring of the Avon Longitudinal Study of Parents and Children. International journal of epidemiology. 2013;42(1):111–27. PubMed Central PMCID: PMC3600618. pmid:22507743
  31. Fraser A, Macdonald-Wallis C, Tilling K, Boyd A, Golding J, Davey Smith G, et al. Cohort Profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. International journal of epidemiology. 2013;42(1):97–110. PubMed Central PMCID: PMC3600619. pmid:22507742
  32. ALSPAC. Data Dictionary; archived at
  33. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic acids research. 2014;42(D1):D1001–D6.
  34. StataCorp. Stata Statistical Software: Release 13. 2013.
  35. Palla L, Dudbridge F. A fast method that uses polygenic scores to estimate the variance explained by genome-wide marker panels and the proportion of variants affecting a trait. The American Journal of Human Genetics. 2015;97(2):250–9. pmid:26189816
  36. BHF. Cardiovascular disease statistics 2015: British Heart Foundation; 2015 [09 June 2016]. Available from:
  37. Ronksley PE, Brien SE, Turner BJ, Mukamal KJ, Ghali WA. Association of alcohol consumption with selected cardiovascular disease outcomes: a systematic review and meta-analysis. BMJ. 2011;342:d671. pmid:21343207
  38. Brion M-JA, Shakhbazov K, Visscher PM. Calculating statistical power in Mendelian randomization studies. International journal of epidemiology. 2013;42(5):1497–501. pmid:24159078
  39. Heron J, Macleod J, Munafo MR, Melotti R, Lewis G, Tilling K, et al. Patterns of alcohol use in early adolescence predict problem use at age 16. Alcohol and alcoholism. 2012;47(2):169–77. PubMed Central PMCID: PMC3284685. pmid:22215001
  40. Melotti R, Heron J, Hickman M, Macleod J, Araya R, Lewis G, et al. Adolescent alcohol and tobacco use and early socioeconomic position: the ALSPAC birth cohort. Pediatrics. 2011;127(4):e948–55. Epub 2011/03/16. pmid:21402626
  41. MacArthur GJ, Smith MC, Melotti R, Heron J, Macleod J, Hickman M, et al. Patterns of alcohol use and multiple risk behaviour by gender during early and late adolescence: the ALSPAC cohort. Journal of public health (Oxford, England). 2012;34 Suppl 1:i20–30. PubMed Central PMCID: PMC3284864.
  42. Lees R, Kingston R, Williams T, Henderson G, Lingford-Hughes A, Hickman M. Comparison of Ethyl Glucuronide in Hair with Self-Reported Alcohol Consumption. Alcohol and alcoholism. 2012;47(3):267–72. pmid:22336766
  43. Peterson K. Biomarkers for alcohol use and abuse-a summary. Alcohol Research and Health. 2004;28(1):30. pmid:19006989
  44. Achur RN, Freeman WM, Vrana KE. Circulating cytokines as biomarkers of alcohol abuse and alcoholism. Journal of Neuroimmune Pharmacology. 2010;5(1):83–91. pmid:20020329
  45. Midanik L. The validity of self‐reported alcohol consumption and alcohol problems: A literature review. British journal of addiction. 1982;77(4):357–82. pmid:6762224
  46. Stockwell T, Donath S, Cooper‐Stanbury M, Chikritzhs T, Catalano P, Mateo C. Under‐reporting of alcohol consumption in household surveys: a comparison of quantity–frequency, graduated–frequency and recent recall. Addiction. 2004;99(8):1024–33. pmid:15265099
  47. Edwards AL. The social desirability variable in personality assessment and research. 1957.
  48. Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS genetics. 2013;9(3):e1003348. pmid:23555274
  49. Burgess S, Butterworth AS, Thompson JR. Beyond Mendelian randomization: how to interpret evidence of shared genetic predictors. Journal of clinical epidemiology. 2016;69:208–16. pmid:26291580
  50. Clarke T, Hall L, Fernandez-Pujals A, MacIntyre D, Thomson P, Hayward C, et al. Major depressive disorder and current psychological distress moderate the effect of polygenic risk for obesity on body mass index. Translational psychiatry. 2015;5(6):e592.
  51. Hung C-F, Breen G, Czamara D, Corre T, Wolf C, Kloiber S, et al. A genetic risk score combining 32 SNPs is associated with body mass index and improves obesity prediction in people with major depressive disorder. BMC medicine. 2015;13(1):1.
  52. Walter S, Kubzansky LD, Koenen KC, Liang L, Tchetgen Tchetgen EJ, Cornelis MC, et al. Revisiting mendelian randomization studies of the effect of body mass index on depression. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2015;168(2):108–15.
  53. Shen L, Walter S, Melles RB, Glymour MM, Jorgenson E. Diabetes Pathology and Risk of Primary Open-Angle Glaucoma: Evaluating Causal Mechanisms by Using Genetic Information. American journal of epidemiology. 2016;183(2):147–55. pmid:26608880
  54. Taylor AE, Burgess S, Ware JJ, Gage SH, Richards JB, Smith GD, et al. Investigating causality in the association between 25 (OH) D and schizophrenia. Scientific reports. 2016;6:26496. pmid:27215954
  55. Chou Y, Madden P, Bierut L, Heath A, Bucholz K, Agrawal A. Genome-wide polygenic scores for age at onset of alcohol dependence and association with alcohol-related measures. Translational Psychiatry. 2016;22:e761.
  56. Vink JM, Hottenga JJ, de Geus EJ, Willemsen G, Neale MC, Furberg H, et al. Polygenic risk scores for smoking: predictors for alcohol and cannabis use? Addiction. 2014;109(7):1141–51. PubMed Central PMCID: PMC4048635. pmid:24450588