Phenotype Refinement Strengthens the Association of AHR and CYP1A1 Genotype with Caffeine Consumption

Two genetic loci, one in the cytochrome P450 1A1 (CYP1A1) and 1A2 (CYP1A2) gene region (rs2472297) and one near the aryl-hydrocarbon receptor (AHR) gene (rs6968865), have been associated with habitual caffeine consumption. We sought to establish whether a more refined and comprehensive assessment of caffeine consumption would provide stronger evidence of association, and whether a combined allelic score comprising these two variants would further strengthen the association. We used data from between 4,460 and 7,520 women in the Avon Longitudinal Study of Parents and Children, a longitudinal birth cohort based in the United Kingdom. Self-report data on coffee, tea and cola consumption (including consumption of decaffeinated drinks) were available at multiple time points. Both genotypes were individually associated with total caffeine consumption, and with coffee and tea consumption. There was no association with cola consumption, possibly due to low levels of consumption in this sample. There was also no association with measures of decaffeinated drink consumption, indicating that the observed association is most likely mediated via caffeine. The association was strengthened when a combined allelic score was used, accounting for up to 1.28% of phenotypic variance. This score was not associated with potential confounders of observational associations. A combined allelic score accounts for sufficient phenotypic variance in caffeine consumption that it may be useful in Mendelian randomization studies. Future studies may therefore be able to use this combined allelic score to explore the causal effects of habitual caffeine consumption on health outcomes.


Introduction
Caffeine is one of the most widely consumed psychoactive substances worldwide, and while coffee and tea consumption dominate, it is also present in some soft drinks [1]. There is also considerable inter-individual variability in preference for caffeine [2], in part due to genetic factors. Twin studies have consistently indicated substantial (~50%) heritability of caffeine consumption (typically assessed as coffee consumption) [3][4][5][6][7][8][9]. Recently, a number of genome-wide association studies have identified variants robustly associated with caffeine consumption (again, typically assessed as coffee consumption) [10][11][12]. In particular, two loci, one in the cytochrome P450 1A1 (CYP1A1) and 1A2 (CYP1A2) gene region on chromosome 15 and one near the aryl-hydrocarbon receptor (AHR) gene on chromosome 7, have been found to be associated with habitual caffeine consumption across a number of studies [10][11][12][13]. Two single nucleotide polymorphisms (SNPs), rs2472297 between CYP1A1 and CYP1A2, and rs6968865 51 kb upstream of AHR, provide the strongest signals, each with an effect equivalent to an increase in consumption of ~0.2 cups per day per risk (T) allele. The genes are biologically plausible candidates for caffeine consumption phenotypes because they encode members of the same biochemical pathway: AHR is known to induce CYP1A1 and CYP1A2 by binding to the DNA in the region between these two genes [12], and low CYP1A2 activity has been associated with higher caffeine toxicity [14].
A limitation of studies to date is that they have typically used a single measure of caffeine consumption (e.g., coffee). One study [11] measured total caffeine consumption, but coffee accounted for 80% of this, and data on other sources of caffeine were not reported separately. While coffee represents the major source of caffeine consumption in some countries, other sources of caffeine can be important. We have previously shown that phenotypic assessments which more accurately capture the exposure of interest can improve the precision of genetic association studies [15], particularly when the exposure (e.g., caffeine consumption) is strongly influenced by behaviour or behavioural choices (e.g., preference for coffee or tea). We therefore sought to establish whether a more comprehensive phenotypic assessment of caffeine consumption, using measures of coffee, tea and cola consumption, would provide stronger evidence of association with rs2472297 and rs6968865. We were also interested in whether a combined allelic score comprising these two variants would further strengthen the association with caffeine consumption.

Study Sample
The Avon Longitudinal Study of Parents and Children (ALSPAC) sample is a longitudinal birth cohort that comprises 20,248 pregnancies. The mothers of 14,541 (71.8%) pregnancies were recruited antenatally during 1990-92 (Phase I). Post-natal recruitment to the 'Focus@7' clinical assessment at the age of ~7 years recruited a further 456 children from 452 (2.2% of eligible) pregnancies (Phase II). Recruitment during ages 8-18 years (Phase III) added a further 257 children from 254 (1.2% of eligible) pregnancies, giving an overall total of 15,247 (75.3% of eligible) enrolled pregnancies; from these pregnancies there were 14,775 live-born children, of whom 14,701 were alive at one year of age. The phases of enrolment are described in more detail in the cohort profile paper [16]. The ALSPAC website contains details of all the data that are available through a fully searchable data dictionary: http://www.bristol.ac.uk/alspac/researchers/data-access/data-dictionary/. Ethics approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees (Bristol and Weston Health Authority, Southmead Health Authority, Frenchay Health Authority).

Measures of Caffeine Consumption
Data on coffee and tea consumption were collected via self-report during pregnancy at 8, 18 and 32 weeks gestation and at 2, 47, 85, 97 and 145 months after delivery. Participants were asked to report "current daily coffee and tea drinking", as number of drinks, separately for weekdays and weekends. Similar questions were asked about cola consumption, in drinks per week. For cola consumption, questions were open format at 8, 18 and 32 weeks gestation and 2 months after delivery, and closed format at later time points ("never or rarely", "once in 2 weeks", "1 to 3 times a week", "4 to 7 times a week", "once a day or more"). Closed-format responses were recoded to 0, 0.5, 2, 5.5 and 7 drinks per week, and cola consumption values were then converted to drinks per day. Outlying daily consumption values (>10 drinks for coffee, >15 drinks for tea and >21 drinks for cola) were coded as missing data. Similar questions were also asked about decaffeinated coffee, tea and cola consumption at the same time points, and coded in the same way. To obtain a measure of total daily caffeine consumption, the numbers of cups of tea and coffee were summed with drinks per day of cola, each weighted by approximate caffeine content (mg per drink: coffee 75; tea 40; cola 34.5) [17,18]. The distributions of total caffeine consumption, and of coffee and tea consumption, are shown in Figures S1-S3.
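The derivation of the total caffeine phenotype can be summarised as a short sketch. It is a minimal Python illustration, not the study's processing code: the caffeine weights (75, 40 and 34.5 mg per drink), the closed-format cola recoding and the outlier limits come from the text, while the function names and the rule that the total is missing whenever any component is missing are assumptions. The sketch takes daily counts as input and does not model how weekday and weekend reports were combined, which the text does not specify.

```python
# Approximate caffeine content per drink (mg), as stated in the text.
CAFFEINE_MG = {"coffee": 75.0, "tea": 40.0, "cola": 34.5}

# Daily values above these limits were coded as missing.
OUTLIER_LIMIT = {"coffee": 10, "tea": 15, "cola": 21}

# Closed-format cola responses recoded to drinks per week.
COLA_WEEKLY = {
    "never or rarely": 0.0,
    "once in 2 weeks": 0.5,
    "1 to 3 times a week": 2.0,
    "4 to 7 times a week": 5.5,
    "once a day or more": 7.0,
}


def cola_per_day(response):
    """Convert a cola response to drinks per day.

    `response` is either a closed-format category (str) or an open-format
    number of drinks per week.
    """
    weekly = COLA_WEEKLY[response] if isinstance(response, str) else float(response)
    return weekly / 7.0


def clean(drink, per_day):
    """Return None (missing) for a missing or outlying daily value."""
    if per_day is None or per_day > OUTLIER_LIMIT[drink]:
        return None
    return per_day


def total_caffeine_mg(coffee, tea, cola):
    """Weighted sum of daily coffee, tea and cola drinks, in mg per day.

    Assumption (not stated in the text): the total is missing if any
    component is missing or outlying.
    """
    daily = {"coffee": clean("coffee", coffee),
             "tea": clean("tea", tea),
             "cola": clean("cola", cola)}
    if any(v is None for v in daily.values()):
        return None
    return sum(CAFFEINE_MG[drink] * n for drink, n in daily.items())
```

For example, two coffees and three teas a day plus cola "1 to 3 times a week" gives 150 + 120 + 34.5 × 2/7, or about 279.9 mg per day.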

Genotyping
Genotypes at the CYP1A1 (rs2472297) and AHR (rs6968865) loci were available from GWAS genotyping data. A total of 10,015 ALSPAC mothers were genotyped on the Illumina 660K quad chip at the Centre National de Genotypage, Paris, resulting in 557,124 directly genotyped SNPs before quality control. Genotypes were called with Illumina GenomeStudio and PLINK (v1.07) was used to carry out quality control steps.
Individuals were excluded from further analysis on the basis of having incorrect sex assignments; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>5%); evidence of cryptic relatedness (>10% identical by descent); and non-European ancestry (as detected by a multidimensional scaling analysis seeded with HapMap 2 individuals). SNPs with a minor allele frequency (MAF) of <1% or a call rate of <95% were removed. Furthermore, only SNPs that passed an exact test of Hardy-Weinberg equilibrium (HWE; P > 5 × 10⁻⁶) were retained. Population stratification was assessed by multidimensional scaling of genome-wide identity by state (IBS) pairwise distances using the four HapMap populations (YRI, CEU, CHB, JPT) as a reference. Cryptic relatedness was assessed using estimates of the proportion of SNPs expected to be identical by descent given estimates of IBS, and subjects with a relatedness of 0.1 or higher were excluded. Genotypes were imputed with Markov Chain Haplotyping software (MaCH 1.0.16) (45) using CEPH individuals from phase 2 of the HapMap project (release 22) as a reference set. SNP rs2472297 was directly genotyped, with a MAF of 0.27, an HWE P-value of 0.1 and 0.02% missingness before imputation. SNP rs6968865 was imputed with an imputation quality of 0.96 and had a MAF of 0.39. After imputation, genotypes were available for 8,340 subjects. The frequency of the T allele was 0.27 for rs2472297 and 0.61 for rs6968865.
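The per-SNP filters above can be expressed as a minimal sketch operating on genotype counts. The MAF, call-rate and HWE thresholds come from the text; the HWE P-value is computed with the standard exact test of Wigginton and colleagues, which we assume corresponds to the exact test applied in PLINK, and the function names and count inputs are hypothetical.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Two-sided exact test of Hardy-Weinberg equilibrium for one biallelic SNP
    (the recursion of Wigginton et al., 2005)."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))  # rare and common homozygote counts
    n = n_het + n_homr + n_homc                # genotyped individuals
    rare = 2 * n_homr + n_het                  # copies of the rare allele
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    # Start near the most likely heterozygote count, which must have the
    # same parity as the number of rare alleles.
    mid = rare * (2 * n - rare) // (2 * n)
    if (rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    # Recurse downwards in heterozygote count (unnormalised probabilities).
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (homr + 1) * (homc + 1))
        het, homr, homc = het - 2, homr + 1, homc + 1
    # Recurse upwards.
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * homr * homc / ((het + 2) * (het + 1))
        het, homr, homc = het + 2, homr - 1, homc - 1
    total = sum(probs)
    observed = probs[n_het]
    # P-value: probability of all configurations no more likely than the observed one.
    p = sum(pr for pr in probs if pr <= observed * (1 + 1e-9)) / total
    return min(1.0, p)


def passes_snp_qc(n_het, n_hom1, n_hom2, n_missing,
                  min_maf=0.01, min_call_rate=0.95, hwe_p_threshold=5e-6):
    """Apply the MAF, call-rate and HWE filters described in the text."""
    n_called = n_het + n_hom1 + n_hom2
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    maf = min(2 * n_hom1 + n_het, 2 * n_hom2 + n_het) / (2 * n_called)
    return (maf >= min_maf
            and call_rate >= min_call_rate
            and hwe_exact_p(n_het, n_hom1, n_hom2) > hwe_p_threshold)
```

A SNP with counts 25/50/25 is exactly in equilibrium and passes, whereas 50/0/50 (no heterozygotes) fails the HWE filter by a wide margin.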

Statistical Analysis
Data on total caffeine consumption, and on consumption of tea, coffee, cola and their decaffeinated counterparts, were each analysed by linear regression on the number of T alleles, in a separate univariate analysis for each SNP. Linear regression was carried out using the lm function in R (v. 2.14.0). Best-guess genotypes were used for analysis.
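The per-SNP model is an ordinary least-squares regression of each phenotype on best-guess allele count. The analyses themselves used lm in R; the sketch below reproduces the same additive model in plain Python for illustration, under two flagged assumptions: the P-value uses a normal approximation to the t distribution (close at the sample sizes used here), and the proportion of phenotypic variance explained is taken as the model R².

```python
import math


def snp_association(alleles, phenotype):
    """OLS regression of a phenotype on allele count (0, 1 or 2).

    Returns (beta, se, p, r2). Individuals with a missing phenotype (None)
    are dropped. The two-sided P-value uses a normal approximation to the
    t distribution, which is close for large samples.
    """
    pairs = [(g, y) for g, y in zip(alleles, phenotype) if y is not None]
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    beta = sxy / sxx                      # change in phenotype per T allele
    rss = syy - beta * sxy                # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    r2 = beta * sxy / syy                 # proportion of phenotypic variance explained
    return beta, se, p, r2
```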
To obtain joint effects that take into account genotypes at both SNPs simultaneously, following Sulem and colleagues [12], the number of T alleles was summed across SNPs to derive a combined SNP score of the total number of T alleles per subject (range 0 to 4), which was then used in a regression with phenotype data. Because the T allele is the minor allele at rs2472297 but the major (i.e., reference) allele at rs6968865, the score counts the minor allele at one SNP and the major allele at the other. Weighting alleles using effect sizes obtained from Sulem and colleagues [12] (rs2472297 by 0.31, rs6968865 by 0.26) provided similar results, so for simplicity we present results for the unweighted SNP score.
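The combined score is a simple sum of T-allele counts. The sketch below shows both the unweighted score and the weighted variant using the effect sizes quoted above; counting T alleles rather than minor alleles is what makes rs6968865 contribute its major allele. The dictionary layout and the rule that a missing genotype gives a missing score are assumptions for illustration.

```python
# Per-allele effect sizes from Sulem and colleagues, as quoted in the text;
# only needed for the weighted variant of the score.
WEIGHTS = {"rs2472297": 0.31, "rs6968865": 0.26}


def allele_score(t_counts, weighted=False):
    """Combine best-guess T-allele counts (0, 1 or 2 per SNP) into one score.

    Unweighted: total number of T alleles across both SNPs (0 to 4).
    Weighted: each SNP's T-allele count multiplied by its effect size.
    Returns None if either genotype is missing (an assumption for illustration).
    """
    if any(t_counts.get(snp) is None for snp in WEIGHTS):
        return None
    if weighted:
        return sum(WEIGHTS[snp] * t_counts[snp] for snp in WEIGHTS)
    return sum(t_counts[snp] for snp in WEIGHTS)
```

A woman heterozygous at rs2472297 and homozygous TT at rs6968865 scores 3 unweighted, or 0.31 + 2 × 0.26 = 0.83 weighted.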
We examined within-locus non-additivity by testing the significance of a second heterozygote term, and between-locus non-additivity by testing for a joint effect beyond the sum of the effects of both SNPs individually. Our results indicated that these SNPs act additively, and their effects are independent (although we cannot rule out more complicated interactions between these SNPs in the presence of other factors).
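The two checks for non-additivity can be framed as nested linear models compared with F statistics. This is a minimal sketch under stated assumptions: the text does not give the exact parameterisation, so we assume a per-SNP heterozygote indicator for the within-locus test and a single product term for the between-locus test, and all function names are hypothetical. P-values would then come from the F(1, n − p) distribution, for example via scipy.stats.f.sf.

```python
def ols_rss(columns, y):
    """Residual sum of squares from OLS of y on an intercept plus `columns`.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for the handful of predictors used here.
    """
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)]
         + [sum(row[r] * yi for row, yi in zip(X, y))]
         for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    coef = [A[r][k] / A[r][r] for r in range(k)]
    return sum((yi - sum(b * x for b, x in zip(coef, row))) ** 2
               for row, yi in zip(X, y))


def nested_f(rss_reduced, rss_full, extra_params, n, full_params):
    """F statistic for adding `extra_params` terms to a nested model."""
    return ((rss_reduced - rss_full) / extra_params) / (rss_full / (n - full_params))


def non_additivity_tests(g1, g2, y):
    """F statistics for within-locus (dominance) and between-locus departures from additivity."""
    n = len(y)
    results = {}
    for name, g in (("snp1", g1), ("snp2", g2)):
        het = [1.0 if gi == 1 else 0.0 for gi in g]  # heterozygote indicator
        rss_add = ols_rss([g], y)          # intercept + allele count: 2 parameters
        rss_dom = ols_rss([g, het], y)     # + heterozygote term: 3 parameters
        results[name + "_within_locus_F"] = nested_f(rss_add, rss_dom, 1, n, 3)
    product = [a * b for a, b in zip(g1, g2)]   # between-locus interaction term
    rss_joint = ols_rss([g1, g2], y)            # 3 parameters
    rss_inter = ols_rss([g1, g2, product], y)   # 4 parameters
    results["between_locus_F"] = nested_f(rss_joint, rss_inter, 1, n, 4)
    return results
```

Under purely additive, independent effects all three F statistics should be unremarkable; a large between-locus F would indicate a joint effect beyond the sum of the two SNPs.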
Data used for this submission will be made available on request to the ALSPAC executive committee (alspac-exec@bristol.ac.uk). The ALSPAC data management plan (available here: http://www.bristol.ac.uk/alspac/researchers/data-access/) describes in detail the policy regarding data sharing, which is through a system of managed open access.

Characteristics of Participants
The total sample available for analysis comprised between 4,460 and 7,520 women (see Figure 1 for a summary of how this sample was arrived at). Levels of missingness were low unless questions on caffeine consumption were not included in one or more versions of the questionnaire at that time point. More information on ALSPAC mothers' response rates has been published previously [16].
Consumption of coffee tended to increase roughly linearly across time points (means 1.18 to 2.30 drinks per day). Consumption of tea (means 2.73 to 3.18 drinks per day) and cola (means 0.60 to 2.31 drinks per week) varied across time points, but with no clear pattern of change. As a result, total daily caffeine consumption tended to increase across time points (means 206.8 mg to 306.1 mg). These data are shown in Tables 1-4. In general, cola consumption was considerably less than tea and coffee consumption, reflecting approximately 4% to 11% of total caffeine consumption in drinks per day.
In general, the proportion of phenotypic variance explained across all time points was small, as would be expected for the association of common variants with complex behavioural phenotypes. For CYP1A1, the proportion of phenotypic variance explained ranged from 0.15% to 0.88%, while for AHR it ranged from 0.04% to 0.48%. However, the combined SNP score accounted for a somewhat higher proportion of phenotypic variance on average, ranging from 0.16% to 1.28%.
Estimates of the proportion of phenotypic variance obtained using GCTA [19] for the two SNPs in the 2-SNP score were broadly similar to those obtained using linear regression (0.10% to 1.10% vs 0.16% to 1.28%). GCTA analysis for the remaining directly genotyped SNPs available accounted for additional phenotypic variance, although these estimates may be unreliable due to the relatively small sample size (see Table S1). Stratified analyses further indicated that these associations were present for consumption of coffee (combined SNP score: β = 0.047 to 0.120, P = 2.34 × 10⁻² to 5.46 × 10⁻⁵) and tea (combined SNP score: β = 0.076 to 0.209, P = 2.58 × 10⁻² to 1.23 × 10⁻⁸), but not cola (combined SNP score: β = −0.046 to 0.032, P = 9.15 × 10⁻¹ to 5.51 × 10⁻²) (Tables 2-4). Interestingly, associations for tea consumption were generally stronger than for coffee consumption. Removing participants who reported zero consumption of coffee, tea and/or cola did not alter these results substantially.
Table 1. Association of CYP1A1 rs2472297, AHR rs6968865 and combined SNP score with total caffeine consumption (mg).
Table 2. Association of CYP1A1 rs2472297, AHR rs6968865 and combined SNP score with coffee consumption.
Table 3. Association of CYP1A1 rs2472297, AHR rs6968865 and combined SNP score with tea consumption.
There was no evidence that either AHR or CYP1A1 genotypes, or the combined SNP score, was associated with consumption of decaffeinated coffee, tea or cola (see Tables S2-S4), indicating that the associations observed are specific to caffeinated drinks. Again, removing participants who reported zero consumption of coffee, tea and/or cola did not alter these results substantially. We also did not observe any association with measures of aversion to coffee, tea or cola taken during pregnancy (data available on request).

Potential Confounders
Next we assessed the association of the combined SNP score with potential confounders (year of birth, educational attainment, measures of socioeconomic position, alcohol use, tobacco use). These analyses indicated no evidence of association (Table 5), suggesting that the combined SNP score may be a useful instrumental variable in Mendelian randomization analyses [20,21]. This is in contrast with the association of total caffeine consumption with the same potential confounders, which shows very strong evidence of association at multiple time points (Table 6). A full description of these variables is provided in the ALSPAC cohort profile [16].

Discussion
Our results confirm that two SNPs in AHR and CYP1A1 are associated with caffeine consumption, and extend previous findings in two important ways. First, ours is the first study to show this association in a sample where caffeine consumption via caffeinated beverages other than coffee is common. Moreover, we show that a combined caffeine consumption phenotype derived from measures of consumption of three caffeinated beverages (coffee, tea and cola) provides a stronger signal than any one of these measures separately. Second, our results indicate that these associations are due to caffeine consumption, rather than some other common characteristic of caffeinated beverages: using measures of consumption of decaffeinated drinks as negative controls, we find no evidence of association with either AHR or CYP1A1. While our results hold for both SNPs individually, our strongest results are obtained when both SNPs are combined to create a 2-SNP genetic risk score.
Table 4. Association of CYP1A1 rs2472297, AHR rs6968865 and combined SNP score with cola consumption.
Table 5. Association of combined SNP score with potential confounders. Housing tenure was coded as: bought/mortgaged/owned with no mortgage to pay, rented from private landlord, rented from council/housing association. Crowding index was coded as the number of people living in the household divided by the number of rooms. Highest educational level was coded as the equivalent of: none, vocational, school to age 16, school to age 18, degree or higher. Alcohol consumption was measured in drinks per week. Tobacco consumption was measured in times per day. Linearity was imposed on the categorical variables (housing tenure, educational level). Measures of alcohol and tobacco consumption shown were taken at 18 weeks gestation, but results were similar at the other time points. doi:10.1371/journal.pone.0103448.t005
Table 6. Association of total caffeine consumption (mg) with potential confounders. Variables were coded as for Table 5.
Observationally, caffeine (or, more commonly, coffee) consumption has been shown to be associated with a number of health outcomes [22]. Evidence from longitudinal studies suggests that long-term coffee consumption may in fact be protective against cardiovascular disease [22,23] and lower the risk of all-cause mortality [24]. Coffee consumption also shows an inverse association with diabetes, although this may be due to antioxidant compounds within coffee rather than caffeine itself [23]. Observational studies suggest that coffee consumption may have further beneficial health effects, including reducing the risk of several cancers, such as endometrial, liver and prostate cancer [25][26][27], and protecting against depression, attention deficit hyperactivity disorder and Alzheimer disease [28][29][30]. Conversely, it is recommended that caffeine consumption be restricted during pregnancy due to its association with adverse pregnancy outcomes such as intrauterine growth retardation and miscarriage [31,32]. Observational studies also suggest that caffeine consumption may be detrimental to bone health, leading to increased fracture risk [33]. However, these studies all suffer from the usual problems of residual confounding and reverse causality, which limit the causal inferences that can be drawn from observational data. Mendelian randomization (MR) offers one approach to better understanding the causal nature of the observed associations between caffeine consumption and health outcomes. Genetic variants are randomly assorted during gamete formation and conception, and therefore should be unrelated to the lifestyle factors associated with coffee consumption that may confound observational associations [34]. Health outcomes cannot change the genes that an individual carries, so associations from MR analyses cannot be due to reverse causality [34]. This may be particularly important in observational studies of the effects of caffeine, as individuals may alter their caffeine consumption in response to ill health. In addition, caffeine consumption is difficult to measure accurately, as it is usually assessed using food frequency questionnaires [35], so observational estimates may be biased by random or non-random measurement error. In contrast, MR can provide accurate estimates of the effect of lifelong exposure to a risk factor [36].
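Once an instrument is available, the simplest MR estimator combines two genetic associations. The sketch below shows the Wald ratio for a single instrument such as the 2-SNP score; it is a standard instrumental-variable estimator rather than an analysis reported here, the inputs are hypothetical summary statistics, and its standard error is the first-order approximation that treats the instrument-exposure association as known.

```python
def wald_ratio(beta_gx, beta_gy, se_gy):
    """Wald ratio estimate of the causal effect of exposure X on outcome Y.

    beta_gx: association of the instrument with the exposure
             (e.g. mg/day of caffeine per unit of the allele score)
    beta_gy: association of the instrument with the outcome
    se_gy:   standard error of beta_gy
    Returns (estimate, se); the SE is the first-order approximation that
    treats beta_gx as known.
    """
    if beta_gx == 0:
        raise ValueError("instrument is not associated with the exposure")
    return beta_gy / beta_gx, se_gy / abs(beta_gx)
```

The ratio is only a valid causal estimate if the score affects the outcome solely through caffeine consumption, which is why the absence of association with confounders, and the possibility of pleiotropy, matter for its use.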
Critically, we have shown that the two SNPs in AHR and CYP1A1, and our 2-SNP genetic risk score, are not associated with a range of potential confounders that may give rise to spurious associations in studies of health outcomes putatively related to caffeine consumption. This, together with the clear evidence of association with caffeine consumption, indicates that the 2-SNP genetic risk score could be used as an instrumental variable in MR analyses. The greater variance explained by the combined score would increase statistical power and reduce the sample size required to detect associations with health outcomes, compared to using either SNP individually. The risk score explains up to 1.3% of the variance in caffeine consumption, which although small in absolute terms is relatively large by the standards of common genetic variants. This is comparable to the variance explained in body mass index (BMI) by variants in the FTO gene, and in cigarette consumption by variants in the CHRNA5-A3-B4 gene cluster [15,37], which have been used in MR studies of the causal effects of BMI and smoking on health outcomes [38][39][40][41]. The 2-SNP score for caffeine consumption may therefore be a suitable instrument to explore the causal effects of caffeine consumption on a range of health outcomes.
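The link between variance explained and required sample size can be made concrete with a common approximation for MR with a continuous, standardised exposure and outcome: N ≈ (z₁₋α/₂ + z₁₋β)² / (b² R²), where R² is the exposure variance explained by the instrument and b the causal effect in SD units. This is a sketch under that approximation (it ignores confounding and assumes small effects), and the causal effect used in the example is hypothetical.

```python
import math
from statistics import NormalDist


def mr_sample_size(r2_gx, causal_effect_sd, alpha=0.05, power=0.8):
    """Approximate N to detect a causal effect with a single instrument.

    r2_gx:            proportion of exposure variance explained by the instrument
    causal_effect_sd: causal effect in SD of outcome per SD of exposure
    Uses N = (z_{1-alpha/2} + z_{power})^2 / (b^2 * R^2), which ignores
    confounding and assumes small effects.
    """
    z = NormalDist().inv_cdf
    return math.ceil((z(1 - alpha / 2) + z(power)) ** 2 / (causal_effect_sd ** 2 * r2_gx))
```

With a hypothetical effect of 0.1 SD per SD of caffeine consumption, 5% two-sided alpha and 80% power, the combined score's upper estimate of 1.28% gives roughly 61,000 participants, against roughly 89,000 for the strongest single-SNP estimate (0.88% for CYP1A1). Because required N scales with 1/R², the ratio holds for any effect size.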
There are some limitations to this study that should be considered when interpreting our results. First, caffeine consumption was measured using a food frequency questionnaire, and such instruments may have modest reliability and validity [35]. We were also only able to capture tea, coffee and cola drinks as sources of dietary caffeine, and not other sources (e.g., chocolate). However, tea, coffee and soft drinks (including cola) together account for ~90% of caffeine consumption in similar populations, and the levels of consumption we observed are similar to those observed in other studies [32]. While more detailed assessments of caffeine consumption are possible, these are difficult to obtain on the scale necessary for genetic association studies. Future studies could obtain more detailed phenotypic information on selected, genetically informative individuals [42]. Second, levels of cola consumption were low in this sample; this, together with the relatively low caffeine content of cola drinks, may account for the lack of association observed. It is also possible that participants interpreted questions about "cola" consumption at least in part as questions about all soda consumption. Establishing whether this lack of association is genuine will require the study of populations where levels of cola consumption are higher. Third, our sample was restricted to women. Rates of caffeine consumption may differ between men and women, although there are no clear reasons to expect that the pattern of results we observed would differ in men. While patterns of consumption during pregnancy may not be typical, our data extend to ~12 years post-pregnancy, and it is likely that the women in our sample reverted to pre-pregnancy patterns of caffeine consumption over time. Fourth, we only included 2 SNPs in our analysis. These were chosen because they are the SNPs with the clearest evidence from recent GWAS of caffeine consumption. Future studies may extend our 2-SNP score by including further variants. Fifth, although we are optimistic that these genotypes, and the 2-SNP score, can be used as instrumental variables in MR analyses, potential pleiotropic effects will need to be considered. Metabolic enzyme genotypes typically relate to several metabolic differences, which may give rise to associations with health outcomes. In principle, this can be tested by examining the association of genotype with health outcome separately in those who do and do not consume caffeinated drinks [43]: the genotype should not be associated with the outcome in the latter group if the association is mediated via caffeine consumption (although this approach can give rise to collider bias [44]). Finally, participants of non-European ancestry were excluded during preparation of the GWAS data, given that differences in ancestry can bias genetic association studies. Therefore, genotypes were only available for participants of European ancestry. However, >95% of ALSPAC participants are of European ancestry, so we think it unlikely that this influenced our results.
In conclusion, our data confirm the association of AHR and CYP1A1 genotypes with caffeine consumption, and extend previous work by showing that this association holds for tea consumption as well as coffee consumption. Moreover, no association is observed for decaffeinated tea or coffee consumption. This strengthens the argument that the association is mediated via caffeine consumption, although it remains possible that other compounds present in both tea and coffee mediate this association. Future work, perhaps selecting participants on the basis of AHR and CYP1A1 genotype, could explore this possibility through the administration of caffeine in a laboratory setting. Finally, the relatively large proportion of variance in caffeine consumption accounted for by the combined SNP score, and the lack of association of this with potential confounders, means that it could be used in Mendelian randomization studies to explore the causal effects of habitual caffeine consumption on health-related outcomes.