Multiple Independent Loci at Chromosome 15q25.1 Affect Smoking Quantity: a Meta-Analysis and Comparison with Lung Cancer and COPD

Recently, genetic association findings for nicotine dependence, smoking behavior, and smoking-related diseases converged to implicate the chromosome 15q25.1 region, which includes the CHRNA5-CHRNA3-CHRNB4 cholinergic nicotinic receptor subunit genes. In particular, association with the nonsynonymous CHRNA5 SNP rs16969968 and correlates has been replicated in several independent studies. Extensive genotyping of this region has suggested additional statistically distinct signals for nicotine dependence, tagged by rs578776 and rs588765. One goal of the Consortium for the Genetic Analysis of Smoking Phenotypes (CGASP) is to elucidate the associations among these markers and dichotomous smoking quantity (heavy versus light smoking), lung cancer, and chronic obstructive pulmonary disease (COPD). We performed a meta-analysis across 34 datasets of European-ancestry subjects, including 38,617 smokers who were assessed for cigarettes-per-day, 7,700 lung cancer cases and 5,914 lung-cancer-free controls (all smokers), and 2,614 COPD cases and 3,568 COPD-free controls (all smokers). We demonstrate statistically independent associations of rs16969968 and rs588765 with smoking (mutually adjusted p-values < 10⁻³⁵ and < 10⁻⁸ respectively). Because the risk alleles at these loci are negatively correlated, their association with smoking is stronger in the joint model than when each SNP is analyzed alone. Rs578776 also demonstrates association with smoking after adjustment for rs16969968 (p < 10⁻⁶). In models adjusting for cigarettes-per-day, we confirm the association between rs16969968 and lung cancer (p < 10⁻²⁰) and observe a nominally significant association with COPD (p = 0.01); the other loci are not significantly associated with either lung cancer or COPD after adjusting for rs16969968. This study provides strong evidence that multiple statistically distinct loci in this region affect smoking behavior.
This study is also the first report of association between rs588765 (and correlates) and smoking that achieves genome-wide significance; these SNPs have previously been associated with mRNA levels of CHRNA5 in brain and lung tissue.


Introduction
Smoking is associated with many different diseases. Lung cancer is the illness most identified with smoking, and its prevalence over time mirrors per capita tobacco consumption [1]. There has been a reduction in smoking in the United States, and a concomitant decline in the incidence of lung cancer is beginning to emerge. Nonetheless, more people die from lung cancer each year than from any other cancer [2]. Chronic obstructive pulmonary disease (COPD), another serious lung disease largely attributable to smoking, is also among the leading causes of death.
Recently, genetic findings for nicotine dependence and smoking-related diseases converged to implicate the chromosome 15q25.1 region, which includes the CHRNA5-CHRNA3-CHRNB4 cluster of cholinergic nicotinic receptor subunit genes. The nicotine dependence locus tagged by the single nucleotide polymorphism (SNP) rs16969968 and correlates has been replicated for smoking-related traits including cigarettes-per-day and heavy smoking [3][4][5][6][7][8][9][10][11], and has been reported as the most significant association genome-wide in very recent meta-analyses [12][13][14]. This locus has also been associated with risk for lung cancer and COPD in several genome-wide association studies (GWAS) [6][15][16][17][18]. This represents an exciting overlap of genetic findings for nicotine dependence and smoking-related diseases. Though different SNPs may be reported by each study, the high correlation between the associated SNPs (r² > 0.8 with rs16969968) implies that these statistical signals tag the same locus in European-ancestry populations. The SNP rs16969968 results in an amino acid change (D398N) in the alpha5 receptor subunit protein and has been shown to affect receptor function [19].
Extensive genotyping of the CHRNA5-CHRNA3-CHRNB4 region has provided potential evidence for at least two additional distinct signals for nicotine dependence [4,7,8,20]. A second locus, tagged by rs578776, is associated with nicotine dependence and smoking in several samples of European ancestry, with the minor allele protective in the sense that it is elevated in controls; rs578776 has only low correlation with rs16969968 in European-ancestry populations (r² = 0.24 in the HapMap CEU panel), though the linkage disequilibrium (LD) coefficient |D'| is 1. A third important locus in this region is a group of highly correlated SNPs, tagged by rs588765, which are associated with mRNA levels of CHRNA5 in brain tissue [21,22] and lung tissue [23][24][25] from European-ancestry subjects. When rs16969968 and rs588765 (or correlates) are studied together, three common haplotypes are observed, each with distinct effects on risk [7,22]. There are hints that other, less common variants (minor allele frequency (MAF) ≤ 5%) also contribute to nicotine dependence in this region, including a fourth locus represented by rs12914008, which has shown a relatively strong odds ratio of 0.73 in European-American subjects [4].
With the support of the National Institute on Drug Abuse (NIDA), we formed the Consortium for the Genetic Analysis of Smoking Phenotypes (CGASP), which includes smoking, lung cancer, and COPD researchers, to enable the pursuit of several research goals. For this first analysis project we focused on the chromosome 15q25.1 region containing CHRNA5-CHRNA3-CHRNB4. Specifically, we focused on the four distinct loci discussed above, which have low correlation with each other and have demonstrated evidence for involvement in nicotine dependence. Analyses were undertaken to investigate two questions: first, are there multiple statistically distinct genetic loci in this region that exert independent effects on smoking, and second, are similar patterns of genetic risk shared across smoking, lung cancer, and COPD.

Ethics statement
This study was conducted according to the principles expressed in the Declaration of Helsinki and obtained informed consent from participants and approval from the appropriate institutional review boards.

Samples and study design
All subjects included in these meta-analyses were current or former smokers of European ancestry. Results from 34 datasets, which include a total of 38,617 unrelated subjects who were assessed for cigarettes-per-day, contributed to the meta-analyses. Eight of the datasets were drawn from family-based studies and contributed only a subset of unrelated individuals to these analyses. Table 1 gives sample sizes and demographics of each participating study sample. Text S1 describes additional details for each dataset, including ascertainment criteria and genotyping methods, and documents that four datasets are also members of other consortia. All datasets contributed to the analyses of smoking. A subset of these 34 datasets also had information on lung cancer cases and lung-cancer-free smoker controls (6 datasets, N = 13,614 smokers) and/or COPD cases and COPD-free smoker controls (4 datasets, N = 6,182 smokers). The data for these traits are described in Table 2 and Table 3 respectively.

Author Summary
Nicotine binds to cholinergic nicotinic receptors, which are composed of a variety of subunits. Genetic studies for smoking behavior and smoking-related diseases have implicated a genomic region that encodes the alpha5, alpha3, and beta4 subunits. We examined genetic data across this region for over 38,000 smokers, a subset of which had been assessed for lung cancer or chronic obstructive pulmonary disease. We demonstrate strong evidence that there are at least two statistically independent loci in this region that affect risk for heavy smoking. One of these loci represents a change in the protein structure of the alpha5 subunit. This work is also the first to report strong evidence of association between smoking and a group of genetic variants that are of biological interest because of their links to expression of the alpha5 cholinergic nicotinic receptor subunit gene. These advances in understanding the genetic influences on smoking behavior are important because of the profound public health burdens caused by smoking and nicotine addiction.

Traits for analysis
The traits examined were smoking quantity, lung cancer, and COPD. Two smoking traits were derived from measurements of cigarettes smoked per day (CPD): a 4-level categorical trait (CPD ≤ 10, 10 < CPD ≤ 20, 20 < CPD ≤ 30, and CPD > 30) and a dichotomous trait contrasting subjects from the lowest smoking category (CPD ≤ 10: light-smoking "controls") to those in the two highest categories combined (CPD > 20: heavy smoking cases). The dichotomous trait of heavy versus light smoking was our primary trait for analysis. For one study (NAG-Finland), which used different boundaries to record CPD as detailed in the supplemental material, the distribution of CPD was examined to harmonize the phenotypes and select alternative boundaries. The numbers of subjects in each smoking category, total and by study, are given in Table 1. Lung cancer and COPD were analyzed as dichotomous traits. COPD cases were defined as subjects with COPD determined by post-bronchodilator spirometry as GOLD Stage II or worse (N = 1,719), or with self-reported COPD, emphysema, or chronic bronchitis.
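The smoking-quantity traits defined under "Traits for analysis" can be sketched in a few lines of Python. This is a minimal illustration, not the consortium's SAS or R scripts: the function names are hypothetical, and the alternative boundaries used for NAG-Finland are not modeled.

```python
from typing import Optional


def cpd_category(cpd: float) -> int:
    """Return the 4-level CPD category (1 = lowest, 4 = highest)."""
    if cpd <= 10:
        return 1  # CPD <= 10
    if cpd <= 20:
        return 2  # 10 < CPD <= 20
    if cpd <= 30:
        return 3  # 20 < CPD <= 30
    return 4      # CPD > 30


def heavy_vs_light(cpd: float) -> Optional[int]:
    """Dichotomous trait: 0 = light (CPD <= 10), 1 = heavy (CPD > 20).

    Subjects in the middle category (10 < CPD <= 20) belong to neither
    group and are returned as None.
    """
    category = cpd_category(cpd)
    if category == 1:
        return 0
    if category >= 3:
        return 1
    return None
```

Subjects smoking between 11 and 20 cigarettes per day contribute to the categorical trait but are excluded from the dichotomous contrast.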

SNPs for analysis
In European-ancestry populations, each of the four loci of interest can be represented by various highly correlated SNPs (SNPs having high r² with each other). For each locus, we chose one target SNP for analysis: rs16969968 (locus 1), rs578776 (locus 2), rs588765 (locus 3), and rs12914008 (locus 4); the pairwise correlations between any two of these loci are r² < 0.5 (Table S1). In samples for which a given target SNP was not available, we chose a highly correlated proxy SNP based on r² computed with Haploview [26] using downloaded HapMap CEU genotype data, Release 23 [27]. Table S2 lists the proxy SNPs used and their r² with the corresponding target SNPs. Figure S1 displays the SNPs for each of the 4 loci in relation to the CHRNA5-CHRNA3-CHRNB4 cluster.

Statistical analyses and meta-analysis
To ensure uniform analyses, SAS (SAS Institute, Cary, NC) and R [28] scripts for genetic association analyses were developed centrally and then distributed. The scripts were executed by each participating site, and the results returned to the coordinating group.
In each dataset, associations between the loci and the traits were evaluated using logistic regression. Our primary analysis model coded genotypes additively as the number of copies of the minor allele according to the HapMap CEU reference population. This allele is referred to as the "coded allele" (C) and the major allele is referred to as the "reference allele" (R). To confirm the appropriateness of the additive model, for each locus a 2 degree of freedom model including the additive term and a heterozygote deviation term was evaluated. The analyses of the 4-level CPD trait used generalized logistic regression to obtain separate effect estimates (beta coefficients) for each category with respect to the lowest smoking category as the referent. All these association analyses included sex and age as covariates. In addition, lung cancer and COPD analyses included categorical cigarettes-per-day as an unordered covariate.
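The genotype coding described above can be illustrated with a small Python sketch. The function name and the two-character genotype strings are hypothetical conveniences; the sketch shows only how the additive term and the heterozygote deviation term of the 2 degree of freedom model could be derived, not the regression itself.

```python
from typing import Tuple


def code_genotype(genotype: str, coded_allele: str) -> Tuple[int, int]:
    """Return (additive, het_deviation) for a genotype such as 'AG'.

    additive      : number of copies of the coded (minor) allele, 0 to 2
    het_deviation : 1 for heterozygotes, 0 otherwise
    """
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype, e.g. 'AG'")
    additive = sum(1 for allele in genotype if allele == coded_allele)
    het_deviation = 1 if additive == 1 else 0
    return additive, het_deviation
```

In the additive model only the first term enters the regression; a significant coefficient on the second term would indicate departure from additivity.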
Association results from each dataset, including the beta coefficient and standard error, were provided to the coordinating team. Meta-analysis was carried out using PLINK [29] to obtain overall summary odds ratios (ORs) and statistics. The R package rmeta [30] was used to verify results and create plots. There was no evidence of significant heterogeneity across datasets for these analyses (minimum heterogeneity p = 0.21 for dichotomous CPD, 0.07 for lung cancer, 0.24 for COPD; for categorical CPD a nominally significant p was seen only for category 3 and locus 1 (p = 0.007)). Because of varying study designs, ascertainment strategies, and representative SNPs, we nevertheless report results from random effects meta-analyses.
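To make the pooling step concrete, the following Python sketch combines per-dataset beta coefficients and standard errors with an inverse-variance random-effects model. It assumes the DerSimonian-Laird estimator of between-study variance, a common choice; the text states only that PLINK was used, so this is an illustration of the general technique rather than a description of the exact computation performed. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def random_effects_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (pooled_beta, pooled_se, tau2) under DerSimonian-Laird."""
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    # Fixed-effects estimate, needed for Cochran's Q
    fixed = sum(w * b for w, b in zip(weights, betas)) / sum_w
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, betas))
    k = len(betas)
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_re = sum(re_weights)
    pooled = sum(w * b for w, b in zip(re_weights, betas)) / sum_re
    pooled_se = math.sqrt(1.0 / sum_re)
    return pooled, pooled_se, tau2


def odds_ratio_with_ci(beta: float, se: float) -> Tuple[float, float, float]:
    """Convert a log odds ratio and its SE to (OR, lower 95%, upper 95%)."""
    return (
        math.exp(beta),
        math.exp(beta - 1.96 * se),
        math.exp(beta + 1.96 * se),
    )
```

When the estimated between-study variance is zero, the random-effects result coincides with the fixed-effects result, which is why reporting random effects is the more conservative choice in the absence of heterogeneity.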
As noted earlier, locus 1 (representing rs16969968) is a highly replicated association finding and furthermore rs16969968 has been shown to have functional effects on the resulting alpha5-containing receptor [19]. Therefore an important question is whether the remaining loci demonstrate additional independent effects on disease risk. Although loci 2, 3 and 4 are not highly correlated with rs16969968, |D'| is high. A high |D'| can correspond to a low r² if the alleles that tend to co-occur on the same haplotype have very different allele frequencies. Previous results in the COGEND data suggest that there may be independent or synergistic effects on nicotine dependence between locus 1 and locus 3 [4], and haplotype analyses in the Utah and LHS samples [7], and in the COGEND and CPS-II-CPD samples [22], also indicate effects of haplotypes containing loci 1, 2 and 3.
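The relationship between |D'| and r² can be checked numerically. The Python sketch below computes both measures from allele and haplotype frequencies using the standard definitions; the frequencies in the usage note are hypothetical and are not taken from HapMap or from the study data.

```python
from typing import Tuple


def ld_measures(p_a: float, p_b: float, p_ab: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic loci.

    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    p_ab : frequency of the haplotype carrying both A and B
    """
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

For example, `ld_measures(0.35, 0.25, 0.0)` describes two minor alleles that never occur on the same haplotype: D' equals -1 (complete LD), while r² stays well below 0.5 because the two alleles do not mirror each other's frequencies.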
To test whether additional loci contribute to dichotomous smoking quantity over and above the effect of rs16969968, we included both locus 1 and each of the other loci in the logistic regression models adjusting for sex and age, with and without a SNP×SNP interaction term. For lung cancer and COPD the models also included categorical cigarettes-per-day as an unordered covariate. These results were then meta-analyzed as described above. The SNP×SNP interaction term was never significant in the meta-analysis (p > 0.3), so we report results from the joint models without interactions. To allow comparison between single-SNP and joint results on comparable data, for each locus pair we also repeated the univariate single-SNP meta-analyses on the subset of datasets that had genotypes available at both loci. For dichotomous smoking quantity we also tabulated pair-wise joint genotype by case status counts for locus 1 (rs16969968) versus each of the other three loci across the contributing datasets that had both loci.

Multiple test correction
Across the four target loci, multiple traits (4), the multiple models (additive and additive+heterozygote deviation), and the 2-SNP joint analyses (3 loci), our study was designed to perform fewer than 80 tests. A conservative Bonferroni correction would result in an uncorrected p-value threshold of 6.25×10⁻⁴ corresponding to an experiment-wide alpha of 0.05. The results tables report uncorrected p-values which we compared to this threshold to determine statistical significance.
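The threshold arithmetic is simple enough to verify directly. The Python sketch below divides the experiment-wide alpha by the upper bound of 80 tests and compares two p-values quoted elsewhere in this paper against the result; the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold for a given experiment-wide alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 80)  # 0.05 / 80 = 6.25e-4

# Locus 3, single-SNP dichotomous CPD (p = 0.00027)
locus3_passes = 0.00027 < threshold
# Locus 1, COPD (p = 0.01)
copd_passes = 0.01 < threshold
```

Under this threshold the single-SNP locus 3 result for dichotomous CPD is significant, whereas the locus 1 result for COPD is not.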

Results
We calculated allele frequencies within each sample to confirm that the coded allele (minor allele in HapMap CEU) was indeed the minor allele as expected in these European-ancestry subjects. Table S3 shows allele frequencies in each sample for the SNPs used. For each locus, frequencies are similar across studies and proxy SNPs, and similar to the frequencies in the HapMap CEU reference population.
All reported results are based on additive models. The additive model is appropriate because none of the tests for deviation from the additive assumption were significant. For each analysis, the tables and figures report the number of individuals successfully genotyped for the relevant SNP or SNPs.
Dichotomous CPD, single-SNP analysis
Table 4 summarizes the meta-analysis results of dichotomous CPD (heavy/light smoking) in single-SNP analysis. Meta-analysis across all 34 samples clearly shows a highly significant association between dichotomous CPD and locus 1 (tagging rs16969968). Figure 1 displays a forest plot of the summary meta-analysis results for locus 1 (p = 5.96×10⁻³¹, OR = 1.33, 95% confidence interval (1.26-1.39)), and also the ORs in each contributing dataset.
The same analysis of locus 2 (tagging rs578776) yields a meta-analysis p-value of 1.38×10⁻²⁵ and an OR of 0.78 (0.74-0.81), indicating a protective association for the minor allele as has previously been reported (Figure 2). Locus 3 (tagging rs588765) under the same model gives a p-value of 0.00027 and OR of 0.93 (0.89-0.97), which meets our threshold for multiple-test corrected significance but, unlike locus 1 and locus 2, does not surpass genome-wide significance (Figure 3). Locus 4 (tagging rs12914008) does not show a main effect on dichotomous CPD (p = 0.45, OR = 1.05 (0.93-1.17)). The forest plot for locus 4 is given in Figure S2.
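The link between a reported odds ratio, its 95% confidence interval, and the p-value can be sketched as follows. This Python example assumes a Wald test on the log odds ratio with a normal approximation, which is an assumption about how the summary statistics relate rather than a statement of the software's internals. Because the published ORs and intervals are rounded to two decimals, back-calculated p-values only approximate the reported ones.

```python
import math
from typing import Tuple


def wald_from_or(
    odds_ratio: float, ci_low: float, ci_high: float
) -> Tuple[float, float, float, float]:
    """Return (beta, se, z, two_sided_p) from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # A 95% CI on the log scale spans 2 * 1.96 standard errors
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, z, p


# Rounded values quoted in the text for dichotomous CPD
locus1 = wald_from_or(1.33, 1.26, 1.39)
locus4 = wald_from_or(1.05, 0.93, 1.17)
```

With these rounded inputs, locus 1 yields a z-statistic above 10 and a p-value far below the genome-wide threshold of 5×10⁻⁸, while locus 4 yields a p-value well above 0.05, in line with the reported results.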

Categorical CPD, single-SNP analysis
The categorical CPD analysis, which includes all 4 CPD levels in a generalized logit model, allows us to evaluate genetic effects for each CPD category with respect to the lowest smoking class (CPD ≤ 10). Table 5 shows the results.
For locus 1 (rs16969968), we see an ordinal effect with increasing CPD; that is, the odds ratio increases from 1.15 to 1.29 to 1.40 for categories 2, 3 and 4, with a corresponding decrease in p-value from 3.17×10⁻⁸ to 2.12×10⁻¹² to 5.47×10⁻⁴⁰. A similar ordinal effect is seen for locus 2 (rs578776), with the odds ratio decreasing from 0.88 to 0.79 to 0.77. For locus 3 (rs588765) we see an effect only with the highest smoking category (CPD > 30). For locus 4 no effect is seen across smoking categories, consistent with the dichotomous CPD results.

Joint analysis for dichotomous CPD
To dissect the potential distinct effects of these loci on heavy versus light smoking, we carried out meta-analyses of joint SNP models that included sex, age, locus 1 and each of the other loci, coded additively.
In the joint analysis of locus 1 and locus 2, there is suggestive evidence of distinct effects, but the association at locus 2 is no longer genome-wide significant in the presence of locus 1. Both SNPs become less significant compared to their single locus models: in the joint model, locus 1 gives p = 2.15×10⁻²², OR = 1.27 (1.21-1.33) and locus 2 gives p = 4.50×10⁻⁷, OR = 0.87 (0.83-0.92). When each SNP is placed individually in the model and meta-analyzed across the 32 datasets that provided data for both loci, locus 1 gives p = 1.41×10⁻³², OR = 1.34 while locus 2 gives p = 1.38×10⁻²⁵, OR = 0.76. The risk-increasing alleles at locus 1 (C) and locus 2 (R) are positively correlated, even though the minor alleles are negatively correlated.
In joint analysis of locus 1 and locus 3, locus 1 (rs16969968) yields a p-value of 3.52×10⁻³⁶, OR = 1.47 (1.38-1.56); locus 3 (rs588765) gives p = 6.03×10⁻⁹, OR = 1.17 (1.11-1.23). Thus locus 3 attains genome-wide significance (p < 5×10⁻⁸) after adjusting for the effect of locus 1. Note that adjusting for locus 1 changes the direction of effect for locus 3 (OR > 1) compared to the single-SNP results. In the 33 datasets that have both loci genotyped, we obtain p = 5.39×10⁻²⁹, OR = 1.32 for locus 1 alone, and p = 0.00027, OR = 0.93 (0.89-0.97) for locus 3 alone. The evidence for association in the joint model is stronger than when each SNP is analyzed alone. In fact, when locus 1 is not taken into account, the effect of locus 3 is potentially masked, and the effect of the minor allele is in an opposite direction (protective versus risk).
To further examine these interesting results for locus 1 and locus 3, we show the number of heavy and light smokers in each joint genotype class, and corresponding odds ratios using the genotype that is homozygous for both reference (major) alleles as the reference group (Table 6). The reference alleles (major in HapMap CEU) are labeled "R" and the coded alleles (minor in HapMap CEU) are labeled "C".
The first important observation is that there are very few subjects in certain cells, namely the cells corresponding to RC/CC at locus 1/locus 3, CC/RC, and CC at both loci. This table therefore reveals that the risk alleles at locus 1 (C) and locus 3 (C) are negatively correlated, and explains why the effect of rs588765 is seen only after adjusting for rs16969968. This pattern also reflects the high |D'| between the loci.
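The joint-genotype odds ratios described for Table 6 follow the usual 2×2 calculation against a reference cell. The Python sketch below computes an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval; the interval method and the counts in the usage note are assumptions for illustration and are not taken from Table 6.

```python
import math
from typing import Tuple


def odds_ratio_vs_reference(
    cases: int, controls: int, ref_cases: int, ref_controls: int
) -> Tuple[float, float, float]:
    """Return (OR, lower 95%, upper 95%) for one genotype cell
    relative to the reference cell (RR at both loci)."""
    odds_ratio = (cases * ref_controls) / (controls * ref_cases)
    # Woolf standard error of the log odds ratio
    se = math.sqrt(
        1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls
    )
    log_or = math.log(odds_ratio)
    return (
        odds_ratio,
        math.exp(log_or - 1.96 * se),
        math.exp(log_or + 1.96 * se),
    )
```

For instance, `odds_ratio_vs_reference(200, 100, 100, 100)` gives an odds ratio of 2.0. Sparsely populated cells inflate the standard error, which is why estimates for the rare genotype combinations noted above are unstable.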
The second observation is that for the remaining, well populated cells, the coded allele at locus 3 increases risk on the

Lung cancer controlled for CPD
In Table 7 we report the single-SNP meta-analysis results for the six lung cancer datasets; recall that all subjects were smokers, and sex, age and categorical CPD were included as covariates. As with the CPD traits, locus 1 (rs16969968) shows highly significant evidence for association with lung cancer (p = 1.99×10⁻²¹). The summary odds ratio of 1.31 (1.24-1.38) closely matches the dichotomous CPD odds ratio of 1.33 (1.26-1.39). Figure 4 shows the association results for locus 1 by dataset and the overall meta-analysis results.
Locus 2 (rs578776) also shows evidence of association with lung cancer in single-SNP analysis (p = 9.74×10⁻¹⁰; OR = 0.82 (0.77-0.87)) (Figure 5). Locus 3 results in a p-value of 0.0004 (OR = 0.90 (0.86-0.96)) (Figure 6); as with categorical CPD, this meets our multiple-test-corrected threshold but is not genome-wide significant. Locus 4 shows no evidence for association with lung cancer; the forest plot is given in Figure S3.

Joint analyses for lung cancer controlled for CPD
Similar to our analyses of categorical CPD, we carried out joint analyses of locus 1 with each of the other 3 loci, with covariates for sex, age and dummy-coded CPD. After adjusting for the effect of locus 1, none of the other loci reached our multiple-test-corrected significance threshold.
Table 8 summarizes the meta-analysis results for the 3 datasets with the COPD trait; as with lung cancer, all subjects were smokers and sex, age, and categorical CPD were included as covariates. In these analyses, only locus 1 provides even suggestive evidence for association, though it does not survive multiple test correction (uncorrected p = 0.01). The locus 1 odds ratio is 1.12 (1.02-1.23), a point estimate lower than that for CPD (1.33) and lung cancer (1.31) (Figure 7).

Discussion
The first goal of this meta-analysis project was to test whether distinct loci in the CHRNA5-CHRNA3-CHRNB4 gene cluster demonstrate independent effects on smoking behavior (heavy (CPD > 20) versus light (CPD ≤ 10) smoking). We selected loci for study based on prior statistical and/or functional evidence for involvement. The second goal was to test whether similar patterns of association are seen across these loci in the smoking-related diseases of lung cancer and COPD. This meta-analysis marks the first large-scale effort to line up association results for these related traits (smoking, lung cancer, and COPD) using a uniform analysis protocol. Our results contribute important new insights about genetic risk for these traits. In particular, we demonstrate strong evidence that smoking behavior is influenced by multiple distinct loci in this region, including two loci that are associated with relevant biological effects in functional studies.
First, our results show that locus 1, representing the CHRNA5 amino acid change rs16969968 and correlates, demonstrates highly significant association with smoking behavior (OR = 1.33, p = 5.96×10⁻³¹). Our strong evidence for the involvement of locus 1 with smoking across these samples marks the robustness of its genetic effect. The contributing datasets for the smoking analyses range from samples ascertained for nicotine dependence, lung cancer, or COPD, to adolescent samples, to populations ascertained for a variety of diseases including schizophrenia, alcohol or other substance dependence, breast cancer, type 2 diabetes, and heart disease. This meta-analysis represents a very diverse group, and yet the association between rs16969968 and smoking behavior is consistent.
Table 6 footnotes: (1) The number of CPD cases and CPD controls with the specified two-locus genotype combination. (2) The odds ratio and 95% CI when the reference joint genotype is RR at locus 1 and RR at locus 3. doi:10.1371/journal.pgen.1001053.t006
The second, and novel, finding from this meta-analysis is the evidence for an additional, distinct locus in this region that is associated with heavy/light smoking and is genome-wide significant. We demonstrated that locus 3, representing rs588765 and correlates, attains a p-value of 6.03×10⁻⁹ (OR = 1.17) when we adjust for locus 1 in a logistic regression model. It is notable that the association between locus 3 and CPD is not as apparent in the single-SNP analysis that does not control for locus 1 (e.g. meta-analysis p = 0.0003, OR = 0.93, which does not reach genome-wide significance). The negative correlation between the risk alleles at locus 1 and locus 3 (r = −0.64) masks the effect at the latter locus in single-SNP analysis, a phenomenon known as suppression [31,32]. The association evidence for both SNPs is strengthened in the joint analysis, with a reversal of the direction of effect for locus 3.
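The suppression effect can be illustrated with a back-of-the-envelope calculation in Python. The sketch uses the omitted-variable relationship from linear regression, in which the marginal coefficient of one predictor equals its joint coefficient plus the other predictor's joint coefficient scaled by their correlation and the ratio of standard deviations. That relationship is exact only for linear models and is applied here to log odds ratios as a rough approximation. The minor allele frequencies (0.35 and 0.40) are hypothetical placeholders, and Hardy-Weinberg proportions are assumed for the genotype variances; the joint odds ratios and the correlation are the values quoted in the text.

```python
import math


def genotype_sd(maf: float) -> float:
    """SD of an additive 0/1/2 genotype under Hardy-Weinberg proportions."""
    return math.sqrt(2 * maf * (1 - maf))


def marginal_beta(
    beta_self: float, beta_other: float, r: float,
    sd_self: float, sd_other: float,
) -> float:
    """Approximate single-SNP coefficient implied by a joint model."""
    return beta_self + beta_other * r * sd_other / sd_self


beta1 = math.log(1.47)  # locus 1, joint model
beta3 = math.log(1.17)  # locus 3, joint model
r = -0.64               # correlation between the coded alleles
sd1 = genotype_sd(0.35)  # hypothetical MAF for locus 1
sd3 = genotype_sd(0.40)  # hypothetical MAF for locus 3

marginal_or1 = math.exp(marginal_beta(beta1, beta3, r, sd1, sd3))
marginal_or3 = math.exp(marginal_beta(beta3, beta1, r, sd3, sd1))
```

Under these assumptions the implied single-SNP odds ratio for locus 1 shrinks toward 1 (about 1.33) and the one for locus 3 falls below 1 (about 0.92), reproducing the qualitative pattern reported: attenuation at locus 1 and a reversal of direction at locus 3 when the other locus is left out of the model.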
This evidence of statistically independent association for locus 3 with smoking in our analysis is compelling given that these SNPs have also been implicated in altered mRNA levels for CHRNA5 in brain and lung tissue from European-ancestry subjects [21,22,24]. Thus, both statistical and functional evidence indicate that at least one SNP correlated with CHRNA5 mRNA levels is involved in risk, and highlight locus 3 as an important group of SNPs for further investigation.
A third observation from this study is that locus 2 (rs578776 and correlates) shows evidence for involvement in heavy/light smoking. Locus 2 is genome-wide significant in the single-SNP analysis of dichotomous CPD without adjustment for locus 1, with the minor allele elevated in controls (meta-analysis p = 1.38×10⁻²⁵, OR = 0.78). However, the association is much weaker (p = 4.50×10⁻⁷, OR = 0.87) in the joint logistic regression model that includes locus 1 and locus 2. One interpretation is that part of the single-SNP association at locus 2 is driven by the effect of locus 1 (perhaps related to the high |D'|). Nevertheless, there is evidence for residual signal at locus 2.
We tested a fourth locus representing rs12914008, a relatively uncommon (MAF < 5%) non-synonymous SNP in CHRNB4 that has previously shown suggestive evidence for association in European-Americans [4]. In both the univariate analysis and the joint analysis with locus 1, locus 4 is not associated with smoking behavior after multiple test correction. Because of the low allele frequency of this variant, the power to detect an effect is lower than for the other three loci.
This meta-analysis therefore highlights locus 1, locus 2, and locus 3, and indicates dependencies in their effects on risk for heavy smoking. Haplotypes based on these three loci have been described [7,22] and are seen in HapMap CEU, where the observed haplotype patterns for rs16969968 (locus 1), rs578776 (locus 2), and rs588765 (locus 3) are: A-G-C (frequency 0.425), G-  Our association results together with the correlation patterns of these three loci suggest that future haplotype or diplotype analyses across large datasets could clarify the relative contributions of these loci. Our evidence that multiple distinct genetic loci affect smoking quantity is consistent with previous reports of risk and protective haplotypes for nicotine dependence in the Utah and LHS samples [7], and in the COGEND and CPS-II-CPD samples [22]. The Utah/LHS study haplotype included 5 SNPs: two that represent locus 1 (rs16969968 and rs1051730), two that represent locus 2 (rs569207 and rs578776), and one that represents locus 3 (rs680244). The COGEND and CPS-II-CPD haplotype analyses included up to 3 loci, one each for locus 1, 2 and 3. Across all these published studies, the high-risk haplotype carries the risk allele at rs16969968 (locus 1); because of the high |D'| between loci, only one haplotype carries that allele. Among the remaining haplotypes, a low risk haplotype is obtained when the minor allele at locus 2 or the major allele at locus 3, or both, is paired with the non-risk allele at rs16969968.
Taken together, our meta-analysis results argue strongly for the existence of at least two statistically distinct loci in this region that affect risk for heavy smoking. In particular, both locus 1 and locus 3, which have known functional effects, are genome-wide significant in joint, mutually-adjusted analysis. The minor allele at locus 3 shifts from a marginally significant protective factor when considered alone to a robust risk factor when considered in combination with locus 1. The statistical evidence and negatively correlated alleles at locus 1 and locus 3 are consistent with at least two mechanistic models: distinct effects of two loci where the minor allele at each locus increases risk across a constant background at the other locus, or a haplotype dose effect where alleles at the two loci act in concert on the same haplotype strand. In the latter model, the minor-major and major-minor haplotypes each increase risk relative to the major-major haplotype, as can be seen in Table 6 once it is recognized that the rarity of the minor-minor haplotype implies that the double-heterozygote cell essentially represents the minor-major and major-minor diplotype. It is also possible that multiple rare variants underlie these findings, as has been suggested in general for disease associations with common SNPs [33]. It remains possible that these associations with locus 1, locus 2 and locus 3 are reflecting correlation with yet another underlying, untyped variant that alone explains the altered biology leading to risk. However, biological involvement of multiple loci appears more likely given that two of these loci represent two distinct, relevant functional consequences: namely, locus 1 (the amino acid change at rs16969968) is associated with altered receptor response to a nicotine agonist in vitro [19], and locus 3 (rs588765 and correlates) is associated with altered mRNA levels of CHRNA5 in brain and lung tissue [22,24].
Further investigation via resequencing, biological/functional assays, and animal models is needed to dissect the causal biology that underlies the statistical evidence.
An important open question is the degree to which the associations between chr15q25 variants and lung cancer are due to their effects on smoking. When comparing smoking and lung cancer single-SNP results, the patterns of association (odds ratios and directions of effect) were similar across the loci studied. Locus 1 is associated with lung cancer even when controlling for amount smoked per day (p = 1.99×10⁻²¹, OR = 1.31). This result suggests possible direct genetic effects of locus 1 on this cancer, at least in the presence of smoking. However, CPD is not a sufficient proxy for carcinogen exposure [34], and in never-smokers there is a lack of association between locus 1 and lung cancer [35][36][37], so it is possible that more refined adjustment for smoking will reduce or abolish this association.
For lung cancer, after controlling for categorical CPD and effects of locus 1, we were not able to definitively demonstrate association at either locus 2 or locus 3 after correction for multiple tests. For the mutually adjusted analysis of locus 1 and locus 3 for lung cancer, we observed the same change in the direction for the locus 3 odds ratio that we observed in the joint-SNP analysis of smoking. However, unlike what was seen for smoking, for lung cancer the magnitude (and significance) of the effects did not increase. There are several possible reasons for this, including: chance, the smaller sample size for lung cancer, or qualitative differences in the relationship between these loci and smoking behavior versus the relationship between these loci and lung cancer (after adjusting for smoking quantity). This highlights the challenges posed when attempting to dissect the contributions of multiple loci of modest effect on complex, correlated traits. Further studies, and larger sample sizes, are needed.
For COPD, when controlling for cigarettes-per-day we did not find evidence for association with any of the loci after correction for multiple tests. For locus 1, the odds ratio of 1.12 (1.01-1.23) is lower than for smoking and lung cancer. The COPD analyses were based on smaller samples than those available for CPD or for lung cancer.
Very recently, three other large smoking genetics consortia published their meta-analysis findings that confirm locus 1 (representing not only rs16969968 but also rs1051730 and other SNPs) as the locus most associated with smoking quantity, genome-wide [12][13][14]. All three studies used linear regression to test for association with either quantitative CPD value [14] or categorical CPD (1-10, 11-20, 21-30, and 31+) [12,13]. Those consortia also report results from conditional analyses in which a locus 1 SNP was included as a covariate, paralleling our joint analyses.
In contrast to our novel finding in CGASP of genome-wide significance for locus 3 when analyzed jointly with locus 1, none of the other consortia report strong evidence for locus 3 when paired with locus 1. In the Oxford-GSK study [13], imputation using 1000 Genomes data detected the most significant single-SNP association for CPD at the locus 1 SNP rs55853698 (r² > 0.96 with rs16969968). After conditioning on rs55853698, the strongest residual signal was detected at a locus 2 SNP, rs6495308 (p = 3.96 × 10⁻⁵; r² = 0.825 with rs578776 in HapMap CEU); they do not report the association result for rs588765 in the conditioned analysis, although it must have been less significant than 3.96 × 10⁻⁵. In their single-SNP analysis, rs6495308 (locus 2) gave a p-value of 2.2 × 10⁻¹⁰. Their results for locus 2 are therefore consistent with our observation that in joint analysis of locus 1 and locus 2, the significance at locus 2 is reduced compared to the single-SNP analysis. They do not report on whether the evidence for locus 1 and locus 3 strengthens in the joint analysis compared to single-SNP analysis, as we observed in the CGASP datasets. They do note that there is no obvious residual association with a third SNP after conditioning on either the pairing of locus 1 (rs16969968) and locus 3 (rs588765), or the pairing of locus 1 (rs55853698) and locus 2 (rs6495308). That result is consistent with the correlation and haplotype structure of these three loci discussed previously.
In TAG [14], the conditional analyses indicated residual association at rs684513 (p = 6.3 × 10⁻⁹), rs9788682 (p = 1.06 × 10⁻⁸), and rs7163730 (p = 1.22 × 10⁻⁸), which attain genome-wide significance. These SNPs are each correlated with locus 2, and much less correlated with locus 3 (r² = 0.7, 0.55 and 0.56 respectively with rs578776 in HapMap CEU; r² < 0.11 with rs588765). It is possible that differences in samples, phenotype definitions, or analysis methods may be contributing to the differences between our strong findings for locus 3 and the three other consortium reports. To further understand the genetic contributions in this region, more work is needed, and not only statistical evidence but also biological evidence will be important.
In summary, our meta-analysis demonstrates significant, robust association of locus 1, representing the non-synonymous CHRNA5 SNP rs16969968 as well as rs1051730 and rs55853698, with smoking heaviness across very diverse datasets. Our study also demonstrates strong evidence that at least one additional distinct locus in this region affects risk for heavy smoking. In particular, we have identified for the first time that locus 3, representing the CHRNA5 expression-associated SNPs rs588765 and correlates, surpasses GWAS-level significance for association with heavy smoking in European-ancestry subjects; this effect is detectable after adjusting for the effect of rs16969968. This new result for locus 3 raises the corresponding SNPs (rs588765 and correlates) to the level of interest already accorded to the two loci which have previously been detected at GWAS-level significance in single-SNP analyses: locus 1 (rs16969968 and correlates) and locus 2 (rs578776 and correlates). Our result also has implications for all genetic association studies, as it illustrates that joint analysis of SNPs is an important tool for identifying genome-wide significant effects that, soberingly, may be obscured in single-SNP analyses.
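The general point about joint analysis can be illustrated with a small simulation. The sketch below is not the CGASP analysis pipeline: it assumes a quantitative trait fitted by ordinary least squares rather than logistic regression on dichotomized CPD, and the haplotype frequencies, effect sizes, sample size, and function names (`simulate_genotypes`, `ols_slopes`, `run_demo`) are all hypothetical choices made for illustration. It assumes only that the two risk alleles never occur on the same haplotype, so their allele counts are negatively correlated.

```python
import numpy as np


def simulate_genotypes(n, rng, hap_freqs=(0.35, 0.40, 0.25)):
    """Draw two haplotypes per subject (hypothetical frequencies).

    Haplotype 0 carries the risk allele at locus 1 only,
    haplotype 1 carries the risk allele at locus 3 only,
    haplotype 2 carries neither risk allele.
    """
    haps = rng.choice(3, size=(n, 2), p=hap_freqs)
    g1 = (haps == 0).sum(axis=1)  # risk-allele count at locus 1 (0, 1 or 2)
    g3 = (haps == 1).sum(axis=1)  # risk-allele count at locus 3 (0, 1 or 2)
    return g1, g3


def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors, with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]  # drop the intercept


def run_demo(n=20000, beta1=0.4, beta3=0.15, seed=0):
    """Compare single-SNP and joint estimates (illustrative effect sizes)."""
    rng = np.random.default_rng(seed)
    g1, g3 = simulate_genotypes(n, rng)
    # Both risk alleles increase the trait; noise has unit variance.
    y = beta1 * g1 + beta3 * g3 + rng.normal(size=n)
    joint1, joint3 = ols_slopes(y, g1, g3)
    return {
        "corr_g1_g3": float(np.corrcoef(g1, g3)[0, 1]),
        "single_locus1": float(ols_slopes(y, g1)[0]),
        "single_locus3": float(ols_slopes(y, g3)[0]),
        "joint_locus1": float(joint1),
        "joint_locus3": float(joint3),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:+.3f}")
```

With these assumed parameters, each single-SNP slope is pulled toward (or past) zero by the negatively correlated risk allele at the other locus, while the joint model recovers both effects. The size of the distortion depends on the allele frequencies and the strength of the correlation.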
Our study used multiple highly correlated SNPs to represent each of the 4 tested loci, depending on availability in each dataset, and all subjects were of European ancestry. Hence this study is not designed to determine which SNP(s), among the highly correlated SNPs for each locus, are most likely to be biologically involved. Future work, involving large-scale meta-analysis of other populations (e.g. Asian or African ancestry) to capitalize on LD differences between populations, comprehensive functional annotation of genetic variants, DNA re-sequencing and variant discovery, and functional and animal studies may help narrow down these large sets of correlated SNPs to the most promising causal alleles.

Figure S1 The CHRNA5-CHRNA3-CHRNB4 region containing the target SNPs rs16969968 (locus 1), rs578776 (locus 2), rs588765 (locus 3), and rs12914008 (locus 4). The SNPs used in this study to represent each locus are drawn with dotted lines connecting them to each other.

Text S1 Descriptions of contributing datasets. Numbered according to appearance in Table 1 and Figure 1.