Individual and Cumulative Effects of GWAS Susceptibility Loci in Lung Cancer: Associations after Sub-Phenotyping for COPD

Epidemiological studies show that approximately 20–30% of chronic smokers develop chronic obstructive pulmonary disease (COPD) while 10–15% develop lung cancer. COPD pre-exists lung cancer in 50–90% of cases and has a heritability of 40–77%, much greater than that of lung cancer (15–25%). These data suggest that smokers susceptible to COPD may also be susceptible to lung cancer. This study examines the association of several overlapping chromosomal loci, recently implicated by GWA studies in COPD, lung function and lung cancer, in 1400 subjects sub-phenotyped for the presence of COPD and matched for smoking exposure. Using this approach, we show that the 15q25 locus confers susceptibility to lung cancer and COPD, the 4q31 and 4q22 loci both confer a reduced risk of both COPD and lung cancer, the 6p21 locus confers susceptibility to lung cancer in smokers with pre-existing COPD, and the 5p15 and 1q23 loci both confer susceptibility to lung cancer in those with no pre-existing COPD. We also show that the 5q33 locus, previously associated with reduced FEV1, appears to confer susceptibility to both COPD and lung cancer. The 6p21 locus previously linked to reduced FEV1 is associated with COPD only. Larger studies will be needed to distinguish whether these COPD-related effects may reflect, in part, associations specific to different lung cancer histology. We demonstrate that when the “risk genotypes” derived from the univariate analysis are incorporated into an algorithm with clinical variables that are independently associated with lung cancer in multivariate analysis, modest discrimination is possible on receiver operating characteristic curve analysis (AUC = 0.70). We suggest that genetic susceptibility to lung cancer includes genes conferring susceptibility to COPD and that sub-phenotyping with spirometry is critical to identifying genes underlying the development of lung cancer.


Introduction
Lung cancer and chronic obstructive pulmonary disease (COPD) are both lung diseases that result from the combined effects of smoking exposure and genetic susceptibility [1,2]. Epidemiological studies show that although tobacco smoke exposure accounts for nearly 90% of cases, only 10-15% of smokers develop lung cancer while 20-30% develop COPD [3][4][5]. Genetic factors might explain these observations, as the heritability of lung cancer and of reduced FEV1 (forced expiratory volume in one second, the measure that defines COPD) is estimated to be 15-25% and 40-77% respectively [6,7]. The presence of COPD, a disease characterized by airflow limitation secondary to lung remodelling (emphysema and small airways fibrosis), confers a 4-6 fold increased risk of lung cancer compared to smokers (a) with normal lung function [8] or (b) randomly recruited from the community [9]. Studies also show that the distribution of FEV1 is bi-modal in heavy smokers and uni-modal in light smokers, supporting a genetic basis to COPD and the lung remodelling (FEV1) response to chronic smoking exposure [10][11][12]. Importantly, between 50-90% of those with lung cancer have pre-existing COPD, compared to 15% in randomly selected community-based smoking controls [8,13,14,15]. This means lung cancer is not just a "complex disease" from a genetic perspective but that it is also a mixed phenotype that includes COPD as a sub-phenotype. The question that then arises is "Are the genetic effects underlying COPD also important in susceptibility to lung cancer?" Recent genome-wide association (GWA) studies in lung cancer, COPD and lung function (FEV1) have reported significant associations at several chromosomal loci [16][17][18][19][20][21][22][23]. Interestingly, several of these loci (and implicated candidate genes) are common to both COPD and lung cancer, suggesting the possibility that shared pathogenetic pathways may underlie susceptibility to these diseases (Table 1).
The above epidemiological and genetic findings suggest that lung cancer and COPD are not discrete diseases related only through smoking exposure, but that many of the smokers who are susceptible to COPD might also be susceptible to lung cancer [8,12,24,25,26,27,28]. Such a suggestion was made by Dr Tom Petty 5 years ago [24] and recently reviewed by Punturieri et al. [29]. Given the apparent overlap in susceptibility loci, it appears plausible that some of the genetic factors implicated in COPD might also be relevant in lung cancer [24][25][26][27][28][29]. This is analogous to the inter-related pathways underlying obesity and type 2 diabetes, where the FTO (Fat mass and obesity associated) gene has been implicated in both diseases [30]. In this context BMI is the physiological biomarker used to define the sub-phenotype of obesity just as FEV1 defines COPD. The question that then arises is "Given the possible overlap in genetic susceptibility between COPD and lung cancer, is there an alternative study design to current approaches that might better identify susceptibility genes in lung cancer?"
The above observations suggest that an alternate genetic model to current case-control studies could be used for disease gene discovery in lung cancer [31]. This model would be different from that used in the recent GWA case-control studies [17][18][19], where genetic effects are explored in lung cancer cases and smoking controls with unknown, but likely different, COPD prevalence [26,27,32,33]. With regard to the latter, the possibility that coexisting COPD in lung cancer cases might introduce an interactive or confounding effect in lung cancer association studies has been raised [26,34]. To better understand the complex relationship between COPD and lung cancer, smokers in both cases and controls would ideally be matched for smoking exposure and sub-phenotyped for COPD using spirometry.
Lung function testing is necessary to define this phenotype as COPD is insidious in onset and, due to a widespread underutilisation of spirometry, underdiagnosed in 50-80% of cases [9,33]. Sub-phenotyping for COPD would then define three smoking cohorts, those with normal lung function (''resistant'' controls), those with COPD and those with lung cancer sub-phenotyped for co-existing COPD. Using such an approach, the authors have shown that the chromosome 15q25 locus, originally associated with lung cancer in GWA studies [17][18][19], is also associated with COPD [26]. This observation has been subsequently replicated in both GWA [20] and candidate gene studies [35]. Using this same approach, the authors have also shown that the chromosome 4q31 locus, associated with a reduced risk of COPD [21][22][23], is also independently associated with a reduced risk of lung cancer [28].

Study subjects
The subjects in this study have been previously described [26]. In brief, subjects were of Caucasian ancestry based on their grandparents' descent (all four grandparents of Caucasian descent). Lung cancer and COPD cases were recruited from a tertiary hospital clinic in Auckland between 2000 and 2007, while healthy smoking controls were recruited from the same community after volunteering for screening spirometry. Inclusion criteria were Caucasian ancestry (see above), age 40 years or more and a past smoking history (see below); those unable to adequately perform spirometry were excluded (approximately 5% failure rate in each group). All participants gave written informed consent, and underwent blood sampling for DNA extraction, pre-bronchodilator spirometry and an investigator-administered questionnaire. Spirometry was performed using a portable spirometer (Easy-One™; ndd Medizintechnik AG, Zurich, Switzerland). Lung function conformed to American Thoracic Society (ATS) standards for reproducibility (http://www.thoracic.org/statements/), with the highest value of the best three acceptable blows used for classification of COPD status. COPD was defined according to Global Initiative for Chronic Obstructive Lung Disease (GOLD) stage 2 or more criteria (FEV1/FVC < 70% and FEV1% predicted ≤ 80%) using pre-bronchodilator spirometric measurements (www.goldcopd.com). A modified ATS respiratory questionnaire (http://www.thoracic.org/statements/) was used, which collected demographic data including age, sex, medical history, family history of lung disease, history of active and passive tobacco exposure, respiratory symptoms and occupational aero-pollutant exposures.
Lung cancer cohort. Subjects with lung cancer were recruited from a tertiary hospital clinic [26], were aged > 40 yrs and had the diagnosis confirmed through histological or cytological specimens in 95% of cases. Non-smokers with lung cancer were excluded from the study and only primary lung cancer cases with the following pathological diagnoses were included: adenocarcinoma, squamous cell cancer, small cell cancer and non-small cell cancer (generally large cell or bronchoalveolar subtypes). Lung function measurement (pre-bronchodilator) was performed within 3 months of lung cancer diagnosis, prior to surgery and in the absence of pleural effusions or lung collapse on plain chest radiographs [8]. For lung cancer cases that had already undergone surgery, pre-operative lung function performed by the hospital lung function laboratory was sourced from medical records.
COPD cohort. Subjects with COPD were identified through hospital specialist clinics as previously described [26]. Subjects recruited into the study were aged 40-80 yrs, with a minimum smoking history of 20 pack-yrs and COPD confirmed by a respiratory specialist based on pre-bronchodilator spirometric criteria (GOLD stage 2 or more).
Control cohort. Control subjects were recruited based on the following criteria: aged 40-80 yrs and with a minimum smoking history of 20 pack-yrs. Control subjects were volunteers who were recruited from the same patient catchment area (residential area) as those serving the lung cancer and COPD hospital clinics through either (a) a community postal advertisement or (b) while attending community-based retired military/servicemen's clubs. Controls with COPD, based on pre-bronchodilator spirometry (GOLD stage 1 or more), who constituted 35% of the smoking volunteers, were excluded from further analysis.
The study was approved by the Multi Centre Ethics Committee (New Zealand).
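The spirometric rule used to assign COPD status in the cohorts above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, the ratio is expressed as a fraction, and the thresholds are taken from the criteria quoted in the text (FEV1/FVC < 70%, and FEV1% predicted ≤ 80% for GOLD stage 2 or more); it is not the authors' software.

```python
def has_copd(fev1_fvc_ratio: float, fev1_pct_predicted: float, min_stage: int = 2) -> bool:
    """Return True if pre-bronchodilator spirometry meets the GOLD criteria.

    fev1_fvc_ratio     -- FEV1/FVC as a fraction (0.65 means 65%)
    fev1_pct_predicted -- FEV1 as a percentage of the predicted value
    min_stage          -- 1 for "GOLD stage 1 or more" (used to exclude controls),
                          2 for "GOLD stage 2 or more" (used to define cases)
    """
    # Airflow obstruction is required at every GOLD stage.
    if fev1_fvc_ratio >= 0.70:
        return False
    if min_stage <= 1:
        return True
    # Stage 2 or more additionally requires reduced FEV1 (threshold as quoted in the text).
    return fev1_pct_predicted <= 80.0


# Hypothetical subjects, for illustration only.
print(has_copd(0.65, 75.0))               # obstructed, reduced FEV1
print(has_copd(0.65, 90.0))               # obstructed, preserved FEV1
print(has_copd(0.65, 90.0, min_stage=1))  # would be excluded from the control group
```

Note the asymmetry this encodes: cases need GOLD stage 2 or more, whereas volunteers were excluded from the control group at GOLD stage 1 or more.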

Study design
The present cross-sectional case-control study compared smokers of the same ethnicity with comparable demographic variables (specifically age, sex and smoking history). The controls in the current study were carefully chosen to best represent the majority of smokers who have maintained normal or near-normal lung function despite decades of smoking ("resistant smokers"), as shown by many studies [4,5,10,11,12]. Accordingly, the resistant smoker group best reflects those smokers least likely to develop lung cancer or COPD, thus minimising phenotype misclassification and improving the power to detect differences between affected and unaffected smokers [36]. We hypothesised that SNP associations might identify protective or susceptibility effects to one or a combination of COPD only (G1), COPD and lung cancer (G2), lung cancer only (G3) or neither disease (G0) (see Figure 1).

Genotyping
Genomic DNA was extracted from whole blood samples using standard salt-based methods and purified genomic DNA was aliquoted (10 ng·mL⁻¹ concentration) into 96-well or 384-well plates. Samples were genotyped using either the Sequenom™ system (Sequenom™ Autoflex Mass Spectrometer and Samsung 24 pin nanodispenser) by the Australian Genome Research Facility (www.agrf.com.au) or by our university lab using Taqman® SNP genotyping assays (Applied Biosystems, USA) utilising minor groove-binder probes. The Sequenom™ sequences were designed in house by AGRF with amplification and separation methods (iPLEX™, www.sequenom.com) as previously described [26,27,32]. Taqman® SNP genotyping assays were run in 384-well plates according to the manufacturer's instructions. PCR cycling was performed on both GeneAmp® PCR System 9700 and 7900HT Fast Real-Time PCR System (Applied Biosystems, USA) devices. SNP primers were designed by Applied Biosystems. Real-time amplification plots of selected plates were used to verify end-point allelic discrimination to establish reliability of the Taqman-based genotyping.
The present study investigated the genotype frequencies of 11 SNPs. Three SNPs were genotyped using the Sequenom™ system: the rs16969968 SNP, situated within the nicotinic acetylcholine receptor (nAChR) gene on 15q25; the rs1052486 SNP, situated within the HLA-B associated transcript (BAT3) gene on 6p21; and the rs402710 SNP, situated within the cisplatin resistance regulated gene 9 (CRR9) gene on 5p15. The remaining eight SNPs were genotyped by Taqman® SNP genotyping assays: the rs7671167 SNP, situated within the Family with sequence similarity 13A (FAM13A) gene on 4q22; the rs1489759 SNP, situated near the hedgehog-interacting protein (HHIP) gene on 4q31; the rs2202507 SNP, situated near the glycophorin A (GYPA) gene on 4q31; the rs2808630 SNP, situated near the C-reactive protein (CRP) gene on 1q23; the rs10516526 SNP, situated within the glutathione S-transferase C-terminal domain (GSTCD) gene on 4q24; the rs1422795 SNP, situated within the A Disintegrin and Metalloproteinase 19 (ADAM19) gene on 5q33; the rs2070600 SNP, situated within the receptor for advanced glycation end-products (AGER) gene on 6p21; and the rs11155242 SNP, situated within the G-protein receptor 126 (GPR126) gene on 6q24. Failed samples were repeated until call rates of ≥ 95% for each SNP in each cohort were achieved. Genotype frequencies for each SNP were compared between the 3 primary groups (control smokers, COPD and lung cancer cohorts) and after sub-phenotyping the lung cancer cohort according to the presence or absence of COPD based on GOLD 2 criteria.

Algorithm and susceptibility score
The cumulative effect of those SNP genotypes identified as susceptible (odds ratio, OR > 1) or protective (OR < 1), based on significant distortions in frequency (P < 0.05) between the cases or sub-phenotypes and the control smokers, was examined using a previously published algorithm [27,32]. Only the lung cancer and control smoker cohorts were used for this analysis. In this algorithm, for each subject, a numerical value of −1 was assigned for each of the protective genotypes present among the protective SNPs and +1 for each of the susceptible genotypes present. Where an individual did not have either the protective or susceptibility genotype for that SNP, the score was 0 (i.e. did not contribute to the genetic score). This approach is consistent with a recently published study in prostate cancer [37]. As previously described [27,32], weighting the presence of specific susceptible or protective genotypes according to their individual odds ratios (ORs; from univariate regression) did not significantly improve the discriminatory performance of the cumulative SNP score (unpublished data).
The algorithmic approach used here involved deriving an overall "susceptibility score" for each subject (from the control and lung cancer cohorts) by combining genetic data (cumulative SNP scores) and clinical variables identified in a multivariate analysis, as previously described [27,32]. The clinical variables (and scores) were age > 60 years (+4), family history of lung cancer (+3) and prior diagnosis of COPD (+4) [32]. By using multivariate logistic and stepwise regression analysis, the 9-SNP panel was examined in combination with the pre-stipulated clinical variables above. As smoking exposure (pack-yrs) was a recruitment criterion for this study, and comparable between cases and controls, it was not included in the scoring system described here. The lung cancer susceptibility score (for the control and lung cancer cohorts) was plotted with (a) the frequency of lung cancer and (b) the floating absolute risk (FAR, equivalent to OR) across the combined smoker/ex-smoker cohort [38,39]. The FAR approach was adopted since it uses a "floated" variance across all polychotomous risk categories rather than choosing one referent level, and enables confidence intervals to be presented for all risk categories.
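The scoring scheme described above can be illustrated with a short sketch. The point values (−1, 0 and +1 per SNP; +4, +3 and +4 for the clinical variables) are those stated in the text; the function names, argument names and the string labels for genotype calls are assumptions made purely for illustration and do not represent the published implementation [27,32].

```python
from typing import Iterable

# Clinical points as stated in the text [32]; the dictionary keys are hypothetical labels.
CLINICAL_POINTS = {
    "age_over_60": 4,
    "family_history_lung_cancer": 3,
    "prior_copd": 4,
}


def snp_score(genotype_calls: Iterable[str]) -> int:
    """Unweighted cumulative SNP score.

    Each call is 'susceptible' (+1), 'protective' (-1) or 'neutral' (0),
    one entry per SNP in the panel.
    """
    points = {"susceptible": 1, "protective": -1, "neutral": 0}
    return sum(points[call] for call in genotype_calls)


def susceptibility_score(genotype_calls: Iterable[str], age_over_60: bool,
                         family_history_lung_cancer: bool, prior_copd: bool) -> int:
    """Total score = cumulative SNP score + points for each clinical variable present."""
    clinical = {
        "age_over_60": age_over_60,
        "family_history_lung_cancer": family_history_lung_cancer,
        "prior_copd": prior_copd,
    }
    clinical_total = sum(CLINICAL_POINTS[name] for name, present in clinical.items() if present)
    return snp_score(genotype_calls) + clinical_total


# Hypothetical subject: two susceptible and one protective genotype, aged over 60, with COPD.
calls = ["susceptible", "protective", "neutral", "susceptible"]
print(susceptibility_score(calls, age_over_60=True,
                           family_history_lung_cancer=False, prior_copd=True))
```

Because the SNP score is unweighted, each risk genotype moves the total by one point, whereas a single clinical variable moves it by three or four.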

Analysis
Patient characteristics in the cases and controls were compared by ANOVA for continuous variables and Chi-squared test for discrete variables (Mantel-Haenszel, odds ratio (OR)). Genotype and allele frequencies were checked for each SNP by Hardy-Weinberg Equilibrium (HWE). Population admixture across cohorts was assessed using structure analysis on genotyping data from 40 unrelated SNPs [40]. Distortions in the genotype and allele frequencies were identified by comparing lung cancer (sub-phenotyped by COPD) and/or COPD cases with "resistant" smoking controls using two-by-two contingency tables. Both the additive (allelic) and genotype-based genetic models were tested, although the latter is preferred [41]. Correction for multiple comparisons was not done as the SNPs were selected "a priori" from the GWA studies. Individual SNPs were not included in the combined risk model on the basis of statistical significance shown here but were included because they were identified by the GWA studies to be highly significantly associated with lung cancer. In this respect, this study was sufficiently powered to enable a small level of discrimination between cases and controls to be demonstrated for the resultant overall model rather than individual SNPs. With at least 450 cases and 450 controls this study has 80% power to detect an area under the ROC curve of 0.55 using a two-sided z-test at the 5% significance level, i.e. we can conclude that the ROC curve for the SNP model offers better than chance association when the area under the receiver operating characteristic curve is at least 0.55 (Hintze, J (2006) PASS 2002, WWW.NCSS.COM). Genotype data (9-SNP panel) and the clinical variables were combined in a stepwise logistic regression to assess their relative effects on discriminating low and high risk (by point estimate and receiver operating characteristic (ROC) curve) by score quintile.
The frequency distribution of the lung cancer susceptibility score was compared across the cases and controls. Its clinical utility was assessed using ROC analysis, which assesses how well the model predicts risk across the score (i.e. clinical performance of the score with respect to sensitivity, and specificity).
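For readers unfamiliar with the two-by-two comparisons described in this section, the following sketch computes a cross-product odds ratio with a 95% confidence interval from a single contingency table. The Woolf (log-based) interval is an assumption chosen for simplicity and is not necessarily the interval method the authors used; the counts in the example are hypothetical and are not taken from the study tables.

```python
import math


def odds_ratio_ci(cases_with: int, cases_without: int,
                  controls_with: int, controls_without: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio for carrying a genotype (cases vs controls) with a Woolf interval.

    Returns (odds_ratio, lower_bound, upper_bound). All four cells must be non-zero.
    """
    odds_ratio = (cases_with * controls_without) / (cases_without * controls_with)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / cases_with + 1 / cases_without
                          + 1 / controls_with + 1 / controls_without)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry the genotype.
or_value, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that includes 1 corresponds to a non-significant association at the 5% level, which is how the "degraded" P values in the smaller subgroups should be read.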

Demographic variables
Characteristics of the lung cancer cases, COPD cases and healthy control smokers are summarized in Table 2. The demographic variables and histological subtypes of the lung cancer cases are comparable to previously published data [42]. The staging at diagnosis was also comparable to this published series (data not shown), suggesting the lung cancer cohort is representative. The COPD cases have higher pack-year exposure than the lung cancer cases and healthy control smokers (P < 0.05). This reflects outliers with high smoking histories in the COPD cohort; after log transformation of pack-years, all groups were comparable (data not shown). All groups are comparable with respect to age started smoking, years smoked, years since quitting and cigarettes/day (Table 2). Overall, we believe the three groups are well matched for smoking exposure. We note a lower frequency of current smokers in the lung cancer and COPD cohorts, compared to healthy smokers (35% vs 40% vs 48% respectively), which may reflect an effect of their smoking-related diagnosis. Current smoking status had no effect on lung function in the lung cancer group. The lung cancer cases, COPD cases and smoking controls were also comparable with respect to other aero-pollutant exposures (Table 2). Those with lung cancer had a higher prevalence of a positive family history of lung cancer compared to the COPD cases and healthy smokers (19% vs 11% vs 9%). As expected, lung function was worse in the lung cancer and COPD cohorts compared to the healthy smoker controls. Testing lung function in the lung cancer cases (as described above) enabled stratification of results to test for an interactive or confounding effect of COPD.

Genotyping
The genotyping results for the 12 SNPs are shown in Table 3. The allele and genotype frequencies were comparable to those reported in the literature and from the International HapMap Project (www.hapmap.org). The observed genotypes for the two Chr 4q31 SNPs (HHIP and GYPA) in this study were 65% concordant, in accordance with the reported degree of LD between these SNPs. The other SNPs in "close" proximity (BAT3 and AGER on 6p21) showed very poor concordance, as expected. As all SNPs were in Hardy-Weinberg equilibrium and amplification plots were used to ensure correct genotype calls, significant genotyping error is unlikely. We found no evidence for population stratification between the cohorts using 40 unlinked SNPs from unrelated genes (mean χ² = 3.3, P = 0.58) [40]. Based on distortions in genotype frequency between the 3 groups, risk genotypes were assigned as generally conferring protection or susceptibility to COPD and/or lung cancer according to Figure 1.
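The Hardy-Weinberg check mentioned above can be sketched as a one-degree-of-freedom chi-squared goodness-of-fit test on the three genotype counts for a biallelic SNP. This is a generic textbook formulation offered as an assumption about how such a check is commonly done; the function name and the example counts are hypothetical and the authors' exact procedure is not described in the text.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-squared statistic comparing observed genotype counts with HWE expectations.

    n_aa, n_ab, n_bb -- counts of the two homozygotes and the heterozygote.
    Assumes both alleles are observed (allele frequency strictly between 0 and 1).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# With 1 degree of freedom, a statistic above 3.84 indicates departure from HWE at P < 0.05.
CRITICAL_VALUE_1DF = 3.84

# Hypothetical counts: the first set fits HWE exactly, the second has too few heterozygotes.
print(hwe_chi_square(25, 50, 25) > CRITICAL_VALUE_1DF)
print(hwe_chi_square(40, 20, 40) > CRITICAL_VALUE_1DF)
```

A marked heterozygote deficit or excess in controls is a common sign of genotype-calling error, which is why this check supports the statement that significant genotyping error is unlikely.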
Genotype associations according to sub-phenotyping for COPD (Table 3)
The results below describe individual SNP associations between resistant smokers and those with COPD or lung cancer (total and subdivided by co-existing COPD). We found no effects from gender, height or smoking status (current vs former) on any of these associations. A relationship between SNP variants and lung function was only found for rs16969968 in the lung cancer cases, as previously published [26], but not for the other SNP variants (unpublished data). The numbers were considered too small to look at lung cancer sub-grouped by histology. The genotype results below are summarised in Table 3.
Rs1052486, 6p21 (BAT3). The GG genotype was 23% in the controls group compared to 26% in the lung cancer group (N = 454, OR = 1.19, P = 0.25) and 21% in the COPD group (N = 458, OR = 0.88, P = 0.44) ( Table 4). Compared to controls, the GG genotype was significantly greater in those with lung cancer and COPD (N = 215) (23% vs 31%, OR = 1.50, P = 0.03) but no different in the lung cancer only subgroup (N = 207) (23% vs 21%, OR = 0.89, P = 0.57). The GG genotype was significantly greater in the lung cancer with COPD group than the lung cancer only group (31% vs 21%, OR = 1.68, P = 0.02). The GG genotype of the BAT3 SNP appears to confer susceptibility for lung cancer in those with COPD (G2 in Table 4).
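As a worked check, the odds ratio for the GG genotype in the lung cancer with COPD subgroup can be approximately recovered from the rounded genotype frequencies quoted above (31% in cases, 23% in controls). The calculation below uses those rounded percentages only, so it is an approximation and small differences from ratios computed on raw counts are expected.

```latex
\[
\mathrm{OR}
  = \frac{p_{\text{case}} / (1 - p_{\text{case}})}{p_{\text{control}} / (1 - p_{\text{control}})}
  = \frac{0.31 / 0.69}{0.23 / 0.77}
  \approx \frac{0.449}{0.299}
  \approx 1.50
\]
```

The same arithmetic applied to the lung cancer only subgroup (21% vs 23%) gives a ratio just below 1, in line with the reported OR of 0.89.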
Rs1489759 and rs2202507, 4q31 (HHIP and GYPA respectively). The GG genotype of the HHIP (rs 1489759) SNP was found to be more prevalent in the control group compared to COPD (17% vs 11%, OR = 0.59, P = 0.006) and lung cancer (17% vs 13%, OR = 0.70, P = 0.05) groups (Table 4). Similarly, the corresponding (minor) CC genotype of the GYPA (rs 2202507) SNP was more prevalent in the resistant smokers group compared to those with COPD (27% vs 19%, OR = 0.65, P = 0.06) and lung cancer (27% vs 21%, OR = 0.70, P = 0.02) groups (Table 4). When the lung cancer cases were stratified by available spirometric data (n = 419 and n = 416 for HHIP and GYPA genotyping, respectively), into those with and without COPD (GOLD 2+ criteria), the distribution of the minor allele homozygote for both SNPs does not change significantly. The effect sizes of the homozygote minor allele in these sub-analyses remain the same, although the p values are degraded due to smaller sample sizes. When grouping all subjects with COPD (combining COPD and lung cancer with COPD groups, N = 670), the protective effect was nearly identical to that from using the COPD cohort alone (OR = 0.60, P = 0.003 and OR 0.66, P = 0.004 for the HHIP and GYPA, respectively). The minor allele homozygotes for HHIP and GYPA SNPs (GG and CC, respectively) appear to confer protection from both lung cancer and COPD (G0 in Figure 1 and Table 4).
Rs1422795, 5q33 (ADAM19). Compared to controls, the frequency of the CC genotype was marginally increased in the lung cancer (9% vs 13%, OR = 1.44, P = 0.08) and COPD (9% vs 13%, OR = 1.47, P = 0.07) groups (Table 3). When the lung cancer cases were divided according to COPD, the effect size remained the same although P values were degraded due to smaller numbers (lung cancer with COPD 13%, OR = 1.51, P = 0.10 and lung cancer without COPD 13%, OR = 1.40, P = 0.20). When the CC genotype frequency of the controls is compared to that of the combined COPD and lung cancer with COPD groups (9% vs 13%, OR = 1.45, P = 0.05), the larger cohort identifies a significant increase in the CC genotype in those with the COPD phenotype. The CC genotype is likely to be associated with modest susceptibility to both COPD and lung cancer (G2 in Figure 1 and Table 4).
Rs2070600, 6p21 (AGER). Compared to controls, the TT/TC genotype frequency was significantly decreased in COPD patients (10% vs 15%, OR = 0.60, P = 0.01) but not in lung cancer (13% vs 15% in controls, OR = 0.87, P = 0.87). Sub-grouping lung cancer cases according to COPD phenotype did not identify any other associations. The TT/TC genotypes of the AGER SNP appeared to confer a protective effect for COPD (G1 in Figure 1 and Table 4).
Rs2808630, 1q23 (CRP). Compared to controls, the CC genotype was slightly less frequent in the lung cancer (11% vs 8%, OR = 0.68, P = 0.09) and COPD groups (11% vs 8%, OR = 0.69, P = 0.10) but significantly lower in the lung cancer only group (11% in controls vs 5%, OR = 0.47, P = 0.02). The frequency of the CC genotype was also significantly lower in the lung cancer only cohort compared to lung cancer with COPD despite the modest numbers (5% vs 9%, OR = 0.54, P = 0.03). This suggests the CC genotype of the CRP SNP was associated with susceptibility to lung cancer only (G0 in Figure 1 and Table 4).

Gene-based risk model
Using the results of the uni-variate analysis above, nine ''risk genotypes'' were identified as either protective or susceptible (Table 4). For each subject in the smoking control and lung cancer cohorts, the sum total of these SNP-based scores were added to the scores for the clinical variables (age, diagnosis of COPD, family history of lung cancer) to derive a total lung cancer susceptibility score [27,32]. On FAR analysis [25,26], the plot of the total score with the frequency of lung cancer shows a linear relationship across SNP score quintiles for both the 9 SNP ( Figure 2a) and 19 SNP (Figure 2b) panels, as previously shown [27,32]. The distribution plot of the total scores according to control smokers (blue line, Figure 3) and lung cancer cases (red line, Figure 3) is bimodal and the corresponding AUC is 0.69 for the 9 SNP panel used here (Figure 3a). When genotype data of the 10 most significant SNPs (smallest P values) from a previous analysis [32] are added to the 9 SNP panel, the AUC increases to 0.72 (Figure 3b). We note when the clinical variables only are used the AUC is 0.67 compared to the 9 SNPs alone of 0.59 and 19 SNPs alone of 0.67. We conclude that the addition of the 9 SNPs or 19 SNPs improves the AUC and the risk prediction utility of the risk score.
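The AUC values quoted above have a simple interpretation: the AUC is the probability that a randomly chosen lung cancer case has a higher susceptibility score than a randomly chosen control, counting ties as one half. The sketch below computes the AUC directly from that definition. The function name and the toy scores are hypothetical, and this pairwise formulation is a standard equivalent of ROC integration rather than the software the authors used.

```python
from typing import Sequence


def auc_from_scores(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical total scores, for illustration only.
cases = [3, 4, 5]
controls = [1, 2, 3]
print(round(auc_from_scores(cases, controls), 3))
```

An AUC of 0.5 means the score ranks cases and controls no better than chance, which puts the reported values of 0.59 to 0.72 in context as modest discrimination.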

Discussion
This study provides further evidence that the genes underlying susceptibility to lung cancer may include genes relevant to susceptibility to COPD. This has been possible by using cohorts of smokers, matched for smoking exposure, but quite different in their phenotypic response to smoking exposure. This phenotypic response has been defined in part by the presence or absence of COPD, itself a common sub-phenotype of lung cancer [8,13,14], defined by a measurable biomarker (FEV1) with a strong genetic basis [2,7]. By comparing chronic smokers with normal lung function with those with COPD and lung cancer, sub-phenotyped for COPD, the genetic associations identified to date can be better understood. Indeed, by re-examining the associations reported from recent lung cancer and COPD (FEV1) GWA studies, the results of this current study suggest the genetic effects from these loci confer specific protective or susceptibility effects on COPD, lung cancer or both (Figure 1, Tables 1 and 4). Despite comparatively small sample sizes here, using this approach the authors have recently shown that the 15q25 (CHRNA3/5) and 4q31 (HHIP/GYPA) loci might be relevant in both COPD and lung cancer [26,28]. The results in this study suggest that the rs1052486 SNP on the 6p21 locus (BAT3) confers susceptibility to lung cancer in smokers with pre-existing COPD, and that the rs402710 SNP on the 5p15 locus (CRR9/TERT) and the rs2808630 SNP on the 1q23 locus (CRP) confer susceptibility to lung cancer in those with no pre-existing COPD. The rs1422795 SNP on the 5q33 locus (ADAM19), previously associated with reduced FEV1 [22,23], might also confer susceptibility to both COPD and lung cancer. The rs7671167 SNP on the 4q22 locus (FAM13A), previously linked to reduced lung function and COPD [23], is associated with both COPD and lung cancer.
Larger studies will be needed to confirm these findings as the sample sizes here are small, particularly after sub-phenotyping the lung cancer cases for COPD. These results also suggest that the previously published risk algorithm [27,32], which combines risk genotypes with clinical variables identified in a multivariate analysis, can segment smokers into moderate, high and very high risk of lung cancer. The authors conclude that when spirometry is used to sub-phenotype smokers, genes with effects on reduced lung function or COPD appear to be relevant in "susceptibility" to lung cancer. This provides further evidence to support existing epidemiological studies suggesting COPD and lung cancer are related not only by smoking exposure [24,30] but also by an overlapping genetic susceptibility to smoking (Figure 1 and Tables 1 and 4) [26,28]. Epidemiological studies suggest COPD is an important sub-phenotype of lung cancer. The results of this study suggest genetic associations broadly define three disease groups: smokers primarily susceptible to COPD (G1), smokers susceptible to both COPD and lung cancer (G2), and smokers susceptible to lung cancer only (G3) (Figure 1 and Table 4). More importantly, the epidemiological studies also show there is a fourth group of smokers, consisting of the majority of smokers (≈70%) [4,5,12], who maintain normal or near normal lung function. This group has a "resistant" phenotype (G0) and either does not develop, or is at least risk of, COPD and lung cancer [4,5,8,9,12]. This is likely to be due, in part, to an excess of protective genetic variants compared to susceptibility variants [27,31]. Based on the results of this study, the G0 genes conferring protection from COPD and lung cancer include the rs7671167 SNP (FAM13A gene on the Chr 4q22 locus) and the rs1489759 and rs2202507 SNPs (HHIP and GYPA genes on the Chr 4q31 locus).
The rs2070600 SNP (AGER on the Chr 6p21 locus), previously linked to reduced FEV1, appears to be a susceptibility gene for COPD but not lung cancer (G1). Both the rs16969968 SNP (CHRNA3/5 gene on the Chr 15q25 locus) and the rs1052486 SNP (BAT3 gene on the Chr 6p21 locus) appear to confer susceptibility to lung cancer, but the latter only in conjunction with COPD (G2). The rs402710 SNP (CRR9 (TERT) on the Chr 5p15 locus) appears to confer susceptibility to lung cancer in those with no pre-existing COPD, in keeping with other studies (G3) [34,43,44]. These observations require validation in larger studies where SNP effects on histological subtypes might also be relevant to our findings [1,43]. Several loci linked to lung function in the general population, such as the rs10516526, rs11168048 and rs11155242 SNPs (GSTCD on 4q24, HTR4 on 5q33, and GPR126 on 6q24, respectively) [22,23], do not appear to be related to COPD in this study. However, given that the population study did not look specifically at smokers, it is possible that these loci are not relevant to the lung's response to tobacco smoke exposure. The authors conclude that the novel study design used here provides a viable approach with which to better understand the genetic epidemiology of lung cancer.
The pathologic link between COPD and lung cancer may stem in part from the overlapping inflammatory, apoptotic and matrix remodelling/repair processes [45–47] underlying COPD, and the development of squamous metaplasia, epithelial-mesenchymal transition (EMT) and DNA damage that underlies lung carcinogenesis [28,45,48–51]. In particular, there is growing evidence that these smoking-induced changes are orchestrated by the bronchial epithelium [28,45,48–51]; the HHIP, CHRNA3/5 and FAM13A proteins are all known to be expressed on the bronchial epithelium (see below). Although several of the SNPs investigated in this study have been shown to have functional effects on gene expression or protein function, they may not themselves be the causal variant, but may instead represent the causal allele through linkage disequilibrium [52]. We note that in many instances, the physical distance between these risk SNPs and the proposed candidate genes is large. Despite this, it remains possible that the investigated SNPs are themselves functional as (a) studies have shown that SNPs with regulatory effects on genes may be some distance away [50], and (b) it has recently been recognised that common SNPs with consistent disease association signals may, through “synthetic associations”, represent the biological effects of rare variants in nearby genes as much as 2 megabases apart [53]. If such an effect were true, then there is potential for considerable overlap between the susceptibility genes for COPD and for lung cancer. The rs16969968 SNP (CHRNA3/5 on 15q25) investigated in this study results in a nonsynonymous amino-acid change in a highly conserved region of the second intracellular loop of the α5 subunit of the nicotinic acetylcholine receptor. This receptor is expressed on both bronchial epithelial cells and inflammatory cells, and is believed to moderate pulmonary inflammation [54] and lung destruction [34].
This receptor also binds both nitrosamines (known carcinogens in cigarette smoke [55]) and nicotine, linking it to lung cancer and nicotine addiction respectively [56]. The rs1052486 SNP (BAT3 on 6p21) is a missense mutation (Ser619Pro) in the BAT3 gene and has been previously linked to lung cancer [57]. BAT3 is a nuclear protein that influences apoptosis through its interaction with p53 [58], linking it to both COPD and lung cancer. The rs1489759 SNP (HHIP on 4q31) is 93 kb upstream of the HHIP gene and of unknown function. The HHIP protein is believed to be important in the bronchial epithelial response to smoking [59] and epithelial repair processes in lung cancer [60]. The HHIP protein has been linked with epithelial-mesenchymal transition, a pathological process that results from lung remodelling (with release of metalloproteinases and growth factors [29,45,61]) and initiates lung carcinogenesis [48]. The rs2202507 SNP (GYPA on 4q31) is of unknown function and downstream of the GYPA gene. The GYPA protein, found on erythrocytes, shows reduced expression in COPD and is indicative of oxidative stress [62]. Whether the GYPA association with COPD and lung cancer reflects an independent effect or linkage effect with the HHIP locus (LD<0.70) is still debated [21]. The rs7671167 SNP (FAM13A on 4q22) is found in intron 4 of the FAM13A gene and has no known biological function [43,63]. The FAM13A protein, expressed in respiratory cells, is thought to be involved in signal transduction with possible tumor suppressor activity [63,64]. The rs1422795 SNP (ADAM19 on 5q33) is a missense mutation (Ser284Gly) in the ADAM19 gene. ADAM19 is a transmembrane protein expressed in human lung and implicated in cell-matrix interactions [65], pulmonary inflammation [66] and lung cancer [67]. The rs402710 SNP (CRR9 (TERT) on 5p15) is an intronic SNP of unknown function in the CRR9 gene and is associated with lung cancer in many studies [1,17,18,34].
This SNP is 25 kb upstream from the TERT gene, which encodes the catalytic subunit of telomerase, a reverse transcriptase that affects telomere shortening, which has been implicated in both aging and lung cancer [68]. The results of the current study suggest that the CRR9/TERT locus confers susceptibility to lung cancer in the absence of COPD. Such a finding is in accordance with those recently reported by Yang et al [34], who found that, after adjusting for the presence of COPD, only the rs402710 SNP (Chr 5p15 locus) was associated with lung cancer while the effects of the other GWA-associated SNPs were lost. The rs2808630 SNP (CRP on 1q23) is found in the 3′ flanking region of the CRP gene and has been associated with serum CRP levels (C allele with reduced CRP) [69]. Elevated CRP levels have been shown in prospective studies to be associated with greater decline in lung function [70] and elevated lung cancer risk after adjustment for smoking [71]. In the current study, where all cohorts were matched for smoking exposure, the CC genotype (low CRP level) was less frequent in both COPD and lung cancer cases, although this only achieved significance in the lung cancer only sub-phenotype. The rs2070600 SNP (AGER on 6p21) is a missense mutation (Gly82Ser) of the AGER gene and has been shown to affect the inflammatory response in humans [72]. AGER protein expression has been shown to be increased in the lungs of smokers with COPD [73] whilst decreased in human lung cancer cell lines [74]. We conclude that the SNP associations described here with COPD and/or lung cancer can be explained by plausible, but as yet unproven, biological functions. We also conclude that through sub-phenotyping for COPD, possible clues as to the independent and overlapping pathogenic processes underlying COPD and lung cancer can be better examined.
The use of healthy smokers as controls in this study represents a novel though possibly controversial approach [31] to identifying the genetic basis of lung cancer. The authors contend that such an approach is classically used in pharmacogenetic studies where the disparate response to a standardised dose of drug provides a dynamic phenotype (high vs low metabolisers or responders vs non-responders) from which to identify relevant genes [75]. In the setting of lung cancer, smoking is the drug and FEV1 the biomarker of responsiveness. The choice of FEV1 is based on the epidemiological studies showing that FEV1 is the most important risk factor for lung cancer among smokers [8,9,12,8,25,76] and has a bimodal distribution among chronic smokers [10–12]. The bimodal distribution is very relevant as it supports a genetic basis, as suggested by twin studies where heritability of FEV1 is estimated to be 40-77% compared to only 15-25% for lung cancer [6,7]. From a genetic epidemiology perspective, a cohort of chronic smokers with the resistant or “non-responder” phenotype (normal or near normal FEV1) might provide an alternate control group to the non-random (and unscreened) smokers used in case-control studies to date [17–19]. Controls recruited from hospital clinics, or in the absence of spirometric screening (volunteers), have a reported COPD prevalence of 30% or more [33]. If the control group includes a high proportion of smokers with COPD, the effect of the COPD-related genes on lung cancer susceptibility will be diluted or lost. This is also relevant as the proportion of COPD patients who eventually develop lung cancer may be as high as 25-30% [8,77] and the frequencies of several disease-related SNPs are very similar between lung cancer and COPD groups (see Table 3, e.g. FAM13A, HHIP).
This might explain why the lung cancer GWA studies to date failed to consistently identify the Chr 4q31 (HHIP/GYPA) and Chr 4q22 (FAM13A) loci as protective loci [17–19], and the Chr 5q33 (ADAM19) locus as a possible susceptibility locus. It would also explain why matching for COPD in the lung cancer cases and controls might identify only the Chr 5p15 (CRR9/TERT) locus, which in the current study was associated with lung cancer in smokers with no underlying COPD [34]. The authors propose that FEV1 be routinely measured in genetic epidemiology studies of lung cancer to better understand the role of “COPD genes” in lung cancer [8,12]. Subtyping for emphysema using computerised tomography or reduced diffusion capacity would further refine the sub-phenotyping for COPD [78].
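The dilution argument above can be illustrated with a small calculation. This is a sketch only: the genotype frequencies are hypothetical values chosen for the example and are not taken from Table 3, and the function names are ours. It assumes a protective genotype carried by 40% of resistant smokers and by 30% of both COPD and lung cancer cases, then compares the lung cancer odds ratio obtained with spirometry-screened controls against that obtained when 30% of the controls have unrecognised COPD.

```python
def odds(p):
    """Convert a genotype frequency (0 < p < 1) into odds."""
    return p / (1.0 - p)


def odds_ratio(p_cases, p_controls):
    """Odds ratio for carrying the genotype, cases versus controls."""
    return odds(p_cases) / odds(p_controls)


def mixed_control_frequency(p_resistant, p_copd, copd_fraction):
    """Genotype frequency in a control group that is a mixture of
    resistant smokers and smokers with unrecognised COPD."""
    return (1.0 - copd_fraction) * p_resistant + copd_fraction * p_copd


# Hypothetical frequencies of a protective genotype (assumptions, not study data).
P_RESISTANT = 0.40    # resistant smokers with normal FEV1
P_COPD = 0.30         # smokers with COPD
P_LUNG_CANCER = 0.30  # smokers with lung cancer

# Controls screened by spirometry: resistant smokers only.
or_screened = odds_ratio(P_LUNG_CANCER, P_RESISTANT)

# Unscreened controls: 30% have COPD, as reported for hospital-based controls.
p_mixed = mixed_control_frequency(P_RESISTANT, P_COPD, copd_fraction=0.30)
or_unscreened = odds_ratio(P_LUNG_CANCER, p_mixed)

print(f"OR with screened controls:   {or_screened:.2f}")
print(f"OR with unscreened controls: {or_unscreened:.2f}")
```

Under these assumed frequencies the odds ratio moves from 0.64 toward the null (0.73) once the control group contains 30% COPD, which is the direction of attenuation described in the text. The size of the shift depends entirely on the assumed frequencies.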
It is possible that the specific associations reported in this study reflect, in part, small sample size and chance findings. This represents an important limitation of the current study and requires replication in a larger study. It is also possible that the findings reflect true associations that have been better identified, despite small sample sizes, by more precise phenotyping of subjects. Minimising misclassification has been shown to improve the power of a study to identify true associations [36]. The authors suggest that some important associations may be either missed [18,19] or misassigned [17–19] in studies where the COPD status of smoking controls is unknown, especially those using hospital-based controls where the prevalence of COPD has been found to be as high as 30% [33]. The latter would be analogous to searching for type 2 diabetes genes by comparing obese patients with type 2 diabetics, thereby missing the genetic effects contributing to obesity. If previous case-control studies use control groups where the prevalence of COPD is 25-30%, then relevant genetic effects may be obscured. This is well illustrated in Table 3 where, for several SNPs (e.g. HHIP, GYPA, CRR9 (TERT), ADAM19 and CHRNA3/5), the frequencies of “risk genotypes” in COPD and lung cancer cases are very similar. In addition, matching for other confounding variables, in particular smoking dose exposure, may also help to detect relevant genetic associations which might otherwise be diluted by using unexposed people (non-smokers) [17–19]. Matching for smoking is particularly important in these studies of smoking-related disease as the penetrance of SNP effects, reflected in the odds ratio, is likely to be related to the degree and/or duration of smoking exposure. The effect of certain SNPs has been shown to be greater when investigated only in those with greater smoking exposure [21,29].
This is the case in α1-antitrypsin deficiency, where people homozygous for the Z allele (low α1-antitrypsin level) are at risk of emphysema when they smoke, but much less so when they are non-smokers [79]. Lastly, there remains the possibility that the SNP associations reported here result from gender, age or height differences between the group comparisons. Although our sample sizes are modest, we think this is unlikely as the groups are comparable with respect to these variables, and when we specifically examined this possibility we did not find any SNP effects confounded by these variables.
The authors have previously reported a lung cancer susceptibility model whereby genotype data is combined with non-genetic data [27,32]. This model is based on the results of a multivariate analysis that includes the genotypes, scored according to whether they conferred a small protective (-1) or susceptibility (+1) effect [27,32]. The clinical variables identified as independent predictors of lung cancer following multivariate analysis were age over 60 years, a family history of lung cancer and a previous diagnosis of COPD. In stepwise regression, family history of lung cancer is independently associated with lung cancer risk after inclusion of the SNP genotype data [80] and likely reflects rare family-specific genetic effects not accounted for by the genotypes tested here. An example of such a genetic effect is represented by the RGS17 gene on Chr 6q24, implicated in familial lung cancer but not investigated here [81]. Similarly, the prior diagnosis of COPD is independently associated with lung cancer risk and likely reflects the contribution of genetic susceptibility to COPD not otherwise accounted for by the SNPs in the panel. The SNP data provides an important and significant contribution to the overall score as “risk genotypes” are a risk variable present from birth and, unlike family history and diagnosis of COPD, not dependent on age or natural history of disease. This is very relevant to prevention as high-risk SNP genotypes can be identified early in a person's smoking history, before irreversible malignant transformation has occurred. Although lung function data itself is also an important variable in defining the risk of lung cancer, it is usually not available for the majority of smokers, in whom it is often not measured until exertional breathlessness is severe and over 50% of lung function has been irreversibly lost [12].
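The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the published algorithm [27,32]: the directions of effect follow those described in this Discussion for a subset of the SNPs, the weights given to the three clinical variables are hypothetical placeholders (they are not stated here), and the names `Smoker`, `GENOTYPE_SCORES`, `CLINICAL_WEIGHTS` and `susceptibility_score` are ours. Which genotype of each SNP counts as the scored genotype is defined in Table 4 and is not reproduced in the sketch.

```python
from dataclasses import dataclass, field

# Direction of effect as described in the text:
# -1 for a protective genotype, +1 for a susceptibility genotype.
GENOTYPE_SCORES = {
    "rs7671167": -1,   # FAM13A, 4q22
    "rs1489759": -1,   # HHIP, 4q31
    "rs2202507": -1,   # GYPA, 4q31
    "rs16969968": +1,  # CHRNA3/5, 15q25
    "rs1052486": +1,   # BAT3, 6p21
    "rs402710": +1,    # CRR9 (TERT), 5p15
}

# Hypothetical placeholder weights for the clinical variables (assumption).
CLINICAL_WEIGHTS = {
    "age_over_60": 1,
    "family_history": 1,
    "prior_copd": 1,
}


@dataclass
class Smoker:
    # rs numbers for which the subject carries the scored genotype.
    scored_genotypes: set = field(default_factory=set)
    age_over_60: bool = False
    family_history: bool = False
    prior_copd: bool = False


def susceptibility_score(subject):
    """Sum the genotype scores and the weights of the clinical variables present."""
    genetic = sum(GENOTYPE_SCORES[rsid] for rsid in subject.scored_genotypes)
    clinical = sum(
        weight
        for name, weight in CLINICAL_WEIGHTS.items()
        if getattr(subject, name)
    )
    return genetic + clinical


# Two hypothetical subjects.
resistant = Smoker(scored_genotypes={"rs7671167", "rs1489759"})
at_risk = Smoker(
    scored_genotypes={"rs16969968", "rs402710"},
    age_over_60=True,
    prior_copd=True,
)

print(susceptibility_score(resistant))  # -2
print(susceptibility_score(at_risk))    # 4
```

Keeping the genotype and clinical tables as plain dictionaries means that newly identified variants, or refitted clinical weights, can be added without changing the scoring function.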
For each subject in the control smoker and lung cancer cohorts, a lung cancer susceptibility score was derived according to these variables and the score distributions of the two cohorts were compared [27,32]. The distribution showed a bimodal separation suggesting utility as a screening test of risk [27,32,82]. Using the same approach in the current study, with the susceptibility and protective genotypes derived from the GWA SNPs (9-SNP panel, Table 4), the lung cancer susceptibility score was also bimodal and showed a limited utility in an ROC analysis (AUC = 0.69) (Figures 2 and 3). This utility was increased when the 10 most informative SNPs from the previous study were added (N = 19 SNP model, AUC = 0.72, data not shown). This suggests that as new genetic variants are identified and added to the risk model, a greater utility based on ROC analysis might be achieved [31,80]. This study provides further evidence that lung cancer results from the combined effects of several genetic variants [83] with low penetrance [84] from genes implicated in both COPD and lung cancer [26–28]. This study also highlights the limitations of the lung cancer GWA studies reported to date [85] and the need to consider sub-phenotyping using spirometry-defined COPD to better understand the relative effects of genetic variants on lung cancer susceptibility [26,28]. In conclusion, this study provides additional evidence that genes involved in the risk of COPD may also be relevant to the risk of lung cancer and that spirometry should be routinely used to identify COPD, an important sub-phenotype of lung cancer. This study also supports the potential of combining genotype data [27,32] in an algorithmic fashion to identify smokers at greatest risk of lung cancer.
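For readers unfamiliar with the ROC analysis referred to above, the area under the curve can be read as the probability that a randomly chosen lung cancer case has a higher susceptibility score than a randomly chosen control smoker, with ties counted as one half. The sketch below computes the AUC in that way. The function name `roc_auc` and the two score lists are hypothetical and serve only to illustrate the calculation; they are not study data.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Each case/control pair contributes 1 if the case scores higher,
    0.5 if the scores are tied, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical susceptibility scores (assumptions for illustration only).
lung_cancer_scores = [3, 2, 4, 1]
control_scores = [0, 1, 2, -1]

auc = roc_auc(lung_cancer_scores, control_scores)
print(auc)  # 0.875
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two score distributions, so the values of 0.69 and 0.72 reported here indicate modest discrimination.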