Genetic Variants in MUC4 Gene Are Associated with Lung Cancer Risk in a Chinese Population

Mucin MUC4, which is encoded by the MUC4 gene, plays an important role in epithelial cell proliferation and differentiation. Aberrant MUC4 overexpression is associated with invasive tumor proliferation and poor outcome in epithelial cancers. Collectively, the existing evidence suggests that MUC4 has tumor-promoter functions. We conducted a case-control study of 1,048 incident lung cancer cases and 1,048 age- and sex-frequency-matched cancer-free controls in a Chinese population to investigate the role of MUC4 gene polymorphisms in lung cancer etiology. We identified nine SNPs that were significantly associated with increased lung cancer risk (P = 0.0425 for rs863582, 0.0333 for rs842226, 0.0294 for rs842225, 0.0010 for rs2550236, 0.0149 for rs2688515, 0.0191 for rs2641773, 0.0058 for rs3096337, 0.0077 for rs859769, and 0.0059 for rs842461 in an additive model). Consistent with these single-locus results, the haplotype analyses revealed an adverse effect of the haplotype “GGC” of rs3096337, rs859769, and rs842461 on lung cancer. Both the haplotype and diplotype “CTGAGC” of rs863582, rs842226, rs2550236, rs842225, and rs2688515 had an adverse effect on lung cancer, which is also consistent with the single-locus analysis. Moreover, we observed statistically significant interactions with smoking for rs863582 and rs842461 in heavy smokers. Our results suggest that MUC4 gene polymorphisms and their interaction with smoking may contribute to lung cancer etiology.


Introduction
Lung cancer is the most common cancer in the world and accounted for 13% (1.6 million) of total cases and 18% (1.4 million) of cancer deaths in 2008 [1]. In China, the incidence and mortality rates of lung cancer have grown rapidly in the past few decades [2], and it is now the leading cause of cancer mortality; the average 5-year survival rate is <15% [3,4]. The lung cancer epidemic is directly attributable to cigarette smoking, which accounts for 87% of lung cancer cases. However, only a small percentage of smokers (<20%) develop lung cancer in their lifetime [5], suggesting that genetic susceptibility may play a role in lung cancer development.
Exposure to cigarette smoke stimulates an inflammatory cascade in airway epithelial cells. For example, tobacco smoke generates reactive oxygen species that can injure the lung epithelium, resulting in altered permeability, goblet cell hyperplasia, and recruitment of neutrophils and macrophages to the airway [6][7][8][9]. Chronic inflammation causes prolonged irritation and activates local host responses, which ultimately promote cell proliferation [10]. Sustained cell proliferation facilitates tumor formation and progression in an angiogenic environment rich in inflammatory cells, growth factors, and activated stroma [11,12]. It has been demonstrated that one-third of all cancers are preceded by chronic inflammation [13]. Case-control studies have demonstrated an increased risk of lung cancer in patients with inflammatory airway phenotypes, such as asthma, bronchitis, and emphysema [14,15]. Recent data suggest that cigarette smoke activates airway epithelial cells and immune cells to release proinflammatory mediators, such as cyclooxygenase-2 (COX-2), interleukins 4, 6, and 8 (IL-4, -6, -8), and tumor necrosis factor-α (TNF-α).
Mucins have long been known to be target molecules of inflammatory reactions, and inflammatory diseases of the epithelium are often characterized by mucin upregulation and hypersecretion [16][17][18][19][20]. Moreover, abnormal MUC4 expression has been reported in various cancers, such as pancreatic adenocarcinomas [21] and colon carcinomas [22], as well as in other lung and airway inflammatory diseases including cystic fibrosis and chronic obstructive pulmonary disease [23][24][25]. Growth factors are thought to be involved in mucus-secreting cell production because hypersecretory diseases are associated with abnormal epithelial cell growth and proliferation [26].
In addition to its adverse effects in inflammatory diseases, MUC4 also plays a critical role in regulating diverse processes in lung stromal/parenchymal cells, including apoptosis and metastasis. MUC4 acts as an intramembrane ligand for ErbB2/HER2/neu and potentiates its autophosphorylation [27]. It has been found that MUC4-induced ErbB2/neu signaling may mediate the antiapoptotic function of MUC4 [28]. Moreover, MUC4 may possess a tumor-promotion function, in part by regulating HER2 gene expression. ErbB2/HER2 expression levels have been correlated with tumor size and lymph node metastasis, suggesting the involvement of ErbB2 and ErbB2-mediated signaling in tumorigenesis [29]. Taken together, these observations imply that MUC4 may promote tumor progression in human lung cancer pathogenesis.
The present work was motivated by the biological plausibility that genetic variation in MUC4 could alter its expression level or biochemical function and thus may have an impact on individual lung cancer risk. To test this hypothesis, we conducted a case-control study of 1,048 incident lung cancer cases and 1,048 age- and sex-frequency-matched, cancer-free controls in a Chinese population. We also investigated potential interactions between tagSNPs of the MUC4 gene and cigarette smoking in lung cancer risk.

Study subjects
The study design and subject recruitment are described briefly below. The 1,048 lung cancer patients and 1,048 cancer-free controls were genetically unrelated ethnic Han Chinese from Guangzhou City. Patients with histopathologically confirmed incident lung cancer were consecutively recruited from September 2009 to September 2011 in the Thoracic Surgery Department of The First Affiliated Hospital of Guangzhou Medical University. The 1,048 cancer-free controls, frequency matched to patients by sex and age (±5 years), were randomly selected from the Health Examination Center of the same hospital during the same time period. Before recruitment, written informed consent was obtained from each eligible subject, and a structured questionnaire was administered by interviewers to collect information on demographic data and environmental exposure history, including tobacco smoking and alcohol intake. Subjects were classified as nonsmokers or smokers. Individuals who had smoked fewer than 100 cigarettes in their lifetime were defined as nonsmokers; all others were defined as smokers (former smokers who had stopped smoking for >1 year were also classified as smokers). Pack-years were calculated by multiplying the number of packs of cigarettes smoked per day by the number of years the person had smoked. Similarly, participants who had consumed alcoholic beverages at least once a week for the previous year were defined as drinkers, and the others were considered nondrinkers. Family history of cancer was defined as any self-reported cancer in first-degree relatives (parents, siblings, or children). After the interview, a 5-ml venous blood sample was collected from each participant. The study was approved by the institutional review board of Guangzhou Medical University (Ethics Committee of The First Affiliated Hospital: GZMC2009-08-1336).
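The exposure definitions above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the authors' code; the function names are hypothetical, and the pack size of 20 cigarettes is an assumption that the text does not state.

```python
def smoking_status(lifetime_cigarettes: int) -> str:
    """Classify a subject using the <100 lifetime cigarettes rule from the text."""
    return "nonsmoker" if lifetime_cigarettes < 100 else "smoker"


def pack_years(cigarettes_per_day: float, years_smoked: float,
               cigarettes_per_pack: int = 20) -> float:
    """Packs smoked per day multiplied by years smoked.

    cigarettes_per_pack=20 is an assumed convention, not taken from the study.
    """
    packs_per_day = cigarettes_per_day / cigarettes_per_pack
    return packs_per_day * years_smoked


if __name__ == "__main__":
    # Hypothetical subject: 30 cigarettes a day for 20 years.
    print(smoking_status(219000), pack_years(30, 20))
```

Former smokers need no special handling in this sketch because the lifetime-cigarette rule already classifies them as smokers.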

Selection of SNPs of MUC4
The human MUC4 gene is ~211 kb in size and is located on chromosome 3 in region q29 [30]. To identify SNPs that were related to lung cancer, we first selected 296 MUC4 SNPs with minor allele frequency (MAF) > 5% from both the dbSNP (http://www.ncbi.nlm.nih.gov/SNP, accessed 9/9/2012) and HapMap (Han Chinese) databases (File S1), and genotyped them in a subset of 300 case-control pairs randomly selected from the 1,048 pairs on an Illumina high-throughput genotyping platform (Genome Analyzer IIx, Illumina Inc., San Diego, CA). Out of this group, we identified nine SNPs (rs863582, rs842226, rs842225, rs2550236, rs2688515, rs2641773, rs3096337, rs859769, and rs842461) that exhibited significant frequency differences between cases and controls (data not shown). Genotype frequencies of SNPs can be influenced by sample sizes [31]. To minimize the bias due to small sample size, we next conducted direct sequencing for the whole set of 1,048 pairs of case and control samples using the ABI PRISM 7500 Sequence Detection System (Applied Biosystems, Foster City, CA) to confirm the above genotyping results (Table 1). The results from the two platforms were 100% concordant; therefore, we report association results from the entire set of 1,048 pairs in this paper. Finally, we identified two tagSNPs (rs863582 and rs842461) according to the following criteria: a minimal set ensuring an r² of at least 0.8 to cover all possible haplotypes with a frequency of at least 5%, as evaluated by the tagSNPs program [32]. In addition, as shown in Figure 1, the reconstructed linkage disequilibrium (LD) plot identified two blocks for the above nine SNPs in the 1,048 control subjects: block 1 for rs863582, rs842226, rs842225, rs2550236, rs2688515, and rs2641773; and block 2 for rs3096337, rs859769, and rs842461.
Among these SNPs, those in block 2 were in high LD with one another (minimum r² > 0.80, D' = 1.00; see Table S1 for each pair), and we therefore chose rs842461 to represent all three.
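The two LD measures quoted here, D' and r², can be computed for a pair of biallelic SNPs from one haplotype frequency and the two allele frequencies. The sketch below uses the standard textbook definitions; the function name and the input frequencies are hypothetical and are not taken from Table S1.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: allele A never occurs without B, so |D'| = 1,
    # yet r^2 stays below 1 because the allele frequencies differ.
    print(ld_stats(p_ab=0.30, p_a=0.30, p_b=0.40))
```

The example illustrates why both measures are reported: D' = 1.00 only indicates that at least one of the four possible haplotypes is absent, whereas a high r² is what allows one SNP to stand in for another.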

Genotyping assays
Genomic DNA was extracted from the subjects' blood samples with a QIAGEN Blood DNA Kit (Venlo, The Netherlands). An allelic discrimination method using allele-specific fluorogenic probes (the 5′ nuclease assay with MGB probes and TAMRA probes, as used in the TaqMan assay [33]) was chosen for genotyping on an ABI PRISM 7500 Sequence Detection System. Primers and probes are described in Table S2; they were designed with Primer Express 3.0 (Applied Biosystems) and synthesized by Shanghai GeneCore Biotechnologies (Shanghai, China). Polymerase chain reaction (PCR) was performed in 10-μl reaction systems. The PCR protocol consisted of an initial melting step at 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. A multicomponent algorithm was used to calculate distinct allele signal contributions from fluorescent measurements for each sample with the ABI 7500HT real-time PCR system. The genotypes were automatically determined by Sequence Detection Systems software 2.3 (Applied Biosystems) (Figure S1). In the genotyping assays, 10% of the samples were randomly selected for repeated assays for each SNP, and the results were 100% concordant.

Statistical analyses
Two-sided χ² tests were used to assess differences in selected demographic variables, smoking status, pack-years of smoking, family history of cancer, frequencies of MUC4 alleles, and genotypes between the cases and controls. Goodness-of-fit to Hardy-Weinberg equilibrium (HWE) in controls was also evaluated with a χ² test for each SNP. Akaike's information criterion (AIC) [34] was applied to select the most parsimonious genetic model for each SNP. Odds ratios (ORs) and corresponding 95% confidence intervals (CIs) were estimated with an unconditional logistic regression model with adjustments for age, sex, smoking status, alcohol drinking status, and family history of cancer. Stratification analyses were also performed by variables of interest, such as age, sex, smoking status, alcohol drinking status, family history of cancer, and histologic types. The pairwise LD among the SNPs was calculated using Lewontin's standardized coefficient D' and the LD coefficient r² [35], and haplotype blocks were defined by the method described by Gabriel et al. [36] using publicly available Haploview software (http://www.broad.mit.edu/personal/jcbarret/haplo/) with default settings (the minimum confidence bounds for strong LD were 0.98 for the upper bound and 0.7 for the lower bound, the maximum upper confidence bound for strong recombination was 0.9, and the fraction of strong LD in informative comparisons was at least 0.95). Each common haplotype (MAF > 0.05) was compared between all cases and controls and in each stratum of cumulative smoking dose to determine whether smoking influenced the risk associated with MUC4 variants by using haplo.stats (available at http://mayoresearch.mayo.edu/mayo/research/schaid_lab/index.cfm). In addition, a PHASE 2.1 Bayesian algorithm [37] was used to validate the haplotype frequencies estimated by haplo.stats and to infer diplotype frequencies based on the observed genotypes. Diplotype (haplotype dosage, an estimate of the number of haplotype copies) was the most probable haplotype pair for each individual. Unconditional logistic regression analyses were used to estimate ORs and 95% CIs for case-control subjects carrying one to two copies versus zero copies of each common haplotype for the dichotomized diplotypes (Table S3). The issue of multiple tests was controlled by performing 10,000 permutation tests.
Figure 1. Linkage disequilibrium (LD) plot of the nine MUC4 SNPs listed in Table 1. Two haplotype blocks (colored) were defined by the Haploview program using the approach described by Gabriel et al. [34] with default settings (the minimum confidence bounds for strong LD were 0.98 for the upper bound and 0.7 for the lower bound, the maximum upper confidence bound for strong recombination was 0.9, and the fraction of strong LD in informative comparisons was at least 0.95). The rs number (top, from right to left) corresponds to the SNP name, and the numbers in squares are D' values (|D'|×100). The measure of LD (D') among all possible pairs of SNPs is shown graphically according to red shading, where white represents very low D', and dark red represents very high D'.
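The HWE goodness-of-fit test mentioned above can be illustrated with a short, dependency-free sketch. It assumes a biallelic SNP with both alleles present and uses the 1-degree-of-freedom χ² approximation; the function name and the genotype counts are hypothetical, and this is not the SAS procedure used in the study.

```python
import math
from typing import Tuple


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) with 1 degree of freedom.
    Assumes both alleles are observed, so all expected counts are positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical control genotype counts.
    print(hwe_chi2(250, 500, 250))  # counts exactly at HWE proportions
    print(hwe_chi2(300, 400, 300))  # heterozygote deficit
```

A P value above 0.05 in controls, as reported for all nine SNPs, is read as no detectable departure from HWE.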
To explore potential interactions between the tagSNPs and smoking status, we performed multiple tests to assess result consistency, including analyses of specific categories of cumulative smoking exposure (i.e., pack-years), genotype-smoking joint effects, and interaction models that considered both discrete (nonsmokers, light smokers [≤20 pack-years], and heavy smokers [>20 pack-years]) and continuous (square root of pack-years) variables for cumulative smoking exposure. Statistical analyses were performed using SAS 9.2 software (Cary, NC, USA).

Study population characteristics
The characteristics of the 1,048 lung cancer patients and 1,048 cancer-free controls are described in Table S4. The lung cancer cases and controls were adequately matched for age and sex (P = 0.7597 and 0.7734, respectively). Cigarette smoking was associated with increased risk of lung cancer among heavy smokers (OR = 1.66 and 95% CI = 1.37-2.03, data not shown). Among the 1,048 lung cancer cases, 790 (75.38%) were defined as non-small-cell lung cancer (384 adenocarcinoma, 368 squamous cell carcinoma, and 37 large-cell carcinoma), 121 (11.55%) were small-cell lung cancer, and 138 (13.17%) patients had other carcinomas.

Association between individual SNPs and lung cancer risk
As summarized in Table 1, the genotype frequency distributions of the nine selected SNPs (rs863582, rs842226, rs842225, rs2550236, rs2688515, rs2641773, rs3096337, rs859769, and rs842461) in control subjects were all consistent with those expected under HWE (all P > 0.05). One SNP (rs2688515) in this Chinese population had an MAF that was 12.76% lower than that reported in the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP, accessed 9/9/2012), whereas another SNP (rs842226) had an MAF 20.52% higher than that reported in the HapMap SNP database (Han Chinese), which may reflect either a genuine population difference or frequency bias due to the small sample sizes from which the databases were derived. Allele frequencies of all SNPs showed significant differences between the 1,048 case and control pairs (P = 0.0116 for rs863582, P = 0.0129 for rs842226, P = 0.0091 for rs842225, P = 0.0152 for rs2550236, P = 0.0036 for rs2688515, P = 0.0048 for rs2641773, P = 0.0012 for rs3096337, P = 0.0016 for rs859769, and P = 0.0012 for rs842461).
Significant associations were observed for all nine SNPs (P = 0.0425 for rs863582, 0.0333 for rs842226, 0.0294 for rs842225, 0.0010 for rs2550236, 0.0149 for rs2688515, 0.0191 for rs2641773, 0.0058 for rs3096337, 0.0077 for rs859769, and 0.0059 for rs842461 in an additive model) based on the best fit of the AIC. The two tagSNPs (rs863582 and rs842461) remained significant after applying 10,000 permutations (P value from empirical distribution of minimal P values = 0.0315).
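The empirical P value from the distribution of minimal P values can be illustrated with a small permutation sketch. This is an assumption-laden illustration, not the authors' procedure: it uses N times the squared Pearson correlation between allele dosage and case status as the per-SNP statistic (equivalent to a 1-df trend test), so that the largest statistic across SNPs corresponds to the smallest P value. All names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence


def trend_stat(dosage: Sequence[int], status: Sequence[int]) -> float:
    """N * r^2 between allele dosage (0/1/2) and case status (0/1)."""
    n = len(status)
    mx = sum(dosage) / n
    my = sum(status) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, status))
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or single-class sample: no information
    return n * (sxy * sxy) / (sxx * syy)


def max_stat_permutation(genotypes: List[Sequence[int]], status: Sequence[int],
                         n_perm: int = 1000, seed: int = 0) -> float:
    """Family-wise empirical P value for the best SNP.

    Case/control labels are shuffled while each subject's genotypes stay
    together, which preserves the LD structure among the SNPs.
    """
    rng = random.Random(seed)
    observed = max(trend_stat(g, status) for g in genotypes)
    labels = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max(trend_stat(g, labels) for g in genotypes) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 20 cases then 20 controls, one associated and one null SNP.
    status = [1] * 20 + [0] * 20
    snps = [[2] * 20 + [0] * 20, [1] * 40]
    print(max_stat_permutation(snps, status, n_perm=200, seed=1))
```

Because only the labels are permuted, correlated SNPs contribute correlated statistics in every replicate, which is why this approach is less conservative than a Bonferroni correction when markers are in LD.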
Multivariate logistic regression models showed that after adjusting for confounding factors, compared with wild-type carriers in a dominant model, a significantly increased lung cancer risk was associated with the variant genotypes of rs863582 (T>C) (adjusted OR = 1.39 and 95% CI = 1.02-1.56 for CT/CC genotypes) and rs842461 (A>C) (adjusted OR = 1.25 and 95% CI = 1.05-1.49 for CA/CC genotypes) (Table 2).
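For readers unfamiliar with how an OR and its 95% CI arise under a dominant model, the following sketch computes the crude (unadjusted) estimate from a 2×2 table using the Wald interval on the log scale. It is a simplification: the estimates in Table 2 come from covariate-adjusted logistic regression, which this code does not reproduce. The function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(case_carriers: int, case_noncarriers: int,
                  ctrl_carriers: int, ctrl_noncarriers: int,
                  z: float = 1.959964) -> Tuple[float, float, float]:
    """Crude OR with a Wald confidence interval (z = 1.96 gives 95%).

    'Carriers' are subjects with at least one variant allele (dominant model).
    All four cells must be non-zero.
    """
    odds_ratio = (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)
    se_log_or = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                          + 1 / ctrl_carriers + 1 / ctrl_noncarriers)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from Table 2.
    print(odds_ratio_ci(20, 80, 10, 90))
```

Wald intervals are symmetric on the log scale, so the point estimate sits at the geometric mean of the two limits; this gives a quick consistency check when reading reported ORs and CIs.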
We further assessed the associations of the rs863582 (T>C) and rs842461 (A>C) variant genotypes with lung cancer risk stratified by selected variables and histological types. As shown in Table 3, compared with the common wild-type homozygous genotype, the adverse effect of rs863582 (T>C) was more evident in smokers (adjusted OR = 1.41 and 95% CI = 1.12-1.79), especially heavy smokers (≥20 pack-years, adjusted OR = 1.59 and 95% CI = 1.19-2.13), and in those with severe lung cancer (adjusted OR = 1.34 and 95% CI = 1.04-1.73). Consistent with the rs863582 (T>C) results, the rs842461 (A>C) variant genotypes showed nearly identical trends across the subgroups.
Logistic regression analyses revealed that lung cancer risk was significantly increased among individuals carrying the haplotype "GGC" (adjusted OR = 1.30 and 95% CI = 1.09-1.55) compared with those carrying the most common haplotype "ATA" in block 2 (Table 4). Notably, the "GGC" haplotype harbored the rs3096337 G allele and the rs842461 C allele, and these two alleles were both associated with significantly increased lung cancer risk in the single-locus analysis. Furthermore, the stratified analyses revealed that lung cancer risk was further increased among heavy smokers carrying the haplotype "GGC" (adjusted OR = 1.60 and 95% CI =

Gene-smoking interaction analysis
As summarized in Table 5, we first classified cumulative smoking dose as a discrete variable (nonsmokers, light smokers, and heavy smokers) to avoid potential misclassification of participants' smoking exposure. The adjusted ORs of the rs863582 TC/CC versus TT genotypes increased significantly as pack-years increased in both the cumulative smoking exposure and the genotype-smoking joint-effects analyses, although the comparisons between light smokers and nonsmokers did not reach statistical significance. When nonsmokers with the TT or TC/CC genotype were used as the respective reference groups in the joint-effects model, heavy smokers with the same genotypes had the greatest risk for lung cancer (OR = 1.73 and 95% CI = 1.27-2.36; OR = 2.50 and 95% CI = 1.85-3.39), suggesting that heavy smoking is a major risk factor for lung cancer. The genotype-smoking interaction model revealed a significant multiplicative interaction between the rs863582 polymorphism (TC/CC versus TT) and trichotomized cumulative smoking dose (P < 0.0001). We also observed a consistent and robust result when treating cumulative smoking dose as a continuous variable (square root of pack-years) (P < 0.0001). The results for rs842461 followed nearly identical trends in the genotype and cumulative smoking dose analyses. Notably, the adjusted ORs of the rs842461 CA/CC versus AA genotypes increased significantly as pack-years increased in both the cumulative smoking exposure and genotype-smoking joint-effects analyses, although the comparisons between light smokers and nonsmokers again did not reach statistical significance. When nonsmokers with the AA or CA/CC genotype were used as the respective reference groups in the joint-effects model, heavy smokers with the same genotypes had the greatest risk for lung cancer (OR = 1.82 and 95% CI = 1.35-2.46; OR = 2.43 and 95% CI = 1.78-3.32), suggesting that heavy smoking is a major risk factor for lung cancer.
The genotype-smoking interaction model revealed a significant multiplicative interaction between the rs842461 polymorphism (CA/CC versus AA) and trichotomized cumulative smoking dose (P = 0.0001). We also found a consistent and robust result when treating cumulative smoking dose as a continuous variable (square root of pack-years) (P = 0.0024).
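What "multiplicative interaction" means can be shown with simple arithmetic on joint-effect ORs that share a single reference group (wild-type nonsmokers). The sketch below is illustrative only: the study tested interaction with a product term in logistic regression, whereas this code merely compares point estimates, and the function names and OR values are hypothetical.

```python
def multiplicative_interaction(or_joint: float, or_genotype_only: float,
                               or_exposure_only: float) -> float:
    """Ratio of the joint OR to the product of the two single-factor ORs.

    A value of 1 means no interaction on the multiplicative scale;
    a value above 1 means the joint effect exceeds the product.
    """
    return or_joint / (or_genotype_only * or_exposure_only)


def reri(or_joint: float, or_genotype_only: float,
         or_exposure_only: float) -> float:
    """Relative excess risk due to interaction (additive scale); 0 means none."""
    return or_joint - or_genotype_only - or_exposure_only + 1.0


if __name__ == "__main__":
    # Hypothetical ORs versus wild-type nonsmokers:
    # variant carrier + heavy smoker, variant carrier only, heavy smoker only.
    print(multiplicative_interaction(6.0, 1.5, 2.0), reri(6.0, 1.5, 2.0))
```

The two scales can disagree: in the test values used with this sketch, ORs of 3.0, 1.5, and 2.0 show no multiplicative interaction yet a positive additive one, so the scale should always be stated alongside the result.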

Discussion
In the present case-control study, we investigated the effect of multiple common MUC4 gene variants and their interaction with cigarette exposure on lung cancer risk in a Southern Han Chinese population. We found that nine SNPs (rs863582, rs842226, rs842225, rs2550236, rs2688515, rs2641773, rs3096337, rs859769, and rs842461) were significantly associated with lung cancer risk. Consistent with the results of single-locus analysis, the haplotype analyses revealed an adverse effect of the haplotype "GGC" of rs3096337, rs859769, and rs842461 on lung cancer. Both the haplotype and diplotype "CTGAGC" of rs863582, rs842226, rs2550236, rs842225, and rs2688515 had adverse effects on lung cancer risk, which is consistent with the single-locus analysis results. Moreover, we observed a statistically significant interaction for rs863582 and rs842461 with cigarette smoking when tested as either a discrete or continuous variable. These findings support our hypothesis that MUC4 polymorphisms and their interaction with smoking may contribute to lung cancer etiology. To the best of our knowledge, this is the first study to assess associations between lung cancer risk and a broad spectrum of MUC4 genetic variants, both individually and collectively as haplotypes.
It is biologically plausible that MUC4 may be involved in lung cancer etiology. For example, MUC4 is thought to be a very specific (100%) and sensitive (91.4%) marker in paraffin-embedded lung adenocarcinoma tissue, which could be useful in diagnostic practice for distinguishing between malignant mesothelioma and adenocarcinoma [38]. Moreover, MUC4 overexpression was found to correlate with poor prognosis in small-sized lung adenocarcinomas [39]. Accumulating evidence suggests that MUC4 might also be a potential diagnostic and prognostic marker for other malignancies, such as ductal carcinoma [40][41][42].
We first found that the two tagSNPs, rs842461 and rs863582, were associated with lung cancer risk. In the single-locus association analysis, variant genotypes of each of these two SNPs were associated with a significantly increased risk of lung cancer, even after 10,000 permutations. Moreover, we found that "GGC" accounted for a 60% increase in lung cancer risk among heavy smokers, which was consistent with the effect of variant rs842461 genotypes in the same subgroup, suggesting that the adverse effect of "GGC" was indeed driven by the rs842461 C allele and the rs3096337 G allele. These two SNPs were both located in haplotype block 2, which showed a significant and consistent association with lung cancer risk. Notably, block 2 corresponds to intron 1. Existing evidence indicates that the sequence in intron 1 of human genes may play an important role in transcriptional regulation. The role of intron 1 of MUC4 in gene regulation and the influence of rs842461 are unknown. Although the functional relevance of rs842461 is not yet clear, it is possible that it may increase the affinity of transcription activators, or decrease that of transcription suppressors, for the intronic enhancer, thus upregulating MUC4 expression levels. Further study is warranted to provide experimental evidence in support of this hypothesis.
Our present study also indicates that the effect of rs863582 or rs842461 appears to be strongly modified by cumulative cigarette smoking. Interestingly, the variant genotypes had no effect in nonsmokers or light smokers but were risk factors among heavy smokers compared with their respective wild-type genotypes. For example, heavy smoking (≥20 pack-years) alone conferred only a 1.58-fold increased lung cancer risk for rs863582, but the effect of heavy smoking with the same genotypes was 1.73- or 2.50-fold when the TT or TC/CC genotype was used as the respective reference in the joint-effects model, indicating a risk-enhancing relationship between smoking and rs863582 genotype variants. Consistent with these results, the rs842461 variant genotype analyses revealed nearly identical trends. The mechanism underlying the interaction between MUC4 and smoking is not clear. A large number of biologically active molecules, such as cytokines, bacterial products, growth factors, differentiation agents, and other factors (e.g., tobacco smoke), have been found to regulate MUC4 expression in vitro and/or in vivo in various cell types [43][44][45][46]. Therefore, it is likely that smoking significantly induces MUC4 expression, and it is possible that the variant allele G of rs3096337 or C of rs842461 also leads to a higher basal expression level of MUC4 under normal circumstances. As MUC4 acts as a tumor promoter for lung cancer, the variant allele G of rs3096337 or C of rs842461 may exert a greater adverse effect than the wild-type allele among heavy smokers. Therefore, subjects carrying the rs3096337 variant G or rs842461 variant C may not have increased lung cancer risk under normal conditions but do have an elevated risk when the G or C allele is in strong LD with a variant allele of another gene (e.g., growth factor genes) that may induce MUC4 expression in response to heavy smoking.
Nevertheless, such speculation requires further support from additional functional studies.
There are three main strengths of this study. First, to the best of our knowledge, no previous study has evaluated MUC4 SNPs for associations with lung cancer. Because lung cancer is a multifactorial disease that likely involves multiple SNPs in genes, we assessed a broader spectrum of MUC4 variants individually as alleles and collectively as haplotypes, which may be more powerful than analyzing a single allele or locus. Second, all lung cancer diagnoses were confirmed by histologic methods, and complete questionnaire data were systematically collected. The adjusted ORs in both stratified and joint-effect analyses for different pack-year categories of smoking were similar in magnitude and direction to the point estimates obtained from fitted ORs of the interaction models. Third, the statistical power of the gene and gene-environment interaction analyses (File S2) in this study was sufficient. These consistent results suggest that our findings are not likely to be due to chance. Ultimately, an investigation of a candidate gene requires many SNPs for individual association analysis [47,48], but such testing will increase the false-positive (type I error) rate under nominal significance thresholds (e.g., α = 0.05) except when the selected SNPs are all in high LD with each other. That is, when background LD exists between SNPs but they are assumed to be completely independent, the popular Bonferroni correction overcorrects for the inflated false-positive rate, resulting in reduced study power [49]. To calculate the significance of SNPs in LD with each other, a permutation test was used to adjust for multiple tests while preserving the correlation structure among linked markers [50][51][52][53]. In this way, the false-positive rate for a large number of tests was well controlled in the present study.
Despite the strengths and biologic plausibility of the associations observed in the present study, inherent biases may have resulted in spurious findings. First, the lung cancer cases were enrolled from hospitals, and the controls were selected from community health stations and a health examination center, so inherent selection bias cannot be completely excluded. However, we minimized potential confounding factors by matching the controls to the cases on age, sex, and residential area (urban or rural). Second, the sample size of the present study may not be large enough either to detect a small effect from very low penetrance SNPs or to adequately identify significant associations in the different strata of the subgroup analyses. Third, except for smoking status, other factors such as occupational exposure and nutritional status, which might interact with MUC4 genotypes or act as potential confounding factors, were not included in our study. Possible interactions between MUC4 genotypes and these risk factors should be thoroughly investigated in future work. Finally, the functional relevance of rs863582 and rs842461 is unknown and should be assessed.
In conclusion, our study provides evidence that MUC4 polymorphisms and their interactions with smoking status may contribute to lung cancer etiology in a Chinese population. Moreover, we demonstrated that genetic susceptibility, coupled with a modifiable lifestyle factor (i.e., smoking status), appeared to confer a significantly higher risk of lung cancer than either factor alone. These findings need to be substantiated by larger-scale studies in different ethnic populations.
Figure S1. MUC4 rs863582 T>C, rs842226 C>T, rs842225 A>G, rs2550236 G>A, rs2688515 A>G, rs2641773 A>C, rs3096337 A>G, rs859769 T>G, and rs842461 A>C; genotyping by TaqMan assays. (TIF)