Replication Study in Chinese Population and Meta-Analysis Supports Association of the 5p15.33 Locus with Lung Cancer

Background Common genetic polymorphisms on chromosome 5p15.33, including rs401681 in cleft lip and palate transmembrane 1-like gene (CLPTM1L), have been implicated in susceptibility to lung cancer through genome-wide association studies (GWAS); however, subsequent replication studies yielded controversial results. Methodology and Findings A hospital-based case-control study in a Chinese population was conducted to replicate the association, and then a meta-analysis combining our non-overlapping new data and previously published data was performed to clearly discern the real effect of lung cancer susceptibility. In our study with 611 cases and 1062 controls, the minor allele T carrier (TT plus CT) group conferred an OR of 0.801 (95% CI = 0.654–0.981) under the dominant model. The meta-analysis comprising 9111 cases and 11424 controls further confirmed the significant association in the dominant model (OR = 0.842, 95% CI = 0.795–0.891). By stratified analysis, we revealed that ethnicity and study design might constitute the source of between-study heterogeneity. Besides, the sensitivity and cumulative analyses indicated the high stability of the results. Conclusion The results from our case-control study and meta-analysis provide convincing evidence that rs401681 is significantly associated with lung cancer risk.


Introduction
Lung cancer is the most common cancer worldwide. Globally, it accounted for 13% (1.6 million) of all cancer cases and 18% (1.4 million) of cancer deaths in 2008 [1]. In China, the incidence and mortality rates of lung cancer have been increasing rapidly in recent decades [2], and lung cancer is now the leading cause of cancer mortality [3]. Environmental factors such as smoking and pollution are established risk factors for lung cancer. However, the etiology is still not fully understood, and genetic factors are strongly suggested to play an important role in lung carcinogenesis; a twin study estimated that they contribute 26% of the risk [4].
Recent genome-wide association studies (GWAS) have implicated multiple novel single nucleotide polymorphisms (SNPs) in lung cancer susceptibility [5-12]. rs401681, located in intron 13 of CLPTM1L (cleft lip and palate transmembrane 1-like gene) [13], is one of the most studied SNPs. It was first identified in a GWA set of 1952 cases and 1438 controls, supported by pooling data with two other GWA studies (5095 cases and 5200 controls) and by replication in an additional 2484 cases and 3036 controls [5], all within Caucasian populations. In Asians, Kohno et al. and Miki et al. both confirmed the association between the T allele of rs401681 and a decreased risk of lung adenocarcinoma [14,15]. However, the reported genetic effects varied across subsequent replication studies. For example, a study in Norway found that the rs401681 T allele was not associated with the risk of non-small cell lung cancer (NSCLC) in 341 cases and 431 controls of Caucasian ancestry (P trend = 0.259) [16], whereas another study with 1681 lung cancer cases and 1235 controls showed that the T allele was significantly associated with a reduced risk of overall lung cancer in the same ethnicity (P = 1.1 × 10⁻⁵) [17]. Within Asians, Yoon et al. replicated the association between the SNP and NSCLC (P trend = 1.89 × 10⁻⁴), but thereafter neither Bae et al. [18] nor Chen et al. [19] replicated this positive result under the allelic model.
As outlined above, the outcomes remain ambiguous and conflicting, probably because the modest effect of this SNP leaves small genetic association studies underpowered. The inconsistency may also be owing to the so-called "winner's curse" phenomenon, in which the OR of a disease variant is usually overestimated in the first positive study. If the sample size of a replication study is calculated from this inflated OR, the necessary sample size is underestimated, and the underpowered replication is then unlikely to succeed [20]. Meta-analysis, however, is an effective method of combining data to increase the sample size and gain enough power to clarify inconsistent results in genetic association studies [21]. A comprehensive meta-analysis of publications from 2003 to 2011 on the associations between TERT locus polymorphisms and the risk of different cancers presented a modest risk reduction for lung cancer with rs401681 (per-allele OR = 0.87, 95% CI = 0.84 to 0.89) [22]. Two recently published meta-analyses further supported similar findings under the allelic model [23] and the additive model [24]. By contrast, we conducted a meta-analysis of four specific genetic models, which could provide more information about the possible manner of inheritance of the SNP. Additionally, our study combined results from studies published up to 2012, including a set of unpublished genotype frequency data requested from the original authors of a large GWAS [12]. We also included the data of our own replication study in a Chinese population in the meta-analysis, followed by stratified, sensitivity and cumulative analyses, to provide a comprehensive and precise estimate of the association between rs401681 and lung cancer risk.

Study Population
In the present study, a total of 611 lung cancer cases and 1062 cancer-free controls were recruited from Tongji Hospital of Huazhong University of Science and Technology (HUST) between 2009 and 2011. All subjects were unrelated ethnic Han Chinese from the Wuhan region. Cases were histopathologically confirmed, included all lung cancer types, and had not received any treatment before blood sample collection. Controls were selected randomly from a physical examination program at the same hospital during the period in which the patients were enrolled. Cases and controls were adequately matched in terms of gender and age (±5 years). At recruitment, 5 ml of peripheral venous blood was collected from each subject after informed consent was obtained. This study was approved by the ethics committee of Tongji Hospital of Huazhong University of Science and Technology.

Genotyping
Genomic DNA was extracted from the peripheral blood samples using the RelaxGene Blood System DP319-02 (Tiangen, Beijing, China) in accordance with the manufacturer's instructions. rs401681 was genotyped with the TaqMan SNP Genotyping Assay (Applied Biosystems, Foster City, CA) on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA). For quality control, 5% of the samples were randomly selected and genotyped in duplicate to assess reproducibility, with a concordance rate of 100%.

Statistical Analysis
The χ² test and t test were applied to estimate differences in demographic variables and genotype distributions between cases and controls. Hardy-Weinberg equilibrium (HWE) was assessed using a goodness-of-fit χ² test for genotypes in controls, and a value of P < 0.05 was considered to indicate significant disequilibrium. Unconditional multivariate logistic regression, adjusted for age, sex and smoking status, was employed to estimate genotypic odds ratios (ORs) and their 95% confidence intervals (95% CIs), with the common homozygote as the reference. To avoid assuming a single genetic model, dominant and additive models were also analyzed. All of the above statistical analyses were performed with SPSS 20.0 software.
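The two basic calculations described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the odds ratio here is a crude (unadjusted) estimate with a Woolf log-based confidence interval rather than the adjusted logistic-regression estimate, and all genotype counts are hypothetical values chosen for illustration, not the study data. The function names `hwe_chi2` and `odds_ratio_ci` are likewise illustrative.

```python
import math

def hwe_chi2(n_cc, n_ct, n_tt):
    """Goodness-of-fit chi-square test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n)          # frequency of the C allele
    q = 1 - p                                # frequency of the T allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cc, n_ct, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude OR with Woolf 95% CI for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

# Hypothetical genotype counts (CC, CT, TT); NOT the counts of this study
controls = (360, 480, 160)
cases = (200, 220, 80)

print(hwe_chi2(*controls))

# Dominant model: T carriers (CT + TT) versus CC homozygotes
print(odds_ratio_ci(cases[1] + cases[2], cases[0],
                    controls[1] + controls[2], controls[0]))
```

Under the dominant model the "exposed" group is the T carriers, so an OR below 1 corresponds to a reduced risk in carriers relative to CC homozygotes.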

Meta-analysis of rs401681 in Association with Lung Cancer Risk
A systematic literature search of the PubMed, EMBASE, and ISI Web of Science databases up to April 2012 was performed, using the keywords 'rs401681, CLPTM1L, TERT, or 5p15.33' combined with 'lung cancer, NSCLC', without language restrictions. Based on the same search strategy, we further searched the Chinese Biomedical (CBM) database. References listed in the retrieved articles were also scanned, and reviews and comments were checked for additional studies. Articles that met the following criteria were included: (1) case-control or nested case-control study assessing the association between rs401681 and lung cancer risk; (2) providing data for calculating genotypic odds ratios (ORs) with corresponding 95% confidence intervals (95% CIs); (3) genotypes in controls being in Hardy-Weinberg equilibrium (P > 0.05); (4) studies of humans. Whenever studies pertained to overlapping subjects, the one with the larger sample size was selected to avoid duplication.
The following information was extracted from each study: first author's last name, year of publication, geographic location, ethnicity of study population, study design, genotyping method, numbers of cases and controls, and frequencies of genotypes in cases and controls. The pooled frequency of the T allele in different ethnicities and histological types was estimated by the inverse variance method previously used by Thakkinstian et al. [25]. Effect sizes were calculated for the genotypes CT versus CC and TT versus CC, and a dominant model and an additive "per-allele" model were also considered. We used the χ²-based Cochran's Q statistic and the I² metric to assess between-study heterogeneity. Heterogeneity was considered significant at P < 0.10. The following cut-off points of I² were used to quantify heterogeneity: I² = 0-30%, no or marginal between-study heterogeneity; I² = 30%-75%, mild heterogeneity; I² = 75-100%, notable heterogeneity [26]. When the studies were homogeneous (P for the Q statistic greater than 0.10), the fixed-effects model (Mantel-Haenszel method) was adopted to compute the pooled ORs and 95% CIs [27]; otherwise, the random-effects model (DerSimonian and Laird method) was applied [28]. We then conducted stratified analyses, where feasible (at least 3 studies in each subgroup), according to ethnicity (Asian and Caucasian), study design (GWAS and replication studies), and histological type (NSCLC and integrated lung cancer). A sensitivity analysis was also carried out to assess the influence of each study on the overall pooled OR by sequential omission of individual studies [29]. To investigate the dynamic trend of the association, a cumulative analysis was performed with studies ordered by publication time [30]. Finally, publication bias was assessed by Egger's test [31].
All statistical analyses were implemented in Stata 11.0 software, and all P values are two-tailed with a significance level of 0.05, except for the Q test for heterogeneity.
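The pooling strategy described above (Cochran's Q and I², then Mantel-Haenszel fixed-effects or DerSimonian-Laird random-effects pooling) can be sketched in plain Python as follows. This is an illustrative sketch, not the Stata routine used in the study. Assumptions: each study is reduced to a 2x2 table `(a, b, c, d)` of exposed cases, unexposed cases, exposed controls and unexposed controls; Q is computed with inverse-variance weights on the log OR; the Mantel-Haenszel confidence interval uses the Robins-Breslow-Greenland variance; and the three tables at the end are hypothetical. The P value for Q, which would come from a χ² distribution with k − 1 degrees of freedom (for example via `scipy.stats.chi2.sf`), is omitted to keep the sketch free of third-party dependencies.

```python
import math

def log_or_and_var(table):
    """Log odds ratio and its (Woolf) variance for one 2x2 table."""
    a, b, c, d = table
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def heterogeneity(tables):
    """Cochran's Q, its degrees of freedom, and I^2 (in percent)."""
    ys, vs = zip(*(log_or_and_var(t) for t in tables))
    ws = [1 / v for v in vs]
    y_fixed = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - y_fixed) ** 2 for w, y in zip(ws, ys))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

def mantel_haenszel(tables, z=1.96):
    """Fixed-effects pooled OR (Mantel-Haenszel) with 95% CI."""
    R = S = PR = PSQR = QS = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        R += r
        S += s
        PR += p * r
        PSQR += p * s + q * r
        QS += q * s
    or_mh = R / S
    # Robins-Breslow-Greenland variance of log(OR_MH)
    var = PR / (2 * R * R) + PSQR / (2 * R * S) + QS / (2 * S * S)
    half = z * math.sqrt(var)
    return or_mh, math.exp(math.log(or_mh) - half), math.exp(math.log(or_mh) + half)

def dersimonian_laird(tables, z=1.96):
    """Random-effects pooled OR (DerSimonian-Laird) with 95% CI and tau^2."""
    ys, vs = zip(*(log_or_and_var(t) for t in tables))
    ws = [1 / v for v in vs]
    q, df, _ = heterogeneity(tables)
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    ws_re = [1 / (v + tau2) for v in vs]
    y = sum(w * yi for w, yi in zip(ws_re, ys)) / sum(ws_re)
    se = math.sqrt(1 / sum(ws_re))
    return math.exp(y), math.exp(y - z * se), math.exp(y + z * se), tau2

# Hypothetical 2x2 tables (T carriers vs CC); NOT data from the included studies
tables = [(300, 200, 640, 360), (410, 330, 520, 350), (150, 90, 400, 210)]
q, df, i2 = heterogeneity(tables)
print(f"Q = {q:.3f} on {df} df, I2 = {i2:.1f}%")
print("Mantel-Haenszel:", mantel_haenszel(tables))
print("DerSimonian-Laird:", dersimonian_laird(tables))
```

When the estimated between-study variance is zero, the DerSimonian-Laird weights reduce to the inverse-variance fixed-effects weights, which is why the two models give nearly identical results for homogeneous studies.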

Results of Case-control Study
Population characteristics. A total of 611 incident cases of overall lung cancer and 1062 frequency-matched controls were enrolled in this study. As shown in Table 1, no statistically significant differences were found between cases and controls in terms of sex (P = 0.475) and age (P = 0.287) distribution. Males accounted for 68.6% of cases and 70.2% of controls, and the median age was 61.5 for cases and 61.0 for controls. As expected, given that cigarette smoking is the major etiological factor for lung cancer, there were more smokers among the cases than among the controls (53.1% versus 43.1%; P < 0.001). Smokers were defined as those who had smoked at least one cigarette per day for 12 months or longer at any time in their life, and non-smokers as those who had not. Of the cases, 427 (69.9%) were histopathologically confirmed as NSCLC, the most common type, which accounts for approximately 85% of lung cancer in general and includes adenocarcinomas, squamous cell carcinomas and large cell lung carcinomas.
Association analysis. The distribution of rs401681 genotypes among subjects is displayed in Table 2, and no significant difference was observed between cases and controls in the overall test (P = 0.073). Genotypes in controls complied with Hardy-Weinberg equilibrium (P = 0.932). In the multivariate logistic regression model adjusted for age, sex and smoking status, individuals with the CT genotype had a significantly decreased risk of lung cancer (OR = 0.794; 95% CI, 0.640-0.984) compared with CC homozygotes. A dominant model was fitted to increase statistical power by combining the TT and CT genotypes into a T carrier (TT plus CT) group. The T carriers also showed a significantly reduced risk (OR = 0.801; 95% CI, 0.654-0.981), suggesting a dominant effect of this polymorphism on lung cancer in the Chinese population. Likewise, a significant association was found in the additive model, with a per-T-allele OR of 0.856 (95% CI = 0.734-0.998). To explore the variant's effect on NSCLC, we also compared the genotype distribution between the NSCLC cases and controls. A similar positive association and a stronger genetic effect were found in this subtype of overall lung cancer. However, when we further stratified by smoking status (smokers and non-smokers) and by the median age (age ≤ 61.0 and age > 61.0), no significant associations were detected under the dominant model except in the group aged 61.0 or younger. The null associations might be due to the small number of subjects after stratification in the current study. The significant finding in the younger group is in line with the concept that genetic susceptibility is often associated with an early age of disease onset. However, when we specifically defined early-onset cases as those ≤ 50.0 years of age and performed a stratified analysis, we observed no significant associations (Table S1).

Results of Meta-analysis
Study characteristics. As shown in Figure S1, 13 eligible original publications including 15 studies [5,7,9-12,14-16,18,19,32,33] were first identified and screened for retrieval, of which 9 studies were judged to preliminarily fit the inclusion criteria. However, after further examination, the study "Wang-Texas 2008" from the publication by Wang Y et al. [5] was excluded because its cases overlapped with the sample of a previous study reported by Amos CI et al. [7]. In addition, in our pre-analysis of the data we removed the study "Wang-IARC 2008", which contributed most of the original notable heterogeneity, possibly because of its mixed study population from several different countries [5]. Finally, 7 publications plus the current study, comprising 8 case-control studies with 9111 cases and 11424 controls, provided data for this meta-analysis [5,7,12,16,18,19,33]. The characteristics of the included studies are presented in Table 3.
Significant between-study heterogeneity was observed in both the NSCLC and the overall lung cancer subgroups (P for heterogeneity < 0.001). Under the random-effects model, the pooled frequency of the T allele was 36.9% (95% CI = 26.2%-47.5%) in the NSCLC cases of our whole study, which was slightly higher than the 31.6% (95% CI = 26.3%-36.8%) in overall lung cancer cases.
Overall meta-analyses of rs401681 in association with lung cancer. No significant evidence of heterogeneity was seen in any genetic model except the homozygous model (P for heterogeneity < 0.10), for which the OR pooled under the random-effects model was 0.778 (95% CI = 0.674-0.900). The fixed-effects model was applied for the heterozygous, dominant (Figure 1) and additive models, which showed no evident heterogeneity (each P for heterogeneity > 0.10). All of these models conferred a significantly decreased risk of lung cancer, with ORs of 0.864 (95% CI = 0.813-0.917), 0.842 (95% CI = 0.795-0.891) and 0.871 (95% CI = 0.835-0.908), respectively (Table 4). The meta-analysis thus showed a significantly decreased risk of lung cancer similar to that in our case-control study under the same heterozygous, dominant and additive models.
Stratified analyses. To explore the source of heterogeneity, stratified analysis was performed (Table 4). After stratification by ethnicity, all genetic models presented a significantly decreased risk of lung cancer and showed hardly any heterogeneity (P for heterogeneity > 0.10, I² = 0) in Caucasians. In Asian populations, a decreased risk was also found without evidence of heterogeneity in all genetic models except the TT genotypic model (pooled OR = 0.825, 95% CI = 0.637-1.070, P for heterogeneity < 0.10, I² = 71.8%). The only mild reduction in risk from one copy of the T allele to two copies in Chinese (Table 2) or Asian populations (Table 4) potentially indicates that the T variant of rs401681 acts in different manners in different ethnic populations. According to study design, statistically significant findings were seen in the GWAS without heterogeneity, but the homozygous model again gave a null result within the subgroup of replication studies. Regarding histological type, a significant association was observed in all four genetic models in both NSCLC and integrated lung cancer.
It is worth noting that larger effect of the minor allele T was seen in NSCLC than that in overall lung cancer, which could support the same finding of our case-control study.
Sensitivity analysis. To evaluate the influence of each individual study on the combined estimate, a one-way sensitivity analysis was performed by dropping one data set at a time. As shown in Table 5, the pooled ORs and 95% CIs under the dominant model were not materially altered by the elimination of any single study, indicating that our results were robust.
Since significant between-study heterogeneity for the Asian studies was observed in the TT genotypic model, we also performed a sensitivity analysis among those studies. We found that the heterogeneity was reduced the most (I² = 15.6%) when the study of Yoon et al. was excluded (I² = 56.2%, P = 0.077), with the same null association (OR = 0.909, 95% CI = 0.705-1.172).
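The one-way (leave-one-out) procedure amounts to re-pooling the data once per study with that study omitted. A minimal Python sketch follows, under stated assumptions: pooling uses inverse-variance fixed-effects weights as a simpler stand-in for the Mantel-Haenszel method, each study is a 2x2 table of T carriers versus CC homozygotes in cases and controls, and the study labels and counts are hypothetical.

```python
import math

def pooled_or(tables):
    """Inverse-variance fixed-effects pooled OR over 2x2 tables (a, b, c, d)."""
    num = den = 0.0
    for a, b, c, d in tables:
        weight = 1 / (1 / a + 1 / b + 1 / c + 1 / d)   # 1 / var(log OR)
        num += weight * math.log(a * d / (b * c))
        den += weight
    return math.exp(num / den)

def leave_one_out(studies):
    """Map each study name to the pooled OR obtained when that study is omitted."""
    return {
        name: pooled_or([t for other, t in studies.items() if other != name])
        for name in studies
    }

# Hypothetical studies; NOT the studies included in this meta-analysis
studies = {
    "Study A": (300, 200, 640, 360),
    "Study B": (410, 330, 520, 350),
    "Study C": (150, 90, 400, 210),
}
print("All studies:", round(pooled_or(list(studies.values())), 3))
for name, value in leave_one_out(studies).items():
    print(f"Omitting {name}: pooled OR = {value:.3f}")
```

If no single omission moves the pooled OR materially, the overall estimate is not driven by any one study, which is the sense in which the results are described as robust.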
Cumulative meta-analysis. Cumulative meta-analysis was carried out by adding studies in chronological order. As shown in Figure 2, the trend toward a significant association became evident over time in the dominant model. At the same time, the 95% CIs became progressively narrower as more data accumulated, reflecting the improved precision of the estimates with increasing sample size.
Publication bias. The results of Egger's test indicated no publication bias in any of the four genetic models (all P for Egger's test > 0.05, Table 4).
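Both procedures can be sketched briefly in Python. The sketch below is illustrative only and makes several assumptions: pooling uses inverse-variance fixed-effects weights rather than the Stata routines used in the study, the study labels, years and counts are hypothetical, and the Egger function returns only the regression intercept and slope. The formal test, which compares the intercept with zero using a t distribution with k − 2 degrees of freedom, is not implemented here.

```python
import math

def log_or(table):
    """Log odds ratio and its standard error for a 2x2 table (a, b, c, d)."""
    a, b, c, d = table
    return math.log(a * d / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def pooled_or(tables, z=1.96):
    """Inverse-variance fixed-effects pooled OR with 95% CI."""
    effects = [log_or(t) for t in tables]
    weights = [1 / se ** 2 for _, se in effects]
    y = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(y), math.exp(y - z * se), math.exp(y + z * se)

def cumulative_meta(studies):
    """studies: list of (label, year, table). Re-pool after adding each study in year order."""
    ordered = sorted(studies, key=lambda s: s[1])
    return [
        (ordered[k][0], pooled_or([s[2] for s in ordered[: k + 1]]))
        for k in range(len(ordered))
    ]

def egger_regression(tables):
    """Regress standardized effect (log OR / SE) on precision (1 / SE).

    Returns (intercept, slope); an intercept far from zero suggests
    small-study effects such as publication bias.
    """
    effects = [log_or(t) for t in tables]
    xs = [1 / se for _, se in effects]
    ys = [e / se for e, se in effects]
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical studies; NOT the studies included in this meta-analysis
studies = [
    ("Study A", 2008, (300, 200, 640, 360)),
    ("Study B", 2010, (410, 330, 520, 350)),
    ("Study C", 2012, (150, 90, 400, 210)),
]
for label, (or_, lo, hi) in cumulative_meta(studies):
    print(f"up to {label}: OR = {or_:.3f} (95% CI {lo:.3f}-{hi:.3f})")
print("Egger intercept, slope:", egger_regression([s[2] for s in studies]))
```

Because each cumulative step pools a superset of the previous step's studies, the confidence interval can only narrow under the fixed-effects model, which mirrors the pattern described for Figure 2.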

The Bioinformatics Analyses of rs401681
Of the three bioinformatics tools used, two, F-SNP and FastSNP, consistently predicted that the SNP is likely to be functional in transcriptional regulation.

Discussion
In this compound study, we replicated the significant association between rs401681 and lung cancer risk in the Chinese population. Also, the following meta-analysis integrating data from the current and 7 previously published studies suggested that the SNP was significantly associated with reduced lung cancer risk under genotypic, dominant and additive models. Sensitivity analysis indicated the stability of the result and cumulative analysis further confirmed the positive findings, showing the effect of the variant got progressively significant with accumulation of more data.
These findings are biologically plausible to some degree. rs401681 is located at 5p15.33, a susceptibility region for lung cancer encompassing two known genes, TERT (telomerase reverse transcriptase) and CLPTM1L (cleft lip and palate transmembrane protein 1-like). Although little is known about the function of rs401681, which is situated within an intronic region of CLPTM1L, our bioinformatics analysis indicated that it might regulate transcription and thereby affect the expression of the gene. CLPTM1L, named for its homology with Cleft Lip and Palate Transmembrane Protein 1, which is disrupted in a family with cleft lip and palate [34], was identified as an up-regulated transcript in a cisplatin-resistant ovarian tumor cell line [13]. Over-expression of CLPTM1L mRNA has been found in many kinds of cancer [32,35,36]. The function of CLPTM1L and its role in tumorigenesis are thus far largely unknown. However, a recent study reported that CLPTM1L is a commonly overexpressed anti-apoptotic factor in lung cancer, suggested that it inhibits genotoxic stress-induced apoptosis, and therefore identified CLPTM1L as an important factor influencing the survival of DNA-damaged tumor cells and potentially lung cancer susceptibility [37].
CLPTM1L might be relevant not only in the light of its own biological activity but also because it is in LD with TERT. The entire CLPTM1L gene resides in a 62-kb region of high linkage disequilibrium (LD) that encompasses the 5′-end of TERT, its promoter [38]. TERT is the reverse transcriptase component of telomerase [39], making it essential for production of the telomerase enzyme, which is responsible for telomere regeneration [40]. Regeneration of telomeres is strongly suggested to be a vital step in the carcinogenesis of most cancers [41]. The C allele of rs401681 was reported to be associated with shorter telomere length [32], which supports its involvement in telomere biology and possibly cancer development. It is possible that the CLPTM1L polymorphism is in linkage disequilibrium with a causal locus in the TERT promoter that is as yet uncharacterized.
The results of our case-control study indicated that the minor allele T of rs401681 was significantly associated with a protective effect against lung cancer in the heterozygous, dominant and additive genetic models, but not in the homozygous model, possibly because of our relatively small sample size. Although a similar significant relationship was maintained in our meta-analysis in all genetic models, obvious between-study heterogeneity in the homozygous model cannot be ignored. A comprehensive stratified analysis was conducted to interrogate the potential source of heterogeneity. After stratification by ethnicity, all genetic models of the T variant allele were significantly associated with reduced cancer risk and heterogeneity disappeared in Caucasians, while in Asians all genetic models conferred a decreased risk without significant heterogeneity except the homozygous model. The apparent difference between the two homozygous ORs implied different allele frequencies between Asians and Caucasians, which is supported by our pooled T allele frequencies and HapMap data as shown above. In view of the differences in allele frequencies and possible differences in linkage disequilibrium patterns among populations, distinct genetic mechanisms with different effect sizes for the TT genotype may operate in the two populations. The heterogeneity in Asian studies, which was also apparent in the subgroup of replication studies, could also be explained by the diverse genotyping methods used in the replications. Because NSCLC accounts for the great majority of overall lung cancer cases (almost 85%, as mentioned above), we pooled these two types of studies to enlarge the sample size without much concern that doing so would substantially strengthen or weaken the real effect of the variant. The heterogeneity changed little in the subgroups of NSCLC and overall lung cancer, hinting that histological type was unlikely to be the source of heterogeneity.
At the same time, a more pronounced risk reduction for NSCLC suggested that the rs401681 variant might have a larger effect on NSCLC, in agreement with the observation in our case-control study. The difference in genotypic effect between these two subgroups could be partly ascribed to differences in allele frequency between histological types, with pooled T allele frequencies of 36.9% in NSCLC cases and 31.6% in overall lung cancer. On the whole, ethnicity and study design probably accounted for the heterogeneity of the meta-analysis.
Despite the clear strength of the current comprehensive analysis strategy (a case-control study and a meta-analysis), several limitations should be pointed out. Firstly, the sample size of our case-control study was relatively small. Secondly, lung cancer is a complex trait related to environmental and genetic risk factors, but insufficient environmental information prevented us from further investigating gene-environment interactions. Thirdly, in the absence of functional experiments, whether this SNP is causal remains uncertain.
In conclusion, our replication study in a Chinese population and the subsequent meta-analysis collectively confirm the genetic involvement of the rs401681 polymorphism in lung cancer susceptibility and suggest that the variant may have a stronger effect on NSCLC. However, fine-mapping of the 5p15.33 region and functional experiments are warranted to identify the causal variant.

Figure S1 Flow chart of study selection.

(TIF)
Table S1 The association between rs401681 and risk of lung cancer by smoking status and age range.