Genome-Wide Association Study Identifies a Novel Susceptibility Locus at 12q23.1 for Lung Squamous Cell Carcinoma in Han Chinese

Adenocarcinoma (AC) and squamous cell carcinoma (SqCC) are the two major histological subtypes of lung cancer. Genome-wide association studies (GWAS) have made considerable advances in understanding lung cancer susceptibility. Marked heterogeneity has been observed between histological subtypes of lung cancer, but genetic determinants specific to lung SqCC have not been systematically investigated. Here, we performed a GWAS specifically of lung SqCC in 833 SqCC cases and 3,094 controls, followed by a two-stage replication in an additional 2,223 lung SqCC cases and 6,409 controls from Chinese populations. We found that rs12296850 in the SLC17A8-NR1H4 gene region at 12q23.1 was associated with risk of lung SqCC at genome-wide significance [additive model: odds ratio (OR) = 0.78, 95% confidence interval (CI) = 0.72–0.84, P = 1.19×10−10]. Compared with subjects carrying the AA genotype, those carrying the AG or GG genotype had a 26% (OR = 0.74, 95% CI = 0.67–0.81) or 32% (OR = 0.68, 95% CI = 0.56–0.83) decreased risk of lung SqCC, respectively. However, we did not observe a significant association between rs12296850 and risk of lung AC in a total of 4,368 lung AC cases and 9,486 controls (OR = 0.96, 95% CI = 0.90–1.02, P = 0.173). These results indicate that genetic variation on chromosome 12q23.1 may specifically contribute to lung SqCC susceptibility in Chinese populations.


Author Summary
Previous genome-wide association studies (GWAS) have strongly suggested the importance of genetic susceptibility in lung cancer. However, studies of individual histological subtypes of lung cancer have been limited. We performed a GWAS specifically of lung squamous cell carcinoma (SqCC) with 570,009 autosomal SNPs in 833 SqCC cases and 3,094 controls, and replicated the findings in an additional 2,223 lung SqCC cases and 6,409 controls from Chinese populations (822 SqCC cases and 2,243 controls in the first replication stage and 1,401 SqCC cases and 4,166 controls in the second replication stage). We found a novel association at rs12296850 (SLC17A8-NR1H4) on 12q23.1. However, rs12296850 did not show a significant association with risk of lung adenocarcinoma (AC) in 4,368 lung AC cases and 9,486 controls. These results indicate that genetic variation on chromosome 12q23.1 may specifically contribute to lung SqCC susceptibility in Chinese populations.

Introduction
Lung cancer is the most commonly diagnosed cancer and the leading cause of cancer death worldwide [1]. Adenocarcinoma (AC) and squamous cell carcinoma (SqCC) are the two major histological subtypes of lung cancer [2]. Although tobacco smoking increases the risk of all major histological subtypes, the association appears to be stronger for SqCC than for AC [3]. Different spectra and frequencies of "driver" mutations have been described in lung AC and SqCC, leading to histology-specific therapies [4]. This evidence supports histology-specific pathogenesis and biological characteristics of lung cancer, and studies focused on individual histological subtypes are required to understand lung carcinogenesis.
Several large genome-wide association studies (GWAS) of lung cancer have been conducted to uncover genetic factors associated with lung cancer risk [5–15] (Table S1). Three loci, at 5p15, 6p21 and 15q25, were initially identified as contributing to lung cancer susceptibility in populations of European ancestry [5,6,16–18]. These findings have provided new clues to the mechanisms of lung cancer development. Interestingly, some of these loci showed different associations across lung cancer histologies. For example, the 5p15 locus defined by rs2736100 showed a stronger association with AC in populations of both European [7] and Asian [19] ancestry. However, most lung cancer GWAS pooled cases of multiple histological subtypes and compared them with controls in the discovery stage, making it difficult to identify histology-specific susceptibility loci because of dilution of effects.
To identify genetic variants associated with specific types of lung cancer, two GWAS of lung AC have been conducted in East Asian populations. Hsiung et al. performed a GWAS of AC with subsequent replications in never-smoking women and confirmed that rs2736100 at 5p15 is associated with risk of lung AC [20]. More recently, Miki et al. carried out a GWAS of lung AC in Japanese and Korean populations and identified a new susceptibility locus at TP63 on 3q28 [13], which has since been confirmed by subsequent studies [11,21]. In addition, Landi et al. conducted a histology-specific association study of lung cancer covering 19,802 SNPs in 917 selected genes of the HuGE-defined "inflammation" pathway, using available GWAS data from populations of European descent, and identified a locus at 12p13.33 associated with SqCC risk [15]. These findings underscore the importance of searching for susceptibility loci by lung cancer subtype.
Recently, we conducted a three-stage GWAS of overall lung cancer in Han Chinese populations and identified two new loci, at 13q12.12 and 22q12.2, that were consistently associated with multiple subtypes of lung cancer [11]. Here, to identify genetic variants across the whole genome that are specifically related to lung SqCC risk, we carried out a GWAS analysis in 833 lung SqCC cases and 3,094 controls (Nanjing study: 428 cases and 1,977 controls; Beijing study: 405 cases and 1,117 controls), and further evaluated suggestive associations with lung SqCC risk in a two-stage replication including a total of 2,223 lung SqCC cases and 6,409 controls from Han Chinese populations.

Results
After standard quality-control filtering, a total of 3,927 subjects (833 lung SqCC cases and 3,094 controls) and 570,009 SNPs were retained for GWAS analysis (Table S2). A quantile-quantile plot of P values from the additive model showed a low genomic inflation factor (λ = 1.04), suggesting a low likelihood of false-positive associations due to population substructure (Figure S1). After excluding SNPs at loci reported in our previous study [11], the −log10 P value of each SNP was plotted against its chromosomal position (Manhattan plot; Figure S2).
To further characterize the association of genetic variants at 12q23.1 with lung SqCC risk, we performed imputation analyses based on the CHB+JPT data of the 1000 Genomes Project (released in June 2010). In a 300-kb region around rs12296850, 243 imputed SNPs with imputation r² > 0.5 and MAF > 0.05 were evaluated for association with lung SqCC risk. As shown in Figure 1 and Table S6, two SNPs in strong LD with rs12296850 (r² > 0.9), rs17030141 and rs11568535, showed similar associations with lung SqCC risk (P = 6.46×10−5 and 7.43×10−5, respectively).
We further conducted stratification analyses of the association between rs12296850 at 12q23.1 and lung SqCC risk by age, gender and smoking dose. As shown in Table S7, no significant differences in the association were observed between subgroups. In addition, we did not detect a significant interaction between rs12296850 and smoking on lung SqCC risk. Similar associations were observed in the Nanjing and Shanghai, Beijing, and Shenyang populations, and no significant heterogeneity between populations was detected, although the association was not significant in the Guangzhou population (Figure S3).
To investigate whether rs12296850 is SqCC-specific, we further evaluated the association between rs12296850 and the risk of lung AC and small cell carcinoma (SCC), using the same controls as in the SqCC analysis at each stage. We found that rs12296850 was not consistently associated with lung AC risk across the three stages (GWAS: OR = 0.85, 95% CI = 0.76–0.95; Replication I: OR = 1.08, 95% CI = 0.96–1.22; Replication II: OR = 0.96, 95% CI = 0.88–1.05) (Table 2). After combining the three stages, rs12296850 was not significantly associated with lung AC risk (OR = 0.96, 95% CI = 0.90–1.02, P = 0.173). Similarly, rs12296850 was not consistently associated with lung SCC risk, with a combined OR of 0.89 (95% CI = 0.79–1.01; P = 0.073) (Table 2). These results indicate that rs12296850 at 12q23.1 may be a susceptibility locus specific to lung SqCC in Chinese populations.
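Stage-wise estimates like the ones above can be combined in several ways, and this section does not say which method produced the combined OR. As one illustration only, the sketch below assumes an inverse-variance fixed-effect meta-analysis on the log-odds scale, back-calculates each stage's standard error from its 95% CI, and reports Cochran's Q and I², a heterogeneity statistic of the kind commonly used for between-population comparisons such as the one summarized in Figure S3. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard-normal quantile


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported OR and 95% CI."""
    return math.log(odds_ratio), (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2.

    estimates: list of (OR, CI lower, CI upper) tuples, one per stage or population.
    """
    betas, weights = [], []
    for odds_ratio, lo, hi in estimates:
        beta, se = log_or_and_se(odds_ratio, lo, hi)
        betas.append(beta)
        weights.append(1.0 / se ** 2)  # inverse-variance weight
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = 1.0 / math.sqrt(sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled log(OR)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * pooled_se), math.exp(pooled + Z_95 * pooled_se)),
        "q": q,
        "df": df,
        "i2": i2,
    }


# Lung AC estimates reported above for the GWAS, Replication I and Replication II stages
ac_stages = [(0.85, 0.76, 0.95), (1.08, 0.96, 1.22), (0.96, 0.88, 1.05)]
summary = fixed_effect_meta(ac_stages)
print(summary)
```

Because the standard errors are rebuilt from CIs rounded to two decimals, and the authors may instead have pooled individual-level data, the pooled OR from this sketch should be close to, but need not equal, the reported 0.96. For these inputs Q exceeds 5.99, the 5% critical value of a χ² distribution with 2 degrees of freedom, which is in line with the stage-to-stage inconsistency described above.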
To characterize the functional relevance of rs12296850, we further evaluated the relationship of this variant with the expression levels of two nearby genes (NR1H4 and SLC17A8).

Discussion
In this study, we conducted a GWAS specifically of lung SqCC in Chinese populations and identified a novel locus at 12q23.1 (lead SNP: rs12296850) that was specifically associated with lung SqCC. In our prior GWAS of overall lung cancer, stratification analysis also showed genome-wide significant associations of loci at 3q28, 5p15.33, 13q12.12 and 22q12.2 with lung SqCC [11]. Unlike that study, which was designed for overall lung cancer and followed by a post hoc analysis of lung SqCC, the current study directly evaluated genetic variants across the genome that might be specifically associated with lung SqCC risk, and then assessed whether the identified locus was also associated with lung AC or SCC risk. This design represents an improved approach to exploring subtype-specific susceptibility loci for diseases with heterogeneous phenotypes, such as lung cancer.
We also evaluated the association of rs12296850 with SqCC risk in the lung cancer GWAS data of European descent from the MD Anderson Cancer Center (MDACC) [16]. After imputation based on the HapMap 2 CEU population, rs12296850 was not significantly associated with SqCC risk (OR = 0.80, 95% CI = 0.52–1.24; P = 0.325) in 306 SqCC cases and 1,135 controls from the MDACC GWAS. These inconsistent results may be due to the small sample size of the MDACC study and to the different genetic backgrounds of Chinese and European populations. The minor allele (G) of rs12296850 is much more common in the Chinese population (frequency > 0.20 in all three stages) than in the MDACC samples (0.053). The relatively small sample size and low allele frequency may have produced a null result because of limited statistical power. In addition, the subjects of the MDACC GWAS were all smokers, who may not be comparable to the target population of our study. At this stage, therefore, we have no substantial evidence to extend our findings to other populations, and further studies in other populations are required to confirm them.
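To make the power argument concrete, the sketch below approximates the power of a two-sided allelic (allele-count) test using a normal approximation to the difference in minor-allele frequencies. It assumes that the per-allele OR of 0.78 estimated in the Chinese samples also holds in Europeans and uses the MDACC sample sizes quoted above; the function name is hypothetical, and the allelic test is a simplification of the additive logistic model used in the study.

```python
from statistics import NormalDist


def allelic_power(control_maf, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic case-control test.

    Uses a normal approximation to the difference in minor-allele frequencies;
    the small contribution of the opposite rejection tail is ignored.
    """
    # Minor-allele frequency in cases implied by the per-allele odds ratio
    case_odds = odds_ratio * control_maf / (1.0 - control_maf)
    case_maf = case_odds / (1.0 + case_odds)
    # Each subject contributes two alleles to the comparison
    variance = (case_maf * (1.0 - case_maf) / (2 * n_cases)
                + control_maf * (1.0 - control_maf) / (2 * n_controls))
    effect_z = abs(case_maf - control_maf) / variance ** 0.5
    critical_z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(effect_z - critical_z)


# Same assumed OR and MDACC sample sizes, at the two allele frequencies discussed above
power_european = allelic_power(0.053, 0.78, 306, 1135)
power_chinese_freq = allelic_power(0.20, 0.78, 306, 1135)
```

Under these assumptions, power at the European allele frequency is only around 0.2 at nominal α = 0.05, whereas at a frequency of 0.20 with the same sample sizes it rises above 0.5, which illustrates how much the lower allele frequency alone reduces power.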
Genomic alterations on chromosome 12q23 have been frequently linked to a spectrum of cancers, including non-small cell lung cancer (NSCLC), prostate cancer, adenoid cystic carcinoma, oligodendroglioma and colorectal carcinoma [22–26]. In NSCLC, cigarette smoking dose has been associated with copy number alterations at 12q23 [26]. In addition, chromosomal gains at 12q23–24.3 facilitated tumor progression and metastasis of lung SqCC and may serve as potential predictors for this disease [27]. Together with our findings, this evidence suggests that chromosome 12q23 is important in the development of lung cancer, especially SqCC.
At 12q23.1, the lead SNP rs12296850 is located 4.2 kb downstream of SLC17A8 (encoding vesicular glutamate transporter 3) and 47.6 kb upstream of NR1H4 (encoding a ligand-activated transcription factor). Correlation analysis indicated that this SNP may be associated with the expression of NR1H4, which encodes the nuclear farnesoid X receptor (FXR). FXR is a member of the nuclear receptor family of transcription factors and is highly expressed in the entero-hepatic system, where it transcriptionally regulates bile acid and lipid metabolism [28]. Bile acids are natural ligands of FXR, and the bile acid-FXR interaction has been suggested to be involved in the pathophysiology of a number of inflammation-associated cancers [29,30]. Loss of FXR increased tumor progression by promoting Wnt signaling through infiltrating neutrophils and macrophages, and elevated tumor necrosis factor α (TNFα) production in vivo [30]. Furthermore, FXR is involved in CYP regulation through mutual repression with NF-κB, which indirectly regulates the transcription of CYP genes [31]. Further studies are required to elucidate the potential role of NR1H4 in SqCC development.
SLC17A8 (also known as vesicular glutamate transporter type 3, VGLUT3) is a member of the solute carrier (SLC) superfamily, which encodes multiple transmembrane transporters that may be involved in the development and progression of a number of diseases, including cancers [32]. Genetic variants in the urea transporter (UT) gene SLC14A were reported to be significantly associated with susceptibility to urinary bladder cancer in a GWAS of a European population, whereas SLC5A8 may function as a tumor suppressor gene whose epigenetic silencing may contribute to the carcinogenesis and progression of pancreatic cancer [33,34]. However, SLC17A8 expression was very low in both lung tumor and adjacent non-tumor tissues, and whether this gene is involved in SqCC development remains unclear. In addition, SCYL2 and GAS2L3 are two other genes located at a greater distance from rs12296850. SCYL2 (also known as CVAK104), located 86.2 kb upstream of rs12296850, encodes a 104-kDa coated vesicle-associated kinase. SCYL2 can regulate the levels of frizzled 5 (Fzd5) by inducing its lysosomal degradation, which probably inhibits the Wnt signaling pathway [35]. GAS2L3, located 147.4 kb downstream of rs12296850, encodes a protein with putative actin- and microtubule-binding domains. GAS2L3 was reported to localize to the spindle midzone during anaphase and to the midbody during cytokinesis, to act as a novel target of DREAM, and to play an important role in accurate cell division [36]. However, expression quantitative trait loci (eQTL) analysis did not reveal any significant correlation between rs12296850 and the expression of these two genes.
In this GWAS of lung SqCC in Chinese populations, we report evidence that common genetic variants at 12q23.1 are implicated in the development of lung SqCC. Our findings highlight the importance of studying individual subtypes of lung cancer and may provide new insight into the mechanisms of SqCC. Further studies, such as resequencing of this region followed by fine mapping, eQTL analysis in lung tissues and biochemical assays, may help to identify the causal variants at 12q23.1 that directly influence the development of lung SqCC. In addition, the moderate sample size of the GWAS scan stage may have reduced the statistical power of the current study, and further studies with larger sample sizes, or pooling of multiple studies, may identify additional SqCC-specific loci.

Study populations
A three-stage case-control study was designed to evaluate the associations between genetic variants across the human genome and the risk of lung SqCC. Study subjects for the GWAS scan of lung cancer and the two-stage replication have been described elsewhere [11]. Briefly, newly diagnosed lung cancer cases were recruited from hospitals, and the histology of each case was confirmed histopathologically or cytologically by at least two local pathologists. Cancer-free control subjects were recruited from individuals receiving routine physical examinations in local hospitals or from community members participating in screening for noncommunicable diseases. The controls were frequency-matched to lung cancer cases by age, gender and geographic region. Demographic information was collected through interviews using a standard questionnaire. Individuals were defined as smokers if they had smoked an average of one cigarette or more per day for at least one year during their lifetime; otherwise, they were considered nonsmokers. Smokers who had quit for at least one year before recruitment were considered former smokers. Both current and former smokers were divided into light and heavy smokers using a threshold of 25 pack-years (the median value among the controls).
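The smoking definitions above translate directly into a small classification rule. The sketch below is illustrative only: it assumes the usual convention of 20 cigarettes per pack, treats anyone who quit less than one year before recruitment as a current smoker, and places a value of exactly 25 pack-years in the light group, since the text does not specify these boundary cases. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PACK_YEAR_THRESHOLD = 25.0  # median pack-years among controls, as stated in the text
CIGARETTES_PER_PACK = 20    # assumption: a standard 20-cigarette pack


@dataclass
class SmokingHistory:
    cigarettes_per_day: float          # lifetime average over the smoking period
    years_smoked: float
    years_since_quit: Optional[float]  # None if still smoking at recruitment


def pack_years(h: SmokingHistory) -> float:
    return h.cigarettes_per_day / CIGARETTES_PER_PACK * h.years_smoked


def classify_smoking(h: SmokingHistory) -> str:
    """Return 'nonsmoker' or '<current|former>, <light|heavy>'."""
    # Smoker: on average >= 1 cigarette per day for >= 1 year
    if h.cigarettes_per_day < 1 or h.years_smoked < 1:
        return "nonsmoker"
    # Former smoker: quit >= 1 year before recruitment; anyone else is treated as current
    quit_long_enough = h.years_since_quit is not None and h.years_since_quit >= 1
    status = "former" if quit_long_enough else "current"
    # Assumption: exactly 25 pack-years is classified as light
    dose = "heavy" if pack_years(h) > PACK_YEAR_THRESHOLD else "light"
    return f"{status}, {dose}"
```

For example, a subject who smoked 20 cigarettes a day for 30 years and still smokes has 30 pack-years and is classified as "current, heavy".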
All patients with lung SqCC and all controls included in our previous GWAS of overall lung cancer [11] served as the cases and controls in the current study. As a result, 833 SqCC cases and 3,094 controls were included in the GWAS scan stage: 428 cases and 1,977 controls from Nanjing and Shanghai (Nanjing Study), and 405 cases and 1,117 controls from Beijing (Beijing Study). The first replication stage (Replication I) included 822 SqCC cases and 2,243 controls from Nanjing and Shanghai (235 cases and 754 controls) and Beijing (587 cases and 1,489 controls). The second replication stage (Replication II) included 1,401 SqCC cases and 4,166 controls from Nanjing and Shanghai (238 cases and 1,069 controls), Beijing (362 cases and 936 controls), Shenyang (306 cases and 1,027 controls) and Guangzhou (495 cases and 1,134 controls).

Ethics statement
All study subjects provided informed consent and each study was approved by its respective institution's IRB.

Quality control (QC) in GWAS
A total of 906,703 SNPs were genotyped in the GWAS scan in 844 lung SqCC cases and 3,160 controls using Affymetrix Genome-Wide Human SNP Array 6.0 chips, as described previously [11]. A systematic quality control (QC) procedure was applied to both SNPs and samples before association analysis. SNPs were excluded if they (i) did not map to autosomal chromosomes; (ii) had a call rate < 95%; (iii) had a minor allele frequency (MAF) < 0.05; or (iv) deviated from Hardy-Weinberg equilibrium (P < 1×10−5 in all GWAS samples or P < 1×10−4 in either the Nanjing Study or the Beijing Study samples). We removed samples with low genotype call rates (< 0.95; 3 subjects) and ambiguous gender (4 subjects). Unexpected duplicates or probable relatives (52 subjects), identified by pairwise identity-by-state comparisons, were also excluded according to their PI_HAT value in PLINK (all PI_HAT > 0.25). Heterozygosity rates were calculated, and samples more than 6 s.d. from the mean were excluded (12 subjects). We detected population outliers using a method based on principal component analysis, and 6 subjects were removed. As a result, 833 lung SqCC cases and 3,094 controls with 570,009 SNPs remained after QC.
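The SNP filters (i) to (iv) and the heterozygosity rule can be sketched as follows, starting from per-SNP genotype counts. This is an illustration of the thresholds, not the pipeline that was actually run: it substitutes a simpler 1-df chi-square test for the exact Hardy-Weinberg test that PLINK typically uses, and it checks one set of samples at a time. The per-study criterion can be applied by calling `snp_passes_qc` on each study's counts with `min_hwe_p=1e-4`. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev

AUTOSOMES = {str(c) for c in range(1, 23)}


@dataclass
class SnpGenotypeCounts:
    chrom: str
    n_hom_ref: int
    n_het: int
    n_hom_alt: int
    n_missing: int


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no test possible
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def snp_passes_qc(s, min_call_rate=0.95, min_maf=0.05, min_hwe_p=1e-5):
    if s.chrom not in AUTOSOMES:                                  # (i) autosomal only
        return False
    n_called = s.n_hom_ref + s.n_het + s.n_hom_alt
    if n_called == 0 or n_called / (n_called + s.n_missing) < min_call_rate:  # (ii)
        return False
    alt_freq = (2 * s.n_hom_alt + s.n_het) / (2 * n_called)
    if min(alt_freq, 1.0 - alt_freq) < min_maf:                   # (iii) MAF
        return False
    return hwe_chi2_p(s.n_hom_ref, s.n_het, s.n_hom_alt) >= min_hwe_p  # (iv) HWE


def heterozygosity_outliers(het_rate_by_sample, n_sd=6.0):
    """Sample IDs whose heterozygosity rate lies more than n_sd s.d. from the mean."""
    rates = list(het_rate_by_sample.values())
    m, sd = mean(rates), stdev(rates)
    return [sid for sid, h in het_rate_by_sample.items() if abs(h - m) > n_sd * sd]
```

Note that with the outlier included in the mean and s.d., a single sample can only exceed 6 s.d. when the sample size is reasonably large, which is the case for a GWAS of several thousand subjects.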

SNP selection and genotyping in the replication study
After the genome-wide association analysis, we selected SNPs for the first replication stage based on the following criteria: (i) P ≤ 1.0×10−4 in all GWAS samples; (ii) consistent associations in the Nanjing study and the Beijing study at P ≤ 1.0×10−2; (iii) location outside the chromosome regions or genes of SNPs reported in previous GWAS; (iv) clear genotyping clusters; and (v) when multiple SNPs were in strong linkage disequilibrium (LD) (r² ≥ 0.8), only the SNP with the lowest P value was selected. As a result, 23 SNPs satisfied criteria (i) through (iv), and 14 of them survived criterion (v). We therefore genotyped these 14 SNPs in the first replication stage (Table S3); the other 9 SNPs, which were in strong LD with the 14 selected SNPs, were excluded from further analysis (Table S4). SNPs showing significant associations with lung SqCC risk (P < 0.05) in the first replication stage were selected for the second replication stage.
Genotyping was performed using the TaqMan OpenArray Genotyping Platform (Applied Biosystems, Inc.) and the iPLEX Sequenom MassARRAY platform (Sequenom, Inc.) for SNPs selected in the first replication stage, and the TaqMan allelic discrimination assay (Applied Biosystems, Inc.) for SNPs selected in the second replication stage. Several measures were used to control genotyping quality: (i) case and control samples were mixed on each plate and genotyped without knowledge of case or control status; (ii) two water controls on each plate served as blank controls; (iii) five percent of the samples were randomly selected for repeat genotyping as blind duplicates, with 100% reproducibility; and (iv) 1,347 randomly selected samples were genotyped for rs12296850 with both the TaqMan OpenArray platform and the TaqMan assay, yielding a concordance rate of 99.97%.

Statistical analysis
The statistical analysis methodology of our lung cancer GWAS has been described previously [11]. In brief, genome-wide association analysis was performed using logistic regression under an additive model as implemented in PLINK 1.07 (see URLs). EIGENSTRAT 3.0 was used for principal component analysis of population structure. Minimac (see URLs) was used to impute untyped SNPs, using the CHB+JPT data from the hg18/1000 Genomes database (released in June 2010) as the reference set. The regional plot was generated using LocusZoom 1.1 (see URLs). R software (version 2.11.1; The R Foundation for Statistical Computing) was also used for statistical analysis and for generating plots, including the Q-Q plot and the Manhattan plot.
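PLINK performs the per-SNP test internally. The self-contained sketch below only illustrates what an additive-model logistic regression computes for a single SNP when no covariates are included (genotype counts as input, Newton-Raphson fitting, Wald test), and how the genomic inflation factor λ reported in the Results is conventionally derived from the resulting P values. The function names are hypothetical, and any covariate adjustment used in the actual analysis is not reproduced here.

```python
import math
from statistics import NormalDist, median


def fit_additive_logistic(case_counts, control_counts, max_iter=50, tol=1e-10):
    """Per-allele odds ratio from genotype counts under an additive (0/1/2) model.

    case_counts, control_counts: numbers of subjects carrying 0, 1 and 2 minor alleles.
    Fits logit P(case) = b0 + b1 * dosage by Newton-Raphson, without covariates.
    Returns (odds_ratio, ci_low, ci_high, wald_p).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for dosage, (n_case, n_ctrl) in enumerate(zip(case_counts, control_counts)):
            n = n_case + n_ctrl
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * dosage)))
            resid = n_case - n * p      # contribution to the score vector
            w = n * p * (1.0 - p)       # contribution to the Fisher information
            g0 += resid
            g1 += resid * dosage
            h00 += w
            h01 += w * dosage
            h11 += w * dosage * dosage
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 @ score
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    se = math.sqrt(h00 / det)  # var(b1) = [information^-1]_11 from the last iteration
    z95 = NormalDist().inv_cdf(0.975)
    wald_p = math.erfc(abs(b1) / se / math.sqrt(2.0))  # two-sided normal tail
    return math.exp(b1), math.exp(b1 - z95 * se), math.exp(b1 + z95 * se), wald_p


def genomic_inflation(p_values):
    """Lambda: median observed 1-df chi-square divided by its expected median (~0.4549)."""
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]  # requires 0 < p <= 1
    return median(chi2) / std_normal.inv_cdf(0.75) ** 2
```

For example, `fit_additive_logistic((100, 50, 25), (100, 100, 100))` describes a toy dataset in which the odds of being a case halve with each additional minor allele, so the fitted per-allele OR is 0.5. A λ close to 1, as for the 1.04 reported in the Results, indicates that the median test statistic is close to what is expected under the null.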

Tissue samples
To determine the expression levels of NR1H4 and SLC17A8, we collected 46 paired lung cancer tissue samples from patients who underwent resection at the Nantong Cancer Hospital between June 2009 and April 2010. All cases had histopathologically diagnosed lung cancer and had received no radiotherapy or chemotherapy before surgery.

Quantitative reverse transcription polymerase chain reaction
Quantitative reverse transcription polymerase chain reaction (qRT-PCR) was performed to determine the mRNA expression of NR1H4 and SLC17A8. RNA from lung tumor and adjacent non-tumor tissues was isolated with Trizol reagent (Invitrogen). We used TaqMan gene expression probes (Applied Biosystems Inc.) for the qRT-PCR assay. All real-time PCR reactions, including no-template controls and real-time minus controls, were run on the ABI7900 Real-Time PCR System (Applied Biosystems Inc.) in triplicate. The β-actin gene was used to normalize expression levels. Relative expression was calculated as $2^{-\Delta C_t}$ ($C_t$, cycle threshold), where $\Delta C_t = C_{t,\mathrm{gene}} - C_{t,\beta\text{-actin}}$.
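The normalization above amounts to a one-line calculation per sample. In the minimal sketch below, the triplicate Ct values are averaged before the difference is taken; that averaging step is an assumption, because the text states only that reactions were run in triplicate. The function name is hypothetical.

```python
from statistics import mean


def relative_expression(target_cts, reference_cts):
    """Relative expression 2**(-dCt), with dCt = Ct(target gene) - Ct(beta-actin).

    target_cts / reference_cts: technical-replicate Ct values for one sample.
    Averaging the triplicates before taking the difference is an assumption.
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** (-delta_ct)
```

For example, a target gene whose mean Ct is 3 cycles later than that of β-actin has a relative expression of 2^−3 = 0.125. Because each cycle corresponds to a doubling, differences in ΔCt translate directly into log2-scale expression differences.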