Genome-wide association meta-analysis identifies pleiotropic risk loci for aerodigestive squamous cell cancers

Squamous cell carcinomas (SqCC) of the aerodigestive tract have similar etiological risk factors. Although genetic risk variants for individual cancers have been identified, an agnostic, genome-wide search for shared genetic susceptibility has not been performed. To identify novel and pleotropic SqCC risk variants, we performed a meta-analysis of GWAS data on lung SqCC (LuSqCC), oro/pharyngeal SqCC (OSqCC), laryngeal SqCC (LaSqCC) and esophageal SqCC (ESqCC) cancers, totaling 13,887 cases and 61,961 controls of European ancestry. We identified one novel genome-wide significant (Pmeta<5x10-8) aerodigestive SqCC susceptibility loci in the 2q33.1 region (rs56321285, TMEM273). Additionally, three previously unknown loci reached suggestive significance (Pmeta<5x10-7): 1q32.1 (rs12133735, near MDM4), 5q31.2 (rs13181561, TMEM173) and 19p13.11 (rs61494113, ABHD8). Multiple previously identified loci for aerodigestive SqCC also showed evidence of pleiotropy in at least another SqCC site, these include: 4q23 (ADH1B), 6p21.33 (STK19), 6p21.32 (HLA-DQB1), 9p21.33 (CDKN2B-AS1) and 13q13.1(BRCA2). Gene-based association and gene set enrichment identified a set of 48 SqCC-related genes rel to DNA damage and epigenetic regulation pathways. Our study highlights the importance of cross-cancer analyses to identify pleiotropic risk loci of histology-related cancers arising at distinct anatomical sites.


Introduction
The squamous cell carcinomas (SqCC) of the aerodigestive tract [1], lung squamous cell carcinoma (LuSqCC) and head and neck cancers (HNC, >90% SqCCs) including; oral/pharyngeal SqCC (OSqCC), larynx SqCC (LaSqCC), and esophageal SqCC (ESqCC); are strongly associated with common risk factors such as tobacco smoking, alcohol consumption and human papilloma virus (HPV) infection [2]. Similarly, recent molecular characterization studies across anatomically distinct SqCCs have shown that histology is more important than tissue of origin in defining tumor molecular profiles determined by shared features including somatic mutations, copy number alternations, deregulation of DNA methylation and/or gene expression [2][3][4].
Along with behavioral risk factors, it is increasingly recognized that inherited factors also play a role in aerodigestive SqCC risk. Previous genome-wide association studies (GWAS) have identified multiple genetic risk variants for individual aerodigestive SqCC types; notably variants in smoking-related genes at 15q25.1 for LuSqCC [5][6][7] and 4q23 variants in alcoholrelated genes for upper aerodigestive tract (UADT) cancers [8]. Importantly, candidate-gene and GWAS studies have previously described rare genetic variants linked to aerodigestive SqCC risk; including variants near BRCA2 (13q13.1), first identified as a risk factor for ESqCC in Middle Eastern populations [9] and later described to increase risk of LuSqCC [10] and UADT SqCC in Europeans [11]. Similarly, at 22q12.1 another rare missense variant within CHEK2 (rs17879961, p.Ile157Thr) has been linked to reduced risk of lung and UADT SqCCs [10][11][12][13]. Such studies provide evidence of genetic pleiotropy across aerodigestive SqCCs, as these variants exert cross-cancer effects possibly related to similar underlying mechanisms (i.e. DNA repair). Furthermore, a recent large-scale genome-wide genetic correlation analysis across six solid tumors (breast, colorectal, head/neck, lung, ovary and prostate cancer), highlighted that the strongest genetic correlation was between lung and head and neck cancers [14].
Collectively aerodigestive SqCC are an important public health issue; not only because as a group are amongst the most common type of solid tumors [2], but also due to the increasing global incidence of HPV-related head and neck SqCCs [15]. Identifying genetic risk loci that can have pleiotropic effects across aerodigestive SqCC sites is important for gaining insight into shared or divergent molecular basis of different tumors. To further examine pleotropic risk genomic regions across LuSqCC [16], OSqCC [17], LaSqCC [8] and ESqCC [8] and to 15)

Overview
We performed GWAS meta-analysis on aerodigestive SqCC risk including 13,887 cancer cases and 61,961 non-overlapping controls of European ancestry. The SqCC cases comprised 7,426 LuSqCC, 5,452 OSqCC, part of the OncoArray Consortium [16][17][18], and additional 693 LaSqCC and 316 ESqCC previously included in a upper aerodigestive cancer GWAS [8] ( Table 1). Summary associations statistics were used to perform fixed-effects (F-E) and a subset-based meta-analyses using the ASSET software [19]. This approach allows exploration of all possible subsets of studies to identify the strongest association signal, while accounting for subset search multiple testing, and adjusting standard errors to account for overlapping controls between analyses; partial overlap (N = 2,500) between LuSqCC and OSqCC and complete overlap between the ESqCC and LaSqCC. After quality control steps, 8,468, 885 genetic variants with summary statistics in at least three of the four interrogated SqCC types were used for analyses. The quantile-quantile plot (F-E meta-analysis, S1 Fig) shows little evidence of genomic inflation after correcting for sample size (λ = 1.006). Loci that reached P meta < 5x10 -7 were considered noteworthy; meta-analysis results for all SNPs below P<5x10 -5 are shown in S1 Table. From noteworthy loci, those not previously reported in the lung [16] or oral/pharyngeal [17] analyses (single-cancer P > 5x10 -7 ) were considered as novel SqCC regions. We identified one novel aerodigestive SqCC loci at genome-wide significance (F-E meta-analysis, P meta < 5x10 -8 ) within 2q33.1. We detected suggestive associations with SqCC risk at 1q32.1, 5q31.2 and 19p13.11, not detected in previous analyses (Fig 1 and Table 2). Other loci that reached P meta <5x10 -7 were considered pleiotropic if these had at least two cancer sites at P<5x10 -4 and the same effect direction in all tumor sites. Using these criteria, the loci categorized as pleiotropic (4q23, 6p21.32, 6p21.33, 6p22, 9p21.3 and 13q13.1) included 108 SNPs (S2 Table), the lead SNP (lowest P meta ) for each of these regions is shown in Table 3. In contrast, other known cancer regions that reached the GWAS threshold (12p13. 33, 15q25, 19q13.2) or P meta < 5x10 -7 (4p14, 9q34.1, 10q24.31, 11q21, 15q15.3) in the SqCC meta-analysis were not pleiotropic (Fig 1). We did not observe additional associations reaching the GWAS threshold or suggestive significance in the ASSET subset-based meta-analysis, indicating that at least for the strongest associations, the effects have consistent direction across the examined aerodigestive SqCC types. For noteworthy SNPs, we performed expression quantitative trait (eQTL) analyses with normal lung tissues from the multicenter Lung Microarray Study (S3 Table). We also query these variants in multiple public genomic annotation databases. The Genotype-Tissue Expression (GTEx) for lung and esophageal eQTLs (S4 Table). ROADMAP and the Encyclopedia of DNA Elements (ENCODE) for epi/genomic annotations (S5 Table). The NHGRI-EBI GWAS Catalog (S6 Table) for disease/phenotype associations and the COSMIC catalogue for cancer somatic mutation information (S7 Table). Lastly, we performed a genome-wide genebased association analysis (GWGAS) of the SqCC meta-analyses results using MAGMA (Multi-marker Analysis of GenoMic Annotation) [20] (S8 Table). To map individual SNPs to genes we used the Functional Mapping and Annotation (FUMA, S9 Table) [21]. Overlapping genes from these were used to assemble a list of aerodigestive SqCC genes (S10 Table) used in enrichment analyses (S11 Table). The funders did not participate in study design, data collection and analysis, decision to publish or preparation of the manuscript.

PLOS GENETICS
rs10931936 is in LD with nearby CASP8-ALS2CR12 variants which have been previously linked with risk of multiple cancers in Europeans [22], as well as esophageal and lung cancer in Chinese populations [23,24]. CASP8 plays an important role in apoptosis; mutations in this gene have been described in 2% of LuSqCC and 6% of UADT SqCC tumors (S7 Table) [25]. However, the 2q33.1 genome-wide significant SNP (rs56321285) associated with aerodigestive SqCC risk, seems independent from CASP8-ALS2CR12 variants. In eQTL analyses using lung tissues Lung Microarray Study (S3 Table), rs56321285 is a nominally significant cis-eQTL for AOX2P (Laval and Groningen datasets) and CDK15 (Laval and UBC datasets). However, rs56321285 is not a lung or esophageal eQTL in the GTEx catalog [26]. Regulatory annotations from ENCODE [27] and ROADMAP [28] are consistent with rs56321285 mapping to a H3K4me1 enhancer in lung fibroblasts (S5 Table). The lead SNP (rs12133735) at 1q32.1; the G allele was associated with increased risk of aerodigestive SqCC (OR meta = 1.08, P meta = 2.16x10 -7 , Table 2 and Fig 2B), predominantly driven by the LuSqCC (OR = 1.07, P = 4.63x10 -4 ) and OSqCC (OR = 1.14, P = 9.82x10 -6 ) results. rs12133735 is located 3' of MDM4 (S3 Fig) and is a MDM4 eQTL in all datasets from the lung eQTL study (S3 Table and Table). MDM4 is a crucial negative regulator of p53 and its upregulation has been described as a common p53 inactivation mechanism in tumors [29,30]. In contrast, in our analyses rs12133735-G is associated with lower MDM4 expression in lung tissues and increased SqCC risk. However, the regulation of MDM4 expression and interaction with p53 involves complex mechanisms (including alternative splicing) and are reported to differ between normal and cancer tissues [29]. In Europeans, rs12133735[G] is in moderate LD (r 2 = 0.61, 1KG) with rs4245739 [C] (SqCC OR meta = 1.07, P = 7.62x10 -6 ) which has been associated with increased risk of triple
Expectedly, two other previously reported rare variants at 4q23 (rs1229984, ADH1B) and 13q13.1 (rs11571815, BRCA2) also displayed pleiotropy in this analyses [10][11][12] [44] (Fig 1,  Table 3). Of note, we did not observe pleiotropy for variants at 15q25.1, a known locus related to lung cancer [5] and smoking behavior [45]. The lead SNP in this region rs55781567 (CHRNA5) was prominent for SqCC (OR meta = 1.19; P meta = 1.74x10 -29 ), but this result was primarily driven by the LuSqCC (OR = 1.3; P = 4.6x10 -41 ) with no effect in any other SqCC site (OSqCC P = 0.26; LaSqCC P = 0.68 and ESqCC P = 0.7, S1 Table and S9 Fig). Variants at 15q25 have been interrogated before in relation to upper aerodigestive cancer risk (including samples used in this study) [46]; significant associations were found only in women and unrelated to smoking behavior suggesting that 15q25.1 SNPs relate differently to LuSqCC compared to oral/oropharyngeal SqCCs. Importantly, none of the published HNC GWAS in Chinese or Europeans have reported associations with 15q25 variants. Thus, while the relation between this locus with smoking behavior and lung cancer is unequivocal; to date there is no evidence of a clear link between 15q25 and head and neck cancer. Future studies including more cases and stratified analyses should examine this further.

Aerodigestive SqCCs risk genes and pathways
To gain further functional insight into aerodigestive SqCC genetic susceptibility, we used the results of the F-E meta-analyses to map risk variants to genes (FUMA) and to perform a genome-wide gene-based association analysis (GWGAS) using MAGMA. The SNP to gene analyses highlighted 182 genes within 21 genomic regions (S8 Table) and the gene-based analysis identified 51 significant genes related to aerodigestive SqCC (Bonferroni correction P<2.67x10-6, S9 Table). Next, we overlapped the results from the two analyses and obtained a list of 48 SqCC-related genes (S10 Table), which includes TMEM237 (2q33.1), MDM4 (1q32.1), AC138517.1 (5q31.2) and BABAM1 (19p13.11) located within the pleotropic SqCC risk regions identified in the meta-analyses. Expectedly, the HLA region in chromosome 6 had the highest number of genes mapped (24), however and interestingly the most prominent signals were located within 6p22 and included >10 histone genes. Gene-set enrichment analyses using these SqCC-related genes and canonical pathways resulted in 63 significant gene sets (S11 Table) most of which mapped to DNA damage pathways (telomere, checkpoint, oxidative stress and strand break response) as well as epigenetic regulation pathways related to histones and DNA methylation.

Discussion
This study identified one novel genome-wide significant loci associated with aerodigestive SqCC risk (2q33.1) . Four other loci (1q32.1, 5q31.2 and 19p13.11) showed suggestive associations with SqCC. Amongst known SqCC loci, four showed evidence of pleiotropy across cancer sites. Our results demonstrate the power of cross-cancer analyses of histologically-related tumors to identify genetic risk loci.
It is notable that many of the detected associations are plausibly related to cancer risk. Our results from 2q33.1 and 5q31.2 combined with previous evidence [23,24,38] indicate that these loci relate to SqCC risk across distinct ancestries (Asians and Europeans). Moreover, the signal at 2q33.1 was also proximal to CASP8-ALS2CR12, a region previously associated with other cancers [47,48]. Likewise, the 1q32.1 and 19p13.11 genomic regions implicate genes like MDM4 and BABAM1, which have been previously associated with risk of other epithelial malignancies including breast, prostate and ovary [31,36,39,49]. Lastly, these observations not only make our findings more plausible but also expand our understanding of cross-cancer genetic susceptibility and complex biology behind these associations.
The top associations displayed homogenous effect direction across SqCC sites and stronger associations in the F-E meta-analyses compared to the subset-based meta-analyses. This could relate to the shared histology and risk factors of aerodigestive SqCCs. Nonetheless, we cannot rule out heterogeneous SqCC associations that we were not able to detect in our data. However, and not surprisingly, for these loci effect sizes were small (range of OR meta 0.89-1.09) which limited association detection in the smaller single cancer-analysis using commonly applied GWAS P-values thresholds. In contrast, most of the known loci that exhibited pleiotropy in our analysis have larger effects sizes, particularly true for the less common variants within BRCA2 and CHEK2.
Our study has several major strengths. Firstly, we leveraged available European GWAS data sets to perform a large-scale meta-analysis of aerodigestive SqCC risk. Secondly, we analyzed tissue-specific gene expression data from multiple studies and integrated these data with publicly available information on epigenetic regulatory profiles of relevant tissues to aerodigestive SqCCs. Thirdly, for the newly discovered loci we also integrated our results with existent data from genetic susceptibility studies in other populations as well as available tumor repository information. However, this study has a number of limitations. The sample sizes for laryngeal and esophageal SqCC were very limited; this impacted our power to identify more signals at the GWAS threshold. The described associations (particularly those at P > 5x10 -8 ) could be spurious due to the high testing burden and lack of replication; other studies should examine these regions further to replicate these results. Our criteria to identify pleiotropic loci tried to capture robust loci across multiple aerodigestive SqCC while accommodating for the sample size imbalances across tumor sites. We recognize that this approach did not fully account for multiple testing and could have missed some pleiotropic regions. Pleiotropic studies are limited by sample size of existent GWAS data, as well as the frequency of variants in these regions. Future studies should investigate this further using lager samples, different methodology, and if possible, including SqCCs from other sites (e.g. cervical, anal and bladder). Also, our analyses were restricted to individuals of European ancestry, performing a similar analysis including other genetic backgrounds offers the potential to pinpoint loci that exert effects across ethnicities. In summary, we provide evidence for one new locus (2q33.1) influencing aerodigestive SqCC risk, and highlight loci for future investigation. Future work should investigate the biological mechanisms underscoring these associations to unearth shared and divergent molecular features of these histologically similar tumors.

Ethics statement
Informed written consent was obtained from all participants, and all contributing studies have been approved by the IARC Institutional Review Board (IRB; references: 14-03, 13-17, 07-02) which requires to obtain local ethics committees approvals prior to their enrolment and evaluation.

Study population
This meta-analysis includes data from three previous studies of lung squamous cell [16], oral/ pharyngeal [17] and upper aerodigestive tract (UADT) cancers [8], totaling 13,887 cases and 61,961 non-overlapping controls. The characteristics and references for each study are summarized in Table 1. The SqCC cases comprise 7,426 LuSqCC, 5,452 OSqCC, part of the OncoArray Consortium (http://epi.grants.cancer.gov/oncoarray/) [16,17], and additional 693 LaSqCC and 316 ESqCC previously included in a upper aerodigestive cancer GWAS [8]. Controls partially overlapped (N = 2,500) between the LuSqCC and OSqCC analyses, and completely overlap (N = 2,847) between the ESqCC and LaSqCC analyses. For this analysis, GWAS summary-statistics for single-site SqCCs were derived using only individuals of European ancestry across multiple epidemiological studies from Europe, North and South America.

Genotyping and imputation
For each of the studies, genomic DNA samples were previously isolated from blood or buccal cells. Genotyping for the lung and oral/pharyngeal cancers OncoArray Consortium [18] studies, was performed at the Center for Inherited Disease Research (CIDR) using the Illumina OncoArray custom designed for cancer studies. Genotype calls were made in GenomeStudio software (Illumina) using a standardized cluster file for OncoArray studies. The esophageal and larynx cancer cases and controls from the upper aerodigestive tract GWA study [8] were genotyped using the Illumina Sentrix HumanHap300 BeadChip at the Centre d'Etude du Polymorphisme Humain (CEPH) and the Centre National Genotypage (CNG Evry France) as previously described [8]. Genotype data have been deposited dbGaP (https://www.ncbi.nlm.nih. gov/gap/) accession number phs001202.v1.p1 for the oral and pharyngeal study [17] and for the lung data [16] accession numbers phs001273.v3.p2 and phs000876.v2.p1. The lung cancer GWA study [16] was imputed using the 1000 genomes reference panel (phase3) (http:// phase3browser.1000genomes.org/index.html/) and the oral/pharyngeal cancer, larynx and esophageal cancer GWAS were imputed using the Haplotype Reference Consortium Panel [50] (http://www.haplotype-reference-consortium.org/) using the University of Michigan Imputation Server [51] (https://imputationserver.sph.umich.edu/). Only variants with imputation quality of R2 > 0.3 were used in the meta-analysis.

Summary association statistics and meta-analyses
Cancer risk association results from two previous OncoArray Consortia studies (LuSqCC [16] and OSqCC [17]) and the esophageal and laryngeal analyses ORs, P-values and standard errors for each SNP for each cancer site were obtained using logistic regression with a log additive models adjusted for age, sex and principal components using plink2 [52] (https://www.coggenomics.org/plink2/) and R [53] (http://www.r-project.org/). Summary statistics for the lung SqCC data are deposited in dbGaP (phs001273.v3.p2). The oral and pharyngeal GWAS summary statistics by cancer site and world region have been deposited in the IEU Open GWAS platform (https://gwas.mrcieu.ac.uk/) under the GWAs IDs: ieu-b-89, ieu-b-90, ieu-b-94, ieub-96, ieu-b-93, ieu-b-97, ieu-b-91, ieu-b-95 and 98. Meta-analyses were performed using a fixed-effects (F-E) and subset-based meta-analysis using the ASSET software tool [19] (https:// dceg.cancer.gov/tools/analysis/asset/). ASSET allows exploration of all possible subsets of studies to identify the strongest association signal, while accounting for subset search multiple testing, and adjust standard errors to account for overlapping controls between analyses; partial overlap (N = 2,500) between the LuSqCC and OSqCC and complete overlap between the ESqCC and LaSqCC analyses. Meta-analysis for a SNP was performed when at least three cancer sites had association results. P-values from both analyses were two-sided. Meta-analyses results for fixed-effects and subset-based were considered noteworthy if these reach P<5x10 -7 . Loci were considered as new if these had not been previously reported in the single SqCC cancers analysis (P>5x10 -7 for any single site). Loci with previously reported LuSqCC or OSqCC were characterized a pleiotropic if: 1) P meta <5x10 -7 ; 2) two single cancer association results at P<5x10 -4 and consistent effect direction across all cancer sites. All analyses were performed using the R statistical environment version 3.4.3 [53]. Linkage disequilibrium (LD) calculations (R 2 ) were performed using the LDlink [54] tool and the 1000 Genomes Project European ancestry populations. Regional association plots were generated using stand-alone LocusZoom v1.4 [55] (https://github.com/statgen/locuszoom-standalone/).). Forest plots of association results were produce using the metafor R package [56].

Lung and esophageal cis-eQTLs
To investigate the association between lead SCC variants and mRNA expression, we used three lung eQTL data sets from the Microarray eQTL study. In the Microarray eQTL study [57], lung tissues for eQTL analysis were obtained from patients who underwent lung surgery at three academic sites: Laval University, the University of British Columbia (UBC) and the University of Groningen. Whole-genome gene expression profiling in the lung was performed on a custom Affymetrix array and is available through GEO (https://www.ncbi.nlm.nih.gov/ geo/) accession number GSE23546. Genotyping was carried out on the Illumina Human 1M-Duo BeadChip array, data is accessible in dbGaP (phs001745.v1.p1). Genotypes and gene expression levels were available for 408 (Laval University), 342 (Groningen) and 287 (UBC) patients. Microarray and genotypes preprocessing, quality control and eQTL mapping have been described previously [58]. We also investigated top aerodigestive SqCC associations in the public GTEx catalog (V8) [26] for lung and esophageal tissue eQTLs and sQTLs, summary statistics based on RNAseq and genotypes analyses obtained via the GTEx data portal (https:// www.gtexportal.org).