A Genome Wide Study of Copy Number Variation Associated with Nasopharyngeal Carcinoma in Malaysian Chinese Identifies CNVs at 11q14.3 and 6p21.3 as Candidate Loci

Background Nasopharyngeal carcinoma (NPC) is a neoplasm of the epithelial lining of the nasopharynx. Despite various reports linking genomic variants to NPC predisposition, very few reports were done on copy number variations (CNV). CNV is an inherent structural variation that has been found to be involved in cancer predisposition. Methods A discovery cohort of Malaysian Chinese descent (NPC patients, n = 140; Healthy controls, n = 256) were genotyped using Illumina® HumanOmniExpress BeadChip. PennCNV and cnvPartition calling algorithms were applied for CNV calling. Taqman CNV assays and digital PCR were used to validate CNV calls and replicate candidate copy number variant region (CNVR) associations in a follow-up Malaysian Chinese (NPC cases, n = 465; and Healthy controls, n = 677) and Malay cohort (NPC cases, n = 114; Healthy controls, n = 124). Results Six putative CNVRs overlapping GRM5, MICA/HCP5/HCG26, LILRB3/LILRA6, DPY19L2, RNase3/RNase2 and GOLPH3 genes were jointly identified by PennCNV and cnvPartition. CNVs overlapping GRM5 and MICA/HCP5/HCG26 were subjected to further validation by Taqman CNV assays and digital PCR. Combined analysis in Malaysian Chinese cohort revealed a strong association at CNVR on chromosome 11q14.3 (Pcombined = 1.54x10-5; odds ratio (OR) = 7.27; 95% CI = 2.96–17.88) overlapping GRM5 and a suggestive association at CNVR on chromosome 6p21.3 (Pcombined = 1.29x10-3; OR = 4.21; 95% CI = 1.75–10.11) overlapping MICA/HCP5/HCG26 genes. Conclusion Our results demonstrated the association of CNVs towards NPC susceptibility, implicating a possible role of CNVs in NPC development.


Introduction
Nasopharyngeal carcinoma (NPC) (OMIM 161550 and 607107) is a malignant tumour arising in the nasopharyngeal mucosa. NPC displays a geographic bias. It is a rare malignancy in most parts of the world (ASR = 1 case per 100,000 per year), but a highly endemic disease in selected demographics such as Southern Chinese population of Guangdong (as high as 15 to 25 cases per 100,000 per year) [1], native Greenlanders [2] and migrants of the Southern Chinese in America, Australia, and Malaysia [3,4]. The skewed demographic distribution of the disease coupled with familial aggregation [5], Epstein-Barr virus (EBV) infection and environmental factors such as air pollutants [6], dietary intake of salted fish and preserved food [7], suggest possible interplay between genetic and environmental factors in NPC pathogenesis. Many genetic studies on single nucleotide polymorphisms (SNPs) have reported associations to NPC. Genes linked to NPC include HLA-A (Human leukocyte antigen A) [8,9], HLA-B (Human leukocyte antigen B) [10], GABBR1 (Gamma-aminobutyric acid (GABA) B receptor 1) [11], CYP2E1 (cytochrome P450 2E1) and hOGG1 (human 8-oxoguanine DNA N-glycosylase 1) [12]. Despite many reports linking SNP variants to NPC predisposition, structural variants such as CNVs and its possible influence on NPC predisposition remain grossly neglected. Thus far, only deletions linked to NPC have been reported at chromosome 3p [13] and 6p21.3 [14].
Copy number variants are structural variants present in multiple copies involving DNA segments of ! 1 kb [15]. CNVs can have simple structures such as tandem duplication or complex structures like sequence complexity at junctions that arise from complex amplification and deletion rearrangements. Some CNVs remain neutral in function, others convey differences in phenotypes by disrupting genes, creating fusion genes, changing copy number of dosage sensitive genes and altering the regulatory transcription levels [16]. These changes in phenotypes can be benign or can have detrimental implications in diseases such as osteoporosis [17], schizophrenia [18], Li-Fraumeni-syndrome-associated cancers [19] and neuroblastoma [20].
The Malaysian Chinese population is largely made up of descendants from Southern China. This population comprises various dialect groups of the Hokkien, Hakka, Cantonese, Teochew and Hainanese. NPC incidence is high among the Malaysian Chinese, especially in Chinese males (ASR = 14 per 100,000) [21]. As for the Malaysian Malay population, they are a heterogeneous ethnic group with different ancestral origins based on migration patterns of centuries ago. They were the descendants of the Proto-Malays, admixed with Siamese, Javanese, Sumatran, Indian, Thai, Arab and Chinese traders [22]. The Malays make up the majority of the Malaysian population. NPC incidence is low among the Malays, including the Malay males (ASR = 4.0 per 100,000) [21].
This study aims to evaluate the role of CNVs in NPC predisposition in the Malaysian Chinese and Malay populations. We report a case control genome-wide SNP microarray approach to identify CNVs associated to NPC in the Malaysian Chinese and Malay populations. Genotyping, quality control and correction for population structure Genome-wide genotyping of NPC cases and healthy controls of Malaysian Chinese was conducted using Illumina 1 HumanOmniExpress_12 v1.1 Genotyping BeadChip (San Diego, CA, USA) according to manufacturer's protocol. The genomic DNA was extracted from peripheral blood leukocytes using conventional phenol-chloroform method. Samples with sample call rates <0.99, mismatch between recorded and estimated gender, cryptic relatedness and population outliers were excluded from the subsequent studies. Population outliers were identified using Principal component analysis (PCA) performed in EIGENSTRAT version 2.0 [23].

Generation of CNV calls
Samples passing quality control measures were used for generating CNV calls. Log R ratio (LRR) and B allele frequency (BAF) were generated from intensity data of the genome-wide genotyping using GenomeStudio (v.3.1.6) (San Diego, CA, USA) and CNVs were called using PennCNV (v.2011 Jun16) [24] and cnvPartition (v.3.1.6) (San Diego, CA, USA). For uniformity, samples with standard deviation (SD) of LRR! 0.20, SD of BAF ! 0.2, BAF drifting value of ! 0.01, waviness factor ! 0.05 and samples with more than 50 CNV calls were removed from analysis using PennCNV prior to CNV calling. Centromeric regions, telomeric regions, XY chromosomes and regions coding for immunoglobulin genes were also excluded from CNV calling. Minimum number of markers of 5 were set as threshold for all CNV calling. For cnvPartition analysis, confidence score threshold was set to 35 and minimum number of probes was set to 5.

CNV validation and replication
Validation of GWAS CNV calls and replication of CNV association at candidate loci on chromosome 11q14.3 and chromosome 6p21.3 were performed using Taqman CNV genotyping assays (ABI, Foster City, CA) on the GWAS discovery cohort (Malaysian Chinese only) and replication cohort (Malaysian Chinese and Malaysian Malay). Quantification cycle (C q ) of each region of interest was determined by QuantStudio™ 12K Flex Software (v.1.0, Applied Biosystem, Foster City, CA) and copy numbers were determined using Copy Caller (v.2.0, Applied Biosystems, Foster City, CA) according to a comparative delta Cq method. Random samples with a mixture of copy number (CN) state of 1, 2 and 3 were subjected to validation with Droplet digital PCR (ddPCR) (Bio-Rad Laboratories, Hemel Hempstead, UK) following manufacturer's protocol.

Gene-set enrichment analysis
Five pathways previously found to be associated to NPC were subjected to enrichment analysis [25]. Enrichment analysis was carried out using cnv-enrichment-test [26] implemented in PLINK v.1.07 [27]. Gene sets were compiled from the Molecular Signatures Database (MSigDB) v.3.1 namely KEGG WNT signalling pathway, ST ERK1 ERK2 MAPK pathway, KEGG NOTCH signalling pathway, REACTOME regulation of apoptosis and REACTOME signalling by EGFR in cancer. One sided empirical P-values with 10,000 permutation of enrichment of a subset of genes of a pathway relative to all genes were acquired.

Global CNV burden analysis
PLINK v.1.07 [27] was used to perform global CNV burden analysis. CNVs were classified as rare when found in <1% of total GWAS samples or common when found in ! 5% of total GWAS samples. Tests for CNV burden (one-sided) were done for number of CNV segments per individual, number of genes overlapped by CNVs per individual and number of genes intersected per total CNV kb using UCSC RefSeq (hg18) gene annotation. CNVs were considered co-localized if they overlapped by at least 50% of their length.

HLA allele imputation
SNPs that pass quality control filtering on chromosome 6 were input into the SNP2HLA software [28] in forms of PLINK formats of (.bed/.bim/.fam). SNP allele annotations were mapped to forward strand and physical coordinates corresponding to hg18. Default parameter settings for SNP2HLA were used in the phasing and imputation. The Pan-Asian reference panel was used for this study [29,30].

Statistical association analysis
Copy number variant regions (CNVRs) are identified using reciprocal overlap (minimum threshold 50%). Logistic regression using PLINK v.1.07 was carried out to assess the frequency difference between NPC cases and healthy controls using gender and age as covariates. Gene-based association was carried out using ParseCNV v.17 [31]. GWAMA v.1.4 [32] was used for I 2 heterogeneity analysis. G Ã Power v.3.1.9.2 was employed for power analysis for z tests logistic regression, using post-hoc computation of achieved power, one tail at α = 0.05 setting [33].

Genome-wide CNV Distribution in the Malaysian Chinese GWA Sample
Our CNV analysis workflow is detailed in Fig 1. From the 733,202 genotyped SNPs, 13,365 SNPs were removed due to low call frequency (<0.99). In total, 9 cases and 24 controls were removed due to population outliers, mismatch between recorded and estimated gender and cryptic relatedness. CNV calling quality control measures removed a further 44 cases and 13 controls due to high standard deviation of Log R ratio (SD BAF ! 0.2), excessive CNV calls (!50 CNVs) and drifting B allele frequency values (BAF drifting value of ! 0.01). Post quality control measures, 140 cases and 256 controls remained and were used for downstream CNV analysis.
For global CNV analysis, we used only stringent consensus CNVs from both PennCNV and cnvPartition calling algorithms. Total of 1679 CNVs were called (mean size = 147,798bp; median size = 66739bp; range size = 720bp-4,979,575bp). CNV burden analysis revealed no significant difference in global distribution of rare CNVs between cases and controls ( Table 1). There was also no significant difference in common CNV rates (CNV freq !5%) (P = 0.5326) ( Table 1), although common CNVs were 5.62 times more enriched with genes in cases than in controls (P = 7.00x10 -4 ) ( Table 1). NPC cases also had an over-representation of common genic deletions compared to controls (Case/control ratio = 2.26, P = 0.03). There were however no significant difference between cases and controls in terms of rare CNV (CNV freq <1%) rate and gene enrichment.

CNVRs associated to NPC susceptibility and pathway enrichment
We focused on consensus CNVs called from both PennCNV and cnvPartition calling algorithms. From the consensus CNVRs, we selected regions overlapping genes and genomic regions previously linked to NPC and other cancers. We also checked for genes differentially expressed in NPC tissue specimens compared to normal nasopharyngeal tissue [34,35]. Six candidate CNVRs fulfilling the selection criteria were presented in Table 2. Further qPCR validation was done on CNVR 11q14.3 and 6p21.3 and Fig 2 provided showed example LRR and BAF in these CNVs. LRR and BAF from SNP data within CNVs in chromosome 6 (S1 Text) and chromosome 11 (S2 Text) can be found in the supporting information. Concordance rate between CNV calls generated from calling algorithms and qPCR was shown in Table 3. As shown in Table 3, PennCNV calls had higher concordance for both CNVRs on chromosome 6p21.3 (99.75%) and 11q14.3 (91.67%). Taqman genotyping of CNVRs 11q14.3 and 6p21.3 confirmed suggestive association in the Malaysian Chinese discovery cohort (P 11q14.3_Discovery = 4.60x10 -3 ; P 6p21.3 Discovery = 0.051) ( Table 4). CNVRs 11q14.3 and 6p21.3 were selected for further replication in a separate Malaysian Chinese (465 cases and 677 controls) and Malaysian Malay (114 cases and 124 controls) replication cohort.
The CNVR 11q14.3 and 6p21.3 copy numbers identified through Taqman qPCR were highly concordant with digital PCR copy numbers though comparison of CNV copy numbers was limited to only 10 random samples.

Discussion
Common CNVs have been associated with predisposition to cancers such as neuroblastoma [20] and breast cancer [36]. The over-representation of common deletion genic CNVs in cases as compared to controls (Case/control ratio = 2.26, p-value = 0.03) could potentially bring about deleterious and pathogenic consequences that would lead to susceptibility to NPC. No previous studies on global CNV burden analysis were reported in NPC. The most significantly associated CNVR from our study was within the metabotropic glutamate receptor 5 (GRM5) gene, suggesting its potential role in NPC pathogenesis. GRM5 gene (approximately 563kbp) located on the minus strand of chromosome 11q14.3 encodes a group I Gq-coupled receptor. Alternative 5' splicing and usage of multiple promoters were involved in the regulatory mechanisms of GRM5 expression [37]. Evidence from studies suggests a role for glutamatergic signalling in the biology of cancer in peripheral tissues [38,39], especially those of mucosal nature. Different groups have also shown evidence for a role of glutamate and its receptors in regulation of tumour growth [40]. Examples of metabotropic glutamate  receptors being implicated in development of cancers are the involvement of GRM1 [41] and GRM5 [42] in the induction of melanoma in transgenic mice. In addition, GRM5 was found to play a role in tumour cell migration and invasion in oral squamous cell carcinoma [43] and found to be overexpressed in lung cancer cells [44]. Deletion on chromosome 6p21.3 which overlaps MHC Class I Polypeptide-Related Sequence A (MICA), HLA complex 5 (HCP5) and HLA complex group 26 (HCG26) genes was  previously associated to NPC predisposition. MICA encodes a ligand for the natural killer-cell receptor NKG2-D type II. MICA is highly expressed on cancer cells and can activate antitumor effects from natural killer cells and CD8 + T cells [45]. Meanwhile the function of HLA complex P5, HCP5 is not fully understood.

REACTOME regulation of apoptosis
CNVs identified from our study were enriched for ERK1/2 and Notch signalling pathways. Mutations in GRM5 influenced Ca 2+ oscillations in transgenic mice that showed tumour/melanoma phenotype in addition to dramatic increase in phosphorylation of ERK in these tumour samples. They had implicated ERK as a downstream effector of GRM5 signalling in tumours. The possible role of ERK pathway in NPC pathogenesis had also been corroborated by Lan et al. [50]. Further studies are needed to better understand the dynamics and interaction of GRM5 within the ERK pathway and its role in NPC.
We report the association of CNV 11q14.3 and 6p21.3 as well as the ERK1/2 and Notch signalling pathways with NPC susceptibility in the Malaysian Chinese cohort using a case control genome-wide SNP microarray approach. The caveat attached to SNP microarray CNV detection remains the diverse algorithmic calling methods, resulting in considerable variation in the CNV calls. We have employed extensive validation and replication using Taqman real-time PCR and a further more sensitive digital PCR method to negate false CNV calls for CNVs 11q14.3 and 6p21.3. Our study was able to detect association of CNV 11q14.3 and 6p21.3 in the Malaysian Chinese cohort but not in the Malaysian Malay cohort. However power analysis showed that power for CNV study for Malay replication cohort at 11q14.3 and 6p21.3 is 0.26 and 0.11 respectively, hence a larger sample size would be needed to improve the achieved power for better confidence. To increase reproducibility and confidence in our association, replication efforts with new NPC cohorts will aid to confirm the link between CNV 11q14.3 and 6p21.3 and NPC.