Coding variants in NOD-like receptors: An association study on risk and survival of colorectal cancer

Nod-like receptors (NLRs) are important innate pattern recognition receptors and regulators of inflammation or play a role during development. We systematically analysed 41 non-synonymous single nucleotide polymorphisms (SNPs) in 21 NLR genes in a Czech discovery cohort of sporadic colorectal cancer (CRC) (1237 cases, 787 controls) for their association with CRC risk and survival. Five SNPs were found to be associated with CRC risk and eight with survival at 5% significance level. In a replication analysis using data of two large genome-wide association studies (GWASs) from Germany (DACHS: 1798 cases and 1810 controls) and Scotland (2210 cases and 9350 controls) the associations found in the Czech discovery set were not confirmed. However, expression analysis in human gut-related tissues and immune cells revealed that the NLRs associated with CRC risk or survival in the discovery set were expressed in primary human colon or rectum cells, CRC tissue and/or cell lines, providing preliminary evidence for a potential involvement of NLRs in general in CRC development and/or progression. Most interesting was the finding that the enigmatic development-related NLRP5 (also known as MATER) was not expressed in normal colon tissue but in colon cancer tissue and cell lines. Future studies may show whether regulatory variants instead of coding variants might affect the expression of NLRs and contribute to CRC risk and survival.


Introduction
Within the last few years it has become evident that the interplay between pattern recognition receptors (PRRs), such as Toll-like receptors (TLRs) or Nod-like receptors (NLRs), and the gut microbiota has a profound influence on the homeostasis of the immune system and therefore on many important aspects of human health [1][2][3]. If undisturbed and well-regulated, this symbiosis is beneficial for the human host. A disruption of the underlying regulatory pathways can, however, result in the development of local and chronic inflammation, inflammatory bowel disease (IBD) and/or colorectal cancer (CRC) [4,5]. Gut homeostasis is maintained by a physical separation of the microbial community from the gut epithelia by the mucosa and a mucus layer. PRRs monitor the integrity of this barrier and the adjacent microbial community by detecting microbe-associated molecular patterns (MAMPs) as well as endogenous damage associated molecular patterns (DAMPs), and consequently controlling antimicrobial responses that contribute to an equilibrium between microbes and host [2,[4][5][6]. The activation of PRRs TLRs and/or NLRs by MAMPs or DAMPs results in the activation of multiple signaling pathways including nuclear factor-κB (NF-κB), mitogen-activated protein kinases (MAPKs), and the type I interferon (IFN) response, with subsequent induction of an inflammatory and antimicrobial response that includes secretory IgA, antimicrobial peptides, pyroptosis and autophagy [1,2,7]. Some NLRs, such as NLRP1, 3, 6, 12 and NLRC4, form so-called "inflammasome" complexes, comprising of the respective NLRs, the adaptor Apoptosis-associated specklike protein containing a CARD (ASC) and pro-caspase-1. Inflammasome assembly initiates inflammatory and antimicrobial response via the autoproteolytic cleavage of caspase-1, catalysing the proteolytic conversion of pro-interleukin-1β (IL-1ß) and other IL-1 family members into biologically active cytokines which drive inflammation [2,8,9].
We recently reported an impact of TLR polymorphisms on CRC survival [10,11]. Given the suggested concerted action of TLR and NLR signaling [1,2], the connection between NLRs and CRC seemed of special interest. Provoking studies in mice and association studies in humans have suggested that NLR signaling is involved in inflammatory bowel disease, chronic inflammation and gastrointestinal cancers, including CRC [6,11,12]. Apart from various reports on mice, convincing data directly connecting NLRs and human CRC are available only for NOD1, NOD2 and NLPR3, which were found to be associated with susceptibility, progression and treatment of sporadic CRC, colitis and/or colitis-associated CRC [13,14]. In general, it is unclear whether and how other NLRs contribute to human CRC development or progression. Additionally, the functional importance of NLRs that have embryonic lethal phenotypes in mice-so-called "reproduction-related NLRs" like NLRP2, 5 and 13 -in human immunity and/or tumorigenesis remains an unresolved question [8].
In order to systematically investigate the influence of potentially functional coding polymorphisms in the NLR genes on sporadic CRC risk and survival, we conducted a case-control study with replications in 2 large genome-wide association studies (GWASs), covering the majority of known non-synonymous single nucleotide polymorphisms (nsSNPs) in 21 genes across different NLR signalling pathways. In silico analysis was done on selected nsSNPs in the NLR gene family. Furthermore, RNA expression of selected genes was measured to assess mRNA expression in immune cells, biopsies and/or CRC cell lines.

Ethical approval
The For the work in Tübingen: All patients or healthy blood donors included in gene expression analyses for this study provided their written informed consent before study inclusion. Approval for use of their biomaterials was obtained by the local ethics committee at the University of Tübingen, in accordance with the principles laid down in the Declaration of Helsinki. Terminal ileum/ colon biopsies were obtained from patients undergoing routine colonoscopy at the University Hospital Tübingen, buffy coats obtained from blood donations of healthy donors were received from the Center for Clinical Transfusion Medicine (ZKT) at the University Hospital Tübingen and whole blood from voluntary healthy donors was obtained at the University of Tübingen, Department of Immunology.
The DACHS study was approved by the ethics committee of the Medical Faculty of the University of Heidelberg (no. 310/2001). The DACHS study is registered: StudyBox no. ST-066, DRKS no. DRKS00011793 The work in Scotland was approved by the UK National Health Service Research Ethics Committee (approval references 13/SS/0248; 11/SS/0109 and 01/0/05). 21 Fig 1).

SNP selection and in silico analysis of conservation and functional relevance
To gain additional insight into a possible functional relevance, all genotyped and linked SNPs were mapped to their location in the respective proteins. 19 of the 41 genotyped SNPs are located in defined NLR protein domains [3] (Fig 1 & Table A in S2 File): 11 in the NACHT domain, seven in the LRR domain and one in the PYD domain. The remaining genotyped SNPs mapped to linker regions. SIFT (sift.jcvi.org) and PolyPhen2 (genetics.bwh.harvard.edu/ pph2) databases were used to assess possible effects of the SNPs on the protein. 23 of the genotyped SNPs are predicted to be deleterious or damaging, and/or to result in non-sense mediated decay or retained introns (Fig 1 & Table A in S2 File). Assessments of evolutionary conservation of the selected variants was performed by three software suites namely Genomic Evolutionary Rate Profiling (GERP [15]), PhastCons [16] and phylogenetic p-value (PhyloP [17]). The GERP score of >2.0 and the PhastCons score (values between 0-1) of >0.3 indicate a good level of conservation of the variants. Positive PhyloP scores (values between −14 and +6) are predicted to be conserved. Higher values of these tools reflect the probability that the nucleotide is located at a conserved position, based on the multiple alignment of genome sequences of 100 different vertebrates. Lower values of these tools reflect fast-evolving variant positions.

Discovery set-Czech Republic
Study population. The study was carried out on a Czech CRC case-control population of patients (n = 1237; median age 63 years; 61.7% males) with colon or rectal malignancy-  SNP genotyping. TaqMan SNP Genotyping Assays (Applied Biosystems) or KASP genotyping assays (LGC Genomics) were used for the analysis of the SNPs. Case and control samples were amplified simultaneously in 384 well format (Hydrocycler 16 (LGC Genomics), using 3 ng whole genome amplified DNA from blood). Endpoint genotype detection was carried out on the ViiA 7 Real-Time PCR System (Applied Biosystems). Call rates for 40 out of 41 SNPs were 94-99%. Internal quality controls showed a concordance rate of ! 99%. Samples with < 50% call rate over all assays were excluded from the study.

Replication sets-Germany and Scotland
For replication, all SNPs associated with CRC in the Czech population (p<0.05) were tested in two large European genome-wide association studies (GWASs) carried out in Germany ("Darmkrebs: Chancen der Verhütung durch Screening Study"-DACHS) [19,20] and in Scotland ( Table B in S2 File) [21,22].
Germany. The sample set used as the replication set is part of the still on-going DACHS project and comprised 1796 CRC patients (median age 69 years; 58.6% males) who received in-patient treatment due to a first diagnosis of CRC in 22 hospitals of the Rhine-Neckar-Odenwald region of Germany. The 1810 community-based controls were randomly selected from population registries matched for gender, 5-year age groups and county of residence (median age 70 years, 59.6% males, cancer-free at the time of sampling), (Table B in in S2 File) [19,20]. Cases and controls genotyped in the present study were recruited between January 01, 2003 and December 31, 2007. For overall survival (OS) analysis, 1794 incident CRC cases with information about age, sex, tumor stage, and a median follow-up time of 48.4 months in men and 49.9 months in women were available [23].
Scotland. The Scottish study series comprised 2115 cases (median age 57 years, 57% males) from the Scottish colorectal cancer study (SOCCS) [21] and 95 cases (median age 67 years, 66% males) from Ninewells Hospital, Dundee and Perth Royal Infirmary collected between 1997 and 2000 [26]. SOCCS is a case-control study designed to identify genetic and environmental factors associated with non-hereditary CRC risk and survival outcome. Population controls with no personal history of cancer were ascertained from four cohorts including 8533 (42% males, mean age 55.4 yrs)-from Generation Scotland-Scottish Family Health Study [27,28]; 513 (41% males, mean age 79 years) and 1004 (50.6% males, mean age 70 years) from the Lothian Birth Cohorts 1921 and 1936, respectively; and 262 Dundee controls (50% males) were recruited through the same General Practice surgeries as cases or from spouses/ friends of cases [29]. The detailed information on genotyping cases and controls and data quality control is described elsewhere [22]. 2210 cases and 9350 controls were included in the final analysis. The survival analysis was performed in a subset of SOCCS study comprising 1402 patients (median follow up 107 months, recruited between 2001 and 2006) with colorectal adenocarcinoma confirmed by pathological assessment. Participants completed a detailed lifestyle questionnaire and a semi-quantitative food frequency and supplements questionnaire (http:// www.foodfrequency.org). Genotyping was performed using the Infinium Human Exome BeadChip 12v1.0 or 12v1.1 (Illumina), with genotype calling using Illumina GenCall for HumanExome-12v1.0 and HumanExome-12v1.1 versions called separately. Generation Scotland controls and a subset of the cases from the SOCCS study were genotyped using OmniEx-pressExome BeadChip 8v1.1 or 8v1.2 (Illumina).
In accordance with the Declaration of Helsinki, all participants provided written informed consent. The studies were approved by the local ethics committees.

Statistical analysis-Discovery set
Genotype frequencies in controls were tested for Hardy-Weinberg equilibrium (HWE; Pearson's goodness-of-fit χ 2 test, deviation assumed at p < 0.001). NLRP11 rs12461110 was excluded for violation of HWE.
Single variant associations with CRC risk, overall and event-free survival. Odds ratios (ORs) and 95% confidence intervals (CIs) for associations between genotypes and CRC risk were estimated by logistic regression (PROC LOGISTIC, SAS V9.3; SAS Institute, Cary, NC) and refer to the minor allele. P values were considered nominally significant at p 0.05, with a study-wide significance level at p 0.001 considering Bonferroni correction for multiple testing (0.05/39 = 0.0012). ORs were adjusted for age and sex. The estimated power was > 95% for OR !1.5 (MAF > 5%; p = 0.05; dominant model) [30].
Differences in OS and EFS between genotypes were estimated by hazard ratios (HRs) and 95% CIs using Cox regression (PROC PHREG, SAS V9.3) adjusting for age, sex, tumor grade and tumor stage. The estimated power was > 90% for HR ! 2.0 (MAF > 10%, p = 0.05). OS was calculated for all patients (OS (pM0&1) ); for patients with non-metastatic disease at the time of diagnosis OS (OS (pM0) ) and EFS (EFS (pM0) ) were calculated. Kaplan-Meier plots were generated, estimating the differences between the survival functions by log-rank test (PROC LIFET-EST, SAS V9.3).
Additive SNP associations with CRC risk and survival. Additive influence of the risk alleles (p 0.05) on CRC risk and survival identified in the Czech population was estimated (risk: five SNPs, 0-10 risk alleles per individual; survival: eight SNPs, 0-16 risk alleles). For each SNP the allele associated with a higher OR or HR was designated the "risk allele". Patients were grouped into equally sized groups of risk alleles (risk: 0-3/4-5/6-10; survival: 0-5/6-7/8-12) and ORs and HRs were calculated, adjusting for age and sex, HRs also for tumor grade and stage. Kaplan-Meier plots were generated for the additive survival model and the log-rank test was performed. The same analysis was conducted separately for the three NLRP5 risk SNPs (p 0.05) (0-6 risk alleles).
Statistical analysis-Replication sets. Data provided by the DACHS study consisted mostly of imputed genotypes, all in dosage format referring to the number of copies of minor allele. To permit direct comparison with the German data set genotype data from the Czech study was coded as 0, 1, or 2 copies of the minor allele. For these two data sets, association between SNPs and risk for CRC was obtained by applying logistic regression considering a log-additive genetic effect model (PROC LOGISTIC, SAS Version 9.2; SAS Institute). HRs (PROC PHREG, SAS version 9.2, SAS Institute) were calculated via Cox regression with a model that included the SNP coded as number of copies of the minor allele, age, sex and tumor stage for both sample sets.
ORs and 95% CIs for association between each of the genotypes and risk of CRC in Scotland were estimated using unconditional logistic regression adjusted for age and gender. HRs and corresponding 95%CI for overall survival analysis in Scotland was calculated for each of the genotyped SNPs and dominant model using Cox regression adjusted for age, gender and TNM stage. OS was calculated for all patients (OS (pM0&1) ) and for patients with non-metastatic disease at the time of diagnosis OS (OS (pM0) ). No event-free survival analysis was performed in the Scottish data. All analysis was performed in R v3. 1

Gene expression
For expression analyses, patients providing biopsy material were recruited at the University Hospital Tübingen. Healthy blood donors were recruited at the Center for Clinical Transfusion Medicine (ZKT), University Hospital Tübingen and respective buffy coats obtained from blood donations. All patients/ healthy blood donors included in gene expression analyses for this study provided their written informed consent before study inclusion. Approval for use of their biomaterials was obtained by the local ethics committee at the University of Tübingen, in accordance with the principles laid down in the Declaration of Helsinki. Terminal ileum/ colon biopsies were obtained from patients undergoing routine colonoscopy at the University Hospital Tübingen, buffy coats obtained from blood donations of healthy donors were received from the Center for Clinical Transfusion Medicine (ZKT) at the University Hospital Tübingen and whole blood from voluntary healthy donors was obtained at the University of Tübingen, Department of Immunology.
Cell lines, primary human immune cells and biopsy material. HCT116, DLD-1 and Caco2 cells were grown and sourced as described [31], without re-authentication. Primary leukocytes were isolated from buffy coats (Tübingen University Hospital, Center for Clinical Transfusion Medicine (ZKT)) using Ficoll (GE Healthcare) density gradient purification and CD14+ monocytes were isolated using MACS (Miltenyi) magnetic beads to a purity of > 95% (anti-CD14-PE flow cytometry, BD). Subsequently, cells were differentiated into monocytederived dendritic cells (MoDC) or monocyte-derived macrophages (MoMacs) by culture in RPMI 1640 medium (Invitrogen) supplemented with 10% fetal calf serum (FCS) in the presence of 40 ng/ml IL-4 and 25 ng/ml GM-CSF (Peprotech) or with 25 ng/ml GM-CSF for 6 days, respectively. Neutrophils were isolated from the Ficoll pellet after NH 4 Cl lysis of erythrocytes. All cells were grown at 37˚C and 5% CO 2 . Biopsies from the terminal ileum or colon (n = 12; median age 46; 56% males) were obtained during routine colonoscopy at the University Hospital Tübingen and stored in liquid nitrogen until analysis [13].
Gene expression analysis. Gene expression analysis was carried out using single-gene TaqMan1 Gene Expression Assays (Applied Biosystems) for NLRP2, NLRP3, NLRP5, NLRP6, NLRP13 and NLRC5. mRNA was isolated from whole blood or primary blood cells (two donors, #1 and #2, respectively) or THP-1, HCT116, DLD-1 or CaCo2 cell lines using an RNeasy Mini Kit (Qiagen) and commercially available RNA samples for human ovary, duodenum, ileum (sample #7), rectum and colon adenocarcinoma were used (Agilent). RNA from ileum or colon biopsies from six patients (samples #1-6) was isolated using TRIzol Reagent (Life Technologies) according to standard protocols and reverse transcribed into cDNA using oligo(dT)12 primer [13]. Following transcription to cDNA (High Capacity RNA-to-cDNA Kit; Life Technologies), expression was analysed using pre-validated TaqMan1 Gene Expression Assays (Applied Biosystems). Data were normalized to TBP (TATA box binding protein). The samples were analysed in triplicate using the 7500fast Real-Time System (Applied Biosystems).

NLR variants are associated with CRC risk and survival in the Czech sample set
Nominally significant associations with CRC risk were detected for six SNPs (Table 1; Table C in in S2 File). In an additive risk model combining those six variants, CRC risk increased significantly with increasing numbers of risk alleles, and a maximum for carriers of 6-10 risk alleles (OR 2.10, p = 0.0005; Table 1).

GWAS data on the NLRP risk and survival SNPs-Replication sets
In order to validate the results from the Czech cohort, all SNPs included into the additive model for CRC risk (N = 5) and survival (N = 8) were analyzed in two large GWAS sample sets from Germany and Scotland. Complete data was available from the DACHS GWAS. The Scottish GWAS provided data on three CRC risk variants (rs12150220, rs306457 and rs303997) and seven survival variants (Table E in in S2 File). Scottish data was available for

SNP Genotype Cases Death (%) HR (95%CI) p-val Cases Deaths (%) HR (95%CI) p-val Cases Events (%) HR (95%CI) p-val NLRP1
A genotypes, DACHS data according to allelic probabilities. Despite the promising initial results, neither the associations for CRC risk nor the associations for CRC survival were replicated in the GWAS sets. We also tested the additive models in the DACHS population, but no association was evident (data not shown).

Divergent expression patterns of NLRP2, 5, 6 and 13 in hematopoietic and non-hematopoietic cells
To investigate whether the NLRs found to be associated with CRC in the Czech discovery set, were expressed in the gut or immune cells, mRNA levels were quantified for selected NLRs in primary tissue samples and cell lines. In gut-related tissues, CRC cell lines, whole blood and neutrophils (PMN), NLRP2 showed moderate expression, with the lowest expression in rectum (Fig  2A and 2B). Although NLRP2 showed low expression in CD14 + monocytes, this was increased up to 100-fold in monocyte-derived primary dendritic cells and macrophages ( Fig 2B). NLRP5 (also known as MATER) was not detectable in immune cells (not shown) and normal gut tissue but in ovary (Fig 2C), in keeping with its role in oogenesis [8]. However, we observed expression in a primary CRC sample and three CRC cell lines (Fig 2C) but not fibroblast or B lymphocyte cell lines (not shown), in agreement with mRNA expression data from the GENT database ( Figure A in S1 File). Consistently with an earlier report [32], NLRP6 (also known as PYPAF5) was highly expressed in neutrophils (PMN), low in monocytes and MoDC but not detectable in gut biopsies (not shown). NLRP13 was below the level of detection in all analysed samples except for the positive control, ovary and the DLD1 CRC cell line (Fig 2D). NLRC5 was expressed to varying degrees in healthy gut tissue and most highly in colon adenocarcinoma, and was inducible by IFNγ in HCT116 cells (Fig 2E). NLRP3 was expressed at considerable levels only in duodenum, rectum and CRC samples, but not in normal ileum and colon biopsies (Fig 2F). While the role of NLRP13 remains unclear, additional data on the reported occurrence of somatic mutations in these genes in CRC suggest that NLRP2, NLRP3 and NLRP6 may impact CRC development and survival via immune cells, whereas NLRP5 might be relevant in gut tissues themselves, possibly experiencing a re-expression after malignant transformation.

Discussion
In the discovery set from the Czech Republic, five of 39 successfully tested SNPs were associated with CRC risk, and eight with CRC survival. An additive effect on CRC risk and survival was detected, resulting in a 2-fold increased risk and a 3-fold worse survival for carriers of !6 and !8 risk alleles, respectively. Despite these promising results in the Czech population, these associations could not be confirmed in the two large German and Scottish GWAS data sets. This was surprising taking into account the in silico predictions about the functionality of the SNPs and the results of the expression analysis which showed that the genes NLRP2, NLRP3 and NLRP6 may impact CRC development via immune cells. Accordingly, differential expression of these genes may cause alterations in pathways providing the emerging hallmarks of cancer, such as evading immune clearance and tumor-promoting inflammatory responses [33]. For NLRC5, whose expression was induced in HCT116 cells by IFNγ (Fig 2E), one plausible functional outcome may be the modulation of MHC class I expression [34]. The latter strongly correlates with CRC survival due to its effect on CD8 cytotoxic T cell and natural killer cell immuno-surveillance [35]. According to murine data, the NLRP12 may also affect T cell function in the context of human CRC [36]. Most intriguingly, expression of development-related NLRP5 was undetectable in normal gut-related tissues but was up-regulated in malignant gut tissue and colon cancer cell lines (cf. Fig 2C and Figure A in S1 File.), suggesting for the first time a potential novel role beyond developmental control for this enigmatic NLR in humans [8]. During oogenesis, murine Nlrp5 appears to influence mitochondrial localization and activity, ATP content and Ca 2+ homeostasis-processes which all have been linked to NLRP3 inflammasome activation and thus inflammation in differentiated cells. NLRP5 may thus act in concert with NLRP3, which is known to be associated with human CRC [14], a speculation warranting further investigation. The concerted association of genes of the NLR family may directly link environmental risk factors, intestinal inflammation, the microbiota and well-described cancer pathways involved in CRC development, such as the MAPK pathway and the NF-κB pathway.
One might argue that the failure to replicate the association results in the Czech discovery set might be due to differences in the clinical composition between the case-control populations of the discovery set and the replication sets (Table B in S2 File). However, data was adjusted for all significantly different parameters except tumor location (colon or rectum; not possible due to incomplete data) suggesting that the detected associations in the discovery set were false positive results. In the light of the supporting gene expression data it is possible that the coding variants analyzed in this study do actually not have an effect on the functionality of the receptor proteins. This assumption is supported by the fact that the majority of variants are not located in evolutionary conserved regions of the genes which allow for natural variability. Further, it is also possible that undetected environmental factors might have biased the results. Especially for immune related genetic variants, interactions with environmental factors or treatment might play a major role enhancing or even enabling effects of SNPs on CRC risk or survival. Based on the interesting expression results, future studies of these genes and their encoded receptors, including the analysis of regulatory genetic variants affecting the gene expression as well as the analysis of the patient specific tumor microenvironment and tumor infiltrating immune cells and immune constitution, may contribute to uncover the still poorly understood role of NLRs within the intestinal immune system, as well as, in CRC development and survival [37]. The integration of different exogenous, endogenous, tumour and immune factors, potentially including the variants in NLR genes studied here, holds promise for future approaches in precision medicine [37].  (Table A) Complete list of genotyped SNPs in candidate genes, with information about all linked missense SNPs (r 2 ! 0.8) and the location in protein domains. NMD: nonsense mediated decay; Ã Variant Effect Predictor by Ensembl http://www.ensembl.org/Homo_ sapiens/Tools/VEP (Table B) Population Description. a Z statistics: Wilcoxon Rank-Sum-Test; b Chi-square; event = recurrence, metastasis, death. (Table C) Genotype distribution of all analysed SNPs in the Czech case-control population: Risk and Survival analysis. CRC Risk: Data adjusted for age of diagnosis and sex. Overall Survival and Event free Survival: Data adjusted for age of diagnosis and sex, grade and stage. Nominal significance at p 0.05; significance level corrected for multiple testing at p 0.001. (Table D) mRNA Expression for the most promising candidate genes: Study data and reported somatic mutations for CRC-associated NLRs. (Table E)