Crohn’s Disease Localization Displays Different Predisposing Genetic Variants

Background Crohn’s disease (CD) is a pathologic condition with different clinical expressions that may reflect an interplay between genetic and environmental factors. Recently, three genetic markers, NOD2, MHC and MST1, were shown to be associated with distinct CD sites, supporting the concept that genetic variations may contribute to localizing CD. Genetic markers previously shown to be associated with inflammatory bowel disease (IBD) were tested in CD patients, with the aim of better dissecting the genetic relationship between ileal, ileocolonic and colonic CD and ascertaining whether a different genetic background would support the three disease sites as independent entities.
Methods A panel of 29 SNPs from 19 IBD loci was analyzed by the TaqMan SNP allelic discrimination method, both evaluating the individual contribution of each marker and analyzing all markers jointly.
Results Seven hundred and eight CD patients and 537 healthy controls were included in the study. Of the patients, 237 had ileal involvement (L1), 171 a colonic localization (L2), and the remaining 300 an ileocolonic location (L3). We confirmed the association for 23 of 29 variations (P < 0.05). Compared to healthy controls, 16 variations emerged as associated with ileal disease, 7 with colonic disease and 14 with an ileocolonic site (P < 0.05). Comparing ileal to colonic CD, 5 SNPs (17%) were differentially associated (P < 0.05). A genetic model score that aggregated the risks of 23 SNPs and their odds ratios (ORs) yielded an area under the curve (AUC) of 0.70 for the overall CD patients. By analyzing each CD location, the AUC remained at a similar level for the ileal and ileocolonic sites (0.73 and 0.72, respectively), but dropped to 0.66 in patients with colonic localization.
Conclusions Our findings reaffirm the existence of at least three different subgroups of CD patients, with a genetic signature distinctive for the three main CD sites.


Introduction
Crohn's disease (CD) is a pathologic condition with different clinical expressions that may reflect a distinctive interplay between genetic and environmental factors [1,2].
To date, more than 180 genes or loci have been associated with susceptibility to CD [3][4][5]. Of these, 138 loci confer risk of both CD and ulcerative colitis (UC), whereas 42 are unique to CD, although each variant has a small individual effect. Major established determinants of the clinical course of CD are disease location, clinical behavior, age at disease onset, and extraintestinal manifestations [6][7][8][9][10][11][12][13]. While disease behavior and extraintestinal manifestations change dramatically over the course of CD, disease site shows little or no variation [14] and might serve as a clue to define distinct entities of CD. Recently, Cleynen et al. [15] highlighted three genetic markers (16q12/NOD2, 6p21/MHC and 3p21/MST1) associated with distinct CD sites, supporting the concept that specific genetic pathways may contribute to localizing CD. In that genetic analysis, colonic CD had distinctive loci different from those found in either ileal CD or UC. Although the prospect of using genetic markers to refine the molecular classification of CD has been extensively evaluated, only the association between NOD2 variants and ileal CD has been validated unambiguously [16].
In the present work we tested a panel of 29 SNPs from 19 loci previously shown to be associated with disease sites of inflammatory bowel disease (IBD). Our aim was to better dissect the genetic relationship between ileal, ileocolonic and colonic Crohn's disease and to ascertain whether a different genetic background would support the three disease sites as independent entities, both by evaluating the individual contribution of each marker and by analyzing all markers jointly.

Patients
Seven hundred and eight CD cases were recruited at gastroenterologic and paediatric referral centres in the framework of the IBD genetic study, a multi-centre collaborative effort started in 1998 and co-ordinated by the Division of Gastroenterology of the IRCCS 'Casa Sollievo della Sofferenza' Hospital, Italy. DNA samples from 403 CD patients were shared with the International IBD Genetic Consortium projects and were also included in our previous works. In addition, 537 healthy blood donors without a personal or family history of inflammatory disorders were included in the study. Patients' features and clinical data were stored in an anonymized database. The study and the experimental protocols were approved by the ethics committee of the 'Casa Sollievo della Sofferenza' Hospital (N. 12701/08) and were performed in accordance with the guidelines of the Declaration of Helsinki. All participants signed an informed consent form before study entry.
Diagnosis of CD was based on the Lennard-Jones criteria [17]. According to the Montreal classification [9], disease confined exclusively to the distal ileum, with or without cecal involvement, was labeled as L1, disease confined exclusively to the colon as L2, and disease with an ileocolonic location as L3. Disease located in the upper digestive tract was labeled as L4.
Genotyping
DNA was extracted from peripheral blood using the Qiagen DNA Blood Kit procedure (Qiagen, Hilden, Germany). All patients and controls were genotyped for 29 single nucleotide polymorphisms (SNPs) from 19 loci, using the TaqMan SNP allelic discrimination method on an ABI 7900HT sequence detection system (Applied Biosystems Inc., Foster City, CA) at the "Casa Sollievo della Sofferenza" Hospital, San Giovanni Rotondo, Italy.
The list of investigated SNPs was obtained by screening the available literature, performed by two investigators (OP, AL). Twenty-six CD-related SNPs encompassing 18 genomic loci from fine-mapping or GWAS studies were selected (S1 Table). Finally, we selected three variations encompassing the HLA locus among those identified by Cleynen and colleagues [15]: rs6930777, rs9268832 and rs9267798. The last was typed instead of rs4151651, with which it is in linkage disequilibrium (D' = 0.82 and r² = 0.62 in the CEU population) (S1 Table).
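The two linkage disequilibrium measures quoted above, D' and r², can be illustrated with a minimal sketch. It assumes two biallelic loci and known haplotype and allele frequencies; the function name and the input values are hypothetical and are not taken from the study, which reports only the resulting D' and r² for the CEU population.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic, polymorphic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b
    # Largest magnitude D could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # Squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies, for illustration only.
d_prime, r2 = linkage_disequilibrium(p_ab=0.3, p_a=0.4, p_b=0.5)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

A high D' with a more moderate r², as reported for rs9267798 and rs4151651, typically arises when the two SNPs have different allele frequencies.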

Statistical analyses
Univariate and multivariate stepwise logistic regressions were performed using SPSS software version 14.0 (SPSS, Chicago, IL, USA) and Haploview software version 4.1 (http://www.broad.mit.edu/personal/mpg/haploview). The allelic frequencies of all investigated polymorphisms were tested for consistency with Hardy-Weinberg equilibrium. Allelic and genotypic associations of SNPs were evaluated by Pearson's χ² test (or Fisher's exact test where appropriate). Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated for each disease site (ileal, ileocolonic and colonic) relative to the control group.
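As an illustration of two of the quantities named in this paragraph, the following minimal sketch computes a Hardy-Weinberg χ² test from genotype counts and an odds ratio with a 95% confidence interval from a 2×2 table. It is not the SPSS or Haploview procedure used in the study: the function names and all counts are hypothetical, and the confidence interval uses the common log-OR normal approximation (Woolf's method), which is an assumption here.

```python
import math


def hwe_chi2(n_AA, n_Aa, n_aa):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a: case carriers,    b: case non-carriers
    c: control carriers, d: control non-carriers (all counts must be > 0)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(hwe_chi2(n_AA=300, n_Aa=200, n_aa=37))
print(odds_ratio_ci(a=20, b=80, c=10, d=90))
```

With small cell counts the normal approximation becomes unreliable, which is one reason an exact test such as Fisher's is preferred in those cases.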
The stepwise logistic regression model was applied to all variations that remained significantly associated with the trait after correction for multiple comparisons. This approach made it possible to account for a dose-response effect (heterozygote or homozygote) and for possible interactions between genes. P-values below 0.05, or below 2.6e-3 after Bonferroni correction for multiple testing, were considered significant.
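The Bonferroni threshold can be sketched as follows. The text does not state which number of tests was used as the divisor; the reported value of 2.6e-3 is consistent with dividing 0.05 by 19 (the number of loci), and that divisor is used here as an assumption. The SNP labels and P-values in the example are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05, n_tests=19):
    """Return the labels whose P-value falls below the corrected threshold.

    n_tests=19 is an assumption (number of loci), chosen because
    0.05 / 19 is approximately 2.6e-3, the threshold quoted in the text.
    """
    threshold = bonferroni_threshold(alpha, n_tests)
    return [label for label, p in p_values.items() if p < threshold]


# Hypothetical P-values, for illustration only.
example = {"snp_a": 1.4e-6, "snp_b": 0.03, "snp_c": 0.004}
print(f"threshold = {bonferroni_threshold(0.05, 19):.2e}")
print(significant_after_bonferroni(example))
```

In this example only the first marker would survive correction, although all three are below the uncorrected 0.05 level.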
In addition, to estimate the predictive value of multiple susceptibility loci for disease status, we constructed a Genetic Risk Score (GRS) based on the SNPs that were significant in the case-control study (P < 0.05) and their respective odds ratios. We assigned to each subject a score based on the number of risk alleles carried for the SNPs associated with CD risk. Common-allele homozygotes were coded as "0", heterozygotes as "1", and rare-allele homozygotes as "2". The number of risk alleles at each locus (2, 1, 0) was multiplied by the corresponding beta-coefficient of the effect size [log(OR)], and the products were summed to give each individual's GRS. For the ORs we used "positive" scores achieved by the CD and healthy control cohorts [18]. The receiver operating characteristic (ROC) curve was used to measure the area under the curve (AUC), and the sensitivity and specificity at various GRS cut-offs.
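The weighted score described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, SNP labels, odds ratios and genotypes are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) formulation, and a variant with OR < 1 simply contributes a negative weight, which may differ from how "positive" scores were handled in the study.

```python
import math


def genetic_risk_score(allele_counts, odds_ratios):
    """Sum over SNPs of (number of risk alleles: 0, 1 or 2) * log(OR)."""
    return sum(allele_counts[snp] * math.log(odds_ratios[snp])
               for snp in odds_ratios)


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as one half; this equals the area under the ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Classify a subject as 'case' when the score is at or above the cutoff."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
    return sensitivity, specificity


# Hypothetical odds ratios and genotypes, for illustration only.
ors = {"snp_a": 2.0, "snp_b": 1.5}
cases = [genetic_risk_score(g, ors) for g in
         ({"snp_a": 2, "snp_b": 1}, {"snp_a": 1, "snp_b": 1})]
controls = [genetic_risk_score(g, ors) for g in
            ({"snp_a": 0, "snp_b": 1}, {"snp_a": 1, "snp_b": 0})]
print(auc(cases, controls))
print(sensitivity_specificity(cases, controls, cutoff=1.0))
```

Moving the cut-off trades sensitivity against specificity along the ROC curve, while the AUC summarizes discrimination across all cut-offs.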

Case-control study
The allele frequencies of the SNPs analyzed were in accordance with the predicted Hardy-Weinberg equilibrium in all subgroups of CD (ileal CD, ileocolonic CD and colonic CD) (P > 0.05). Clinical and demographic characteristics of CD patients at their initial presentation are shown in Table 1.

Ileum localization
When compared to healthy controls, 16 of the 29 analyzed SNPs (55%) emerged as associated with disease localized to the ileum (P < 0.05) (Table 3). These variants mapped to 12 loci. The NOD2 and MAP3K7IP1 genes, belonging to the innate immunity pathway, showed the strongest association. Carriers of the risk genotypes (aa + Aa) of the three NOD2 polymorphisms showed highly significant associations: rs2066844 [P = 1.4e-6, OR 2.  Further, we applied a multiple stepwise logistic regression method to all the variations that remained significantly associated after Bonferroni correction for multiple comparisons (P < 2.6e-3). For NOD2, the presence of at least 1 variant was taken into account. All the variations were independently associated with ileal localization. In particular, the carriers of at least one of the NOD2 mutations [P = 6.9e-10, OR 3.8 (2.5-5.

Colonic localization
When compared to healthy controls, seven SNPs (24%) from 7 loci were associated with colonic disease (Table 3). The most relevant associations were for variations in the IL23R and MST1 genes and for those at the 10q21 locus. In particular, the 'aa' genotype of the rs7517847

Ileocolon vs colonic localization
By comparing the two disease sites, 3 SNPs (10%) emerged as significantly associated (Table 3). In both NOD2-positive and NOD2-negative patients, the difference in frequency between smokers and non-smokers only approached significance (P = 0.067) (data not shown).

Cumulative genetic risk score
We aggregated the information from the 23 genetic variants associated with CD and combined the contribution of each nucleotide polymorphism into a genetic risk score (GRS) (Fig 1). In the whole CD cohort, the area under the curve (AUC) was 0.70, using a cut-off value of 5.5 alleles (P = 8.6e-20; sensitivity = 0.83; specificity = 0.46). By analyzing each Crohn's disease location versus healthy controls, the AUC increased in patients with ileal (AUC = 0.73; P = 5.4e-13; sensitivity = 0.85; specificity = 0.46) and ileocolonic localization (AUC = 0.72; P = 3.2e-14; sensitivity = 0.83; specificity = 0.46); in patients with pure colonic localization an AUC of 0.66 was found (P = 1.4e-6; sensitivity = 0.80; specificity = 0.46). Next, we assessed how well the combination of the 23 genetic markers significantly associated with CD in the present investigation could differentiate ileal from colonic disease. The resulting GRS performed poorly in differentiating colonic from ileal disease: AUC = 0.58; P = 0.03. Results improved slightly when we considered in the analysis only the NOD2 (rs2066844 and rs2066847), NKX2/3, MAP3K7IP1 and MTMR3 SNPs: AUC = 0.60; P = 2.7e-3; sensitivity = 0.52; specificity = 0.65 (data not shown).

Discussion
Genome-wide association studies, Immunochip analyses and meta-analyses have identified different susceptibility loci and possible candidate genes in patients with CD that might affect predisposition to the disease. Most of the genes involved are related to innate and adaptive immunity, bacterial recognition, and autophagy-related mechanisms. Cumulative evidence supports the concept that deficiencies of innate immune cell functions, principally due to the three major mutations in the NOD2 gene, play a key role in CD and may help distinguish it from UC [15,19]. On the other hand, serological and fecal biomarkers, although they have the potential to become cornerstones of predictive models for monitoring the course of IBD, have been inconsistent in distinguishing patients with CD from those with UC or in determining the intestinal location of CD [20].
Very recently, a large genotype-phenotype study carried out by the International IBD Genetic Consortium highlighted that three loci, namely 16q12/NOD2, 6p21/MHC and 3p21/MST1, helped determine the site of CD within the gastrointestinal tract, with the 16q12/NOD2 locus as a major determinant of ileal disease and the 6p21/MHC locus mainly associated with colonic disease [15].
The aim of the present work was to assess the potential advantage of using a panel of genetic markers to predict the clinical location in CD patients, and of using this information to stratify patients for more effective targeted medical treatment.
In a case-control study, we confirmed the association for 23 of the 26 variations previously associated with CD and IBD, while none of the HLA markers suggested by Cleynen et al. [15] was significantly associated. As expected, the most significant associations were observed for the variants in the NOD2, IL23R, IRGM, TNFSF15 and MAP3K7IP1 genes.
Considering the three major NOD2 variants, the strongest association was observed in carriers of at least one mutation. These associations remained significant when we sorted the entire cohort of CD patients by ileal and ileocolonic disease sites and compared them to healthy controls. Carriers of at least one risk genotype of the L1007fsX (rs2066847) variant were significantly more frequent among patients with ileal than among those with colonic localization. This variant also discriminated between ileocolonic and colonic sites, increasing the risk of ileocolonic CD. In accordance with the paper by Cleynen et al. [15], the NOD2 gene drives the association with both ileal and ileocolonic disease location, and the L1007fsX variant remained the best marker for both sites. Smoking habits were only weakly associated in the group of NOD2-negative patients with ileal involvement, although smoking is one of the most replicated risk factors in CD pathogenesis [2].
A significant association was also found for the IRGM gene, which plays a role in the innate immune response by regulating autophagy in response to intracellular pathogens. Two polymorphisms of the IRGM gene were associated with CD risk in the GWA study by the Wellcome Trust Case-Control Consortium [21]. In addition, in a previous study we found these polymorphisms to be significantly associated with an aggressive behavior of CD [22]. In the present investigation, we confirmed the data for the two variations of the IRGM gene. In detail, for the rs1000113 variant of this gene we found a strong association with ileal and ileocolonic localization. Moreover, for the rs4958847 variant the association was even stronger in patients with ileocolonic localization. We have to acknowledge that these associations lost significance after Bonferroni correction.
Of interest, two of the genes analyzed in this study, TNFSF15 and MAP3K7IP1, showed a protective effect against the development of CD. The original observation that the TNFSF15 gene was strongly associated with CD came from studies of a Japanese cohort of patients and of our Italian cohort [23,24]. The gene encodes a protein named TL1A, which exerts a defensive role in the gut against pathogens [25,26]. We found the rs4263839 variation of this gene to be associated with CD in our cohort: carriers of the 'aa' genotype appeared to be protected against the development of the disease at the colonic and ileocolonic sites, but not at the ileal site. In contrast, the rs2413583 SNP of the MAP3K7IP1 gene was protective against ileal CD. This variation was associated with CD by Franke et al. [27]. Our investigation is the first independent confirmation of the protective effect of the rs2413583 marker against the development of ileal CD.
In the present investigation we typed 2 SNPs at the 3p21 locus, which encompasses the BSN and MST1 genes. This locus was one of the three loci associated in the genotype-phenotype study by the International Consortium [15] and influenced the occurrence of extraintestinal manifestations in CD patients in our previous work [28]. Both markers were associated with CD, although only rs9858542 remained associated with ileal localization after Bonferroni correction; the significance was maintained in the multiple stepwise logistic regression analysis. In contrast, no association with SNPs of the HLA region was detected in our CD population. These data are at odds with the study by Cleynen et al. [15], where HLA markers prevailed in colonic CD. The small sample size of the present study may have obscured the impact of HLA on colonic CD.
Combining the information provided by the individual CD-associated variants into a genetic risk score was an essential component of this study, even if the resulting benefit appears clinically marginal. Indeed, our genetic risk score, based on 23 single variations, discriminated CD patients from controls at a statistically significant level, even though the AUC was only 0.70 (P = 8.6e-20). The AUC increased minimally when the analysis was restricted to patients with pure ileal (AUC = 0.73; P = 5.4e-13) or ileocolonic (AUC = 0.72; P = 3.2e-14) involvement as compared to the control group, but decreased in those with colonic involvement (AUC = 0.66; P = 1.4e-6), in keeping with similar results from previous investigations [29][30][31][32][33].
To date, no genetic markers are available that can accurately predict the risk of developing Crohn's disease. Here we emphasize that the markers so far identified as associated with CD are actually more specifically linked to ileocolonic and ileal locations of CD, whereas pure colonic CD still lacks substantial genetic support.
In conclusion, despite the limited panel of loci analyzed, our findings reaffirm the existence of at least three different subgroups of CD patients, with a genetic signature distinctive for the three main CD sites. These results might have therapeutic implications in future work, should a specific therapeutic target prove more effective at some CD sites than at others.
Supporting Information S1