Genotype/Phenotype Analyses for 53 Crohn’s Disease Associated Genetic Polymorphisms

Background & Aims: Recent studies reported a role for more than 70 genes or loci in the susceptibility to Crohn’s disease (CD). However, the impact of these associations in clinical practice remains to be defined. The aim of the study was to analyse the relationship between genotypes and phenotypes for the main 53 CD-associated polymorphisms.
Method: A cohort of 798 CD patients with a median follow-up of 7 years was recruited by tertiary adult and paediatric gastroenterological centres. A detailed phenotypic description of the disease was recorded, including clinical presentation, response to treatments and complications. The participants were genotyped for 53 CD-associated variants previously reported in the literature and correlations with clinical sub-phenotypes were searched for. A replication cohort consisting of 722 CD patients was used to further explore the putative associations.
Results: The NOD2 rare variants were associated with an earlier age at diagnosis (p = 0.0001) and an ileal involvement (OR = 2.25 [1.49–3.41] and 2.77 [1.71–4.50] for rs2066844 and rs2066847, respectively). Colonic lesions were positively associated with the risk alleles of IL23R rs11209026 (OR = 2.25 [1.13–4.51]) and 6q21 rs7746082 (OR = 1.60 [1.10–2.34]) and negatively associated with the risk alleles of IRGM rs13361189 (OR = 0.29 [0.11–0.74]) and DEFB1 rs11362 (OR = 0.50 [0.30–0.80]). The ATG16L1 and IRGM variants were associated with a non-inflammatory behaviour (OR = 1.75 [1.22–2.53] and OR = 1.50 [1.04–2.16], respectively). However, these associations lost significance after multiple testing corrections. The protective effect of the IRGM risk allele on colonic lesions was the only association replicated in the second cohort (p = 0.03).
Conclusions: It is not recommended to genotype the studied polymorphisms in routine practice.

CD is a heterogeneous disorder with different clinical presentations. The disease may occur at any point from childhood to old age, and some CD cases require surgery and/or immunosuppressive therapy while others are characterized by rare relapses that are easily treated with anti-inflammatory drugs. Complications such as severe colitis, strictures and fistulas are common but not constant. Finally, the responses to treatments vary between patients. Unfortunately, to date, only limited parameters are available to define the early clinical course of CD [22]. As a result, some patients can be undertreated while others can be exposed to the side effects of drugs without clear benefit.
Genetic factors can be regarded as good candidates for classifying patients in terms of disease location, severity, complications, extra-intestinal manifestations and drug response/toxicity. The impact of NOD2 mutations has been extensively studied and its associations with a young age of onset, an ileal location and complicated behaviours are well established. However, NOD2 status is not sufficient by itself to influence clinical practice [23,24]. Less consistent associations have been reported for other CD susceptibility genes, but again no recommendations have been formulated (for a review see [24]). In contrast, a few studies have investigated a large number of CD susceptibility alleles with special attention being paid to clinical items able to impact upon clinical practice. Henckaerts et al. identified positive associations between rs1363670 (close to IL12B), rs12704036 (in a gene desert region) and rs6908425 (CDKAL1) polymorphisms and disease behaviour [25]. However, these results remain to be confirmed. The aim of this study was to further investigate 53 CD-associated variants in a large cohort of CD patients with detailed medical records in order to determine the genotype/phenotype relationships.

Ethics Statement
The study received approval from the French national ethics committee (Hôpital Saint Louis, Paris, France) and all participants signed an informed consent form.

Patients
Six paediatric and adult gastroenterology tertiary centres recruited 798 CD cases as defined by the Lennard-Jones criteria [26]. The patients were included if the diagnosis of CD was made at least one year before inclusion and if they had attended continuous follow-up visits in the reference centres. More than 94% of the patients included had European origins while the others mainly originated from North Africa. The replication cohort consisted of 722 familial CD cases recruited through a European consortium [3]. A panel of 960 healthy blood donors without personal or familial history of inflammatory disorders [27] was also genotyped to evaluate the association of the studied SNPs with CD in the French population.

Phenotypic Data Recorded
Clinical, endoscopic, radiological and histological data were retrospectively collected on a standardized questionnaire by clinical research assistants, validated by the referring expert gastroenterologists of the patients and reviewed by the data managers of the study. The load factor of each item was up to 95%. In order to validate the quality of the data, the outlier values were searched for and verified and a sample of 200 questionnaires was checked twice. The items recorded included sex, date of birth, smoking habits (patients were classed as smokers in the case of any smoking habit) and presence of granulomas. Familial history of Inflammatory Bowel Diseases (IBD) was defined by a reported diagnosis of IBD in one or more first or second degree relatives.
The involvement of the digestive tract was registered at diagnosis and at the end of follow-up (cumulative locations) for the esophagus, stomach, duodenum, jejunum, proximal ileum, distal ileum, colon and rectum. Involvement was defined by macroscopic lesions. Disease behaviour was classified according to the Montreal classification at last follow-up (B1: non-stricturing, non-penetrating disease; B2: stenosing behaviour; B3: penetrating disease excluding perianal disease).
The use, response and side effects were recorded for corticosteroids, azathioprine/6-mercaptopurine, methotrexate, infliximab and enteral feeding. Steroid exposure was classified as being mild (less than 2 months per year), moderate or frequent (more than 6 months per year). Patients who relapsed when the steroid dosage was tapered or within 3 months after treatment ended were defined as steroid dependent. There is no consensus definition of steroid resistance [28] and patients who showed no response after 2 weeks of full steroid doses (at least 1 mg/kg/d in children and 60 mg/d in adults) were defined as corticosteroid resistant. For the other drugs, treatment failure was determined by the IBD gastroenterologist. Indications of surgery were classed as follows: penetrating complications, stenosing disease, failure of medical treatment and anal surgery. Total gut resection was classified as major resection (total colectomy or small bowel resection >50 cm), limited resection or no resection. Proctologic surgery included all surgical procedures for perianal CD performed under general anaesthesia and with therapeutic actions.
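The steroid definitions above can be written as simple decision rules. The following is a minimal sketch that encodes only the thresholds stated in the text; the function and parameter names are hypothetical and are not taken from the study's data dictionary.

```python
def classify_steroid_exposure(months_per_year: float) -> str:
    """Classify yearly steroid exposure with the thresholds quoted in the text.

    mild: less than 2 months per year; frequent: more than 6 months per year;
    everything in between is treated as moderate.
    """
    if months_per_year < 0 or months_per_year > 12:
        raise ValueError("months_per_year must be between 0 and 12")
    if months_per_year < 2:
        return "mild"
    if months_per_year > 6:
        return "frequent"
    return "moderate"


def is_steroid_resistant(dose_mg_per_day: float, weight_kg: float,
                         is_child: bool, responded_after_2_weeks: bool) -> bool:
    """Apply the working definition of corticosteroid resistance from the text.

    A full dose is at least 1 mg/kg/d in children and at least 60 mg/d in adults.
    Resistance means no response after 2 weeks of such a full dose.
    """
    if is_child:
        full_dose = dose_mg_per_day / weight_kg >= 1.0
    else:
        full_dose = dose_mg_per_day >= 60.0
    return full_dose and not responded_after_2_weeks


if __name__ == "__main__":
    print(classify_steroid_exposure(4))               # moderate
    print(is_steroid_resistant(60, 70, False, False))  # True
```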
Malnutrition (defined by a loss of two standard deviations on the weight curve in children or by the need of artificial nutrition in adults), bleeding requiring blood transfusion, severe colonic attacks and extra-intestinal manifestations were noted. Patients were classified as never hospitalized (excluding the initial diagnosis procedure and single day hospitalizations), frequently hospitalized (more than once a year) or intermediate. The evolution types were defined as frank relapses and remissions; chronic continuous evolution and other.

Genotyping
All patients and controls were genotyped for 53 reported CD susceptibility Single Nucleotide Polymorphisms (SNPs, Table 1) using an ABI 7900HT Sequence detection system Illumina GoldenGate assay (Illumina, San Diego, CA) by the Centre National de Génotypage (CNG, Evry, France) or by the Integragen company (Evry, France). This set of SNPs was retained on the basis of the available literature and corresponded to the 50 SNPs with the highest published OR and the three main NOD2 associated variants. The genotyping rate of each SNP was higher than 90% (Table 1). All genotyped SNPs were in Hardy-Weinberg equilibrium in patients and in controls. No major discrepancies were observed between our calculated allele frequencies and those published in the SNP database. For rs916977, we observed a minor allele frequency of 0.25, whereas the published estimates varied between 0.13 and 0.42. The at-risk alleles were defined as the alleles previously associated with CD in the literature. For clarity, the nucleotides defining the risk alleles are indicated in Table 1 and between brackets in the text.
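As an illustration of the Hardy-Weinberg check mentioned above, the sketch below applies the standard one-degree-of-freedom chi-square goodness-of-fit test to genotype counts. The text does not say which test or software was used for this step, so this is an assumption, and the counts in the usage example are hypothetical.

```python
import math


def hardy_weinberg_test(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes of a biallelic SNP and
    returns (chi2, p_value). The test has one degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with an expected count of zero (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts, not taken from Table 1.
    chi2, p_value = hardy_weinberg_test(410, 320, 68)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A SNP would typically be flagged for review when the p-value falls below a preset threshold; the threshold used by the authors is not stated.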

Statistical Analyses
Qualitative variables were described in percentages and quantitative variables were described by their median with interquartile ranges (Q1-Q3). Comparisons of qualitative variables were performed using the Chi-square or Fisher's exact tests (when n<5 in the χ² contingency table). Comparisons of medians were performed using the Kruskal-Wallis or Mann-Whitney tests. The Odds-Ratio (OR) values were calculated using the logistic regression method in the univariate and multivariate analyses. Multivariate analyses took into account all of the risk factors associated (P<0.05) with the item studied in the univariate analyses, including the NOD2 alleles. The cumulative incidences of first drug prescriptions and first surgery were drawn on Kaplan-Meier curves and compared using the log-rank test. The tests were done for each (at-risk or protective) allele of each tested marker corresponding to a recessive/dominant model of inheritance. For NOD2, the three rare alleles (corresponding to independent mutations) were also analyzed jointly. Statistical analyses were performed using STATA 10 statistical software (Stata Corporation, College Station, Texas, USA). The power of the cohort to detect an association depends on the respective frequencies of the sub-phenotypes tested and of the risk alleles in the subgroups compared. As an example, the cohort was powerful enough to detect an association with an OR of 1.5 between complicated and inflammatory behaviours for risk allele frequencies ranging from 0.3 to 0.7 in the reference group (α = 0.05; β = 0.8). We report here a comprehensive overview of the most relevant positive tests with a nominal p-value lower than 0.05, with special attention paid to items exploring the Montreal classification, the responses to treatments and severity of the disease. For this study, we explored many phenotypic items for 53 markers and we thus performed several hundred statistical tests. These tests were not always independent but the coefficient for applying the Bonferroni correction needed to be higher than 500. Under these conditions, only strong associations could remain significant. For this reason, associations with nominal p-values lower than 0.05 were further tested in the replication cohort no matter what their corrected P-values.
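To make the effect of the correction concrete, the sketch below applies a Bonferroni coefficient of 500 (the lower bound quoted above, used here as an illustrative assumption) to two nominal p-values reported in this article. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    n_tests = 500  # assumed coefficient; the text only says "higher than 500"
    print(f"per-test threshold: {bonferroni_threshold(0.05, n_tests):.6f}")
    # Nominal p-values quoted in the Surgery section of this article.
    nominal = {
        "IL23R risk allele, time to first surgery": 0.031,
        "two NOD2 mutations, perforating perianal disease": 0.001,
    }
    for label, p in nominal.items():
        print(f"{label}: nominal p = {p}, adjusted p = {bonferroni_adjust(p, n_tests):.3f}")
```

With a coefficient of 500 or more, a nominal p-value has to be at most about 0.0001 to stay below 0.05 after adjustment, which is why nominally significant findings were carried forward to the replication cohort instead.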

Case-control Analyses
The allele frequencies of the 53 SNPs tested were compared between cases and controls. We confirmed an association between CD and 26 independent SNPs (Table 1). As expected, the most significant associations were observed for the CD susceptibility alleles with the highest reported OR, i.e. the NOD2 mutations and the rare IL23R protective allele.
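A case-control comparison of allele frequencies can be summarized by an allelic odds ratio. The sketch below computes the odds ratio from a 2×2 table of allele counts with a Woolf (log-based) 95% confidence interval, which matches the Wald interval of a logistic regression with a single binary predictor. The text does not specify the exact procedure used for Table 1, so this is an assumed approach, and the counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int, z: float = 1.96):
    """Odds ratio and confidence interval from a 2x2 table of allele counts.

    Returns (odds_ratio, ci_low, ci_high). The interval uses the standard
    error of log(OR): sqrt(1/a + 1/b + 1/c + 1/d). z = 1.96 gives a 95% CI.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, ci_low, ci_high


if __name__ == "__main__":
    # Hypothetical allele counts (two alleles per genotyped individual).
    or_, low, high = allelic_odds_ratio(case_risk=520, case_other=1076,
                                        ctrl_risk=480, ctrl_other=1440)
    print(f"OR = {or_:.2f} [{low:.2f}-{high:.2f}]")
```

An interval that excludes 1 corresponds to a nominally significant association at the chosen level.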

Sex, Family History, Tobacco Use and Age of Onset
The description of the exploratory cohort is shown in Table 2 and Table S1. The median duration of follow-up was 7 years (Q1-Q3: 4-12.5). The significant results are summarized in Table 3. The sex ratio was not altered by the SNPs tested. The IL23R protective (A) allele and the NOD2 (C) risk allele rs2066845 were associated with a positive family history of inflammatory bowel disease (Table 3). No differences were observed between smoking groups, arguing against a gene-environment interaction. As previously reported, patients with at least one NOD2 variant had an earlier onset of disease (Table 3). No relationships were found between age at onset (or at diagnosis) and any other SNP.

Disease Location
For NOD2, the risk alleles rs2066844 (T) and rs2066847 (C) were associated with the involvement of the distal ileum. Patients carrying at least two NOD2 mutations and with pure colonic disease were extremely rare (n = 3). The risk allele (G) of PTPN22 (rs2476601) was associated with ileal lesions. Colonic disease (including rectum) was associated with the risk alleles of IL23R (G) and the chromosome 6q21 locus (T) and with the protective alleles of IRGM (T) and DEFB1 (A). Analyses performed on the basis of the Montreal classification system for disease location confirmed the associations obtained for each anatomical site but with lower P-values. After multivariate analysis, only IL23R and DEFB1 remained associated with colonic disease. None of the at-risk alleles were associated with the presence of granulomas.

Behaviour
As previously reported, patients with two NOD2 mutations more frequently had non-inflammatory disease behaviour at diagnosis compared to patients with wild-type NOD2. The risk alleles of ATG16L1 (G), IRGM (C) and DEFB1 (G) were also associated with a non-inflammatory behaviour (B2+B3). These associations remained significant after multivariate analysis, suggesting that these genes acted independently to modulate disease behaviour.

Medications
Overall, 52% of the patients were steroid dependent and 12.5% were corticosteroid resistant. No associations were found with time of first steroid therapy, response to treatment or steroid exposure. The patients received an immunomodulatory treatment in 74% of cases. Patients who carried the NOD2 protective allele of rs2066847 (no insertion) received azathioprine earlier (p = 0.04), and those who carried the protective allele of rs2066845 (G) received methotrexate earlier (p = 0.03). Nevertheless, these associations disappeared when the three NOD2 mutations were taken into account. The CD risk alleles of the CCNY (A), CDKAL1 (C) and 10q21 (C) loci were weakly associated with a better response to immunosuppressants.

Surgery
The time of first non-proctologic surgery did not depend on any of the SNPs tested except for the IL23R risk allele (G) (log-rank, p = 0.031; Fig. 1). When the analyses were performed on the subgroup of patients with pure colonic disease, the IL23R protective allele (A) was also predictive of an earlier surgery (log-rank, p = 0.007). Patients who carried two NOD2 mutations had a less frequent incidence of perforating perianal disease (p = 0.001) but they were more frequently operated on for penetrating or occlusive disease. However, after adjustment for ileal location, these latter associations did not remain significant.
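The log-rank comparisons reported above were run in STATA. For readers who want to see the mechanics, the following is a textbook two-group log-rank test written as a minimal sketch; it is not the authors' code, and the follow-up times in the usage example are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p_value) with 1 degree of freedom.

    times_*  : follow-up time for each patient (e.g. years to first surgery)
    events_* : 1 if the event (surgery) was observed, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        n_b = sum(1 for tt, _, g in data if tt >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d_b = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("test is undefined: no informative event times")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical years to first surgery for carriers and non-carriers.
    carriers = ([1, 2, 2, 4, 6, 9], [1, 1, 1, 1, 0, 1])
    non_carriers = ([3, 5, 7, 8, 10, 12], [1, 0, 1, 1, 0, 0])
    chi2, p_value = logrank_test(*carriers, *non_carriers)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```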

Complications
Malnutrition was observed in 8.8% (n = 69) of patients at the time of diagnosis and in 10.7% (n = 84) at the end of follow-up. Patients who carried two NOD2 mutations were more frequently exposed to this complication at diagnosis. This association was restricted to patients with ileal disease (9% for wild-type homozygotes vs. 28.3% for mutated patients, p = 0.0001) and remained significant in the subgroup of adult-CD onset. Severe colonic attacks were less frequent in patients who were homozygous for the at-risk alleles of IL23R (1.5% for genotype AA vs. 8.3% for genotype AG, p = 0.025) or DEFB1 (6.3% for genotype GG vs. 0% for genotype AA, p = 0.005). No consistent associations were found with arthritis, arthralgias, mouth ulcers, cutaneous or ocular manifestations, ankylosing spondylitis, psoriasis or primary sclerosing cholangitis. No association was found with evolution type or frequency of hospitalization.

Replication Study
As anticipated, none of the above reported associations remained statistically significant after multiple testing corrections.
Therefore, we tested their relevance in the replication cohort. Because the impact of NOD2 mutations on disease presentation was extensively studied previously (including patients from the replication cohort [28]), we only focused on the other 50 genetic markers. The exploratory and replication cohorts were not completely the same (Table 2). The main differences can be explained by the fact that the replication cohort included older patients (date of birth Q1-Q2-Q3: 1954-1967-1975 versus 1963-1972-1980, p<0.0001) with, on average, a longer follow-up (11 years). For these patients, the clinical use of immunosuppressants and biotherapies was less generalized. When comparing the patients within this cohort, we only confirmed that the risk allele (C) at rs13361189 of IRGM was less frequently encountered in cases of colonic disease at onset (p = 0.03).

Discussion
Genetic studies have recently identified a large number of susceptibility genes that play a role in the predisposition to CD. The aim of this work was to assess the clinical utility of these genetic associations in routine practice. This is an important issue for the development of personalized medicine in which the genetic profile of an individual patient would help to choose optimal treatment strategies.
We analysed the clinical course of 798 CD patients from referring paediatric and adult gastroenterology centres in detail. These centres treat patients with the most severe form of the disease, as shown by the comparison between the description of the current cohort and population-based studies [29]. For example, whereas approximately only half of the patients with CD received steroid therapy at some point in the disease course and a third had steroid dependency in the population-based studies, 90% of the patients in our cohort were treated with steroids and more than 50% were steroid dependent [29]. The participating centres were accustomed to following the international guidelines for CD management. However, differences between therapeutic practices were likely present, considering that we saw differences in the proportion of patients having received steroids, immunosuppressants or surgery. Although this heterogeneity between centres may affect disease behaviour, its impact on the genotype/phenotype relationships tested here is difficult to measure.
We studied the CD susceptibility alleles available at the time of genotyping and corresponding to the 50 alleles with the highest OR (in addition to the most common NOD2 alleles). As shown by the case-control study, a large number of the alleles tested were positively associated with CD in our French cohort of patients, reinforcing their role in CD susceptibility. However, some alleles were not found to be associated with CD. This likely reflects the limited power of our case-control study when compared to the large meta-analyses required for identifying associations with the studied SNPs. However, some previously published CD-susceptibility alleles were not replicated, even in large cohorts of patients [19], suggesting that, in some cases, these SNPs do not indicate susceptibility to CD in all patient samples. In contrast, new SNPs have recently been added to the long list of CD susceptibility alleles [19] and additional SNPs will certainly follow. This work is thus limited by the knowledge available at the time of designing the study. However, it explored a panel of markers large enough to be representative of CD susceptibility genes. This panel contains the alleles that exhibit the strongest associations with CD. Another limitation of the study is that, except for NOD2, IL23R, IRGM and ATG16L1, CD-causing mutations have not yet been firmly established and thus the "genetic markers" tested might indirectly reflect the true causative alleles of the biological effects. However, recent in-depth sequencing of the best candidate genes does not argue for additional mutations with a larger effect in the studied candidate genes [30,31].
The first cohort was exploratory in nature. It was used to search for putative associations that could be relevant for clinical practice, and many items of disease presentation were explored. Under these conditions, the power of the cohort to detect relevant associations should be questioned. The cohort was comparable to the cohorts followed in medium-sized adult and paediatric IBD centres. It was thus supposed to be a good tool for exploring what is relevant for "real life". In terms of power calculations, the cohort was large enough to detect an OR as low as 1.5 for the most common sub-phenotypes. This is in the range of what is expected to have a clinical impact. However, it is noteworthy that for less frequent sub-phenotypes (e.g. cancer or some extraintestinal manifestations) and/or the less frequent polymorphisms, larger cohorts are required to efficiently explore this matter. In those situations, specific works focusing on specific genotype/phenotype relationships will be required, likely through large international consortia.
The exploratory cohort contained mainly Caucasian people from Europe (94%) or North Africa. Genetic heterogeneity may affect case-control studies. It is less clear that it may also affect genotype/phenotype correlation studies looking for phenotype-modulating alleles. However, we performed the main analyses again, excluding the patients with non-European ancestry. These analyses did not significantly change our conclusions (data not shown).
As previously published, the three NOD2 SNPs were associated with ileal location and young age at onset [23,32–35]. However, we did not significantly extend the spectrum of NOD2-associated items, confirming the conclusion that NOD2 genotyping only has a limited impact in routine practice [25]. Among the other 50 CD susceptibility alleles studied, only a few of them were associated with some of the clinical items in the first cohort. Considering that NOD2 is the CD susceptibility gene with the strongest effect on the phenotype, this observation suggests that genetic markers with a more limited role in CD risk may also have a limited impact on clinical presentation.
Even if some nominally significant associations were found with the first cohort, their number, nature and strength did not argue for their usefulness in clinical practice. In addition, after multiple testing corrections, none of these associations remained significant. Consequently, they may have arisen by chance alone. However, to better understand the significance of tests with a nominal P-value <0.05, we used a replication cohort. The replication cohort had a power comparable to the exploratory cohort but it contained familial cases only while the exploratory cohort mainly contained sporadic cases. Notably, the genetic predisposition to familial and sporadic CD is the same with only limited differences observed for NOD2 and IL23R (see above). In addition, there is no reason to suppose that genes involved in the modulation of CD phenotype are different in sporadic and familial CD. Thus the impact of the differences between cohorts, if any, should be limited. As a final result, we concluded that, individually, the newly identified risk alleles associated with CD do not notably contribute to the definition of clinical subgroups of patients.
In the literature, the ATG16L1 risk allele has been inconsistently associated with ileal location, penetrating diseases and early onset [36–42]. We found a non-replicated association between ATG16L1 and complicated disease behaviours. A modest association between rs4958847 of IRGM (which is partially correlated with rs13361189) and fistulizing behaviour/perianal fistulas has been reported [43]. The current study reports an association between the at-risk allele of rs13361189 with disease behaviour (in the first cohort) and an association between its protective allele and colonic disease at onset (in both cohorts). The last finding is in accordance with recent publications [44] but not with other ones [45,46]. It is thus difficult to definitively retain it. Finally, even if true, this association would have only a limited impact in practice.
Most studies failed to show an association between IL23R and CD subphenotypes [24,41,47–51]. We found here positive associations in the first cohort but failed to reproduce them in the replication cohort. An association between the risk allele of rs11362 located in the 5′-UTR of DEFB1 and colonic location has been published [14]. In the first cohort the DEFB1 protective allele was inversely associated with colonic location and severe colonic relapse, whereas it was positively associated with complicated behaviours. Finally, in a previous comparable exploratory study performed on 875 CD patients, Henckaerts et al. reported associations between rs1363670 at the IL12B locus and a stricturing behaviour; between rs12704036 on chromosome 5q and an early penetrating behaviour and between rs6908425 in CDKAL1 and perianal fistulas [25]. The associations obtained here were not seen in this former study while we failed to replicate Henckaerts' results. As a whole, the comparison of our data and the literature further confirms the fact that associations between CD risk alleles and clinical sub-phenotypes are inconsistent (except for NOD2).
The prediction of responses to treatment is an important issue for the clinician. Unfortunately, no associations between the SNPs tested and responses to treatment and/or side effects could be obtained. The prediction of disease severity (at its best at the time of diagnosis) would also be welcome in order to propose personalized therapeutic options and to avoid rapid disease progression. As an example, a top-down strategy could be proposed to patients who are genetically at risk of developing a disabling disease while other patients with a lower risk of developing a severe course could be treated with the classic step-up strategy [52]. The definition of a severe or disabling disease is not consensual and there is a lack of validated parameters for exploring this issue [52,53]. We thus explored a large number of clinical parameters including disease behaviour, the presence of severe colonic attacks, malnutrition, extra-intestinal manifestations, time and indications of surgery, cumulative bowel resection, and the time and use of different medications, amongst others. We also looked at the type of evolution and the frequency of hospitalization. Finally, we approached this question using a visual analogue score of severity provided by the referring clinicians of the patients (data not shown). No matter what parameter was tested, we failed to identify a relevant association between severe outcome and any of the CD susceptibility genes.
If a single allele does not predict the phenotype, it is possible that a combination of genetic variants could impact disease clinical presentation. With the exception of NOD2 variants, no allele dosage effects were observed for any of the alleles tested. The exact mechanisms by which the CD susceptibility genes contribute to this disease are not known, but many of these genes are involved in two main biological functions: i) innate immunity, including bacterial recognition and killing and ii) the Th17 pathway and inflammation. It is thus tempting to imagine an epistatic interaction between the genes involved in the same (respectively complementary) biological functions. We tested this hypothesis using the logistic regression method but we failed to identify an epistatic interaction between the genetic variants involved in innate and/or adaptive immunity in the main disease subphenotypes (data not shown). This negative result may reflect a lack of statistical power but it is in accordance with other studies that also failed to find gene-gene interactions in the susceptibility to CD [33,36,37,40,47].
It is worth noting that if CD-causing genes do not seem to play a key role in the clinical presentation of CD, the possibility that other genetic factors may contribute towards modulating the clinical presentation of the disease cannot be excluded. Indeed, disease-modifier genes might be different from disease-causing genes. A re-analysis of the large genome-wide association studies taking into account sub-phenotype classifications of the patients will help to resolve this important issue in the development of personalized medicine.

Supporting Information
Table S1 Additional characteristics of the exploratory cohort of CD patients (DOC)