The Systemic Lupus Erythematosus IRF5 Risk Haplotype Is Associated with Systemic Sclerosis

Systemic sclerosis (SSc) is a fibrotic autoimmune disease in which the genetic component plays an important role. One of the strongest SSc association signals outside the human leukocyte antigen (HLA) region corresponds to interferon (IFN) regulatory factor 5 (IRF5), a major regulator of the type I IFN pathway. In this study we aimed to evaluate whether three different haplotypic blocks within this locus, which have been shown to alter the protein function influencing systemic lupus erythematosus (SLE) susceptibility, are involved in SSc susceptibility and clinical phenotypes. For that purpose, we genotyped one representative single-nucleotide polymorphism (SNP) of each block (rs10488631, rs2004640, and rs4728142) in a total of 3,361 SSc patients and 4,012 unaffected controls of Caucasian origin from Spain, Germany, The Netherlands, Italy and the United Kingdom. A meta-analysis of the allele frequencies was performed to analyse the overall effect of these IRF5 genetic variants on SSc. Allelic combination and dependency tests were also carried out. The three SNPs showed strong associations with the global disease (rs4728142: P = 1.34×10−8, OR = 1.22, CI 95% = 1.14–1.30; rs2004640: P = 4.60×10−7, OR = 0.84, CI 95% = 0.78–0.90; rs10488631: P = 7.53×10−20, OR = 1.63, CI 95% = 1.47–1.81). However, the association of rs2004640 with SSc was not independent of rs4728142 (conditioned P = 0.598). The haplotype containing the risk alleles (rs4728142*A-rs2004640*T-rs10488631*C: P = 9.04×10−22, OR = 1.75, CI 95% = 1.56–1.97) better explained the observed association (likelihood P-value = 1.48×10−4), suggesting an additive effect of the three haplotypic blocks. No statistical significance was observed in the comparisons amongst SSc patients with and without the main clinical characteristics. Our data clearly indicate that the SLE risk haplotype also influences SSc predisposition, and that this association is not sub-phenotype-specific.


Introduction
Systemic sclerosis (SSc) is a chronic multisystem connective tissue disorder characterized by fibrotic events, vascular damage and autoantibody production. Two main clinical subtypes have been defined based on the extent of skin involvement: limited cutaneous scleroderma (lcSSc) and diffuse cutaneous scleroderma (dcSSc) [1]. Recent candidate gene and genome-wide association studies (GWASs) clearly suggest that an important genetic component underlies this disease. In this regard, an increasing number of loci have been convincingly associated with the susceptibility and clinical manifestations of SSc in recent years. However, in most cases the causal functional mutations responsible for these associations have not yet been unambiguously identified [2].
Outside the HLA region, interferon (IFN) pathway genes, which encode cytokines with critical modulatory effects on innate and adaptive immunity, have been shown to represent a key component of the genetic network leading to autoimmune processes. Interestingly, a misregulated expression of type I IFN genes, also referred to as the IFN signature, has been observed in peripheral white blood cells of patient subsets with several autoimmune diseases [3,4,5,6], thus suggesting that IFN signaling plays a crucial role in autoimmunity. Indeed, multiple single-nucleotide polymorphisms (SNPs) of the IFN regulatory factor 5 gene (IRF5), a major regulator of type I IFN induction, have been associated with different rheumatic disorders such as SSc, systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and Sjögren's syndrome (SS) [7,8,9,10]. The IRF5 association with SLE was narrowed down to three different haplotype blocks that seem to have independent functional consequences, including 1) alteration of the protein stability, 2) creation of a donor splice site in intron 1 resulting in transcription of an alternative exon 1B, and 3) modification of the 3′UTR length, which affects expression levels [11]. Subsequent studies in SSc patients suggested that genetic variation within IRF5 correlates with SSc severity and survival [12,13].
Based on the above, we decided to explore whether the functional haplotype blocks described by Graham et al. [11] were also susceptibility signals affecting SSc development and progression. For that purpose, we analysed the allele frequencies of three representative IRF5 genetic variants that have been previously associated with SSc [8,14] in five large Caucasian European cohorts and performed allelic combination and dependency tests.

Study Population
We recruited a total of 3,361 SSc patients and 4,012 unaffected controls of Caucasian origin from five different European countries, including an initial cohort from Spain and four replication cohorts from Germany, The Netherlands, Italy and the United Kingdom. Case and control sets were matched by geographical origin and ethnicity. Written informed consent from all participants and approval from the local ethical committees of all centres involved in the study were obtained in accordance with the tenets of the Declaration of Helsinki. All SSc patients fulfilled the classification criteria by Leroy et al. [15]. The clinical features of the different SSc cohorts are shown in Table 1. Case sets were further subdivided based on their skin involvement into limited cutaneous scleroderma (lcSSc) and diffuse cutaneous scleroderma (dcSSc) subgroups [15], and by autoantibody status according to the presence/absence of anti-centromere antibodies (ACA) or anti-topoisomerase antibodies (ATA), which were detected using standard procedures. Pulmonary fibrosis (PF) was diagnosed by high resolution computed tomography (HRCT).

SNP Selection and Genotyping
Samples were genotyped for three IRF5 tag SNPs, rs10488631, rs2004640, and rs4728142, representative of three different haplotype blocks (referred to as Groups 1-3, respectively) which have been reported to have functional roles in SLE patients [11]: Group 1 includes SNPs tagging a 30-bp in-frame INDEL variant of exon 6 that alters protein stability; Group 2 includes an exon 1B splice site variant; and Group 3 corresponds to genetic variants located in a conserved polyadenylation signal sequence that alters the length of the 3′UTR, thus affecting expression levels.
Genomic DNA was obtained from peripheral blood cells using standard procedures, and genotyping was performed using TaqMan® 5′ allele discrimination assays (IDs: C___2691242_10, C___9491614_10, and C___2691222_10) in a 7900 HT Fast Real-Time PCR System (Applied Biosystems, Foster City, California, USA).

Statistical Analysis
The statistical power of the study was calculated with Power Calculator for Genetic Studies 2006 software (http://www.sph.umich.edu/csg/abecasis/CaTS/reference.html), which implements the methods described in Skol et al. [16]. PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) [17] was used to carry out all statistical analyses of allele frequencies. P-values were obtained from 2×2 contingency tables using the χ² test and/or Fisher's exact test, when appropriate. Since the association between IRF5 and SSc has been confirmed in several independent studies [2], we considered it appropriate to set the significance threshold at P = 0.05. Odds ratios (OR) and 95% confidence intervals were calculated according to Woolf's method. The Breslow-Day (BD) test was used to assess the homogeneity of the ORs amongst populations. Pooled analyses were performed using the Mantel-Haenszel test under a fixed-effects model, or the DerSimonian-Laird random-effects method if the BD test reached statistical significance.
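As an illustration of the estimators named above, the following minimal sketch computes an allelic OR with a Woolf (logit) 95% confidence interval from a single 2×2 table and a Mantel-Haenszel pooled OR across cohorts. It is not the PLINK implementation used in the study; the function names and any allele counts passed to them are hypothetical and serve only to show the arithmetic.

```python
import math


def woolf_or(a, b, c, d, z=1.959964):
    """Odds ratio and 95% CI by Woolf's logit method for one 2x2 table.

    a, b: counts of the tested allele / other allele in cases
    c, d: counts of the tested allele / other allele in controls
    (hypothetical counts; all four cells must be non-zero)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) according to Woolf
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def mantel_haenszel_or(tables):
    """Fixed-effects Mantel-Haenszel pooled OR over per-cohort 2x2 tables.

    tables: iterable of (a, b, c, d) tuples, one per cohort.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator
```

Each cohort contributes one `(a, b, c, d)` tuple to `mantel_haenszel_or`, so larger cohorts carry more weight in the pooled estimate.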
To analyse whether allelic combinations would explain the possible association better than the genetic variants independently, we compared the goodness of fit of both models using PLINK. For that purpose, we calculated the deviance (defined as −2 × the log likelihood), which follows a χ² distribution, to assess the significance of the improvement in fit. If a statistically significant improvement in fit was observed when the haplotype effect was considered, we assumed that this model was more informative in explaining the putative association.
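The deviance comparison described above can be sketched as a likelihood ratio test. The example below assumes that the two models are nested and that their maximised log likelihoods are already available (for instance from a logistic regression fit); the function names, the log-likelihood values and the degrees of freedom are hypothetical inputs, not values from the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q = math.erfc(math.sqrt(half))  # closed form for df = 1
        a = 0.5
    else:
        q = math.exp(-half)  # closed form for df = 2
        a = 1.0
    # Recurrence for the regularised upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Difference in deviance (-2 x log likelihood) and its chi-square P-value.

    loglik_reduced: log likelihood of the simpler (e.g. single-SNP) model
    loglik_full:    log likelihood of the richer (e.g. haplotype) model
    df:             number of extra parameters in the richer model
    """
    deviance_diff = -2.0 * (loglik_reduced - loglik_full)
    return deviance_diff, chi2_sf(deviance_diff, df)
```

A small P-value indicates that the richer model fits significantly better than the simpler one, which is the criterion applied here to prefer the haplotype model.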

Results
The overall statistical power of the study, based on previous IRF5 reports, to detect associations with OR = 1.2 at the 0.05 significance level was 100% for rs2004640 and rs4728142, and 93% for rs10488631 (Table S1 in File S1). Additionally, no significant departure from Hardy-Weinberg equilibrium was observed in either cases or controls in any of the analysed populations (P > 0.05).
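For readers unfamiliar with the Hardy-Weinberg check, a minimal sketch of the one-degree-of-freedom χ² goodness-of-fit version is given below. The text does not state which variant of the test was applied (PLINK also offers an exact test), so this is only an assumed illustration, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts (hypothetical inputs);
    both alleles must be present in the sample.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A P-value above the chosen threshold gives no evidence of departure from equilibrium in that sample.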

Allele Test
The results of the global analyses of the discovery cohort and the four independent replication populations separately are shown in Table S2 in File S1. Since the Breslow-Day test evidenced no heterogeneity of the ORs amongst the different cohorts (P > 0.05), a combined meta-analysis was performed to test the overall effect of the IRF5 genetic variants in the whole dataset (Table 2). The pooled analysis showed that the three SNPs were strongly associated with the global disease (rs4728142: P = 1.34×10−8, OR = 1.22, CI 95% = 1.14-1.30; rs2004640: P = 4.60×10−7, OR = 0.84, CI 95% = 0.78-0.90; rs10488631: P = 7.53×10−20, OR = 1.63, CI 95% = 1.47-1.81). Highly significant P-values were also obtained when the different phenotype subgroups were compared against the control population (Table S3 in File S1). However, no statistical significance was observed in the comparisons amongst SSc patients according to the presence/absence of the different clinical characteristics and autoantibody profile (Table 2).

Conditional Logistic Regression
We performed pairwise conditioning analyses to test whether there was any dependency amongst the three SNPs (Table 3). The analysis showed that every SNP maintained its statistical significance after conditioning on the other two except for rs2004640, which was dependent on rs4728142. The moderate linkage disequilibrium between them (r² < 0.68) could explain this fact (Table S4 in File S1).
When comparing the haplotype model with the independent SNP model, we observed a statistically significant improvement of the goodness of fit compared to rs4728142 (likelihood P-value = 1.23×10−17), rs2004640 (likelihood P-value = 1.94×10−19), or rs10488631 (likelihood P-value = 1.48×10−4) individually.
We also performed a sub-phenotype analysis of allelic combinations to test whether any haplotype could influence a specific clinical condition (Table S5 in File S1). This analysis only showed a residual P-value for a low frequency haplotype in the PF+/PF− comparison (rs4728142*G-rs2004640*T-rs10488631*T: P = 0.041, OR = 1.28, CI 95% = 1.02-1.61). The remaining allelic combinations did not reach statistical significance in any other comparison.
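The linkage disequilibrium measure r² mentioned above can be computed from haplotype and allele frequencies, as in the short sketch below. The function name and the frequencies passed to it are hypothetical; in practice the haplotype frequency would be estimated from the genotype data.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    (hypothetical inputs; allele frequencies must lie strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
```

An r² of 0 means the two SNPs are statistically independent, while a value of 1 means that one SNP fully predicts the other, in which case a conditional analysis cannot separate their effects.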

Discussion
GWAS data have confirmed IRF5 as one of the signals most strongly associated with SSc [19,20]. In addition, it has been proposed that particular IRF5 functional genetic elements contribute to SLE pathophysiology through their relationship with autoantibodies and IFNα production [21,22]. These data indicate that this gene may represent a crucial member of the genetic component underlying this type of autoimmune disease [23].
Previously published data suggested that two different IRF5 haplotypes influence specific SSc phenotypes. It was hypothesised that these haplotypes may explain a possible IRF5 association with dcSSc and PF, likely by tagging an intronic 5-bp biallelic insertion-deletion polymorphism (INDEL), which would represent the real causal functional variant [12]. However, our results are not in agreement with this idea, since we did not find evidence for a specific genetic association between IRF5 and any of the major clinical manifestations, despite the fact that two out of the three genetic variants comprising the previously described risk haplotype were covered in our analysis (rs2004640 and rs10954213, which is highly correlated with rs4728142). A similar discrepancy was also observed by Sharif et al. [13], who failed to replicate the IRF5 haplotype effect on PF described by Dieudé et al. [12]. Our data are, however, consistent with recent GWAS follow-up studies that did not show a phenotype-specific association of IRF5 with SSc, but a clear association with the overall disease [14,24]. It should be noted that one of the SNPs included in this study, rs4728142, has been shown to correlate with longer survival and milder pulmonary involvement in SSc patients [13]. Taking this together with our results, it could be speculated that, although the risk variants of IRF5 do not predispose to the development of PF, they may influence the severity of some clinical features such as PF. In any case, it is important to note that whereas antibody profile and disease subtype are clearly dichotomous outcomes, PF can range from mild-stable to severe-progressive involvement (and the approach used here to define PF does not differentiate between the severity grades of this disease manifestation) [25].
As stated before, the tag SNPs analysed here are representative of three haplotype blocks that have been reported to affect the function of the protein in different ways, including production of an alternatively spliced isoform, alteration of polyadenylation sites that leads to a shorter messenger RNA, and reduction of protein stability [11]. These three polymorphisms showed strong association signals in our study, supported by a high statistical power. Therefore, the functional alterations caused by their risk alleles may also be of high relevance in the pathogenic mechanisms that lead to SSc. However, the rs2004640 signal was dependent on that from rs4728142 in our study cohort. Hence, rs2004640 might not be an independent SSc susceptibility locus although it is functionally relevant, which suggests that not all functional variants in a given risk gene may play an independent role in the associated disease. In any case, as described in SLE [11], we observed a significant additive effect amongst the three analysed SNPs because the haplotypes containing both the risk and protective alleles better explained the association between this locus and SSc. Hence, although no functional studies have been carried out yet to unmask the possible implication of the IRF5 risk alleles in SSc pathophysiology, it is likely that each one of the protein alterations described above influences the development of SSc individually, and that carrying all three risk alleles results in a critically reduced protein function that greatly increases SSc susceptibility.
In conclusion, this study clearly shows that a haplotype of three different functional genetic variants within the IRF5 region confers susceptibility to SSc. The fact that this association is shared with SLE adds another piece of evidence to the common genetic background of both diseases, and provides a new perspective for the study of the type I IFN pathway and its implication in the development of autoimmune conditions.

Supporting Information
File S1. Table S1. Overall statistical power of the study for each analysed IRF5 genetic variant at the 5% significance level. Table S2. Independent analyses of IRF5 genetic variants in Caucasian SSc patients and unaffected controls from Europe. Table S3. Meta-analysis of IRF5 genetic variants comparing the main clinical phenotypes with unaffected controls. Table S4. Linkage disequilibrium structure of the IRF5 region analysed in this study.