Identification of Novel Genetic Markers Associated with Clinical Phenotypes of Systemic Sclerosis through a Genome-Wide Association Strategy

The aim of this study was to determine, through a genome-wide association study (GWAS), the genetic components contributing to different clinical sub-phenotypes of systemic sclerosis (SSc). We considered limited (lcSSc) and diffuse (dcSSc) cutaneous involvement, and the relationships with the presence of the SSc-specific auto-antibodies anti-centromere (ACA) and anti-topoisomerase I (ATA). Four GWAS cohorts, comprising 2,296 SSc patients and 5,171 healthy controls, were meta-analyzed looking for associations in the selected subgroups. Eighteen polymorphisms were further tested in nine independent cohorts comprising an additional 3,175 SSc patients and 4,971 controls. Conditional analysis for associated SNPs in the HLA region was performed to explore their independent association in antibody subgroups. Overall analysis showed the non-HLA polymorphism rs11642873 in the IRF8 gene to be associated at the GWAS level with lcSSc (P = 2.32×10−12, OR = 0.75). Also, rs12540874 in the GRB10 gene (P = 1.27×10−6, OR = 1.15) and rs11047102 in the SOX5 gene (P = 1.39×10−7, OR = 1.36) showed a suggestive association with the lcSSc and ACA subgroups, respectively. In the HLA region, we observed highly associated allelic combinations in the HLA-DQB1 locus with ACA (P = 1.79×10−61, OR = 2.48), in the HLA-DPA1/B1 loci with ATA (P = 4.57×10−76, OR = 8.84), and in NOTCH4 with ACA (P = 8.84×10−21, OR = 0.55) and ATA (P = 1.14×10−8, OR = 0.54). We have identified three new non-HLA genes (IRF8, GRB10, and SOX5) associated with SSc clinical and auto-antibody subgroups. Within the HLA region, HLA-DQB1, HLA-DPA1/B1, and NOTCH4 associations with SSc are likely confined to specific auto-antibodies. These data emphasize the differential genetic components of subphenotypes of SSc.

SSc is a clinically heterogeneous disease with a wide range of clinical manifestations, ranging from mild skin fibrosis with minimal internal organ disease to severe skin and organ involvement, reflecting the three main pathological events that characterize this disease: endothelial damage, fibrosis, and autoimmune dysregulation [16]. SSc patients are classified into two clinical subgroups based on the extent of skin involvement, limited SSc (lcSSc) and diffuse SSc (dcSSc), which are associated with different clinical complications and prognoses [17]. Another SSc hallmark is the presence of disease-specific and usually mutually exclusive auto-antibodies that correlate both with the extent of skin involvement and with the various disease manifestations, such as pulmonary fibrosis and renal crisis [18]. The most common are anti-DNA topoisomerase I (ATA) and anti-centromere (ACA; directed against the CENP A and/or B proteins) antibodies [19]. Each of these auto-antibodies is a marker for a relatively distinct clinical subgroup of SSc: ACA is typically associated with limited cutaneous disease, uncommon pulmonary fibrosis and late-onset pulmonary hypertension, but a generally good overall prognosis, while ATA is a marker for diffuse skin disease and clinically significant pulmonary fibrosis, with a resultant poorer prognosis.
It has been observed that certain SSc clinical features and the presence of disease specific auto-antibodies vary in different countries and ethnicities [20]. This fact supports the likelihood that genetic factors may influence the different clinical features of the disease and auto-antibody subsets [19]. Furthermore, the affected members within multicase SSc families tend to be concordant for SSc-specific auto-antibodies and HLA haplotypes, thus, providing further evidence for a genetic basis for auto-antibody expression in SSc [21]. Moreover, several studies have reported that certain SSc genetic risk factors correlate with specific clinical subsets of the disease or SSc-related auto-antibodies [4,12,22,23].
In this study, we aimed to identify novel genetic factors associated with different SSc clinical and auto-antibody subsets through a stratified re-analysis of results from a previous GWAS from our group and validation in a large replication study.

Results
First, the genetic associations were tested in each of the four subgroups considered for this study (lcSSc, dcSSc, ACA positive and ATA positive) by means of χ² tests in the GWAS data (individuals from the United States, Spain, Germany and The Netherlands), correcting the P values for the genomic inflation factor λ of each subgroup (Figures S1, S2, S3, S4 and Tables S1, S2, S3, S4). We found a total of eighteen novel non-HLA loci associated in these subgroups with a P value lower than 1×10−5: seven in the lcSSc subtype, five in the dcSSc subtype, two in ACA positives and four in ATA positives. Next, we proceeded to replicate these associations in nine independent cohorts (from the US, Spain, Germany, The Netherlands, Belgium, Italy, Sweden, the United Kingdom and Norway). The statistically significant results observed in the replication step are shown in Table 1. The complete set of data is shown in Tables S1, S2, S3, S4.
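The genomic-inflation correction mentioned above can be illustrated with a minimal sketch. It assumes the standard genomic-control procedure (λ as the median observed 1-df χ² statistic divided by the null median, with each statistic divided by λ before the P value is recomputed); the function names, the example statistics and the choice to leave statistics unchanged when λ is below 1 are illustrative assumptions, not details taken from the study.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def chi2_1df_pvalue(stat):
    """Upper-tail P value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))

def genomic_inflation(stats):
    """Lambda: median observed statistic over the expected null median."""
    return median(stats) / CHI2_1DF_MEDIAN

def gc_correct(stats):
    """Divide each statistic by lambda and recompute its P value.

    Assumption: lambda is floored at 1 so that statistics are never inflated.
    """
    lam = max(genomic_inflation(stats), 1.0)
    return lam, [chi2_1df_pvalue(s / lam) for s in stats]

# Hypothetical test statistics for five SNPs in one subgroup analysis.
stats = [0.2, 0.5, 0.6, 1.3, 24.0]
lam, pvals = gc_correct(stats)
```

With real data the list would hold one statistic per genotyped SNP, and λ would be estimated separately for each subgroup, as the text describes.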
In addition, an exhaustive analysis of the HLA region (megabases 28 to 34 on chromosome 6) was performed with the GWAS data in order to find subgroup-specific associations in this region. Because most of the associations found here in the MHC region have been previously described, we did not perform a replication phase for these findings; instead, these results serve as a replication of previous work. It is also noteworthy that all independent associations found within the MHC region have almost exactly the same ORs in each of the four GWAS cohorts analyzed separately, and thus replicate one another.

Clinical Manifestations
In the lcSSc subtype, seven non-HLA novel loci were identified as susceptibility markers in the GWAS data (Table S1 and Figure  S1). Two out of the seven genetic markers showed evidence of association in the replication cohorts: rs11642873 near the IRF8 gene (lcSSc P = 2.  Table 1 and Table S1).
Regarding the dcSSc subtype, five non-HLA loci were found to be associated in the GWAS cohorts (Table S2 and Figure S2). Upon analyzing these five SNPs in the replication cohorts we could only replicate the association of rs11171747 in the RPL41/ESYT1 locus (overall dcSSc P = 5.99×10−8, OR = 1.23 [1.14-1.33]) (Figure 1, Table 1 and Table S2). However, the association found in this locus was heterogeneous among cohorts (Breslow-Day P = 5.32×10−9).

Auto-Antibodies
The observed associations in the ACA positive subgroup and lcSSc were difficult to differentiate because of substantial overlap between these two disease subgroups. In the GWAS cohorts, SNPs in the IL12RB2 and RUNX1 genes were identified as novel non-HLA loci associated with SSc patients positive for ACA antibodies (Table S3 and Figure S3). However, none of these associations could be confirmed at the replication stage. Interestingly, the SNP rs11047102 of the SOX5 gene, which was selected for replication due to its association with the lcSSc subgroup in the GWAS data, showed suggestive evidence of association with the ACA subgroup (P = 1.39×10−7, OR = 1.36 [1.21-1.52]) (Figure 1, Table 1 and Table S3).
In the ATA positive subgroup, four new susceptibility loci were identified in the GWAS data (Table S4 and Figure S4), none of which were confirmed in the replication phase. Since the ATA subgroup of patients has the smallest sample size, the lack of replication at any of the non-HLA loci may be due to lower statistical power (Table S5).

HLA Region
The associations found in the HLA region in the GWAS data set showed clear differences between SSc subgroups (Figure 1, Figure 2, and Table 2). The effects observed in the lcSSc and dcSSc subtypes were similar to those of the overlapping groups of patients with ACA and ATA, respectively, but were less significant. Therefore, we focused the analysis on the antibody subgroups only.
We observed independent genetic associations in the ACA positive subgroup in the HLA region (Table 2, Figure 1 and Table S6). The strongest independent signal was identified in the HLA-DQB1 gene of HLA class II: SNPs rs6457617 (ACA+ P = 1.99×10−36, OR = 0. 48 (Table 3).
Regarding the ATA positive subgroup, we also observed evidence of independent association in the HLA region (Table 2 and Figure 1, Table S7). We found three associations in the HLA class II region: rs3129882 in HLA-DRA (ATA+ P = 1.  (Table 3).
In addition, in the HLA class III region, the NOTCH4 gene was associated with the presence of ACA (rs443198, ACA+ P  (Table 2 and Tables S6, S7). Interestingly, SNP rs9296015 had opposite effect sizes in the ACA and ATA subgroups, being exclusively associated in the ATA subgroup. These two SNPs were not in LD in Caucasian populations, either in the HapMap project (r² = 0.05 in CEU and r² = 0.03 in TSI) or in our cohorts (r² = 0.1 in the combined cohorts, r² = 0.11 in Spanish, r² = 0.00 in German, r² = 0.00 in Dutch and r² = 0.01 in US), pointing to independent associations in the NOTCH4 gene with both the ACA and ATA positive subgroups. All the association ORs found in the HLA region were consistent among the four GWAS cohorts (Tables S8, S9).
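For readers less familiar with the r² statistic quoted above, the following minimal sketch computes it for two biallelic SNPs. It assumes phased haplotype counts are available, which is a simplification: tools such as Plink and Haploview estimate haplotype frequencies from unphased genotypes. The function name and the counts are hypothetical.

```python
def ld_r2(n_ab, n_aB, n_Ab, n_AB):
    """r^2 between two biallelic SNPs from the four phased haplotype counts.

    n_ab: haplotypes carrying allele a at SNP 1 and allele b at SNP 2, etc.
    """
    total = n_ab + n_aB + n_Ab + n_AB
    p_ab = n_ab / total
    p_a = (n_ab + n_aB) / total  # frequency of allele a at SNP 1
    p_b = (n_ab + n_Ab) / total  # frequency of allele b at SNP 2
    d = p_ab - p_a * p_b         # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical counts: moderately correlated SNPs.
example_r2 = ld_r2(40, 10, 10, 40)
```

Values near 0, such as those reported for the two NOTCH4 SNPs, indicate that the alleles at one SNP carry almost no information about the alleles at the other.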

Previously Described Genetic Associations
We wanted to investigate previously reported associations with subphenotypes or overall disease, such as CD247, TNFSF4, STAT4, BANK1, IRF5 and BLK, in the present study's GWAS cohorts, to further establish them as susceptibility loci for SSc or its subphenotypes. Table S10 shows the analysis of the SNPs in these genes that were present in our combined GWAS panel. As expected, the associations previously found in these six genes were replicated. Interestingly, associations previously described as confined to one of the SSc subgroups were also replicated, as in the case of TNFSF4 and lcSSc (lcSSc P = 7.70×10−4

Author Summary
Scleroderma or systemic sclerosis is a complex autoimmune disease affecting one individual of every 100,000 in Caucasian populations. Even though current genetic studies have led to better understanding of the pathogenesis of the disease, much remains unknown. Scleroderma is a heterogeneous disease, which can be subdivided according to different criteria, such as the involvement of organs and the presence of specific autoantibodies. Such subgroups present more homogeneous genetic groups, and some genetic associations with these manifestations have already been described. Through reanalysis of a genome-wide association study data, we identify three novel genes containing genetic variations which predispose to subphenotypes of the disease (IRF8, GRB10, and SOX5). Also, we better characterize the patterns of associated loci found in the HLA region. Together, our findings lead to a better understanding of the genetic component of scleroderma.
found in the other subgroups. Similarly, the association found in IRF5 was stronger in lcSSc (lcSSc P = 1.64×10−10, OR = 1.50 [1.32-1.69]), although association was also found in the dcSSc, ACA+ and ATA+ subgroups.

Discussion
Systemic sclerosis (SSc) is a rare, severe, complex and heterogeneous rheumatic disease. Multiple lines of evidence suggest that genetic factors may underlie not only SSc susceptibility but also the predisposition to develop specific clinical phenotypes such as lcSSc, dcSSc subtypes and the presence of SSc-specific auto-antibodies. The discovery of genetic variants associated with specific clinical manifestations of the disease will lead to new insights regarding pathogenesis and may open novel avenues of therapy that can be targeted to specific subsets.
The aim of this study was to assess the genetic component involved in four different SSc clinical and auto-antibody subphenotypes through an analysis of our previous genome-wide association study (GWAS) data stratified for these disease subphenotypes, together with a large, new replication study.
[Figure 1 caption, partially recovered: Table 1 and Table 2 show marginal effects of these SNPs, while this figure presents ORs and CIs after adjustment for the other SNPs claimed as independent for that phenotype. doi:10.1371/journal.pgen.1002178.g001]
We have identified an association of the NOTCH4 gene with both the ACA and ATA positive subgroups, independent of the HLA associations. This gene is located in the MHC and encodes a transmembrane protein which plays a role in a variety of developmental processes by controlling cell fate decisions. Interestingly, NOTCH4 has been implicated in the pathways by which TGF-β induces pulmonary fibrosis [24], one of the most severe clinical manifestations of SSc [25,26]. The Notch signaling pathway also controls key functions in vascular smooth muscle and endothelial cells, which may be particularly relevant to the microvascular damage seen in SSc [27]. Genetic variants in NOTCH4 have also been previously associated, independently of HLA genes or alleles, with other autoimmune disorders such as type 1 diabetes [28], rheumatoid arthritis [29] and alopecia areata [30,31].
Additionally, through the analysis of the largest SSc case/control cohort reported to date, we identified three new susceptibility loci (IRF8, SOX5 and GRB10) outside the HLA/MHC region that are implicated in genetic predisposition to different SSc subphenotypes, in addition to other suggestive loci.
Type I and II interferons (IFN) are well known immunomodulators which can also regulate collagen production. Furthermore, they are believed to play a key role in the pathogenesis of SSc and other autoimmune diseases [32-34]. Interestingly, we found a strong association of the IRF8 gene with the lcSSc subtype and the ACA positive subgroup. IRF8 modulates TLR signaling and may contribute to the crosstalk between IFN-γ and TLR signal pathways, thus acting as a link between innate and adaptive immune responses [35]. IRF8 has also been demonstrated to be a key factor in B cell lineage specification, commitment and differentiation [36]. In addition, IRF8 has been associated with another autoimmune disease, multiple sclerosis [37], although the SNP associated with multiple sclerosis (rs17445836) was not present in our study. Nevertheless, both variants are in medium LD in the CEU population of the HapMap project (r² = 0.51) and both associations have a protective OR for the minor allele, pointing to a dependence between the associations found in these two diseases.
The most prominent SSc specific auto-antibodies, ACA and ATA, are associated with the lcSSc and dcSSc clinical subsets, respectively [19]. The lcSSc subtype greatly overlaps with the ACA positive subgroup of patients (almost all ACA positive patients belonged to the lcSSc subtype). Similarly, the dcSSc subtype overlaps with the ATA positive group of patients. Therefore, it is difficult to determine whether some of the observed associations specifically belonged to one of the four subgroups. Such is the case of the association found with the SOX5 gene. In the GWAS data, SOX5 was associated with lcSSc as well as with the ACA positive subgroup, although the association with the lcSSc subtype was stronger than that in the ACA positive subgroup. Upon completion of the replication study, with the resultant increase in statistical power, we were able to determine that the SOX5 gene was indeed a risk factor for the ACA positive group, at a suggestive level of significance (P = 1.39×10−7), but not for lcSSc. The SOX5 gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development, in the determination of cell fate, and in chondrogenesis [38].
Conversely, SOX5, together with SOX6 and SOX9, can induce many cell types (including melanocytes and bone marrow stem cells) into the chondrogenic pathway, leading to expression of COL2A1 and the formation of cartilage [38,39]. As stated above, type I and II IFN are inhibitors of collagen production and chondrogenesis; more precisely, IFN-γ (type II IFN) inhibits the COL2A1 gene, which is one of the main downstream genes in the chondrogenesis pathway [40]. Taken together, IRF8 (part of the interferon pathway and induced by IFN-γ [41]) and SOX5 may be affecting the formation of the extra-cellular matrix through COL2A1 in the skin and other organs of SSc patients.
We also identified an association of the GRB10 gene with the lcSSc subtype; GRB10 codes for an adaptor protein known to interact with a number of tyrosine kinase receptors and signaling molecules and has a potential role in apoptosis regulation [42].
In dcSSc patients, the only observed genome wide significant association was with the RPL41/ESYT1 locus, although this association was heterogeneous among the investigated populations, probably due to lower statistical power in this smaller group. Three genes are relevant to this locus: RPL41, a ribosomal protein not considered to be related to the immune system; ZC3H10, a zinc finger protein related to tumour growth; and ESYT1, a synaptotagmin-like protein of unknown function. Although none of these genes has a suggestive role in the pathogenesis of SSc a priori, further studies are needed to investigate this intriguing finding.
Since most genes in the HLA region are implicated in the regulation of the immune system, it is not surprising that the HLA association with SSc is primarily related to auto-antibody expression. We found different patterns of independent association for the two major SSc auto-antibody subgroups across the HLA class II region. Both genetic markers located in the HLA-DQB1 locus were associated with the presence of ACA auto-antibodies in SSc patients. The allelic combination of these SNPs tags the described association of HLA-DQB1*0501 with the ACA positive subgroup of the disease [22,43]. The associations within the HLA region in the ATA positive subgroup are more complex: SNP rs3129763 (located near HLA-DRB1) tags the association of HLA-DRB1*1104, which has been described to be associated with the whole disease [22]. Furthermore, the haplotype in the HLA-DPB1 region described in Table 3 tags HLA-DPB1*1301, which has also been previously described [3,22]. Interestingly, the remaining independent association observed, rs3129882, is found within the HLA-DRA gene, which is much less polymorphic than the other HLA genes already mentioned; nevertheless, through the extensive LD structure of the MHC region, the association found at this SNP tags the association of some amino acid positions in the nearby HLA-DQB1 gene, which has not been previously reported to be associated with the ATA positive subgroup of SSc.
In summary, taking advantage of our GWAS data and a large replication cohort, we have identified three new non-HLA loci associated with subphenotypes of SSc: GRB10, IRF8, and SOX5. In addition, we shed light on HLA associations with this disease, establishing different patterns of independent association in the ACA and ATA positive subgroups. Our findings provide evidence for genetic heterogeneity underlying the clinical and especially autoantibody subtypes of SSc. These findings may prompt reconsideration of the current classification of SSc patients; provide insight into pathogenetic pathways differing among subphenotypes, especially specific auto-antibody subgroups, and lead to novel therapeutic targets for this devastating autoimmune disease.

Subjects
For the GWAS analysis, a total of 2,296 Caucasian SSc patients and 5,171 Caucasian healthy controls were recruited through an international collaborative effort in the United States of America (USA), Spain, Germany and The Netherlands. The North American cases (initial n = 1,678; after applying quality control criteria, n = 1,486; 179 men, 1,307 women; mean age = 54.5 The initial European SSc cases came from previously established nationally representative collections of 380 Spanish, 288 German and 190 Dutch patients with SSc. As control populations, healthy unrelated individuals of Spanish (initial n = 414), German (initial n = 678) and Dutch (initial n = 643) origin were included in the study, as well as 3,478 controls from across the US collected as non-cancer controls for GWAS studies of breast and prostate cancers in the Cancer Genetic Markers of Susceptibility (CGEMS) studies [44,45] (http://cgems.cancer.gov/data/).
In the second, replication phase, a large independent replication cohort consisting of 3,175 SSc patients and 4,971 healthy controls of Caucasian ancestry was collected from Belgium, Spain, The Netherlands, Germany, Italy, Norway, Sweden, the UK and the USA. Details on the investigated populations are provided in Table S11.
All cases met the American College of Rheumatology preliminary criteria for the classification of SSc [46]. Furthermore, patients were classified according to the extent of skin involvement into limited (lcSSc) or diffuse (dcSSc) forms [17,47]. In addition, the presence of the SSc specific auto-antibodies anti-topoisomerase I (ATA, anti-Scl70) and anti-centromere (ACA) was assessed by passive immunodiffusion against calf thymus extract (Inova Diagnostics, San Diego, CA, USA) and indirect immunofluorescence of HEp-2 cells (Antibodies Inc, Davis, CA, USA), in a total of 5,229 and 5,238 SSc patients, respectively. Autoantibodies to RNA polymerase III are also considered to be characteristic of SSc, but testing for this antibody is not widely available, and since results were not known in almost two-thirds of our cases, this analysis was not done [18,19]. The distribution of SSc patients among these disease subsets is summarized in Table S11.
Collection of blood samples and clinical information from case and control subjects was undertaken with informed consent and relevant ethical review board approval from each contributing centre in accordance with the tenets of the Declaration of Helsinki.
Most of the individuals included in this study, GWAS and replication cohorts, have been analyzed in a previous study [15] but novel genotypes were generated in the replication cohorts for phenotype associated SNPs found in the GWAS, expanding the scope of the study.

SNP Selection for Replication
Our goal was to examine any novel genetic association specific for each subset rather than for overall disease. Although partial overlap exists between the lcSSc and ACA+ subgroups, and between the dcSSc and ATA+ subgroups, we wanted to assess whether an association found in overlapping groups belonged to a subtype or to an auto-antibody positive group. For that purpose we selected SNPs from the GWAS data based on the following criteria:
• First, we selected all SNPs with a P value of 1×10−5 or lower in each of the four considered SSc subgroups (i.e. lcSSc, dcSSc, ACA+ and ATA+) of the four GWAS cohorts (i.e. US, Spain, Netherlands and Germany).
• Since one aim of this study was to find novel genetic associations, we then ruled out every genetic association previously described in SSc (e.g. STAT4, IRF5 and the HLA region).
• Finally, we selected from each remaining region the best independent association (determined by conditional logistic regression) from the GWAS data.
[Table 2 caption: Independent associations identified in the HLA region with the ACA and ATA positive subgroups.]
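The three selection criteria can be sketched as a simple filter. This is an illustration under stated assumptions rather than the authors' pipeline: the `Hit` record, the locus labels and the SNP identifiers are hypothetical, and the "best independent association" step, which the study determined by conditional logistic regression, is simplified here to keeping the lowest P value per subgroup and locus.

```python
from dataclasses import dataclass

P_THRESHOLD = 1e-5
# Previously described SSc associations; only the examples named in the text.
KNOWN_SSC_LOCI = {"STAT4", "IRF5", "HLA"}

@dataclass
class Hit:
    snp: str        # hypothetical SNP identifier
    locus: str      # hypothetical region label
    subgroup: str   # lcSSc, dcSSc, ACA+ or ATA+
    p_value: float

def select_for_replication(hits):
    """Apply the three criteria; return one hit per (subgroup, locus)."""
    best = {}
    for h in hits:
        # Criteria 1 and 2: P threshold, then drop known associations.
        if h.p_value > P_THRESHOLD or h.locus in KNOWN_SSC_LOCI:
            continue
        # Criterion 3 (simplified): keep the lowest P value per region.
        key = (h.subgroup, h.locus)
        if key not in best or h.p_value < best[key].p_value:
            best[key] = h
    return sorted(best.values(), key=lambda h: h.p_value)

# Entirely made-up example records.
hits = [
    Hit("rs_hyp1", "LOCUS_A", "lcSSc", 3e-6),
    Hit("rs_hyp2", "LOCUS_A", "lcSSc", 8e-7),
    Hit("rs_hyp3", "STAT4", "lcSSc", 1e-9),
    Hit("rs_hyp4", "LOCUS_B", "ATA+", 2e-4),
    Hit("rs_hyp5", "LOCUS_C", "ACA+", 1e-5),
]
selected = select_for_replication(hits)
```

In this toy input, the known STAT4 hit and the hit above the threshold are dropped, and only the stronger of the two LOCUS_A signals is kept.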

Genotyping
The GWAS genotyping of the SSc cases and controls was performed as follows. The Spanish SSc cases and controls, together with the Dutch and German SSc cases, were genotyped at the Department of Medical Genetics of the University Medical Center Utrecht (The Netherlands) using the commercial release Illumina HumanCNV370K BeadChip, which contains 300,000 standard SNPs with an additional 52,167 markers designed to specifically target nearly 14,000 copy number variant regions of the genome, for a total of over 370,000 markers. Genotype data for the Dutch and German controls were obtained from the Illumina Human 550K BeadChip and were available from a previous study. The SSc case group from the United States was genotyped at the Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, North Shore Long Island Jewish Health System, using the Illumina Human610-Quad BeadChip. CGEMS and Illumina iControlDB controls were genotyped on the Illumina Hap550K-BeadChip.
SNPs selected for the replication phase were genotyped in the replication cohorts using Applied Biosystems' TaqMan SNP assays on ABI Prism 7900 HT real-time thermocyclers. Markers with call rates of 95% or less were excluded, as were markers whose allele distributions deviated strongly from Hardy-Weinberg (HW) equilibrium in controls (P < 10−3).

Data Imputation
Imputation was performed in the GWAS cohorts in order to gain genome coverage for the SNP selection. Imputation was performed with IMPUTE software 1.00 as previously described [48], using the CEU and TSI HapMap populations as reference panels. However, SNP imputation did not reveal any new independent SNP associated at P < 10−5 in the four subphenotypes considered. The imputed GWAS data in the four subphenotypes are shown in Figure S5.

Statistical Analysis
Data in the SSc GWAS cohorts were filtered as follows. Using Plink, we identified pairs of genetically related subjects or duplicates and excluded the member of each pair with the lower call rate. To identify individuals who might have non-western European ancestry, we merged our case and control data with the data from the HapMap Project (60 western European (CEU), 60 Nigerian (YRI), 90 Japanese (JPT) and 90 Han Chinese (CHB) samples). We used principal component analysis as implemented in HelixTree (see Text S2), plotting the first two principal components for each individual. All individuals who did not cluster with the main CEU cluster (defined as deviating more than 4 standard deviations from the cluster centroids) were excluded from subsequent analyses. Additionally, we excluded individuals with low call rates (11 individuals from the US group, 24 from the Spanish, 1 from the German and 1 from the Dutch), relatedness (50 from the US group, 2 from the Spanish, 1 from the German and 1 from the Dutch), non-European ancestry (42 from the US group, 5 from the Spanish, 6 from the German and 4 from the Dutch) and inconsistent gender (83 from the US group, 2 from the Spanish, 2 from the German and 2 from the Dutch). Then we filtered for SNP quality, removing SNPs with a genotyping success call rate < 98% and those showing MAF < 1%. Deviation of the genotype frequencies in the controls from those expected under Hardy-Weinberg equilibrium was assessed by a χ² test, or by Fisher's exact test when an expected cell count was < 5. SNPs strongly deviating from Hardy-Weinberg equilibrium (P < 10−5) were eliminated from the study.
[Table 3 caption: Allelic combination analysis of the SNPs which are in the same association locus within the HLA region for the ACA and ATA positive subgroups of SSc patients.]
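The Hardy-Weinberg check described above can be illustrated with a minimal sketch of the 1-df χ² goodness-of-fit test on control genotype counts. The function name and the counts are hypothetical; the sketch assumes a polymorphic SNP (monomorphic markers are already removed by the MAF filter) and does not implement the Fisher's exact fallback used when an expected count is below 5.

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions.

    Arguments are the observed counts of the three genotypes in controls.
    Returns (statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-df chi-square distribution.
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical control genotype counts for one SNP.
stat, p_value = hwe_chi2(30, 40, 30)
fails_gwas_filter = p_value < 1e-5  # exclusion threshold quoted in the text
```

The same function could be used with the replication-phase threshold of 0.001 by changing only the final comparison.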
For the combined analysis of the four datasets, the same quality controls per individual and per SNP were applied with the exception of the Hardy-Weinberg equilibrium (HWE) requirement. The genotyping success call rate on the merged dataset after all these quality filters were applied was 99.83% in the GWAS cohorts.
The replication cohorts were filtered as follows: individuals with a SNP success call rate below 0.95 were excluded, as were SNPs with a per-individual success call rate below 0.95, SNPs with a HWE P value below 0.001 in controls, and SNPs with a MAF below 0.01. As a result, all 18 SNPs selected for replication were in HWE (P value > 0.001), the overall genotype successful call rate was 96.61%, and every SNP individually had a successful call rate greater than 95%.
We performed power calculations for the GWAS and replication cohorts, for the whole dataset and for the clinical/auto-antibody subphenotypes, according to Skol et al. [49] (Table S5). The significance level for these calculations was set at 5×10−8.
χ² tests under an allelic model were performed to test for significant differences between cases and controls. Derived P values for the replication cohorts were not adjusted. All nine replication cohorts were jointly analyzed using Cochran-Mantel-Haenszel (CMH) tests to control for population differences. A meta-analysis P value threshold of < 0.05 was considered significant for the replication phase. We also conducted a CMH meta-analysis of all nine replication cohorts and the four cohorts previously included in the GWAS, considering a P value lower than 5×10−8 as significant. Furthermore, P values in the range 5×10−8 to 5×10−6 were considered suggestive associations. In all tests, odds ratios (OR) were calculated according to Woolf's method. We also applied Breslow-Day (BD) tests to all meta-analyses to check for heterogeneity in association among the investigated populations, and all associations with P < 0.05 in the BD analysis were considered heterogeneous.
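The tests named in this paragraph can be illustrated on 2×2 allele-count tables. The sketch below is not the software used in the study; it assumes textbook forms of the Pearson allelic χ² test, Woolf's logit-based OR with a 95% confidence interval, and the CMH statistic with the Mantel-Haenszel pooled OR, all without continuity correction. The Breslow-Day heterogeneity test is omitted, and all counts are hypothetical.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail P value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))

def allelic_chi2(a, b, c, d):
    """Pearson chi-square on a 2x2 allele table.

    a, b: minor/major allele counts in cases; c, d: the same in controls.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_1df_pvalue(stat)

def woolf_or(a, b, c, d, z=1.96):
    """Odds ratio with Woolf's (logit) confidence interval."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def cmh(tables):
    """CMH statistic, P value and Mantel-Haenszel pooled OR across strata."""
    num = var = or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a - (a + b) * (a + c) / n  # observed minus expected count
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    stat = num * num / var
    return stat, chi2_1df_pvalue(stat), or_num / or_den

# Hypothetical allele counts for one SNP in two cohorts (strata).
cohorts = [(30, 70, 20, 80), (45, 155, 35, 165)]
cmh_stat, cmh_p, pooled_or = cmh(cohorts)
```

Stratifying by cohort in this way is what lets the combined analysis control for allele-frequency differences between populations.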
Due to the partial overlap of the lcSSc and dcSSc subgroups with the ACA+ and ATA+ subgroups, respectively, we wanted to test whether an association found in both overlapping groups belonged specifically to one or the other. For that purpose, every association claimed in the present study to belong to a group was also tested in the correlated group (e.g. ACA associations were tested in lcSSc and vice versa) to look for the best P value. In addition, ACA and ATA hits were tested in lcSSc-ACA− and dcSSc-ATA−, respectively, to ensure group specific associations. Also, lcSSc and dcSSc hits were tested in ACA+-non-lcSSc and ATA+-non-dcSSc with the same purpose.
To determine independent associations in the HLA region, conditional logistic regression was carried out for all associated SNPs in the complete SSc group and in the ACA and ATA positive subgroups. This analysis was carried out as implemented in Plink software, conditioning each SNP association on each of the other significantly associated (P < 5×10−7) SNPs in the corresponding LD block, and controlling for the four populations as covariates. All SNPs which remained significant after conditioning were considered independent associations. All haplotype analysis was performed using Haploview software, defining the blocks by confidence intervals [50]. We only analyzed haplotypes or allelic combinations with frequencies of 1% and above.
Statistical analyses were undertaken using R (v2.6), Stata (v8), Plink (v1.07) [

Table S1
Analysis for GWAS cohorts, replication cohorts and combined analysis for all non-HLA, non-previously described associations with lcSSc subtype of the disease. †P values for GWAS cohorts are Mantel-Haenszel meta-analysis GC corrected according to the set λ and in the replication and combined analysis Mantel-Haenszel meta-analysis P value. ‡P value for the totality of the SSc patients, in the case of GWAS cohorts GC corrected according to the set λ, and in replication and combined analysis Mantel-Haenszel meta-analysis P value. (DOC)

Table S2
Analysis for GWAS cohorts, replication cohorts and combined analysis for all non-HLA, non-previously described associations with dcSSc subtype of the disease. †P values for GWAS cohorts are Mantel-Haenszel meta-analysis GC corrected according to the set λ and in the replication and combined analysis Mantel-Haenszel meta-analysis P value. ‡P value for the totality of the SSc patients, in the case of GWAS cohorts GC corrected according to the set λ, and in replication and combined analysis Mantel-Haenszel meta-analysis P value. *Association in rs11171747 had a significant BD P value, thus making it a heterogeneous association among populations. (DOC)

Table S3
Analysis for GWAS cohorts, replication cohorts and combined analysis for all non-HLA, non-previously described associations with ACA positive subgroup of the disease. †P values for GWAS cohorts are Mantel-Haenszel meta-analysis GC corrected according to the set λ and in the replication and combined analysis Mantel-Haenszel meta-analysis P value. ‡P value for the totality of the SSc patients, in the case of GWAS cohorts GC corrected according to the set λ, and in replication and combined analysis Mantel-Haenszel meta-analysis P value. *Association in rs3790567 had a significant BD P value, thus making it a heterogeneous association among populations.