A Combination of CD28 (rs1980422) and IRF5 (rs10488631) Polymorphisms Is Associated with Seropositivity in Rheumatoid Arthritis: A Case Control Study

Introduction The aim of the study was to analyse genetic architecture of RA by utilizing multiparametric statistical methods such as linear discriminant analysis (LDA) and redundancy analysis (RDA). Methods A total of 1393 volunteers, 499 patients with RA and 894 healthy controls were included in the study. The presence of shared epitope (SE) in HLA-DRB1 and 11 SNPs (PTPN22 C/T (rs2476601), STAT4 G/T (rs7574865), CTLA4 A/G (rs3087243), TRAF1/C5 A/G (rs3761847), IRF5 T/C (rs10488631), TNFAIP3 C/T (rs5029937), AFF3 A/T (rs11676922), PADI4 C/T (rs2240340), CD28 T/C (rs1980422), CSK G/A (rs34933034) and FCGR3A A/C (rs396991), rheumatoid factor (RF), anti–citrullinated protein antibodies (ACPA) and clinical status was analysed using the LDA and RDA. Results HLA-DRB1, PTPN22, STAT4, IRF5 and PADI4 significantly discriminated between RA patients and healthy controls in LDA. The correlation between RA diagnosis and the explanatory variables in the model was 0.328 (Trace = 0.107; F = 13.715; P = 0.0002). The risk variants of IRF5 and CD28 genes were found to be common determinants for seropositivity in RDA, while positivity of RF alone was associated with the CTLA4 risk variant in heterozygous form. The correlation between serologic status and genetic determinants on the 1st ordinal axis was 0.468, and 0.145 on the 2nd one (Trace = 0.179; F = 6.135; P = 0.001). The risk alleles in AFF3 gene together with the presence of ACPA were associated with higher clinical severity of RA. Conclusions The association among multiple risk variants related to T cell receptor signalling with seropositivity may play an important role in distinct clinical phenotypes of RA. Our study demonstrates that multiparametric analyses represent a powerful tool for investigation of mutual relationships of potential risk factors in complex diseases such as RA.


Introduction
The aim of the study was to analyse genetic architecture of RA by utilizing multiparametric statistical methods such as linear discriminant analysis (LDA) and redundancy analysis (RDA).

Results
HLA-DRB1, PTPN22, STAT4, IRF5 and PADI4 significantly discriminated between RA patients and healthy controls in LDA. The correlation between RA diagnosis and the explanatory variables in the model was 0.328 (Trace = 0.107; F = 13.715; P = 0.0002). The risk

Introduction
Genetic factors have a substantial role in development of rheumatoid arthritis [RA] accounting for 50-60% of disease susceptibility [1]. For the past four decades, the strongest genetic association with RA has been attributed to human leukocyte antigen (HLA) region at chromosome 6p21, particularly to HLA-DRB1 locus [2]. Recently, 101 non-HLA loci have been confirmed in trans-ethnic meta-analysis of RA [3]. In the population-specific genetic risk model, the 100 RA risk loci outside of the major histocompatibility complex (MHC) region [4] explained 5.5% and 4.7% of heritability in Europeans and Asians, respectively. Lately, RA has been divided into two clinical phenotypes based on the presence or absence of rheumatoid factor (RF) and antibodies against citrullinated proteins (ACPA) [5,6]. These two clinical subtypes appear to have distinct genetic aetiologies [7]. Significant differences have been found in frequency of risk alleles in the HLA region and in PTPN22, CCR6, CD40, RASGRP1 and TAGAP genes between ACPA-positive and ACPA-negative RA patients [8,9].
Traditionally, genetic markers have been considered independent risk factors in majority of studies in RA. Although, this univariate approach has been successful in identifying alleles with relatively strong associations with the disease or its subtypes, interactions occurring in complex biological systems can be overlooked [10]. It remains unclear whether or not a combination of known genetic loci confers higher risk for RA development, clinical outcome or response to therapy compared to their simple additive effects. To solve this kind of question, multiparametric approaches may represent a potential tool enabling analysis of complex relationships such as those in the multifactor RA pathogenesis. The multiparametric methods have been used mainly in studies investigating predictive genetic tests in RA. In a pioneering study, McClure and colleagues found that a combination of five confirmed risk loci significantly increased an association with RA compared to the presence of any risk allele alone [11]. Subsequently, several other reports outlined predictive models for RA using HLA alleles, SNPs and clinical factors generating an aggregate weighted genetic risk score formed from the product of individual-locus odds ratios (ORs) [12,13]. Recently, validated environmental factors such as tobacco smoking and gene-environment interactions were added to the RA risk modelling [14,15]. These studies demonstrate that combining risk factors has a potential to provide a clinically relevant prediction with respect to disease onset [15]. The receiver operating characteristic curve analysis was adopted in studies to evaluate the performance of predictive genetic testing [16]. Various other methods have been used to combine multiple predictors for the ROC curve analysis. Among these, the most commonly used have been the allele counting methods and logistic regression [17,18]. In order to elucidate the genetic architecture of RA, the main goal of our study was to study interactions of known genetic risk factors with serologic and clinical parameters by utilizing multiparametric statistical methods: the multivariate linear discriminant analysis (LDA) and the redundancy analysis (RDA). These multivariate ordination analyses have been already used in genome-wide association studies for correction of population stratification [19,20]. The LDA and, in particular RDA, enables to quantify dependence between two groups of variables; independent and dependent (e.g. genes and clinical parameters) compared to the principal component analysis. Unlike regression methods, the LDA and RDA allow working with all variables with unknown correlations among them, reducing risk of finding false positive result based on a specific sample selection instead of on real differences in the whole population. Also, interpretation of such results is substantially improved, as not only relations between dependent and independent variables but also inside these groups are detected.

Patients and Controls
A total of 1393 volunteers, 499 patients with RA and 894 healthy controls were included in the study. All RA patients fulfilled the 2010 ACR-EULAR classification criteria for RA [21]. The patients were recruited from the National Institute of Rheumatic Diseases, Piestany and from ten outpatient rheumatology clinics in Slovakia. Control subjects were recruited from collaborating institutes and 485 DNA samples of geographically-and sex-matched healthy individuals were obtained from DNA bank of the Slovak Medical University in Bratislava. All subjects gave informed written consent and the study was approved by the Ethics Committee of the National Institute of Rheumatic Diseases, Piestany, Slovakia in agreement with the ethical guidelines of the Declaration of Helsinki as revised in 2000. Disease activity score 28 (DAS28) was calculated from number of swollen and tender joints and erythrocyte sedimentation rate [22]. The health assessment questionnaire disability index (HAQ-DI) was recorded using the standardized form [23]. Rheumatoid factor (RF), anti-citrullinated peptides antibodies (ACPA), C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR) were measured in accredited hospital laboratories. IgM-rheumatoid factor was measured by nephelometry (CSL Behring, Germany), IgA-RF by ELISA (Hycor, Indianapolis, IN, USA), and ACPA by ELISA (Phadia, Freiburg, Germany). RA patients were considered RF positive when IgM or IgA-RF was positive. RA patients were considered seropositive when either RF or ACPA were above a specific threshold as given by the manufacturer. Clinical characteristics of the cohort are shown in Table 1.

SNP Genotyping
rs34933034) and FCGR3A A/C (rs396991) was performed using TaqMan 1 SNP Genotyping Assays (Applied Biosystems) and TaqMan 1 Genotyping PCR Master Mix (Applied Biosystems) on ABI 7900HT Fast Real-Time PCR System (Applied Biosystems). The results were analysed by Primer Express™ software (Applied Biosystems).

Statistical Analysis
Statistical analysis of obtained data was performed using SPSS v11.5 software (SPSS Inc., Chicago, IL, USA). All SNPs were individually tested for association with RA by Chi square test (χ 2 ). Odds ratio with 95% confidence interval was calculated using logistic regression. Multivariable ordination analysis was performed in Canoco 4.5 software (Microcomputer Power, NY, USA). The categorical data were coded as dummy-variables (i.e. each category as new variable with 0/1 values). All variables were Z-normalized. As the analysis cannot operate with blank fields in the table, those variables and subjects with the most missing values had to be excluded from the analysis. Before the main analysis, the detrended correspondence analysis (DCA) or detrended correspondence canonical analysis (DCCA) were performed to assure that PCA, LDA or RDA could be applied according to the 1 st ordinal axis gradient length. PCA was performed for each group of variables to see relations among variables in the group. Monte-Carlo permutation test with 999 permutations was applied as a method of forward selection of significant dependent variables in LDA and RDA. This approach helped us to avoid a problem with possible correlations among numerous independent variables (e.g. genes), as the best fitting variable was chosen by the method as the first; and the second (and further) variables were selected only if their presence improved the model. In the final model, the variables with significant influence were supplemented with variables correlated with them (including remaining stages of the variable). These ones do not explain any new variability of the model, so their influence is negligible, but they can help us to compare our results to previously published observations. First, LDA was performed to see, which genetic factors separate patients with RA from healthy controls. Second, RDA was performed under similar conditions, however, healthy controls and RA patients were split up into two groups according to the presence of antibodies. Third, another RDA was performed with genetic factors and smoking as independent variables, and with clinical parameters associated with RA diagnosis as dependent ones. Age and sex were included as covariates.

Analysis of Genetic Factors Associated with RA
All genes analysed in the study were in Hardy-Weinberg equilibrium. The presence of SE coding alleles in HLA-DRB1 gene was significantly higher in RA patients compared to controls ( Table 2). The most frequent SE alleles in patient group were DRB1 Ã 01:01 (13.4%), DRB1 Ã 04:01 (11.9%), DRB1 Ã 04:04 (3.8%) (data not shown). Among 11 studied SNPs, risk allele and genotype frequencies of PTPN22, STAT4, IRF5 and PADI4 genes were significantly higher in RA patients compared to controls (Table 2). LDA model was used to classify our cohort based on the genotype frequencies of studied SNPs. A major discriminant function was generated based on the variances seen in the data using LDA. This discriminant function was assigned to 1st ordinal axis of LDA diagram and the centroids for diagnosis were plotted in this matrix together with the significant genetic determinants (Fig 1). Based on these discriminant functions, the model predicted to which of the study group individual subject belongs with an overall accuracy of 10.7%. The correlation coefficient between RA diagnosis and explanatory variables in the model was 0.328 (Trace = 0.107; F = 13.715; P = 0.0002). These genetic determinants were selected as significant in the following order representing their importance in the model: SE 0 > IRF5 TT > SE 1 + SE 2 > STAT4 GG > PADI4 CC + PADI4 TT > PTPN22 CC.
In total, the presence of RA significantly correlated with genetic determinants PADI4 and SE alleles. As shown in Fig 1, at least one SE allele and two risk alleles (T) in PADI4 gene were the most important combination associated with RA diagnosis in our cohort. There was a strong prevalence of the non-risk genotype in genes PADI4, STAT4, IRF5 and PTPN22 and lack of SE coding alleles in control groups.

Analysis of Genetic Factors Associated with Autoantibody Status in RA
An association analysis between antibody status and genetic markers was performed in RA patients. The RDA, which estimated the effect of SNPs on the presence of antibodies, found a significant model explaining 17.9% of variability in data (1st ordinal axis 17.5% and 2nd ordinal axis 0.4%) (Fig 2). The correlation between categories RF status, ACPA status and genetic determinants on the 1st ordinal axis is 0.468, and on the 2nd ordinal axis 0.145 (Trace = 0.179; F = 6.135; P = 0.001). RF and ACPA negative RA patients were more likely to have SE 0 genotype. On the other hand, CD28 CC and IRF5 CC genotypes were associated with both RF and ACPA positivity, while CTLA4 AG genotype associated preferentially with RF positivity.

Analysis of Genetic Factors Associated with Clinical Outcomes
In general, RDA analysis with forward selection performed in RA patients showed that ACPA positivity, AFF3 gene, in particular AFF3 TT genotype and smoking significantly associate with higher disease activity defined by clinical parameters DAS28, CRP, ESR, TJC, SJC and HAQ-DI. The explanatory variables explain 6.1% of variability among RA patients (Trace = 0.061; F = 3.482; P = 0.0020). In particular, higher CRP levels were associated with the presence of AFF3 TT genotype, whereas higher DAS28 and higher number of swollen and tender joints were correlated with ACPA positivity (Fig 3).

Discussion
The aim of the presented study was to analyse possible interactions among genetic, serologic and clinical data in a cohort of Slovak patients with RA. LDA and RDA, multiparametric statistical methods, together with conventional association analyses of known RA risk gene variants were applied. As expected, among eleven gene loci previously associated with RA, risk variants  in IRF5, STAT4, PADI4 and PTPN22 genes and the presence of SE alleles in HLA-DRB1 were found to be significantly more frequent in RA patients compared to controls in our study. In line with the latter, the LDA model demonstrated that at least one SE allele and two risk alleles (T) in the PADI4 gene is the most significant determinant of RA diagnosis. On the other hand, the presence of non-risk genotypes in genes PADI4, STAT4, IRF5 and PTPN22 and the lack of SE coding alleles as determinants of non-RA phenotype in our study may suggest a cluster of protective gene variants preventing RA development. It remains unclear whether or not the gene cluster can be regarded RA protective in truly mechanistic way.
Our results suggest that only HLA-DRB1 and PADI4 are predisposing to RA, whereas the other four genotypes were dominant in the healthy population. Based on this analysis, the presence of non-risk genotypes in IRF5, STAT4, PADI4 and PTPN22 indicates more strongly the exclusion of RA diagnosis than would the risk genotype indicate its confirmation. The presence of at least one shared epitope coding allele in HLA-DRB1 gene among RA predisposing factors is not surprising since the HLA region, and particularly the HLA-DRB1 locus, is considered the strongest genetic factor contributing to the increased susceptibility of RA. PADI4 is, as one of the isoenzymes responsible for post-translational conversion of arginine to citrulline, one of the most interesting genes associated with RA. However, its association with disease remains rather disputable. The association of PADI4 with RA was confirmed in European as well as Japanese populations [25] showing also high specificity for the disease. On the other hand, it failed to be confirmed as RA-associated gene variant in a large British-French-Spanish cohort, calling its association in European population in question [26,27]. Our results support the association of SNP in PADI4 gene with RA in a population of European origin. Although there might be a correlation of PADI4 with ACPA positivity, the presence of both components in significant LDA model in our study indicates their own contribution to data variability and thus proves them as two separate factors.
The presence or absence of seropositivity appears to be fundamental with regard to a clinical phenotype of RA [5,6]. Therefore, we performed an analysis investigating which genes from the studied group are associated with RF and/or ACPA positivity in our RA cohort. We found a significant correlation of RA seropositivity with IRF5 and CD28 risk allele homozygosity. As shown on the RDA diagram [Fig 2], the risk variants of IRF5 and CD28 genes are common determinants for both RF-and ACPA-positivity. In other words, the presence of both risk gene variants has a significant association with seropositivity. Moreover, the RDA shows that RFpositivity is associated with the CTLA4 risk variant in heterozygous form. The association of IRF5 gene, including the rs10488631 SNP analysed in this study was mainly described in seronegative RA [28,29]. In other studies, however, this SNP was associated with the disease exclusively in ACPA positive RA patients and in autoantibody positive RA replication cohort [30,31]. Interestingly, the absence of SE alleles associated with ACPA negativity was more significant than the association of two SE alleles with seropositivity. It has been shown that SE is strongly associated with both RF status and presence of ACPA [6,32]. More specifically, ACPA play a vital role in the causal pathway between SE and joint erosions [33].
IRF5 is a latent transcription factor expressed in lymphocytes, macrophages and dendritic cells. IRF5 is involved in expression regulation of IFN-α genes and mediates Toll-like receptor (TLR) signal transduction leading to production of proinflammatory cytokines [34]. CTLA-4 and CD28 are functionally as well as structurally related, as both bind the same counter-receptor, the two B7 family members (CD80 and CD86), that are present on antigen presenting cells, but the molecules have opposing effects on T cell activation playing an important role in the T cell receptor (TCR) signalling regulation [35]. The CTLA4 gene was repeatedly found associated with a diagnosis of RA in genome-wide association studies [31,36,37], however, various SNPs in this gene were analysed. In case of CT60 SNP included in this study, the results are contradictory [38,39]. Among these CTLA4 SNPs, the CT60 variant appears to have the highest functional relevance; the G allele is associated with lower mRNA and protein levels of soluble CTLA4 that could result in increased T cell activation. The allelic distribution of CTLA4 gene was very similar in our cohort (A 61.7; G 38.3%) indicating highest prevalence of heterozygotes among genotypes (AA 38%, AG 47% and GG 15%). Lower significance of homozygotic states of CTLA4 in the final model could be contributed to lower statistical power because of their smaller counts. Surprisingly, CTLA4 and CD28 genes did not show association with RA in our study when analysed as independent risk factors. The lack of independent association of AFF3, CD28, CTLA4, IRF5 and TNAIP3 genes in our cohort might be partially explained by observations of Viatte and co-authors [40], who described the association of these SNPs with RA only in ACPA positive patients. This, together with the fact that the association analysis in this study was done on the whole group of patients regardless the antibody status, with 34.5% of ACPA negative patients from 499 cases analysed, might rationalize why these genes did not reach statistical significance.
In the present study, the RDA was used to investigate a combination of factors associated with the disease activity in RA expressed by DAS28, CRP, ESR, swollen and tender joint counts and HAQ-DI. Our results demonstrate that from all factors analysed, ACPA positivity and the homozygous state of AFF3 risk alleles most significantly influenced the disease activity in our cohort of patients. The link of serologic status to different disease outcomes, with more severe and erosive disease observed in RF and ACPA positive patients is well known [41][42][43]. In terms of disease activity, to our best knowledge, AFF3 locus has been previously associated with the rate of joint destruction [44] but no other clinical parameters. AFF3 gene encodes a tissue-restricted nuclear transcriptional activator that is preferentially expressed in lymphoid tissue. Interestingly, recent studies showed an association of AFF3 locus with triglyceride levels [45] and the end-stage renal disease [46] as a major complication of diabetes. These findings suggest even more plausible clinical implications of AFF3 for RA since serum lipid alterations are monitored and managed in RA patients to minimize the long-term risk of cardiovascular disease and diabetes [47][48][49]. The association of AFF3 locus with RA was first identified by Barton and co-authors [50]. Although the association of SNP in AFF3 gene [rs11676922] was previously found in meta-analysis [30] we did not confirm it in Slovak cohort. This might be influenced by stronger association of AFF3 SNP in ACPA positive patients [40] and therefore undetected in our mixed cohort with respect to its size (as mentioned above). Several other studies reporting an association of AFF3 gene with RA analysed different SNPs in this gene [50,51]. In RDA analysis performed in our study, all factors entered the analysis equivalently as explanatory variables and their appearance in final model after forward selection indicates their own significant contribution to data variability, i.e. strong impact on parameters of clinical manifestation. Thus, our results not only confirm the central role of ACPA as a determinant of RA severity and or inflammatory status, but suggest significant contribution of risk variants in AFF3 gene. Our results support the view that even if some genes, analysed as independent factors, may not provide sufficient risk stratification, their involvement within a prediction/ explanatory model may help to identify clinically relevant high-and low-risk groups [15].
In this study, we confirmed an association of PTPN22, STAT4, IRF5 and PADI4 with RA in Slovak population. A study of PTPN22, STAT4, OLIG3/TNFAIP3, TRAF1/C5 and HLA-DRB1 Ã 04 in Slovak population was previously reported by Stark [52]. Unlike that study, in which authors used osteoarthritis patients as controls, our cohort consisted of gender-and geographically matched healthy individuals. Furthermore, seven additional SNPs with known association with RA were investigated and the first of its kind comprehensive analysis of HLA-DRB1 allele prevalence in Slovak population was performed in our study.
In general, genome-wide association studies-established SNPs have helped identify cellular pathways relevant to RA pathogenesis, however, their modest effect limits their predictive value [53]. Several reasons exist for a failure to replicate GWAS findings. First, it could be that this study had insufficient power to detect the modest effect sizes of selected SNPs. Second, for some non-replicated SNPs, there are differences in terms of the allele frequencies and LD patterns. Third, the magnitude of the effect of a risk allele might be altered in combination with other genetic and non-genetic factors [54,55]. We performed the present study on the genetically homogeneous Slovak population, which shares a common genetic and cultural background and a common environment and is characterized by good genealogical and clinical records and low migration rates, thus contributing to an increased reliability of the data collected. Interestingly, an analysis of individual SNPs, in which association with RA was confirmed such as in PADI4, PTPN22, STAT4 or IRF5, were also found in the multiparametric LDA model suggesting that these genes represent independent risk factors for RA.

Conclusion
This study shows replication data of previously identified genetic risk factors of RA in Slovak population. In single as well as multiparametric analyses, HLA-DRB1, PTPN22, STAT4, IRF5 and PADI4 significantly discriminated between RA patients and healthy controls. IRF5, CD28 and CTLA4 were associated with seropositivity in RA patients. Moreover, risk alleles in AFF3 gene together with the presence of ACPA, were associated with higher clinical severity of RA. The association of multiple risk variants related to T cell receptor signalling with seropositivity may suggest an important role in distinct clinical phenotypes of RA. Our study demonstrates that multiparametric analyses represent a powerful tool for investigation of mutual relationships of potential risk factors in complex diseases such as RA.