Meta-Analysis of 125 Rheumatoid Arthritis-Related Single Nucleotide Polymorphisms Studied in the Past Two Decades

Objective Candidate gene association studies and genome-wide association studies (GWAs) have identified a large number of single nucleotide polymorphism (SNP) loci affecting susceptibility to rheumatoid arthritis (RA). However, for the same locus, some studies have yielded inconsistent results. To assess all the available evidence for association, we performed a meta-analysis of previously published case-control studies investigating the association between SNPs and RA. Methods Two hundred and sixteen studies, involving 125 SNPs, were reviewed. For each SNP, three genetic models were considered: the allele, dominant and recessive models. For each model, the summary odds ratio (OR) and its 95% confidence interval (CI) were calculated. Cochran's Q-statistic was used to assess heterogeneity. If heterogeneity was significant, a random effects model was used for the meta-analysis; otherwise, a fixed effects model was used. Results The meta-analysis showed that: (1) 30, 28 and 26 SNPs were significantly associated with RA (P<0.01) under the allele, dominant and recessive models, respectively. (2) rs2476601 (PTPN22) showed the strongest association under all three models: OR = 1.605, 95% CI: 1.540–1.672, P<1.00E−15 for the T allele; OR = 1.638, 95% CI: 1.565–1.714, P<1.00E−15 for the T/T+T/C genotypes; and OR = 2.544, 95% CI: 2.173–2.978, P<1.00E−15 for the T/T genotype. (3) Only 23 (18.4%), 13 (10.4%) and 15 (12.0%) SNPs showed significant heterogeneity (P<0.01) under the three models, respectively. (4) For some of the SNPs, funnel plots and Egger's regression tests (significance level 0.01) showed no evidence of publication bias. For the other SNPs, the associations were tested in only a few studies and may have been subject to publication bias. More studies of these loci are required. Conclusion Our meta-analysis provides a comprehensive evaluation of RA association studies from the past two decades.
The detailed meta-analysis results are available at: http://210.46.85.180/DRAP/index.php/Metaanalysis/index.


Introduction
Rheumatoid arthritis (RA) is an autoimmune disease that causes inflammation of the joints and surrounding tissues. Its main symptoms are pain, swelling, stiffness and loss of function in the joints [1]. The prevalence of RA is about 1% in the adult population and is higher among women than among men [2].
As a common, complex disease, RA is usually caused by the interaction of multiple genetic variants and environmental factors [3]. Based on twin studies, the contribution of genetic factors is estimated to account for about 50-65% of the risk of developing RA [4,5]. Therefore, the identification of genetic factors is important for understanding the pathogenesis of RA.
Many studies have successfully identified RA disease loci. The locus most strongly associated with RA is the human leukocyte antigen (HLA) region within the major histocompatibility complex (MHC) on chromosome 6p21. The HLA class II gene DRB1 has many alleles, and those encoding a common sequence of amino acid residues (the shared epitope, SE) have consistently been shown to be strongly associated with RA [6,7]. However, the region has a highly complex genetic structure, which limits the effectiveness of standard SNP-based genotyping and analysis [8]. Family studies also suggest that the HLA region contributes only one-third of the genetic component [9]. Therefore, non-HLA loci associated with RA are increasingly being studied. SNP-based association studies (including candidate gene association studies and genome-wide association studies, GWAs) are effective for identifying these non-HLA loci. The number of association studies has grown rapidly year on year, and many important genes, such as PTPN22 and STAT4, have been successfully identified [10].
Although association studies of RA have achieved great success, certain problems remain. For the same locus, some studies have yielded conflicting results. For example, Munoz-Valle et al. reported that the SNP rs231775 at position 49 (A/G) of the CTLA-4 gene is associated with RA [11], whereas Milicic et al. found that rs231775 is not associated with susceptibility to RA [12]. Such inconsistencies may be caused by small sample sizes, racial or ethnic differences, and clinical or genetic heterogeneity [13]. It is therefore important to assess whether the combined evidence supports associations between SNPs and RA.
Meta-analysis is a powerful tool that can increase statistical power by combining the results of multiple studies. Meta-analyses have already evaluated the association with RA of certain SNP loci, such as STAT4 rs7574865 [14,15], PADI4 rs2240340 [16] and PTPN22 rs2476601 [17,18]. However, each of these meta-analyses covered only one or a few SNP loci. To comprehensively and systematically assess the associations between RA susceptibility and all available SNPs (each reported by multiple RA case-control association studies), we searched the PubMed database and performed a meta-analysis. One hundred and twenty-five SNPs were included in our study. For each SNP, three genetic models were considered: the allele model, the dominant model and the recessive model. Heterogeneity and publication bias were also assessed. To our knowledge, this is the most detailed meta-analysis of RA-related SNPs published to date.

Data Collection
The PubMed literature database was searched for appropriate studies using the following keywords: 'polymorphism', 'single nucleotide polymorphisms', 'genome-wide association study', 'GWAs', 'rheumatoid arthritis' and 'RA'. Studies were selected according to the following criteria: (1) the article was published between January 1992 and December 2011; (2) the study used a case-control design and examined the association between SNPs and RA; (3) SNP genotype data for both patients and controls were available; and (4) the study was published as a full paper, not as a meeting abstract or review. Ultimately, 216 studies involving 125 SNPs were included in the meta-analysis. For each study, the following information was extracted: the polymorphism studied, the first author, the year of publication, demographics, and the numbers of cases and controls.

Selection of the Genetic Model
To comprehensively analyze the relationships between SNPs and RA, three genetic models were selected: the allele model, the dominant model and the recessive model. To illustrate the models, we assume that a SNP marker locus has two alleles, labeled A and a (most SNPs are biallelic), where A is the high-risk candidate allele and a is the lower-risk allele. The three models are as follows:
(1) Allele model: the effect of the A allele vs. the a allele.
(2) Dominant model: the A allele produces an RA phenotype when present in either one or two copies, that is, the A/A+A/a vs. the a/a genotypes.
(3) Recessive model: the A allele produces an RA phenotype only when present in two copies, that is, the A/A vs. the A/a+a/a genotypes.
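The three models differ only in how each study's genotype counts are collapsed into a 2×2 table before an OR is computed. The minimal sketch below assumes per-study genotype counts (A/A, A/a, a/a) for cases and controls and uses Woolf's logit interval for the 95% CI, adding 0.5 to every cell when any cell is zero; both are common conventions and not necessarily the exact settings used in this study. The function names and the example counts are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_ci(a, b, c, d):
    """OR and Woolf (logit) 95% CI for a 2x2 table.

    a, b: risk-category / reference-category counts in cases
    c, d: risk-category / reference-category counts in controls
    """
    if 0 in (a, b, c, d):
        # Common convention: add 0.5 to every cell if any cell is zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - Z95 * se),
            math.exp(log_or + Z95 * se))


def model_tables(cases, controls):
    """Collapse (A/A, A/a, a/a) genotype counts into one 2x2 table per model."""
    case_AA, case_Aa, case_aa = cases
    ctrl_AA, ctrl_Aa, ctrl_aa = controls
    return {
        # Allele model: count alleles (each person carries two), A vs. a
        "allele": (2 * case_AA + case_Aa, case_Aa + 2 * case_aa,
                   2 * ctrl_AA + ctrl_Aa, ctrl_Aa + 2 * ctrl_aa),
        # Dominant model: A/A + A/a vs. a/a
        "dominant": (case_AA + case_Aa, case_aa, ctrl_AA + ctrl_Aa, ctrl_aa),
        # Recessive model: A/A vs. A/a + a/a
        "recessive": (case_AA, case_Aa + case_aa, ctrl_AA, ctrl_Aa + ctrl_aa),
    }


# Hypothetical genotype counts for one study
tables = model_tables(cases=(30, 120, 350), controls=(15, 90, 395))
for model, table in tables.items():
    print(model, table, [round(x, 3) for x in odds_ratio_ci(*table)])
```

Note that the allele-model table counts two alleles per person, so its cells sum to twice the number of subjects, whereas the dominant and recessive tables count people.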

Evaluation of the Heterogeneity
Cochran's Q-statistic was used to test for heterogeneity, that is, between-study variation beyond that expected from within-study sampling error [19]. Under the null hypothesis, the statistic follows a χ² distribution with k-1 degrees of freedom (where k is the number of studies). The null hypothesis was that all studies were estimating the same effect; rejecting it indicates heterogeneity between studies. The significance level was α = 0.01.
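As a minimal sketch of this test, assuming study-level log ORs and their standard errors as input: Q is the inverse-variance-weighted sum of squared deviations from the pooled log OR, referred to a χ² distribution with k-1 degrees of freedom and compared with α = 0.01. I² (reported in the online results) is derived from the same Q. To stay within the Python standard library, the χ² tail probability uses the closed-form series for integer degrees of freedom instead of a statistics package. The names `chi2_sf` and `cochran_q` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail plus terms
    # sqrt(2/pi) * exp(-x/2) * x^((2j-1)/2) / (1*3*...*(2j-1)), j = 1..(df-1)/2
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * j + 1)
    return p


def cochran_q(log_ors, ses, alpha=0.01):
    """Cochran's Q, df, P value, I^2 and the heterogeneity call at level alpha."""
    w = [1.0 / s ** 2 for s in ses]                 # inverse-variance weights
    pooled = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    p = chi2_sf(q, df)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0   # share of variation beyond chance
    return {"Q": q, "df": df, "P": p, "I2": i2, "heterogeneous": p < alpha}
```

As a rough consistency check, `chi2_sf(21.020, 14)` is about 0.101, which matches the P value given for rs7528684 under the dominant model if all 15 of its studies entered the test.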

Evaluation of the Statistical Association
For each of the 125 loci, Cochran's Q-statistic was used to test for heterogeneity. If the Q-statistic was not significant, all differences between studies were attributed to sampling error, and a fixed effects model was used for the meta-analysis. The fixed effects model assumes that all studies in the meta-analysis share a common effect size. In contrast, if the Q-statistic was significant (P<0.01), heterogeneity was considered to exist between the studies, and a random effects model was used. The random effects model assumes that each study has its own effect size and allows for heterogeneity between studies [23].
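The model-selection rule above can be sketched as follows, assuming study-level log ORs and standard errors as input and taking the Cochran's Q P value as given. This sketch uses inverse-variance weighting for the fixed effects model and the DerSimonian-Laird estimate of the between-study variance τ² for the random effects model, both standard choices; the R 'meta' package offers alternatives (for example, Mantel-Haenszel weighting for binary outcomes), and the exact options used in this study are not stated, so it may not reproduce the published figures exactly. The function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def summarize(log_or, se):
    """Pooled OR, 95% CI and two-sided z-test P value from a pooled log OR."""
    z = log_or / se
    return {"OR": math.exp(log_or),
            "lower": math.exp(log_or - Z95 * se),
            "upper": math.exp(log_or + Z95 * se),
            "P": math.erfc(abs(z) / math.sqrt(2.0))}


def fixed_effect(log_ors, ses):
    """Inverse-variance fixed effects model: one common true effect."""
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    return summarize(pooled, math.sqrt(1.0 / sum(w)))


def random_effects_dl(log_ors, ses):
    """DerSimonian-Laird random effects model: true effects vary by study."""
    w = [1.0 / s ** 2 for s in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_ors) - 1)) / c)   # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]    # weights widened by tau^2
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    result = summarize(pooled, math.sqrt(1.0 / sum(w_star)))
    result["tau2"] = tau2
    return result


def choose_model(log_ors, ses, q_p_value, alpha=0.01):
    """Decision rule from the text: random effects if Cochran's Q P < alpha."""
    if q_p_value < alpha:
        return "random", random_effects_dl(log_ors, ses)
    return "fixed", fixed_effect(log_ors, ses)
```

When τ² is estimated as zero, the random effects result coincides with the fixed effects result; when it is positive, the weights 1/(SE² + τ²) become more equal across studies and the CI widens.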

Evaluation of Publication Bias
Funnel plots were used to assess publication bias, with each study's estimated effect plotted against its standard error. Larger studies usually have smaller standard errors and smaller studies larger ones, so the estimates from small studies scatter more widely than those from large studies. In the absence of bias, the plot resembles a symmetrical inverted funnel. Egger's regression test was used to test the funnel plot for asymmetry [24,25].
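Egger's test turns the visual funnel-plot check into a regression: each study's standardized effect (log OR divided by its SE) is regressed on its precision (1/SE), and an intercept far from zero indicates asymmetry. The minimal sketch below, assuming study-level log ORs and SEs, returns the intercept and its t statistic with k-2 degrees of freedom; the two-sided P value would then come from a t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. The R 'meta' package provides `metabias` for this purpose; the function name `egger_test` is hypothetical.

```python
import math


def egger_test(log_ors, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standard normal deviate (log OR / SE) on precision (1 / SE)
    by ordinary least squares; an intercept far from zero suggests asymmetry.
    Returns the intercept, slope, intercept SE, t statistic and df (= k - 2).
    """
    k = len(log_ors)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    x = [1.0 / s for s in ses]                        # precision
    y = [e / s for e, s in zip(log_ors, ses)]         # standard normal deviate
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must differ between studies")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = k - 2
    se_intercept = math.sqrt(rss / df * (1.0 / k + mx ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else math.nan
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t, "df": df}


# Hypothetical data in which smaller studies (larger SE) report larger effects
ses = [0.1, 0.2, 0.25, 0.5]
effects = [0.2 + 0.5 * s for s in ses]
print(egger_test(effects, ses))
```

In this constructed example the small-study effect gives a positive intercept (0.5), which is the kind of asymmetry the test is designed to detect.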
All statistical analyses were performed using the 'meta' package for R (http://cran.r-project.org/web/packages/meta/index.html).

Eligible Studies and Loci
The PubMed database was searched and about 1,500 studies were reviewed. Ultimately, 216 published articles involving 125 SNPs were included in the meta-analysis. Each SNP was reported in at least two studies. The number of studies for each locus was also counted: 20 SNPs were reported in five or more studies, 41 SNPs in three to four studies and 64 SNPs in two studies. SNP rs2476601 at the PTPN22 locus was reported most frequently (34 times). For each study, the genotype counts in cases and controls were extracted for subsequent analysis.

Meta-analysis Results under the Allele Genetic Model
For each SNP, the OR of the A allele (A vs. a) and its 95% CI were calculated for each individual study, and heterogeneity between studies was tested.
After heterogeneity testing, 23 SNPs had a Q-statistic P<0.01, and a random effects model was used for their meta-analysis. For the remaining 102 SNPs, a fixed effects model was used. The meta-analysis showed that 30 SNPs were significantly associated with RA (P<0.01; see Table 1). Among these 30 SNPs, only two showed heterogeneity (P = 0.007 for rs7528684 and P = 0.002 for rs1748033). For these two loci, the pooled ORs under the random effects model were 1.093 (95% CI: 1.031–1.158) for rs7528684 and 1.223 (95% CI: 1.066–1.404) for rs1748033. The other 28 SNPs were associated with RA under the fixed effects model. The most significant locus was rs2476601 (PTPN22 risk allele 1858T), with a pooled OR under the fixed effects model of 1.605 (95% CI: 1.540–1.672), suggesting that the rs2476601 T allele confers susceptibility to RA. Publication bias was tested using Egger's test, and no significant publication bias was observed for any of the 30 significant SNPs.
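A reported OR and 95% CI are enough to recover the standard error of the log OR, the z statistic and an approximate two-sided P value, which gives readers a quick consistency check on summary figures such as those above. This is a reader-side check, not a step described by the authors, and it assumes a Wald-type CI that is symmetric on the log scale. The function name `check_reported` is hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def check_reported(odds_ratio, lower, upper):
    """Back-calculate log-OR SE, z and two-sided P from a reported OR and 95% CI.

    Assumes a Wald-type CI that is symmetric on the log scale, so the geometric
    midpoint of the interval should reproduce the OR up to rounding.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    midpoint = math.sqrt(lower * upper)
    return {"se": se, "z": z, "P": p, "midpoint": midpoint}


for snp, reported in {"rs2476601": (1.605, 1.540, 1.672),
                      "rs7528684": (1.093, 1.031, 1.158),
                      "rs1748033": (1.223, 1.066, 1.404)}.items():
    print(snp, check_reported(*reported))
```

With the figures reported above, all three recovered P values fall below the 0.01 threshold, and the z statistic for rs2476601 is above 20, in line with P<1.00E−15.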

Meta-analysis Results under the Dominant Genetic Model
Under the dominant model (A/A+A/a vs. a/a genotypes), heterogeneity between the studies was tested. Thirteen SNPs had significant heterogeneity (P<0.01) and were analyzed using the random effects model; the other 112 SNPs were analyzed using the fixed effects model. Table 2 lists all 28 significantly associated SNPs. Among these SNPs, only rs1748033 showed heterogeneity (Q = 20.260, P = 0.009). Although rs7528684 showed heterogeneity under the allele model, it did not under the dominant model (Q = 21.020, P = 0.101). Therefore, the random effects model was used for the meta-analysis of rs1748033 and the fixed effects model for rs7528684. All the meta-analysis results were compared under the allele model and the dominant model. Twenty-two

Meta-analysis Results using the Recessive Genetic Model
Under the recessive model (A/A vs. A/a+a/a), 15 SNPs were analyzed using the random effects model and 110 SNPs using the fixed effects model. Ultimately, 26 SNPs were significantly associated with RA (P<0.01; see Table 3). Of these 26 SNPs, 25 were assessed using the fixed effects model; only rs7528684 was assessed using the random effects model because of heterogeneity (Q = 36.084, P = 0.001). Compared with the allele model, 20 SNPs were significantly associated with RA under both the recessive and allele models, and six SNPs were associated with RA only under the recessive model. Compared with the dominant model, 12 SNPs were significantly associated with RA under both the recessive and dominant models, and 14 SNPs only under the recessive model. Twelve SNPs (rs2476601, rs7574865, rs2488457, rs1748033, rs6920220, rs10181656, rs396991, rs6498169, rs8179673, rs11889341, rs7528684 and rs231775) were significantly associated with RA under all three genetic models. The most significant locus was again rs2476601 (PTPN22 risk allele 1858T; OR = 2.544; 95% CI: 2.173–2.978; P<1.00E−15). No significant publication bias was observed for any of the 26 significant SNPs.
More detailed meta-analysis results were compiled for each of the 125 loci, including: a detailed list of articles (first author, year of publication, demographics, and numbers of cases and controls); individual and combined ORs and 95% CIs; results of Cochran's Q test (Q and P values); I² and its 95% CI; meta-analysis results under both the fixed effects and the random effects models; forest plots; and funnel plots for publication bias. These results are all available at: http://210.46.85.180/DRAP/index.php/Metaanalysis/index.

Meta-analysis of Special Phenotypes
During data collection, we noticed that some articles also reported additional testing of samples, such as rheumatoid factor (RF, positive or negative) and anti-cyclic citrullinated peptide antibody (anti-CCP, positive or negative). A meta-analysis of the SNPs with this information (16 SNPs) was also carried out; the results are shown in Table 4. Three SNPs (rs2476601, rs7021206 and rs7574865) were significantly associated with these phenotypes. For rs2476601 (PTPN22 gene), the T allele was significantly associated with RA in RF-positive, RF-negative and anti-CCP-positive RA patients versus controls. In addition, the T allele differed significantly between RF-positive and RF-negative subjects. For rs7021206 (TRAF1 gene), the G allele was significantly associated with RA in RF-positive, anti-CCP-positive and anti-CCP-negative RA patients versus controls. For rs7574865 (STAT4 gene), the T allele was significantly associated with RA in RF-positive, RF-negative, anti-CCP-positive and anti-CCP-negative RA patients versus controls. No heterogeneity was found for any of the 16 SNPs, so the meta-analysis was performed using the fixed effects model. For more detailed results for each SNP, see: http://210.46.85.180/DRAP/index.php/Metaanalysis/index.

Meta-analysis of Population Subgroups
In this section, subgroup analysis was applied to some of the SNPs that showed heterogeneity, in order to explain the causes of that heterogeneity. SNPs reported by only a few studies are not suitable for classification into subgroups, so only SNPs reported by more than 10 individual studies were selected: rs7528684 (15 studies), rs1800629 (14 studies) and rs1800896 (11 studies). Only the allele model was considered. The subgroup meta-analysis results are shown in Table 5. For SNP rs7528684, the 15 studies were divided into three subgroups: European (five studies), Asian (six studies) and American (four studies). No heterogeneity was observed within any subgroup. rs7528684 was associated with RA only in the Asian subgroup (OR = 1.17, 95% CI: 1.09–1.24, P<1.00E−4); no evidence of association was observed in the European and American subgroups. This indicates that the heterogeneity of rs7528684 may be caused by regional differences, with the C allele being a risk allele in the Asian population but not in European or American populations. For SNP rs1800629, 14 studies were divided into three subgroups: European (six studies), Asian (four studies) and American (three studies); one study was excluded because its study population was African. No heterogeneity was observed within any of the three subgroups. rs1800629 was associated with RA in both the Asian (OR = 2.17, 95% CI: 1.61–2.92, P<1.00E−4) and American (OR = 1.91, 95% CI: 1.40–2.62, P<1.00E−4) subgroups, but the risk allele differed between them: G in Asians and A in Americans. No evidence of association was observed in the European subgroup. This also indicates that regional differences are an important source of the heterogeneity of rs1800629.
For rs1800896, only the European subgroup was analyzed, because the other subgroups contained too few studies. Heterogeneity was high within the European subgroup (P = 0.001), and rs1800896 was not associated with RA in this subgroup. Further studies are required to identify the reasons for the heterogeneity of SNP rs1800896.
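The reasoning behind this subgroup analysis can be illustrated with a short sketch: studies are grouped by population and pooled separately, and Cochran's Q is compared within and across groups. When the same coded allele raises risk in one population and lowers it in another (as described for rs1800629), each subgroup can be homogeneous while the combined set shows large Q. The sketch assumes study-level log ORs and SEs and uses inverse-variance fixed effects pooling within subgroups; all values and function names are hypothetical, not data from Table 5.

```python
import math
from collections import defaultdict


def pool_fixed(log_ors, ses):
    """Inverse-variance pooled log OR and Cochran's Q."""
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_ors))
    return pooled, q


def subgroup_analysis(studies):
    """Pool studies separately per population; studies = (population, log_or, se)."""
    groups = defaultdict(list)
    for population, log_or, se in studies:
        groups[population].append((log_or, se))
    results = {}
    for population, rows in groups.items():
        log_ors, ses = zip(*rows)
        pooled, q = pool_fixed(log_ors, ses)
        results[population] = {"k": len(rows), "OR": math.exp(pooled), "Q": q}
    return results


# Hypothetical log ORs for one allele, coded identically in every study
studies = [("Asian", 0.70, 0.2), ("Asian", 0.80, 0.2), ("Asian", 0.75, 0.2),
           ("American", -0.60, 0.2), ("American", -0.70, 0.2)]
by_group = subgroup_analysis(studies)
_, q_all = pool_fixed([s[1] for s in studies], [s[2] for s in studies])
print(by_group)
print("Q across all studies:", round(q_all, 1))
```

Here each subgroup has Q well below its degrees of freedom, while pooling all five studies gives Q near 59 on 4 degrees of freedom, so the apparent heterogeneity is explained by the population split.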

Discussion
In the past two decades, many SNP loci have been identified as associated with RA by candidate gene association studies and GWAs. However, RA is a complex disease to which many genetic loci contribute, and some association studies are underpowered to detect the modest contributions of these loci. This can lead to inconsistent results through false positives, false negatives, or population differences [26,27]. Meta-analysis is a powerful tool that can increase statistical power by pooling the results of independent studies [28] and can therefore improve the performance of genetic studies of complex diseases such as RA.
In this study, a comprehensive and systematic meta-analysis was carried out to assess the associations between 125 SNPs and RA susceptibility. Three genetic models were considered: the allele, dominant and recessive models. The meta-analysis showed that 30, 28 and 26 SNPs were significantly associated with RA under these models, respectively. SNP rs2476601 had the strongest association. It is a common SNP located in the PTPN22 gene, which encodes a lymphoid-specific phosphatase (Lyp). rs2476601 is a nonsynonymous SNP that changes the amino acid at position 620 from arginine (R) to tryptophan (W). This change affects the physical association of Lyp with the tyrosine kinase Csk during T cell activation [29,30]. In addition to rs2476601, multiple SNPs in the PTPN22 gene showed significant association with RA: five SNPs under the allele model (see Table 1), three under the dominant model (see Table 2) and four under the recessive model (see Table 3). These data support the association of PTPN22 with RA. The STAT4 and PADI4 genes also had multiple SNPs associated with RA (STAT4: four SNPs under each of the allele, dominant and recessive models; PADI4: six, four and four SNPs under the allele, dominant and recessive models, respectively). Publication bias was evaluated for all 125 SNPs using funnel plots and Egger's test. Only four SNPs showed significant bias: rs4810485 (P<1.00E−15 under each of the allele, dominant and recessive models), rs2280714 (dominant model, P = 0.006), TAP2 379A/G (allele model, P = 0.0003) and TNFRII 676T/G (allele model, P = 0.001). None of these four loci was significantly associated with RA under any of the three genetic models. In other words, the significant associations found in the meta-analysis were not affected by publication bias, which supports the reliability of our meta-analysis.
Nevertheless, some of the SNPs were tested for association only in a few studies, and there may be a publication bias for such SNPs. More studies on these loci are required.
In summary, a meta-analysis of 125 SNPs was carried out with the aim of increasing statistical power through a larger combined sample size. The meta-analysis confirmed associations between certain SNPs and RA susceptibility. However, certain SNP loci were reported by only a few articles, and further studies are needed to clarify their associations with RA susceptibility.