Genetic Heterogeneity of Susceptibility Gene in Different Ethnic Populations: Refining Association Study of PTPN22 for Graves’ Disease in a Chinese Han Population

In our previous studies, we presumed that subtypes of Graves’ disease (GD) may be caused by different major susceptibility genes or by different variants of a single susceptibility gene. However, more evidence is needed to support this hypothesis. The single-nucleotide polymorphism (SNP) rs2476601 in PTPN22 is a susceptibility locus for GD in European populations. However, this polymorphism has not been found in Asian populations. Here, we investigate whether PTPN22 is a susceptibility gene for GD in the Chinese population and seek to determine the susceptibility variant of PTPN22 in GD. We conducted an imputation analysis based on the results of our genome-wide association study (GWAS) in 1,536 GD patients and 1,516 control subjects. Imputation revealed that 255 common SNPs in a linkage disequilibrium (LD) block containing PTPN22 were associated with GD (P<0.05). Nine tagSNPs that captured the 255 common variants were selected for further genotyping in a large cohort of 4,368 GD patients and 4,350 matched controls. In the replication study, none of the nine tagSNPs differed significantly between patients and controls in either genotype distribution or allele frequency (P>0.05). Although the combined analysis exhibited a weak association signal (P combined = 0.003263 for rs3811021), the false positive report probability (FPRP) analysis indicated that it was most likely a false positive finding. Our study did not support an association of common SNPs in the PTPN22 LD block with GD in the Chinese Han population. This suggests that GD in different ethnic populations is probably caused by distinct susceptibility genes.


Introduction
Graves' disease (GD) is one of the most common autoimmune diseases (AIDs) and is characterized by the production of autoantibodies that bind and stimulate the thyroid-stimulating hormone receptor (TSHR), resulting in hyperthyroidism and diffuse enlargement of the thyroid gland. GD is universally considered to be a complex disease triggered by the interaction between susceptibility genes [1-3] and nongenetic factors, such as stress, iodine intake, and infection [4,5]. The prevalence of GD is approximately 0.5-2% in Western countries and 2-3.0% in China [6,7]. Family and twin studies have shown that 79% of the predisposition to the development of GD is attributable to genetic factors [8]; it is therefore important to identify the susceptibility genes and loci, which will facilitate diagnosis, prevention, and treatment of this disease.
PTPN22 is located at chromosome 1p13.2 and encodes the intracellular tyrosine phosphatase LYP, which acts as a negative regulator in early T-cell activation and signal transduction through binding to the Csk protein [24]. A functional single-nucleotide polymorphism (SNP), R620W (rs2476601), at position +1858 (+1858C/T) was first identified as a susceptibility locus for type 1 diabetes (T1D) in a European population [25]. The variant was subsequently reported to be associated with several AIDs, such as rheumatoid arthritis (RA), autoimmune thyroid disease, and systemic lupus erythematosus [21,25-27]. However, it is noteworthy that rs2476601 was reported to be monomorphic in Asian populations [19,28-30], which indicates that it may not have a causal role for GD in Asian populations.
In our previous studies, we presumed that subtypes of GD may be caused by different major susceptibility genes or by different variants of a single susceptibility gene [12,17]. Given the genetic heterogeneity of PTPN22 among ethnic populations, we set out to investigate the association of SNPs in PTPN22 with GD in a large number of samples in order to determine whether PTPN22 is a susceptibility gene for GD in the Chinese Han population.

Subjects and sample collection
We enrolled 5,904 GD patients (4,635 females and 1,269 males; age 39 ± 14 yr) and 5,866 geographically matched healthy controls (4,506 females and 1,360 males; age 48 ± 12 yr) from the Chinese Han population. GD was diagnosed as previously reported [12,16,17]. All patients and control subjects gave written informed consent, and the project was approved by the local Research Ethics Committees of Ruijin Hospital, the Central Hospital of Xuzhou, the First Affiliated Hospital of Bengbu Medical College, the Medical School Hospital of Qingdao University, Linyi People's Hospital, the Hospital Affiliated to Jiangsu University, and Fujian Province Hospital. Genomic DNA was extracted from peripheral blood leukocytes using the FUJIFILM QuickGene-610L system.

SNP Selection, Genotyping, and Quality Control (QC) Filters
DNA samples from 1,536 GD cases and 1,516 controls were genotyped using Illumina Human660-Quad BeadChips at the GWAS stage. We then performed quality control, excluding samples with call rates < 98%, gender inconsistencies, or cryptic relatedness (142 samples). Genotype data for 186 SNPs within the large linkage disequilibrium (LD) block region containing PTPN22 (between 113.7 and 114.9 Mb on chromosome 1, defined by two apparent recombination hotspots) were obtained for a cohort of 1,442 GD cases and 1,468 controls from our previous GWAS [16] (Figure 1A). To further define the loci associated with GD, we performed an imputation analysis based on our GWAS data and obtained genotype data for 1,277 SNPs in the ~1.2-Mb LD region containing PTPN22 for subsequent analysis (Figures 1A and 1B; Table S1). Notably, the SNPs associated with GD were mostly located in a small region bounded by two recombination hotspots (marked by the arrow in Figure 1A; Table S1).
Among the 474 SNPs, 255 SNPs showed a P value less than 0.05, and nine tagSNPs were selected using Haploview software (version 4.2) based on our GWAS and imputation data for the 255 SNPs with a criterion of r² > 0.8 (Table S2). An additional 4,368 GD patients and 4,350 matched controls were then genotyped for replication using TaqMan SNP Genotyping Assays, a widely accepted method that we have described previously [20].
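The idea behind selecting tagSNPs at r² > 0.8 can be illustrated with a small greedy pairwise tagging sketch. This is not Haploview's implementation; the function name `greedy_tag_snps`, the SNP labels, and the toy r² values are hypothetical, and only the r² > 0.8 criterion is taken from the text.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedily pick tag SNPs so every SNP has r^2 > threshold with a tag.

    r2 maps each SNP id to a dict of pairwise r^2 values with other SNPs.
    A SNP is always considered to tag itself.
    """
    untagged = set(r2)
    tags = []
    while untagged:
        def captured(snp):
            # SNPs still untagged that this candidate would capture.
            return {s for s in untagged
                    if s == snp or r2[snp].get(s, 0.0) > threshold}

        # Pick the candidate capturing the most untagged SNPs
        # (ties broken alphabetically for reproducibility).
        best = max(sorted(r2), key=lambda snp: len(captured(snp)))
        tags.append(best)
        untagged -= captured(best)
    return tags


# Hypothetical toy LD matrix (not data from this study).
r2_toy = {
    "snpA": {"snpB": 0.95, "snpC": 0.85},
    "snpB": {"snpA": 0.95, "snpC": 0.50},
    "snpC": {"snpA": 0.85, "snpB": 0.50},
    "snpD": {},
}
tags = greedy_tag_snps(r2_toy)
```

In this toy input, a single SNP in high LD with two others is enough to represent all three, while a SNP with no strong LD partner has to be genotyped itself.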
Samples with low call rates (<90%) were discarded, leaving 8,553 samples (4,254 patients and 4,299 controls) for further analysis in the replication stage. Deviation from Hardy-Weinberg equilibrium for the nine SNPs genotyped in the controls was tested with Haploview 4.2 software; the P-values of all nine tagSNPs were greater than 0.05, indicating that none of the tagSNPs deviated significantly from Hardy-Weinberg equilibrium.
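A Hardy-Weinberg check of this kind can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on control genotype counts. This is an illustration only: Haploview's own test may be computed differently, and the function name `hwe_chi2_pvalue` and the example counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Asymptotic chi-square (1 df) test for deviation from HWE.

    Takes the observed counts of the three genotypes and returns the P value.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: the first set is in exact HWE, the second has no heterozygotes.
p_in_hwe = hwe_chi2_pvalue(100, 200, 100)
p_out_of_hwe = hwe_chi2_pvalue(200, 0, 200)
```

A P value above 0.05, as reported for the nine tagSNPs, would be read as no significant deviation from equilibrium.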

Imputation and statistical analysis
We performed the imputation analysis on the PTPN22-containing LD region using the program IMPUTE2 [23,31,32] together with the observed genotype data and the 1000 Genomes Project phase 1 interim data (June 2011) as a reference. After imputation and strict quality control (using only SNPs with confidence scores ≥0.9, call rates ≥95%, no deviation from Hardy-Weinberg equilibrium (P > 10⁻⁶), and MAF > 1%), our dataset included 1,277 SNPs in the PTPN22-containing LD region for subsequent analysis (Table S1).
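The post-imputation quality-control criteria can be written as a simple per-SNP filter. This is a minimal sketch: the thresholds come from the text, but the record layout (keys `confidence`, `call_rate`, `hwe_p`, `maf`) and the example records are assumptions, not the output format of IMPUTE2.

```python
def passes_imputation_qc(snp, min_confidence=0.9, min_call_rate=0.95,
                         min_hwe_p=1e-6, min_maf=0.01):
    """Return True if an imputed SNP record meets all four QC criteria."""
    return (snp["confidence"] >= min_confidence
            and snp["call_rate"] >= min_call_rate
            and snp["hwe_p"] > min_hwe_p
            and snp["maf"] > min_maf)


# Hypothetical records for illustration only.
snps = [
    {"id": "snp1", "confidence": 0.97, "call_rate": 0.99, "hwe_p": 0.40, "maf": 0.21},
    {"id": "snp2", "confidence": 0.85, "call_rate": 0.99, "hwe_p": 0.40, "maf": 0.21},
    {"id": "snp3", "confidence": 0.97, "call_rate": 0.99, "hwe_p": 0.40, "maf": 0.005},
]
kept = [s["id"] for s in snps if passes_imputation_qc(s)]
```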
Association analysis of case-control data at the GWAS stage was conducted using the Cochran-Armitage trend test in PLINK [30]. At the replication stage, association was assessed using the Cochran-Armitage test for trend implemented in PLINK v1.07 [30]. Finally, Cochran-Mantel-Haenszel stratified analysis was used in the combined population. The LD structure was calculated using Haploview version 4.2 software [33]. False positive report probability (FPRP) was analyzed using the FPRP calculation spreadsheet provided by Wacholder et al. [34]. R was used to generate the genome-wide P-value plot, and the regional plots were generated using SNAP version 2.2 software [35].
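For readers unfamiliar with the Cochran-Armitage trend test, the following sketch computes it from genotype counts under additive weights (0, 1, 2). It is a textbook formulation written for illustration, not PLINK's code; the function name and the example counts are hypothetical.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Two-sided Cochran-Armitage trend test.

    cases / controls: genotype counts ordered by number of copies of the
    tested allele. Returns (z, p_value).
    """
    n_cases = sum(cases)
    n_controls = sum(controls)
    n_total = n_cases + n_controls
    if n_total == 0:
        return 0.0, 1.0
    col_totals = [r + s for r, s in zip(cases, controls)]
    # Trend statistic: weighted difference between case and control counts.
    t_stat = sum(w * (r * n_controls - s * n_cases)
                 for w, r, s in zip(weights, cases, controls))
    sum_w2n = sum(w * w * n for w, n in zip(weights, col_totals))
    sum_wn = sum(w * n for w, n in zip(weights, col_totals))
    variance = (n_cases * n_controls / n_total) * (n_total * sum_w2n - sum_wn ** 2)
    if variance <= 0:
        return 0.0, 1.0
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: identical genotype proportions, then a strong trend.
z_null, p_null = cochran_armitage_trend((10, 20, 30), (20, 40, 60))
z_alt, p_alt = cochran_armitage_trend((10, 20, 70), (70, 20, 10))
```

The test asks whether the proportion of cases rises or falls linearly with the number of allele copies, which is why it is a natural choice for an additive genetic model.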
Based on the 186 genotyped SNPs in 1,442 GD patients and 1,468 control subjects, the genotype frequencies of 1,277 SNPs in the LD block containing PTPN22 were obtained by imputation analysis (Figure 1A and Table S1). Interestingly, both genotyped and imputed SNPs located in an LD block of ~370 kb at 1p13.2, which contains seven genes (MAGI3, PHTF1, RSBN1, PTPN22, BCL2L15, AP4B1, and DCLRE1), were strongly associated with GD in our data. The most significant association signal surrounded PTPN22 (P GWAS = 0.0007, OR=1.23, 95% CI: 1.09-1.38 for rs6669008), whereas the association signals in the region outside the ~370 kb LD block were relatively weak (Figure 1B; Tables S1 and S2).

Replication Study and the Combined Analysis
Based on LD analysis of our GWAS data, nine tagSNPs were able to fully tag (r² > 0.8) the 255 SNPs with P GWAS < 0.05 in the ~370 kb LD block containing PTPN22 (Table S2). Although none of the 255 SNPs reached the Bonferroni-corrected threshold of 0.000196 (0.05/255), the nine tagSNPs were nevertheless genotyped in a cohort of 4,368 GD patients and 4,350 control subjects in the replication study, because the genotypes of some SNPs had been obtained by imputation. Unexpectedly, none of the nine tagSNPs was associated with GD in the replication study (Table 1). Specifically, no significant differences in allele or genotype frequencies were observed between the GD patients and healthy controls (P replicated = 0.3158 for rs6669008, OR=1.04, 95% CI: 0.97-1.11; Table 1), despite good statistical power (nearly 100%) to detect an effect size of 1.2 (Table 2).
We further analyzed the LD structure of the nine tagSNPs in case and control subjects based on data from our replication cohort. There were no significant differences in LD structure between GD patients and control subjects, and both were similar to the LD structure of the Asian population in the 1000 Genomes Project data (Table S2, Figure S1). These data suggested that the imputed data in the current study are acceptable. We therefore combined the results of the GWAS and replication stages; six of the nine tagSNPs in the ~370-kb LD block containing PTPN22 were nominally associated with GD, with the strongest association signal at rs3811021 (P combined = 0.0033, OR=1.10, 95% CI: 1.03-1.17), but no SNP had a P value below 0.000196 (0.05/255), the significance threshold in the current study (Table 1).
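The significance threshold used here is a simple division of 0.05 by the 255 SNPs tested. The short check below reproduces that arithmetic and compares the two P values quoted in the text against it; the variable and function names are illustrative only.

```python
ALPHA = 0.05
N_TESTS = 255                      # SNPs with P GWAS < 0.05 in the LD block
threshold = ALPHA / N_TESTS        # approximately 0.000196


def is_significant(p_value, cutoff=threshold):
    """True if a P value survives the multiple-testing threshold."""
    return p_value < cutoff


p_gwas_rs6669008 = 0.0007          # strongest GWAS-stage signal reported in the text
p_combined_rs3811021 = 0.0033      # strongest combined-analysis signal reported in the text

gwas_hit = is_significant(p_gwas_rs6669008)
combined_hit = is_significant(p_combined_rs3811021)
```

Both P values are nominally below 0.05 but above the corrected threshold, which is why the text describes the signals as nominal.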

FPRP analysis
To determine whether the nominal association between the six SNPs and GD in the combined analysis was a false positive signal, the FPRP was analyzed [34]. The FPRP value was calculated under assigned prior probabilities ranging from 0.00001 to 0.25, using the statistical power to detect an OR of 1.2 together with the observed ORs and P values. Our case-control study of the nine SNPs in a total sample of 5,696 patients with GD and 5,767 control individuals had more than 99.5% statistical power to detect a relative risk of 1.2 for GD at a significance level equal to each SNP's observed P value (Table 2). Notably, the FPRP value for rs3811021 (P combined = 0.0033) fell below 0.2 only at a prior probability of 0.25, which is relatively high. At prior probabilities below 0.25 the FPRP exceeded 0.2, suggesting that the weak association signals of the six SNPs with GD are likely to be false positives.
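The general FPRP formula of Wacholder et al. relates the significance level, the statistical power, and the prior probability of a true association. The sketch below implements that formula only; it does not reproduce the spreadsheet used in this study or the values in Table 2, and the function name and the input values are hypothetical.

```python
def fprp(alpha, power, prior):
    """False positive report probability.

    alpha: significance level (or observed P value) of the finding
    power: statistical power (1 - beta) to detect the assumed effect
    prior: prior probability that the association is real
    """
    false_pos = alpha * (1.0 - prior)   # expected share of false positive reports
    true_pos = power * prior            # expected share of true positive reports
    return false_pos / (false_pos + true_pos)


# Hypothetical inputs showing how the FPRP grows as the prior shrinks.
priors = [0.25, 0.1, 0.01, 0.001]
values = [fprp(alpha=0.05, power=0.8, prior=p) for p in priors]
```

Because the prior enters both terms, the same P value can look convincing under an optimistic prior and unconvincing under a conservative one, which is why the analysis is reported over a range of priors.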

Polymorphism comparison in the present and previous studies of GD
Until now, most studies of GD have focused on the association of the functional SNP rs2476601 with GD in Caucasian populations (Table 3) [21,22,36-39]. Only a few studies have been carried out in Asian populations to investigate the GD associations of SNPs in PTPN22 other than rs2476601 (Table 3) [19,28,29,40]. It is worth noting that a study conducted in a United Kingdom Caucasian population (768 GD patients, 768 control subjects) showed no association with GD for any of the 5 tagSNPs that were selected for genotyping in the PTPN22 region. However, these 5 tagSNPs were in low LD with rs2476601 based on 1000 Genomes Project data (Table 3) [36]. Ichimura et al. found that one SNP, rs3789604, was significantly associated with GD in 414 patients and 231 control subjects recruited from a Japanese population (P = 0.0085, OR = 1.45; Table 3) [28], a finding subsequently replicated by Gu et al. in a Chinese population [19]. Based on our imputation data, rs3789604 is associated with GD (P GWAS = 0.001354; Table 3, Table S1). Although this SNP was not genotyped in our second cohort, one SNP in high LD with rs3789604 (r² = 0.99), rs3811021, was selected and genotyped in the replication study. The allele frequency of rs3811021 did not differ significantly between the 4,368 GD patients and the 4,350 controls in the replication study (P = 0.138, OR=1.06). Another study, performed in a Korean population by Lee et al., reported that rs12730735 was associated with susceptibility to autoimmune thyroid disorders (AITDs) in a total of 212 AITD patients (84 with GD and 128 with Hashimoto's thyroiditis) and 225 controls, especially with susceptibility to Hashimoto's thyroiditis (P < 0.01) [40].
However, there was no evidence to support the association of rs12730735 with GD in our GWAS data (P GWAS = 0.1857; Table 3, Table S1), and rs12753075, which is in high LD with rs12730735 (r² = 0.93), was also not associated with GD in the replication study (P replicated = 0.7315, OR=0.98; Table 1). Although our combined data did not confirm the association of SNPs in the PTPN22 region with GD, a meta-analysis of the SNPs previously reported to be associated with GD, especially in Asian populations, will be necessary in future research.

Discussion
Previous studies provided solid evidence for PTPN22 as a susceptibility gene for GD in Caucasian populations. Notably, the rs2476601 polymorphism was reported to be monomorphic in Asian populations [19,28-30], which indicates that it may not have a causal role for GD in Asian populations. However, we cannot exclude the possibility that the PTPN22 region harbors other susceptibility SNPs for GD in the Chinese Han population. This situation therefore provides an excellent model for testing whether GD is a heterogeneous disease across ethnic populations, caused by different major susceptibility genes or by different SNP variants in one susceptibility gene.
Thus, in the current study, nine tagSNPs were selected and genotyped in 4,254 GD patients and 4,299 control individuals to investigate whether SNPs in the PTPN22 region were associated with GD in the Chinese Han population. Unexpectedly, none of the nine tagSNPs was associated with GD in the replication samples (P replicated = 0.1383 to 0.7315; Table 1). Although the combined case-control association study still exhibited a nominal association signal (P combined = 0.0033, OR=1.10, 95% CI: 1.03-1.17 for rs3811021; Table 1), the P value of rs3811021 exceeds the significance threshold of 0.000196 (0.05/255). Moreover, the follow-up FPRP calculation suggested that it was most likely a false positive finding (Table 2). We also used the CaTS Power Calculator software to estimate the power to replicate the association between the most significant SNP, rs3811021, and GD at the level of P < 5×10⁻⁸ and found that it was less than 1% at our current sample size. Quanto software (http://hydra.usc.edu/gxe) was also used to estimate the sample size needed for the association between rs3811021 and GD to reach the GWAS significance level (P < 5×10⁻⁸). Nearly 26,500 cases and 26,500 controls would be required to achieve this level of significance, which is too large to be feasible at the current stage. The 1000 Genomes Project data also indicated quite different allele frequencies of SNPs in the PTPN22 LD block among ethnic populations (Table S3), which further suggests that PTPN22 may not be associated with GD in the Chinese Han population despite the evidence that PTPN22 is a susceptibility gene for GD in European Caucasian populations. The present study suggests that common SNPs in the PTPN22 region at 1p13.2 are not associated with GD, which provides more solid evidence that PTPN22 is an ethnicity-specific GD susceptibility gene in Caucasian populations but not in the Chinese Han population.
Although the SNP density and sample size in our current study are large enough to provide convincing evidence that PTPN22 is not associated with GD in the Chinese Han population, targeted resequencing of the PTPN22 region in GD patients from the Chinese Han population will be required in further studies, given that the genotypes of some SNPs were obtained by imputation.
Table 3. Comparison of the association of SNPs in the PTPN22 region with GD in the current study with that in previous reports.
Columns: SNP; Alleles; Case MAF (%); Control MAF (%); Reported P value; Study population; Study first author (reference).

To our knowledge, most association studies of PTPN22 and immune-related diseases have focused only on rs2476601 in relatively small samples. More recently, several GWAS with larger sample sizes also indicated that rs2476601 is strongly associated with some autoimmune diseases, such as T1D, RA, and Crohn's disease, especially T1D, with a P value as low as 2×10⁻¹¹¹ (OR=2.0) [41]. These data strongly support the hypothesis that PTPN22 is a major susceptibility gene for autoimmune disease in Caucasian populations. In our current study, we conducted a comprehensive refined association analysis of the PTPN22 region in a relatively large GD cohort, giving us good power (nearly 100%) to detect the previously reported association. However, we failed to find any association of PTPN22 with GD in the Chinese Han population. Moreover, a search of GWAS data from the UCSC website found no associations of SNPs in the PTPN22 region with autoimmune diseases in Asian populations. Our results revealed that the SNPs in PTPN22 were not associated with GD in the Chinese Han population. However, one limitation of the current study should be considered: population stratification in our replication cohort might influence the conclusion. Although our previous work and other studies [16,42,43] did not find significant population stratification in the Chinese Han population, our negative association results would be more robustly interpreted after a population stratification analysis.
In conclusion, we provide convincing evidence that PTPN22 is not associated with GD in the Chinese Han population and that different susceptibility genes are responsible for GD in different ethnic populations.

Supporting Information
Figure S1. Linkage disequilibrium plots of the nine tagSNPs in GD patients (A) and control subjects (B) from the replication stage, and in healthy individuals (C) from the 1000 Genomes Project Asian population. The color of each SNP spot reflects its r², with the top typed SNP (large red diamond) within each association locus changing from black to white.