Association of the CTLA4 Gene with Graves' Disease in the Chinese Han Population

To determine whether genetic heterogeneity exists in patients with Graves' disease (GD), we performed an association analysis of the cytotoxic T-lymphocyte-associated 4 (CTLA-4) gene in a Chinese Han cohort recruited from various geographic regions. Considerable genetic and immunological evidence implicates CTLA4 as a susceptibility gene for GD. Our association study of SNPs in the CTLA4 gene in 2640 GD patients and 2204 control subjects confirmed that CTLA4 is a susceptibility gene for GD in the Chinese Han population. Moreover, logistic regression analysis in the combined Chinese Han cohort revealed that SNP rs231779 (allele frequencies p = 2.81×10−9, OR = 1.35; genotype distributions p = 2.75×10−9, OR = 1.42) is likely the susceptibility variant for GD. Interestingly, logistic regression analysis also revealed that SNP rs35219727 may be the susceptibility variant for GD in the Shandong population, whereas SNP rs231779 in the CTLA4 gene probably independently confers GD susceptibility in the Xuzhou and southern China populations. These data suggest that the susceptibility variants of the CTLA4 gene vary among geographic populations with GD.


Introduction
Graves' disease, which affects 1.2% of western populations (0.5% clinical and 0.7% subclinical) [1] and 0.25-1.09% of the Chinese population [2], is an autoimmune disorder in which the body produces auto-antibodies to the receptor for thyroid-stimulating hormone (TSH), leading to hyperthyroidism. Although environmental agents such as infection [3] and stress are undoubtedly important in the development of Graves' disease in susceptible individuals, twin studies have estimated that around 80% of the predisposition to GD is due to genetic factors [4]. However, as with other common complex diseases, identifying the susceptibility genes for GD has been challenging. Recently, genome-wide association studies (GWAS) have uncovered susceptibility genes for several common diseases [5][6][7][8]. However, the significance of the validated loci has varied between studies, suggesting that genetic heterogeneity exists in type 2 diabetes [5][6][7][8]. In a recent whole-genome linkage study by Tomer Y., distinct genes were suggested to predispose to autoimmune thyroid diseases (AITD) in different subsets of patients [9]. Most recently, our data have shown that, similar to most Mendelian monogenic disorders, the susceptibility variants of a gene that predisposes to GD vary among patients from different geographic populations [10]. The goal of the present study was to test further whether genetic heterogeneity exists in patients with GD.
In the present work, to confirm that the CTLA4 gene is associated with GD and to ask whether its susceptibility variants differ among geographic populations with GD, we genotyped SNPs in the CTLA4 gene in a Chinese Han cohort of 2640 patients with GD and 2204 control subjects.

Association analysis of the CTLA4 gene in a combined Chinese Han population
Forty-seven SNPs in the CTLA4 gene region were selected for genotyping in the Chinese Han cohort of 2640 patients with GD and 2204 control subjects recruited from different geographic regions of China. Of the 47 SNPs, 44 had call rates above 80% and were analyzed further. Of these, 17 monomorphic SNPs (only one allele observed) and five SNPs with minor allele frequencies (MAF) below 1% were removed from the association analysis. In addition, seven SNPs that deviated from Hardy-Weinberg equilibrium (HME) in controls (p ≤ 1×10−6) were eliminated from the analysis [29]. Finally, 15 of the 47 SNPs in the CTLA4 gene region were included in the association analysis (Table S1). The allele frequencies (Table 1) and genotype distributions (Table 2) of these 15 SNPs were compared between the 2640 GD patients and 2204 control subjects from different geographic regions of China. All samples were genotyped in the same laboratory under the same conditions. Of the 15 SNPs, eight showed significantly different allele frequencies and genotype distributions between GD patients and controls (p < 0.001), and the strongest association was observed for rs231779 in the first intron (allele frequencies p = 2.81×10−9, OR = 1.35, 95%CI = 1.23-1.48; genotype distributions p = 2.75×10−9, OR = 1.42, 95%CI = 1.27-1.60) (Tables 1 and 2, Fig. 1A and 1B).
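The allele-frequency comparison above (allele-based p-value, OR and 95%CI) can be sketched from a 2×2 table of allele counts in cases and controls. This is a minimal Python sketch under stated assumptions: a Pearson chi-square test with 1 degree of freedom and no continuity correction, and a Woolf (log-scale) confidence interval for the OR. The function name `allelic_test` is hypothetical, the exact SPSS settings used in the study may differ, and the counts in the usage note are invented rather than taken from Table 1.

```python
import math

def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic association test on a 2x2 table of allele counts (hypothetical helper).

    Rows: cases, controls. Columns: risk allele, other allele.
    Returns (chi2, p, odds_ratio, (ci_low, ci_high)). Assumes all cells > 0.
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    n = case_risk + case_other + ctrl_risk + ctrl_other
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    # Pearson chi-square without continuity correction
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))

    # Odds ratio and Woolf 95% CI on the log scale
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)
    return chi2, p, odds_ratio, (ci_low, ci_high)
```

For example, `allelic_test(2000, 3280, 1400, 3008)` compares hypothetical risk-allele counts out of 5280 case alleles and 4408 control alleles, the allele totals implied by 2640 patients and 2204 controls.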
Notably, rs231775 (the A49G polymorphism in exon 1 of the CTLA4 gene) and rs11571302 (the JO31 polymorphism in the 3′ untranslated region (UTR) of the CTLA4 gene), which have been reported as susceptibility loci for GD, also differed significantly between GD patients and controls in the combined Chinese Han population (rs231775: allele frequencies p = 9.39×10−5, OR = 1.24, 95%CI = 1.12-1.38; genotype distributions p = 0.0002, OR = 1.28, 95%CI = 1.13-1.45. rs11571302: allele frequencies p = 2.29×10−5, OR = 1.26, 95%CI = 1.14-1.40; genotype distributions p = 9.26×10−6, OR = 1.31, 95%CI = 1.16-1.47) (Tables 1 and 2, Fig. 1A and 1B).
Meanwhile, the linkage disequilibrium (LD) structure of the 15 SNPs within the CTLA4 gene was evaluated using the Haploview program [30]. Two LD regions were observed in the combined Chinese Han population, located between SNPs rs11571315 and rs231777 and between SNPs rs231779 and rs10932025 (Fig. 1G).
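Haploview summarizes pairwise LD with Lewontin's D'. The sketch below computes |D'| for two biallelic SNPs, assuming the two-locus haplotype frequency is already known; Haploview itself estimates haplotype frequencies from unphased genotypes with an EM algorithm, which is not reproduced here. The function name `d_prime` is hypothetical.

```python
def d_prime(p_AB, p_A, p_B):
    """Absolute Lewontin's D' for two biallelic loci (hypothetical helper).

    p_AB: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_A, p_B: frequencies of alleles A and B, assumed strictly between 0 and 1
    """
    d = p_AB - p_A * p_B  # raw disequilibrium coefficient D
    if d >= 0:
        # largest positive D compatible with the allele frequencies
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        # largest negative D (in magnitude) compatible with the allele frequencies
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return abs(d) / d_max
```

With `p_A = p_B = 0.3` and `p_AB = 0.3` (the two alleles always on the same haplotype) the function returns 1, i.e. complete LD, whereas `p_AB = p_A * p_B` gives 0.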
Next, to identify the susceptibility variants for GD in the CTLA4 gene region, the genotype data of the 15 SNPs suitable for logistic regression analysis in the combined Chinese Han population were further analyzed by forward and two-locus logistic regression [16,31] (Table 3 and Fig. 1C). Forward logistic regression suggested that rs11571316, rs231779, rs231725 and rs231730 were independent susceptibility variants in the combined Chinese Han cohort. Because no significant difference in allele frequencies or genotype distributions between GD patients and healthy controls was detected at SNP rs231730, the remaining three SNPs (rs11571316, rs231779 and rs231725) were further analyzed by two-locus logistic regression. rs231779 improved every model containing one of the other 14 SNPs (cut-off p-value 0.01). In contrast, only three SNPs (rs11571316, rs35219727 and rs231730) improved the model containing rs231779 (p = 3.36×10−8, 0.0069 and 0.0022, respectively). Of these three SNPs, rs35219727 and rs231730 showed no significant difference between patients with GD and control subjects, and rs11571316, in the promoter of the CTLA4 gene, lies in the other LD block, which does not include rs231779 (Fig. 1G). Thus, among the SNPs in the CTLA4 gene region in the combined Chinese Han population, rs231779, which had the lowest p-value of the 15 SNPs, is likely the most important for GD susceptibility because it improved the model with each of the other 14 SNPs. However, these results do not exclude the possibility that rs231779 and rs11571316 act in combination to increase susceptibility to GD. The false positive report probability (FPRP) of the SNPs significantly associated with GD in the combined Chinese Han cohort was also analyzed.
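The two-locus analysis asks whether adding a second SNP to a logistic model that already contains a "best" SNP improves the fit. One common way to implement this, assumed here rather than taken from [16,31], is a likelihood ratio test between nested logistic regression models with additive (0/1/2) genotype coding. The sketch below fits the models by Newton-Raphson with NumPy; the helper names `fit_logistic_loglik` and `two_locus_lrt` are hypothetical.

```python
import math
import numpy as np

def fit_logistic_loglik(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson and return the maximized log-likelihood.

    X: (n, k) design matrix including an intercept column.
    y: 0/1 outcome vector (1 = GD case). Assumes no complete separation.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - mu)                          # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information X'WX
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    # log-likelihood: sum(y * eta - log(1 + exp(eta))), computed stably
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def two_locus_lrt(y, base_snp, test_snp):
    """Test whether test_snp improves a model that already contains base_snp.

    Genotypes are coded additively (0/1/2). Returns (LR statistic, p-value), 1 df.
    """
    ones = np.ones(len(y))
    reduced = np.column_stack([ones, base_snp])
    full = np.column_stack([ones, base_snp, test_snp])
    lr = 2.0 * (fit_logistic_loglik(full, y) - fit_logistic_loglik(reduced, y))
    lr = max(lr, 0.0)  # guard against tiny negative values from rounding
    return lr, math.erfc(math.sqrt(lr / 2.0))
```

Calling `two_locus_lrt(y, g_best, g_other)` tests whether `g_other` improves the model containing `g_best`; swapping the arguments gives the reverse test, which is how the text distinguishes "rs231779 improves the model with SNP X" from "SNP X improves the model with rs231779".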
In the present study, the FPRP value was calculated for each genetic variant using the assigned prior probability range, the statistical power to detect an odds ratio of 1.5, and the observed odds ratios and p-values. As shown in Table 4, among the 10 genetic variants that differed significantly between patients with GD and healthy individuals, five SNPs had FPRP values below 0.2 for prior probabilities from 0.25 to 0.0001, a relatively high prior probability range. In contrast, the FPRP values for rs231779 remained very low even for low prior probabilities, staying below 0.2 at a prior probability of 0.00001 (Table 4). Notably, for these 10 SNPs, the comparison of allele frequencies between the 2640 patients with GD and 2204 control individuals had more than 99.5% statistical power to detect an odds ratio of 1.5 for GD at an α level equal to each SNP's reported p-value (Table 4). The FPRP values of SNP rs11571316 (p = 0.0004), a possible susceptibility variant in the Chinese Han population according to logistic regression analysis, were below 0.2 only for prior probabilities from 0.25 to 0.01 and exceeded 0.2 for priors below 0.01, suggesting that the association at rs11571316 may be a false positive.
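The FPRP used in Table 4 follows Wacholder et al. [24]: for a significance level α (here the observed p-value), statistical power 1 − β and prior probability π that the association is real, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π]. The sketch below implements that formula. The power helper is an assumption: a normal approximation that recovers the standard error of log(OR) from the reported 95%CI, which may differ in detail from the Wacholder spreadsheet. Both function names are hypothetical.

```python
import math
from statistics import NormalDist

def power_from_ci(alpha, ci_low, ci_high, target_or=1.5):
    """Approximate two-sided power to detect target_or at level alpha (assumed method).

    The standard error of log(OR) is recovered from a reported 95% CI.
    """
    std_normal = NormalDist()
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959963984540054)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = math.log(target_or) / se
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

def fprp(alpha, power, prior):
    """False positive report probability (Wacholder et al.)."""
    false_pos = alpha * (1 - prior)  # null is true but the test is "significant"
    true_pos = power * prior         # association is real and detected
    return false_pos / (false_pos + true_pos)
```

Plugging in the combined-cohort values for rs231779 (p = 2.81×10−9, 95%CI 1.23-1.48) gives power close to 1 for an OR of 1.5 and an FPRP far below 0.2 even at a prior of 0.00001, in line with Table 4.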

Association analysis in different geographic regions of the Chinese Han population
The significance of the 15 SNPs in the CTLA4 gene region varied across geographic regions of the Chinese Han population. In the Shandong population, seven SNPs showed significantly different allele frequencies and genotype distributions between 970 GD patients and 682 control subjects (p < 0.05) (Tables 1 and 2, and Fig. 1A and 1B). Among these seven SNPs, the most significant differences in allele frequencies and genotype distributions were detected at SNP rs35219727 (p = 1.30×10−5, OR 0.42, 95%CI 0.29-0.61 and p = 3.57×10−7, OR 0.38, 95%CI 0.26-0.56, respectively) (Tables 1 and 2, and Fig. 1A, 1B), which is also located in intron 1 of the CTLA4 gene. Next, the 15 SNPs in the Shandong population were analyzed by forward and two-locus logistic regression. Forward logistic regression revealed that SNPs rs11571316, rs231777, rs35219727, rs231779 and rs231723 were independent susceptibility variants. However, no significant differences were detected at SNPs rs11571316 and rs231777 (Tables 1 and 2, Fig. 1A, 1B). The two-locus logistic regression showed that rs35219727 improved the model with each of the other 14 SNPs, whereas, apart from rs11571316, only seven SNPs weakly improved the model containing rs35219727 (cut-off p-value 0.01) (Table 3 and Fig. 1D). Although SNP rs11571316 significantly improved the model with rs35219727 in the Shandong population, it showed no significant difference between patients with GD and control subjects (Tables 1, 2 and 3, and Fig. 1D). Unlike the Xuzhou and combined Chinese Han populations, only one LD region, located between SNPs rs11571315 and rs10932025, was detected in the Shandong population by Haploview analysis (Fig. 1H).
These results suggest that SNP rs35219727 is the susceptibility variant for GD in the Shandong population.
Meanwhile, in the Xuzhou population of 841 GD patients and 818 control subjects, 10 of the 15 SNPs showed different distribution patterns (p < 0.01) (Tables 1 and 2, and Fig. 1A and 1B). Among these 10 SNPs, five (rs11571315, rs231775, rs231779, rs231723 and rs231725) showed significant differences in allele frequencies and genotype distributions with p-values < 0.0001, and the most significant difference in allele and genotype frequencies was at SNP rs231779 (p = 1.37×10−5; Fig. 1A and 1B). Notably, every pair of these five SNPs was tightly linked in the Xuzhou population, with all D' values greater than 97% (Fig. 1I). Forward logistic regression revealed that only SNP rs231779 was an independent susceptibility variant. Two-locus logistic regression showed that no SNP improved the model containing rs231779, whereas rs231779 improved the models containing nine of the other 14 SNPs (cut-off p-value 0.01) (Table 3). The other four SNPs of this tightly linked group (rs11571315, rs231775, rs231723 and rs231725) did not improve the model with rs231779 and were not selected as independent susceptibility variants by forward logistic regression. These results suggest that SNP rs231779 confers susceptibility to GD in the Xuzhou population. Haploview analysis of the Xuzhou population revealed two LD blocks, located between SNPs rs11571315 and rs231777 and between SNPs rs231779 and rs10932025, the same as in the combined Chinese Han population (Fig. 1I). Association analysis was also performed in 829 GD patients and 704 normal subjects from southern China, including Shanghai City and Fujian Province.
We found that four of the 15 SNPs had significantly different allele frequencies (Table 1) and seven had significantly different genotype distributions (Table 2) between patients with GD and control subjects (p < 0.05). Among them, the most significant difference was at SNP rs231779, with p-values of 0.0071 for allele frequencies and 0.0013 for genotype distributions (Tables 1 and 2). Forward logistic regression suggested that SNPs rs11571315, rs231775 and rs231779 were independent susceptibility variants. However, there were no significant differences between GD patients and control subjects at SNPs rs11571315 and rs231775. The two-locus logistic regression analysis in the southern China population revealed that rs231779 improved the model with each of the other SNPs except three (rs11571316, rs11571302 and rs231729), whereas only three SNPs (rs11571315, rs231775 and rs231730) improved the model containing rs231779.
Interestingly, the three SNPs that improved the model with rs231779 showed no significant difference between patients with GD and control subjects recruited from southern China. Of note, SNPs rs11571316 and rs231779 were linked (D' = 82%) in the southern China population by Haploview analysis (Fig. 1J), but they did not influence each other in the two-locus logistic regression analysis (Table 3). Taken together, these results suggest that SNP rs231779 in the CTLA4 gene independently confers susceptibility to GD in the southern China population.
Figure 1 (panels C–F). Two-locus logistic regression analyses of rs231779 in the combined Chinese Han population (C), rs35219727 in the Shandong population (D), rs231779 in the Xuzhou population (E) and rs231779 in the southern China population (F). SNPs rs231779 and rs35219727 were entered individually into the regression models as the best markers, and each of the other markers was added in turn to test whether a second locus could improve the model. In the combined Chinese Han population, three of the 14 other SNPs suitable for logistic regression analysis improved the model with rs231779 (C) at p < 0.01. Conversely, when each of the 14 loci was taken in turn as the base model, adding rs231779 improved every model (C). In the Shandong population, rs35219727 improved the model with each of the other 14 SNPs, whereas, apart from rs11571316, only seven SNPs improved the model with rs35219727 (D). See Tables 1, 2 and 3 for detailed information.

Discussion
Our case-control study of SNPs in the CTLA4 gene region in 2640 GD patients and 2204 control subjects verified that CTLA4 is a susceptibility gene for GD in the Chinese Han population. Moreover, logistic regression analysis in the combined Chinese Han cohort revealed that SNP rs231779 is likely the susceptibility variant, because it improved the model with each of the other 14 SNPs in the CTLA4 gene region. The FPRP value for rs231779 was very low across the prior probability range and remained robust even for low prior probabilities. These results suggest that rs231779 in intron 1 of CTLA4 is associated with GD etiology in the combined Chinese Han population, although other susceptibility SNPs or genes may also contribute to the onset of GD. Similarly, SNP rs231779 was associated with susceptibility to GD in the Shanghai Han population of China (436 patients with GD and 316 control subjects; allele frequency p = 0.013, OR 1.34, 95%CI 1.06-1.68 and genotype distribution p = 0.017, OR 1.44, 95%CI 1.07-1.93) [28] and in a Taiwanese population of 208 Chinese GD patients and 171 healthy controls (genotype distribution p = 0.0008) [26].
The SNPs rs231775 (exon 1 +49G>A) [21–23,26,32] and rs3087243 (CT60) [16,25,28] have previously been reported to be the major susceptibility variants for GD in European Caucasian populations. In the present study, SNP rs231775, which was almost perfectly linked with rs231779 (D' values of 0.88–0.97 in the combined Chinese Han population and the three regional populations), differed significantly between GD patients and controls in the combined Chinese Han, Shandong and Xuzhou populations (allele frequency p = 9.39×10−5, 0.0271 and 3.47×10−5, respectively). However, the logistic regression analysis suggested that the association signal of SNP rs231775 was accounted for by the susceptibility variant rs231779 in our combined Chinese Han population. Unfortunately, SNP rs3087243 (CT60) was not in Hardy-Weinberg equilibrium (Table S1) and was removed from the final analysis, similar to the results reported for the Shanghai Chinese Han population [28]. Notably, SNP rs3087243 was strongly linked with SNPs rs231775 and rs231779 (D' > 0.95) in the Taiwan Chinese population [26]. Therefore, in the present study, rs3087243 may be tagged by rs231775 and rs231779.
Although this study provides solid evidence for the association of the CTLA4 gene with GD in our combined Chinese Han population, it is notable that the susceptibility variants of the CTLA4 gene might vary among patients with GD recruited from different geographic regions of China. In our case-control analysis, the loci most significantly associated with GD were SNP rs35219727 in the Shandong population and rs231779 in the Xuzhou and southern China populations. Furthermore, the logistic regression analysis revealed that SNP rs35219727 might be the susceptibility variant for GD in the Shandong population, whereas SNP rs231779 in the CTLA4 gene probably independently conferred GD susceptibility in the combined Chinese Han, Xuzhou and southern China populations. A recent detailed study of a Japanese GD cohort investigated SNPs across the TSHR region and identified single-SNP associations with GD located primarily in intron 7 of TSHR [11]. However, a more recent study by Brand found no evidence of association between the TSHR intron 7 SNPs and GD in a UK European ancestry cohort; instead, the strongest association signals in that cohort were within TSHR intron 1 [20]. Two independent reports have recently shown that the Chinese Han population has a subtle substructure comprising northern, central and southern Han clusters, although the genetic differentiation among these clusters is very small (FST of 0.0002–0.0009) [33,34]. In the present study, Xuzhou City is close to Shandong Province (where subjects were mainly recruited from Jinan and Linyi cities), and both belong to the central Han cluster. In fact, genetic differentiation within the central Han population is the smallest in China [33,34]. Thus, there is no clear indication that the results of the present study were influenced by population substructure.
These TSHR data, combined with our findings, suggest that the susceptibility variants for GD vary among populations from different geographic regions. More detailed analyses of different populations are needed to confirm this hypothesis.
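The FST values cited in the Discussion measure how much allele frequencies differ between subpopulations. As a rough illustration only, the sketch below computes Nei's GST, a simple FST estimator, for a single biallelic SNP from subpopulation allele frequencies weighted equally. The studies cited as [33,34] used genome-wide data and their own estimators, so this is not their method, and the function name `nei_fst` is hypothetical.

```python
def nei_fst(freqs):
    """Nei's G_ST for one biallelic SNP (hypothetical helper).

    freqs: frequency of the same allele in each subpopulation.
    Subpopulations are weighted equally; no sample-size correction is applied.
    """
    k = len(freqs)
    # mean expected heterozygosity within subpopulations
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    # expected heterozygosity of the pooled population
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

For example, allele frequencies of 0.30 and 0.32 in two subpopulations give a value of roughly 5×10−4, the same order of magnitude as the differentiation reported among the Han clusters.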

Sample recruitment
A total of 2640 unrelated individuals with Graves' disease (GD) were recruited from different geographic regions of China: 970 from Shandong Province, 841 from Xuzhou City, and 829 from southern China (Shanghai and Fujian Province). The control group comprised 2204 unrelated healthy subjects from the same geographic regions who were screened for the absence of thyroid disease: 682 from Shandong, 818 from Xuzhou, and 704 from southern China. The diagnosis of GD was based on documented clinical and biochemical evidence of hyperthyroidism, diffuse goiter, and at least one of the following: a positive TSH receptor antibody test, diffusely increased 131I (iodine-131) uptake in the thyroid gland, or exophthalmos. All individuals classified as affected were interviewed and examined by experienced clinicians. All subjects were of Chinese Han origin. After informed consent was obtained, 5-ml blood samples were collected from all participants for DNA preparation and biochemical measurements.
Genotyping methodology, SNP selection, and quality control (QC) filters
All genotyping was performed using the MassARRAY™ technology platform of Sequenom, Inc. (San Diego, California, USA). A total of 63 SNPs in the CTLA4 gene region were identified from NCBI dbSNP (NCBI Human Genome Build 36.3). SNPs were then selected in several steps. First, SNPs associated with GD in previous reports and tag SNPs were selected. Second, SNPs were chosen with an average spacing of 50 bp. Finally, several SNPs were removed during the design of the multiplex PCR assays. Accordingly, 47 SNPs in the CTLA4 gene region (Supplementary Table S1) were genotyped for association analysis in 2640 GD individuals and 2204 normal subjects collected from different geographic regions of China. Several QC filters were then applied. First, 22 of the 47 SNPs with a unique allele or a minor allele frequency (MAF) below 1% were removed from the association analysis. Second, rs41265961, rs3087245 and rs7565213 were removed because more than 20% of their data were missing. Finally, seven SNPs deviating from Hardy-Weinberg equilibrium (HME) in controls (p ≤ 1×10−6) were eliminated from the analysis [29].
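The three QC filters (call rate above 80%, MAF of at least 1%, HME p > 1×10−6 in controls) can be sketched for a single SNP from its genotype counts. This is a minimal sketch under stated assumptions: HME is tested with a 1-df Pearson chi-square (an exact test is often preferred for rare alleles, and the test actually used is not stated), and for simplicity all three filters are applied to one set of counts, whereas the text applies the HME filter to controls only. The function name `snp_qc` is hypothetical.

```python
import math

def snp_qc(n_AA, n_Aa, n_aa, n_missing,
           min_call_rate=0.80, min_maf=0.01, hwe_p_threshold=1e-6):
    """Apply call-rate, MAF and HME filters to one SNP (hypothetical helper).

    Returns (passes, call_rate, maf, hwe_p). Assumes at least one called genotype.
    """
    n = n_AA + n_Aa + n_aa
    call_rate = n / (n + n_missing)
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    maf = min(p, 1 - p)
    if maf == 0:  # monomorphic SNP: only one allele observed
        return False, call_rate, maf, 1.0

    # Pearson chi-square against Hardy-Weinberg expected genotype counts (1 df)
    expected = [n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2]
    observed = [n_AA, n_Aa, n_aa]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2))

    passes = (call_rate > min_call_rate and maf >= min_maf
              and hwe_p > hwe_p_threshold)
    return passes, call_rate, maf, hwe_p
```

Counts in exact Hardy-Weinberg proportions, such as 100/200/100 with no missing calls, pass all three filters, while a SNP with no heterozygotes at all fails the HME filter.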

Statistical analysis of association
In the case-control design, allele and genotype frequencies, odds ratios (ORs), and significance values were analyzed by chi-square tests using SPSS (version 13.0; SPSS Inc.). To reduce false positives due to population stratification, 20 neutral SNPs on different chromosomes were analyzed as genomic controls (GCs), giving a GC inflation factor (λGC) of 1.1734 [10]. All statistical results were corrected for GC. A p-value < 0.05 was considered significant. The genotype data were further analyzed by logistic regression, as previously described [16,31]. Linkage disequilibrium (LD) regions were analyzed using Haploview [30]. FPRP was calculated using the FPRP calculation spreadsheet provided by Wacholder et al. [24]. The statistical power to detect an odds ratio of 1.5 at an α level equal to the reported p-value was also calculated, and the FPRP noteworthiness threshold was preset to 0.5.
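Genomic control rescales association statistics by an inflation factor λGC estimated from markers not expected to be associated with disease. A common estimator, assumed here, divides the median of the 1-df chi-square statistics at the null markers by the median of a χ² distribution with 1 df (about 0.4549); each test statistic is then divided by λGC before its p-value is recomputed. The function names are hypothetical, and whether λGC was floored at 1 in the present study is not stated.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364231195724

def estimate_lambda_gc(null_chi2):
    """Estimate the GC inflation factor from 1-df statistics at neutral SNPs."""
    lam = statistics.median(null_chi2) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)  # common convention: never deflate statistics

def gc_correct(chi2, lam):
    """Divide a 1-df chi-square statistic by lambda and recompute its p-value."""
    adjusted = chi2 / lam
    return adjusted, math.erfc(math.sqrt(adjusted / 2))
```

With the reported λGC = 1.1734, for example, a raw statistic of 10.0 is reduced to about 8.52 before its p-value is computed.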