Genetic variation of the transcription factor GATA3, not STAT4, is associated with the risk of type 2 diabetes in the Bangladeshi population

Type 2 diabetes mellitus is a multifactorial metabolic disorder caused by environmental factors and has a strong association with hereditary factors. These hereditary factors result in an imbalance in CD4+ T cells and a decreased level of naïve CD4+ T cells, which may be critical in the pathogenesis of type 2 diabetes. Transcription factors GATA3 and STAT4 mediate the cytokine-induced development of naïve T cells into Th1 or Th2 types. In the present study, genetic analyses of GATA3 SNP rs3824662 and STAT4 SNP rs10181656 were performed to investigate the association of allelic and genotypic variations with the risk of type 2 diabetes (T2D) in the Bangladeshi population. A total of 297 unrelated Bangladeshi patients with type 2 diabetes and 247 healthy individuals were included in the study. The allelic and genotypic frequencies of rs10181656 located in the STAT4 gene were not found to be associated with risk of type 2 diabetes. The GATA3 rs3824662 T allele and mutant TT genotype had a significant association with the risk of T2D [OR: 1.52 (1.15–2.02), χ² = 8.66, p = 0.003 and OR: 2.98 (1.36–6.55), χ² = 7.98, p = 0.04, respectively]. Thus, the present study postulates that the genetic variation of the transcription factor GATA3, not STAT4, is associated with the risk of type 2 diabetes in the Bangladeshi population.


Introduction
Diabetes is a multifaceted metabolic disorder caused by impaired glucose metabolism, characterized by hyperglycemia, and is mainly classified as type 1 diabetes mellitus and type 2 diabetes mellitus. Both environmental and hereditary components play pivotal roles in the onset of diabetes [1,2]. Also, the risk of type 2 diabetes is higher in certain ethnic groups [3]. Impaired glucose metabolism in type 1 diabetes is due to the complete destruction of beta cells, while in the case of type 2 diabetes, this phenomenon arises due to insulin resistance and beta cell dysfunction. Although type 1 diabetes has long been considered an autoimmune disorder, researchers now suggest redefining type 2 diabetes as a disease of the immune system rather than a purely metabolic disorder [4,5]. Based on observations at the genetic level, the candidate loci for type 1 and type 2 diabetes appear to be primarily distinct, and susceptibility genes for these diseases have generally not been shown to overlap [6]. Loss of β cell mass and function is the major phenomenon marking the development of both type 1 diabetes and type 2 diabetes. β cell dysfunction-mediated imbalance of insulin sensitivity, followed by the development of insulin resistance, triggers pathogenesis of type 2 diabetes [7]. Thus, as both diseases are due to abnormal β-cell function and destruction, it has been thought that type 1 diabetes and type 2 diabetes may share a degree of common genetic predisposition [8]. Many new loci associated with type 2 diabetes have been revealed through genome-wide association studies [9,10].
T2D is known to involve imbalances of CD4+ T cells and decreased levels of naïve CD4+ T cells, which in turn play an important role in the pathogenesis of type 2 diabetes. GATA binding protein 3 (GATA3) and Signal Transducer and Activator of Transcription-4 (STAT4) transcription factors mediate the cytokine-induced development of naïve T cells into Th1 or Th2 type. GATA3 consists of six exons and encodes a transcription factor with two transactivation domains and two zinc finger domains [11], which are essential for early T cell development [12]. Researchers found that GATA3 polymorphism is significantly associated with susceptibility to pediatric B-lineage acute lymphoblastic leukemia [13][14][15][16]. Also, studies have demonstrated the upregulation of Th1 cells in adipose tissue and peripheral blood in prediabetic and type 2 diabetic individuals [17]. In addition, a declining trend of naïve CD4+ T cells [18] and an imbalance of CD4+ T cell subsets, including Treg, Th1, and Th17 [19], have been observed in patients with type 2 diabetes. GATA3, a transcription factor, is a master regulator of Th2 [20] that controls differentiation of CD4+ T cells. Expression of GATA3 is important to avoid cell death. The SNP rs3824662 maps within intron 3 of the transcription factor and putative tumor suppressor gene GATA3. Genome-wide association studies have reported an association of the GATA3 rs3824662 polymorphism (T allele) with childhood acute lymphoblastic leukemia [9] as well as with the disease in adolescents and young adults [21]. Studies with animal models have demonstrated that the disappearance of regulatory T cells is a signature of insulin resistance in obese mouse models [22]. In addition, impaired expression of GATA3 is associated with obesity [23]. Mutations in the GATA3 gene are linked with several diseases, such as breast cancer, hematological carcinomas and hypoparathyroidism-deafness-renal (HDR) dysplasia syndrome [21,24].
Signal transducer and activator of transcription 4 (STAT4) transduces signals of cytokines to modulate many innate and adaptive immune responses that may contribute to autoimmune responses [25]. STAT4 rs10181656 has been found to be associated with autoimmune diseases like rheumatoid arthritis and systemic lupus erythematosus [26].
An association of specific alleles of transcription factors with type 2 diabetes has been established in other studies that suggested a modulation of this association through obesity, a pivotal risk factor for type 2 diabetes. The rs7903146C>T single nucleotide polymorphism in the transcription factor seven-like 2 (TCF7L2) gene is the locus most strongly associated with type 2 diabetes in different populations, such as Icelandic individuals [27] and individuals of European descent [28]. Polymorphisms present in other genes that encode for transcription factors such as Foxp3, TBX21, and STAT4 are associated with the risk of type 1 diabetes but not with type 2 diabetes [29,30]. More surprisingly, data from genome-wide association studies have revealed that genes associated with the risk of cancer are also risk factors for type 2 diabetes but not for type 1 diabetes [31]. Considering these findings, the present study was undertaken to investigate the allelic and genotypic variations in type 2 diabetic patients in the Bangladeshi population and thus evaluate the probable risk association with the disease.

Subject selection and sample collection
A total of 544 unrelated Bangladeshi individuals were enrolled in this study after giving their consent. Of the 544 participants, 297 (54.6%) individuals were in the patient group with type 2 diabetes, and 247 (45.4%) were healthy individuals. The study was conducted according to the guidelines approved by the ethical review committee of the Faculty of Biological Sciences, University of Dhaka, Bangladesh. After verbal consent was obtained, an expert phlebotomist collected five to six milliliters of blood from each individual. Plasma samples were used to analyze fasting plasma glucose, glycated hemoglobin (HbA1c), creatinine and alanine aminotransferase using standard methods. Type 2 diabetic patients were confirmed clinically using the levels of fasting plasma glucose (FPG) and HbA1c, according to the criteria set by the World Health Organization (WHO). Healthy individuals included students, university employees, and hospital personnel who did not show clinical features of diabetes or other complications, such as acute or chronic infection or kidney and liver diseases. Pregnant women were also excluded from this study. Anthropometric and demographic data such as gender, height, weight, systolic blood pressure (SBP), diastolic blood pressure (DBP), and family history of diabetes were collected and recorded in a well-defined questionnaire.

Extraction of genomic DNA and analyses of genotypes
Cellular fractionation of the collected blood was used to extract genomic DNA. The quantity and quality of the extracted DNA were verified according to our previous method [32,33]. Primers for amplification of the target regions of GATA3 and STAT4, used to study the rs3824662 and rs10181656 polymorphisms, were designed with a web-based primer designing tool available at http://bioinfo.ut.ee/primer3-0.4.0, as shown in Table 1. Finally, the primers were obtained from IDT (USA).
Allele-specific primers were used for the amplification of the GATA3 gene. Each G or T allele-specific primer is highly selective and binds only to template DNA containing the corresponding G or T nucleotide. Forward outer and reverse outer primers were designed to amplify a specific 691 base pair region of GATA3. This 691 base pair region contains the polymorphic site of interest. The allele-specific primers produce a 506 bp product when the corresponding G or T allele is present. Briefly, PCR was performed in a total volume of 15 μL with an initial denaturing step of 5 min at 95°C, followed by 40 cycles of 30 s at 95°C, 45 s at 55°C, and 1 min at 72°C, and a final extension step of 5 min at 72°C. The primer concentration used was 200 nM each. The PCR products were separated on an ethidium bromide-stained 2.5% agarose gel, visualized with UV light, and photographed. This protocol has been deposited in protocols.io under the DOI http://dx.doi.org/10.17504/protocols.io.k45cyy6.
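The read-out of the allele-specific PCR described above can be sketched as a small decision rule. This is only an illustration, not the authors' software: the function name `call_gata3_genotype` and its boolean inputs are hypothetical, and the rule that a heterozygote gives the 506 bp product in both the G-specific and the T-specific reaction is an assumption based on how allele-specific PCR is usually interpreted.

```python
def call_gata3_genotype(band_with_g_primer: bool, band_with_t_primer: bool) -> str:
    """Call the rs3824662 genotype from two allele-specific reactions.

    Each argument states whether the 506 bp allele-specific product was
    seen in the reaction run with the G-specific or the T-specific
    forward inner primer (hypothetical inputs for illustration).
    """
    if band_with_g_primer and band_with_t_primer:
        return "GT"  # assumed: both alleles amplify in a heterozygote
    if band_with_g_primer:
        return "GG"
    if band_with_t_primer:
        return "TT"
    return "no call"  # neither reaction amplified, so the PCR should be repeated


# Example: product only in the G-specific reaction
genotype = call_gata3_genotype(True, False)
```

In practice the 691 bp outer product would serve as an internal check that amplification worked at all; that check is left out here to keep the sketch short.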
Table 1. Primer sequences for amplifying GATA3 and STAT4 genes to study rs3824662 and rs10181656 polymorphisms. Columns: Gene; Primer sequence; Product size.

Upon PCR using specific primers (presented in Table 1) for the STAT4 gene, a 157 bp amplicon that contained the desired polymorphic site for rs10181656 was obtained. After gel electrophoresis, the PCR products were stored at 4°C and subjected to restriction digestion to analyze whether the polymorphic site contains either a G or C allele. G is the ancestral allele, while C is the mutant allele. As for GATA3, PCR was performed in a total volume of 15 μL with an initial denaturing step of 5 min at 95°C, followed by 40 cycles of 30 s at 95°C, 45 s at 55°C, and 1 min at 72°C, and a final extension step of 5 min at 72°C. The PCR products were separated on an ethidium bromide-stained 2.5% agarose gel, visualized with UV light, and photographed. To determine the presence of the rs10181656 polymorphism in the STAT4 gene, the 157 base pair PCR products were digested with the DdeI (C↓TNAG) (Thermo Fisher Scientific, USA) restriction enzyme according to the manufacturer's instructions and run on a 2.5% agarose gel stained with ethidium bromide. DdeI cleaved the 157 base pair PCR products into two fragments of 102 base pairs and 55 base pairs when the G allele was present (GG), whereas the C allele was not digested and displayed only the 157 base pair band (CC). In the case of heterozygous GC, 157 base pair, 102 base pair and 55 base pair bands were found. The digestion took 1 hour at 37°C for completion.
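The DdeI banding patterns described above map directly onto genotypes, which can be sketched as a lookup. The function name `call_stat4_genotype` and the idea of passing observed band sizes as a collection are illustrative assumptions; the band sizes themselves (157, 102 and 55 base pairs) are those given in the text.

```python
# Band patterns after DdeI digestion of the 157 bp STAT4 amplicon,
# as described in the text (sizes in base pairs).
RFLP_PATTERNS = {
    frozenset({102, 55}): "GG",       # fully digested: G allele on both chromosomes
    frozenset({157}): "CC",           # undigested: C allele on both chromosomes
    frozenset({157, 102, 55}): "GC",  # partial pattern: heterozygote
}


def call_stat4_genotype(band_sizes_bp) -> str:
    """Return the rs10181656 genotype for a set of observed band sizes."""
    return RFLP_PATTERNS.get(frozenset(band_sizes_bp), "no call")


# The two digestion fragments add up to the undigested amplicon length.
fragments_sum = 102 + 55

# Example: a lane showing all three bands
genotype = call_stat4_genotype([157, 102, 55])
```

Any pattern outside the three expected ones returns "no call" rather than a guess, which is one simple way to flag incomplete digestion or gel artifacts.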

Statistical analyses
The results were expressed as the mean±SD for continuous variables and % for categorical variables. Independent Student's t-tests were performed to compare the differences between variables from the control and diabetic patients.

Demographic, anthropometric and clinical data of study participants
The mean level of fasting plasma glucose in the study participants with type 2 diabetes was 9.14±1.10 mmol/L, while this value in healthy individuals was measured to be 4.27±1.20 mmol/L. The estimated value of glycated hemoglobin in the patients with diabetes was 10.19±2.04%, while in control individuals, this value was estimated to be 4.49±1.07%. The mean values of creatinine and activity of alanine transaminase were higher in diabetic patients (1.18±0.61 mg/dL, 42.94±3.41, respectively) than in healthy controls (0.94±0.27 mg/dL, 36.35±3.09, respectively). The data on the clinical parameters of the male and female study participants in the two groups are presented in Table 2.
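As a rough illustration of the independent Student's t-test named in the statistical methods, the sketch below computes a pooled-variance t statistic from summary values. It uses the fasting plasma glucose means and standard deviations quoted above and assumes, for illustration only, that all 297 patients and 247 controls contributed a measurement; the function name `pooled_t_statistic` is hypothetical and this is not necessarily how the authors ran the test.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal-variance form) from summary statistics.

    Returns the t value and the degrees of freedom (n1 + n2 - 2).
    """
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Fasting plasma glucose (mmol/L): patients vs. healthy individuals.
# Group sizes are assumed to equal the full cohort sizes.
t_value, dof = pooled_t_statistic(9.14, 1.10, 297, 4.27, 1.20, 247)
```

A t value this far from zero on several hundred degrees of freedom corresponds to a very small p value, which is expected because fasting plasma glucose was itself used to define the patient group.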
polymerase chain reaction for the polymorphism study. The allele-specific primer pair was the forward inner primer, which produced a band of 506 base pairs with the reverse outer primer. The reaction mixture contained both forward outer and reverse outer primers and contained either the G allele-specific forward inner primer or the T allele-specific forward inner primer. When the G allele was present in a specific DNA sequence, only the 506 base pair band was found by gel electrophoresis of PCR amplicons with the G allele-specific primer, as shown in lane 3-14 in Fig 1. In this case, no such 506 base pair band was found by gel electrophoresis with the T allele-specific primer, as shown in lanes 3-14 in Fig 1. On the other hand, for

Distribution of genotypes and alleles regarding rs3824662 polymorphisms
It was revealed that the distributions of genotypic frequencies (GG, GT, TT) were 51.85%, 39.06%, and 9.09% and 61.94%, 34.41%, and 3.64% in type 2 diabetes patients and control individuals, respectively. For type 2 diabetic patients and control individuals, the frequency distribution of the ancestral G allele was 71.38% and 79.15%, respectively, while for the mutant T allele, it was 28.62% and 20.85%, respectively. Table 3 summarizes the genotypic and allelic distributions of rs3824662 in the GATA3 gene. Statistical analyses revealed that both the mutant allele and genotypes containing the mutant allele were significantly associated with the risk of type 2 diabetes, as shown in Table 3. Further, the distributions of genotypic frequencies (GG, GT, TT) were 51.18%, 39.37%, and 9.44% and 61.17%, 34.95%, and 3.88% in male type 2 diabetes patients and control individuals, respectively. For these study participants, the frequency distribution of the ancestral G allele was 70.87% and 78.64%, respectively, while for the mutant T allele, it was 29.13% and 21.36%, respectively. Table 4 summarizes the genotypic and allelic distributions of the rs3824662 polymorphism in male type 2 diabetes patients and control individuals. Statistical analysis revealed that the mutant allele of rs3824662 in the GATA3 gene was associated with the risk of type 2 diabetes in male participants. However, genotypic frequencies were not found to be associated with disease risk. In female participants, it was found that the distributions of genotypic frequencies (GG, GT, TT) were 52.35%, 38.82%, and 8.82% and 62.5%, 34.03%, and 3.47% in patients with T2D and healthy individuals, respectively. Their respective frequency distribution of the ancestral G allele was 71.76% and 79.51%, while for the mutant T allele, it was 28.24% and 20.49%, respectively. Table 5 summarizes the genotypic and allelic distributions of rs3824662 in the GATA3 gene in female participants.
Statistical analyses revealed that both the mutant allele (T) and rare genotype (TT) with the mutant allele were associated with the risk of type 2 diabetes.
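The odds ratios, 95% confidence intervals and χ² values reported for rs3824662 can be approximated with a short calculation. The sketch below is an illustration under explicit assumptions: the genotype counts are not quoted in the text but are back-calculated from the reported percentages and the group sizes (297 patients, 247 controls), the confidence interval uses the common log odds ratio (Woolf) method, and the χ² is Pearson's statistic without continuity correction. The function names are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI for a 2x2 table.

    a, b: risk and reference counts in cases
    c, d: risk and reference counts in controls
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# ASSUMED counts, back-calculated from the reported genotype percentages.
cases = {"GG": 154, "GT": 116, "TT": 27}     # 297 patients
controls = {"GG": 153, "GT": 85, "TT": 9}    # 247 healthy individuals

# Allele counts: each person carries two alleles.
t_cases = 2 * cases["TT"] + cases["GT"]
g_cases = 2 * cases["GG"] + cases["GT"]
t_controls = 2 * controls["TT"] + controls["GT"]
g_controls = 2 * controls["GG"] + controls["GT"]

# T allele versus G allele
allele_or, allele_lo, allele_hi = odds_ratio_ci(t_cases, g_cases, t_controls, g_controls)
allele_chi2 = chi_square_2x2(t_cases, g_cases, t_controls, g_controls)

# TT genotype versus GG genotype
tt_or, tt_lo, tt_hi = odds_ratio_ci(cases["TT"], cases["GG"], controls["TT"], controls["GG"])
tt_chi2 = chi_square_2x2(cases["TT"], cases["GG"], controls["TT"], controls["GG"])
```

Under these assumed counts the sketch gives values that round to the figures reported for the T allele and the TT genotype, which suggests the back-calculated counts are at least consistent with the published summary.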

Evaluation of the amplified PCR products of the STAT4 gene
To confirm the sizes of the PCR amplicons of the STAT4 gene from healthy individuals and T2D patients, a 100 base pair DNA ladder (Cleaver Scientific Ltd, UK) was used. Lane NC represents the negative control, which was subjected to polymerase chain reaction without template DNA. Different DNA bands are presented in Fig 2. The presence of the rs10181656 polymorphism in the STAT4 gene was determined after digesting 157 base pair PCR products with DdeI (C↓TNAG) (Thermo Fisher Scientific, USA) restriction enzyme followed by 2.5% agarose gel electrophoresis and ethidium bromide staining, as shown in Fig 3. DdeI cleaved the 157 base pair PCR products into two fragments of 102 base pairs and 55 base pairs when the G allele was present, whereas the C allele was not digested and displayed only the 157 base pair band. In the case of heterozygous GC, 157 base pair, 102 base pair and 55 base pair bands were found.

Distribution of genotypes and alleles regarding rs10181656 polymorphisms
It was revealed that the distributions of genotypic frequencies (CC, CG, GG) were 60.27%, 29.63%, and 10.10% and 57.49%, 33.19%, and 9.31% in type 2 diabetic patients and control individuals, respectively. For type 2 diabetic patients and control individuals, the frequency distribution of the ancestral C allele was 75.09% and 74.09%, respectively, while for the mutant G allele, it was 24.91% and 25.91%, respectively. Table 6 summarizes the genotypic and allelic distributions of rs10181656 in the STAT4 gene. Statistical analyses revealed that neither allelic nor genotypic variations showed an association with the risk of type 2 diabetes. Further, the distributions of genotypic frequencies (CC, CG, GG) were 62.12%, 28.79%, and 9.09% and 57.27%, 32.73%, and 10.0% in male type 2 diabetic patients and control individuals, respectively. For these participants, the frequency distribution of the ancestral C allele was 76.52% and 73.64%, respectively, while for the mutant G allele, it was 23.48% and 26.36%, respectively. Table 7 summarizes the genotypic and allelic distributions with respect to the rs10181656 polymorphism in male type 2 diabetic patients and control individuals. Allelic and genotypic frequencies were not found to be associated with the risk of type 2 diabetes in male participants. In the case of female participants, it was found that the distributions of genotypic frequencies (CC, CG, GG) were 58.79%, 30.30%, and 10.91% and 57.66%, 31.39%, and 10.95% in patients with type 2 diabetes and healthy individuals, respectively. Their respective frequency distribution of the ancestral C allele was 73.94% and 73.36%, while for the mutant G allele, it was 26.06% and 26.64%, respectively. Table 8 summarizes the genotypic and allelic distributions of rs10181656 in the STAT4 gene of female participants. Neither allelic nor genotypic variation was found to be associated with the risk of type 2 diabetes in female participants.
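The allele frequencies quoted above follow from the genotype frequencies, because each heterozygote contributes one copy of each allele. A minimal sketch of that arithmetic, using the rs10181656 genotype percentages reported for all type 2 diabetic patients, is given here; the function name `allele_frequencies` is hypothetical.

```python
def allele_frequencies(hom_first_pct, het_pct, hom_second_pct):
    """Allele frequencies (%) from genotype frequencies (%).

    Each homozygote contributes only its own allele; each heterozygote
    contributes half of its frequency to each allele.
    """
    first_allele = hom_first_pct + het_pct / 2
    second_allele = hom_second_pct + het_pct / 2
    return first_allele, second_allele


# rs10181656 in all type 2 diabetic patients: CC, CG, GG percentages from the text
c_freq, g_freq = allele_frequencies(60.27, 29.63, 10.10)
```

The same check can be applied to any subgroup; a pair of allele frequencies that does not sum to about 100% points to a transcription or rounding problem in the genotype percentages.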

Discussion
Diabetes mellitus is a complex polygenic metabolic disorder that is caused by environmental and genetic factors. Prevalence rates differ across ethnicities, with the prevalence in the Bangladeshi population projected to reach 13% by the year 2030. In Bangladesh, almost 60% of the total adult population is under the age of 40 years. With the growing economy of the country, the health of the abundant workforce should be addressed. According to the World Health Organization, non-communicable diseases (NCDs) are silent killers that cause 70% of all deaths in the world, corresponding to 40 million people per year, with people from developing, low- and lower-middle-income countries suffering the most. Among the NCDs, diabetes is the 4th largest cause of morbidity and mortality in the world population [34]. Non-genetic environmental factors have been attributed to the onset of diabetes in different ethnic groups. However, even when living in the same environment, different ethnic groups have different prevalence rates of diabetes, which is evidence of an association of genetic factors with diabetes [35]. Different risk alleles or SNPs, either independently or cumulatively, have shown their association with the incidence of diabetes in different populations, including Asians [36][37][38][39][40]. Moreover, intronic variants, the hidden treasures within our genome, have been identified to be associated with different inherited diseases with varying degrees of risk. It is now well established that mutations in the exon-intron splice junctions are associated with the pathogenicity of diseases and account for ~10% of genetic mutations (HGMD®; http://www.hgmd.org) [41]. A meta-analysis of genome-wide association data revealed a relationship of six new loci in individuals of European descent with the pathogenesis of type 2 diabetes, while another study identified an association of 18 SNPs that represented only 6% of heritable linkage [42,43].
Intronic regions contain functional polymorphisms along with pathological [44,45], evidence for an association of SNPs in the Bangladeshi population is very inadequate. Thus, in the present study, an effort is made to determine the association between T2D risk and SNPs present in GATA3 and STAT4 transcription factors. When total study participants were considered, out of three genotypes, only the homozygous mutant variant (rs3824662 TT) independently had a significant association with the risk of disease [odds ratio 2.98 (95% CI 1.36-6.55), p<0.01], while the over-dominant model had no effect. When either one allele or both alleles were mutated, the study participants were at risk of type 2 diabetes, which is reflected in the dominant model (comparing GG vs GT+TT) [odds ratio 1.51 (95% CI 1.07-2.13), p = 0.018] and the recessive model (comparing GG+GT vs TT) [odds ratio 0.37 (95% CI 0.17-0.82), p = 0.011] (Table 3). When study participants were grouped according to their gender, the rs3824662 T allele was associated with the risk of type 2 diabetes in both male and female patients (Tables 4 and 5). However, female participants showed an association with the risk of T2D when co-dominant and dominant models were considered, while none of the models showed any association with disease risk in male participants. Moreover, although the mutant genotype rs3824662 TT tended towards an association with the risk of type 2 diabetes only in female participants, a definitive conclusion can only be drawn from a study with a large and almost equal number of male and female participants. The mechanism by which the mutant T allele in GATA3 is associated with the risk of disease is still unclear. However, by analyzing the SNP rs3824662 using the 3DSNP database, it was revealed that this SNP is associated with 20 different SNPs identified within intronic sequences of GATA binding protein 3 (S1 Table).
3DSNP comprehensively annotates the regulatory function of human non-coding SNPs by investigating three-dimensional interactions with genes and genetically associated SNPs mediated by chromatin loops. According to functionality scores, it was further observed that rs3824662 is involved in modulating regulatory motifs such as enhancers, transcription factor binding motifs, and promoter regions (S2 Table). Thus, due to its association with a large number of SNPs and regulatory motifs, rs3824662 may alter the expression pattern of the GATA3 gene, which in turn may affect regulatory T cells, followed by the onset of type 2 diabetes through the development of insulin resistance.
Although diabetes mellitus has been broadly classified as type 1 and type 2, increasing evidence has shed light on the fact that these two diseases overlap in many aspects. For example, classical immunological parameters for type 1 diabetes, such as anti-islet cell antibodies, elevated circulating cytokines and chemokines, are also present in many patients with type 2 diabetes [46,47]. In addition, obesity, which is associated with insulin resistance and type 2 diabetes, shows strong correlations with the increased incidence of type 1 diabetes [48,49]. Signal transducer and activator of transcription 4 (STAT4) is a transcription factor that is involved in modulating immune functions by transducing signals as a result of interaction between cytokines/chemokines and receptors. STAT4 is activated by IL-12 and has been found to be associated with both type 1 and type 2 diabetes. Protection of STAT4 activation inhibits the development of autoimmune diabetes or type 1 diabetes in non-obese diabetic mice [30,50]. As a transcription factor, STAT4 has appeared to be a major regulator of T-cell activation, macrophage inflammatory phenotype, insulin resistance and atherosclerosis [51,52]. Further, STAT4 mediates inflammatory responses in immune cells and adipocytes in diabetes and obesity. STAT4 also has a role in skin wounds, as demonstrated in a typical type 2 diabetic mouse model [53].
The STAT4 gene polymorphism is associated with increased risk for the development of early onset type 1 diabetes [30] but not for type 2 diabetes on the island of Crete, a well-defined area with a genetically homogeneous population [54]. A meta-analysis reported a significant association of the STAT4 rs7574865 polymorphism with the risk of type 1 diabetes [55]. It was further revealed that both Asians and Caucasians with the STAT4 rs7574865 polymorphism have an increased diabetes risk. Although involvement of STAT4 activation has been an overlapping clinical feature for both type 1 diabetes and type 2 diabetes, the role of rs10181656 in STAT4 is not yet clear and remains to be elucidated in the Bangladeshi population. SNP rs7574865 is located within intron 3 of a noncoding region of STAT4. STAT4, through its interaction with IL-12, plays a pivotal role in TH1 immune responses and IFN-γ transcription. It is suspected that rs7574865 may influence the gene expression of STAT4 at the level of transcription and variant splicing [56], or it may be linked to causative mutations. Interestingly, the 3DSNP database revealed that rs10181656 is in linkage disequilibrium with 17 other SNPs present in the intronic sequences of STAT4 (S3 Table) and causes changes in motifs (S4 Table). Most importantly, our SNP of interest has been found to be associated with the extensively studied, type 1 diabetes-associated STAT4 SNP rs7574865. Genetic variants of STAT4 may be involved in regulating the balance of IL-12 versus IL-23 effects and may affect the prevalence of inflammatory diseases via dysregulation of TH1 and TH17 differentiation. However, allelic and genotypic variations were not found to be associated with the risk of type 2 diabetes in study participants.
Moreover, none of the co-dominant, dominant, recessive, or over-dominant models of the STAT4 rs10181656 polymorphism in the total study sample or in the male or female participants with type 2 diabetes showed an association with the risk of disease (Tables 6, 7 and 8).
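The dominant, recessive and over-dominant models named in this Discussion differ only in how the three genotypes are collapsed into a 2x2 table before the odds ratio is computed. The sketch below illustrates this for GATA3 rs3824662. It rests on assumptions: the genotype counts are back-calculated from the reported percentages and group sizes rather than taken from Table 3, the over-dominant contrast is taken to be GT against GG+TT, and the helper names are hypothetical.

```python
def model_odds_ratio(cases, controls, first_group, second_group):
    """Odds ratio for first_group versus second_group genotypes.

    Genotype counts are summed within each group, separately for cases
    and controls, and the odds of belonging to first_group are compared.
    """
    a = sum(cases[g] for g in first_group)      # cases in first group
    b = sum(cases[g] for g in second_group)     # cases in second group
    c = sum(controls[g] for g in first_group)   # controls in first group
    d = sum(controls[g] for g in second_group)  # controls in second group
    return (a * d) / (b * c)


# ASSUMED counts, back-calculated from the reported genotype percentages.
cases = {"GG": 154, "GT": 116, "TT": 27}     # 297 patients
controls = {"GG": 153, "GT": 85, "TT": 9}    # 247 healthy individuals

# Dominant model: carriers of at least one T allele against GG.
dominant_or = model_odds_ratio(cases, controls, ["GT", "TT"], ["GG"])

# Recessive model, in the direction reported in the text: GG+GT against TT.
recessive_or = model_odds_ratio(cases, controls, ["GG", "GT"], ["TT"])

# Over-dominant model (assumed contrast): heterozygotes against both homozygotes.
overdominant_or = model_odds_ratio(cases, controls, ["GT"], ["GG", "TT"])
```

With these assumed counts the dominant-model odds ratio rounds to the reported 1.51 and the recessive-model value is about 0.38, close to the reported 0.37. An odds ratio below 1 in the recessive comparison simply reflects the direction of the contrast, with non-TT genotypes being less frequent among patients than TT.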
Our findings demonstrate that genetic variation of the transcription factor GATA3, not STAT4, is associated with the risk of type 2 diabetes in the Bangladeshi population. The mechanism behind these roles needs to be investigated further. Moreover, studies with a large number of study participants from different ethnic populations are warranted to confirm the findings of this study.
Supporting information S1