The polymorphic landscape analysis of GATA1 exons uncovered the genetic variants associated with higher thrombocytopenia in dengue patients

The current study elucidated an association between gene variants and thrombocytopenia by investigating the exonic polymorphic landscape of the hematopoietic transcription factor gene GATA1 in dengue patients. A total of 115 unrelated dengue patients with dengue fever (DF) (N = 91) or dengue hemorrhagic fever (DHF) (N = 24) were included in the study. All dengue patients were confirmed through detection of NS1 antigen and of IgM and IgG antibodies against the dengue virus. Polymerase chain reaction using specific primers amplified the exonic regions of GATA1, while Sanger sequencing and chromatogram analyses facilitated the identification of variants. Of the 13 variants identified (3 annotated and 10 newly recognized), G>A (at chX: 48792009) and C>A (at chX: 48792118) had the highest frequencies. Patients carrying either nonsynonymous or synonymous variants had significantly lower mean values of platelets compared to those harboring the reference nucleotides (NC_000023.11). Further analyses revealed that the change in amino acid residue alters the three-dimensional structure and, in turn, the interaction with neighboring residues. Increased stability of the protein due to substitution of serine by asparagine (S129N at chX: 48792009) may cause increased rigidity and reduced structural flexibility, which may ultimately disturb the dimerization of the GATA1 protein, an important prerequisite for its biological activity. This, in turn, may affect the function of GATA1 and impair the production of mature platelets, which may be reflected by the lower platelet counts in individuals with such variation. In summary, we have identified new variants within the GATA1 gene that were found to be clinically relevant to the outcome of dengue patients and thus have potential as candidate biomarkers for determining the severity and prognosis of thrombocytopenia caused by dengue virus.
However, further validation of this study in a large number of dengue patients is warranted. Trial Registration: number SLCTR/2019/037.

Introduction
Dengue, a mosquito-transmitted viral disease caused by dengue virus (DENV), has been pandemic for decades and has recently spread across all regions of the world. Approximately 390 million people are at greater risk of dengue infection each year [1]. Clinical dengue fever (DF) progresses with a sudden onset of high fever after an incubation period of 3-15 days (normally 5-8 days) [1,2]. However, according to the World Health Organization (WHO), nearly half of the patients infected with dengue virus may develop dengue hemorrhagic fever (DHF) with plasma leakage [3]. Increased vascular permeability and thrombocytopenia (< 150K cells/μL) are the causal factors of these hemorrhagic manifestations [4]. A platelet count between 30,000 and 50,000 cells/μL occasionally manifests as purpura, whereas a platelet count below 30,000 cells/μL may cause bleeding with minimal trauma [5]. When the platelet level drops below 5,000 cells/μL, severe dengue with spontaneous bleeding occurs [6]. In 2009, WHO recommended that a significant drop in platelet count or a current platelet count of less than 150,000/mm³ is one of the markers of waning dengue infection [7]. Thrombocytopenia develops as a result of platelet dysfunction, which includes increased platelet activation [8], clot formation [9], apoptosis [10], and inflammatory cytokine production [10][11][12]. However, understanding DENV pathogenesis remains difficult, and further studies are required to fully comprehend the complex interactions of the virus with its host. Several risk factors, including host genetic background [13], have been shown to modulate the severity of DENV infection. Variation in host genetics, particularly in the genes involved in thrombopoiesis, may therefore potentiate the complications associated with dengue infection.
In the normal physiological process, differentiation of hematopoietic stem cells produces megakaryocytes (MKs), which develop further to generate mature platelets. The differentiation and maturation of MKs are regulated by several transcription factors [14,15], among which GATA1 is considered to be the master transcription factor [16]. The GATA1 gene is located on the X chromosome (Xp11.23) and consists of two different untranslated first exons, IT and IE, and five translated exons, II to VI [4,17]. The mature protein comprises two zinc finger domains and one activation domain [18,19]. In megakaryocytes, GATA1 protein recruits transcriptional cofactors, such as FOG (friend of GATA1), to megakaryocyte-expressed genes like NFE2. NFE2 regulates proplatelet formation by promoting the final stage of maturation of MKs [20]. Similarly, GATA1 cooperates with Fli-1 to activate genes like CD41, CD42b, and GPIX, which are associated with the terminal differentiation of megakaryocytes [21]. Substantial studies have documented that mutations in the GATA1 gene result in X-linked thrombocytopenia with thalassemia and X-linked thrombocytopenia with dyserythropoietic anemia. As a result, patients with GATA1 mutations have a decreased number of platelets with varying degrees of anemia and irregular red blood cell morphology [22].
Infections caused by dengue virus, hepatitis B virus and hepatitis C virus, as well as conditions like Down syndrome, are associated with thrombocytopenia. Certain genetic variants of the GATA1 gene may place affected individuals at higher risk, and preclinical analysis can therefore help in taking the necessary steps before the disease takes its toll. The objectives of this study are to (i) analyze the entire exonic regions of the GATA1 gene in dengue patients suffering from thrombocytopenia, (ii) identify the relationship between different variants of the GATA1 gene and thrombocytopenia, and (iii) elucidate the impact of the variants on the three-dimensional structure of GATA1 and, in turn, on its function. This study will help to uncover the host genetic basis of thrombocytopenia related to dengue virus infection, which may ultimately contribute to early prediction of patients at higher risk of thrombocytopenia in terms of disease progression and prognosis.

Ethics statement
This study is an add-on to a Phase II clinical trial, which was approved by the ethical review committee of the Bangladesh Medical Research Council (ID: BMRC/NREC/2019/171) and registered with the international clinical trial registry, number SLCTR/2019/037 (https://slctr.lk/trials/slctr-2019-037). Written consent was obtained from each participant prior to blood collection.

Study design
A total of 115 patients were recruited from Dhaka Medical College Hospital, Dhaka; AMZ Hospital, Dhaka; and Better Life Hospital, Dhaka as part of the clinical trial (international clinical trial registry number: SLCTR/2019/037) [23]. Patients who demonstrated clinical manifestations of dengue virus infection such as leukopenia, increased hematocrit, reduced platelet count, headache, bone pain, or rash were suspected of having dengue according to the guideline of the World Health Organization [24]. They were confirmed through positive results of dengue-specific antigen (NS1) and antibody (IgM/IgG) tests. The viral antigen NS1 and the IgM and IgG antibodies were detected with commercially available enzyme-linked immunosorbent assay (ELISA) kits as described in our recently completed study [23]. Pregnant women and patients with thrombocytopenia due to causes other than dengue, AST/ALT levels greater than 5 times the upper limit of normal, past portal vein thrombosis, HCV or HBV infection, chronic liver disease, a history of immunosuppressive therapy, or severe co-morbidity were excluded from the study (https://slctr.lk/trials/slctr-2019-037).
The patients were then grouped into two classes: dengue fever (DF) and dengue hemorrhagic fever (DHF) according to the suggestion of the World Health Organization as described in our recent work [24]. Patients were considered to have moderate dengue who exhibited high fever (40˚C/104˚F) along with two of the cautionary signs including abdominal pain and tenderness, persistent vomiting, clinical fluid accumulation, mucosal bleeding, lethargy, restlessness, and liver enlargement, increase in hematocrit followed by quick reduction of platelet count. On the other hand, patients who had one of the following symptoms: plasma leakage leading to shock or respiratory distress, severe bleeding, or organ failure (eg, elevated liver enzyme levels, impaired consciousness, or heart failure) were grouped into DHF [25,26]. The outline of the methodology of the study is shown in Fig 1.

Blood collection and hematological analyses
After written consent was obtained from each participant, five milliliters of blood was collected in an EDTA-containing vacutainer tube by expert phlebotomists. Blood was transferred to the laboratory of population genetics using a sample carrier. Complete Blood Count (CBC) analyses were performed using the Sysmex XN-2000 Hematology Analyzer, which uses the direct current sheath flow detection method. From the CBC analyses, RBC count, hematocrit level, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), RBC distribution width (RDW), total WBC count, and differential count including percentages of neutrophils, lymphocytes, monocytes, eosinophils and basophils were determined.

DNA extraction, polymerase chain reaction and determination of entire coding sequence of hematopoietic transcription factor GATA1
Genomic DNA was extracted from the cellular fraction of the collected blood, and the quality and quantity of the extracted DNA were evaluated according to our previous methods [27][28][29]. Primers to amplify the exonic regions of GATA1 were designed using the Primer3 web-based platform [30]. The GATA1 gene consists of 6 exonic regions. Prior to sequencing, a total of 4 primer pairs covering 5 exons (exons 2 to 6) were designed to amplify the exonic regions; exon 1 was excluded because it remains untranslated [17,31]. The list of primers, including the size of each amplicon, is shown in S1 Table. Each primer pair was obtained from IDT, USA. After dilution of the primers, polymerase chain reaction was optimized for each primer set using the conditions given in S2 Table.

Sequencing of PCR amplicons and analyses of chromatograms
A total of 460 amplicons, generated with the four primer pairs that together amplify the entire GATA1 exonic regions, were subjected to Sanger sequencing. After analysis of the chromatograms, a total of forty samples were resequenced using fresh PCR amplicons to increase the confidence of the base calls. The chromatograms of the amplicons were analyzed by aligning and comparing the results with the reference sequence of GATA1, NC_000023.11, which was retrieved from NCBI (www.ncbi.com). Sequences of the reference exonic regions containing known genetic variation (identified by SNPmasker 1.1 [32]) were also compared with the obtained sequences. The alignment and analyses of the sequences were performed using Geneious Prime software version 2020.0.2 (https://www.geneious.com/).
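The comparison of base calls against the reference can be illustrated with a minimal sketch. It is not the Geneious workflow used in the study: it assumes the sample read is already aligned to the reference without gaps, and it uses IUPAC ambiguity codes as a stand-in for the double peaks that mark heterozygous positions in a chromatogram. The function name `call_variants` and the 10-bp sequences are hypothetical.

```python
# Minimal sketch (assumption: reads are pre-aligned, gap-free, same length as the reference).
# IUPAC ambiguity codes represent double peaks, i.e. heterozygous base calls.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def call_variants(reference, sample, start_coordinate):
    """Return (coordinate, ref_base, alt_base, zygosity) for every non-reference allele."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, call) in enumerate(zip(reference.upper(), sample.upper())):
        alleles = IUPAC.get(call)
        if alleles is None:
            continue  # skip N or any other uncalled base
        zygosity = "homozygous" if len(alleles) == 1 else "heterozygous"
        for alt in sorted(alleles - {ref_base}):
            variants.append((start_coordinate + offset, ref_base, alt, zygosity))
    return variants


# Hypothetical 10-bp window; the start coordinate is chosen so that the
# ambiguous call R (A or G) falls on chrX:48792009.
print(call_variants("ACGTGACCTA", "ACGTRACCTA", 48792005))
```

In practice, chromatogram peak heights and quality scores would also be inspected before accepting a call, which is why the study resequenced forty samples.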

Assessment of the functional impacts of nonsynonymous mutations
Functional effects of nonsynonymous mutations found within the exonic regions of GATA1 were analyzed using different web-based tools. First, SIFT was used to predict tolerated and deleterious substitutions for every position of the GATA1 nucleotide sequence [33]. Then, PolyPhen-2 was used to predict the possible impact of an amino acid substitution on the structure and function of GATA1 protein [34]. After that, SNPs&GO was used to predict GATA1 point mutations that have the potential to cause disease in humans on their own [35]. Next, PhD-SNP was used to classify GATA1 point mutations as disease-related or neutral polymorphisms [36]. Subsequently, PANTHER-PSEP was used to estimate the likelihood that a particular nonsynonymous coding SNP has a functional impact on GATA1 protein [37]. Finally, I-Mutant 2.0 was used to analyze the change in protein stability caused by single-site mutations in the GATA1 protein structure [38].

PLOS NEGLECTED TROPICAL DISEASES
Variants in GATA1 exonic regions and thrombocytopenia in dengue patients

Prediction and validation of three-dimensional structures of GATA1 protein
To validate the effects of the nonsynonymous SNPs on the structure and function of GATA1 protein, a 3D model of GATA1 protein was first generated using the RaptorX web server. The three-dimensional structures were visualized using the PyMOL Molecular Graphics System, Version 2.4.1, Schrödinger, LLC. The quality of the predicted three-dimensional structure of GATA1 protein was validated first using PROCHECK. A Ramachandran plot of the modelled structure was generated to assess whether the overall quality of the structure was acceptable for further use. After that, PROSA was used to check the 3D model of GATA1 protein for potential errors. ERRAT2 was used to assess the quality of the structure by comparing it with highly refined structures available in the database. Next, Missense3D was used to observe the effect of a missense variant on GATA1 protein structure through different parameters including relative solvent accessibility (RSA), disulfide bond breakage, charge introduction, secondary structure alteration, H-bond breakage and cavity alteration. Finally, MODELLER was used to model the mutant protein structures with the initial modelled 3D structure as template, and the structures were then compared by measuring the distances from the atoms of the wild-type and mutant amino acids to the atoms of nearby amino acids [39].

Statistical analyses
The results were expressed as mean±SEM (Standard Error of Mean) for continuous variables and as a percentage for categorical variables. To compare the differences between different variables obtained from the two groups of dengue patients with thrombocytopenia, data were analyzed using IBM SPSS Statistics for Windows, Version 23.0. Armonk, NY: IBM Corp. and GraphPad Prism version 8.0.0 for Windows, GraphPad Software, San Diego, California USA, www.graphpad.com. A p-value of less than 0.05 was considered statistically significant.
Association of platelet counts with the GATA1 exonic region variants was analyzed using the R programming language. The differences between the mean platelet counts of the heterozygous mutant genotype and the homozygous wild-type genotype were calculated. In addition, the study participants were stratified into five groups based on the number of mutations they harbored, and the association of platelet counts with these groups was also assessed using R. Both associations were additionally adjusted for age and dengue fever type (DF and DHF).
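The adjusted association described above can be sketched as an ordinary least-squares model of platelet count on carrier status, age and dengue type. The study performed this analysis in R and does not state the exact model, so the linear form, the function names and the eight synthetic patients below are all assumptions made purely for illustration; Python is used here only to keep the sketch dependency-free.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_carrier_effect(platelets, carrier, age, is_dhf):
    """Coefficient of carrier status in: platelets ~ intercept + carrier + age + is_dhf."""
    rows = [[1.0, float(c), float(a), float(d)] for c, a, d in zip(carrier, age, is_dhf)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, platelets)) for i in range(k)]
    return solve_linear_system(xtx, xty)[1]


# Synthetic, noise-free data (hypothetical): carriers are built to have
# platelet counts 25 K cells/uL lower than non-carriers of the same age and type.
carrier = [0, 0, 0, 0, 1, 1, 1, 1]
age = [20, 25, 30, 35, 22, 27, 31, 40]
is_dhf = [0, 1, 0, 1, 0, 0, 1, 1]
platelets = [60 - 25 * c - 0.2 * a - 10 * d for c, a, d in zip(carrier, age, is_dhf)]

print(adjusted_carrier_effect(platelets, carrier, age, is_dhf))
```

A statistics package would additionally report standard errors and p-values for each coefficient; this sketch recovers only the point estimate.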

Results
Out of 115 dengue patients enrolled in this study, 80 were male (69.57%) and 35 were female (30.43%). All the patients were dengue infected and had thrombocytopenia on their first day of hospitalization. Detection of non-structural protein NS1 and determination of the levels of immunoglobulin G (IgG) and immunoglobulin M (IgM) were used to confirm dengue virus infection. Thrombocytopenia was confirmed using platelet count. According to the guidelines of the World Health Organization [23], 91 patients (79.13%) had dengue fever (DF) while 24 patients (20.87%) had dengue hemorrhagic fever (DHF). Table 1 presents the complete blood analyses of the study participants. The average age of the patients was 27.18±0.78 years; DHF occurred in relatively older patients (mean age 32.21±2.21 years) compared to DF (mean age 25.86±0.75 years) (p = 0.001). The average platelet count across all patients was 60.18±3.66 K cells/μL. The mean platelet count of patients with DHF (48.58±6.86 K cells/μL) was lower than that of patients with DF (63.24±4.21 K cells/μL), although the difference was not statistically significant. The comparison of other blood parameters between DHF and DF is shown in Table 1.
Among 91 DF patients, 65 (71.4%) were male and 26 (28.6%) were female. The gender-based distribution of complete blood and hematological parameters for patients with DF is presented in Table 1. Male and female participants were of similar age, 25.6±0.92 years and 26.5±1.27 years, respectively. Female patients had higher platelet counts (75.15±8.01 K cells/μL) than their male counterparts (58.48±4.86 K cells/μL), but the difference was not statistically significant. All other parameters except platelet count were within the normal reference values as shown in Table 1. Table 1 also shows the differences in the other blood parameters between male and female DF patients. Out of 24 patients with DHF, 15 (62.5%) were male and 9 (37.5%) were female. As Table 1 indicates, male and female participants belonged to a similar age group, and the mean platelet count did not vary significantly between male and female patients with DHF. Table 1 shows the differences in the other blood parameters between male and female DHF patients.
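The cohort proportions and the overall mean platelet count reported above can be cross-checked from the group-level figures. The short sketch below recomputes the percentages and the sample-size-weighted mean of the DF and DHF group means; the helper names `percentage` and `pooled_mean` are introduced here only for illustration.

```python
def percentage(part, whole):
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100 * part / whole, 2)


def pooled_mean(groups):
    """Sample-size-weighted mean from (n, group_mean) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total


# Counts and group means as reported in the text.
print(percentage(80, 115), percentage(35, 115))  # male and female share of the cohort
print(percentage(91, 115), percentage(24, 115))  # DF and DHF share of the cohort
print(round(pooled_mean([(91, 63.24), (24, 48.58)]), 2))  # overall mean platelets, K cells/uL
```

The weighted mean of the two group means reproduces the overall mean of 60.18 K cells/μL, which suggests the reported group sizes and means are internally consistent.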

Genetic analyses of the exonic regions of GATA1
GATA1 gene is located in the Xp11.23 position of X chromosome. The gene consists of six exons among which exon number 1 acts as promoter region and exon number 2-6 are encoded into mature protein. Four sets of primers (primer sets 1, 2, 3 and 4) targeting and covering exon 2 to 6 of the GATA1 gene were designed. The primers generated amplicons of 300bp, 690bp, 239bp and 705bp for GATA1. The amplicons generated by the primer sets were visualized via agarose gel electrophoresis and have been presented in Fig 2. Upon sequencing the amplicons using Sanger chemistry, the chromatograms were analyzed using the software Geneious Prime (v2020.2.4). The reference sequence NC_000023.11 was retrieved from the NCBI database and the chromatograms were analyzed using Geneious software. In S1A-S1M  Table 2 exhibits their respective frequency distribution in total patients, patients with DF and DHF as well as annotation in database.
Chromatogram analyses of the 115 dengue patients revealed the presence of 13 variants in 22 patients. In the remaining 93 patients, no variation from the reference sequence was observed at any position, and these patients were referred to as "Wild type". All 13 variants were heterozygous and were found in 22 females, which is consistent with the location of the GATA1 gene on the X chromosome. Among these variants, rs937198370 (C>T), rs184815507 (G>A) and rs145355350 (G>A) have already been reported in the database when compared with the reference sequence NC_000023.11. Each variant was identified in these dengue patients with a similar frequency of 0. Frequencies of these two variants were higher in patients with DHF than DF in this study. Out of 13 variants identified, 6 (46.15%) variants caused synonymous mutations while 7 (53.85%) variants resulted in a change of amino acid residue. The types of the 13 variants, their respective locations, annotation, change of codon, amino acid substitution, exon number, and mutation type are presented in Table 3.  wild type variants (those harboring the same sequence as the reference nucleotides) were 58.23 ±7.9 K cells/μL and 60.65±4.14 K cells/μL, respectively. Statistical analyses revealed that the values did not vary significantly between the groups. Further analyses were performed after   Fig 3B). Thus, although there was no significant difference in platelet count between patients with the non-synonymous mutation genotype and those with the wild genotype, DHF patients with the non-synonymous mutation genotype had a significantly lower platelet count than DF patients with the non-synonymous mutation genotype.
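The split into synonymous and nonsynonymous variants follows from translating the reference and variant codons with the standard genetic code. The sketch below shows the principle; the codons used in the example are hypothetical and were chosen only because they encode the right amino acids (the actual codon changes are listed in Table 3, which is not reproduced here).

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a single-codon change as synonymous, nonsynonymous, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "nonsynonymous"


# Hypothetical codons: AGC (Ser) -> AAC (Asn) mimics a G>A change such as S129N;
# CTC (Leu) -> CTA (Leu) mimics a silent C>A change.
print(classify_substitution("AGC", "AAC"))
print(classify_substitution("CTC", "CTA"))
```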

Analyses of platelet counts in dengue patients with respect to highly frequent variants of GATA1 gene
Among the 13 variants observed in the GATA1 gene, the G>A variant at chX: 48792009 and the C>A variant at chX: 48792118 showed the highest frequencies, 10.43% and 7.0% of study participants, respectively. Moreover, 6 patients carried both variants. The platelet counts for patients with G>A (at chX:48792009), patients with C>A (at chX:48792118), patients harboring both variants and wild-type patients were 35.08±4.15 K cells/μL, 31.38±5.26 K cells/μL, 24.33±3.25 K cells/μL and 60.65±4.14 K cells/μL, respectively. The mean platelet counts of patients harboring the G>A variant at chX: 48792009 or the C>A variant at chX: 48792118 differed significantly from that of individuals with wild-type nucleotides. These two variants remained significantly associated with reduced platelet counts even after adjusting for age and dengue fever type, as shown in Table 4. For the analysis stratified by number of mutations, the group of individuals harboring zero mutations was treated as wild type and used as the reference, and the differences between the mean platelet counts of individuals harboring one, two, three or four mutations and that of individuals harboring no mutations were calculated. No association of platelet counts with these groups was found with or without adjustment (S4 Table). Thus, patients harboring one or both of the two most frequent mutations, G>A (at chX:48792009) and C>A (at chX:48792118), showed statistically lower platelet counts than wild-genotype patients.
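The size of these differences relative to their uncertainty can be gauged directly from the reported mean±SEM values. The sketch below computes a Welch-type t statistic for each carrier group against the wild-type group. This is an illustrative check only and not the analysis performed in the study, which used regression in R; degrees of freedom and p-values are omitted because they would require the exact group sizes.

```python
import math


def welch_t(mean_a, sem_a, mean_b, sem_b):
    """Welch t statistic from two group means and their standard errors."""
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


# (mean, SEM) of platelet counts in K cells/uL, as reported in the text.
WILD_TYPE = (60.65, 4.14)
CARRIER_GROUPS = {
    "G>A at chX:48792009": (35.08, 4.15),
    "C>A at chX:48792118": (31.38, 5.26),
    "both variants": (24.33, 3.25),
}

for label, (mean, sem) in CARRIER_GROUPS.items():
    print(label, round(welch_t(mean, sem, *WILD_TYPE), 2))
```

All three statistics are negative, that is, each carrier group has a lower mean platelet count than the wild-type group.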

Effect of nonsynonymous mutations identified within the exonic regions on the GATA1 protein and their probable association with disease
A total of seven mutations that caused a change in amino acid were identified. The impact of these nonsynonymous mutations on GATA1 was investigated using web-based tools including SIFT, PolyPhen-2, I-Mutant 2.0, PhD-SNP, SNPs&GO and PANTHER-PSEP. The SIFT server predicted the variants P21H and H289D to be damaging, whereas the remaining 5 mutations were termed "tolerated". The variants S26T and H289D were considered damaging by the PolyPhen-2 server, while the other mutations (P21H, S91L, G99S, S129N and Q262H) were considered benign (Table 5). A similar approach was used with the PhD-SNP, SNPs&GO and PANTHER-PSEP tools to predict the effect of the mutations on GATA1, and the data are presented in S3 Table. Variant H289D was recognized as deleterious by the tools used. Other variants showed effects that ranged from neutral to disease-associated depending on the tool: G99S was 'Damaging' in PANTHER-PSEP and PhD-SNP, while S91L was found to be disease-causing in PhD-SNP. Other mutations were found to be neutral. The stability of the mutated proteins was analyzed using I-Mutant 2.0, in which the S91L and Q262H mutations were predicted to increase the stability of the protein and the rest of the variants showed a decrease in stability. Thus, the different in silico analyses indicated that the non-synonymous variants S26T, S91L, G99S and H289D had the potential for disease association in most web-based platforms.
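The agreement between prediction tools can be summarized by counting, for each variant, how many tools flagged it. The sketch below encodes only the calls stated in the prose; H289D is entered for every tool on the assumption, taken from the Discussion, that it was deleterious across all tools. S3 Table is not reproduced here, so the dictionary should be read as an illustration of the tallying step rather than as the study's full result set.

```python
from collections import Counter

# Variants flagged as damaging or disease-related by each tool, as read from
# the text (assumption: H289D was flagged by every tool; I-Mutant 2.0 is left
# out because it predicts stability rather than pathogenicity).
FLAGGED_BY_TOOL = {
    "SIFT": {"P21H", "H289D"},
    "PolyPhen-2": {"S26T", "H289D"},
    "PhD-SNP": {"S91L", "G99S", "H289D"},
    "PANTHER-PSEP": {"G99S", "H289D"},
    "SNPs&GO": {"H289D"},
}


def tally_flags(flagged_by_tool):
    """Count how many tools flagged each variant."""
    counts = Counter()
    for variants in flagged_by_tool.values():
        counts.update(variants)
    return counts


counts = tally_flags(FLAGGED_BY_TOOL)
for variant, n_tools in counts.most_common():
    print(f"{variant}: flagged by {n_tools} of {len(FLAGGED_BY_TOOL)} tools")
```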

Analyses of the impact of the mutation on the three-dimensional structure of GATA1 protein
Amino acid sequence of human GATA1 was retrieved from UniProt available at www. uniprot.org (ID: P15976). The three-dimensional structure of GATA1 was predicted using ab initio modelling approach using RaptorX structure prediction web-based platform.
Fig 4 represents the predicted three-dimensional structure of GATA1 protein and its validation. The quality of the predicted structure was checked using the web-based tools PROCHECK (Ramachandran plot), PROSA and ERRAT2. Ramachandran plot analysis in PROCHECK revealed that 84% of the amino acid residues of the predicted structure were within the most favored region while 16% lay within the additional allowed region. A good-quality model would be expected to have over 90% of residues in the most favored region. In PROSA, the GATA1 protein was placed in the "X-ray" region with a Z score of -6.86, which indicated that the modelled structure resembled proteins whose three-dimensional structures have been determined by X-ray crystallography. ERRAT2 gave a score of 81.25% for the GATA1 protein model. This score was considered low, as good high-resolution structures generally produce values of more than 95%. Overall, the modelled structure of GATA1 was accepted for use because the original structure has not yet been resolved by NMR or X-ray crystallography. The modelled structure was then used as a template in Missense3D to measure the relative solvent accessibility (RSA) and cavity volume alteration of the mutated amino acids. For the P21H mutation, the cavity volume contracted by 98.28 Å³ (cubic angstrom, a unit of volume). Since this exceeds 70 Å³, the mutation can alter the protein structure. The G99S mutation also resulted in structural alteration, due to replacement of a structural glycine residue. However, none of the mutated amino acids showed an RSA change large enough for Missense3D to consider it a protein structure modifier. Thus, according to Missense3D, the nonsynonymous mutations P21H and G99S had the potential to alter the structure of GATA1 protein and affect its function.

Comparative three-dimensional structural analyses between wild type and mutant protein of GATA1
The computational homology modelling tool MODELLER was applied to generate mutated GATA1 protein models carrying the variations found in this study, and the modelled structures were refined using GalaxyWeb, a protein structure prediction and refinement web server. The mutated structures were then aligned with the modelled wild-type structure using PyMOL, and a Root-Mean-Square Deviation (RMSD) score was obtained for each mutation. The surface structure along with the functional domains is presented in Fig 5. In total, 7 non-synonymous mutations were observed in the GATA1 gene. Among them, the P21H and S26T mutations were found in the AD domain of the protein while Q262H was within the C-ZF domain. The other mutations, S91L, G99S, S129N and H289D, were found in regions outside the functional domains of GATA1 protein. The RMSD values and the changes in interaction with neighboring amino acids for the variants are presented in Table 6.
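For reference, the RMSD between two structures is the square root of the mean squared distance between paired atoms. The sketch below computes it for coordinate sets that are assumed to be already superimposed and listed in the same atom order; PyMOL additionally performs the superposition and its own atom pairing, which this minimal version does not attempt. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) atom coordinates.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical coordinates in angstrom: the second set is the first shifted by 1 A along x.
wild_type = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
mutant = [(x + 1.0, y, z) for x, y, z in wild_type]
print(rmsd(wild_type, mutant))
```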

The substitution of proline by histidine at position 21 within the AD domain relaxed the structure of the domain, as the distances to the nearby interacting amino acids Asp20 and Pro88 increased for the mutant amino acid; the RMSD value after superimposition with the wild-type GATA1 structure was 0.797. The distances of Pro21 (wild type) to the nearby Asp20 and Pro88 residues were 4.4 Å and 4.3 Å, while for His21 (mutant) the values were 7.4 Å and 5.4 Å, respectively (Fig 6A). For the S26T mutation, little change in structure was observed: the distances to the nearby residues (Pro28 and Phe33) remained almost the same for the wild-type and mutant amino acids, and the RMSD score after superimposition also remained low (Fig 6B). The S91L mutant showed greater structural relaxation, with increased distances to the neighboring amino acids Tyr69 and Pro92 (Fig 6C) and an RMSD score of 1.215.
The G99S mutation was located inside the protein structure. The distances of glycine at position 99 to the neighboring Trp96, Tyr104 and Tyr185 residues were 3.9 Å, 4.0 Å and 5.1 Å, respectively (Fig 6D), and increased to 5.6 Å, 4.4 Å and 6.6 Å upon substitution of glycine with serine. The G99S mutation altered the structure, with an RMSD score of 1.488 when the structures were superimposed. The distances of the mutant asparagine at position 129 to the nearby Asp123, Thr130 and Phe132 residues were 3.4 Å, 8.1 Å and 3.9 Å, respectively, compared with 3.8 Å, 7.0 Å and 4.1 Å for serine in the wild-type GATA1 structure (Fig 6E); the RMSD value for the mutant was 1.464. The Q262H mutation was found within the C-terminal zinc finger domain (C-ZF). Substitution of glutamine with histidine expanded the surface volume. The distances of glutamine to the neighboring residues Ile249 and Thr259 were 6.3 Å and 3.8 Å, respectively, which increased to 9.6 Å and 6.6 Å for the substituted histidine residue (Fig 6F). However, the RMSD score was low when the Q262H mutant was superimposed on the wild-type GATA1 modelled structure. The H289D mutation was located near the end of the C-ZF domain but did not change the structure much, as seen in Fig 6G. The distances of the wild-type and mutant amino acids to the nearby Val291, Gln290, Tyr286 and Tyr285 residues did not differ much.
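The distance comparisons reported above can be gathered into one structure and expressed as mutant-minus-wild-type changes, which makes the direction of each shift explicit. The values below are the distances quoted in the text; the dictionary layout and the function name are illustrative choices only.

```python
# Distances in angstrom from the residue at the mutated position to each
# neighboring residue, given as (wild type, mutant), as quoted in the text.
NEIGHBOR_DISTANCES = {
    "P21H": {"Asp20": (4.4, 7.4), "Pro88": (4.3, 5.4)},
    "G99S": {"Trp96": (3.9, 5.6), "Tyr104": (4.0, 4.4), "Tyr185": (5.1, 6.6)},
    "S129N": {"Asp123": (3.8, 3.4), "Thr130": (7.0, 8.1), "Phe132": (4.1, 3.9)},
    "Q262H": {"Ile249": (6.3, 9.6), "Thr259": (3.8, 6.6)},
}


def distance_changes(distances):
    """Mutant minus wild-type distance for every variant and neighbor (1 decimal)."""
    return {
        variant: {
            neighbor: round(mutant - wild, 1)
            for neighbor, (wild, mutant) in neighbors.items()
        }
        for variant, neighbors in distances.items()
    }


changes = distance_changes(NEIGHBOR_DISTANCES)
for variant, neighbors in changes.items():
    print(variant, neighbors)
```

Positive values indicate that the mutant residue sits farther from its neighbor than the wild-type residue does. For S129N the changes are mixed in sign, unlike the uniformly positive shifts for P21H, G99S and Q262H.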

Discussion
In 2019, Bangladesh encountered its largest dengue virus (DENV) epidemic, which affected more than a hundred thousand people. Although most patients were asymptomatic, a large number of patients developed dengue fever (DF) or dengue hemorrhagic fever (DHF)/dengue shock syndrome (DSS) with different degrees of thrombocytopenia. However, the mechanisms and host genetic factors that lead to thrombocytopenia during dengue infection and eventually to DHF have not been determined, and early identification of patients who are more likely to progress to a worse health condition remains a great challenge. This study aimed to investigate potential variations within the entire exonic regions of the GATA1 gene, one of the master transcription factors for platelet production, in dengue patients with thrombocytopenia and thus to establish a relationship between gene variants and disease severity. Although all study participants had thrombocytopenia, DHF patients showed a lower mean platelet count than DF patients, a difference that was not statistically significant. Thrombocytopenia is nevertheless a constant feature and one of the diagnostic criteria of DHF. A study of the Brazilian population also found that patients with DHF had significantly lower platelet counts than patients with DF [40]. Further, although patients had bleeding manifestations, they did not develop any symptoms of plasma leakage, which could ultimately lead to dengue shock syndrome; this was reflected by the hematocrit and hemoglobin values, which were within the reference ranges. Parameters of patients with DF and DHF were also compared independently on the basis of gender as shown in Table 1. Hematocrit, hemoglobin and RBC levels differed significantly between male and female individuals in both the DF and DHF groups, but the mean values of these parameters were within the reference ranges in both genders.
To analyze genetic variations within GATA1, all exonic regions covering exons 2 to 6 of the gene were amplified using template genomic DNA from all 115 patients. Exon 1 was excluded because it contains promoter regions of the gene and remains untranslated. The chromatograms obtained through Sanger sequencing were subjected to further analyses to identify the genetic variations. From the analysis, 13 variants were identified in 22 female patients and all the mutants were heterozygous. No heterozygosity was observed in male patients because GATA1 is located on chromosome X. In fact, in males no polymorphisms were observed at any of the 13 positions where variants were observed in females. Since no polymorphism was observed in males and none of the mutants were homozygous in females, there is a possibility that these mutations are deleterious in nature. According to the Ensembl database (https://asia.ensembl.org/), a total of 6334 single nucleotide polymorphisms (SNPs) have been reported within the GATA1 gene, among which 537 are missense variants. To date, a total of 66 SNPs within GATA1 have been reported to be associated with different diseases including thrombocytopenia and Diamond-Blackfan anemia [41]. Among them, V205M, G208R, D218Y, G208S, D218G, D218N, R216Q, R216W, R224L, D237H, Q237H, S329R and G356V have been reported to be associated with thrombocytopenia [22,42–47]. However, variations concerning other amino acids have not been elucidated yet, and the high-frequency variants observed in this study may have the potential to exacerbate thrombocytopenia during dengue infection. Out of 13 variants, 7 were nonsynonymous and 6 were synonymous. The variations were found in exons 2, 3 and 5 of the GATA1 gene. While 3 variants (one nonsynonymous and two synonymous) were already reported in the NCBI database, 10 were recognized as new variants.
Among the annotated variants, no disease association has been reported so far. Variants G>A at chX: 48792009 and C>A at chX: 48792118 were the high-frequency variants and were present in 10.43% and 7.0% of the total study population, respectively. The amino acid changes in the GATA1 protein resulting from the identified mutations are displayed in Table 3. The variants with the highest frequencies, G>A at chX: 48792009 and C>A at chX: 48792118, showed nonsynonymous (S129N) and synonymous effects, respectively. The nonsynonymous variants were further analyzed to investigate their probable impacts on protein function using different web-based approaches. Only the substitution of histidine by aspartic acid at position 289 (H289D) was predicted to be deleterious by all the tools used in the study, as shown in Tables S3 and 5. This may be caused by the complete shift of charge from positive to negative, which may change the interaction pattern with the neighboring amino acids and, in turn, the biological function of the protein.
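As a quick consistency check on the reported frequencies, the percentages can be converted back to approximate carrier counts in the cohort of 115 patients. The Python sketch below is illustrative: the carrier counts it prints are back-calculated by rounding and are an assumption, not figures stated in the text, and the function names are hypothetical.

```python
TOTAL_PATIENTS = 115  # cohort size reported in the study

def carriers_from_percent(percent, total=TOTAL_PATIENTS):
    """Back-calculate the nearest whole number of carriers from a percentage."""
    return round(percent / 100 * total)

def percent_from_carriers(carriers, total=TOTAL_PATIENTS):
    """Frequency (in %) of a variant among all patients."""
    return 100 * carriers / total

# Reported frequencies of the two high-frequency variants.
reported = [("G>A at chX:48792009", 10.43), ("C>A at chX:48792118", 7.0)]

for label, percent in reported:
    n = carriers_from_percent(percent)  # inferred count, not stated in the text
    print(f"{label}: about {n} carriers ({percent_from_carriers(n):.2f}%)")
```

Under this assumption, a reported frequency of 7.0% corresponds to a whole-number count whose exact frequency is slightly below 7%, which is consistent with rounding in the original figure.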
Later, the three-dimensional structure of GATA1 was modelled using RaptorX, and the quality of the modelled structure was then evaluated through different in silico platforms, as shown in Fig 4A. The modelled protein had 84% of residues in the most favorable region according to the Ramachandran plot (Fig 4B), and the Z-score was -6.86 (Fig 4C). A well-structured model should have more than 95% of residues in the most favorable region. However, for proteins whose full structure has not yet been resolved by X-ray crystallography or NMR, computational structures with more than 80% of residues in the most favorable region of the Ramachandran plot can be considered good structures [48]. As the predicted structure had more than 80% of residues in the desired region and the Z-score lay within the range of established structures, the modelled structure was subjected to further analyses. Further, the ERRAT2 (a measure of the quality of non-bonded interactions) quality score of 81.25 (Fig 4D) supported the overall quality of the modelled structure as well as the reliability of the prediction. The impact of the mutations on the three-dimensional structure of the protein was first analyzed in Missense3D, and the surface structure and side chain interactions were then analyzed using PyMOL.
The nonsynonymous variant with high frequency (G>A at chX: 48792009) was present in 7.7% of patients with dengue fever and in 20.8% of patients with dengue hemorrhagic fever. It conferred a nonsynonymous effect on GATA1 by changing the codon (AGC>AaC) and replacing serine with asparagine at the 129th amino acid position. Although variant S129N was predicted to be neutral by different in silico approaches, patients harboring the S129N mutation had a significantly (p = 0.02) lower platelet count (35.08±4.15 Kcells/μL) than patients carrying the wild-type nucleotide present in the reference sequence of GATA1 (NC_000023.11) (60.65±4.14 Kcells/μL), and the same result was observed even after adjusting for age and dengue fever type (Table 4). In addition, I-Mutant 2.0 predicted that the mutation tends to increase the stability of the protein. The three-dimensional structure analyses in Fig 6E revealed that the substitution of serine by asparagine leads to a change in the structure of the GATA1 protein, with an RMSD score of 1.464. The distances from the wild-type amino acid (Ser129) to the nearby Asp123, Thr130 and Phe132 residues were 3.8 Å, 7.0 Å and 4.1 Å, respectively. When serine was substituted by asparagine, the distances decreased to 3.4 Å and 3.9 Å for Asp123 and Phe132, respectively, while the distance to Thr130 increased by about 1.1 Å. Although serine and asparagine are both uncharged polar amino acids, the side chain of asparagine (-CH2-CO-NH2) is relatively large compared to that of serine (-CH2-OH). Thus, asparagine was more exposed towards the outer surface than serine and decreased the cavity volume. As seen in Fig 6E, serine at position 129 (Ser129) lies close to the N-terminal activation domain of the GATA1 protein, and mutation to asparagine can hamper the function of the domain by affecting the interaction pattern with the nearby residues. The function of the N-terminal activation domain is to form a homodimer as well as a heterodimer with GATA2 [49].
We hypothesized that the increased stability (S3 Table) of the protein due to substitution of serine by asparagine may cause increased rigidity and thus reduced structural flexibility, which may ultimately disturb the dimerization process of the GATA1 protein. This, in turn, may affect the function of GATA1 and impair the production of mature platelets, which may be reflected by the lower platelet counts in individuals with this variation.
A synonymous (GGC>GGa) variant with high frequency obtained from this study was C>A at chX: 48792118. The variation was present in 5.5% of dengue fever and 12.5% of dengue hemorrhagic fever participants (Table 2). Patients with this variant had a significantly (p = 0.03 and 0.04, respectively, before and after adjustments, Table 4) lower platelet count (31.38±5.26 Kcells/μL) than patients who harbored the reference nucleotide (NC_000023.11) at the same position (60.65±4.14 Kcells/μL). It is important to note that even synonymous variations can lead to altered expression of protein due to codon bias [50]. According to the GenScript Codon Usage Frequency Table Tool (www.genscript.com), the frequency of the wild-type codon GGC is 34% while that of the mutant codon GGa is 25% for Homo sapiens. The more frequently a codon is used, the higher the availability of the respective tRNA during protein production [50]. Thus, a change to a less frequent codon can decrease the production of GATA1 protein due to the lower availability of the tRNA that recognizes the changed codon. As a result, synonymous variation to a less frequent codon could lead to altered expression of GATA1 protein, which may eventually affect thrombopoiesis and result in thrombocytopenia. Furthermore, the mean platelet count measured in the 6 patients harboring both high-frequency variants (G>A at chX: 48792009 and C>A at chX: 48792118) was 24.33±3.25 Kcells/μL, which was lower than in individuals having either of the single variants. This suggests that patients with these two variants are more prone to thrombocytopenia. The impact of these variants on the activation domain of GATA1 and their probable effect on the expression of GATA1 may explain the clinical outcome of the study participants as reflected by the platelet values.
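The codon bias argument above rests on a simple comparison of usage frequencies, which can be made explicit with a short calculation. The Python sketch below uses only the two frequencies quoted in the text (GGC 34%, GGA 25%); the function name is hypothetical, and the ratio is a crude relative measure, not a model of tRNA availability or translation efficiency.

```python
# Glycine codon usage fractions for Homo sapiens as quoted in the text.
CODON_USAGE = {"GGC": 0.34, "GGA": 0.25}

def relative_usage(mutant_codon, wild_type_codon, table=CODON_USAGE):
    """Ratio of mutant to wild-type codon usage; a value below 1 means the
    mutant codon is used less often than the wild-type codon."""
    return table[mutant_codon] / table[wild_type_codon]

ratio = relative_usage("GGA", "GGC")
print(f"GGA usage relative to GGC: {ratio:.2f}")
print("mutant codon is rarer" if ratio < 1 else "mutant codon is not rarer")
```

A ratio below 1 only indicates that the mutant codon is the less frequently used of the two; whether this translates into reduced GATA1 expression would need experimental confirmation.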

Conclusion
In this study, new variants have been identified in the exonic regions of the GATA1 gene. Of them, G>A at chX: 48792009 and C>A at chX: 48792118 were highly frequent, and patients harboring either one or both of the mutations showed severe thrombocytopenia. From in silico analyses, it was also observed that the nonsynonymous mutation caused by G>A at chX: 48792009 had a further impact on the structure and function of the GATA1 protein. Thus, given the importance of the GATA1 gene in thrombopoiesis and in dengue severity, the findings of this study need to be further validated in larger populations residing in different geographical regions.