African Glucose-6-Phosphate Dehydrogenase Alleles Associated with Protection from Severe Malaria in Heterozygous Females in Tanzania

X-linked glucose-6-phosphate dehydrogenase (G6PD) A− deficiency is prevalent in sub-Saharan African populations and has been associated with protection from severe malaria. Whether females and/or males are protected by G6PD deficiency is uncertain, due in part to G6PD and malaria phenotypic complexity and misclassification. Almost all large association studies have genotyped a limited number of G6PD SNPs (e.g. G6PD202 / G6PD376), and this approach has been too blunt to capture the complete epidemiological picture. Here we have identified 68 G6PD polymorphisms and analysed 29 of these (those passing quality control with a minor allele frequency greater than 1%) in 983 severe malaria cases and controls in Tanzania. We establish, across a number of SNPs including G6PD376, that only female heterozygotes are protected from severe malaria. Haplotype analysis reveals the G6PD locus to be under balancing selection, suggesting a mechanism of protection relying on alleles at modest frequency and avoiding fixation, where the protection provided by G6PD deficiency against severe malaria is offset by an increased risk of life-threatening complications. Our study also demonstrates that the much-needed large-scale studies of severe malaria and G6PD enzymatic function across African populations require the identification and analysis of the full repertoire of G6PD genetic markers.


Introduction
Amongst the approximately 190 genetic variants causing clinical deficiency of glucose-6-phosphate dehydrogenase (G6PD) that have been characterised [1], A− deficiency is the most common in sub-Saharan African populations and is associated with protection from severe malaria [2,3]. An understanding of how this protection works may assist with the design of antimalarial vaccines and drugs. Establishing whether malaria patients are G6PD deficient is also important because of the potential use of 8-aminoquinoline drugs (e.g., primaquine and its derivatives) for malaria elimination in sub-Saharan Africa [4]. Primaquine is active against all liver stages of Plasmodium and also against P. falciparum gametocytes, thereby blocking transmission to mosquitoes [4]. However, primaquine is haemotoxic and can cause haemolytic anaemia in G6PD-deficient individuals. G6PD status can be quantified using enzymatic activity assays, which are required for unambiguous identification of G6PD deficiency, especially in female heterozygotes, who are mosaic because the trait is X-linked [5]. Cytochemical methods have been suggested as an alternative [5] but are not efficient for large studies, so genotyping has been used as a high-throughput approach. Whilst genotyping approaches have been advocated, there is evidence of extensive diversity at the G6PD locus (X chromosome, 16.2 kb), with more than 150 single nucleotide polymorphisms (SNPs) reported [1]. Many of these known variants result in amino acid changes and were detected by sequencing the G6PD locus in enzyme-deficient individuals. The G6PD locus overlaps that of the inhibitor of kappa light polypeptide gene (IKBKG; involved in immunity, inflammation and cell survival pathways [6], and with mutations linked to incontinentia pigmenti [7]), and the two genes share a conserved promoter region with bidirectional housekeeping activity [7].
The region containing the G6PD gene and the 5′ end of the IKBKG gene contains Alu elements [7]. Genetic variability in G6PD and IKBKG is complex [7], and new alleles are still being discovered, making an approach based on a few G6PD markers unreliable [8,9].
Despite these limitations, genotyping of the 202A/376G G6PD A− allele (~12% of normal enzymatic activity [10]) has been used extensively in epidemiological studies of protection against severe malaria [8,10-19]. Coexistence of the two mutations is responsible for enzyme deficiency in G6PD A− because they act synergistically to destabilise the enzyme [20]; they also lead to structural changes in the enzyme protein. However, even in large, well-powered studies, associations between 202A/376G G6PD and protection from severe disease have been inconsistent, revealing protective effects in female heterozygotes [8,11,17,18,19], in male hemizygotes [12,13], in both [14], or no protection [15]. These phenotype-genotype inconsistencies may be explained in part by variation in study design, by G6PD and malaria phenotypic complexity and misclassification, and by incomplete experimental data [8]. However, allelic heterogeneity, specifically other unknown polymorphisms, is recognised to play a role [3,5,8], with evidence from studies in West Africa for A− deficiency [5,8] and in Southeast Asia and Oceania for other deficiency types [3]. In particular, in West Africa the frequency of the 202A allele is often substantially lower than the rate of enzyme deficiency, indicating a role for other alleles; inclusion of other G6PD polymorphisms (Santamaria 542T/376G, ~2% residual enzymatic activity; Betica-Selma 968C/376G, ~11% activity) [10,16] was required to capture an association between G6PD deficiency and severe malaria in The Gambia [8].
Further understanding is required of the true extent of genetic diversity within the G6PD locus, how this relates to enzyme function, and how it varies between regions and ethnic groups, if genetic epidemiological studies are to provide robust and reproducible findings. A recent study in Mali using 58 SNPs across the G6PD gene found differences in core haplotypes and their frequencies between the Dogon and Fulani ethnic groups [9]. The Fulani are known to have substantially reduced susceptibility to malaria compared with sympatric populations [9]. Whilst some ethnicity-specific SNP associations were observed with mild malaria, the prevalence of severe malaria was too low for any robust associations to be detected.
Here we investigate associations between severe malaria and 68 SNPs within the G6PD and surrounding loci (IKBKG and CTAG1A/B), including the 202, 376, 542, 680 and 968 A− deficiency polymorphisms (referred to here as G6PD202, G6PD376, and so forth). The work is set within a case-control study (n = 983; 506 cases and 477 controls) conducted in an area of intense malaria transmission in the Tanga region of northeastern Tanzania [17]. To complement the case-control collection, we genotyped samples from 60 healthy parent-child trios (120 parents, 60 children) collected in the same geographical region. We find very strong associations between multiple SNPs across the G6PD gene and protection from severe malaria in female heterozygotes but not in hemizygous males. Very high linkage disequilibrium across this locus allowed us to distil this SNP diversity into just 4 G6PD alleles, ranging in frequency from ~6% to >60%, and 8 common genotypes (frequency >1%), 2 of which are associated with protection from severe malaria.
In summary, this study identifies specific G6PD alleles that confer resistance to severe malaria in this population and reveals a potentially important role of female heterozygotes in maintaining the high frequency of G6PD polymorphisms in malaria endemic populations.
The correlation between the 29 SNPs was high (linkage disequilibrium D' median (IQR): all subjects 0.987 (0.811-0.997); female controls 0.988 (0.731-0.998)). LD was similarly high across this region in the trio parents (all: 0.998 (0.995-0.999); females only: 0.998 (0.995-0.999)) and children (0.998 (0.996-0.999)) (S2 Fig.). This high LD allowed us to define a small number of haplotypes (G6PD alleles): four haplotypes accounted for 99.6% of all alleles typed for the "core" region (haplotype 1 = GGGAGTC; 2 = AACGGCT (6 mutations); 3 = AACGACT (7 mutations); 4 = AGGGGCC (3 mutations)). Female controls had a higher frequency of the three haplotypes (2-4) containing mutations. Whilst protective effects were observed in females (and not males) for these three haplotypes (OR 0.683-0.783) compared with the common type (haplotype 1, frequency ~60%), they were not statistically significant (P>0.186), owing to the heterozygous nature of the protection in females (S4 Table). Further analysis accounting for the genotypic combinations of G6PD alleles confirmed that a combination of haplotype 1 with either haplotype 2 or 3 was protective (OR<0.38, P<0.006) compared with a homozygous haplotype 1 (wild-type) genotype (Table 3). This result shows that haplotypes carrying the 376G mutation have a similar protective effect in heterozygotes irrespective of the presence or absence of the 202A mutation, indicating that the 376G mutation is causal. The genotypic combination of haplotypes 1 and 4 also had a potentially protective effect (OR = 0.599), but it did not reach statistical significance (P = 0.11).
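Two quantities in the paragraph above can be made concrete with a short sketch: the number of "mutations" each core haplotype carries, read here as the count of core positions at which it differs from haplotype 1, and Lewontin's pairwise D' used to summarise LD. This is a minimal illustration assuming phased two-locus haplotype counts are available (males, being hemizygous for the X chromosome, provide directly observed haplotypes; female haplotypes would need to be inferred). The function names are hypothetical and the identities of the seven core SNPs are not specified here; this is not the authors' code.

```python
# Core haplotype strings as listed in the text; SNP identities per position
# are not given here, so positions are simply compared in order.
CORE_HAPLOTYPES = {1: "GGGAGTC", 2: "AACGGCT", 3: "AACGACT", 4: "AGGGGCC"}


def mutations_vs_reference(haplotype, reference=CORE_HAPLOTYPES[1]):
    """Count core positions at which a haplotype differs from haplotype 1."""
    if len(haplotype) != len(reference):
        raise ValueError("haplotypes must cover the same SNPs")
    return sum(a != b for a, b in zip(haplotype, reference))


def d_prime(hap_counts):
    """Lewontin's D' for two biallelic SNPs from phased haplotype counts.

    hap_counts maps two-character haplotypes (allele at SNP 1, allele at
    SNP 2) to counts, e.g. {"AG": 40, "AC": 5, "TG": 3, "TC": 52}.
    """
    total = sum(hap_counts.values())
    # Use one allele at each SNP as the reference allele.
    a = min(h[0] for h in hap_counts)
    b = min(h[1] for h in hap_counts)
    p_a = sum(n for h, n in hap_counts.items() if h[0] == a) / total
    p_b = sum(n for h, n in hap_counts.items() if h[1] == b) / total
    p_ab = hap_counts.get(a + b, 0) / total
    d = p_ab - p_a * p_b
    # Scale D by the largest magnitude attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(f"haplotype {k}: {mutations_vs_reference(CORE_HAPLOTYPES[k])} mutations")
    # Complete LD: only two of the four possible haplotypes are observed.
    print(d_prime({"AG": 50, "TC": 50}))
```

Counting differences from haplotype 1 reproduces the 6, 7 and 3 mutations quoted for haplotypes 2, 3 and 4. D' equals 1 whenever one of the four two-locus haplotypes is absent, which is why it can sit near 1 across a region even when allele frequencies (and hence r²) differ between SNPs.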
The greater protective effects of haplotypes 2 and 3 could be due to their carrying more mutations (≥6), leading to a possible compound heterozygous advantage effect. Female controls had more heterozygous genotype calls than female cases (median / mean, controls vs. cases: all SNPs 10 / 9.1 vs. 7 / 7.6, P<0.001; 7 core SNPs 3 / 3.2 vs. 0 / 2.1, P<0.0001). Tajima's D was applied to assess whether this excess of heterozygous genotypes reflected balancing selection at the G6PD gene. There was very strong evidence of balancing selection across all groups (Tajima's D > 2.6; female controls 2.9). These values lie in the extreme positive tail of the negatively centred distribution observed in African populations [21], where predominantly negative values reflect either slow growth from a small population size or a bottleneck much older than that of non-Africans [21]. This result implies that the high allele frequency of SNPs in the G6PD gene is maintained mainly, and perhaps entirely, through balancing selection driven by the protection of heterozygous females against severe malaria. This selection mechanism is also predicted by population genetic theory [22] and is consistent with empirical data from other studies [8,18]. Similar mechanisms operate at autosomal malaria candidate loci, for example the sickle-cell locus (HbAS trait) [23]. There was no evidence of epistatic effects between HbS and G6PD on severe malaria in females (P = 0.34) or males (P = 0.98), nor of epistasis between alpha thalassaemia and G6PD (female P = 0.44; male P = 0.21).
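Tajima's D compares two estimates of diversity: the mean number of pairwise differences (π) and the number of segregating sites (S) scaled by a1. An excess of intermediate-frequency alleles, as expected under balancing selection, inflates π relative to S/a1 and pushes D positive. The sketch below implements Tajima's standard formula on a list of aligned haplotype strings; the paper does not state which implementation it used, and the function name and toy inputs are hypothetical.

```python
import math
from collections import Counter


def tajimas_d(haplotypes):
    """Tajima's D from aligned haplotype strings (one string per chromosome)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences for a stable variance")
    pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0  # mean number of pairwise differences across all sites
    for column in zip(*haplotypes):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            same = sum(c * (c - 1) / 2 for c in counts.values())
            pi += (pairs - same) / pairs
    if seg_sites == 0:
        raise ValueError("no segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Two equally common, highly divergent haplotypes (intermediate frequencies).
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]
# One divergent haplotype among identical ones (singletons only).
rare_excess = ["GGG", "CCC", "CCC", "CCC"]
print(round(tajimas_d(balanced), 2), round(tajimas_d(rare_excess), 2))
```

In this toy example the balanced sample gives D of about 2.08 and the singleton-rich sample about -0.75, illustrating how intermediate-frequency variation of the kind seen in female controls yields large positive values.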

Discussion
Although G6PD A− deficiency is known to protect against severe malaria in African populations, the underlying genetic mechanisms are not well understood. P. falciparum development is hindered in G6PD-deficient red cells [24], slowing the rate of parasite replication and reducing the likelihood of severe disease. Suggested mechanisms include more efficient clearance of infected erythrocytes [25], lower abundance of P. falciparum 6-phosphogluconolactonase mRNA in parasites from G6PD-deficient children [26], and impaired parasite replication [27]. By analysing the largest set of SNPs in G6PD and surrounding loci (n = 68) used in a genetic association study, within a Tanzanian case-control setting, we have established a set of new G6PD alleles associated with protection. These SNPs need to be investigated further to assess their effect on enzyme function, in light of the potential use of primaquine for malaria elimination. After validation, they may be used to identify G6PD-deficient individuals in studies of primaquine efficacy. Further, we have shown that the protective effect of G6PD deficiency is limited to female heterozygotes. This is entirely consistent with heterozygote advantage and balancing selection, which keep alleles at modest frequency and prevent their fixation: the protection that G6PD deficiency provides against severe malaria is offset by an increased risk of life-threatening complications, such as neonatal jaundice and haemolytic crises. In female heterozygotes, random inactivation of one of the two X chromosomes results in some red cells with normal enzyme and others with mutant enzyme [11,28,29], reducing the risk of both anaemia and severe malaria. We expect the fitness of normal male hemizygotes to equal that of normal female homozygotes (since all their red cells contain fully functional enzyme), and population genetic theory also suggests that the fitness of G6PD-deficient male hemizygotes equals that of G6PD-deficient female homozygotes.
Under these conditions, the female heterozygote is expected to be the genotype with the highest fitness [22]. Two independent studies [8,18] in two different populations, nearly 40 years apart, are consistent in this regard, showing G6PD A− deficiency to be a balanced polymorphism with heterozygote advantage. Similarly, as G6PD A− deficiency has been estimated to be at least 5000 years old [3], balancing selection would account for it not having gone to fixation [22]. Further, balancing selection has been observed at autosomal malaria candidate regions such as FREM3, the major histocompatibility complex and the sickle-cell locus [23]. Hitherto, there has been much uncertainty about the relationship between G6PD status and susceptibility to malaria, due in part to G6PD and malaria phenotypic complexity and misclassification, and potentially also to the genetic complexity of the G6PD locus, where multiple functional SNPs may each separately modify an individual's enzyme status and susceptibility to malaria. Until very recently, almost all large association studies genotyped a limited number of G6PD SNPs (e.g. G6PD202 / G6PD376 for A− deficiency), and this approach has been too blunt to capture the full picture. However, analysis of 58 G6PD SNPs has demonstrated major haplotypic differences between sympatric ethnic groups in Mali [9], and genotyping of the G6PD968 polymorphism in addition to 202/376 revealed a protective effect in females in a Gambian population [8]. With hindsight, it is clear that genotyping of G6PD968 in another study in the same population [14] would have prevented misclassification of two-thirds of the G6PD-deficient samples and the erroneous reporting of a male hemizygous protective effect. Other studies reporting male hemizygous protective effects may also be confounded by allelic heterogeneity, which could be avoided by more comprehensive genotyping and by phenotypic testing of G6PD enzyme activity.
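The argument that female heterozygote advantage alone can maintain an X-linked polymorphism can be checked with a toy deterministic model of X-linked viability selection. In this sketch, heterozygous females have fitness 1, normal genotypes (NN females, N males) pay a hypothetical malaria cost, and deficient homozygous females and hemizygous males pay a hypothetical deficiency cost, following the fitness equivalences assumed in the paragraph above. The cost values are illustrative assumptions, not estimates from this study, and the function names are hypothetical.

```python
def x_linked_generation(qf, qm, w_female, w_male):
    """Advance one generation of deterministic X-linked viability selection.

    qf, qm: frequency of the deficiency allele d among adult females and males.
    w_female: fitnesses of female genotypes (NN, Nd, dd).
    w_male: fitnesses of male genotypes (N, d).
    """
    pf, pm = 1 - qf, 1 - qm
    # Daughters inherit one X from each parent (random mating).
    f_nn, f_nd, f_dd = pf * pm, pf * qm + qf * pm, qf * qm
    w_bar_f = f_nn * w_female[0] + f_nd * w_female[1] + f_dd * w_female[2]
    qf_next = (0.5 * f_nd * w_female[1] + f_dd * w_female[2]) / w_bar_f
    # Sons inherit their single X from their mother.
    w_bar_m = pf * w_male[0] + qf * w_male[1]
    qm_next = qf * w_male[1] / w_bar_m
    return qf_next, qm_next


def simulate(q0, cost_normal, cost_deficient, generations=2000):
    """Iterate the model from starting frequency q0 in both sexes.

    Hypothetical fitnesses: heterozygous females 1; NN females and N males
    1 - cost_normal; dd females and d males 1 - cost_deficient.
    """
    w_female = (1 - cost_normal, 1.0, 1 - cost_deficient)
    w_male = (1 - cost_normal, 1 - cost_deficient)
    qf = qm = q0
    for _ in range(generations):
        qf, qm = x_linked_generation(qf, qm, w_female, w_male)
    return qf, qm


if __name__ == "__main__":
    # With a heterozygote advantage, a rare deficiency allele spreads and then stabilises.
    print(simulate(0.01, cost_normal=0.1, cost_deficient=0.2))
    # Without it (normal genotypes pay no cost), the allele is purged.
    print(simulate(0.3, cost_normal=0.0, cost_deficient=0.2))
```

With these illustrative costs, the deficiency allele settles at an intermediate frequency (roughly 0.16 in females) instead of being lost or fixed; when normal genotypes pay no malaria cost, the same allele is driven towards zero. This mirrors the balancing-selection reasoning in the text, but only under the stated toy assumptions.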
A comprehensive study would include a full genetic survey of the G6PD and surrounding regions, with multiple populations and ethnic groups, leading to a more complete map of G6PD that would guide future evolutionary and association studies.
A surprising association result is that the G6PD376 mutation is potentially more influential than G6PD202: haplotypes that contain 376G, with or without the 202A mutation, appear to have a similar protective effect in heterozygotes. The 202A mutation is thought to have a more severe effect on enzyme function than the 376G mutation (~12% and ~83% of normal function, respectively [10,30]), and coexistence of 202A/376G is responsible for G6PD A− enzyme deficiency [20], but it is possible that more subtle changes in enzyme structure or function also affect the outcome of malaria infection. Fully understanding the role of G6PD requires further correlation of enzymatic activity with full sequences of G6PD and surrounding loci, set within large severe malaria case-control studies. There have been no such studies to date. A recent study examined four G6PD deficiency polymorphisms (202, 376, 968, Ilesha) and the associated enzymatic activities for 110 sequenced genes in African Americans [31], but included only 54 heterozygous females. Enzymatic activity for the G6PD376G (A+, n = 28), 376G/202A (A− deficiency, n = 23), 376G/968C (A−, n = 1), 376G/202A/968C (A−, n = 1) and Ilesha (E156K, Nigeria, non-A−, n = 1) alleles was estimated to be ~83%, ~53%, ~58%, ~11% and ~75% of normal, respectively. These results are consistent with deficiency increasing with each additional A−-related polymorphism, which by implication may change levels of protection or susceptibility to malaria. Another recent study [32] of 1,828 Kenyan children suggested that G6PD202 was responsible for the majority of G6PD enzyme deficiency but that 376G increases the risk of deficiency in 202AG heterozygotes. Neither study considered malaria outcomes.
In summary, through a much better understanding of the true extent of genetic diversity within and around the G6PD locus, we have identified alleles associated with protection from severe malaria in Tanzania, driven by a balancing heterozygote advantage mechanism. Further work should extend the mapping of diversity at this genomic region, identify how the mutations found relate to enzyme function, and establish how they vary between regions and ethnic groups. In doing so, genetic epidemiological studies are likely to provide robust and repeatable data, which may be used to develop interventions and improve malaria control.

Study participants
The study was conducted in the Teule district hospital and surrounding villages in Muheza district, Tanga region. In this region, mortality in children under 5 years of age is 165 per 1000 (Tanzanian census 2002) and transmission of P. falciparum malaria is intense (50-700 infected bites/person/year) and perennial, with two seasonal peaks [17]. The community prevalence of P. falciparum parasites in children aged 2-5 years in the study area was recorded as 88.2% in 2002 [17].
Severe malaria cases (n = 506), aged six months to ten years, were recruited during a one-year period between June 2006 and May 2007. Cases had patent parasitaemia and fulfilled any one of the following eligibility criteria: a history of two or more convulsions in the last 24 hours; prostration (unable to sit unsupported if <9 months of age, or unable to drink at any age); reduced consciousness (Blantyre coma score < 5); respiratory distress; jaundice; severe anaemia (HemoCue Hb < 5 g/dL); acidosis (blood lactate ≥ 5 mmol/L); or hypoglycaemia (blood glucose < 2.5 mmol/L). Cases were defined as having cerebral malaria if their Blantyre coma score was less than or equal to 3 on presentation or early during admission. Participants with coexisting severe or chronic medical conditions unrelated to a severe malarial infection (e.g. bacterial pneumonia, kwashiorkor) were excluded. All cases were confirmed as having P. falciparum malaria parasites: infection was initially assessed by rapid diagnostic test (HRP-2-Parascreen Pan/Pf) and confirmed by double-read Giemsa-stained thick blood films. The residence and ethnic group of both parents were recorded from information provided by each child's caregiver [17].
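The case definition above is a simple decision rule: patent parasitaemia plus any one clinical or laboratory criterion, with cerebral malaria as a subset defined by the Blantyre coma score. The sketch below encodes only those rules; age limits, comorbidity exclusions, the prostration age qualifier and the "early during admission" timing are not modelled. The lactate threshold is read as ≥ 5 mmol/L, and the record type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Hypothetical record of one child's findings at admission."""
    patent_parasitaemia: bool
    convulsions_last_24h: int
    prostrated: bool
    blantyre_coma_score: int      # 0 (deepest coma) to 5 (normal)
    respiratory_distress: bool
    jaundiced: bool
    haemoglobin_g_dl: float       # HemoCue reading
    lactate_mmol_l: float
    glucose_mmol_l: float


def meets_severe_malaria_criteria(p: Presentation) -> bool:
    """Patent parasitaemia plus any one of the listed criteria."""
    if not p.patent_parasitaemia:
        return False
    return any([
        p.convulsions_last_24h >= 2,   # two or more convulsions in 24 h
        p.prostrated,
        p.blantyre_coma_score < 5,     # reduced consciousness
        p.respiratory_distress,
        p.jaundiced,
        p.haemoglobin_g_dl < 5.0,      # severe anaemia
        p.lactate_mmol_l >= 5.0,       # acidosis
        p.glucose_mmol_l < 2.5,        # hypoglycaemia
    ])


def is_cerebral_malaria(p: Presentation) -> bool:
    """Severe malaria case with a Blantyre coma score of 3 or less."""
    return meets_severe_malaria_criteria(p) and p.blantyre_coma_score <= 3
```

For example, a parasitaemic child whose only abnormal finding is a Blantyre coma score of 4 qualifies as a severe malaria case but not as cerebral malaria, whereas a score of 3 meets both definitions.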
Controls (n = 477) were recruited, matched on ward of residence, ethnicity and age, using household lists during a four-week period in August 2008. Study participants resided in 33 geographical wards (including Mtindiro 9.6%, Kwafungo 8.5%, Mkata 6.3%, Kwedizinga 6.0%, others each < 5.0%) surrounding Muheza town in the Tanga region. The participants had a median age of ~2.6 years and were predominantly from seven ethnic groups (see Table 1). Because of the limited sample size, we did not perform a detailed analysis by ethnic group or ward of residence.
To complement the case-control collection, we collected samples anonymously from 60 healthy parent-child trios (120 parents, 60 children) during 2007 and 2008 from lowland villages near the West Usambara mountains in the Tanga region of Tanzania, an area with medium to high levels of malaria transmission. No malaria phenotypic data are available for these individuals, but their genotypic profiles were used to validate the genetic aspects of the case-control study.

Sample collection and preparation
Approximately 3 ml of venous blood was collected from participants into EDTA vacutainers. A blood film was prepared and haemoglobin levels were measured by HemoCue. Children in the control group with haemoglobin levels < 11 g/dL were referred to the nearest health facility; those with a positive blood film were treated in line with Tanzanian national treatment guidelines and excluded from the genetic analysis. Samples were centrifuged at 5000 rpm for 5 minutes, and the plasma was removed and stored for future analysis. DNA was extracted and purified from the blood cell pellet using a Nucleon kit (see [17] for details).

Sample genotyping
Genomic DNA samples were genotyped on a Sequenom MassArray genotyping platform [17,33]. The iPlex genotyping assays included 68 G6PD single nucleotide polymorphism (SNP) positions (identified through resequencing and the 1000 Genomes Project, as described in [9,32]), HbS (rs334), HbC (rs33930165), HbE (rs33950507), and two SNPs that allow the ABO blood group to be estimated (rs8176719, rs8176746). In particular, the rs8176719 derived allele (deletion, D) results in a non-functional enzyme, so group O individuals are DD, while non-O individuals are either II or ID. In addition, rs8176746 is involved in the enzyme's substrate selection and therefore distinguishes the A and B blood groups. A full list of SNPs can be found in S1 Table. The α3.7-thalassaemia deletion was typed separately by PCR [17].

Statistical analysis
All analyses involving SNPs were stratified by gender. Genotypic deviations from Hardy-Weinberg equilibrium (HWE) in females were assessed using a chi-square test. SNPs were excluded from analysis if at least 10% of genotype calls were missing, if more than 2% of male genotype calls were heterozygous (an impossible, and therefore erroneous, call at a hemizygous X-linked locus), or if there was a distortion from HWE in female controls (HWE chi-square P<0.00001) [9]. On this basis, 6 SNPs were excluded (rs766419, rs743545, rs743548, b36_153424319, rs2472393, b36_153426354). A further 33 SNPs with minor allele frequency below 1% were also removed, leaving 29 high-quality SNPs for association analysis (listed in S2 Table). These 29 SNPs are located in a genomic region with known regulatory capacity: transcription factor binding and DNase peaks (regulomedb.org), and both promoter and enhancer histone marks, with a number of different binding proteins and regulatory motif changes (compbio.mit.edu/HaploReg).
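The SNP filters above combine a one-degree-of-freedom chi-square test of Hardy-Weinberg equilibrium in females with thresholds on missingness, spurious male heterozygosity and minor allele frequency. A minimal sketch, assuming per-SNP summary statistics have already been computed; the function names are hypothetical, and this is not the authors' pipeline (which was run in R).

```python
import math


def hwe_chisq_p(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium for female genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no evidence against HWE
    observed = (n_hom_ref, n_het, n_hom_alt)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2))


def passes_qc(missing_frac, male_het_frac, hwe_p_female_controls, maf):
    """Apply the exclusion thresholds described in the text to one SNP."""
    return (missing_frac < 0.10                 # fewer than 10% missing calls
            and male_het_frac <= 0.02           # at most 2% (impossible) male heterozygotes
            and hwe_p_female_controls >= 1e-5   # no strong HWE distortion in female controls
            and maf >= 0.01)                    # minor allele frequency of at least 1%
```

Only females enter the HWE test because hemizygous males carry a single X allele and have no heterozygous class; for the same reason, any heterozygous male call flags a genotyping problem.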
Case-control association analysis using SNP alleles or genotypes was undertaken within a logistic regression framework, with age and ethnic group as covariates. We modelled each SNP under several related genetic models (additive, dominant, recessive, heterozygous advantage and general) and reported the minimum p-value from these correlated tests. Epistatic effects between polymorphisms were assessed by including statistical interaction terms in these models. Female haplotypes were inferred from genotypes using an expectation-maximization algorithm [34], and haplotype association testing was performed using the regression models [34]. Linkage disequilibrium was estimated using the pairwise D' (and r²) metrics [35]. Performing multiple statistical tests inflates the rate of false positives. A Bonferroni correction would be too conservative because all SNPs lie in the same gene and are highly correlated, so the tests are not independent. A permutation approach that accounted for the correlation between tests estimated that a p-value cut-off of 0.006 would ensure a global significance level of 5%. All analyses were performed using the R statistical software, with the haplo.stats package used for haplotype analysis. Tajima's D was used to quantify evidence of balancing selection based on the allele frequency spectrum [36]. A negative Tajima's D indicates purifying selection and/or population size expansion, while positive values may indicate balancing selection. Values greater than +2 or less than -2 are likely to be significant [36].
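The genetic models listed above differ only in how a genotype is turned into regression covariates. A minimal sketch of the conventional codings, assuming female genotypes are given as 0, 1 or 2 copies of the minor allele and males as carrying 0 or 1 copy; the function names are hypothetical. The resulting columns would then enter a logistic regression alongside age and ethnic group (for example with R's `glm` and `family = binomial`), which is not shown here.

```python
def female_covariates(minor_allele_copies, model):
    """Covariate(s) for a female genotype under a given genetic model."""
    g = minor_allele_copies
    if g not in (0, 1, 2):
        raise ValueError("female genotype must have 0, 1 or 2 minor alleles")
    if model == "additive":       # per-allele trend
        return [g]
    if model == "dominant":       # any copy of the minor allele
        return [int(g >= 1)]
    if model == "recessive":      # two copies only
        return [int(g == 2)]
    if model == "heterozygous":   # heterozygote advantage: heterozygotes only
        return [int(g == 1)]
    if model == "general":        # separate heterozygote and homozygote effects
        return [int(g == 1), int(g == 2)]
    raise ValueError(f"unknown model: {model}")


def male_covariates(carries_minor_allele):
    """Hemizygous males carry 0 or 1 copy, so every model reduces to one indicator."""
    return [int(bool(carries_minor_allele))]


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive", "heterozygous", "general"):
        print(model, [female_covariates(g, model) for g in (0, 1, 2)])
```

Only the heterozygous-advantage and general codings can separate a protective heterozygote from an unprotected homozygote, which is why stratifying by gender and including these models matters for an X-linked locus.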

Ethics
All DNA samples were collected and genotyped following signed, informed written consent from a parent or guardian. Ethics approval for all procedures was obtained from both LSHTM (#2087) and the Tanzanian National Institute of Medical Research (NIMR/HQ/R.8a/Vol.IX/392).