Case-Control Approach to Identify Plasmodium falciparum Polymorphisms Associated with Severe Malaria

Background
Studies to identify phenotypically-associated polymorphisms in the Plasmodium falciparum 23 Mb genome will require a dense array of marker loci. It was considered promising to undertake initial allelic association studies to prospect for virulence polymorphisms in Thailand, as the low endemicity would allow higher levels of linkage disequilibrium (LD) than would exist in more highly endemic areas.
Methodology/Principal Findings
LD was first assessed in a sample of 100 P. falciparum isolates with 11 microsatellite loci widely dispersed in the parasite genome, and with 16 microsatellite loci covering a ∼140 kb region of chromosome 2 (an arbitrarily representative non-telomeric part of the genome). The dispersed loci showed minimal LD (standardised index of association, I_A^S = 0.013, P = 0.10), while those on chromosome 2 showed significant LD values, mostly between loci <5 kb apart. A disease association study was then performed comparing parasites in 113 severe malaria cases and 245 mild malaria controls. Genotyping was performed on almost all polymorphisms in the binding domains of three erythrocyte binding antigens (eba175, eba140 and eba181), and on repeat sequence polymorphisms ∼2 kb apart in each of three reticulocyte binding homologues (Rh1, Rh2a/b, and Rh4). Differences between cases and controls were seen for (i) codons 388-390 in eba175, and (ii) a repeat sequence centred on Rh1 codon 667.
Conclusions/Significance
Allelic association studies on P. falciparum require dense genotypic markers, even in a population of only moderate endemicity that has more extensive LD than highly endemic populations. Disease-associated polymorphisms in the eba175 and Rh1 genes encode differences in the middle of previously characterised erythrocyte binding domains, marking these for further investigation.


Introduction
A wide and unexplained spectrum of clinical disease caused by the malaria parasite Plasmodium falciparum is responsible for approximately one million deaths each year, mostly in African children in highly endemic populations [1] but also in adults in areas of lower endemicity [2]. Experimental studies have elegantly demonstrated polymorphic virulence traits within rodent malaria parasite species [3][4][5], and such virulence polymorphism is theoretically expected in P. falciparum [6]. Genes encoding proteins that interact with host cell receptors and immune responses are candidates as virulence determinants. There are occasional reports of allele frequency differences between severe and mild malaria isolates, for commonly genotyped polymorphic antigen loci [7][8][9][10][11][12][13], although associations differ among studies and it is possible that studies showing no differences are unreported. Some attention has focused on sequence motifs in variable multi-copy (var) genes that determine infected erythrocyte adhesion phenotypes and are clustered near the telomeres of most of the 14 chromosomes and subject to ectopic recombination and re-positioning [14,15]. The extent of genetic polymorphism throughout the rest of the genome that displays more conventional Mendelian inheritance [16] is now being more effectively surveyed [17][18][19]. Given the imminent potential of large scale genotyping, there is a need to address whether a genome wide allelic association approach is feasible to identify loci in P. falciparum which affect the severity of malaria [20].
An important consideration is that a spectrum of P. falciparum genetic population structures exists, depending on endemic infection levels that determine the amount of mixing between unrelated male and female gametocytes and, consequently, the effective recombination rate [21]. The highest effective recombination rates (and thereby lowest levels of linkage disequilibrium, LD) are in endemic areas of Africa where multiple genotype infections are common, while intermediate rates typically occur in Southeast Asia, and South American populations of lowest endemicity tend to have a more 'clonal' structure with occasional recombination [21][22][23][24]. On top of this inter-continental trend, there is a variety of population structures in the diverse endemic foci of Southeast Asia [21,25] and South America [21,24], and evidence that continental Africa may contain some populations with a relatively low effective recombination rate [22,26].
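The link between gametocyte mixing and the extent of LD can be illustrated with the textbook two-locus model, in which the LD coefficient decays as D_t = D_0(1 - r)^t per generation of random mating. The sketch below is a simplification and not a model used in this study: it assumes that only matings between genetically distinct gametocytes break up haplotypes, so the effective rate is taken as r_eff = r(1 - f), where f is a hypothetical proportion of selfed (same-clone) matings.

```python
import math


def effective_recombination(r: float, selfing_fraction: float) -> float:
    """Simplified effective recombination rate (ASSUMPTION): recombination
    between identical genomes produces no new haplotypes, so only the
    non-selfed fraction of matings counts."""
    return r * (1.0 - selfing_fraction)


def ld_after(d0: float, r_eff: float, generations: int) -> float:
    """Two-locus decay of the LD coefficient: D_t = D_0 * (1 - r_eff)**t."""
    if not 0.0 <= r_eff <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - r_eff) ** generations


def half_life(r_eff: float) -> float:
    """Number of generations for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - r_eff)


# Same physical distance (r = 0.01), different hypothetical selfing fractions:
# more selfing means slower decay, i.e. more extensive LD.
for selfing in (0.2, 0.9):
    r_eff = effective_recombination(0.01, selfing)
    print(f"selfing={selfing}: r_eff={r_eff:.4f}, half-life={half_life(r_eff):.0f} generations")
```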
Because of this spectrum, despite limited knowledge of variation in LD throughout the P. falciparum genome [27], it could be useful to first screen for allelic associations in a population of low to moderate endemicity (in which LD will tend to be more extensive), and subsequently attempt confirmation and finer mapping in a more highly endemic population with less LD. Thailand is a country mostly free of malaria, but it still contains some areas of moderate endemicity from which large numbers of severe as well as mild malaria cases have been admitted into well managed hospital facilities [28].
Here, to investigate the potential for allelic association studies in Thailand, an analysis of linkage disequilibrium was first undertaken with a sample of 100 P. falciparum clinical isolates, using one set of microsatellite loci widely distributed in the genome, and another set clustered within a ~140 kb region of one chromosome. Results show that LD is weak and rarely significant at distances above ~5 kb, so allelic association requires genotype information from within genes that contain causally relevant variants. The second phase of the study then involved a disease association analysis comparing a sample of 113 severe malaria cases and 245 mild malaria controls for polymorphisms in six candidate merozoite stage genes (three eba and three Rh genes) previously shown to be important in determining erythrocyte invasion phenotypes in vitro [29] and thus potentially related to parasite growth and virulence in vivo.

Ethics Statement
The proposal was approved by the Ethics Review Committee of the Faculty of Tropical Medicine, Mahidol University. Participants gave written informed consent to provide a blood sample for studies including analysis of parasite DNA.

P. falciparum DNA samples from malaria cases
Patients with P. falciparum malaria presented at the Hospital for Tropical Diseases, Mahidol University, Bangkok. For the initial study of linkage disequilibrium, blood samples were selected from 100 patients who presented in 1999 with uncomplicated malaria, having acquired infection in the Thai-Myanmar border region. For the severe malaria case-control study, 113 patients with severe malaria (as defined by WHO criteria) who were admitted to the Hospital for Tropical Diseases, and 245 patients with uncomplicated non-severe malaria (termed 'mild malaria'), were recruited in 2002 and 2003. Mild malaria patients were recruited either from the ward or from the outpatient department of the Hospital for Tropical Diseases, and were matched to a severe case by residential location. Exclusion criteria included pregnancy, mixed malaria species infection, HIV infection, and reported prior treatment with any antimalarial drug. After consent, 5 ml of venous blood was drawn from each participant, of which 1 ml was collected in an EDTA tube and stored at −20 °C prior to extraction of DNA using QIAamp DNA Blood Mini Kits (Qiagen).

Analysis of multi-locus LD
For each locus in each isolate, a single allele (the only allele detected, or the predominant allele in an isolate containing more than one genotype) was counted. For the analysis of multi-locus LD among the 11 widely separated microsatellites, the standardised index of association (I_A^S) was calculated and tested for departure from random allelic association using LIAN 3.0 software [31]. Multiple-clone isolates containing more than one allele at any locus were not included in this analysis. For the 16 chr 2 microsatellites, the D′ pairwise index of linkage disequilibrium was calculated using alleles that had a frequency of >0.1, and mean values were derived for each pair of loci with multiple alleles. Data on all isolates were incorporated, but individual pairwise genotype data points were excluded if both loci had mixed alleles in a given isolate. The relationship between LD and distance between nucleotide sites was plotted, and the proportion of pairs of loci that had statistically significant LD was compared for increasing distance categories by Fisher's exact test.
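A minimal sketch of the pairwise D′ summary described above, assuming single-allele calls at both loci for each isolate (mixed calls already excluded). Each multi-allelic locus is reduced to "allele i versus all others", Lewontin's D′ is computed for each pair of common alleles (frequency >0.1), and the absolute values are averaged. Taking absolute values and this exact averaging scheme are assumptions, and the function names are illustrative only.

```python
from collections import Counter
from itertools import product
from typing import Hashable, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]  # (allele at locus 1, allele at locus 2) in one isolate


def d_prime(pairs: Sequence[Pair], allele_1: Hashable, allele_2: Hashable) -> float:
    """Lewontin's D' for allele_1 at locus 1 vs allele_2 at locus 2 (present/absent)."""
    n = len(pairs)
    p1 = sum(a == allele_1 for a, _ in pairs) / n
    p2 = sum(b == allele_2 for _, b in pairs) / n
    p12 = sum(a == allele_1 and b == allele_2 for a, b in pairs) / n
    d = p12 - p1 * p2
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    return 0.0 if d_max == 0 else d / d_max


def mean_abs_d_prime(pairs: Sequence[Pair], min_freq: float = 0.1) -> float:
    """Mean |D'| over all pairs of alleles with frequency > min_freq at each locus."""
    n = len(pairs)
    common_1 = [a for a, c in Counter(a for a, _ in pairs).items() if c / n > min_freq]
    common_2 = [b for b, c in Counter(b for _, b in pairs).items() if c / n > min_freq]
    values = [abs(d_prime(pairs, a, b)) for a, b in product(common_1, common_2)]
    return sum(values) / len(values) if values else float("nan")


complete_ld = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
no_ld = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(mean_abs_d_prime(complete_ld))  # -> 1.0
print(mean_abs_d_prime(no_ld))        # -> 0.0
```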

Genotyping polymorphisms in the eba and Rh genes
Polymorphic sites in eba175 (chr 7), eba140 (chr 13), and eba181 (chr 1) were studied (Figure 2A), including 11 loci in eba175 (9 separate SNPs, a pair of closely situated SNPs, and a run of SNPs next to an indel), 4 SNP loci in eba140 and 3 SNP loci in eba181. These were genotyped by allele sequence-specific oligonucleotide probing of PCR products (PCR-SSOP), following methods described for other loci [22] (specific primer and probe sequences and conditions are given in Supplementary Table S2). A set of polymorphic simple sequence repeat loci was chosen from allele sequence alignments of the Rh1 gene (chr 4), the identical part of the adjacent Rh2a and Rh2b genes (chr 13), and the Rh4 gene (chr 4) (Figure 3A). Oligonucleotide primers and fluorescent labels used are listed in Supplementary Table S3. Size standards (500 LIZ®, Applied Biosystems, UK) were run together with PCR products on an ABI PRISM™ 3730 Genetic Analyzer, and the GeneMapper™ 3.0 program (Applied Biosystems, UK) was used for automated measurement of allele length and peak height.
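Counting a single allele per locus per isolate (the only allele detected, or the predominant one) can be sketched from sizing output as follows. This assumes the peak table for one locus in one isolate has already been parsed into a mapping from allele length (bp) to peak height; the `minor_ratio` cut-off used to flag a second genotype is a hypothetical value, not one given in the text.

```python
from typing import Dict, Optional, Tuple


def call_predominant_allele(
    peaks: Dict[int, float], minor_ratio: float = 0.33
) -> Tuple[Optional[int], bool]:
    """Return (allele_length_bp, is_mixed) for one locus in one isolate.

    peaks maps allele length in bp to peak height. The allele with the tallest
    peak is counted; the isolate is flagged as mixed if any other peak reaches
    minor_ratio of the tallest one (HYPOTHETICAL threshold).
    """
    if not peaks:
        return None, False  # no amplification: treat as missing data
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    major_len, major_height = ranked[0]
    is_mixed = any(height >= minor_ratio * major_height for _, height in ranked[1:])
    return major_len, is_mixed


print(call_predominant_allele({99: 1200.0, 105: 300.0}))  # single clone at this threshold
print(call_predominant_allele({99: 1200.0, 105: 800.0}))  # mixed, 99 bp predominant
```

The mixed flag matters because the text uses predominant alleles for allele-frequency comparisons but excludes mixed isolates from haplotype-based analyses.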

Multi-locus LD analysis of microsatellites on different chromosomes
Complete genotype data for 11 widely separated microsatellite loci (Figure 1A) were obtained for 98 of 100 Thai P. falciparum isolates tested, with numbers of alleles per locus ranging from 4 in Pfg377 up to 17 in Polya (allele frequencies are shown in Supplementary Figure S1). Fifty of these isolates had single alleles at each locus, indicating unmixed haploid genotype infections; these were suitable for analysis of multilocus linkage disequilibrium, as they exclude the possibility of scoring a false haplotype from the presence of mixed genotypes. The standardised index of association (I_A^S, a multi-locus test of LD among all 11 loci) was low, I_A^S = 0.013, and not significantly different from the value of zero expected under random association (P = 0.1), indicating very little LD among unlinked loci.
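For reference, I_A^S compares the observed variance V_D of the number of loci at which pairs of isolates differ with the variance V_E expected if loci were independent: I_A^S = (V_D/V_E - 1)/(L - 1) for L loci. The pure-Python sketch below follows that definition for single-clone haplotypes; it is not LIAN itself, and LIAN's exact variance conventions and its significance test are not reproduced.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def standardized_ia(haplotypes: Sequence[Sequence[Hashable]]) -> float:
    """I_A^S = (V_D / V_E - 1) / (L - 1) for single-clone multilocus haplotypes."""
    n, n_loci = len(haplotypes), len(haplotypes[0])
    if n < 2 or n_loci < 2:
        raise ValueError("need at least 2 haplotypes and 2 loci")

    # V_D: observed variance of the number of loci at which each pair of isolates differs.
    dists = [sum(a != b for a, b in zip(h1, h2)) for h1, h2 in combinations(haplotypes, 2)]
    mean_d = sum(dists) / len(dists)
    v_d = sum((d - mean_d) ** 2 for d in dists) / len(dists)

    # V_E: variance expected under linkage equilibrium, sum of h_j * (1 - h_j),
    # where h_j is the (sample-size corrected) gene diversity at locus j.
    v_e = 0.0
    for j in range(n_loci):
        counts = Counter(h[j] for h in haplotypes)
        h_j = n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))
        v_e += h_j * (1.0 - h_j)
    if v_e == 0:
        raise ValueError("all loci are monomorphic")
    return (v_d / v_e - 1.0) / (n_loci - 1)


# Two loci whose alleles always travel together (complete LD) versus all combinations.
linked = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
unlinked = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(standardized_ia(linked), standardized_ia(unlinked))
```

Values near zero, as reported above for the 11 unlinked loci, indicate little multilocus LD; in very small samples the estimate can be negative.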
LD among microsatellite loci in a 140 kb region of chr 2
The next analysis of LD in the same set of 100 Thai isolates focused on a representative non-telomeric part of the genome for which a dense set of microsatellite markers could be identified (as described in the Methods section). For 16 microsatellite loci within a 140 kb region of chr 2 (Figure 1B), numbers of alleles per locus ranged from 2 (M3596) to 20 (M6554) (allele frequencies are shown in Supplementary Figure S2). For each of the 120 pairs among the 16 loci, the D′ linkage disequilibrium index was plotted (using the values for each pair of alleles that had frequencies of >0.1, and taking the mean wherever there were multiple alleles at either or both loci). As expected, the strength and significance of linkage disequilibrium were inversely related to physical distance (Figure 1C). The 120 pairs of loci were ordered into four equal-sized categories (n = 30) representing the quartiles of pairwise distance (<4.5 kb, 4.5-29 kb, 29-68 kb, and >68 kb), revealing a highly significant decline in the proportion of pairs with significant LD throughout the distance range (Figure 1D). The proportion of significant LD was much higher in the smallest distance category of <4.5 kb (93%) than in the next category of 4.5-29 kb (63%, P = 0.0051). Moreover, the proportion of significant tests in the 4.5-29 kb category was much higher than in the next category of 29-68 kb (30%, P = 0.0097), and the latter proportion was higher than in the >68 kb category (3%, P = 0.0061).
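The distance-category comparisons can be reproduced approximately with a one-sided Fisher's exact test. The counts below are reconstructed from the reported percentages of 30 pairs per category (28, 19, 9 and 1 significant pairs), which is an assumption, and the text does not state whether one- or two-sided tests were used. With these counts, the first and last comparisons come out close to the reported P = 0.0051 and P = 0.0061.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact P for the table [[a, b], [c, d]]: probability,
    with all margins fixed, of a count in cell a at least as large as observed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    # Hypergeometric upper tail; math.comb returns 0 when k > n.
    return sum(
        comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    ) / total


# (significant, non-significant) pairs per distance category, reconstructed
# from the reported percentages of n = 30 pairs (ASSUMPTION).
categories = {"<4.5 kb": (28, 2), "4.5-29 kb": (19, 11), "29-68 kb": (9, 21), ">68 kb": (1, 29)}
labels = list(categories)
for near, far in zip(labels, labels[1:]):
    (a, b), (c, d) = categories[near], categories[far]
    print(f"{near} vs {far}: one-sided P = {fisher_one_sided(a, b, c, d):.4f}")
```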
Allelic association of candidate genes in a severe malaria case-control study
As the average gene density in the P. falciparum genome is one gene every ~4.5 kb [32], LD extending beyond individual genes will generally be very weak, so genome-wide approaches will need adequate markers within every gene. The next stage of the present study focused on polymorphisms within a set of candidate virulence genes. Six genes encoding merozoite ligands that use alternative receptors for erythrocyte invasion were investigated in a case-control study of severe malaria (n = 113 cases) compared with mild malaria (n = 245 controls). For three eba genes (eba175, eba140 and eba181), most of the previously identified polymorphisms encoding amino acid changes within the erythrocyte binding domain (Region II) of each protein were genotyped (Figure 2A). For three Rh genes (Rh1, Rh2a/b, and Rh4), microsatellite loci within the intron and coding sequence of each gene were identified and genotyped (Figure 3A). A single allele per locus per isolate (the only detectable allele, or the most abundant allele in isolates containing more than one genotype) was counted for the analysis comparing frequencies between severe and mild malaria groups.
The case and control groups were well matched for most variables, but there were frequency differences between the groups in gender, year of recruitment, and history of previous malaria (Table 1), so these variables were incorporated in a logistic regression analysis with any polymorphisms that emerged as different between the groups in a univariate analysis (Tables 2 and 3). After this adjustment, there were significant differences between severe and mild malaria groups for one polymorphism in eba175 (Figure 2B and Table 2) and one in Rh1 (Figure 3B and Table 3). The eba175 allele KP at codons 388 and 390 was significantly more common in severe than in mild malaria (45% versus 32%), with an odds ratio (OR) of 1.93 (95% CI, 1.18-3.17) compared to the common allele (P = 0.009). The 99 bp allele in the polymorphic repeat centred on codon 667 of Rh1 was more common in severe than in mild malaria (12% versus 4%), with an OR of 3.14 (95% CI, 1.20-8.19) compared to the most common allele (P = 0.02). In the 5′ region of Rh2a/b there were alleles with borderline significant univariate associations with severe malaria (in the intron and in the microsatellite centred on codon 747), but these did not remain significant after adjustment for potential confounding in the logistic regression analysis (Table 3). The allele frequencies for all other polymorphisms in these genes and in eba140, eba181 and Rh4 were similar between the groups.
Figure 3 (caption, partially recovered): The Rh2 locus contains two adjacent genes (a and b) with most sequence in common, and the two polymorphisms typed here are in the region with haplotypes shared between the genes by frequent gene conversion (the region specific to locus a or b was not investigated). B. Frequencies of the alleles in severe malaria cases (rare alleles with a frequency <10% in this group are not shown) and mild malaria controls. Significant differences between the groups are marked with an asterisk (*).
(Figure 3: doi:10.1371/journal.pone.0005454.g003)
The two significant associations were examined further. In eba175, codons 388/390 KP are always present with particular alleles at flanking polymorphisms, consistent with a recent sequence analysis in which an indel at codons 401-402 (IS/-) was in complete LD with 336 D, 388/390 KP, and 403-405 ENK in Thailand [33], but not in a Kenyan [33] or Nigerian [34] population. Here, haplotypes were derived from genotypes of single-clone infection isolates (excluding any isolates with mixed alleles). After adjusting for potential confounding, the most common codon 336-405 haplotype, DKPISENK, was over-represented among severe malaria cases (40%, 41/102) compared with mild malaria controls (30%, 69/231), with an OR of 1.68 (95% CI, 1.01-2.81; P = 0.047 compared with other haplotypes combined). Frequencies of two other common haplotypes, DNS-KKM and YNSISKNK, did not differ between the severe and mild malaria groups. In Rh1, the polymorphism centred on codon 667 is not in strong LD with the flanking polymorphisms typed ~2 kb upstream (intron repeat) and ~2 kb downstream (repeat centred on codon 1309).
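The adjustment described above is a standard logistic regression of case status on an allele or haplotype indicator plus covariates. The sketch below fits such a model by Newton-Raphson in pure Python; the function names are illustrative, and in practice a statistics package would be used. As a check, it fits an exposure-only model to the DKPISENK counts quoted above (41/102 cases, 69/231 controls). With a single binary predictor the fitted OR equals the crude cross-product ratio (about 1.58), which need not match the covariate-adjusted 1.68 reported in the text. Adjusting for gender, year of recruitment and prior malaria would mean appending those columns to each row of X, which requires per-patient data not shown here.

```python
import math
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def _score_and_information(X: Sequence[Sequence[float]], y: Sequence[int], beta: List[float]):
    """Gradient of the log-likelihood and the Fisher information X'WX."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for j in range(p):
            score[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return score, info


def fit_logistic(X, y, max_iter: int = 50, tol: float = 1e-10):
    """Newton-Raphson maximum likelihood; each row of X must start with 1.0 (intercept)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = _score_and_information(X, y, beta)
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(X, y, beta)
    return beta, info


def odds_ratio_ci(beta, info, j: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% CI for coefficient j, with (X'WX)^-1 as covariance."""
    e_j = [1.0 if k == j else 0.0 for k in range(len(beta))]
    se = math.sqrt(_solve(info, e_j)[j])
    return math.exp(beta[j]), math.exp(beta[j] - z * se), math.exp(beta[j] + z * se)


# Exposure-only model from the DKPISENK counts: (case, carries haplotype, number of isolates).
counts = [(1, 1, 41), (1, 0, 102 - 41), (0, 1, 69), (0, 0, 231 - 69)]
X: List[List[float]] = []
y: List[int] = []
for case, hap, n in counts:
    X.extend([[1.0, float(hap)]] * n)
    y.extend([case] * n)

beta, info = fit_logistic(X, y)
or_, lo, hi = odds_ratio_ci(beta, info, 1)
print(f"crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```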

Discussion
An earlier survey of unlinked microsatellites in parasites sampled over a single month in a small area on the Thai-Burma border (Shoklo) showed a higher index of association (multilocus LD) than seen here, but this difference disappeared when only unique haplotypes were analysed in that study, indicating a locally 'epidemic' population structure [21]. The absence of such a population structure in the present study is more typical of a broad population sample that would be suitable for association studies. The modest LD in this study, rarely extending beyond 5 kb, confirms that genome-wide allelic association studies will require a very dense set of markers covering all genes. A current effort has already discovered >50,000 SNPs [17][18][19] (data available at www.plasmodb.org), and further discovery is ongoing with the aim of eventually designing a denser genotyping array [35]. It is possible that a genome-wide array of ~50,000 SNPs would capture sufficient LD for population-based association studies to be attempted in Thailand. If half of the globally discovered SNPs were polymorphic in Thailand, so that ~25,000 SNPs were typed with such an array, the SNP marker density would average ~1 per kb, a distance at which pairwise LD indices in the Thai P. falciparum population are often moderately strong (D′ > 0.5) [23,33,36]. Most African P. falciparum populations will have less LD than the Thai parasite population [21][22][23][27], so an initial screen for allelic associations in Africa with the same SNP marker density would be less likely to score a 'hit', although it might pick up alleles that have recently been under strong positive selection and have led to a selective sweep of associated haplotypes, of which drug resistance alleles are extreme examples [37][38][39].
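The marker-density arithmetic above can be checked directly. This sketch idealises the markers as evenly spaced across the ~23 Mb genome and reuses the ~4.5 kb average gene spacing quoted in the Results; real SNPs are unevenly distributed, so these are averages only.

```python
GENOME_BP = 23_000_000      # ~23 Mb genome size quoted in the abstract
MEAN_GENE_SPACING_KB = 4.5  # ~1 gene per 4.5 kb, as quoted in the Results


def mean_marker_spacing_kb(n_markers: int, genome_bp: int = GENOME_BP) -> float:
    """Average distance between markers if they were spread evenly (an idealisation)."""
    return genome_bp / n_markers / 1_000


for n in (50_000, 25_000):
    spacing = mean_marker_spacing_kb(n)
    per_gene = MEAN_GENE_SPACING_KB / spacing
    print(f"{n:>6} SNPs: ~{spacing:.2f} kb apart, ~{per_gene:.1f} markers per average gene interval")
```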
Commitment to discovering and developing higher density SNP genotyping arrays is therefore necessary for association studies in Africa, and would also allow the benefit of finer mapping of associated polymorphisms. In practice, it would be difficult to conduct a very large case-control study of severe malaria outside of Africa due to limited numbers of severe cases in any one population, although a multi-centre study in Southeast Asia might be feasible [28]. Even for studies within Africa, a large consortial approach would help to achieve a desirable sample size of more than 1000 P. falciparum isolates from severe cases compared with at least an equivalent number of mild controls, and ideally also asymptomatic infected controls. Ultimately it would be beneficial to also compare results across populations that have somewhat different haplotype structure in the parasite populations, as is now being applied to human genetic studies of common diseases [40,41].
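To give a sense of why samples of this size are needed, the standard normal-approximation formula for comparing two proportions with equal group sizes can be applied to allele frequencies. This is an illustration under stated assumptions, not a calculation from the study: one allele is counted per isolate, the frequency pairs are taken from the Rh1 (12% versus 4%) and eba175 (45% versus 32%) results above, and the genome-wide threshold of 0.05/25,000 is a hypothetical Bonferroni correction for the ~25,000 SNPs discussed earlier. With these inputs, roughly 180 isolates per group give 80% power at P < 0.05 for the larger difference, whereas roughly 875 per group are needed for the smaller difference at the genome-wide threshold.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Isolates per group to detect allele frequencies p1 vs p2 (two-sided test,
    equal group sizes, normal approximation to the two-proportion test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


GENOME_WIDE_ALPHA = 0.05 / 25_000  # HYPOTHETICAL Bonferroni threshold for ~25,000 SNPs

print(n_per_group(0.12, 0.04))                           # Rh1-sized difference, P < 0.05
print(n_per_group(0.45, 0.32, alpha=GENOME_WIDE_ALPHA))  # eba175-sized difference, genome-wide
```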
The polymorphisms in six candidate parasite virulence genes tested here in a modestly sized severe malaria case-control study yielded two associations at a P<0.05 level of significance, unadjusted for multiple hypothesis testing. These could have arisen by chance given the number of loci tested, and should only be considered as preliminary 'hits' for further examination. The significant disease association in eba175 tags a short haplotype in the middle of binding region II, incorporating a 2-codon indel and flanking amino acid polymorphisms in the C-terminal part of the F1 sub-domain, immediately before the hinge region that precedes the F2 sub-domain [42]. The disease association in Rh1 at the microsatellite centred on codon 667 may tag a region encoding a binding domain or a target of immunity (neither of the flanking microsatellites ~2 kb on either side showed an association). This part of the Rh1 molecule contains a number of amino acid polymorphisms [43], and, most significantly, a recent study has identified it as an important erythrocyte binding domain against which antibodies can block invasion [44].
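The caveat about multiple testing can be made concrete with a Bonferroni correction. The number of tests is not given here, so m = 25 below is a hypothetical placeholder; because some loci are in LD with each other (for example the eba175 haplotype described in the Results), the tests are not independent and Bonferroni is conservative for this design.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float], n_tests: int) -> List[float]:
    """Bonferroni-adjusted P values, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of nominal 'hits' if every null hypothesis were true."""
    return n_tests * alpha


reported = {"eba175 codons 388/390 KP": 0.009, "Rh1 codon 667 repeat, 99 bp": 0.02}
m = 25  # HYPOTHETICAL number of tests; the exact count is not stated in the text

print(f"expected false positives at P < 0.05: {expected_false_positives(m):.2f}")
for label, adjusted in zip(reported, bonferroni(list(reported.values()), m)):
    print(f"{label}: Bonferroni-adjusted P = {adjusted:.3f}")
```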
Polymorphisms in eba140 and eba181 previously reported to affect the specificity of erythrocyte binding in a transfected COS cell assay [45,46] were tested but were not associated with severe malaria here. These polymorphisms were also not shown to be associated with erythrocyte invasion phenotypes among isolates in Brazil [47]. The polymorphisms in the 5′ part of the Rh2 gene sequence that is shared between the paralogues Rh2a and Rh2b showed marginal associations with disease that were not significant after adjusting for potential confounders between the groups. Polymorphisms that are specific to Rh2a and Rh2b towards the 3′ end of each gene may be associated with differences in erythrocyte invasion phenotype, as suggested for Rh2b in recent studies in Senegal [48] and Brazil [49], and should be tested in future disease association studies. None of the Rh4 polymorphisms tested here showed disease associations, although parasite lines differ in their ability to switch to Rh4-dependent invasion, which involves a neuraminidase-resistant erythrocyte receptor [50,51]. An adjacent paralogous gene, Rh5, has very recently been shown to encode a smaller protein with amino acid polymorphisms that affect erythrocyte receptor binding [52], and is a candidate for future studies. It is also possible that noncoding polymorphisms influence transcription of these genes through cis- or trans-acting regulatory mechanisms, and these would be among the potentially important loci to be discovered by a genome-wide approach.
Table 3 (footnote): The common allele at each locus is the reference allele (odds ratio = 1). Adjusted odds ratios result from a multivariate analysis adjusting for gender, previous history of ever having malaria, and year of recruitment. doi:10.1371/journal.pone.0005454.t003