Genetic Variation in the Plasmodium falciparum Circumsporozoite Protein in India and Its Relevance to RTS,S Malaria Vaccine

RTS,S is the most advanced malaria vaccine candidate, currently under phase-III clinical trials in Africa. This Plasmodium falciparum vaccine contains part of the central repeat region and the complete C-terminal T cell epitope region (Th2R and Th3R) of the circumsporozoite protein (CSP). Since naturally occurring polymorphisms at the vaccine candidate loci are critical determinants of the protective efficacy of the vaccines, it is imperative to investigate these polymorphisms in field isolates. In this study we have investigated the genetic diversity at the central repeat, C-terminal T cell epitope (Th2R and Th3R) and N-terminal T cell epitope regions of the CSP in P. falciparum isolates from the Madhya Pradesh state of India. These isolates were collected through a 5-year prospective study aimed at developing a well-characterized field site for the future evaluation of malaria vaccines in India. Our results revealed that the central repeat (63 haplotypes, n = 161) and C-terminal Th2R/Th3R epitope (24 haplotypes, n = 179) regions were highly polymorphic, whereas the N-terminal non-repeat region was less polymorphic (5 haplotypes, n = 161) in this population. We did not find any evidence of a role for positive natural selection in maintaining the genetic diversity at the Th2R/Th3R regions of CSP. Comparative analysis of the Th2R/Th3R sequences from this study with the global isolates (n = 1160) retrieved from the GenBank database revealed two important points. First, the majority of the sequences (∼61%, n = 179) from this study were identical to the Dd2/Indochina type, which is also the predominant Th2R/Th3R haplotype in Asia (∼59%, n = 974). Second, the Th2R/Th3R sequences in Asia, South America and Africa are geographically distinct with little allele sharing between continents. In conclusion, this study provides an insight into the existing polymorphisms in the CSP in a parasite population from India that could potentially influence the efficacy of the RTS,S vaccine in this region.


Introduction
Malaria, especially that caused by Plasmodium falciparum, is responsible for nearly 800,000 deaths each year worldwide, most of them among young children in Sub-Saharan Africa [1]. Approximately 1.5 million cases of malaria are reported in India each year, of which 50% are due to P. falciparum [1]. Given the high impact of malaria on human health, a highly effective vaccine is needed for long-term control, elimination and possible eradication of malaria.
As of today there is no licensed vaccine against malaria. However, a number of potential vaccine candidates targeted against pre-erythrocytic, erythrocytic and sexual stages of P. falciparum are under various stages of clinical development [2]. The most advanced among them is RTS,S, a pre-erythrocytic stage vaccine based on the parasite's circumsporozoite protein (CSP) [3]. The CSP is the most abundant protein on the sporozoite surface and consists of a highly polymorphic central repeat region flanked by a less polymorphic N-terminal and a highly polymorphic C-terminal non-repeat region [4]. The central region, which consists predominantly of tandem repeats of NANP (N, Asparagine; A, Alanine and P, Proline) in addition to a small number of NVDP (N, Asparagine; V, Valine; D, Aspartic acid and P, Proline) repeats, constitutes immunodominant B cell epitopes. The polymorphism in the C-terminal region, in contrast, is concentrated in two subregions called Th2R and Th3R, which form both B cell and T cell epitopes. In the RTS,S recombinant vaccine, 19 NANP repeats and the entire C-terminal sequence of the CSP from the NF54/3D7 P. falciparum strain (amino acid residues 207 to 395) are fused to the hepatitis B surface antigen (HBsAg), which in turn is co-expressed with additional unfused HBsAg in the yeast Saccharomyces cerevisiae [5].
Since 1992, when the first trial of RTS,S was conducted, it has progressed through multiple phase-I and II trials on children and infants in several African countries [3,6–9]. After showing substantial protective efficacy in phase II trials, ranging from 30 to 50%, it is currently going through large-scale phase-III trials at 11 sites in seven countries in Africa [3,10–15]. In fact, initial results of the phase-III trials have been published recently, showing that the RTS,S vaccine provides African children aged 5 to 17 months with significant protection against clinical (56%) and severe (47%) malaria [16]. This study marks an important milestone in the development of a malaria vaccine, and there is hope that the first generation malaria vaccine will be licensed by the year 2015. The malaria vaccine community has set a goal to license, by 2025, a safe and affordable vaccine that has >80% efficacy and lasts longer than four years [17].
The occurrence of high genetic diversity in the malaria parasite, especially at the surface-expressed molecules, poses the greatest challenge in developing a universally effective malaria vaccine. Almost all P. falciparum antigens currently under consideration for vaccine development, including CSP, have been observed to exhibit polymorphisms in field isolates from various malaria-endemic regions of the world [18–25]. In addition, polymorphisms in the CSP have been shown to restrict T cell reactivity to specific epitopes and to affect binding to HLA, indicating selection of variants due to immune pressure [26,27]. Given the importance of antigenic diversity in influencing the outcome of any vaccine, it is very important to characterize the prevailing level of variation in CSP in different endemic regions, as this will help to determine whether vaccine escape variants will compromise the efficacy of the RTS,S vaccine. Therefore, we have conducted a five-year prospective cohort study in the Madhya Pradesh state of India to examine the genetic polymorphisms both at the central repeat and C-terminal regions of the CSP, which are included in RTS,S, and also at the N-terminal T cell epitope region. The data from this study were compared with the sequences available for P. falciparum isolates from other malaria-endemic countries in the world. The global distribution of various allelic forms of the CSP is also discussed.

Ethics Statement
The study was conducted in accordance with the procedure and guidelines approved by the Indian Council of Medical Research (ICMR), Government of India, and with ethical approval from the institutional review board of the All India Institute of Medical Sciences (AIIMS), New Delhi, Regional Medical Research Centre for Tribals (RMRCT), Jabalpur, India, and the Centers for Disease Control and Prevention, Atlanta, U.S.A. Written informed consent was obtained from each participant or their parents or guardians before being included in this study.

Study design, study sites and sample collection
This five-year prospective study (2005–2010) was designed to develop a well-characterized field site, where the epidemiology of malaria, immune responses to key parasite antigens, genetic diversity at leading vaccine candidate antigens, and Anopheles vector characteristics could be understood. Our samples came from two cohort groups: the hospital-based cohort and the community-based cohort. The hospital-based cohort represents the patients who attended either the Netaji Subhash Chandra Bose (NSCB) Medical College at Jabalpur or the Civil Hospital Maihar in Satna district. The community-based cohort represents patients who were enrolled from study sites in Bargi and Sihora, both in the Jabalpur district (Fig 1). Approximately 200–300 µl of peripheral blood was collected from each patient who had fever or fever-like symptoms at the time of enrollment. The malaria epidemiology in the Jabalpur and Satna districts, which are 150 kilometers apart, is identical, with transmission levels varying from hypo-endemic to meso-endemic. The details of patient enrollment, sample collection and other epidemiological parameters are available (Text S1).

Genomic DNA extraction and amplification of csp gene
Genomic DNA was extracted from P. falciparum infected blood using the Genomic DNA Extraction Kit (Bioneer Corporation, Korea), in accordance with the manufacturer's protocol. The CSP1 forward (5'-TTAGCTATTTTATCTGTTTCTTCC-3') and CSP2 reverse (5'-TAAGGAACAAGAAGGATAATACC-3') primers, designed using the 3D7 strain as a reference sequence, were used to amplify 1177 bp of the 1194 bp complete pfcsp gene. The PCR cycling conditions for this primer pair were: 10 minutes initial denaturation at 94°C followed by 35 cycles with 1 minute denaturation at 94°C, 1 minute annealing at 57°C, 90 seconds extension at 72°C and a final 10 minute extension at 72°C. The resulting PCR products were diluted 1:10 and 2 µl of this was used as a template to amplify the internal 1026 bp fragment using CSP3 forward (5'-GAAATGAATTATTATGGGAAACAG-3') and CSP4 reverse (5'-GAAGGATAATACCATTATTAATCC-3') primers. The 1026 bp fragment encompassed the N-terminal T cell epitope, central repeat and C-terminal T cell epitope regions. The PCR cycling conditions for the CSP3/CSP4 primer pair were: 10 minutes initial denaturation at 94°C followed by 35 cycles with 1 minute denaturation at 94°C, 40 seconds annealing at 57°C, 80 seconds extension at 72°C and a final 10 minute extension at 72°C. The proofreading polymerase Pfx (Invitrogen Life Sciences, Carlsbad, CA, USA) was used to avoid introducing errors during PCR. To exclude the possibility of cross-contamination, a negative control without template DNA was always included. Further, DNA from a culture-adapted P. falciparum strain was used as a control to check the reliability of the sequencing. The amplified products were resolved on a 1.2% agarose gel.

Sequencing of the amplified products and sequence analysis
Bands of the desired size were excised from the gel and purified using the Gel Extraction Kit (Bioneer Corporation), in accordance with the manufacturer's protocol. The methods for cycle sequencing PCR and cleanup were the same as described earlier [28]. The products were sequenced on both strands using CSP3 forward, CSP4 reverse and CSP-D reverse (5'-TGGGTCATTTGGCATATTGTG-3') primers, using the ABI Big Dye Terminator Ready Reaction Kit Version 3.1 on an ABI 310 Genetic Analyzer (PE Applied Biosystems, CA, USA). The BioEdit Sequence Alignment Editor [29] and GeneDoc Version 2.6.002 [30] were used to analyze the sequencing electropherograms and generate the sequence alignment, respectively. The csp sequences of eight laboratory-adapted P. falciparum strains (Dd2, K1, MAD20, Wellcome, 7G8, HB3, 3D7, and RO33) were also included in the alignment for comparison. All of the sequences generated in this study have been submitted to the NCBI GenBank database under the accession numbers HM756094-HM756109 and HM582036-HM582081.
We also compared our sequences with all the published and unpublished CSP sequences from around the world deposited in the NCBI GenBank database. The details of these isolates are provided in Table S1. Here we could only compare C-terminal Th2R/Th3R sequences, since this region of the CSP has been the most widely sequenced. The genetic relationship among the global CSP sequence (Th2R/Th3R) haplotypes was deduced with the Minimal Spanning Tree (MST) algorithm implemented in BioNumerics version 6.6 (Applied-Maths, Inc. Austin, TX). In this algorithm, the haplotype with the highest number of single locus variants (SLVs) is considered the root haplotype and all other haplotypes its relatives. This does not by any means suggest the origin or ancestry of a particular haplotype.
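The root-selection rule and tree construction described above can be illustrated with a minimal sketch. It is not the BioNumerics implementation: it assumes aligned, equal-length haplotype strings, uses the number of differing positions (Hamming distance) as the edge weight, takes the haplotype with the most single locus variants as the root, and grows the tree with Prim's algorithm. The haplotype names and sequences are hypothetical toy data, not haplotypes from this study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def root_haplotype(haps: dict[str, str]) -> str:
    """Return the haplotype with the most single locus variants (SLVs).

    Ties are broken by insertion order (an assumption of this sketch).
    """
    slv_count = {name: 0 for name in haps}
    for a, b in combinations(haps, 2):
        if hamming(haps[a], haps[b]) == 1:
            slv_count[a] += 1
            slv_count[b] += 1
    return max(slv_count, key=slv_count.get)


def minimum_spanning_tree(haps: dict[str, str]):
    """Prim's algorithm started from the SLV-richest haplotype."""
    root = root_haplotype(haps)
    in_tree = [root]
    remaining = [name for name in haps if name != root]
    edges = []  # (parent, child, distance)
    while remaining:
        # Cheapest edge that connects the tree to a haplotype outside it.
        dist, parent, child = min(
            (hamming(haps[p], haps[c]), p, c)
            for p in in_tree
            for c in remaining
        )
        edges.append((parent, child, dist))
        in_tree.append(child)
        remaining.remove(child)
    return root, edges


# Hypothetical 4-residue haplotypes for illustration only.
toy_haplotypes = {
    "hapA": "KLNQ",
    "hapB": "KLNE",
    "hapC": "KINQ",
    "hapD": "KLDQ",
    "hapE": "QINQ",
}
root, edges = minimum_spanning_tree(toy_haplotypes)
for parent, child, dist in edges:
    print(f"{parent} -> {child} (differences: {dist})")
```

As in the text, the root produced here is only a convenient starting point for drawing the tree and carries no claim about ancestry.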

Statistical analysis
The parameters of genetic diversity, namely the number of haplotypes (H), number of segregating sites (S), haplotype diversity (Hd) and average number of nucleotide differences per site between two sequences (π), were calculated for the isolates from each country using DnaSP ver. 4.10.9 [31]. Only Th2R/Th3R region sequences were included in these analyses. The difference between non-synonymous (dN) and synonymous (dS) mutations was estimated in MEGA version 4.0 [32] using the method of Nei and Gojobori [33] with the Jukes and Cantor (JC) correction to test for evidence of positive (balancing) selection. We also calculated Fu & Li's F* [34] and Tajima's D [35] test statistics in DnaSP to test the neutral theory of evolution. In order to look for patterns of selection across the Th2R/Th3R region, a sliding window analysis of π,
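To make the diversity statistics named in this section concrete, the following is a minimal sketch of how Hd, π and Tajima's D can be computed from a set of aligned nucleotide sequences. It follows the standard textbook definitions and is not the DnaSP implementation; it assumes gap-free sequences of equal length, and the four toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def haplotype_diversity(seqs: list[str]) -> float:
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def mean_pairwise_differences(seqs: list[str]) -> float:
    """Average number of differing sites over all sequence pairs (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi per site: mean pairwise differences divided by alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns with more than one nucleotide (S)."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D from k, S and the sample size n."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment, not data from this study.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGT"]
print("Hd =", round(haplotype_diversity(toy), 4))
print("pi =", round(nucleotide_diversity(toy), 4))
print("S  =", segregating_sites(toy))
print("D  =", round(tajimas_d(toy), 4))
```

A negative D, as in this toy alignment, arises when the mean pairwise difference is smaller than expected from the number of segregating sites, that is, when rare variants are in excess.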

Results
A total of 2336 patients (n = 603, Hospital cohort; n = 1733, Community cohort) were enrolled over the 5-year period based on the adopted inclusion criteria. By light microscopy, only 780 patients (n = 603 Hospital cohort; n = 177 Community cohort) were confirmed to have P. falciparum infection. Of these, 626 samples were subjected to PCR amplification of the pfcsp gene, while the remaining 154 samples were not available (details of these patients are given in Table S2). A total of 216 samples gave PCR amplification for this gene. The PCR positivity rate for the pfcsp gene (34.5%) was lower than for other vaccine candidate antigen genes (44% to 56%) and the P. falciparum dihydrofolate reductase (pfdhfr) gene (66%) among these 626 isolates (unpublished data). Such variation in PCR positivity among isolates can occur for various reasons, such as variable copy number of the gene, annealing efficiency of the primers and level of parasitemia in the samples. Successful sequencing data were obtained from 161 samples each for the N-terminal T cell epitope and central repeat regions (n = 126 Hospital cohort; n = 35 Community cohort), while 179 samples provided sequence data for the C-terminal T cell epitope region (n = 142 Hospital cohort; n = 37 Community cohort) (Table 1). All the samples were mono-infections, as checked by PCR analysis of the merozoite surface protein (msp1 and msp2) genes (data not shown). Further, there were no mixed peaks in the pfcsp gene sequences of any isolate, which also ruled out the possibility of mixed infections with other P. falciparum strains.

Sequence diversity in the central repeat region
Sixty-three unique haplotypes (CR1 to CR63) were observed among 161 isolates in the central repeat region (Fig 2A). The number of tetrapeptide repeats in this region, which include NANP and other minor variants, varied from 35 to 53 among these haplotypes. Approximately 58% of the samples (n = 161) had between 42 and 46 tetrapeptide repeats (Fig 2B). Six haplotypes were exclusive to the community cohort, 18 were common to both community and hospital cohorts, and the remaining 39 were found only in the hospital cohort. Sixteen samples (CR12 to CR15) had the same number of repeats (42) as the 3D7 strain but differed from it in sequence (Fig 2A). None of the samples was identical to either the 3D7 or Dd2 strain sequence at this locus.
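As an illustration of how tetrapeptide repeat counts of this kind can be tallied, the sketch below splits a repeat-region amino acid string into consecutive four-residue units and counts each unit type. It assumes the region has already been trimmed so that it starts at the first repeat and its length is a multiple of four; the toy sequence is hypothetical and is not one of the CR haplotypes.

```python
from collections import Counter


def count_tetrapeptide_repeats(repeat_region: str) -> Counter:
    """Tally consecutive 4-residue units (e.g. NANP, NVDP) in a repeat region."""
    if len(repeat_region) % 4 != 0:
        raise ValueError("repeat region length must be a multiple of 4")
    units = [repeat_region[i:i + 4] for i in range(0, len(repeat_region), 4)]
    return Counter(units)


# Hypothetical toy repeat region for illustration only.
toy_region = "NANPNVDPNANPNVDP" + "NANP" * 3
counts = count_tetrapeptide_repeats(toy_region)
total_repeats = sum(counts.values())
print(dict(counts), "total:", total_repeats)
```

Two sequences can share the same total while differing in the order or type of units, which is how samples with 42 repeats can still differ from 3D7 in sequence.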

Sequence diversity in the N-terminal non-repeat region
As expected, the N-terminal non-repeat region was highly conserved among the samples analyzed, with only five haplotypes, H1 to H5 (Fig 3A). The H4 and H2 haplotypes, observed in ∼55% (n = 161) and ∼40% (n = 161) of samples respectively, were predominant in our study sites. The H2 haplotype was identical to the 7G8, Dd2, MAD20, RO33, K1 and Wellcome strain sequences at this locus. The H1 haplotype, observed in seven samples, was identical to the 3D7 and HB3 sequences. The haplotype H4 was found exclusively in this study.

Sequence diversity in the C-terminal non-repeat region
A total of 24 Th2R/Th3R sequence haplotypes (H1 to H24; 1 specific to the community cohort, 15 specific to the hospital cohort and 8 common to both cohorts) were defined from the 179 samples analyzed, the predominant one being the H1/Dd2 type sequence (∼61%, n = 179) (Fig 3B). This most common haplotype, H1, differed from the 3D7 sequence at 4 codons (3 in the Th2R and 1 in the Th3R region). The second most common haplotype, H2 (∼12%, n = 179), found only in hospital cohort samples, was a single locus variant (SLV) of H1, since it differed from H1 by only one mutation (Valine to Alanine) in the Th3R region. In comparison to the 3D7 sequence that is represented in the RTS,S vaccine, polymorphism was observed at 13 codons in the Th2R region and at 7 codons in the Th3R region (Fig 3B). None of the samples had a Th2R/Th3R sequence identical to the 3D7 strain. The overall haplotype diversity (Hd) for the combined Th2R/Th3R region was 0.614 ± 0.041 and the nucleotide diversity (π) was 0.0065 ± 0.0065, suggesting a moderate level of genetic diversity at the C-terminal region of CSP in this population (Table 2). The evidence of selection occurring on this gene was not very conclusive, as both Fu & Li's F* (−3.53, P < 0.05) and Tajima's D (−2.13, P < 0.05) were negative for the whole Th2R/Th3R region. Sliding window analysis also showed negative values of these indices across the shorter segments of the Th2R and Th3R (Fig S2). However, the dN−dS difference (0.008 ± 0.003) was positive for this region.

Global analysis of the Th2R/Th3R sequences
The multiple sequence alignment and minimal spanning tree (MST) analysis of Th2R/Th3R sequences of all 1339 global isolates, including 179 from the current study, resulted in 117 unique haplotypes (Table 2, Fig 4, Fig S1). These included 53 haplotypes from Asia (n = 974), 10 haplotypes from South America (n = 181) and 63 haplotypes from Africa (n = 184). Other parameters of genetic diversity also suggested that the African populations show the greatest diversity at the Th2R/Th3R locus, followed by Asian and South American populations (Table 2). The level of genetic diversity at the Th2R/Th3R locus in our study population was very close to the diversity shown by the isolates from other Asian countries. Broadly, there were two predominant Th2R/Th3R sequence haplotypes in South America: H52 (7G8 type) and H53 (HB3 type) (Fig 4). In Asia, five Th2R/Th3R sequence haplotype groups can be described. The first group includes H1 (Dd2 type) and its SLVs (directly connected to H1 by a red line in Fig 4). The second group includes the H51 (MAD20 type) sequence and its SLV H28. The third group, H25, differed from the Dd2 type sequence by 2 mutations (double locus variant or DLV). Interestingly, the fourth group, H52, was identical to the South American haplotype 7G8. The fifth group, H30, was very distant from its other Asian relatives. Though there were 63 unique haplotypes in Africa, they were distributed into 3 major clusters: H60 (3D7 type), H61 and H62. The haplotype H52 (7G8 type) was the only Th2R/Th3R sequence common among the Asian, African and South American populations (Fig 4).

Discussion
To date, several P. falciparum antigens have been identified and considered for the development of vaccine(s) against malaria. One of the underlying challenges that has delayed the successful development of a malaria vaccine is the high level of antigenic diversity present in almost all candidate vaccine antigens. The antigenic diversity in the parasite arises over time as a result of immune selection pressure within the human host. Also, almost all vaccine candidates under clinical development are based on a single allelic form of the antigen and thus may not be able to provide protective immunity against all the parasite strains circulating in the population. RTS,S, which is a subunit vaccine, has shown only partial protection (about 50%) against clinical episodes of malaria in recent phase III trials [16]. It is unclear whether polymorphisms observed in the CSP are a contributing factor in the partial efficacy of this vaccine, and this may become clear when follow-up studies related to this vaccine trial are completed. The results from this study are relevant for future RTS,S vaccine trials in India, as this is the first major study to comprehensively analyze the diversity of CSP in the Indian P. falciparum population.

The N-terminal region of CSP, which contains T cell proliferation determinants, a putative hepatocyte binding site and B cell epitopes [36,37], did not show much polymorphism among the 161 samples sequenced (Fig 3A). This is similar to earlier studies, where only a limited number of polymorphisms have been observed within this region [21,22,38–42]. Global analysis of this region revealed that H2 (Dd2 type) is the most common haplotype found in Asia, followed by H1 (3D7 type) [21,22,38–40]. Both H1 and H2 haplotypes have been previously reported from Africa, with H1 being the predominant one [22,41]. A previous study using various deletion constructs of the CSP has demonstrated that the N-terminal region of this protein contains the motif required for binding and subsequent invasion of liver cells by the sporozoites [43]. Specifically, amino acid residues 82-100 (DNEKLRKPKHKKLKQPADG), called 'region I-plus', are critical and have the highest binding affinity to the heparan sulfate (HS) ligand expressed on the liver cell surface [44]. Furthermore, antibodies raised against the N-terminal region have been found to be protective and were able to inhibit the binding and invasion of liver cells by the sporozoites [45–47]. Since this region is important for sporozoite attachment and invasion, the presence of non-synonymous mutations in this region may cause structural changes in the protein, leading to a reduction in its binding affinity to the liver cell [45].

We observed a very high level of repeat polymorphism in the central repeat region, where almost 88% of the samples contained 42 to 49 tetrapeptide repeats (Fig 2A, 2B). This is very similar to a previous study by Escalante et al., who observed 37 to 49 repeats in 75 samples analyzed from different countries, including 11 samples from India [22]. The central repeat region, apart from forming immunodominant B cell epitopes, also provides structural stability to CSP [22]. The simulation study by Escalante et al. [22] has shown that the stability of the type-I β turn in CSP increases with the number of repeats. It is not clearly understood how the variable number of NVDP, NADP, NAHP, NVNP and other minor repeats influences the antibody response against CSP. However, it has been suggested that the diversity in the repeat region is maintained by balancing selection [22].
As in other malaria-endemic countries, we found a high level of polymorphism in the C-terminal region of CSP among Indian isolates. The Th2R region was more polymorphic than Th3R (Fig 3B). The sequence polymorphisms in the Th3R domain are critical, as they are involved in cytotoxic T cell activity and HLA binding [26,27]. These polymorphisms may help parasites to escape the immune pressure of the host and could have a significant impact on vaccine efficacy. However, several studies in Africa have shown that the current RTS,S vaccine induces a cross-reactive immune response against a wide range of CSP alleles and that the protection is not strain-specific [48–50]. These studies analyzed CSP sequences of the parasite strains in RTS,S-vaccinated individuals who became re-infected and in the control population, and found that both groups had an almost similar distribution of vaccine-type and other CSP allelic variants. This suggests that the RTS,S vaccine does not favor selection or expansion of parasites with particular CSP allele(s) in vaccinated individuals. Given that the emergence and subsequent expansion of an advantageous allele in a population depends on several factors, including the strength and duration of the applied selection pressure as well as transmission dynamics in the region, it will be interesting to monitor the effect of long-term and widespread use of the RTS,S vaccine on the emergence of any selected CSP variants.

Interestingly, the majority of our samples were identical or nearly identical to the Dd2/Indochina type, and almost all samples clustered with the Asian type sequences (Fig 4). As illustrated in Fig 4, Th2R/Th3R sequence haplotypes are geographically distinct and have a very distinct pattern of polymorphism in populations in Asia, South America and Africa. The Dd2 type or closely related Th2R/Th3R sequences are predominant in Asia, whereas the 7G8 and HB3 types are predominant in South America [22,38,51–54]. As expected, Africa has many more unique Th2R/Th3R sequence haplotypes apart from the predominant 3D7 type [22,41]. It is also worth noting that there is not much sharing of alleles between continents, at least at the Th2R/Th3R region (Fig 4, Fig S1). Consistent with a previous study, we also found that the African populations (Hd ± SD = 0.959 ± 0.006) exhibit greater diversity at CSP than the populations from Asia (Hd ± SD = 0.636 ± 0.017) and South America (Hd ± SD = 0.506 ± 0.0036) (Table 2) [22]. Haplotype diversity (Hd) and average nucleotide diversity (π) in our study population were similar to those of populations from other Asian countries. We could not find any conclusive evidence of a role for positive natural selection in maintaining the diversity at CSP in the Indian population. This is in agreement with previous studies where signatures of selection at CSP were not found [41]. Our analysis of the global isolates also confirms that the signature of selection at CSP is not uniform in all populations (Table 2, Fig S2).
In conclusion, this study makes an important contribution to understanding the type and distribution of naturally occurring polymorphisms in the RTS,S vaccine candidate antigen in a population from Madhya Pradesh, India, where malaria is endemic. The N-terminal region of the CSP showed limited polymorphism, whereas the central repeat and C-terminal regions were highly polymorphic. Almost all Th2R/Th3R sequences were identical or nearly identical to the Dd2 type or other Asian type sequences but distinct from African and South American sequences. These data would be helpful in future trials of the RTS,S vaccine in India and in monitoring changes in the parasite population with different CSP variants before and after vaccine administration. Also, the global analyses of CSP allelic variants reported in this study may be helpful in identifying the predominant allele(s) prevalent in Asia, Africa and South America, and may aid in designing a region- or population-specific CSP-based malaria vaccine in the future.

Figure 4 (caption, partial). ... Table S1, Table S3 and Fig S1 for more details on these sequence haplotypes and their country-wise distributions. doi:10.1371/journal.pone.0043430.g004

Supporting Information
Text S1 Information on participants' enrolment, characteristics, and sample collection. (DOC)
Table S1 The available accession numbers, country of origin and references of the isolates analyzed in this study. (XLS)