Genetic diversity in two leading Plasmodium vivax malaria vaccine candidates AMA1 and MSP119 at three sites in India

Plasmodium vivax, a major contributor to the malaria burden in India, has the broadest geographic distribution and shows higher genetic diversity than P. falciparum. Here, we investigated the genetic diversity of two leading P. vivax vaccine candidate antigens, at three geographically diverse malaria-endemic regions in India. Pvama1 and Pvmsp119 partial coding sequences were generated from one hundred P. vivax isolates in India (Chennai n = 28, Nadiad n = 50 and Rourkela n = 22) and ~1100 published sequences from Asia, South America, North America, and Oceania regions included. These data were used to assess the genetic diversity and potential for vaccine candidacy of both antigens on a global scale. A total of 44 single nucleotide polymorphism (SNPs) were identified among 100 Indian Pvama1 sequences, including 10 synonymous and 34 nonsynonymous mutations. Nucleotide diversity was higher in Rourkela and Nadiad as compared to Chennai. Nucleotide diversity measures showed a strong balancing selection in Indian and global population for domain I of Pvama1, which suggests that it is a dominant target of the protective immune response. In contrast, the Pvmsp119 region showed highly conserved sequences in India and across the Oceania, South America, North America and Asia, demonstrating low genetic diversity in the global population when compared to Pvama1. Results suggest the possibility of including Pvmsp119 in a multivalent vaccine formulation against P. vivax infections. However, the high genetic diversity seen in Pvama1 would be more challenging for vaccine development.

Introduction Plasmodium vivax is a major malaria species which affects a large part of the world's population [1]. Worldwide, more than 2.5 billion people live in high P. vivax malaria transmission areas and its burden has been increased as compared to P. falciparum over the last few years [1]. India reported largest drop in malaria cases in 2019, however in the WHO South-East Asia Region about 86% of malaria deaths were accounted from India [2]. P. vivax has a higher genetic diversity than the most common species Plasmodium falciparum indicating better fitness and potential to escape the immune response [3]. In addition, P. vivax resistance to the antimalarial drug chloroquine in various countries including India has raised a threat on malaria elimination [4]. India's diverse geography and climate are ideal for the transmission of multiple Plasmodium species and have augmented the malaria burden in India [5]. The twomalaria species P. vivax and P. falciparum are unevenly distributed but also co-circulate in several Indian states. In the southern states (e.g. Tamil Nadu), P. vivax is the dominant malaria parasite species [6]. Eastern states like Odisha reported 47% of all P. falciparum cases and state like Gujarat (from the western part of India) showed the prevalence of both species, including mixed species infections [6]. To control malaria in such diverse regions, new tools including safe and effective vaccines are needed for malaria elimination, although extreme diversity in leading vaccine candidate antigens remains a major barrier for their development.
Hundreds of malaria vaccine candidates have been characterized, of which Apical membrane antigen 1(AMA1) and Merozoite surface protein 1(MSP1) are the most promising blood-stage vaccine candidates for P. falciparum and P. vivax [7,8]. P. vivax Apical membrane antigen 1 (Pvama1), required for both merozoite invasion of erythrocytes and sporozoite invasion of hepatocytes [9], is an antigen of 66 kDa, secreted by microneme organelles and transported to the parasite's surface just prior to erythrocyte invasion. The 1686 bp coding region of Pvama1 codes for a cysteine-rich ectodomain, a conserved cytoplasmic region, and a transmembrane region [10]. The PvAMA1 ectodomain is divided into three subdomains by disulfide bridges referred to as Domain I (DI), Domain II (DII) and Domain III (DIII) [11]. The ectodomain of PvAMA1 is highly immunogenic and high antibody titers against the ectodomain have been reported in natural infections, which effectively inhibit erythrocyte invasion [9,10]. DI and DII have been reported to be highly polymorphic and serological studies showed DII as most immunogenic [12].
Plasmodium vivax merozoite surface protein 1 (PvMSP1) is a highly abundant protein present on the surface of merozoites. Initially it is synthesized as a high molecular weight precursor, which is further processed to various low molecular weight fragments [13]. The C terminal MSP-1 19 fragment contains two epidermal growth factor-like domains that have been shown to be efficient blood-stage vaccine candidates [14]. At the time of red blood cell invasion, only MSP-1 19 remains on the surface of the merozoite and participates in the initial adhesion to the reticulocyte [15]. Several studies have reported that acquired antibody against MSP-1 19 can inhibit parasite growth [16][17][18][19] and are associated with protective immunity against infection [20][21][22][23].
To develop an effective malaria vaccine which can work in diverse regions of the world, it is important to study genetic diversity of vaccine candidates from geographically diverse regions to include alleles that can induce the immune response and cover the antigenic diversity of P. vivax population [24]. India contributes significantly in P. vivax malaria burden from Southeast Asia region, however, genetic diversity in PvAMA1 and PvMSP1 19 vaccine candidate antigens from malaria endemic region of India has rarely been done [25][26][27]. Therefore, aim of this study were to investigate the genetic diversity of two leading vaccine candidate antigens PvAMA1 and PvMSP1 19 kDa region from three geographically diverse malaria endemic regions of India. Published Pvama1 and Pvmsp1 19 sequences from the global parasite population and sequences generated in this study was analyzed to identify genetic polymorphism and selective forces acting on these genes . The results of present study thus have important implications in Pvama1 and Pvmsp1 19 based vaccine designing.

Ethics statement
The clinical samples collected in this study received ethical approval from the Institutional ethic committee of ICMR-NIMR in New Delhi, India and Institutional Review Board at New York University, School of Medicine in New York, New York, USA. Samples were collected after written consent was obtained from all participants.

Study sites and sample collection
Samples were collected as part of a U.S. National Institutes of Health Center for the Study of Complex Malaria in India (CSCMi) project [5] from three geographically diverse malariaendemic regions of India namely Chennai, Nadiad, and Rourkela (S1 Fig). The detailed description of the study sites has been given previously [28]. Briefly, Chennai is the largest city of the southern state of Tamil Nadu, with P. vivax as the dominant species during the malaria season in July and October. Samples were collected at Besant Nagar Malaria Clinic, and in cross-sectional surveys conducted in the coastal fishing community, including a few slums and middle and upper-class urban dwellings. Nadiad is a small town located in the central part of Gujarat state, which is considered as hypo-endemic for P. vivax and P. falciparum, with a slightly higher prevalence of P. vivax. Samples were collected at the Malaria Clinic, located in the Civil Hospital of Nadiad and in cross-sectional surveys conducted in the rural area of Nadiad town. The city of Rourkela located in the Sundargarh district of the eastern state of Odisha, where P. falciparum is the major infecting malaria species, displays meso to hyper endemic transmission in this region. Samples were collected at a malaria clinic in a suburb of Rourkela and cross-sectional surveys were conducted in a forest area of Sundargarh.
P. vivax infection was determined by RDT and microscopic examination of thick and thin blood smear. From each individual 2-3 ml of blood was collected in EDTA vacutainers (Thermo Fisher, Massachusetts, USA). Samples were centrifuged at room temperature at 1500 g for 15 min. Plasma and platelets were removed and stored at -80˚C. DNA was extracted from infected red blood cells using DNA Blood MIDI kit (Qiagen, California, USA). A total of 100 P. vivax samples collected from infected individuals in Chennai (C, n = 28), Nadiad (N, n = 50) and Rourkela (R, n = 22) between January 2013 and May 2015 were used in this study (S1 Fig). P. vivax infections were confirmed by species-specific nested-polymerase chain reaction (nPCR) at each study site [6], and reconfirmed by the same nPCR at National Institute of Malaria Research, New Delhi.

PCR amplification and sequencing
Primers were designed for Pvama1 (PVX_092275) and Pvmsp1 19 (PVX_099980) with Primer3 (version0.4.0) using the P. vivax Sal-1 genome reference from the malaria database PlasmoDB (www.PlasmoDB.org). Two primer sets were designed for Pvama1 and one primer set was designed for C-terminal, 19kDa region of the Pvmsp1gene (S1 Table) and synthesized by Integrated DNA Technologies (IDT), India.
The Pvama1 complete ectodomain (corresponding to nucleotide 127-1461 encoding amino acid 43-487 relative to Sal-I, PVX_092275) was amplified using two sets of primers ( Fig  1). The amplifications were performed in a 25μl volume reaction with 1.5μl of DNA isolated from clinical isolates as the template with GoTaq Green PCR Master Mix (Promega), and 0.2 μM forward and reverse primers. The reaction was run at 94 o c for 5 min, followed by 35 cycles of 94 o c for 30 s, 55 o c for 1 min, and 62 o c for 3 min, with a final extension at 62 o c for 10 min. PCR fragments were visualized by electrophoresis on 1.5% agarose gel, using a 1-kb DNA ladder (Promega).MSP-1 19 C terminal region(corresponding to nucleotide 4921-5187 encoding amino acid 1641-1729 relative to Sal-I PVX_099980) was PCR amplified (Fig 1) as above and the thermo profile was 94 o c for 5 min, followed by 35 cycles of 94 o c for 30 s, 60 o c for 1 min, and 62 o c for 3 min, with a final extension at 62 o c for 10 min. PCR fragments were visualized by electrophoresis on a 2% agarose gel, using a 100-kb DNA ladder (Promega). PCR products were purified using the Exonuclease-I and Shrimp Alkaline Phosphatase (Exo-SAP, Thermo Fisher Scientific) followed by sequencing reactions in both forward and reverse directions (2x coverage) by a commercial sequencing facility using ABI BigDye Terminator Cycle Sequencing kit on an ABI 3730XL automatic DNA Analyzer (Macrogen, Seoul, Korea).

DNA sequence assembly and polymorphism analysis
Raw sequence data were assembled and edited using the SeqMan (DNASTAR, Lasergene, Seq-Man Pro). The DNA sequence chromatograms were carefully inspected to identify the SNP's. All Pvama1 sequences (n = 100) and Pvmsp1 19 sequences (n = 100) were aligned to the P. vivax Sal-I reference strain sequence. A 1335 bp region encompasses the complete ectodomain (DI, DII & DIII) of Pvama1, and 266 bp region of Pvmsp1 19 was assembled excluding primer regions.
To investigate the global genetic diversity of Pvama1 and Pvmsp1 19 , published sequences were retrieved from GenBank (S2 Table). A total of 616 Pvama1 complete gene sequences from eight distinct subpopulations, and sequences of six primate-adapted P. vivax isolates used for vaccine research (Belem, India VII, Palo Alto, North Korea, Indonesia XIX, Chesson) were retrieved from GenBank and added into the analysis. A total of 582 Pvmsp1 19 sequences (full-length Pvmsp1 19 gene sequence and Pvmsp1 42 region sequence) from 14 distinct P. vivax subpopulations and two reference strains (Sal-I and Belem) were retrieved for the analysis. Combined, these represent Pvama1 and Pvmsp1 19 sequences from seven global regions: East Asia (South Korea, China and Myanmar border), South Asia (India, Sri Lanka, and Bangladesh), Southeast Asia (Thailand, Myanmar, Singapore, and Cambodia), West Asia (Iran, Turkey), South America (Venezuela, Brazil), North America (Mexico), and Oceania (PNG. Vanuatu).
All sequences were aligned by the CLUSTAL W program in MEGA6.0 [29] and nucleotide diversity analysis was performed in the DnaSP v5.10.01 software [30]. Several diversity statistics were calculated for each studied Indian population and the global data set of Pvama1 and Pvmsp1 19 , which includes total number of polymorphic sites (S), nucleotide diversity (π), the number of haplotypes (h), haplotype diversity (Hd), the number of synonymous mutation (SP), non-synonymous mutation (NS), using a 100 bp sliding window with a 25 bp step size in DnaSP.
To test the neutral theory of evolution in Pvama1 and Pvmsp1 19 , Tajima's D, Fu and Li's D and F were calculated in DnaSP. Further, to evaluate natural selection the McDonald-Kreitman (MK) test was performed [31].The sequence of Plasmodium cynomolgi AMA1 was used as the interspecies outgroup (GenBank accession no. XM_004222507.1) [32]. Wright's fixation index F ST by DnaSP was calculated to determine the degree of genetic variance in Pvama1 and Pvmsp1 19 kDa region among global isolates. The minimum number of recombination events (Rm), and linkage disequilibrium (LD) was calculated. The relationship between LD and nucleotide sites distance was plotted by using indices D' and R 2 . To investigate the genetic relatedness among Indian and global Pvama1 haplotypes, network analysis was performed using a haplotype frequency > 1. For Pvmsp1 19 all Indian and global haplotypes were used and analysis were performed using Network software version 5.0.0.3 [33]. Mutation sites in PvAMA1 ectodomain were visualized as a three dimensional structure of PvAMA1 (PDB ID: 1W81) by Pymol [34].

Accession numbers
Sequences generated in this study have been deposited in GenBank under the accession numbers: 100 Pvama1 sequences MH657021-MH657120, and 100 Pvmsp1 19 sequences MH657121-MH657220

Genetic diversity and signatures of selection within Pvama1 at three sites in India
A 1,335 bp region of Pvama1 was amplified from 100 P. vivax isolates collected from Chennai (C; n = 28), Nadiad (N; n = 50) and Rourkela (R; n = 22). The average number of nucleotide differences (K) were higher in Rourkela and Nadiad population when compared to Chennai (Table 1). A total of 44 SNP's was identified among these 100 Indian Pvama1 sequences, including 10 synonymous and 34 nonsynonymous mutations. Nucleotide diversity was higher in Rourkela and Nadiad population and lower in Chennai Population (Table 1).
Natural selection was assessed across the region encoding the PvAMA1 ectodomain for Indian population. Balancing selection was observed in Nadiad and Rourkela population, whereas Chennai population showed no evidence of balancing selection. The positive value of Tajima's D, Fu and Li (F and D) for Nadiad and Rourkela populations indicates deviation from neutral evolution and the tendency for diversifying selection in the corresponding regions. The negative Tajima's value, Fu and Li (F and D) value for Chennai population indicates purifying selection. McDonald-Kreitman test showed that the entire Pvama1 ectodomain, particularly DI, had more nonsynonymous substitution as compared to synonymous substitution.
The DNA sequence polymorphism analysis were performed for each domain of Pvama1. The average number of nucleotide differences (K) for the domain I was higher in the Nadiad and Rourkela population and lower in the Chennai population (Table 1). A total of 45 haplotypes (C = 11, N = 31, R = 18) were identified among Indian isolates in domain I, determined an extremely high level of haplotype diversity (Hd = 0.938). Haplotype diversity was higher in Rourkela and Nadiad in comparison to Chennai population. The significant positive value of Tajima's D for Nadiad population, indicates deviation from neutral evolution and the tendency for diversifying selection, however, these values were not statistically significant for Rourkela and Chennai population. The DNA sequence polymorphism analyses of domain II indicated that the average number of nucleotide differences (K) for domain II was highest in the Chennai population and lowest in Nadiad and Rourkela populations. A total of 24 haplotypes (C = 10, N = 19, R = 12) were identified among Indian isolates in domain II and high level of haplotype diversity (Hd = 0.900) was observed. Haplotype diversity was highest in the populations from Rourkela and Nadiad and lowest in isolates from Chennai. For domain III the average number of nucleotide differences (K) was higher in the Rourkela and Nadiad populations and lowest in isolates from Chennai (Table 1). A total of four haplotypes (C = 2, N = 4, R = 4) were identified among these isolates, which demonstrated a low level of haplotype diversity (Hd = 0.46) ( Table 1). Haplotype diversity was higher in the Rourkela and Nadiad populations and lowest in the Chennai population.

Polymorphic amino acids residues within PvAMA-1 at three sites in India
The frequency of amino acid polymorphisms at each study site and for each domain were calculated (Fig 2A). A total of 29 amino acid changes were detected in the PvAMA1 ectodomain. A total of 18 amino acid changes were found in Domain I, nine in Domain II and two amino acid changes were found in Domain III of PvAMA1. DI was found to be the most variable as compared to DII and DIII. A three-dimensional model of PvAMA1 was generated to understand the location of the polymorphic residues on the PvAMA1 active face and silent face, represented in Fig 2B. Only one polymorphic residue on position 66 was located on the opposing face of the PvAMA1 molecule. Other residues were located on the active face of the PvAMA1 structure ( Fig 2B).

Global genetic diversity in Pvama1 and evidence of selection
To study Pvama1 global genetic diversity, sequence data obtained from Chennai, Nadiad, and Rourkela was then compared to published Indian sequences (n = 11) as well as seven geographically diverse P. vivax populations, including from Thailand (n = 231), Sri Lanka (n = 23), Iran (n = 29), Korea (n = 66), China-Myanmar border (n = 73), Venezuela (n = 73), PNG (n = 102), and a set of monkey adapted reference isolates (n = 8) ( Table 2). A total 716 Pvama1 sequences here onwards referred as the "global" (including the 100 Indian sequences obtained in this study and 616 published sequences retrieved from GenBank described above) were used to determine nucleotide diversity and genetic differentiation. A total of 126 SNP's was identified amongst the 716 global isolates, including 41 synonymous mutations and 85 nonsynonymous mutations. The average number of nucleotide difference (k) for the Pvama1 ectodomain across global isolates was 13.568. The higher number of nucleotide differences was found in Iran and India and the least number of differences were seen in isolates from South Korea. The haplotype diversity for the entire Pvama1 ectodomain of global  isolates was (Hd = 0.9898). This value was higher in Iran, PNG, and India when compared to the other populations. The nucleotide diversity (π) was higher in Domain-I (DI) of the Pvama1 among global isolates (Fig 3).
We assessed selection in global population and the results of Tajima's D, Fu and Li D and F statistics indicate balancing selection in Pvama1 ectodomain in India, Sri Lanka, China Myanmar border, Thailand, PNG, Iran, Venezuela population. Only the South Korea population indicates negative Tajima's D value which was not significant. For the global Pvama1 isolates, the minimum number of recombination events, (Rm) estimated was 23 (Fig 4).

Genetic diversity within Pvmsp1 19 at three sites in India
Nucleotide sequence analysis of the 100 Pvmsp1 19 (C = 28, N = 50, R = 22) sequences revealed that this region is highly conserved, and it didn't show any polymorphic site in three geographically diverse malaria endemic regions of India.
A total 682 Pvmsp1 19 sequences here onwards referred as "global" (including 100 Indian sequences obtained in this study and 582 published sequences retrieved from GenBank described above) were used to determine nucleotide diversity and genetic differentiation. There was no genetic or nucleotide diversity found in Indian, South American (Brazil), and Oceanian (Vanuatu) isolates.
A total of 14 SNP's was identified among the 682 global sequences, including four synonymous mutations and ten nonsynonymous mutations. The average number of nucleotide difference (k) for the Pvmsp1 19 kDa region across global isolates was 0.229. The highest number of nucleotide differences were found in Turkey (K = 0.508) and Bangladesh (0.4) while the least was seen in South Korea (K = 0.091) and Thailand (K = 0.181). The haplotype diversity (Hd) for the entire Pvmsp1 19 region of global isolates was Hd = 0.2. This value was higher in Turkey (Hd = 0.508) than the other populations. The nucleotide diversity (π) of the entire Pvmsp1 19 kDa region for global isolates was π = 0.00937 (Table 2). Overall nucleotide diversity observed across the global population indicated low nucleotide diversity in Pvmsp1 19 region (Fig 5). Haplotypes composed of nucleotide polymorphism in Pvama1 ectodomain with a frequency > 1 were used to create a medianjoining network. This network represents the mutational paths connecting Pvama1 haplotypes that may explain the observed sequence diversity. Each node represents one haplotype, node size indicates haplotype frequency and node color corresponds to the country of origin. Line length is proportional to genetic distance.
The estimated Tajima's D value was -1.97046 (P < 0.05), indicating that Pvmsp1 19 is under purifying selection. The Fu and Li's D and F values for Pvmsp1 19 global isolates -5.13595 (P < 0.02) and -4.74044 (P < 0.02) were also negative. For the global Pvmsp1 19 isolates, the minimum number of recombination events (Rm), estimated was 1. The LD index (R 2 ) was also very low suggesting the absence of recombination events and low LD may have contributed to the low diversity in Pvmsp1 19 (Fig 6).
Sequence analysis of the global data resulted in 14 different haplotypes with amino acid changes at 11 codons as aligned to Sal I. All 11 amino acid changes were dimorphic and region specific. Amino acid change 1706E was found only in Turkey (2010, 2012). We observed 8 amino acid changes specific to isolates from Singapore S1643P, D1649N, R1669G, C1681R, C1689R, A1697V, N1708D and P1721Q and one amino acid change specific to South Korea N1692K. Only one amino acid change (K1709E) was found in three different populations from Thailand, Cambodia and Myanmar.

F ST analysis
To understand population differences in Pvama1 due to genetic variation among the three India populations, we calculated the F ST value using Pvama1 sequences from the Chennai, Nadiad and Rourkela. A moderate level of genetic differentiation (F ST = 0. 143) was detected between Nadiad and Chennai isolates, a low level of genetic differentiation (F ST = 0.075) was detected between the Rourkela and Chennai population, and the F ST value observed between Nadiad and Rourkela (F ST = -0.003) demonstrated a high degree of genetic similarity between these two populations.
F ST values for the eight worldwide Pvama1 populations with full-length ectodomain sequence were also calculated ( Table 3). A high level of genetic differentiation was detected sequences from diverse geographical regions were used to create a median-joining network. This network represents the mutational paths connecting Pvmsp1 19 haplotypes that may explain the observed sequence diversity. Each node represents one haplotype, node size indicates haplotype frequency and node color corresponds to the country of origin. Line length is proportional to genetic distance.

Haplotype diversity
A total of 55 haplotypes were identified among Indian isolates of Pvama1, demonstrating an extremely high level of haplotype diversity across the three sites (Hd 0.968). The Pvama1 data showed only 4 haplotypes (Hap1, Hap3, Hap5 and Hap7) shared among these three populations. One haplotype (Hap6) is shared between the Chennai and Rourkela populations, one haplotype (Hap8) is shared between the Chennai and Nadiad populations, and three haplotypes (Hap17, Hap16, and Hap34) are shared between the Nadiad and Rourkela isolates. We observed region specific haplotypes in Chennai (n = 6), Nadiad (n = 27) and Rourkela (n = 12) (S2 Fig). Based on the polymorphism observed in the entire Pvama1 ectodomain, a total of 352 haplotypes were identified in 716 global isolates with no matches to the Sal1 reference strain; only one isolate from Thailand matched to the Belem reference strain. A haplotype network formed using haplotypes with a frequency > 1 revealed a clearer network. Haplotype 2, haplotype 4 and haplotype 22 were shared among the Indian, Sri Lankan and Iranian populations with frequencies of 2.3, 1.9 and 0.69 respectively. Haplotypes 6, 7, 11 and 20 were shared among the Indian and Iranian population with a frequency of 1.1, 0.5, 0.5, and 0.2 respectively (Fig 5).
For the Pvmsp1 19 kDa region, a total of 14 unique haplotypes were identified in 682 global isolates. Haplotype 1 matched to the Sal1 reference strain and was the predominant haplotype in the global population shared among India, Bangladesh, China, Brazil, Turkey, Thailand, South Korea, Sri Lanka, Singapore, Cambodia and Vanuatu, with a frequency of 89.2. Haplotype 2 was restricted to Turkey isolates with a frequency of 2.6. Haplotype 3 was shared among Thailand, Sri Lanka, Cambodia and Myanmar populations with a frequency of 4.9. Haplotype 4 was shared among Bangladesh and Myanmar with a frequency of 0.29, haplotype 5 was restricted to Thailand, haplotype 6-8 were restricted to Sri Lanka, haplotype 9-13 were restricted to Singapore and haplotype 14 was restricted to South Korea population with a frequency of 0.14-0.73 (Fig 6)

Discussion
Plasmodium vivax remains a major cause of malaria in Southeast Asia region, particularly in India and thus must be targeted to achieve a malaria free India by 2030 [2,35]. P. vivax is likely more difficult to eliminate than P. falciparum [36] due to its ability to relapse from long-lasting dormant liver stages, a shorter development cycle in mosquitoes, and a more genetically diverse genome [37][38][39]. Therefore, research towards malaria vaccine development targeting P. vivax must be prioritized. This is the first study from India that provides a comprehensive evaluation of genetic diversity of two leading blood stage malaria vaccine candidates of P. vivax, Pvama1 complete ectodomain and Pvmsp1 19 kDa region in three geographically diverse malaria-endemic regions of India.
In Pvama1 high genetic diversity was observed in all the three regions of India, with substantial geographic variability and lower diversity in Chennai as compared to Rourkela and Nadiad. In Chennai P. vivax is a dominant species responsible for high burden of asymptomatic and submicroscopic malaria cases [6]. The prevalence of malaria in Chennai is lower as compared to Nadiad and Rourkela [6] which possibly contributed to the lower genetic diversity observed in Chennai. Genetic diversity in Pvama1 from India, Iran, and Thailand was high compared to Korea and Venezuela. Lower diversity in Korea and Venezuela could be due to lower malaria transmission in these areas [8, 40,41]. As polymorphism in Pvama1 can help to reveal the P. vivax population history [42], higher genetic diversity observed in Pvama1 from Asian countries supports the Southeast Asian origin of human P. vivax [43]. In Pvama1 majority of diversity was detected in domain I (DI) in Rourkela and Nadiad as previously reported for both P. vivax and P. falciparum from other regions [44,45], suggesting that DI is a dominant target of host immune responses [45][46][47]. Results from Tajima's D, Fu and Li (D and F), and the MK test from Nadiad and Rourkela suggested that DI is under balancing selection, results in present study are consistent with previous studies that showed strong evidence of balancing selection in domain I [10, 36,48,49]. In Chennai negative Tajima's D value were obtained which indicates population expansion or recent bottlenecks, were not significant here. No evidence of balancing selection was found in DII and DIII in P. vivax from Chennai, Nadiad and Rourkela as analyzed by Tajima's D and Fu and Li (D and F) test which is in agreement with other studies from Venezuela, China-Myanmar Border, and PNG [10, 36,45,49]. In contrast, P. vivax population from Sri Lanka showed evidence of positive selection in domain II, reported as highly immunogenic [50,51]. These results imply that the evolutionary pressure on Pvama1 is different in geographically diverse regions of the world.
Polymorphic sites were distributed throughout PvAMA1 ectodomain. However, domain I was the most variable, suggesting that this portion of the protein is potentially more exposed and accessible to host immune responses [36], which could have a significant effect on the immunological properties of the antigen. A total of 18 amino acid changes were observed in domain I, including codon changes at positions 172 (A/T) and 218 (L/V) which was not reported previously from India [25,27]. At position 120, only two amino acid changes R/K were found in present study, whereas three amino acids R/K/S at position 120 were reported previously from India [25,27]. A total of nine amino acid changes were found in domain II, six changes found in present study at position 253, 288, 352, 380, 384, 385 were not reported from Indian isolates previously and three changes at position 277, 368, 382 were reported from Rajasthan, northern part of India [27]. In domain III amino acid change at position 400 (K/R) found in Nadiad and Rourkela only, whereas amino acid change at position 438(R/H) was found in Chennai, Nadiad and Rourkela and reported previously in P. vivax isolates from Rajasthan [27]. In PvAMA1 ectodomain 23 residues were located on one side, suggesting that this side might be targeted by host immune responses. As reported previously [36,49], residue E145A, P210S, G253E, K352E and R438H mapped to predicted B-cell epitopes. In addition, residue E145A, important for immune evasion [49] was found in P. vivax isolates at all three study sites in India. In DII of PvAMA1 highly antigenic region (aa 290-307 SASDQPT-QYEEEMTDYQK) during natural infection [52] was found to be conserved in Indian isolates. Furthermore, immunological studies indicate that DII of AMA1 is more immunogenic than DI during natural human infections and this region could be used as part of a sub-unit vaccine to prevent P. vivax malaria [12]. An effective malaria vaccine should include alleles (haplotypes) that can induce host immune responses against polymorphic antigens to cover the antigenic diversity [24,53]. In India 55 Pvama1 haplotypes were observed in present study and only few haplotypes were shared between the three Indian subpopulations. In 716 global dataset, 352 haplotypes were identified with no matches to the Sal1 reference strain, which has been used for immune-epidemiological studies and vaccine development. Malaria vaccine based on only reference strains may result in poor clinical efficacy due to high genetic diversity and strain specific immunity in vaccine candidates [54], which further shows the necessity to include alleles that are representative of natural parasite population. F ST estimation between India and the closest geographical site Iran showed little genetic differentiation, which may be due to human population movement between the two countries. Only few haplotypes were shared between India, Sri Lanka and Iran indicating that similar allelic forms of Pvama1 circulating in these regions. The complex network of Pvama1 connected by rare admixed haplotypes and evidence of geographically restricted haplotypes suggest that covering Pvama1 diversity would be more challenging [36]. Genetic diversity in Pvama1 gene may contribute to antibody binding and escape, which needs further investigation to understand weather Pvama1 haplotypes are antigenically distinct as previously reported for PfAMA1 [55]. Genetic diversity may vary greatly in different malaria endemic regions, thus three study sites included in present study and limited number of positive samples could be a limiting factor.
PvMSP1 is a merozoite surface antigen, The 42 kDa region breaks into two fragments 33 kDa and 19 kDa. The predicted structure shows that 33 kDa fragment covers the 19 kDa fragment. This observation could explain the diversity and balancing selection present in the 33 kDa fragment but not in 19 kDa fragment [13]. Sequence analysis of the Pvmsp1 19 kDa fragment demonstrated that this region is highly conserved in P. vivax isolates from Chennai, Nadiad and Rourkela. A study conducted at same study sites reported high antibody response against PvMSP1 19 antigen as compared to PvAMA1 in people living in these areas [28]. The Pvmsp1 19 region was found to be conserved in many parts of India i.e. New Delhi (Delhi), Aligarh (Uttar Pradesh), Panjim (Goa), Panna (Madhya Pradesh) Sonapur (Assam), and Car Nicobar (Andaman & Nicobar Islands) [26]. Similar conserved amino acids in the Pvmsp1 19 region were observed among India, China, Vanuatu and Mexico and found to be identical at the nucleotide level to the Belem and Salvador-1 types. Low genetic diversity and geographic clustering was observed in Pvmsp1 19, among the parasite populations from Turkey, Thailand, Bangladesh, South Korea, Sri Lanka, Singapore, Cambodia, and Myanmar suggesting that immune selection maintains similar Pvmsp1 19 alleles around the globe.
A total of 14 Pvmsp1 19 haplotypes was identified in 682 global isolates. Since negative Tajima's D was observed in multiple populations, and only Turkey and Cambodia showed positive Tajima's D values, which indicates that Pvmsp1 19 region has undergone a recent bottleneck and lacks evidence of balancing selection. The level of selection pressure appears to vary on different parts of the protein in different Plasmodium species. In P. falciparum, msp1 19 is under positive selection pressure and Pvmsp1 33 is under purifying selection [56]. As reported previously [57] Pvmsp1 19 region maintains low genetic diversity in geographically diverse malaria endemic regions shows that Pvmsp1 19 based vaccines may work effectively worldwide. Multiple-allele immunogens appear to be protective against heterologous infection and vaccine constructs based on two or more representative strains and antigens may offer an effective solution to the problem of polymorphism [47,58]. A recent study [58] indicates that recombinant protein containing PvMSP1 19 is advantageous, because of the improved cellular immune response.
In summary, this study presents the genetic diversity in PvMSP1 19 and PvAMA1 vaccine candidate antigens in three geographically diverse malaria endemic regions of India and global dataset available in public domain. This is the first extensive study from India which contributes towards understanding the genetic diversity of PvAMA1 and PvMSP1 19 vaccine candidates. Highly conserved Pvmsp1 19 region is observed in present study and previously reported high antibody response against PvMSP1 19 antigen in malaria endemic populations of India confirm it as one of the most promising malaria vaccine candidates. In contrast, exceptionally high genetic diversity is observed in PvAMA1 among Indian and global population. Here, domain I of PvAMA1 is observed to be a dominant target of balancing selection. Overall, it seems that both the studied malaria antigens are under distinct selective pressure by the human immune system and thus genetic diversity in vaccine candidate antigens must be considered while developing a broadly efficacious malaria vaccine. Pvama1 ectodomain sequences from three geographical diverse malariaendemic regions of India were used to create a median-joining network. This network represents the mutational paths connecting Pvama1 haplotypes that may explain the observed sequence diversity. Each node represents one haplotype, node size indicates haplotype frequency and node color corresponds to the country of origin. Line length is proportional to genetic distance. Neighbor-joining tree constructed using 100 Pvama1 ectodomain sequences from India. Chennai sequences are in yellow, Nadiad sequences are in red and Rourkela sequences are in blue. The Sal-1 reference sequence was also included and indicated as a triangle. The unrooted neighbor-joining tree was constructed using 10,000 bootstrap replicates and the branch lengths are in the same units as evolutionary distance used to infer the tree. (PPTX) S1