Spatial Variation in Genetic Diversity and Natural Selection on the Thrombospondin-Related Adhesive Protein Locus of Plasmodium vivax (PvTRAP)

Thrombospondin-related adhesive protein (TRAP) of malaria parasites is essential for sporozoite motility and invasions into mosquito’s salivary gland and vertebrate’s hepatocyte; thereby, it is a promising target for pre-erythrocytic vaccine. TRAP of Plasmodium vivax (PvTRAP) exhibits sequence heterogeneity among isolates, an issue relevant to vaccine development. To gain insights into variation in the complete PvTRAP sequences of parasites in Thailand, 114 vivax malaria patients were recruited in 2006–2007 from 4 major endemic provinces bordering Myanmar (Tak in the northwest, n = 30 and Prachuap Khirikhan in the southwest, n = 25), Cambodia (Chanthaburi in the east, n = 29) and Malaysia (Yala and Narathiwat in the south, n = 30). In total, 26 amino acid substitutions were detected and 9 of which were novel, resulting in 44 distinct haplotypes. Haplotype and nucleotide diversities were lowest in southern P. vivax population while higher levels of diversities were observed in other populations. Evidences of positive selection on PvTRAP were demonstrated in domains II and IV and purifying selection in domains I, II and VI. Genetic differentiation was significant between each population except that between populations bordering Myanmar where transmigration was common. Regression analysis of pairwise linearized Fst and geographic distance suggests that P. vivax populations in Thailand have been isolated by distance. Sequence diversity of PvTRAP seems to be temporally stable over one decade in Tak province based on comparison of isolates collected in 1996 (n = 36) and 2006–2007. Besides natural selection, evidences of intragenic recombination have been supported in this study that could maintain and further generate diversity in this locus. It remains to be investigated whether amino acid substitutions in PvTRAP could influence host immune responses although several predicted variant T cell epitopes drastically altered the epitope scores. Knowledge on geographic diversity in PvTRAP constitutes an important basis for vaccine design provided that vaccination largely confers variant-specific immunity.


Introduction
In low-and middle-income countries in tropical areas, malaria remains one of the leading ten causes of morbidity and mortality, resulting in an estimated economic loss of nearly 40 million disability-adjusted life-years (DALYs) [1]. Although Plasmodium falciparum is the most pernicious and prevalent species, the significance of P. vivax should not be underappreciated because it can cause chronic relapsing illness due to reactivation of hypnozoites and it can potentially lead to severe complications similar to those caused by P. falciparum [2]. Furthermore, vivax malaria occupies a broader geographical range than falciparum malaria whilst the emergences of chloroquine-and primaquineresistant P. vivax are of particular concern if they would be widespreading as chloroquine-resistant P. falciparum [3]. The progress of malaria control and elimination has been hampered by not only the emergence of drug resistant parasites but also insecticide resistant mosquito vectors; thereby, development of a malaria vaccine offers an important preventive measure [4]. Because P. vivax and P. falciparum co-circulate in several endemic areas outside Africa where co-infections of both species are not uncommon [5], effective malaria control requires vaccines against both species.
To date, malarial circumsporozoite (CS) protein is a prime candidate for pre-erythrocytic vaccine development [4]. Although CSP-derived immunogens could elicit immunity against sporozoites, the subunit vaccines derived from this molecule such as RTS, S recombinant vaccine against P. falciparum has resulted in limited clinical efficacy in field studies [6]. Because vaccines derived from irradiation-attenuated or live sporozoites consistently outperform vaccines incorporating single sporozoite proteins, a more effective pre-erythrocytic stage vaccine may require combination of multiple protective immunogens [7,8]. In a murine model, co-immunization of CSP with thrombospondin-related adhesive protein (TRAP), a protein mobilized from microneme to the surface of sporozoite [9], has conferred complete protection against parasite challenge whereas vaccination using each of these immunogens could elicit only partial protection [10]. It is important to note that TRAP-specific CD8+ T lymphocytes are prime mediators for protection against sporozoite challenge in mouse vaccination trials, resulting in significant reduction in liver stage parasites [11]. Furthermore, seroepidemiological study has shown that anti-TRAP antibodies were negatively correlated with parasite density among infected individuals in malaria endemic areas [12].
TRAP has been shown to mediate gliding motility and invasion processes of malarial sporozoites into vertebrate's hepatocyte and mosquito's salivary gland [13,14]. TRAP contains a hydrophobic N-terminal peptide (domain I), an integrin-like magnesium binding (or von Willebrand factor) A domain (domain II), thrombospondin type I repeats (domain III), an acidic proline/ asparagine-rich region (domain IV), hydrophobic transmembrane domain (domain V) and a cytoplasmic tail (domain VI) [15,16]. The locomotion of sporozoites is mediated by the subpellicular actomyosin system that linked to the cytoplasmic tail of TRAP [17]. Despite functional importance of TRAP in parasite survival, analysis of the TRAP loci of P. falciparum (PfTRAP) and of P. vivax (PvTRAP) from clinical isolates revealed microheterogeneity of sequence that has been maintained by positive selective pressure [18][19][20]. Although it remains to be explored whether polymorphism in T cell epitopes of malarial TRAP could alter host cell immune recognition as that observed in CSP of P. falciparum [21], a rationale design of a malaria subunit vaccine targeting TRAP requires knowledge on the extent and pattern of sequence diversity in this molecule from diverse endemic areas.
Both P. vivax and P. falciparum have been circulating in Thailand with almost comparable prevalence since the past 2 decades. However, regional prevalence of P. vivax relative to P. falciparum seems to differ across major endemic areas of Thailand [22,23]. Our recent studies have shown that P. vivax populations in this country exhibited spatial variation in the extent of sequence diversity of major malaria vaccine candidate loci [24,25]. An effective malaria vaccine design undoubtedly requires knowledge on this issue to circumvent possible vaccine escape variants. Herein, we analyzed sequence diversity in the PvTRAP locus of clinical isolates from 4 major malaria endemic areas of Thailand.

Human Ethics Statement
The study protocol was approved by the Institutional Review Board on Human Research of Faculty of Medicine, Chulalongkorn University (IRB303/56). Written informed consent was obtained from all participants.

Parasite populations
Blood samples were obtained from 114 P. vivax-infected patients in northwestern (Tak province, n = 30), southwestern (Prachuap Khirikhan province, n = 25), eastern (Chanthaburi province, n = 29) and southern (Yala and Narathiwat provinces, n = 30) Thailand during 2006 and 2007. Venous blood samples were taken from each patient and were preserved in EDTA anticoagulant. These isolates have been previously determined to contain single clone infection based on sequence analysis of the merozoite surface protein-1 gene (Pvmsp-1) as described previously [24,26]. DNA extraction, amplification and sequencing DNA was prepared from blood samples using QIAamp kit (Qiagen, Hilden, Germany). The complete PvTRAP sequence was amplified by nested PCR using a forward outer primer (PvTRAP-F0: 59-ATGTGTGTACATTTGCGTATG-39, nucleotides 2442 to 2422 before the start codon of the Salvador I sequence, GenBank accession number XM_001614097) and a reverse outer primer (PvTRAP-R0: 59-TCATGAAGTGGC-GAAACAAAC-39, nucleotides 1961 to 1968 from the start codon). The inner pairs of primers were PvTRAP-F1 (59-ATGTCTGTGTGAGCGCGCGGT-39, nucleotides 2398 to 2378) and PvTRAP-R1 (59-TCATCAGAAGCAGTTCCAAG-39, nucleotides 1791 to 1810). PCR amplification was performed in a total volume of 30 ml of the reaction mixture containing template DNA, 2.5 mM MgCl 2 , 300 mM each deoxynucleoside triphosphate, 3 ml of 10x ExTaq PCR buffer, 0.3 mM of each primer and 1.25 units of ExTaq DNA polymerase (Takara, Seta, Japan). Thermal cycling profile for primary PCR included the preamplification denaturation at 94uC for 2 min followed by 35 cycles of 94uC for 30 s, 55uC for 30 s and 72uC for 2 min, and a final extension at 72uC for 5 min. Nested PCR was done using the same thermal profile except that amplification was done for 25 cycles. DNA amplification was carried out by using a GeneAmp 9700 PCR thermal cycler (Applied Biosystems, Foster City, CA). ExTaq DNA polymerase reportedly possesses efficient 59R39 exonuclease activity to increase fidelity and no strand displacement (Takara, Japan). The PCR product was examined by electrophoresis in a 1% agarose gel and was purified by using QIAquick PCR purification kit (QIAGEN, Germany). The PvTRAP sequences were determined directly and bi-directionally from PCR-purified templates and were performed on an ABI3100 Genetic Analyzer using the Big Dye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, USA). Whenever singleton substitution occurred, sequence was re-determined using PCR products from two independent amplifications from the same DNA template. Sequences in this study have been deposited in the GenBank Database under the accession numbers KJ807186 to KJ807299.

Statistical analyses
Sequence alignment with the Clustal X program [27] was done by using the PvTRAP sequence of the Salvador I strain as reference. The PvTRAP sequences of our previous Thai and Brazilian isolates previously published [19] and those reported by others [28] were included for analysis. For interspecific comparison, all PvTRAP sequences were aligned with the TRAP sequences of P. knowlesi (PkTRAP) and P. cynomolgi (PcyTRAP) (GenBank accession numbers XM_002259951 and XM_004223600, respectively). Haplotype diversity (h) and its sampling variance were computed according to equations 8.4 and 8.12 but replacing 2n by n [29]. Nucleotide diversity (p) was computed from the average number of pairwise sequence differences in the sample [29] using the MEGA version 6 program [30]. Evidence of genetic recombination was determined by 2 different approaches: (i) estimation of the minimum number of recombination events (Rm) that can be parsimoniously inferred from a sample of sequences with Monte Carlo simulations [31] using the DnaSP version 5 software [32] and (ii) searching for recombination breakpoints by phylogenetic approach by the Genetic Algorithm Recombination Detection (GARD) method. Goodness of fit for the GARD method was calculated by Akaike Information Criterion derived from a maximum likelihood model fit to each segment (AICc) using the HyPhy package [33,34].
The rates of synonymous substitutions per synonymous site (d S ) and nonsynonymous substitutions per nonsynonymous site (d N ) were computed by using Nei and Gojobori's method [35] with Juke and Cantor correction [36] and their standard errors of these parameters were estimated by the bootstrap method with 1,000 pseudoreplicates as implemented in the MEGA 6.0 program [30]. A statistical significance level of differences in d S and d N was set at 5%.
Tests for departure from neutral evolution based on intraspecific comparisons were done by Tajima's D, Fu and Li's D* and F* statistics. Tajima's D test determines the differences between the average number of nucleotide differences and an estimate of h from the number of segregating sites where h = 4Nem, in which Ne and m are the effective population size and the mutation rate, respectively [37]. Fu and Li's statistics measure the differences between the number of singletons and the total number of mutations (D* test) and the differences between the number of singletons and the average number of nucleotide differences between pairs of sequences (F* test) [38]. To examine whether deviation from neutrality occurs in particular fragments of PvTRAP, a sliding window analysis of Tajima's D across the coding sequences was done using a window size of 250 bp and a step size of 10 bp [32].
Evidence of departure from neutrality based on interspecific comparsion included Fu and Li's D and F tests, and McDonald-Kreitman tests. Fu and Li's D statistics measure the differences between the number of mutations in external branches of the genealogy and the number of mutations while their F statistic compares the differences between the number of mutations in external branches of the genealogy and the average number of nucleotide differences between pairs of sequences [38]. Coalescence simulation with 10,000 pseudoreplicates was done to estimate significance levels of these parameters as implemented in the DnaSP program [32]. The McDonald-Kreitman test contrasts intraspecific polymorphism with interspecific divergence of closely related species. Statistical departure from neutral expectation was computed by Fischer's exact test [39].
Tests of departure from neutrality at specific codons were performed based on estimation of the global ratio of the rate of non-synonymous to synonymous substitutions (dN/dS) across the PvTRAP gene. The single-likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), random effects likelihood (REL) and mixed effects model of evolution (MEME) methods implemented in the HyPhy package [33] were used for analysis. SLAC model is highly conservative based on the maximum likelihood reconstruction of the ancestral sequences and the counts of synonymous and nonsynonymous changes at each codon position in a phylogeny under the assumption of neutral evolution. FEL model compares the ratio of nonsynonymous to synonymous substitution on a siteby-site basis, without assuming an a priori distribution of rates across sites whereas REL model first fits a distribution of rates across sites and then infers the substitution rate for individual sites. MEM algorithm detects codons under episodic positive selection unmasked by the abundance of purifying selection along the lineages [40]. Significance level settings for SLAC, FEL and MEME were p values,0.1 and Bayes Factor.1000 for REL followed the default values available on the Datamonkey Web Server [41].
The population genetic structure was analyzed by molecular variance approach (AMOVA) using the Arlequin 3.5 software which is similar to the Weir and Cockerham's method but it takes into account the number of mutations between haplotypes [42]. The implemented algorithm calculates the fixation index F ST identical to the weighted average F-statistic over loci, hw [43] and the significance levels of the fixation indices were estimated by a non-parametric permutation [42]. Pairwise Slakin's linearized Fst was calculated based on Fst/(1-Fst) to address the relationship between the magnitude of population differentiation and distance between endemic areas [44].
The genetic structure was also determined by STRUCTURE 2.3.4 program that deploys the Bayesian model-based clustering approach [45]. The most probable number of populations (K) was estimated using an admixture model. All sample data were run for values K = 1-10, each with a total of 15 iterations. We used 500,000 Markov Chain Monte Carlo generations for each run following a burn-in of 50,000 steps. The most likely number K in the data was estimated by calculating DK values [46] and identifying the K value that maximizes the log probability of data, lnP(D) [45] as implemented in STRUCTURE HARVESTER program [47].
The phylogenetic tree of the PvTRAP haplotypes was constructed by using the Maximum Likelihood method based on the model with the lowest Bayesian Information Criterion (BIC) scores [30]. All positions containing gaps and missing data were eliminated. Initial tree(s) for the heuristic search were obtained by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach. The final tree topology was selected based on superior log likelihood value as implemented in the MEGA version 6 program [30].
Amino acid properties were characterized based on polarity and charge. Polarity of each residue is categorized as polar (S, Y, C, W, H, Q, T and N) or nonpolar (the remainder). Charge property includes positive (H, R and K), negative (D and E) or neutral (the remainder). Prediction of HLA-I binding peptides was based on the method taken into account proteasomal C terminal cleavage and transporter associated with antigen processing (TAP) transport efficiency. The predicted scores were determined by the NetCTL program [48].

Genetic diversity in PvTRAP
Of 114 complete PvTRAP sequences, 109 isolates contained 1668 bp and 5 isolates from Tak province whose sequences were identical had 1677 bp. Sequence alignment with the Salvador I strain revealed 36 nucleotide substitutions within 33 codons, resulting in 26 nonsynonymous and 7 synonymous changes. Comparing with previous data from Brazilian, Thai [19], South Korean isolates [28], 9 new amino acid substitutions were identified (Table 1). Like our previous findings, the majority of amino acid changes occurred in domains II and IV [19]. Three codons contained more than two substituted nucleotides: codon 206 (after the Salvador 1 sequence) having 4 different substitutions: c.616C.T (P206S), c.617C.G (P206S), c.617C.T (P206L) and c.616_617delinsTT (P206L), codon 268 having double mutations: c.802_803delinsGG (K268G), and codon 424 containing 2 changes: c.1271A.C (N424T) and c.1272C.A (N424K). Among isolates examined herein, no discernible bias in frequency of nucleotide substitutions in each position of the codons occurred, i.e. 11, 11 and 13 substitutions in the first, second and third positions of the codons, respectively. In total, 44 distinct haplotypes were identified in this analysis, one of which had a duplication of 9 nucleotides encoding PDS in domain IV as previously noted [19] and responsible for size variation (Tables 1  and S1).
The distribution of PvTRAP haplotypes in southern Thailand (Yala and Narathiwat provinces) was rather skewed towards few haplotypes as shown by a relatively lower value of haplotype diversity (h = 0.65360.063) whereas parasite populations from Table 1. Distribution of PvTRAP haplotypes from 4 major malaria endemic areas of Thailand.  I   II II  II II II II II II II II II II II II III III IV IV IV IV IV IV IV IV IV IV IV IV Salvador-1 . .
.    I   II II  II II II II II II II II II II II II III III IV IV IV IV IV IV IV IV IV IV IV IV Salvador-1 . Tak, Prachuap Khirikhan and Chanthaburi provinces had significantly higher haplotype diversity values, suggesting a more even distribution of haplotype frequencies in these latter endemic regions (Table 2). It is noteworthy that the predominant haplotypes differed between endemic areas. For examples, haplotypes #39 (n = 15) and #42 (n = 10) were exclusively found in Yala and Narathiwat provinces whereas haplotypes #28 (n = 10) and #30 (n = 8) were unique for Chanthaburi province. In contrast, several haplotypes were shared between isolates from Tak and Prachuap Khirikhan provinces ( Table 1). The nucleotide diversity (p) of PvTRAP varied from 0.0011760.00014 (isolates from Yala and Narathiwat provinces) to 0.0027260.00032 (isolates from Prachuap Khirikhan province). The magnitudes of nucleotide diversity in this locus did not significantly differ among isolates from Tak, Prachuap Khirikhan and Chanthaburi provinces but these were significantly greater than that from Yala and Narathiwat provinces (p,10 24 ) ( Table 2).

Population differentiation
We did not observe population differentiation between isolates from Tak and Prachuap Khirikhan provinces in which the F ST values did not deviate significantly from zero, implying that they shared a common origin or a high genetic admixture. However, P. vivax populations from Tak and Prachuap Khirikhan provinces displayed the F ST values that were significantly greater than zero when compared with populations from Chanthaburi province (p, 10 25 ) as well as that from Yala and Narathiwat provinces (p, 10 25 ). A significant departure from neutrality was also noted when isolates from Yala and Narathiwat provinces were compared with those from Chanthaburi province (p,10 25 ), implying genetic differentiation between populations with minimum or absence of gene flow between these areas. Furthermore, the Fst values between Tak isolates in the present study (collected in 2006-2007) and those collected in 1996 [19] did not differ significantly (p = 0.23), suggesting genetic stability in P. vivax population in this area (Table 3). On the other hand, P. vivax populations from Brazil, South Korea and Thailand were significantly different from one another, indicating strong differentiation and reproductive isolation among populations. These results were in line with analysis using STRUCTURE version 2.3.4 software in which the subpopulation likelihood reached a plateau from K = 3 to K = 6. Consistently, a greatest delta K value was also obtained at K = 3 ( Figure 1). The genetic structure at K = 3 could differentiate Brazilian isolates from Thai and South Korean parasite populations while a separation of South Korean and southern Thai P. vivax populations were not perceived. Similar findings were observed at K = 4. A more concordant results between those obtained by STRUCTURE program and the Fst parameters occurred when K = 5 in which isolates from Tak collected during different periods and from Prachuap Khirikhan were indistinguishable whereas South Korean and southern Thai populations were distinct (Figure 1). Regression analysis has shown positive correlation between pairwise linearized Fst and the natural logarithm of linear distance between endemic areas (Mantel test, r = 0.4619, p = 0.024). Likewise, positive correlation was found when geographic linear distance between endemic areas was considered (Mantel test, r = 0.4503, p = 0.028), implying that the majority of P. vivax populations in Thailand were genetically isolated by distance (Figure 2). Tests for departure from neutral evolution Analysis of patterns of nucleotide substitutions in the PvTRAP sequences of isolates in this study reveals that the rate of nonsynonymous substitutions per nonsynonymous sites or d N significantly exceeds that of synonymous substitutions per synonymous sites or d S in all parasite populations (p,0.05 or p,0.01) except P. vivax isolates from Yala and Narathiwat provinces ( Table 2). In contrast, d S was greater than d N for isolates from Yala and Narathiwat provinces although the difference was not statistically meaningful (p = 0.467). Closer looks into the rate of nucleotide substitutions in each domain of PvTRAP has revealed that d N significantly outnumbered d S in domains II and IV of  isolates from Tak, Prachuap Khirikhan and Chanthaburi provinces. Although no significant difference in d S and d N for the entire coding region of the PvTRAP gene was observed when all parasite populations were considered, d N in domain IV was obviously greater than d S (Table 2). Meanwhile, statistics based on Tajima's D and Fu & Li's D* and F* did not show evidences of departure from neutral expectation (Table 4). However, sliding window analysis reveals significant negative Tajima's D in domain II of P. vivax populations from Prachuap Khirikhan and Chanthaburi provinces, suggesting purifying selection at certain residues in this gene. In contrast, evidence of balancing selection was detected in domain IV of Yala and Narathiwat isolates as shown by significant positive D values. Meanwhile, no departure from neutrality was detected in malaria population from Tak province (Figure 3). Results from interspecific comparison between PvTRAP and PcyTRAP using the MacDonald-Kreitman test has shown that each parasite population except that from Yala and Narathiwat provinces had significant deviation in positive direction from neutral expectation (p,0.01), reflecting a signature of positive or balancing selection. Consistent results were obtained when the TRAP gene of P. knowlesi was used as the outgroup sequence (Table 5). In contrast, application of the Fu & Li's D and F tests using either PcyTRAP or PkTRAP as outgroup sequences did not yield significant departure from neutrality. Meanwhile, positively selected sites were identified at 4 codons in domain II and 3 codons in domain IV by one or more of the methods implemented in the HyPhy package (Table 6). Likewise, negatively selected sites were detected in residue 21 in domain I, residues 30, 100, 133 in domain II and residues 519 and 539 in domain VI, suggesting functional constraint occurring at certain residues in this protein.

Amino acid substitutions and predicted HLA binding
The amino acid substitutions in PvTRAP of Thai isolates had slightly higher percentage of conservative changes with respect to polarity and charge property as 67.7% and 58.1% of these substitutions, respectively, were unchanged. Similar percentages of these changes were observed among worldwide isolates, being 64.6% and 56.9%, respectively (Table S2). Closer look into amino acid substitutions and potential HLA-binding peptides as predicted by the high scores for the C-terminal cleavage and the transporter associated with antigen processing efficiency [48] have shown that a number of substituted residues have remarkably reduced predicted scores; thereby, the property of these epitopes could be altered. It is noteworthy that substituted residues in several of these potential epitopes occurred in domains II and IV of PvTRAP (Table 6).

Intragenic recombination
Evidence of recombination in the PvTRAP locus analyzed by using the HyPhy package has identified evidence of 1 recombination breakpoint with significant topological incongruence between AICc score of the best fitting GARD model (p,0.05) for P. vivax populations in Thailand except those from Yala and Narathiwat provinces (0.05,p,0.1). Although these recombination breakpoints occur at different positions, i.e. between nucleotides 1008 and 1009, 1170 and 1171, and 1173 and 1174 for Chanthaburi, Prachuap Khirikhan and Tak provinces, respectively, it is noteworthy that all were located in domain IV of the PvTRAP gene. Meanwhile, estimation of the parameter Rm reveals 3 and 5 recombination sites in parasite populations from Tak and Prachuap Khirikhan provinces whereas Chanthaburi isolates and southern Thai P. vivax populations had equal number of potential recombination sites (Rm = 2). Locations of potential recombination sites for each population are listed in Table S3, most of which span domain IV. These results suggest that intragenic recombination could shape diversity at the PvTRAP locus.

Phylogenetic relationship
Model test reveals that the Tamura 3-parameter with the rate variation model that allowed for some sites to be evolutionarily invariable gave the lowest BIC value. The tree with the highest log likelihood (23285. 15  buri, Yala and Narathiwat provinces seem to be located in separate branches from the remaining Thai isolates. However, there was no clear lineage for each parasite population because several isolates from different endemic areas shared or was placed in closely related branches. More importantly, all branches in this phylogenetic inference received bootstrap values less than 80%, suggesting no distinct lineage of the PvTRAP haplotypes from worldwide origins (Figure 4).

Discussion
Host adhesion and motility are fundamental properties of malarial sporozoites to establish infection that involves sporozoite-specific proteins, one of which is TRAP. It has been shown that TRAPdeficient sporozoites were severely impaired in their host cell adherence property and gliding motility; thereby, invasion of mosquito's salivary glands and hepatocytes was interrupted [14,49]. Two extracellular portions of malarial TRAP, the von Willebrand A-domain and the thrombospondin repeats located in domains II and IV, respectively, are crucial for initial host cell adherence and stabilization of adhesion/deadhesion during gliding mobility of sporozoites [49,50]. Crystal structure analyses spanning the von Willebrand A-domain and the thrombospondin repeats of PvTRAP and PfTRAP reveal two conformational states, open and closed structures. Such structural phase transition is possibly responsible for 'stick-and-slip' or gliding motility of sporozoites through the actomyosin motility apparatus [51,52]. The sequence motif of metal-ion-dependent adhesion sites (MIDAS) in von Willebrand Adomain and the extensible b ribbon that implicates in ligand binding during the open high-affinity state [51] were perfectly conserved in all PvTRAP haplotypes as shown in this study. Although amino acid substitutions were observed in the b1 and a3 domains, charge and polarity of these substituted residues were retained, suggesting structural constraint in these portions of the protein (Table 1). Experimental studies have shown that artificially engineered mutations of the MIDAS-coordinating amino acid residues that were conserved across malaria species could impair gliding motility and sporozoite infectivity to both mosquito and mammalian hosts [49,50]. On the other hand, alteration in both charge and polarity profiles of some substituted amino acids (residues 143, 166, 172 and 176, Tables 1 and S2) were identified in a4 and a5 domains, implying that changes in these residues may not drastically affect functional conformation of PvTRAP. Nevertheless, further experimental studies are required to address this issue.
The extent of sequence diversity of the PvTRAP locus among parasite populations in Thailand seems to be comparable among endemic areas except that from southern region ( Table 2). The results from this study are in line with our recent analyses of other major vaccine antigens, i.e. apical membrane antigen-1 (Pvama-1), merozoite surface protein-1 (Pvmsp-1), merozoite surface protein-4 (Pvmsp-4) and merozoite surface protein-5 (Pvmsp-5) of P. vivax showing that southern parasite population possesses significantly lower levels of nucleotide diversity, number of haplotypes and haplotype diversity than those of northwestern parasite population [24,25,53]. Heterogeneity in the PvTRAP sequences could have been arisen from selective pressures and intragenic recombination. Recombination confers evolutionary advantages because creation of adaptive traits or removal of deleterious mutants is enabled. Importantly, most recombination sites were identified within or spanning domain IV of the PvTRAP locus where signature of positive selection was identified. The relatively higher levels of nucleotide diversity of PvTRAP from Thai-Mynmar border (Tak and Prachuap Khirikhan isolates) than that from Thai-Cambodia border (Chanthaburi isolates) seems to be in line with Rm in each population. It is likely that transmigration of people could maintain and spread of P. vivax harboring variant PvTRAP haplotypes while recombination could further generate novel alleles. Intriguingly, it seems that reduction in genetic diversity of southern parasite population may not be simply explained by clonal population structure of P. vivax in this region because recombination event could be traced despite the paucity of potential recombination sites observed (Rm = 2). In contrast, several decades of extensive malaria control in this region and a lack of remarkable re-introduction of malaria cases from Malaysia could have resulted in population bottleneck as previously suggested [24]. Therefore, after bottleneck effect and relaxation of malaria control in southern Thailand because of difficulty in implementation due to local political unrest, it could be possible that recombination between different PvTRAP haplotypes may gradually increase the level of genetic diversity in this population.
Our analyses have shown that domains II and IV of the PvTRAP locus had significant difference in the rate of nonsynonymous substitutions than that of synonymous substitutions. Evidence of positive or balancing selection was also reaffirmed by the McDonald-Kreitman test. Although departure from neutrality in the entire coding PvTRAP sequence was not apparent by using Tajima's D and related statistics, sliding window analysis has identified significant negative Tajima's D values in domains II of P. vivax population from Prachuap Khirikhan and Chanthaburi provinces while significant positive values occurred in domain IV among isolates from Yala and Narathiwat provinces. Differential patterns of departure from neutrality among populations could reflect different evolutionary forces exerted on PvTRAP of each population. Alternatively, intrinsic statistical property of Tajima's D statistics can be confounded by demographic processes [54]. Therefore, significant positive Tajima's D observed in southern Thai isolates could imply balancing selection or a recent bottleneck because an excess of intermediate frequency alleles as rare alleles are lost from the population due to genetic drift can be found immediately after the bottleneck. Meanwhile, significant negative Tajima's D value in domains II could imply purifying selection and selective sweep while migration and population growth may be the confounders [55]. Nevertheless, positively selected codons were found based on analyses using SLAC, FEL, REL and MEME methods while negatively selected residues were detected in domains I, II and VI (Table 6). Taken together, it seems that co-existence of both negatively selected amino acids and residues under positive selection could stem from functional constraint of this molecule and probably driven by host immune pressure, respectively. Meanwhile, it seems that intragenic recombinantion was pronounced in populations from Tak and Prachuap Khirikhan provinces while relatively few recombinations occurred in parasites from Chanthaburi province and southern isolates. These findings are in accord with our previous analysis on PvMSP4 and PvMSP5 in which frequency of recombination seems to correlate with endemicity of each area [25,53]. Despite low level of sequence diversity, genetic distances inferred from the PvTRAP locus among P. vivax populations could be deployed for analysis of population structure. Pairwise comparison of populations revealed that South Korean isolates displayed high and significant Fst indices when compared with malaria populations in other endemic areas. Likewise, slightly lower but significant Fst indices were observed when isolates from Brazil were compared with populations elsewhere (Table 3). Therefore, gene flow between P. vivax populations in Thailand, Brazil and Korea may not occur because geographic distance has precluded intra-and intercontinental spread of parasites. Although malaria can be transferred between remote endemic areas through infected travelers, this situation could be inefficient due to rarity in numbers of such cases to contribute to genetic admixture of parasites in these three countries. However, closer look into genetic differentiation between 4 major malaria endemic areas of Thailand, significant genetic differentiation of P. vivax population between endemic areas was observed except parasites from Tak province collected over a decade apart (1996 and 2006-2007) having low and non-significant Fst indices. Therefore, diversity in the PvTRAP locus of parasites in Thailand seems to exhibit spatial but not temporal variation. It is noteworthy that regression analysis of linear geographic as well as normal logarithm of geographic distances and pairwise linearized Fst values yielded significant correlation (Mantel test, p = 0.024 and 0.028, respectively), indicating that P. vivax populations in these 4 endemic areas have been isolated by distance ( Figure 2). Therefore, it seems likely that the flight range of mosquito vectors was the major determinant for magnitude of genetic differentiation of P. vivax populations between endemic areas. However, there remain some other factors that could influence and shape genetic structure of malaria in Thailand. Although geographic distance between Tak and Prachuap Khirikhan provinces is about 160 kilomerters longer than that between Tak and Chanthaburi provinces, significant genetic differentiation occurred in the latter but not the former. It should be noted that transmigration of people along Thai-Myanmar border where both Tak and Prachuap Khirikhan provinces are located is intense during the past decades whereas transmigration of gem miners who, not uncommonly, carried malaria parasites in their circulation between Tak and Chanthaburi provinces had ceased for over 3 decades ago [56]. Furthermore, the abundance of malaria vectors along Thai-Myanmar border could facilitate gene flow of P. vivax in these areas. In contrast, ecological niches of Anopheles vectors between Tak and Chanthaburi provinces are largely interrupted by focal deforestation and urbanization, precluding natural spread of malaria between these areas.
Analysis of hierarchical splitting of gene pools by STRUC-TURE has shown that K = 3 is an optimum value. Intriguingly, K = 3 could not subdivide South Korean and southern Thai isolates that in fact differed biologically and genetically. A very long incubation period of vivax malaria from 230 to 300 days has been recognized among South Korean isolates [28] but, to our knowledge, has never been recorded in Thai strains. Furthermore, phylogenetic analysis has placed all South Korean isolates to a separate cluster from southern Thai isolates, albeit without significant bootstrap support ( Figure 3). In contrast, the partition at K = 5 is more consistent with results from analysis using Fst indices in which distinct subpopulation structure was perceived between isolates from southern Thailand and those from South Korea whilst at K,5, no discernible difference was observed between these populations. Meanwhile, phylogenetic inference does not support distinct clusters of isolates relating to their geographic origins because no branch received remarkably high bootstrap values, consistent with our previous study [19]. Therefore, genetic diversity of the PvTRAP locus displays tendency towards geographic variation (Figure 4).
Protective immunity against malarial TRAP involves both humoral and cell mediated immune responses. Antibodies against TRAP targeting domains involved in gliding motility and hepatocyte binding ligands could prevent sporozoite invasion into hepatocytes [13] while TRAP-specific cytotoxic T cells could destroy sporozoite-infected hepatocytes [57]; thereby, exoerythrocytic development does not ensue. The roles of anti-TRAP antibodies in partial protection from or significantly reduced risk of falciparum malaria have been demonstrated in African endemic areas [12,58,59]. With analogy to the thrombospodin-related motif (TRM) of PfTRAP that binds heparin sulfate proteoglycan of hepatocyte, the TRM in domain III of PvTRAP is relatively conserved with two amino acid substitutions, i.e. WTACSVTC-GR(or K)GTH (or Q) SRSR, while the MIDAS motif is perfectly conserved among clinical isolates. Therefore, limited diversity at the functional domains in PvTRAP has encouraged vaccine incorporation. Meanwhile, several CD4+ and CD8+ T cell epitopes have been mapped across PfTRAP, some of which have been associated with protection [57,60,61]. Importantly, T cell responses to PfTRAP were commonly allele-specific although certain epitopes could induce or serve as targets of cross-reactive immunity [60]. Recent studies have highlighted the importance of protective roles of CD8+ T cell and memory T cell responses to PfTRAP from clinical malaria [61,62]. The significance of diversity in these T cell epitopes among natural malaria population remains to be elucidated in term of vaccine efficacy. To date, little is known about immunological responses to PvTRAP although vaccination study using synthetic peptide encompassing hepatocyte-binding ligand of the protein has conferred protection against parasite challenge in Aotus monkeys [63]. Our analysis suggests that polymorphisms in PvTRAP could affect T cell recognition because amino acid substitutions in several predicted HLA-binding peptides have remarkably altered the predicted epitope scores (Table 7).
In conclusion, our analysis has shown that genetic diversity in the PvTRAP locus of Thai P. vivax isolates exhibits geographic variation and population structure. Positive selection and intragenic recombination have shaped diversity at the PvTRAP locus. The low level of sequence diversity in PvTRAP among clinical isolates as shown in this study may not drastically compromise vaccine design.