Polymorphism in merozoite surface protein-7E of Plasmodium vivax in Thailand: Natural selection related to protein secondary structure

Merozoite surface protein 7 (MSP-7) is a multigene family expressed during malaria blood-stage infection. MSP-7 forms complex with MSP-1 prior to merozoite egress from erythrocytes, and could affect merozoite invasion of erythrocytes. To characterize sequence variation in the orthologue in P. vivax (PvMSP-7), a gene member encoding PvMSP-7E was analyzed among 92 Thai isolates collected from 3 major endemic areas of Thailand (Northwest: Tak, Northeast: Ubon Ratchathani, and South: Yala and Narathiwat provinces). In total, 52 distinct haplotypes were found to circulate in these areas. Although population structure based on this locus was observed between each endemic area, no genetic differentiation occurred between populations collected from different periods in the same endemic area, suggesting spatial but not temporal genetic variation. Sequence microheterogeneity in both N- and C- terminal regions was predicted to display 4 and 6 α-helical domains, respectively. Signals of purifying selection were observed in α-helices II-X, suggesting structural or functional constraint in these domains. By contrast, α-helix-I spanning the putative signal peptide was under positive selection, in which amino acid substitutions could alter predicted CD4+ T helper cell epitopes. The central region of PvMSP-7E comprised the 5’-trimorphic and the 3’-dimorphic subregions. Positive selection was identified in the 3’ dimorphic subregion of the central domain. A consensus of intrinsically unstructured or disordered protein was predicted to encompass the entire central domain that contained a number of putative B cell epitopes and putative protein binding regions. Evidences of intragenic recombination were more common in the central region than the remainders of the gene. These results suggest that the extent of sequence variation, recombination events and selective pressures in the PvMSP-7E locus seem to be differentially affected by protein secondary structure.


Introduction
In most malaria endemic areas outside of Africa, Plasmodium vivax mainly coexists with Plasmodium falciparum, both of which affect global health burden and contribute remarkably to economic loss [1,2].Control of malaria caused by P. vivax has been complicated by the presence of hypnozoites responsible for chronic relapsing illness, the emergence of chloroquine-resistant P. vivax strains and spread of insecticide-resistant anopheline vectors [3].Therefore, alternative measures are required such as development of malaria vaccines [4].
One of the prime strategies for asexual blood stage vaccine development is to mount immunity that interrupts the invasion of Plasmodium merozoites into erythrocytes [5].The initial attachment of merozoite to erythrocyte surface is primarily mediated by the binding of merozoite surface protein-1 (MSP-1) to Band 3 on the erythrocyte membrane [6].Although MSP-1 has been considered a prime target for asexual blood stage vaccine development, recent studies have shown that other merozoite surface proteins, such as merozoite surface proteins-6 and -7 (MSP-6 and MSP-7), form a non-covalent complex with MSP-1 prior to receptor-ligand recognition [7][8][9][10].MSP-7 is expressed during schizogony and undergoes two steps of proteolytic processing akin to MSP-1.Disruption of P. falciparum MSP-7 (PfMSP-7) has resulted in partial impairment in erythrocyte invasion by malarial merozoites [11].Meanwhile, anti-PfMSP-1/6/7 antibodies can interfere with MSP-1 shedding and reduce merozoite invasion into erythrocytes [12].Disruption of the orthologous gene in P. berghei affected intraerythrocytic growth of parasites [13].Furthermore, specific binding of P. berghei MSP-7 to P-selectin has suggested the role of this protein in modulating disease severity through immunological process [14].Therefore, immunity induced by vaccines derived from malarial MSP-7 could potentially interrupt parasite development.
MSP-7 proteins are encoded by a multigene family, of which the number of gene members varies across Plasmodium species [15,16].The MSP7 family of P. vivax contains 13 gene members, designated alphabetically from PvMSP-7A to PvMSP-7M, found in tandem with head to tail arrangement on chromosome 12 [15].Of these, PvMSP-7C, -7E, -7H and -7I displayed higher nucleotide diversity than other paralogous gene members [17][18][19].Although the less polymorphic protein members have been suggested for vaccine incorporation, it is currently unknown whether particular members of the PvMSP7 family involve in binding to PvMSP1 during host cell invasion.Importantly, several malarial surface proteins involved in host cell invasion are under positive or balancing selection and have been considered to be targets of vaccine development [20].Meanwhile, antigenic diversity in several malarial vaccine candidates could elicit allele-specific immune responses, an issue that may complicate vaccine design.Therefore, analysis of sequence variation in malarial vaccine candidates among parasite populations from different geographic areas is essential for a basis of vaccine development [4].
The extent of genetic diversity in PvMSP-7 has been analyzed by using P. vivax samples from Colombia [17][18][19].However, PvMSP7E seems to be the most polymorphic protein member and reportedly has evolved rather rapidly [19].Therefore, this locus could be useful as a marker for P. vivax populations besides being a member in the protein family involved in erythrocyte invasion.The aim of this study is to analyze sequence diversity at this locus among P. vivax populations from diverse malaria endemic areas of Thailand.Results revealed extensive diversity in PvMSP7E of Thai isolates that exhibited spatial variation.Furthermore, differential selective pressures in the PvMSP-7E locus seem to be related with its predicted protein secondary structure.

Human ethics statement
Written informed consent was obtained from all participants or from their parents or guardians prior to blood sample collection.Research protocol was approved by the Institutional Review Board in Human Research of Faculty of Medicine, Chulalongkorn University, Thailand (IRB No. 104/59).

Study population
One hundred and ten venous blood samples were collected from uncomplicated symptomatic P. vivax-infected patients diagnosed by microscopic examination of Giemsa stained blood films and preserved in EDTA anticoagulant.Of these, 80 samples were collected from Tak (n = 31), Narathiwat (n = 16) and Yala (n = 9) provinces during 2008-2009.Additional 24 blood samples from Tak province preserved at -80˚C were obtained in 1996.Thirty samples from Ubon Ratchathani province was collected during a malaria epidemic in 2014-2015.All blood samples were stored at -30˚C.

DNA extraction
Genomic DNA of blood samples was prepared by using QIAamp DNA mini kit (Qiagen, Hilden, Germany) essentially per manufacturer's recommendations.DNA was stored at -30 ˚C until use.

PCR detection of P. vivax and genotyping
All blood samples were reaffirmed for the presence of P. vivax DNA by nested PCR method using specific primers derived from the 18S rRNA gene as described previously [21].Allelespecific PCR targeting the polymorphic block 6 of the merozoite surface protein 1 of P. vivax was deployed to determine the number of parasite clones in each isolate as previously reported [22][23][24].

Data analysis
DNA sequences were aligned using the default option in CLUSTAL W program [25].The PvMSP-7E gene of the Salvador I strain was used as a reference sequence (PVX_082665).Sequences from the Colombian isolates previously reported were also included for comparison (GenBank accession nos.KM212276-KM212294) [19].All sites at which the alignment postulated a gap were eliminated in pairwise comparisons of the analysis.Exploration of repetitive DNA sequence motifs was performed by scanning the sequence with Tandem Repeats Finder version 4.0 program [26].Protein secondary structure prediction was determined by Deep Convolutional Neural Filed program (DeepCNF) implemented in the RaptorX-Property Web-Server [27].The DeepCNF method has been validated to outperform other methods that achieved >70% accuracy in eight-state protein structure prediction [28].Protein disordered or intrinsically unstructured regions were analyzed by using the GeneSilico MetaDisorder service [29].Prediction of anchorage or protein-protein interaction regions in disordered proteins was done by using the ANCHOR/IUPRED web server [30].Haplotype diversity and its sampling variance were calculated using the DnaSP version 5.10 program [31].Nucleotide diversity (π) was calculated from the mean of pairwise sequence differences in the sample sequences using Juke and Cantor model of nucleotide substitution [32].The number of synonymous substitutions per synonymous site (d S ) and the number of nonsynonymous substitutions per nonsynonymous site (d N ) were computed by Nei and Gojobori's model with Jukes-Cantor correction [32,33].The standard errors of these parameters were estimated by the bootstrap method with 1,000 pseudosamplings implemented in the MEGA 6.0 program [34].Statistical differences between these parameters were determined by a two-tailed Z-test and the significance level was set at p < 0.05.Codon-based analyses of selection were performed by using the single-likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), internal branch FEL (iFEL), random effects likelihood (REL), mixed effects model of evolution (MEME) and fast unconstrained Bayesian approximation (FUBAR) methods implemented in the Datamonkey Web-Server.Significance level settings for these tests were considered per the default values available on the Datamonkey Web-server [35].Amino acid property-based predictions of positive selection were determined by using the TreeSAAP program.Significant changes in amino acid properties were inferred for categories 6-8 with their relative percent probability set at 99.9% [36].Evidences of genetic recombination were analyzed by using the Recombination Detection Program version 4 (RDP4) that includes RDP4, GENCONV, Bootscanning, the Maximum Chi Square, CHIMAERA, Sister Scanning and 3SEQ methods [37].The fixation index (Fst) was deployed to evaluate the level of population differentiation due to genetic structure by using different hierarchical analyses of molecular variance implemented in the Arlequin software version 3.11 [38].Significance levels of the fixation indices were determined by permutation test.Phylogenetic trees were constructed using the maximum likelihood method based on the best substitution model for the sequence data that showed the lowest Bayesian Information Criterion (BIC) scores [34].The reliability of clustering patterns in the phylogenetic tree was assessed by 1,000 bootstrap pseudoreplicates.BCPRED server was used to predict B-cell epitopes with an epitope length of 20 amino acids, set to 90% classifier specificity [39].PREDIVAC server was employed to stimulate the MHC-II binding peptide [40].It is a cutting-edge tool to predict CD4+ T-cell epitopes because it has over 95% coverage of human HLA class II DR protein diversity.In the present study, HLA-DR alleles were restricted to DRB1 Ã 1202, DRB1 Ã 1502, DRB1 Ã 0701, DRB1 Ã 1501, and DRB5 Ã 1602 due to its predominant frequencies among Thai population [41].The threshold PREDIVAC score was set at 70 to screen MHC-II binding antigenic peptides.

Diversity in PvMSP-7E of Thai isolates
Of 110 P. vivax-infected blood samples, 92 isolates contained single genotypes of the block 6 of PvMSP-1 and yielded non-superimposed signals of the electropherograms of the PvMSP-7E sequences, suggesting single clone infections.Since Yala and Narathiwat provinces are located next to each other and malaria transmission has been almost similar, P. vivax isolates from these provinces are considered to be the same population.Therefore, the PvMSP-7E sequences in this study were defined by geographic origins as Tak (n = 46), Ubon Ratchathani (n = 22) and Yala-Narathiwat (n = 24).However, Tak population was further subdivided into 2 populations based on collection period, i.e. 2008-2009 (n = 28) and 1996 (n = 18)(Table 1).In total, 52 haplotypes were identified among 92 Thai isolates, comprising 194 nucleotide substitutions, 185 segregating sites and 9 insertion/deletions.Haplotype #1 was most common and shared between P. vivax populations from Ubon Ratchathani (n = 2) and Yala-Narathiwat (n = 14).Likewise, haplotypes #15-#17 co-existed in Tak and Ubon Ratchathani.However, the remaining 48 haplotypes were not shared between these endemic areas (S2 Fig) .Although the level of nucleotide diversity for Tak population was higher than those for Ubon Ratchathani and Yala-Narathiwat populations, the differences were not statistically significant (Z-test, p > 0.05).Meanwhile, the number of haplotypes and haplotype diversity for Yala-Narathiwat population were remarkably lower than those for Tak and Ubon Ratchathani populations, indicating limited number of variants and more uneven distribution of haplotypes in Yala and Narathiwat provinces.

Sequence variation in the 5' and the 3' regions of PvMSP-7E
Alignment of the complete PvMSP-7E sequences of isolates in this study with the Salvador 1 sequence has revealed two regions with low nucleotide diversity (π = 0.0224 and 0.0273) located in the 5' and 3' regions, spanning 123 and 135-136 codons, respectively (Fig 1).It is noteworthy that 51 nucleotides at the 5' end and 15 nucleotides at the 3' end of the gene were not analyzed among Colombian isolates; these regions contained 4 nonsynonymous codon changes at residues F12L, F14L, S17C and L368F [19].In total 20 and 51 nucleotide substitutions were observed in the 5' and 3' regions, respectively.Of these, 69 nucleotide substitutions were dimorphic and 2 substitutions at positions 786 and 926 at the 3' region were trimorphic, i.e. 3 bases are found at positions wherever substitutions occur.Most Thai isolates (94.6%) had a deletion of codon 312 coding for proline in the 3' region, whereas 21 (22.8%) and 26 (28.3%) isolates contained TTA (leucine) and GAA (glutamine) insertions between codons 150 and 151, and between codons 215 and 216, respectively (positions after the coding region of the Salvador 1 sequence).

Sequence variation in the central region of PvMSP-7E
The nucleotide diversity of the central region of PvMSP-7E, encompassing codons 124-234 of the Salvador 1 strain (Fig 1 ), displayed an order of magnitude greater than those for the 5' and 3' regions and the differences were statistically meaningful (p < 0.0001)(Table 2).terminal parts of the central region are referred here as 5'-trimorphic and 3'-dimorphic subregions, containing 77-78 and 35-37 codons, respectively (Fig 1).The level of nucleotide diversity for the 5'-trimorphic region was significantly greater than that for the 3'-dimorphic region (p < 0.005)(Table 2).Furthermore, 2 indels were found in the central region of this gene, one in the 5'-trimorphic and the other in the 3'-dimorphic subregions.

Protein secondary structure prediction
Protein secondary structure prediction of PvMSP-7E by using DeepCNF method implemented in the RaptorX-Property Web-Server has revealed 10 α-helical domains.Four α-helices were located in the N-terminal and the remaining 6 α-helices at the C-terminal (Fig 3).The majority of the non-helical regions seem to contain random coil motifs.Analysis using the MetaDisorder series implemented in the GeneSilico Metadisorder service has identified 3 intrinsically unstructured or disordered regions that span codons 27-36, 46-102 and 121-241, referred

Selective pressure on PvMSP-7E
Departure from neutrality in the PvMSP-7E locus by comparison of d S and d N has shown that d S was significantly greater than d N in both 5' and 3' regions (p < 0.005), suggesting purifying selection in these domains.By contrast, the central region seems to be under positive selection because d N significantly exceeded d S (p < 0.05)(Table 2).Similar results were obtained when analysis was performed on sequences from each parasite population (S1 Table ).Further analysis of the central region has revealed that d N significantly outnumbered d S in the 3'-dimorphic (p < 0.05), but not in the 5'-trimorphic region (p > 0.05) (Table 2).To explore whether selective pressure could have operated on specific regions of the protein, the rates of synonymous and nonsynonymous substitutions per site were determined for each domain according to the predicted protein secondary structure.Owing to the paucity of mutation sites in α-helical domains except α-helix-I, 3 adjacent helical regions were combined for analysis.Results revealed that d S significantly exceeded d N in α-helical domains II-IV, V-VII and VIII-X, suggesting purifying selection in these regions.Evidence of purifying selection was also observed in the predicted disordered domain 2 (D2).The remaining non-helical regions and the D1 domain exhibited no significant difference between d S and d N .On the other hand, the α-helix-I domain displayed greater value of d N than d S and the difference was statistically significant, implying positive selection in this region.In domain D3, d N outnumbered d S but the difference was not significant.However, positive selection was detected in the 3' portion of the D3 domain corresponding to the 3'-dimorphic subregion where significantly greater d N than d S was observed (Tables 2 and 3).Codon-based tests for departure from neutrality using SLAC, FEL, iFEL, REL, FUBAR and MEME methods implemented in the Datamonkey Wed-Server have identified 2, 8, 11, 39, 12 and 20 positively selected sites, respectively.Likewise, the TreeSAAP program that determined significant alteration in various physicochemical properties of substituted amino acids could detect 18 positively selected sites.Meanwhile, SLAC, FEL, iFEL, REL and FUBAR identified 23, 38, 31, 30 and 24 negatively selected sites, respectively.However, these methods could potentially generate some false positive and negative results.Therefore, a consensus of concordant results for positively and negatively selected residues from at least 2 methods was considered herein and listed in Tables 4 and 5.It is noteworthy that the majority (21 of 26) of codons displaying positive selection were mapped in α-helix-I and domains predicted to be disordered protein.On the other hand, most negatively selected codons (29 of 34) were found outside these domains.The distribution of these positively and negatively selected sites in relation with α-helix and disordered regions was significant difference (p < 0.0001, Fischer exact probability test).

Recombination
Intragenic recombination in the PvMSP-7E locus was evidenced by various recombination parameters.The RDP package incorporating RDP, GENCONV, Bootscan, MaxChi, Chimera, Siscan and 3Seq identified 11, 8, 8 and 1 unique events by at least one of these tests in populations from Tak collected during 2008-2009, Tak in 1996, Ubon Ratchathani and Yala-Narathiwat, respectively.More than half of the nucleotide sites (29 of 56) involved in these recombination events were located in the central region of the gene (S2 Table ).

Population differentiation
Genetic differentiation between populations was measured from the fixation index that ranges between 0 to 1 (or 100%), indicating no subdivision between populations and complete genetic isolation between populations, respectively.The Fst values based on PvMSP-7E between P. vivax populations and their p values are shown in Table 6.The Fst values were relatively high and reached highly significant levels (p < 10 −5 ) between populations from Tak (all collection periods) and Yala-Narathiwat (21.75%), and between Ubon Ratchathani and Yala-Narathiwat (19.44%), implying genetic differentiation or limited gene flow between these endemic areas.Although the Fst value between Ubon Ratchathani and Tak (all collection period) was low (0.82%), it was significantly deviated from zero (p = 0.045).When Tak isolates were considered

Phylogenetic analysis
Maximum likelihood tree was constructed using Hasegawa-Kishino-Yano model and gamma distributed with invariant sites that gave the lowest BIC score.Phylogenetic analysis of Thai isolates and those reported from Colombia has shown no geographic clustering of the PvMSP-7E sequences (Fig 4).Although the tree topology displayed 2 clusters of sequences, the bootstrap value was low.Mosaic organization in the central region of the gene, probably due to  recurrent interallelic recombination events throughout this locus, could lead to phylogenetic homogenization of PvMSP-7E.

Predicted linear B-cell and helper T-cell epitopes
Most of the predicted B-cell epitopes were identified in the central region of PvMSP-7E (S3 Fig) .On the other hand, the putative CD4+ T cell epitopes for common HLA-DRB1 haplotypes in Thai population, i.e.DRB1 Ã 0701, DRB Ã 1202, DRB1 Ã 1501, DRB1 Ã 1502 and DRB5 Ã 1602 [41], were predicted to occur in all domains of the protein.However, amino acid substitutions in these epitopes seem to alter predicted HLA-binding scores (S3 Table ).

Discussion
Primary processing of PfMSP-7 generates two protein fragments with molecular weight of 20-kDa and 33-kDa.Although the fate of the former N-terminal fragment remains to be addressed, the latter C-terminal fragment is found to be associated with the PfMSP-1 precursor protein [10].Secondary processing of PfMSP-7 yields 19-or 22-kDa proteins, in which the cleavage sites occur after glutamine residues, i.e. between glutamine and glutamic acid, and glutamine and serine, respectively [42].Currently it is unknown whether PvMSP-7 undergoes proteolytic processing akin to PfMSP-7.However, a consensus for P. falciparum subtilisin 1 (PfSUB1) cleavage site seems to be present in PvMSP7E (S4 Fig) [43].
Sequence analysis of the PvMSP-7E gene of Thai isolates has shown that both 5' and 3' regions displayed low levels of nucleotide diversity that are in line with previous analysis of the Colombian samples [19].Although the 3' region contained more nucleotide substitutions than the 5' region, the levels of nucleotide diversity of these regions did not significantly differ (Table 2).However, evidences of purifying selection were observed in both regions, suggesting that structure or function of the protein could affect the rate of synonymous and nonsynonymous substitutions in these regions.Codon-based tests for deviation from neutrality also supported that most negatively selected codons occurred in these regions.Further analysis, taking into account the predicted protein secondary structure, has revealed evidences of purifying selection in all helical domains except α-helix-I, implying structural or functional constraints in most α-helical structure of the protein.On the other hand, α-helix-I was under positive selection because d N was significantly greater than d S .It is noteworthy that the putative signal peptide of PvMSP-7E spanned α-helix-I domain that seems to be shed from the precursor protein without any association with the MSP-1 complex [9].To date it is unknown whether the N-terminal signal peptide would be immunogenic during malaria infection.However, positive selection in the N-terminal signal peptide of this protein could imply its role in immune evasion process because amino acid substitutions in α-helix-I domain (residues 12, 14, 16 and 17) could alter CD4+ T helper cell epitopes' predicted scores for peptide binding to the common HLA-DRB1 haplotypes in Thai population (S3 Table) [40,41].
Although signals of recombination events have been detected across PvMSP-7E, the majority of potential recombination breakpoints seem to be more pronounced in the central region than the remainders.Therefore, a higher level of nucleotide diversity in the central domain than those in the 5' and 3' regions could partly stem from intragenic recombination between distinct alleles.The 5' domain of the central region exhibits mosaic organisation of sequences that could be plausibly generated by intragenic recombination among 3 putative parental alleles.Meanwhile, the 3' domain of the central region could be derived from recombination between dimorphic parental alleles.It is likely that intragenic recombination between distinct alleles during sexual reproduction in anopheline vector could enhance genetic diversity at this  locus, implying that effective vector control could reduce genetic diversity of parasite population.Recombination has a local influence on sequence diversity, where it stabilizes adaptive traits or removes deleterious variants [44].Intriguingly, the boundary the conserved 5'region and the central variable region of PvMSP-7E seems to be around the predicted processing site as inferred from a consensus of amino acid residues observed in subtilisin 1 of P. falciparum (PfSUB1)(Fig 2 and S4 Fig) [43].These features could be related to the binding domains of MSP-7 to the primary processing fragments of MSP1, in which the C-terminal part of PfMSP-7 is reportedly mediated this protein-protein interaction [5,42].
The central region of PvMSP-7E was highly enriched in polar and charge amino acids with a number of glycine and proline residues whilst non-polar and non-charge residues were sparse in this region (Fig 2).Importantly, the entire central region of PvMSP-7E exhibited a consensus prediction of intrinsically unstructured or disordered protein.Although intrinsically disordered protein regions were also predicted to occur at two clusters in the N-terminal part (D 1 and D2) as a sub-segment in α-helix-II and region spanning helices III and IV, they were relatively short.It is important to note that most intrinsically disordered protein regions undergo transition to ordered or structured proteins upon functioning [45].The predicted disordered domains 1 and 2 that overlapped α-helical domains II-IV in the N-terminal part of PvMSP-7E could change to their corresponding structure proteins upon contact with their respective, yet unknown targets.Importantly, the N-terminal part of MSP-7 has been recently demonstrated to be a ligand for the host's P-selectin [14].Because P-selectin is a cell adhesion molecule on the surface of activated endothelial cells and platelets that plays an important role in efficient recruitment of leucocytes to the site of tissue injury during the inflammatory process, binding of malarial MSP-7 to this specific host receptor could modulate disease severity during malaria infection [46,47].Purifying selection in the 5' region of PvMSP-7E could support functional constraint on the N-terminal part of this protein.
The occurrence of long stretch of unstructured or disordered region in the central part of PvMSP-7E could lead to increased in intrinsic plasticity that has been considered to be an important characteristic for molecular recognition of or interaction with its protein targets.The protein-protein binding regions were also predicted to be located in the central part of this protein (Fig 3).Importantly, the C-terminal MSP-7 fragment associated with MSP-1 complex has not been directly involved in erythrocyte recognition [48,49].Test for departure from neutrality by comparison between d S and d N has shown evidence for positive selection in the central region of PvMSP-7E.However, signals of positive selection occurred exclusively to the 3' portion of this region corresponding to the dimorphic central domain of the protein.Besides being a potential binding region to the primary processing fragments of MSP-1, the dimorphic domain could be responsible for immune evasion.Consistently, mice immunized with P. yoelii MSP-7 could elicit antibody response, but failed to protect mice against lethal infection with the virulent strain [50].Although the effects of sequence diversity in PvMSP-7E on host immune responses remain to be investigated, amino acid substitutions in the this protein could remarkably alter predicted scores for HLA-bindings for CD4+ T helper cell epitopes as well as predicted scores for linear B-cell epitopes, and particularly concentrated in the central domain (S3 Fig and S3 Table ).
The implementation of integrated malaria control program in Thailand has markedly reduced the annual parasite incidences of the country during the past 3 decades.However, foci of malaria transmission remain in several provinces bordering Myanmar, Cambodia and Malaysia albeit being considered hypoendemic areas [51].Analysis of genetic diversity in the PvMSP-7E locus of P. vivax populations from 3 major endemic areas of Thailand has shown a remarkably lower level of haplotype diversity in Yala and Narathiwat than other endemic areas, in which only 3 haplotypes circulated in the study population whereas 19 and 34 haplotypes were sampled from Ubon Ratchathani and Tak, respectively.These findings are in agreement with previous reports on analysis of other genetic loci that encode merozoite surface protein-5, apical membrane antigen-1 and thrombospondin-related adhesive protein of P. vivax in this country [22,23,52,53].Reduced haplotype diversity in P. and P. falciparum populations from Yala and Narathiwat occurred as a result of population bottlenecks in the parasites caused by control measures [22].Although reduction in number of malaria cases in Thailand occurred across endemic areas over the past decades, cross-border migration of malaria cases was common along Thai-Myanmar and Thai-Cambodia borders but rare along Thai-Malaysia border.Therefore, population bottleneck effects can be envisaged among malaria parasites in Yala and Narathiwat provinces.The recombination breakpoints in PvMSP-7E of P. vivax populations surveyed ranged from 1 in Yala-Narathiwat population to 11 in Tak population collected during 2008-2009.The correlation between the number of recombination breakpoints and the levels of mean haplotype diversity was approaching a significant value (r = 0.941, p = 0.059), suggesting that intragenic recombination could have partly shaped the extent of haplotype diversity.The non-zero recombination breakpoints in PvMSP-7E of P. vivax isolates from Yala and Narathiwat provinces further support that bottleneck effects rather than strict clonal expansion occurred in the population.
Phylogenetic analysis neither shows specific clusters of sequences for each endemic province nor unique clades for Thai or Colombian isolates.Pairwise Fst estimates among P. vivax populations in this study have revealed significant genetic differentiation of P. vivax population between endemic areas.However, the fixation index between populations from Tak province collected in 1996 and during 2008-2009 exhibited low and non-significant value.This is in accord with our previous analysis on genetic diversity in the PvTRAP locus of parasites in Thailand that displayed spatial but not temporal variation [23].Surprisingly, the genetic differentiation between Tak collected during 2008-2009 and Ubon Ratchathani achieved a low and non-significant F ST value, suggesting gene flow or genetic admixture between these parasite populations.Recent malaria epidemic in Ubon Ratchathani province during sample collection period was mainly due to forest encroachment for illegal logging by both native people and migrants from other provinces, whereas pre-epidemic malaria cases in this province were mostly indigenous cases who acquired infections locally.Undoubtedly, population migration could shape genetic diversity of P. vivax populations in Thailand.
In conclusion, our analysis revealed extensive polymorphism in the PvMSP-7E locus among P. vivax populations in Thailand that could have been influenced by selective pressure and intragenic recombination.The levels of haplotype diversity displayed geographic variation in which a few haplotypes were circulating in southern parasite population as a result from bottleneck effects as previously noted [22].Differential selective pressures were observed at this locus and seem to be related with its predicted protein secondary structure, in which helices seem to be less tolerant to molecular adaptation than unstructured or disordered domains.However, functional and biological significance of these findings requires further investigations.

Fig 2 .
Fig 2. Sequence variation in the central region of PvMSP-7E.(A) Boundaries of the N-and C-terminal portions.(B) Parental alleles of the N-and Cterminal subregions.Dots and dashes are identical residues and deletion/insertion, respectively.Variant amino acids are listed beneath each parental sequence.https://doi.org/10.1371/journal.pone.0196765.g002

Fig 3 .
Fig 3. Predicted protein secondary structure of PvMSP-7E.α-helices and coiled structures are boxed and unboxed, respectively.Disordered protein regions are shown in bold.Protein-protein binding regions are marked underneath with asterisks.https://doi.org/10.1371/journal.pone.0196765.g003 from 2 or more tests are shown.Tick marks indicate significant positive selection.For codon-based tests, p values are based on default values in the Datamonkey web-server.For TreeSAAP, p values < 0.001 in categories 5-8 are considered to be significant differences in amino acid properties.https://doi.org/10.1371/journal.pone.0196765.t004 from 2 or more tests are shown.Tick marks indicate significant p values based on default option in the Datamonkey Web-Server.https://doi.org/10.1371/journal.pone.0196765.t005

Table 2 . Nucleotide diversity (π) and number of synonymous (d S ) and nonsynonymous (d N ) nucleotide substitutions per site in PvMSP-7E among Thai isolates. Region n M h π ± S.E. d S ± S.E. d N ± S.E.
number of haplotypes.Tests of the hypothesis that d S equals d N ;.Tests of the hypotheses that π in the central region equals the corresponding values in the 5' and the 3' regions; Tests of the hypotheses that π in the 5' trimorphic region equals that in the 3' dimorphic region; Ã p < 0.05; ÃÃ p < 0.005; ÃÃÃ p < 0.001 § § § § p < 0.0001.## p < 0.005.https://doi.org/10.1371/journal.pone.0196765.t002

Table 3 . Number of synonymous (d S ) and nonsynonymous (d N ) substitutions per site in relation to protein secondary structure prediction of PvMSP7E. Predicted domain
Fst values were also observed when comparisons were performed with Yala-Narathiwat population.Likewise, genetic differentiation was observed between populations from Ubon Ratchathani and Tak collected in 1996 (p = 0.018) despite a small Fst value.However, the Fst value between populations from Ubon Ratchathani and Tak collected during 2008-2009 was not significant, suggesting that more gene flow between these areas occurred rather recently.Nevertheless, low level of genetic divergence between Tak populations collected in 1996 and during 2008 and 2009 was observed (p = 0.108), implying chronological genetic stability of Tak population.