Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Polymorphism in merozoite surface protein-7E of Plasmodium vivax in Thailand: Natural selection related to protein secondary structure

  • Chew Weng Cheng,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft

    Affiliation Molecular Biology of Malaria and Opportunistic Parasites Research Unit, Department of Parasitology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand

  • Chaturong Putaporntip,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Molecular Biology of Malaria and Opportunistic Parasites Research Unit, Department of Parasitology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand

  • Somchai Jongwutiwes

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Validation, Writing – original draft, Writing – review & editing

    Affiliation Molecular Biology of Malaria and Opportunistic Parasites Research Unit, Department of Parasitology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand

Polymorphism in merozoite surface protein-7E of Plasmodium vivax in Thailand: Natural selection related to protein secondary structure

  • Chew Weng Cheng, 
  • Chaturong Putaporntip, 
  • Somchai Jongwutiwes


Merozoite surface protein 7 (MSP-7) is a multigene family expressed during malaria blood-stage infection. MSP-7 forms complex with MSP-1 prior to merozoite egress from erythrocytes, and could affect merozoite invasion of erythrocytes. To characterize sequence variation in the orthologue in P. vivax (PvMSP-7), a gene member encoding PvMSP-7E was analyzed among 92 Thai isolates collected from 3 major endemic areas of Thailand (Northwest: Tak, Northeast: Ubon Ratchathani, and South: Yala and Narathiwat provinces). In total, 52 distinct haplotypes were found to circulate in these areas. Although population structure based on this locus was observed between each endemic area, no genetic differentiation occurred between populations collected from different periods in the same endemic area, suggesting spatial but not temporal genetic variation. Sequence microheterogeneity in both N- and C- terminal regions was predicted to display 4 and 6 α-helical domains, respectively. Signals of purifying selection were observed in α-helices II-X, suggesting structural or functional constraint in these domains. By contrast, α-helix-I spanning the putative signal peptide was under positive selection, in which amino acid substitutions could alter predicted CD4+ T helper cell epitopes. The central region of PvMSP-7E comprised the 5’-trimorphic and the 3’-dimorphic subregions. Positive selection was identified in the 3’ dimorphic subregion of the central domain. A consensus of intrinsically unstructured or disordered protein was predicted to encompass the entire central domain that contained a number of putative B cell epitopes and putative protein binding regions. Evidences of intragenic recombination were more common in the central region than the remainders of the gene. These results suggest that the extent of sequence variation, recombination events and selective pressures in the PvMSP-7E locus seem to be differentially affected by protein secondary structure.


In most malaria endemic areas outside of Africa, Plasmodium vivax mainly coexists with Plasmodium falciparum, both of which affect global health burden and contribute remarkably to economic loss [1, 2]. Control of malaria caused by P. vivax has been complicated by the presence of hypnozoites responsible for chronic relapsing illness, the emergence of chloroquine-resistant P. vivax strains and spread of insecticide-resistant anopheline vectors [3]. Therefore, alternative measures are required such as development of malaria vaccines [4].

One of the prime strategies for asexual blood stage vaccine development is to mount immunity that interrupts the invasion of Plasmodium merozoites into erythrocytes [5]. The initial attachment of merozoite to erythrocyte surface is primarily mediated by the binding of merozoite surface protein-1 (MSP-1) to Band 3 on the erythrocyte membrane [6]. Although MSP-1 has been considered a prime target for asexual blood stage vaccine development, recent studies have shown that other merozoite surface proteins, such as merozoite surface proteins-6 and -7 (MSP-6 and MSP-7), form a non-covalent complex with MSP-1 prior to receptor-ligand recognition [710]. MSP-7 is expressed during schizogony and undergoes two steps of proteolytic processing akin to MSP-1. Disruption of P. falciparum MSP-7 (PfMSP-7) has resulted in partial impairment in erythrocyte invasion by malarial merozoites [11]. Meanwhile, anti-PfMSP-1/6/7 antibodies can interfere with MSP-1 shedding and reduce merozoite invasion into erythrocytes [12]. Disruption of the orthologous gene in P. berghei affected intraerythrocytic growth of parasites [13]. Furthermore, specific binding of P. berghei MSP-7 to P-selectin has suggested the role of this protein in modulating disease severity through immunological process [14]. Therefore, immunity induced by vaccines derived from malarial MSP-7 could potentially interrupt parasite development.

MSP-7 proteins are encoded by a multigene family, of which the number of gene members varies across Plasmodium species [15, 16]. The MSP7 family of P. vivax contains 13 gene members, designated alphabetically from PvMSP-7A to PvMSP-7M, found in tandem with head to tail arrangement on chromosome 12 [15]. Of these, PvMSP-7C, -7E, -7H and -7I displayed higher nucleotide diversity than other paralogous gene members [1719]. Although the less polymorphic protein members have been suggested for vaccine incorporation, it is currently unknown whether particular members of the PvMSP7 family involve in binding to PvMSP1 during host cell invasion. Importantly, several malarial surface proteins involved in host cell invasion are under positive or balancing selection and have been considered to be targets of vaccine development [20]. Meanwhile, antigenic diversity in several malarial vaccine candidates could elicit allele-specific immune responses, an issue that may complicate vaccine design. Therefore, analysis of sequence variation in malarial vaccine candidates among parasite populations from different geographic areas is essential for a basis of vaccine development [4].

The extent of genetic diversity in PvMSP-7 has been analyzed by using P. vivax samples from Colombia [1719]. However, PvMSP7E seems to be the most polymorphic protein member and reportedly has evolved rather rapidly [19]. Therefore, this locus could be useful as a marker for P. vivax populations besides being a member in the protein family involved in erythrocyte invasion. The aim of this study is to analyze sequence diversity at this locus among P. vivax populations from diverse malaria endemic areas of Thailand. Results revealed extensive diversity in PvMSP7E of Thai isolates that exhibited spatial variation. Furthermore, differential selective pressures in the PvMSP-7E locus seem to be related with its predicted protein secondary structure.

Materials and methods

Human ethics statement

Written informed consent was obtained from all participants or from their parents or guardians prior to blood sample collection. Research protocol was approved by the Institutional Review Board in Human Research of Faculty of Medicine, Chulalongkorn University, Thailand (IRB No. 104/59).

Study population

One hundred and ten venous blood samples were collected from uncomplicated symptomatic P. vivax-infected patients diagnosed by microscopic examination of Giemsa stained blood films and preserved in EDTA anticoagulant. Of these, 80 samples were collected from Tak (n = 31), Narathiwat (n = 16) and Yala (n = 9) provinces during 2008–2009. Additional 24 blood samples from Tak province preserved at -80°C were obtained in 1996. Thirty samples from Ubon Ratchathani province was collected during a malaria epidemic in 2014–2015. All blood samples were stored at -30°C.

DNA extraction

Genomic DNA of blood samples was prepared by using QIAamp DNA mini kit (Qiagen, Hilden, Germany) essentially per manufacturer’s recommendations. DNA was stored at -30 °C until use.

PCR detection of P. vivax and genotyping

All blood samples were reaffirmed for the presence of P. vivax DNA by nested PCR method using specific primers derived from the 18S rRNA gene as described previously [21]. Allele-specific PCR targeting the polymorphic block 6 of the merozoite surface protein 1 of P. vivax was deployed to determine the number of parasite clones in each isolate as previously reported [2224].

Amplification and sequencing of PvMSP-7E

The complete coding region of PvMSP-7E (~1.1 kb) from each isolate was amplified by nested PCR using outer primers PvMSP-7F (5’-CATACCTTCGATACGTGTACTTC-3’) and PvMSP-7R (5’-CATTTCGCGTGTGCGTGTCTATG-3’) of the Salvador I strain (GenBank accession no. XM_001614084, chromosome 12: position 771164 to 772282) and the inner primers located 5’ and 3’ before and after the coding region of PvMSP-7E (PvMSP-7EF: 5’-AATCGCCACACATCGTCTGTG-3’ and PvMSP-7ER: 5’-ATTTCATCTTTACTGTTGGGCAC-3’) (S1 Fig). Primary PCR amplification was performed in a total volume of 15 μL containing PCR buffer, 200 μM dNTP, 0.2 μM of each primer, nuclease free water, 2 μL of template DNA and 1.25 units of TaKaRa LA Taq (Takara, Seta, Japan). The thermal cycling profiles for primary PCR contained a pre-amplification denaturation at 94°C, 60 s; followed by 35 cycles of denaturation at 96°C, 30 s; annealing at 50°C, 30 s; polymerization at 72°C, 7 minutes, and final elongation at 72°C, 10 minutes. Secondary PCR contained PCR buffer, 200 μM dNTP, 0.2 μM of each primer, nuclease free water, 1 μL of template DNA from primary PCR and 1.25 units of ExTaq DNA polymerase (Takara, Seta, Japan) in a total volume of 30 μL. The amplification profile for secondary PCR consisted of 94°C, 60 s; 30 cycles of 96°C, 30 s; 50°C, 30 s and 72°C, 2 minutes; and 72°C, 5 minutes. DNA amplification was performed by using a GeneAmp 9700 PCR thermal cycler (Applied Biosystems, Foster City, CA). The PCR products were analyzed on 1% agarose gel electrophoresis, stained with ethidium bromide and visualized under UV transillumination. Template DNA for sequencing was prepared from PCR products that were purified by using QIAquick PCR purification kit (Qiagen, Hilden, Germany). DNA sequences were determined directly and bi-directionally from the purified templates from secondary PCR using ABI PRISM BigDye Terminator v3.1 Ready Reaction Cycle Sequencing kit (Applied Biosystems) and sequencing primers.

Data analysis

DNA sequences were aligned using the default option in CLUSTAL W program [25]. The PvMSP-7E gene of the Salvador I strain was used as a reference sequence (PVX_082665). Sequences from the Colombian isolates previously reported were also included for comparison (GenBank accession nos. KM212276-KM212294) [19]. All sites at which the alignment postulated a gap were eliminated in pairwise comparisons of the analysis. Exploration of repetitive DNA sequence motifs was performed by scanning the sequence with Tandem Repeats Finder version 4.0 program [26]. Protein secondary structure prediction was determined by Deep Convolutional Neural Filed program (DeepCNF) implemented in the RaptorX-Property Web-Server [27]. The DeepCNF method has been validated to outperform other methods that achieved >70% accuracy in eight-state protein structure prediction [28]. Protein disordered or intrinsically unstructured regions were analyzed by using the GeneSilico MetaDisorder service [29]. Prediction of anchorage or protein-protein interaction regions in disordered proteins was done by using the ANCHOR/IUPRED web server [30]. Haplotype diversity and its sampling variance were calculated using the DnaSP version 5.10 program [31]. Nucleotide diversity (π) was calculated from the mean of pairwise sequence differences in the sample sequences using Juke and Cantor model of nucleotide substitution [32]. The number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous substitutions per nonsynonymous site (dN) were computed by Nei and Gojobori’s model with Jukes-Cantor correction [32, 33]. The standard errors of these parameters were estimated by the bootstrap method with 1,000 pseudosamplings implemented in the MEGA 6.0 program [34]. Statistical differences between these parameters were determined by a two-tailed Z-test and the significance level was set at p < 0.05. Codon-based analyses of selection were performed by using the single-likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), internal branch FEL (iFEL), random effects likelihood (REL), mixed effects model of evolution (MEME) and fast unconstrained Bayesian approximation (FUBAR) methods implemented in the Datamonkey Web-Server. Significance level settings for these tests were considered per the default values available on the Datamonkey Web-server [35]. Amino acid property-based predictions of positive selection were determined by using the TreeSAAP program. Significant changes in amino acid properties were inferred for categories 6–8 with their relative percent probability set at 99.9% [36]. Evidences of genetic recombination were analyzed by using the Recombination Detection Program version 4 (RDP4) that includes RDP4, GENCONV, Bootscanning, the Maximum Chi Square, CHIMAERA, Sister Scanning and 3SEQ methods [37]. The fixation index (Fst) was deployed to evaluate the level of population differentiation due to genetic structure by using different hierarchical analyses of molecular variance implemented in the Arlequin software version 3.11 [38]. Significance levels of the fixation indices were determined by permutation test. Phylogenetic trees were constructed using the maximum likelihood method based on the best substitution model for the sequence data that showed the lowest Bayesian Information Criterion (BIC) scores [34]. The reliability of clustering patterns in the phylogenetic tree was assessed by 1,000 bootstrap pseudoreplicates. BCPRED server was used to predict B-cell epitopes with an epitope length of 20 amino acids, set to 90% classifier specificity [39]. PREDIVAC server was employed to stimulate the MHC-II binding peptide [40]. It is a cutting-edge tool to predict CD4+ T-cell epitopes because it has over 95% coverage of human HLA class II DR protein diversity. In the present study, HLA-DR alleles were restricted to DRB1*1202, DRB1*1502, DRB1*0701, DRB1*1501, and DRB5*1602 due to its predominant frequencies among Thai population [41]. The threshold PREDIVAC score was set at 70 to screen MHC-II binding antigenic peptides.


Diversity in PvMSP-7E of Thai isolates

Of 110 P. vivax-infected blood samples, 92 isolates contained single genotypes of the block 6 of PvMSP-1 and yielded non-superimposed signals of the electropherograms of the PvMSP-7E sequences, suggesting single clone infections. Since Yala and Narathiwat provinces are located next to each other and malaria transmission has been almost similar, P. vivax isolates from these provinces are considered to be the same population. Therefore, the PvMSP-7E sequences in this study were defined by geographic origins as Tak (n = 46), Ubon Ratchathani (n = 22) and Yala-Narathiwat (n = 24). However, Tak population was further subdivided into 2 populations based on collection period, i.e. 2008–2009 (n = 28) and 1996 (n = 18)(Table 1). In total, 52 haplotypes were identified among 92 Thai isolates, comprising 194 nucleotide substitutions, 185 segregating sites and 9 insertion/deletions. Haplotype #1 was most common and shared between P. vivax populations from Ubon Ratchathani (n = 2) and Yala-Narathiwat (n = 14). Likewise, haplotypes #15-#17 co-existed in Tak and Ubon Ratchathani. However, the remaining 48 haplotypes were not shared between these endemic areas (S2 Fig). Although the level of nucleotide diversity for Tak population was higher than those for Ubon Ratchathani and Yala-Narathiwat populations, the differences were not statistically significant (Z-test, p > 0.05). Meanwhile, the number of haplotypes and haplotype diversity for Yala-Narathiwat population were remarkably lower than those for Tak and Ubon Ratchathani populations, indicating limited number of variants and more uneven distribution of haplotypes in Yala and Narathiwat provinces.

Table 1. Sequence diversity in the PvMSP-7E gene of P. vivax populations in Thailand.

Sequence variation in the 5’ and the 3’ regions of PvMSP-7E

Alignment of the complete PvMSP-7E sequences of isolates in this study with the Salvador 1 sequence has revealed two regions with low nucleotide diversity (π = 0.0224 and 0.0273) located in the 5’ and 3’ regions, spanning 123 and 135–136 codons, respectively (Fig 1). It is noteworthy that 51 nucleotides at the 5’ end and 15 nucleotides at the 3’ end of the gene were not analyzed among Colombian isolates; these regions contained 4 nonsynonymous codon changes at residues F12L, F14L, S17C and L368F [19]. In total 20 and 51 nucleotide substitutions were observed in the 5’ and 3’ regions, respectively. Of these, 69 nucleotide substitutions were dimorphic and 2 substitutions at positions 786 and 926 at the 3’ region were trimorphic, i.e. 3 bases are found at positions wherever substitutions occur. Most Thai isolates (94.6%) had a deletion of codon 312 coding for proline in the 3’ region, whereas 21 (22.8%) and 26 (28.3%) isolates contained TTA (leucine) and GAA (glutamine) insertions between codons 150 and 151, and between codons 215 and 216, respectively (positions after the coding region of the Salvador 1 sequence).

Fig 1. Schematic representation of PvMSP-7E depicting conserved (open boxes), variable trimorphic (filled black box) and dimorphic (hatched box) regions.

The number of codons for each region is in parentheses.

Sequence variation in the central region of PvMSP-7E

The nucleotide diversity of the central region of PvMSP-7E, encompassing codons 124–234 of the Salvador 1 strain (Fig 1), displayed an order of magnitude greater than those for the 5’ and 3’ regions and the differences were statistically meaningful (p < 0.0001)(Table 2). Closer looks into the central amino acid sequences have revealed 29 allelic types (Fig 2A). Repeat motifs were undetectable in this gene. Importantly, the N-terminal part of the central region spanning residues 124–200 exhibited mosaic organization of sequences that could have been generated by genetic shuffling among three parental types, represented by the Salvador 1 strain (type I-5’), the APH5 isolate from Tak province (type II-5’) and an unknown strain (type III-5’)(Fig 2A). Meanwhile, the C-terminal portion of the remaining central region could have arisen from 2 parental types (Fig 2B). Therefore, nucleotide sequences encoding these N- and C-terminal parts of the central region are referred here as 5’-trimorphic and 3’-dimorphic subregions, containing 77–78 and 35–37 codons, respectively (Fig 1). The level of nucleotide diversity for the 5’-trimorphic region was significantly greater than that for the 3’-dimorphic region (p < 0.005)(Table 2). Furthermore, 2 indels were found in the central region of this gene, one in the 5’-trimorphic and the other in the 3’-dimorphic subregions.

Fig 2. Sequence variation in the central region of PvMSP-7E.

(A) Boundaries of the N- and C-terminal portions. (B) Parental alleles of the N- and C-terminal subregions. Dots and dashes are identical residues and deletion/insertion, respectively. Variant amino acids are listed beneath each parental sequence.

Table 2. Nucleotide diversity (π) and number of synonymous (dS) and nonsynonymous (dN) nucleotide substitutions per site in PvMSP-7E among Thai isolates.

Protein secondary structure prediction

Protein secondary structure prediction of PvMSP-7E by using DeepCNF method implemented in the RaptorX-Property Web-Server has revealed 10 α-helical domains. Four α-helices were located in the N-terminal and the remaining 6 α-helices at the C-terminal (Fig 3). The majority of the non-helical regions seem to contain random coil motifs. Analysis using the MetaDisorder series implemented in the GeneSilico Metadisorder service has identified 3 intrinsically unstructured or disordered regions that span codons 27–36, 46–102 and 121–241, referred here as domains D1, D2 and D3, respectively. Most of the protein-protein interaction regions were mapped in the central portion of PvMSP-7E (Fig 3).

Fig 3. Predicted protein secondary structure of PvMSP-7E.

α-helices and coiled structures are boxed and unboxed, respectively. Disordered protein regions are shown in bold. Protein-protein binding regions are marked underneath with asterisks.

Selective pressure on PvMSP-7E

Departure from neutrality in the PvMSP-7E locus by comparison of dS and dN has shown that dS was significantly greater than dN in both 5’ and 3’ regions (p < 0.005), suggesting purifying selection in these domains. By contrast, the central region seems to be under positive selection because dN significantly exceeded dS (p < 0.05)(Table 2). Similar results were obtained when analysis was performed on sequences from each parasite population (S1 Table). Further analysis of the central region has revealed that dN significantly outnumbered dS in the 3’-dimorphic (p < 0.05), but not in the 5’-trimorphic region (p > 0.05) (Table 2). To explore whether selective pressure could have operated on specific regions of the protein, the rates of synonymous and nonsynonymous substitutions per site were determined for each domain according to the predicted protein secondary structure. Owing to the paucity of mutation sites in α-helical domains except α-helix-I, 3 adjacent helical regions were combined for analysis. Results revealed that dS significantly exceeded dN in α-helical domains II-IV, V-VII and VIII-X, suggesting purifying selection in these regions. Evidence of purifying selection was also observed in the predicted disordered domain 2 (D2). The remaining non-helical regions and the D1 domain exhibited no significant difference between dS and dN. On the other hand, the α-helix-I domain displayed greater value of dN than dS and the difference was statistically significant, implying positive selection in this region. In domain D3, dN outnumbered dS but the difference was not significant. However, positive selection was detected in the 3’ portion of the D3 domain corresponding to the 3’-dimorphic subregion where significantly greater dN than dS was observed (Tables 2 and 3).

Table 3. Number of synonymous (dS) and nonsynonymous (dN) substitutions per site in relation to protein secondary structure prediction of PvMSP7E.

Codon-based tests for departure from neutrality using SLAC, FEL, iFEL, REL, FUBAR and MEME methods implemented in the Datamonkey Wed-Server have identified 2, 8, 11, 39, 12 and 20 positively selected sites, respectively. Likewise, the TreeSAAP program that determined significant alteration in various physicochemical properties of substituted amino acids could detect 18 positively selected sites. Meanwhile, SLAC, FEL, iFEL, REL and FUBAR identified 23, 38, 31, 30 and 24 negatively selected sites, respectively. However, these methods could potentially generate some false positive and negative results. Therefore, a consensus of concordant results for positively and negatively selected residues from at least 2 methods was considered herein and listed in Tables 4 and 5. It is noteworthy that the majority (21 of 26) of codons displaying positive selection were mapped in α-helix-I and domains predicted to be disordered protein. On the other hand, most negatively selected codons (29 of 34) were found outside these domains. The distribution of these positively and negatively selected sites in relation with α-helix and disordered regions was significant difference (p < 0.0001, Fischer exact probability test).


Intragenic recombination in the PvMSP-7E locus was evidenced by various recombination parameters. The RDP package incorporating RDP, GENCONV, Bootscan, MaxChi, Chimera, Siscan and 3Seq identified 11, 8, 8 and 1 unique events by at least one of these tests in populations from Tak collected during 2008–2009, Tak in 1996, Ubon Ratchathani and Yala-Narathiwat, respectively. More than half of the nucleotide sites (29 of 56) involved in these recombination events were located in the central region of the gene (S2 Table).

Population differentiation

Genetic differentiation between populations was measured from the fixation index that ranges between 0 to 1 (or 100%), indicating no subdivision between populations and complete genetic isolation between populations, respectively. The Fst values based on PvMSP-7E between P. vivax populations and their p values are shown in Table 6. The Fst values were relatively high and reached highly significant levels (p < 10−5) between populations from Tak (all collection periods) and Yala-Narathiwat (21.75%), and between Ubon Ratchathani and Yala-Narathiwat (19.44%), implying genetic differentiation or limited gene flow between these endemic areas. Although the Fst value between Ubon Ratchathani and Tak (all collection period) was low (0.82%), it was significantly deviated from zero (p = 0.045). When Tak isolates were considered separately according to sampling years, significant deviation from zero of the Fst values were also observed when comparisons were performed with Yala-Narathiwat population. Likewise, genetic differentiation was observed between populations from Ubon Ratchathani and Tak collected in 1996 (p = 0.018) despite a small Fst value. However, the Fst value between populations from Ubon Ratchathani and Tak collected during 2008–2009 was not significant, suggesting that more gene flow between these areas occurred rather recently. Nevertheless, low level of genetic divergence between Tak populations collected in 1996 and during 2008 and 2009 was observed (p = 0.108), implying chronological genetic stability of Tak population.

Table 6. Interpopulation variance indices of P. vivax populations in Thailand inferred from PvMSP-7E.

Phylogenetic analysis

Maximum likelihood tree was constructed using Hasegawa-Kishino-Yano model and gamma distributed with invariant sites that gave the lowest BIC score. Phylogenetic analysis of Thai isolates and those reported from Colombia has shown no geographic clustering of the PvMSP-7E sequences (Fig 4). Although the tree topology displayed 2 clusters of sequences, the bootstrap value was low. Mosaic organization in the central region of the gene, probably due to recurrent interallelic recombination events throughout this locus, could lead to phylogenetic homogenization of PvMSP-7E.

Fig 4. Maximum likelihood phylogenetic tree of PvMSP-7E based on Hasegawa-Kishino-Yano model and gamma distributed with invariant sites.

Tree was constructed using distinct sequences of Thai and Colombian isolates (closed triangle) comparing with the Salvador I strain (closed circle). Bootstrap values >50% are shown.

Predicted linear B-cell and helper T-cell epitopes

Most of the predicted B-cell epitopes were identified in the central region of PvMSP-7E (S3 Fig). On the other hand, the putative CD4+ T cell epitopes for common HLA-DRB1 haplotypes in Thai population, i.e. DRB1*0701, DRB*1202, DRB1*1501, DRB1*1502 and DRB5*1602 [41], were predicted to occur in all domains of the protein. However, amino acid substitutions in these epitopes seem to alter predicted HLA-binding scores (S3 Table).


Primary processing of PfMSP-7 generates two protein fragments with molecular weight of 20-kDa and 33-kDa. Although the fate of the former N-terminal fragment remains to be addressed, the latter C-terminal fragment is found to be associated with the PfMSP-1 precursor protein [10]. Secondary processing of PfMSP-7 yields 19- or 22-kDa proteins, in which the cleavage sites occur after glutamine residues, i.e. between glutamine and glutamic acid, and glutamine and serine, respectively [42]. Currently it is unknown whether PvMSP-7 undergoes proteolytic processing akin to PfMSP-7. However, a consensus for P. falciparum subtilisin 1 (PfSUB1) cleavage site seems to be present in PvMSP7E (S4 Fig) [43].

Sequence analysis of the PvMSP-7E gene of Thai isolates has shown that both 5’ and 3’ regions displayed low levels of nucleotide diversity that are in line with previous analysis of the Colombian samples [19]. Although the 3’ region contained more nucleotide substitutions than the 5’ region, the levels of nucleotide diversity of these regions did not significantly differ (Table 2). However, evidences of purifying selection were observed in both regions, suggesting that structure or function of the protein could affect the rate of synonymous and nonsynonymous substitutions in these regions. Codon-based tests for deviation from neutrality also supported that most negatively selected codons occurred in these regions. Further analysis, taking into account the predicted protein secondary structure, has revealed evidences of purifying selection in all helical domains except α-helix-I, implying structural or functional constraints in most α-helical structure of the protein. On the other hand, α-helix-I was under positive selection because dN was significantly greater than dS. It is noteworthy that the putative signal peptide of PvMSP-7E spanned α-helix-I domain that seems to be shed from the precursor protein without any association with the MSP-1 complex [9]. To date it is unknown whether the N-terminal signal peptide would be immunogenic during malaria infection. However, positive selection in the N-terminal signal peptide of this protein could imply its role in immune evasion process because amino acid substitutions in α-helix-I domain (residues 12, 14, 16 and 17) could alter CD4+ T helper cell epitopes’ predicted scores for peptide binding to the common HLA-DRB1 haplotypes in Thai population (S3 Table) [40, 41].

Although signals of recombination events have been detected across PvMSP-7E, the majority of potential recombination breakpoints seem to be more pronounced in the central region than the remainders. Therefore, a higher level of nucleotide diversity in the central domain than those in the 5’ and 3’ regions could partly stem from intragenic recombination between distinct alleles. The 5’ domain of the central region exhibits mosaic organisation of sequences that could be plausibly generated by intragenic recombination among 3 putative parental alleles. Meanwhile, the 3’ domain of the central region could be derived from recombination between dimorphic parental alleles. It is likely that intragenic recombination between distinct alleles during sexual reproduction in anopheline vector could enhance genetic diversity at this locus, implying that effective vector control could reduce genetic diversity of parasite population. Recombination has a local influence on sequence diversity, where it stabilizes adaptive traits or removes deleterious variants [44]. Intriguingly, the boundary between the conserved 5’region and the central variable region of PvMSP-7E seems to be around the predicted processing site as inferred from a consensus of amino acid residues observed in subtilisin 1 of P. falciparum (PfSUB1)(Fig 2 and S4 Fig) [43]. These features could be related to the binding domains of MSP-7 to the primary processing fragments of MSP1, in which the C-terminal part of PfMSP-7 is reportedly mediated this protein-protein interaction [5, 42].

The central region of PvMSP-7E was highly enriched in polar and charge amino acids with a number of glycine and proline residues whilst non-polar and non-charge residues were sparse in this region (Fig 2). Importantly, the entire central region of PvMSP-7E exhibited a consensus prediction of intrinsically unstructured or disordered protein. Although intrinsically disordered protein regions were also predicted to occur at two clusters in the N-terminal part (D 1 and D2) as a sub-segment in α-helix-II and region spanning helices III and IV, they were relatively short. It is important to note that most intrinsically disordered protein regions undergo transition to ordered or structured proteins upon functioning [45]. The predicted disordered domains 1 and 2 that overlapped α-helical domains II-IV in the N-terminal part of PvMSP-7E could change to their corresponding structure proteins upon contact with their respective, yet unknown targets. Importantly, the N-terminal part of MSP-7 has been recently demonstrated to be a ligand for the host’s P-selectin [14]. Because P-selectin is a cell adhesion molecule on the surface of activated endothelial cells and platelets that plays an important role in efficient recruitment of leucocytes to the site of tissue injury during the inflammatory process, binding of malarial MSP-7 to this specific host receptor could modulate disease severity during malaria infection [46, 47]. Purifying selection in the 5’ region of PvMSP-7E could support functional constraint on the N-terminal part of this protein.

The occurrence of long stretch of unstructured or disordered region in the central part of PvMSP-7E could lead to increased in intrinsic plasticity that has been considered to be an important characteristic for molecular recognition of or interaction with its protein targets. The protein-protein binding regions were also predicted to be located in the central part of this protein (Fig 3). Importantly, the C-terminal MSP-7 fragment associated with MSP-1 complex has not been directly involved in erythrocyte recognition [48, 49]. Test for departure from neutrality by comparison between dS and dN has shown evidence for positive selection in the central region of PvMSP-7E. However, signals of positive selection occurred exclusively to the 3’ portion of this region corresponding to the dimorphic central domain of the protein. Besides being a potential binding region to the primary processing fragments of MSP-1, the dimorphic domain could be responsible for immune evasion. Consistently, mice immunized with P. yoelii MSP-7 could elicit antibody response, but failed to protect mice against lethal infection with the virulent strain [50]. Although the effects of sequence diversity in PvMSP-7E on host immune responses remain to be investigated, amino acid substitutions in the this protein could remarkably alter predicted scores for HLA-bindings for CD4+ T helper cell epitopes as well as predicted scores for linear B-cell epitopes, and particularly concentrated in the central domain (S3 Fig and S3 Table).

The implementation of integrated malaria control program in Thailand has markedly reduced the annual parasite incidences of the country during the past 3 decades. However, foci of malaria transmission remain in several provinces bordering Myanmar, Cambodia and Malaysia albeit being considered hypoendemic areas [51]. Analysis of genetic diversity in the PvMSP-7E locus of P. vivax populations from 3 major endemic areas of Thailand has shown a remarkably lower level of haplotype diversity in Yala and Narathiwat than other endemic areas, in which only 3 haplotypes circulated in the study population whereas 19 and 34 haplotypes were sampled from Ubon Ratchathani and Tak, respectively. These findings are in agreement with previous reports on analysis of other genetic loci that encode merozoite surface protein-5, apical membrane antigen-1 and thrombospondin-related adhesive protein of P. vivax in this country [22, 23, 52, 53]. Reduced haplotype diversity in P. vivax and P. falciparum populations from Yala and Narathiwat occurred as a result of population bottlenecks in the parasites caused by control measures [22]. Although reduction in number of malaria cases in Thailand occurred across endemic areas over the past decades, cross-border migration of malaria cases was common along Thai-Myanmar and Thai-Cambodia borders but rare along Thai-Malaysia border. Therefore, population bottleneck effects can be envisaged among malaria parasites in Yala and Narathiwat provinces. The recombination breakpoints in PvMSP-7E of P. vivax populations surveyed ranged from 1 in Yala-Narathiwat population to 11 in Tak population collected during 2008–2009. The correlation between the number of recombination breakpoints and the levels of mean haplotype diversity was approaching a significant value (r = 0.941, p = 0.059), suggesting that intragenic recombination could have partly shaped the extent of haplotype diversity. The non-zero recombination breakpoints in PvMSP-7E of P. vivax isolates from Yala and Narathiwat provinces further support that bottleneck effects rather than strict clonal expansion occurred in the population.

Phylogenetic analysis neither shows specific clusters of sequences for each endemic province nor unique clades for Thai or Colombian isolates. Pairwise Fst estimates among P. vivax populations in this study have revealed significant genetic differentiation of P. vivax population between endemic areas. However, the fixation index between populations from Tak province collected in 1996 and during 2008–2009 exhibited low and non-significant value. This is in accord with our previous analysis on genetic diversity in the PvTRAP locus of parasites in Thailand that displayed spatial but not temporal variation [23]. Surprisingly, the genetic differentiation between Tak collected during 2008–2009 and Ubon Ratchathani achieved a low and non-significant FST value, suggesting gene flow or genetic admixture between these parasite populations. Recent malaria epidemic in Ubon Ratchathani province during sample collection period was mainly due to forest encroachment for illegal logging by both native people and migrants from other provinces, whereas pre-epidemic malaria cases in this province were mostly indigenous cases who acquired infections locally. Undoubtedly, population migration could shape genetic diversity of P. vivax populations in Thailand.

In conclusion, our analysis revealed extensive polymorphism in the PvMSP-7E locus among P. vivax populations in Thailand that could have been influenced by selective pressure and intragenic recombination. The levels of haplotype diversity displayed geographic variation in which a few haplotypes were circulating in southern parasite population as a result from bottleneck effects as previously noted [22]. Differential selective pressures were observed at this locus and seem to be related with its predicted protein secondary structure, in which helices seem to be less tolerant to molecular adaptation than unstructured or disordered domains. However, functional and biological significance of these findings requires further investigations.

Supporting information

S1 Table. Nucleotide diversity (π) and number of synonymous (dS) and nonsynonymous (dN) substitutions per site in PvMSP-7E among P. vivax populations in Thailand.


S2 Table. Recombination breakpoints in PvMSP-7E of Thai isolates.


S3 Table. Putative CD4+ T cell epitopes in PvMSP-7E of the Salvador I strain and 2 Thai isolates (APH5 and APH31) for common HLA-DRB1 and HLA-DRB5 haplotypes in Thai population.


S1 Fig. Schematic diagram of nested-PCR primers used in this study.


S2 Fig. PvMSP-7E haplotypes among Thai isolates.


S3 Fig. Predicted linear B cell epitopes in PvMSP-7E of the Salvador I strain and 2 Thai isolates (APH5 and APH31).


S4 Fig. Secondary processing site in PfMSP7 and predicted cleavage site in PvMSP7E (down-pointing triangles).



We are grateful to all patients who donated their blood samples for this study. We thank the staff of the Bureau of Vector Borne Disease, Department of Disease Control, Ministry of Public Health, Thailand, for assistance in field work and Urassaya Pattanawong for excellent technical assistance.


  1. 1. WHO. World Malaria Report 2016. World Health Organization, Programme WGM; 2016.
  2. 2. Baird JK. Evidence and implications of mortality associated with acute Plasmodium vivax malaria. Clin Microbiol Rev. 2013;26:36–57. pmid:23297258
  3. 3. Price RN, von Seidlein L, Valecha N, Nosten F, Baird JK, White NJ. Global extent of chloroquine-resistant Plasmodium vivax: a systematic review and meta-analysis. Lancet Infect Dis. 2014;14:982–91. pmid:25213732
  4. 4. Crompton PD, Pierce SK, Miller LH. Advances and challenges in malaria vaccine development. J Clin Invest. 2010;120:4168–78. pmid:21123952
  5. 5. Beeson JG, Drew DR, Boyle MJ, Feng G, Fowkes FJ, Richards JS. Merozoite surface proteins in red blood cell invasion, immunity and vaccines against malaria. FEMS Microbiol Rev. 2016;40:343–72. pmid:26833236
  6. 6. Goel VK, Li X, Chen H, Liu SC, Chishti AH, Oh SS. Band 3 is a host receptor binding merozoite surface protein 1 during the Plasmodium falciparum invasion of erythrocytes. Proc Natl Acad Sci USA. 2003;100:5164–9. pmid:12692305
  7. 7. Stafford WH, Günder B, Harris A, Heidrich HG, Holder AA, Blackman MJ. A 22 kDa protein associated with the Plasmodium falciparum merozoite surface protein-1 complex. Mol Biochem Parasitol. 1996;80:159–69. pmid:8892293
  8. 8. Trucco C, Fernandez-Reyes D, Howell S, Stafford WH, Scott-Finnigan TJ, Grainger M, et al. The merozoite surface protein 6 gene codes for a 36 kDa protein associated with the Plasmodium falciparum merozoite surface protein-1 complex. Mol Biochem Parasitol. 2001;112:91–101. pmid:11166390
  9. 9. Kauth CW, Woehlbier U, Kern M, Mekonnen Z, Lutz R, Mücke N, et al. Interactions between merozoite surface proteins 1, 6, and 7 of the malaria parasite Plasmodium falciparum. J Biol Chem. 2006;281:31517–27. pmid:16940297
  10. 10. Pachebat JA, Kadekoppala M, Grainger M, Dluzewski AR, Gunaratne RS, Scott-Finnigan TJ, et al. Extensive proteolytic processing of the malaria parasite merozoite surface protein 7 during biosynthesis and parasite release from erythrocytes. Mol Biochem Parasitol. 2007;151:59–69. pmid:17097159
  11. 11. Kadekoppala M, O’Donnell RA, Grainger M, Crabb BS, Holder AA. Deletion of the Plasmodium falciparum merozoite surface protein 7 gene impairs parasite invasion of erythrocytes. Eukaryot Cell. 2008;7:2123–32. pmid:18820076
  12. 12. Woehlbier U, Epp C, Hackett F, Blackman MJ, Bujard H. Antibodies against multiple merozoite surface antigens of the human malaria parasite Plasmodium falciparum inhibit parasite maturation and red blood cell invasion. Malar J. 2010;9:77. pmid:20298576
  13. 13. Tewari R, Ogun SA, Gunaratne RS, Crisanti A, Holder AA. Disruption of Plasmodium berghei merozoite surface protein 7 gene modulates parasite growth in vivo. Blood. 2005;105:394–6. pmid:15339842
  14. 14. Perrin AJ, Bartholdson SJ, Wright GJ. P-selectin is a host receptor for Plasmodium MSP7 ligands. Malar J. 2015;14:238. pmid:26045295
  15. 15. Garzón-Ospina D, Cadavid LF, Patarroyo MA. Differential expansion of the merozoite surface protein (msp)-7 gene family in Plasmodium species under a birth-and-death model of evolution. Mol Phylogenet Evol. 2010;55:399–408 pmid:20172030
  16. 16. Castillo AI, Andreína Pacheco M, Escalante AA. Evolution of the merozoite surface protein 7 (msp7) family in Plasmodium vivax and P. falciparum: A comparative approach. Infect Genet Evol. 2017;50:7–19. pmid:28163236
  17. 17. Garzón-Ospina D, Romero-Murillo L, Tobón LF, Patarroyo MA. Low genetic polymorphism of merozoite surface proteins 7 and 10 in Colombian Plasmodium vivax isolates. Infect Genet Evol. 2011;11:528–31. pmid:21182986
  18. 18. Garzón-Ospina D, López C, Forero-Rodríguez J, Patarroyo MA. Genetic diversity and selection in three Plasmodium vivax merozoite surface protein 7 (Pvmsp-7) genes in a Colombian population. PloS one. 2012;7:e45962. pmid:23049905
  19. 19. Garzón-Ospina D, Forero-Rodríguez J, Patarroyo MA. Heterogeneous genetic diversity pattern in Plasmodium vivax genes encoding merozoite surface proteins (MSP)-7E,− 7F and-7L. Malar J. 2014;13:495. pmid:25496322
  20. 20. Hughes MK, Hughes AL. Natural selection on Plasmodium surface proteins. Mol Biochem Parasitol. 1995;71:99–113. pmid:7630387
  21. 21. Putaporntip C, Hongsrimuang T, Seethamchai S, Kobasa T, Limkittikul K, Cui L, et al. Differential prevalence of Plasmodium infections and cryptic Plasmodium knowlesi malaria in humans in Thailand. J Infect Dis. 2009;199:1143–50 pmid:19284284
  22. 22. Jongwutiwes S, Putaporntip C, Hughes AL. Bottleneck effects on vaccine-candidate antigen diversity of malaria parasites in Thailand. Vaccine. 2010;28:3112–7. pmid:20199765
  23. 23. Kosuwin R, Putaporntip C, Tachibana H, Jongwutiwes S. Spatial variation in genetic diversity and natural selection on the thrombospondin-related adhesive protein locus of Plasmodium vivax (PvTRAP). PLoS One. 2014;9:e110463. pmid:25333779
  24. 24. Putaporntip C, Jongwutiwes S, Sakihama N, Ferreira MU, Kho W-G, Kaneko A, et al. Mosaic organization and heterogeneity in frequency of allelic recombination of the Plasmodium vivax merozoite surface protein-1 locus. Proc Nat Acad Sci USA. 2002;99:16348–53. pmid:12466500
  25. 25. Thompson JD, Gibson T, Higgins DG. Multiple sequence alignment using ClustalW and ClustalX. Current protocols in bioinformatics. 2002:chapter 2: unit 2.3.
  26. 26. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–80. pmid:9862982
  27. 27. Wang S, Li W, Liu S, Xu J. RaptorX-Property: a web server for protein structure property prediction. Nucleic Acids Res. 2016;44:W430–5. pmid:27112573
  28. 28. Yang Y, Gao J, Wang J, Hefferman R, Hanson J, Paliwal K, et al. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform. 2016: bbw129.
  29. 29. Kozlowski LP, Bujnicki JM. MetaDisorder: a meta-server for the prediction of intrinsic disorder in proteins. BMC Bioinformatics. 2012;13:111. pmid:22624656
  30. 30. Dosztányi Z, Mészáros B, Simon I. ANCHOR: web server for predicting protein binding regions in disordered proteins. Bioinformatics. 2009;25:2745–6. pmid:19717576
  31. 31. Librado P, Rozas J. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25:1451–2. pmid:19346325
  32. 32. Jukes TH, Cantor CR, 1969. Evolution of protein molecules, in: Munro H.N. (Ed.), Mammalian protein metabolism. Academic Press, New York, pp. 21–132.
  33. 33. Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986;3:418–26. pmid:3444411
  34. 34. Tamura K, Stecher G, Peterson D, Filipski D, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 2013:30;2725–9. pmid:24132122
  35. 35. Kosakovsky Pond SL, Frost SDW. Datamonkey: Rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics. 2005;21:2531–3. pmid:15713735
  36. 36. Woolley S, Johnson J, Smith MJ, Crandall KA, McClellan DA. TreeSAAP: selection on amino acid properties using phylogenetic trees. Bioinformatics. 2003;19:671–2. pmid:12651734
  37. 37. Martin DP, Murrell B, Golden M, Khoosal A, Muhire B. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol. 2015;1:vev003. eCollection 2015. pmid:27774277
  38. 38. Excoffier L, Lischer HE. Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour. 2010;10:564–7. pmid:21565059
  39. 39. EL-Manzalawy Y, Dobbs D, Honavar V. Predicting linear B-cell epitopes using string kernels. J Mol Recognit. 2008;21:243–55. pmid:18496882
  40. 40. Oyarzún P, Ellis JJ, Bodén M, Kobe B. PREDIVAC: CD4+ T-cell epitope prediction for vaccine design that covers 95% of HLA class II DR protein diversity. BMC Bioinformatics. 2013;14:52. pmid:23409948
  41. 41. Romphruk AV, Puapairoj C, Romphruk A, Barasrux S, Urwijitaroon Y, Leelayuwat C. Distributions of HLA-DRB1/DQB1 alleles and haplotypes in the north-eastern Thai population: indicative of a distinct Thai population with Chinese admixtures in the central Thais. Eur J Immunogenet. 1999;26:129–33. pmid:10331158
  42. 42. Kadekoppala M, Holder AA. Merozoite surface proteins of malaria parasite: the MSP1 complex and the MSP7 family. Int J Parasitol. 2010;40:1155–61. pmid:20451527
  43. 43. Silmon de Monerri NC, Flynn HR, Campos MG, Hackett F, Koussis K, Withers-Martinez C, et al. Global identification of multiple substrates for Plasmodium falciparum SUB1, an essential malarial processing protease. Infect Immun. 2011;79:1086–97. pmid:21220481
  44. 44. Hughes AL. Near neutrality: leading edge of the neutral theory of molecular evolution. Ann N Y Acad Sci. 2008;1133:162–179. pmid:18559820
  45. 45. Forman-Kay JD, Mittag T. From sequence and forces to structure, function, and evolution of intrinsically disordered proteins. Structure. 2013;21:1492–9. pmid:24010708
  46. 46. Mayadas TN, Johnson RC, Rayburn H, Hynes RO, Wagner DD. Leucocyte rolling and extravasation are severely compromised in P-selection-deficient mice. Cell. 1993;74:541–54. pmid:7688665
  47. 47. Stenberg PE, McEver RP, Shuman MA, Jacques YV, Bainton DF. A platelet alpha-granule membrane-protein (GMP-140) is expressed on the plasma membrane after activation. J Cell Biol. 1985;101:880–6. pmid:2411738
  48. 48. Lin CS, Uboldi AD, Epp C, Bujard H, Tsuboi T, Czabotar PE, et al. Multiple Plasmodium falciparum merozoite surface protein 1 complexes mediate merozoite binding to human erythrocytes. J Biol Chem. 2016;291:7703–15. pmid:26823464
  49. 49. Cowman AF, Tonkin CJ, Tham WH, Duraisingh MT. The molecular basis of erythrocyte invasion by malaria parasites. Cell Host Microbe. 2017;22:232–45. pmid:28799908
  50. 50. Mello K, Daly TM, Long CA, Burns JM, Bergman LW. Members of the merozoite surface protein 7 family with similar expression patterns differ in ability to protect against Plasmodium yoelii malaria. Infect Immun. 2004;72:1010–8. pmid:14742548
  51. 51. Cui L, Yan G, Sattabongkot J, Cao Y, Chen B, Chen X, et al. Malaria in the Greater Mekong Subregion: heterogeneity and complexity. Acta Trop. 2012;121:227–39. pmid:21382335
  52. 52. Putaporntip C, Jongwutiwes S, Grynberg P, Cui L, Hughes AL. Nucleotide sequence polymorphism at the apical membrane antigen-1 locus reveals population history of Plasmodium vivax in Thailand. Infect Genet Evol. 2009;9:1295–300. pmid:19643205
  53. 53. Putaporntip C, Udomsangpetch R, Pattanawong U, Cui L, Jongwutiwes S. Genetic diversity of the Plasmodium vivax merozoite surface protein-5 locus from diverse geographic origins. Gene. 2010;456:24–35. pmid:20178839