Geographical distribution and genetic diversity of Plasmodium vivax reticulocyte binding protein 1a correlates with patient antigenicity

Plasmodium vivax is the most widespread cause of human malaria. Recent reports of drug resistant vivax malaria and the challenge of eradicating the dormant liver forms increase the importance of vaccine development against this relapsing disease. P. vivax reticulocyte binding protein 1a (PvRBP1a) is a potential vaccine candidate, which is involved in red cell tropism, a crucial step in the merozoite invasion of host reticulocytes. As part of the initial evaluation of the PvRBP1a vaccine candidate, we investigated its genetic diversity and antigenicity using geographically diverse clinical isolates. We analysed pvrbp1a genetic polymorphisms using 202 vivax clinical isolates from six countries. Pvrbp1a was separated into six regions based on specific domain features, sequence conserved/polymorphic regions, and the reticulocyte binding like (RBL) domains. In the fragmented gene sequence analysis, PvRBP1a region II (RII) and RIII (head and tail structure homolog, 152–625 aa.) showed extensive polymorphism caused by random point mutations. The haplotype network of these polymorphic regions was classified into three clusters that converged to independent populations. Antigenicity screening was performed using recombinant proteins PvRBP1a-N (157–560 aa.) and PvRBP1a-C (606–962 aa.), which contained head and tail structure region and sequence conserved region, respectively. Sensitivity against PvRBP1a-N (46.7%) was higher than PvRBP1a-C (17.8%). PvRBP1a-N was reported as a reticulocyte binding domain and this study identified a linear epitope with moderate antigenicity, thus an attractive domain for merozoite invasion-blocking vaccine development. However, our study highlights that a global PvRBP1a-based vaccine design needs to overcome several difficulties due to three distinct genotypes and low antigenicity levels.


Introduction
In 2020 the prevalence of Plasmodium vivax malaria was estimated to be between 4.1 and 5.1 million cases globally [1,2], accounting for a significant proportion of malaria cases in the African (0.3%), South-East Asia (36.3%), and Western Pacific (30.1%) [1]. Despite its global importance, P. vivax research has been neglected compared to malaria caused by P. falciparum due in part to its low directly associated mortality and the absence of a continuous in vitro culture method (a consequence of P. vivax only invading nascent CD71+ve reticulocytes). As most malaria elimination programs target P. falciparum, the proportion of P. vivax infections outside of sub-Saharan Africa continues to increase [3,4]. Failure to control P. vivax is primarily due to relapsing blood infections, emerging from activated dormant liver forms (hypnozoites) often weeks to months after the primary infection. This is compounded by a cryptic endosplenic life-cycle resulting in a persistent hidden splenic reservoir of asexual-stage P. vivax parasites in chronic infection [5,6]. Additionally, increasing reports of drug resistant vivax malaria in South-East Asia [7], have spurred efforts to develop a vaccine against P. vivax.
Most vaccine efforts against P. vivax have focused on the assumption that its erythrocytic life cycle is dependent on the presence of the Duffy Antigen Receptor for Chemokine (DARC, CD234) on the host red cell. Therefore, targeting DARC's corresponding ligand (P. vivax Duffy Binding Protein, PvDBP) is the most investigated P. vivax vaccine target. Of particular interest is the extracellular region II of the PvDBP (PvDBP-RII) which specifically interacts with the Fy6 region of DARC [8]. Although PvDBP is a promising vaccine candidate, there are several challenges. Firstly, a general lack of understanding of effective PvDBP-RII vaccine conditions such as the method of delivery, immune-reactivity, and persistence [9,10]. Secondly, a failure of the PvDBP-RII vaccine in field trials resulting from high levels of natural genetic polymorphisms [11][12][13]. Finally, the widespread discovery of Duffy-negative vivax malaria cases in Madagascar and parts of Northern Africa, suggesting P. vivax can use alternate invasion pathways besides DARC [14]. P. vivax reticulocyte binding-like protein family (PvRBL) is another promising vaccine candidate [15], consisting of eleven members including three pseudogenes [16]. Although their precise roles remain largely unknown, the RBL protein family such as PvRBP2a, PvRBP2b, and P. falciparum reticulocyte binding-like protein homologs (PfRh) show consistent functions of host erythrocyte binding activity (i.e. enabling P. vivax merozoites to identify host reticulocytes) [17,18]. The most well-known member of RBL protein family in P. vivax is PvRBP2b. This RBL protein interacts with the transferrin receptor (CD71), which is an important marker for selecting young reticulocyte [19]. Additionally, monoclonal antibodies against PvRBP2b showed inhibition of P. vivax invasion in Brazilian and Thai clinical isolates [19]. Another PvRBL family member, PvRBP1a is also shown to bind preferentially to human reticulocytes [20-24] with affinity-purified patient IgG against PvRBP1a fragment (30-778 aa. and 352-599 aa.) specifically blocking reticulocyte binding of native and recombinant PvRBP1a antigens [21,22]. These studies support the use of PvRBL proteins as attractive vaccine candidates for targeting the alternative P. vivax invasion pathways.
In this study, we analysed clinical isolates from six countries for genetic diversity and three countries for the antigenicity screening of PvRBP1a. Comparing vivax malaria patient antigenicity, genetic polymorphism, and natural selection we aimed to produce new information on the suitability of PvRBP1a as a vaccine candidate.

Ethics statement
Whole blood samples were collected from symptomatic P. vivax patients after microscopic blood film examination at regional health centres of the respective countries. All clinical samples were collected under the following ethical guidelines and approved protocols: ROK and Myanmar were amplified (Republic of Korea (ROK, n = 16) and Shwegyin in Myanmar (n = 10)). For the whole pvrbp1a gene sequencing, a total of 9 primer sets was designed based on pvrbp1a (PVP01_0701200) derived from the PvP01 Papua Indonesian reference sequence (S1 Table) [25]. The detailed whole gene sequences of pvrbp1a are available at Gen-Bank (accession numbers: MW862435-MW862460).
pvrbp1a sequences from China (n = 4), Ethiopia (n = 22), Malaysia (Sabah, n = 54), and Thailand (North-western, n = 96) were obtained from whole-genome sequencing data which were described previously at [26][27][28][29]. Briefly, the genomic region for pvrbp1a was obtained from PlasmoDB [30]. Reads that were aligned to this region were extracted from the wholegenome sequencing data (in BAM format) as FASTQ files using samtools [31]. For each sample, a de novo assembly was performed to the available reads using SPAdes [32]. From the SPAdes output, the longest contig from each sample was collected into a single FASTA file. The reference sequence for pvrbp1a was added into this FASTA file for multiple sequence alignment using MAFFT [33]. Samples that did not assemble correctly (i.e., structurally deviated from the reference sequence) were removed. After removal, there were sequences that had indels compared to the pvrbp1a reference. To validate the presence of these indels, the Genome Analysis Toolkit (GATK) [34] was used to realign these indels to the reference sequence. Samples whose indels were not consistent between the realignment and de novo assembly results were excluded from the analysis. Finally, the remaining samples were realigned with MAFFT to generate the pvrbp1a sequences used in downstream analyses. A total of 202 sequences were aligned with pvrbp1a (PVP01_0701200) sequence.

Homology-based tertiary structure modelling
The tertiary structure model of PvRBP1a was generated by homology-based structure prediction. Suitable structural templates were searched, and models were built using SWISS-MODEL (https://swissmodel.expasy.org) (PDB ID: 6d03.1) [35]. The quality and potential errors were evaluated by Ramachandran plots [36] and ERRAT [37]. The generated PvRBP1a initial model was refined using Galaxy Refine to improve accuracy [38]. The Root-mean-square deviation (RMSD) and Z-score were measured by DALI server (http://ekhidna2.biocenter.helsinki.fi/ dali). The RMSD measures the structural differences between the aligned alpha-carbon positions. The crystallographic model of proteins with about 50% sequence identity differs by about 1 Å. The Z-score is the distance, in standard deviations, between the observed alignment RMSD and the mean RMSD for random pairs of the same length, with the same or fewer gaps. Similarities with a Z-score lower than 2 are spurious. The structure was visualized using UCSF CHIMERA software [38].

Nucleotide diversity and natural selection
Pvrbp1a nucleotide diversity (π) is defined as the average number of nucleotide differences per site between two sequences. The number of polymorphic sites, haplotypes, and haplotype diversity (Hd) were calculated using DnaSP software [27,39]. Intra-species tests for evidence of selection (non-random evolution) were evaluated using Tajima's D, Fu and Li's D � , and Fu and Li's F � . Under neutrality (random evolution), Tajima's D is expected to be 0. Significant positive values of Tajima's D indicate a departure from random mutation that may reflect population contraction or balancing selection, whereas significant negative values may reflect population expansion after a recent bottleneck or a recent selective sweep [40]. Significant positive values of Fu and Li's D � and Fu and Li's F � tend to reflect population contraction due to selection, while negative values reflect population expansion and excess of singletons [41]. Additionally, Nei and Gojobori's method was used to calculate the intra-species (within P. vivax) proportion of synonymous substitutions per synonymous site (dS) and non-synonymous substitutions per non-synonymous site (dN) with 1,000 pseudo replication bootstraps using MEGA 7 software. A dN-dS greater than zero reflect positive selection, while a dN-dS value less than zero reflects purifying selection. Lastly, the inter-species (between P. cynomolgi) McDonald-Kreitman (MK) test, which compares the amount of variation within a species to the divergence between species. The MK test was computed using the P. cynomolgi rbp1a Breok strain (JQ422037) as an out-group species using DnaSP software.
Pvrpb1a haplotype diversity was evaluated using DnaSP software, and clustering patterns were assessed and illustrated using a median-joining method in Network 5.0 software. To check the robustness of the network, the haplotype tree was drawn by the maximum likelihood method and robustness was estimated by the bootstrap method with 1,000 pseudo replicates as implemented in the MEGA7.

Recombinant protein expression
The recombinant PvRBP1a (PVP01_0701200) was expressed with two fragments for antigenicity screening. The PvRBP1a-N (157-650 aa.) was selected based on the tertiary structure homolog regions of the PvRBP2b binding domain. PvRBP1a-C (606-962 aa.) was chosen because of the sequence conserved domain within pvrbp1a. The recombinant PvRBP1a-N and PvRBP1a-C fragments were produced by HEK293E (HEK293 EBNA1-6E) cell-based protein expression systems. Briefly, synthesized pvrbp1a genes which were codon-optimized and mutated in predicted N-glycosylation sites were amplified using a specific primer set (S1 Table) and cloned into expression vector pTT5-MCS-His using In-Fusion Cloning kit (Clontech, Mountain View, CA, USA). For protein expressions, we used HEK293 EBNA1-6E cellbased transiently transfection systems. After five days of transfection, secreted proteins were collected from the culture supernatant. The expressed proteins were purified by Ni-NTA agarose (QIAGEN) with 250 mM imidazole, respectively. Each recombinant protein (3 μg/lane) was prepared with 5x reducing buffer and separated by 13% SDS-PAGE. After electrophoresis, SDS-PAGE gels were stained with Coomassie brilliant blue (Sigma-Aldrich).

Antigenicity evaluation
Protein microarray was performed for evaluating total antigenicity using recombinant PvRBP1a-N and PvRBP1a-C. Three aminopropyl-coated slides were prepared as described previously [42]. The slides printed with optimised concentration of recombinant protein (PvRBP1a-N, 100 ng/μl and PvRBP1a-C, 50 ng/μl) to each spot were incubated for 2 hours at 37˚C, and these slides were blocked with blocking buffer (5% BSA in PBS-T, 0.1% Tween 20) for 1 hour at 37˚C. Healthy and vivax malaria patient sera were diluted in PBS-T to 1:25 ratio and treated on the chip for 1 hour at 37˚C. The arrays were visualized with 10 ng/μl of Alexa Fluor 546 goat anti-human IgG (Invitrogen, Carlsbad, CA, USA) in PBS-T for 1 hour at 37˚C and scanned using InnoScan 300 (INNOPSYS, France). The positive cut-off values calculated by negative control mean fluorescence intensity (MFI) plus two standard deviations.
Additionally, antigens were heat-treated at 80˚C for 5 minutes to characterise the epitope. The linearized antigen also used for the microarray as described above.

Statistical analysis
The antigenicity data were analysed using GraphPad Prism (GraphPad Software, San Diego, CA, USA) and SigmaPlot (Systat Software Inc., San Jose, CA, USA). For the protein array, the Student's t-test was used to compare the experimentally measured values of each group. The correlation of clinical information with antigenicity was calculated by Pearson correlation test (ρ). Differences of p < 0.05 were considered significant. The total IgG reactivity index was calculated by each MFI divided by average negative MFI of PvRBP1a-N (157-650 aa.) and PvRBP1a-C (606-962 aa.), respectively, for normalization and reactivity comparison.

Pvrbp1a gene sequence characteristics and tertiary structure prediction
The pvrbp1a (PVP01_0701200) genomic DNA contains 8,704 bp in length on chromosome 7 with one intron and two exons. The open reading frame contain 8,502 bp (2,833 aa.) in length with 326 kDa predicted molecular weights (Fig 1A). The pvrbp1a coding sequence was divided into six segments for detailed characterisation in this study using the following criteria: (1) specific feature domains such as signal peptide and transmembrane domain, (2) homology-based prediction domain which showed binding activity with host cell in other RBLs, (3) polymorphic and conserved sequence. Region I (1-151 aa.) contained signal peptide (1-22 aa.) and unidentified domain with other RBLs. Region II (152-458 aa., head structure) and region III (459-625 aa., tail structure) were divided by protein tertiary structure prediction with PvRBP2b model (PDB ID: 6d03.1) (Fig 1B). Region IV (626-1441 aa.) contained a conserved sequence region and Region V (1442-2771 aa.) was a moderate polymorphic region within pvrbp1a sequence from 202 clinical isolates in this study. Region VI (2772-2833 aa.) contained transmembrane (2772-2809 aa.) and intracellular domains (2810-2833 aa.) (Fig 1A).
The protein tertiary structure was predicted by pvrbp1a (PVP01_0701200) sequence and the model was best-matched to PvRBP2b binding domain (PDB ID: 6d03.1, 168-633 aa.) ( Fig  1A and 1B). The PvRBP1a model Ramachandran plot revealed 91.9% of the favourable regions and 0.4% of disallowed regions in the initial model. The overall quality factor verified by ERRAT showed 91.3%. After refinement, overall structure accuracy was improved to 96.3% of the favourable region and 0.4% of the disallowed region in the Ramachandran plot and to 98.06% of ERRAT. This refined final tertiary structure was allowed as a PvRBP1a prediction model and performed structural alignment with PvRBP2b. The structure was superimposed to 0.6 Å of RMSD and 38.8 of Z-score which reflect that PvRBP1a and PvRBP2b closely resembled ( Fig 1B). The PvRBP1a predicted structure model has a clearly divided head (PvRBP1a-RII) and tail (PvRBP1a -RIII) domains based on the PvRBP2b structure which is the reticulocyte binding domain ( Fig 1B).
Additionally, PvRBP1a was commonly aligned with conserved hydrophobic regions to other PvRBPs and may have a role in maintaining the proper architecture and specific functions of these proteins [18]. Overall, PvRBP1a-RII and PvRBP1a-RIII were presumed to be the most important domain in eliciting polymorphisms with a conserved conformation.

Genetic diversity and natural selection of pvrbp1a
Across 202 isolates from 6 countries, a total of 139 polymorphic sites, including 31 synonymous and 108 non-synonymous sites, were observed in pvrbp1a. A high number of non-synonymous mutation may affect antibody recognition according to changes in the protein tertiary structure. The highest nucleotide diversity was observed in Thailand (π ± S.D., 0.00204 ± 0.00007), followed by Ethiopia (0.00171 ± 0.00012), ROK (0.00166 ± 0.00022), Myanmar (0.00159 ± 0.00012), Malaysia (0.00148 ± 0.00011), and China (0.00120 ± 0.00012). However, this result was considered a tendency of high nucleotide diversity according to the sample size which is reflected the gene can be more varied in each country based on sample size. The population-wide nucleotide diversity was 0.00196 ± 0.00004 (π ± S.D.) ( Table 1). To investigate whether the allele frequencies at the polymorphisms in pvrbp1a reflected evidence of selection (departures from random mutation), several tests were performed at both intra-  (Table 1).
On investigation of the six separate pvrbp1a coding sequence regions, the highest nucleotide diversity was observed in PvRBP1a-RIII (tail structure, 0.01290 ± 0.00029), followed by PvRBP1a-RII (head structure, 0.00453 ± 0.00012) (Fig 2 and (Tables 2 and 3). Additionally, no significant values were calculated in the inter-species (between P. cynomolgi) McDonald-Kreitman (MK) test (NI = 1.069) ( Table 3). Even all the calculations had weak evidence, all values indicated selection pressure. Overall, the PvRBP1a-RIII domain (tail structure) departure from neutrality was balancing selection that divides the population and contracted within the population.
There was also no significant evidence of selection at the PvRBP1a-RII fragment with Tajima's D (0.18145), Fu and Li's D � (-0.96778), and Fu and Li's F � (-0.60847) ( Table 2). However, the intra-species (within P. vivax) dN-dS calculation (2.225, p < 0.05) indicated significant positive selection ( Table 3). The inter-species (between P. cynomolgi) MK test was greater than 1 in PvRBP1a-RII, indicative of purifying selection (Table 3). Taken together, both PvRBP1a-RII and PvRBP1a-RIII showed high levels of nucleotide diversity that mainly occurred by nonsynonymous mutation with weak evidence of balancing selection pressure within species. However, PvRBP1a-RI and PvRBP1a-RIV, which surround the head and tail structure   (Table 2). Additionally, the conserved sequence identification was performed by DNAsp software using 202 isolates. The result showed that total of 14 conserved regions were revealed (S2 Table). The haplotype networks using sequence combined across the most polymorphic regions, PvRBP1a-RII+RIII (152-625 aa.), were performed using the Median-joining method. The RII +RIII domains consisted of a total of 82 haplotypes which segregated into three distinct clusters (Fig 3 and S1 Fig). A cluster 1, comprised isolates from across South-East Asia (Malaysia, n = 29; Thailand, n = 58; Myanmar, n = 4) and Africa (Ethiopia, n = 8). Cluster 2 appeared to connect with cluster 1 and predominantly comprised isolates from North-East Asia (ROK, n = 15; China, n = 4) and Africa (Ethiopia, n = 14), with little representation from South-East Asia (Malaysia, n = 1; Thailand, n = 12; Myanmar, n = 1). Cluster 3, which also appeared to connect with cluster 1, mainly comprised South-East Asian Asia (Malaysia, n = 24; Thailand, n = 26; Myanmar, n = 5) isolates and one isolate from ROK. The Papua Indonesian PvP01 reference pvrbp1a sequence (PVP01_0701200) comprised haplotype 1, located in cluster 3.

Discussion
The RBL protein family including PvRBP1a, are essential ligands used by P. vivax merozoites for the identification and invasion of human reticulocytes, as such they are important potential vaccine candidates [15]. To date studies of PvRBP1a have focused mainly on its reticulocyte binding activity using a limited set of recombinant PvRBP1a fragments (S3 Table) [22,23]. Of particular note, the PvRBP1a recombinant fragment (352-599 aa.) induced antibody, abrogated 40% of P. vivax merozoite invasion activity [21]. While the results indicate that PvRBP1a is a promising novel vaccine candidate, prior to this study there was limited knowledge about the PvRBP1a genetic diversity and antigen characteristics. In our study, PvRBP1a-RII and PvRBP1a-RIII (152-625 aa.), located in the N-terminal, were revealed as the most polymorphic regions in the pvrbp1a gene. This genetic polymorphism pattern is similar to other RBL protein family members such as PvRBP2a, PvRBP2b, and PfRh5 [18,19,43]. Interestingly, the highly polymorphic PvRBP1a-RII+RIII region overlaps with the RBL domain, showing host cell-binding activity within the RBL family. This domain of PvRBP1a has also been shown to have reticulocyte binding activity (S3 Table) [20,21,23]. Genetic polymorphisms are a major hurdle for vaccine development as variation can alter the epitope expression, resulting in loss of vaccine efficacy [44]. The most advanced P. vivax vaccine candidate to date is PvDBP-RII (268-513 aa.). The pvdbp gene has been show to exhibit high genetic diversity in Malaysia (π = 0.01000), Brazil (π = 0.01200) [45], and Myanmar (π = 0.00400-0.00600), as well as evidence of positive selection pressure [12,[46][47][48][49]. The PvDBP-RII fragment is a functional domain containing 12 conserved-cysteine residues, with a conserved protein structure and binding functions to DARC [50]. Our study revealed that the PvRBP1a-RII+RIII domains (PvRBP1a-N) have comparable nucleotide diversity (π = 0.00748 in worldwide isolates) to PvDBP-RII. Although these domains showed weak evidence for natural selection using intra-species tests, evidence of balancing selection was observed with high level of non-synonymous mutation. However, the inter-species MK test (NI = 2.149) was indicated purifying selection. Taken together with the haplotype distribution result, PvRBP1a-RII +RIII is divided into three different genetic clusters. Thus, these three clusters should be considered for PvRBP1a based vaccine development in the future.
Interestingly, the PvRBP1a-RI and PvRBP1a-RIV domains showed strong evidence for directional selection pressure in our study. Both regions exhibited an excess of rare haplotypes (observed once), reflective of a possible population expansion or selection sweep. However, these gene regions (PvRBP1a-C) have low antigenicity, limiting their utility as vaccine candidate targets.
The antigenicity screening results reflect the correlation of PvRBP1a genetic diversity and natural selection properties. PvRBP1a-N (RII+RIII, 157-650 aa.) is a highly polymorphic region mainly caused by random point mutations and could be divided into three genetic clusters. Because the PvRBP1a-N has a linear epitope, conserved epitope sequence residue possibly elicits humoral immune response with up to 46.7% sensitivity. On the other hand, the conformational epitope in PvRBP1a-C (RIV partial, 606-969 aa.) showed lower antigenicity (17.8%) than PvRBP1a-N caused by strong directional selection with population expansion even consist with relatively conserved sequence. In comparison with PvDBP-RII and PvEBP-RII, the same methods for antigenicity screening with this study showed 56.9% and 16.1% sensitivity [23,51]. These data show that PvRBP1a-N elicits moderate level of antigenicity compared with other essential microneme ligands, even it is containing three different genotypes. An effective vaccine should include sequence residues that sufficiently induce the host immune responses and broadly covers the existing antigenic diversity.
To facilitate PvRBP1a based vaccine development, the relationship between sequence conserved regions (S2 Table) and functional studies of PvRBP1a fragment (S3 Table) were compared. Among the PvRBP1a fragments which showed the ability to bind to reticulocytes, the C2 (409-444 aa.) and C3 (473-507 aa.) regions were commonly overlapped, especially the C3 region, which might have important roles to bind reticulocyte when compared to previous report (S3 Table). Furthermore, the antibody against PvRBP1a fragment containing C3 regions showed binding [22] and invasion inhibition activity [20]. Therefore, C3 regions in the PvRBP1a-N (RII+RIII) can be considered attractive target regions for vaccine development.
Although PvRBP1a is a novel candidate for an alternative invasion pathway blocking vaccine, correlation analysis of genetic diversity and antigenicity screening revealed several obstacles. The low antigenicity due to high genetic polymorphism in a particular interest domain, PvRBP1a-RII+RIII will need to be considered for PvRBP1a based vaccine development.
Supporting information S1