Phlebotomus papatasi sand fly predicted salivary protein diversity and immune response potential based on in silico prediction in Egypt and Jordan populations

Phlebotomus papatasi sand flies inject their hosts with a myriad of pharmacologically active salivary proteins to assist with blood feeding and to modulate host defenses. In addition, salivary proteins can influence cutaneous leishmaniasis disease outcome, highlighting the potential of the salivary components to be used as a vaccine. Variability of vaccine targets in natural populations influences antigen choice for vaccine development. Therefore, the objective of this study was to investigate the variability in the predicted protein sequences of nine of the most abundantly expressed salivary proteins from field populations, testing the hypothesis that salivary proteins appropriate to target for vaccination strategies will be possible. PpSP12, PpSP14, PpSP28, PpSP29, PpSP30, PpSP32, PpSP36, PpSP42, and PpSP44 mature cDNAs from field collected P. papatasi from three distinct ecotopes in the Middle East and North Africa were amplified, sequenced, and in silico translated to assess the predicted amino acid variability. Two of the predicted sequences, PpSP12 and PpSP14, demonstrated low genetic variability across the three geographic isolated sand fly populations, with conserved multiple predicted MHCII epitope binding sites suggestive of their potential application in vaccination approaches. The other seven predicted salivary proteins revealed greater allelic variation across the same sand fly populations, possibly precluding their use as vaccine targets.


Introduction
Leishmaniasis is a group of neglected diseases caused by Leishmania parasites, vectored by phlebotomine sand flies and endemic in 98 countries [1]. Different Leishmania species are uniquely associated with distinct clinical outcomes, ranging from cutaneous lesions to fatal visceral disease. Leishmania major, one of the causative agents of cutaneous leishmaniasis (CL), is transmitted by the bites of Phlebotomus papatasi, P. bergeroti, and P. duboscqi sand flies [2]. Approximately 0.7-1.2 million cases of CL occur each year [1]. CL produces scarring skin lesions and current treatments can be toxic, expensive, require multiple administrations, and can be difficult to access [2]. Although significant effort has been expended, there currently is no efficacious vaccine for human populations.
The salivary proteins of hematophagous arthropods are pharmacologically active molecules that modulate host inflammation, vasoconstriction, and blood clotting [3]. In P. papatasi infected with Le. major, parasites are regurgitated into the host's skin during probing or feeding, along with a cocktail of salivary proteins, where an infection can be established. Pre-exposure to sand fly salivary proteins can exacerbate or attenuate Leishmania infections in animal models [4][5][6], and have been suggested as potential as vaccine candidates [7,8].
It has been previously determined that exposure to uninfected P. papatasi bites confers some level of protection against Le. major in murine models, presumably via stimulation of a delayed-type hypersensitivity immune response at the site of inoculation [9]. One particular salivary protein, PpSP15, induces a Th-1 mediated immune response with a hallmark increase in IFN-γ in mice [5], and vaccination of nonhuman primates with the P. duboscqi orthologous PdSP15 resulted in a significant decrease in parasite load and lesion size, though full protection was not established [7]. Conversely, in naïve hosts, sand fly saliva exacerbates disease progression by downregulating the host's immune response while polarizing the immune response to favor Th2 cytokine production [7,[10][11][12].
Genetic variability among populations of sand flies will influence the success of any salivary protein-based vaccine. Specifically, highly polymorphic salivary proteins and those under positive selection should be cautiously considered for further vaccine development. Previously, P. papatasi salivary protein 15 (PpSP15), sampled from field populations demonstrated minimal selection with a high degree of conservation at the predicted amino acid level validating PpSP15 as a potential vaccine candidate [13,14]. Here we analyzed the genetic variability in mRNA and in the predicted peptide sequences of nine highly expressed P. papatasi salivary proteins [15,16], including PpSP12, PpSP14, PpSP28, PpSP29, PpSP30, PpSP32, PpSP36, PpSP42, and PpSP44, from three P. papatasi populations from Egypt and Jordan.
As a geographically widespread species, P. papatasi is prevalent in the Mediterranean Basin, especially the Middle East and North Africa, with an ability to adapt to a variety of habitats that exhibit different climates, elevations, vegetation, and host species. As a result of adaptation to ecological variation, it is expected that sand flies and their salivary proteins face selective pressures that could influence vector competency and disease outcomes [17]. P. papatasi population genetics studies have demonstrated that although pockets of genetic variability exist between populations, evidence suggests that the species as a whole remains relatively homogeneous [18][19][20][21][22][23].
It remains vitally important to monitor salivary protein genetic variability to ensure the most appropriate salivary proteins are chosen as vaccine targets. A successful sand fly salivary protein-based vaccine to combat CL also depends on expression profiles and human (host) immune response to these salivary proteins, ideally selected from geographically distant sand fly populations. Egypt and Jordan are endemic for CL, with certain regions designated as hyperendemic in Jordan, with Le. major causing the majority of CL cases [24,25]. We previously assessed the expression variability of 10 salivary gland genes [14,26] and the genetic variability of PpSP15 [14] of specimens inhabiting distinct ecotopes in Egypt and Jordan and throughout the sand fly season in each habitat. The purpose of this study was to analyze the genetic variability present in the nine P. papatasi salivary genes from the previous expression study [26] as potential vaccine targets that are conserved across populations and demonstrate the potential for the proteins to elicit an immune response, similar to what we reported for PpSP15 [14]. We recommend that PpSP12 and PpSP14 be considered for further vaccine development as we demonstrate that they are conserved across populations while also have potential to elicit an immune response. We caution of the use of a highly variable protein like PpSP28 for further development. Sand flies remained alive until dissection and were euthanized in soapy water. Flies were individually identified by microscopic examination of female spermathecae according to Lane [27], and only non-parous females were used in the analysis presented. Parity was assessed according to Anez [28]. Map in Fig 1 was created  P. papatasi from Aswan (PPAW) were collected from a small village near the Nile River that permits artificial irrigation for the cultivation of crops like corn (Zea mays), wheat (Triticum aestivum), mangoes (Mangifera indica), and date palms (Phoenix dactylifera). Dogs, goats, and cattle are kept and raised in the village as well. The village sits at 117 m above sea level. Temperatures typically fall between 24˚C and 45˚C with minimal rainfall. Sand flies are abundant in this village though Le. major is absent [30]. P. papatasi collected from Swaimeh (PPJS) inhabit an area endemic for zoonotic Le. major due to the presence of Psammomys obesus, the reservoir host [31]. This low elevation area (~350m below sea level) experiences a Saharan Mediterranean climate with rainfall less than 50mm that occurs from November to April. Temperatures maximally range from 35-40˚C in summer months and minimally range from 8-12˚C in the winter. The sandy, rocky, salty soil supports halophytic and tropical flora species such as chenopods [32]. P. papatasi collected from Malka (PPJM) inhabit a rocky landscape with a typical Mediterranean climate. Malka is located at an elevation of 670 m. During the collection time in 2006, only Le. tropica was present in the region and Le. major was absent hypothesized due to the absence of Ps. obesus [33].

Sample preparation
Dissected, P. papatasi female heads with both salivary glands intact were placed in 1.5 ml centrifuge tubes with 50 μL RNA later (Ambion, Austin, TX, USA) and homogenized with an RNAse-free pestle and hand-held homogenizer. Samples were stored at 4˚C for up to 48 hours, shipped on dry ice, and then stored at -80˚C until analyzed. Table 1 outlines the number of individuals from each site for each salivary protein.

RNA extraction and cDNA synthesis
Total RNA was extracted from the heads and salivary glands of individual P. papatasi samples using the RNeasy Mini RNA isolation kit (Qiagen, Valencia, CA, USA). cDNAs were synthesized using Invitrogen reagents (Invitrogen, Carlsbad, CA, USA), per manufacturer's specifications as we previously performed [14,26].

Sequence analyses
cDNAs produced from the total RNA of individual P. papatasi were amplified by PCR. Primers used to amplify each salivary transcript can be found in S1 Table. PCR products were purified by twice washing in 150 μL DNAse/RNAse-free water (Invitrogen, Carlsbad, CA, USA) in Multiscreen PCR cleaning plates (Millipore, Burlington, Massachusetts, USA) with vacuum application (10 psi). Purified PCR products were resuspended in 50 μL sterile water. Leading and lagging strands were sequenced and poor-quality sequences were excluded from the analyses. Forward and reverse chromatograms were inspected and consensus sequences were aligned using MEGA [34] and manually corrected. The resulting sequences were deposited in GenBank (http://ncbi.nlm.nih.gov) and accession numbers can be found in S1 Table. Multi-copy assessment of salivary proteins We looked for copy number variation relative to currently assembled P. papatasi loci by following the approach of Miles et al. [35]. In short, paired sequences from the two individual entries with the most reads (SRR1997534 and SRR199776) were downloaded from SRA. After initial overall quality checking using FastQC [36], these paired reads were then aligned to the Ppap reference assembly using BWA 0.5.9r16 [37]. Next, based on the resulting alignments (SAM output), reads were placed into non-overlapping 300 bp bins such that each bin contained all reads whose alignment started in its corresponding 300 bp interval. Unlike Miles et al. [35] there were no known "core genome" coordinates that excluded repeat regions for computing a less biased average count for normalization. Therefore, discrete counts in each bin were normalized based on the average count across all scaffolds, median count, and average count excluding terminal 2 kb of scaffolds (1 kb on 5' and 3' of scaffolds). Although normalization slightly differed, the results in terms of under (less than 0.5 average/median) and over (more than 2 average/median) remained the same within and between the two samples considered. Because of under assembly of heterologous regions, the final normalization value was computed based on the empirical distribution of read bin counts with the value with the greatest number of entries near the computed overall average. Significance was derived using a Poisson model parameterized with this estimate as lambda using the Lander-Waterman model of sequence sampling [38], and a Bonferroni correction was applied to correct for multiple comparisons.

Population analyses
Both interpopulation and intrapopulation analyses were performed using DnaSP v.6 [38]. Interpopulation parameters assessed included: fixation indexes such as Fst [39,40] and Gst [41], as well as Hs and Ks indexes [42]. Other parameters assessed included: neutral evolution hypothesis [43] and neutrality tests Tajima's D [44] and Fu and Li's D and F [45]. The Ka/Ks ratio (ω) for the predicted salivary protein as well as a sliding window analysis of 70 codons each was calculated. Weblogos [46] pictorially depict the relative frequencies of polymorphic nucleotides and amino acids. The height of the bases indicate relative frequency and conservation is depicted by the overall weight of the stack. Network 5 [47] generated median joining networks exhibiting haplotype relationships.

Secondary structure and T-cell epitope predictions
Secondary structure predictions for each salivary protein were generated using a secondary structure prediction tool (http://bioinf.cs.ucl.ac.uk/psipred) with default parameters based on the consensus sequence for all individual amino acid sequences from DnaSP. Two different predictions tools predicted the promiscuous HLA-class II binding sites and human T-cell epitopes: IEDB analysis resource T-cell epitope prediction tools (http://tools.immuneepitope.org/ main/html/tcell_tools.html) [48,49] and ProPred MHC class II binding prediction server (http://www.imtech.res.in/raghava/propred/) [50]. For the 51 HLA alleles tested in ProPred, thresholds included a promiscuous search set to 3%. For the 27 HLA alleles tested in IEDB, only predicted peptides with a Consensus percentile rank of 0.10 or below are included as the top 10% of peptides with the strongest predicted binding affinity.

Results
Using our previously published PpSP15 data [14] as a guide, herein, we present data for PpSP12, PpSP14, and PpSP28 as a representative predicted salivary proteins with PpSP29, PpSP30, PpSP32, PpSP36, PpSP42, and PpSP44 data provided in the supplemental materials.

PpSP12 in-depth analyses
Nucleotide and amino acid genetic diversity. The 291 bp PpSP12 fragment produced 14 polymorphic sites (Fig 2A). Two specific nucleotide positions indicate limited heterogeneity between the Jordan populations compared to the Egypt population. Position one shows conservation of adenine in both Jordan populations with variation present in the Egypt population. Conversely, in position 13 the conserved frequency of adenine is greater in the Aswan population in comparison to both Malka and Swaimeh. All of the populations present similar heterogeneity at positions 6-9. Although heterogeneity exists in PpSP12, it is the lowest when comparing all 9 salivary proteins. The translated PpSP12 amino acid sequence has 6 variable positions out of 97 total amino acids (Fig 2B). At position 2, arginine and lysine are both found in all populations. The frequency of alanine and proline are relatively equal for all populations in position 4. Both of these amino acids are small in size, nonpolar, and hydrophobic. At position 3 and 5, the relative frequencies of lysine and asparagine are the same except the Aswan population at position 5 has a much higher frequency of lysine.
Population genetics analysis. A total of 96 cDNA sequences were analyzed for PpSP12 from Aswan (n = 26), Malka (n = 29), and Swaimeh (n = 41). Twenty-nine haplotypes were identified with 14 variant sites. Of the 29 haplotypes, 22 were found in only one of the geographic study sites and 6 of the 7 shared haplotypes were found in 2 populations. One haplotype, H_1, was present in all 3 populations and was the most common haplotype. The Aswan, Egypt, population had 5 unique haplotypes (H_2, H_3, H_5, H_8, H_9) with 3 of those being private haplotypes (H_5, H_8, H_9). The Malka, Jordan, population exhibited 9 unique haplotypes (H_10, H_13, H_14, H_15, H_16, H_18, H_19, H_20, H_21) with 6 of the 9 being private haplotypes (H_10, H_14, H_15, H_18, H_19, H_20). The Swaimeh, Jordan, population demonstrated 8 unique haplotypes (H_22 to H_29) with 3 private haplotypes (H_22, H_24, H_25). A variety of population genetics parameters were assessed ( Table 2) indicating genetic homogeneity for PpSP12 across the three populations. Tajima's D and Ka/Ks analysis indicated that this protein is not undergoing positive selection but rather it is either neutral or possibly experiencing purifying selection (Table 2). Furthermore, population structure is not indicated as Fst uncovered little genetic variability in pairwise comparisons ( Table 3). The PpSP12 median-joining network does not demonstrate any notable clustering separating the different populations from one another (Fig 3). Although there are 29 total haplotypes, the haplotypes are differentiated from one another by only one mutation. The Ka/Ks ratio, a diversifying selection index, was 0.293 or less across the sliding window analysis of the protein for all populations indicating purifying or stabilizing selection of this protein (Table 4).
Secondary structure & T-cell epitope predictions. The predicted amino acid sequence for PpSP12 exhibited only one polymorphic site (R60) that was found in an α-helix whereas the other six polymorphic sites were found in predicted coils (D57, K74, A77, K119, N122) (Fig 4). All 6 polymorphic sites are found in predicted MHC II T-cell epitope binding sites though this should not interfere with the potential for T-cell activation as the polymorphic sites are found in the middle of the predicted binding sites and surrounded by conserved regions. Of the 140 amino acids included in this analysis, 95 amino acids were predicted to be potential epitope recognition sites. The areas of the amino acid sequence with the highest  predicted binding affinities occur between the lysine residue at position 2 (K2) and the proline residue at position 23 (P23) as well as the tyrosine residue at position 110 (Y110) and the asparagine residue at position 138 (N138). Of the 78 total HLA alleles tested using the two software tools, all 51 alleles from ProPred identified potential binding sites though certain alleles, such as DRB1_03, DRB1_11, and DRB1_13, had greater predicted binding affinities than the others. The alleles with the strongest binding affinity potential identified by IEDB software included DQA1_0401/DQB1_0402, DPA1_0103/DPB1_0201, and DRB1_0301. The DQA1/DQB1 and DPA1/DPB1 alleles demonstrated a greater affinity for residues between K2 to A20 and DRB1_0301 demonstrated a greater affinity for Y110 to F126, bookending the mature PpSP12.

PpSP14 in-depth analyses
Nucleotide and amino acid genetic diversity. The 246-bp PpSP14 fragment produced 23 polymorphic sites (Fig 5A). Similar to PpSP12, there exists limited heterogeneity between the populations studied. Position 11 demonstrates variation in all 3 populations with equal representation of adenine and guanine. In position 17, the Jordan populations have roughly equal rates of cytosine and guanine but the Egypt population has cytosine in the majority of individuals. Guanine dominates at position 21 in both Jordan populations but is equally represented with cytosine in the Egypt population. The remaining 20 polymorphic sites present similar levels of heterogeneity across all 3 populations. The translated PpSP14 amino acid sequence has 14 variable positions out of 82 total amino acids (Fig 5B). At position 1 and 3, leucine, valine, and isoleucine are all easily substituted for one another since they are hydrophobic and prefer to be buried in the protein core. In position 5, the Aswan, Egypt, population demonstrates limited substitution of the asparagine amino acid with lysine, both polar amino acids. Lysine and arginine are relatively equal for all populations at position 8. Serine and threonine were present at position 13, but more threonine is found in the Egypt population compared to the Jordan populations. At position 10, threonine and alanine substitutions are found in all populations but are more frequent in the Egypt population. At position 11, threonine is substituted with isoleucine in both Jordanian populations.
Population genetics analysis. A total of 119 cDNA sequences were analyzed for PpSP14 from PPAW (n = 29), PPJM (n = 44), and PPJS (n = 46). Thirty-eight haplotypes were identified with 23 variant sites. Of the 38 haplotypes, 25 were found in only one of the geographic study sites, 8 were shared between PPAW/PPJM or PPJM/PPJS, but none were shared by PPAW/PPJS. Five haplotypes were present in all three populations (H_1, H_5, H_8, H_10, H_13), with H_5 the most common haplotype. PPAW had 6 unique haplotypes (H_3, H_7, H_9, H_11, H_12, H_14) with 1 of those designated a private haplotype (H_3). PPJM had 8 unique haplotypes (H_16, H_18, H_21, H_22, H_23, H_25, H_26, H_27) with 6 of the 8 being private haplotypes (H_16, H_18, H_22, H_23, H_25, H_26). PPJS had 11 unique haplotypes (H_28, H_29, H_30, H_31, H_32, H_33, H_34, H_35, H_36, H_37, H_38) with 6 of the 11 being private haplotypes (H_31, H_32, H_33, H_36, H_37, H_38). The population genetics assessment indicates genetic homogeneity for PpSP14 across the 3 populations (Table 5). There is the potential for population structuring as Fst demonstrated moderate genetic differentiation between PPAW and PPJM (0.10771) and PPAW and PPJS (0.09091) and little genetic differentiation between PPJM and PPJS (0.02346) ( Table 6). The PpSP14 median joining network does not demonstrate any significant clustering separating the different populations from one another (Fig 6). The PPJS haplotypes might be clustering together as compared to PPAW and PPJM but the 38 haplotypes are differentiated from one another by only one mutation. The only exception being haplotypes H_7 and H_12, both from PPAW are  (Table 7). Secondary structure & T-cell epitope predictions. The predicted amino acid sequence for PpSP14 has 6 polymorphic sites (L65, K79, A85, K88, V91, T92) that were predicted to be in α-helices whereas the other 8 polymorphic sites were found in predicted coils (L41, F45, I57, N73, T100, P101, S114, L118) (Fig 7). Twelve of the 14 polymorphic sites were found in predicted MHC II T-cell epitope binding sites. This variability may affect the predicted binding sites found between amino acids L41 and K58 and between amino acids L87 and C103, as the variable sites were found at the beginning and end of the fragment. There are no polymorphic sites found between amino acids M1 and F19. The variable sites found between H61 and A77 and I106 and T134 were found in the middle of the amino acid fragment and should not hinder binding. Of the 142 amino acids included in this analysis, 100 amino acids were predicted to be potential epitope recognition sites. The software prediction tools, IEDB and ProPred, agree that the areas of the amino acid sequence with the highest predicted binding affinities occur between methionine residue at position 1 (M1) and the phenylalanine residue at position 19 (F19) as well as the isoleucine residue at position 106 (I106) and the threonine residue at position 134 (T134). Similar to PpSP12, all 51 alleles from ProPred identified potential binding sites, particularly between residues M1 and F19. Alleles DRB1_04XX, DRB1_08XX, DRB1_11XX, DRB1_13XX, and DRB1_15XX had the highest binding affinities overall. To a lesser extent, the following alleles were also identified DRB1_010X, DRB1_030X, DRB1_070X, and DRB5_010X. The alleles with the strongest binding affinity potential

PpSP28 In-depth Analyses
Nucleotide and amino acid genetic diversity. The 651-bp PpSP28 fragment produced 95 polymorphic sites (Fig 8A). Approximately, 61% of the polymorphic sites are transition substitutions and 26% are transversions. The remaining 12% of polymorphic sites are mostly conserved as the number of substitutions are so few. Positions 50 and 90 have three possible options at this site, each site a different combination of guanine, cytosine, adenine, or thymine. The translated PpSP28 amino acid sequence has 53 variable positions out of 184 total amino acids ( Fig 8B). Eighteen of the variable sites demonstrate limited heterogeneity while the other 35 sites demonstrate significant variation between the populations and an abundance of amino acid substitutions. PpSP28 exhibits the greatest nucleotide and amino acid sequence variability of all 9 salivary proteins studied (Fig 8).
Population genetics analysis. A total of 111 cDNA sequences were analyzed for PpSP28 from Aswan (n = 26), Malka (n = 30), and Swaimeh (n = 55). Ninety-five variant sites were identified in 122 haplotypes. Swaimeh, Jordan, tallied the most unique haplotypes (62) with 46 identified as private. Malka, Jordan, had 21 unique haplotypes with 16 identified as private. Aswan, Egypt, totaled 30 unique haplotypes with 24 identified as private. One haplotype was shared by all 3 populations (H_1) and 9 haplotypes were shared by 2 populations. As with PpSP12 and PpSP14, various population genetics parameters were assessed (Table 8) indicating heterogeneity among the populations. Although significant variation is present in PpSP28, the analyses do not indicate that positive selection is acting on this salivary protein (Table 8). Fst analysis revealed great genetic differentiation between Aswan, Egypt, and Malka, Jordan with an Fst value of 0.10913 and moderate genetic differentiation between Aswan, Egypt, and Swaimeh, Jordan with a value of 0.06936 (Table 9). There is little genetic differentiation between Malka, Jordan, and Swaimeh, Jordan, (Fst = 0.01595). The median joining network for PpSP28 similarly does not exhibit any clear clustering of the Egypt or Jordan populations, however there are as many as 11 mutations separating connected haplotypes (Fig 9). PpSP28 sliding window analysis of Ka/Ks demonstrates the potential for PpSP28 to be under diversifying selection in several areas in contrast to the majority of the protein under purifying selection in all populations (Table 10). Values higher than one were detected in all 3 populations with PPJM having two sliding window regions with values over one compared to one sliding window region in PPJS and PPAW (Table 10). Secondary structure & T-cell epitope predictions. The predicted amino acid sequence for PpSP28 revealed 53 polymorphic sites with 30 predicted to be in α-helices whereas the other 23 polymorphic sites were found in predicted coils (Fig 10). Twenty of the polymorphic sites were found in predicted MHC II T-cell epitope binding sites. Only 1 high affinity predicted binding site between residues M1 and S19 showed no variation. IEDB identified two alleles would recognize this region including DPA1_0301/DPB1_0402 and DPA1_01/ DPB1_0401. The majority of the ProPred alleles recognized some combination of residues between M1 and Q28. Another conserved region was identified between residues F33 and S41 and was recognized by DRB1_08XX, DRB1_11XX, and DRB1_13XX. The final region demonstrating conservation between residues L46 and L54 was recognized but by very few alleles. Two of the regions with the strongest affinities housed the most variation such as residues L71

Multi-copy gene analysis for all salivary proteins
Consistent with prior vector genome assemblies, variation within and/or between loci seems to have affected the initial assembly of SP proteins in this species. In general, comparing 2 individual sand fly samples relative to the reference assembly implies that some SP proteins may exist as 2 separate loci given its coverage is significantly less than expected (p <0.0002; see Methods) using the Poisson-based coverage model of Lander and Waterman [52]. The only potential evidence of multiple copies occurred at the 3' terminal end of the second and third region (exon) of SP42 (2-3X expectation in both samples); however, other regions of the same gene were found less than expected (p < 0.0002) (S2 Table).

Discussion
We examined the genetic variability and potential immunogenicity of nine abundantly expressed P. papatasi salivary proteins with the overarching goal of identifying prospective targets to incorporate into an anti-leishmanial vaccine. The salivary proteins assessed included: PpSP12, PpSP14, PpSP28, PpSP29, PpSP30, PpSP32, PpSP36, PpSP42, and PpSP44. All sand flies were collected from three natural populations and were subjected to similar analyses that we previously performed for PpSP15 [14] to ascertain those salivary proteins that demonstrate similar characteristics to PpSP15 as it has been extensively studied as a vaccine target [53]. A multitude of considerations must be addressed to characterize a salivary protein as a potential vaccine candidate, including genetic variability and conservation across populations, consistent expression, and immunogenicity. We recommend that PpSP12 and PpSP14 be considered in vaccination strategies as these proteins are conserved across populations, demonstrate minimal variability, do not appear to be under selective pressure, and have the potential to activate the human immune system. PpSP28, PpSP29, PpSP32, PpSP36, PpSP42, or PpSP44 may be viable candidates for further vaccination applications but we would prioritize PpSP12 and PpSP14. PpSP12 and PpSP14 exhibited a high degree of conservation at the nucleotide and amino acid levels (Figs 2 & 5) across all three populations studied. When our sampled sequences were aligned with previously published P. papatasi salivary protein gene sequences from Tunisia (PpSP12 accession number JQ988874 and PpSP14 accession number JQ988880) [51] and Israel (PpSP12 accession number AF335485 and PpSP14 accession number AF335486) [15], PpSP12 and PpSP14 demonstrated almost identical sequences with 95% and 91% identity shared, respectively. This level of conservation across multiple populations beyond those included in this study demonstrates the potential for a vaccine with broad geographic coverage. Furthermore, the population genetics indices do not indicate that PpSP12 is under selective pressure. PpSP14 might be under slight selective pressure as evidenced by Tajima's D and Ka/Ks ratio though these values are not statistically significant (refer to Table 5 for summary information). In addition, a smaller number of nonsynonymous mutations or replacement changes are observed in PpSP12 (6) and PpSP14 (13) than in our previous PpSP15 analysis (19), further relative frequencies of amino acid polymorphisms in wild caught P. papatasi populations from PPAW, PPJM, and PPJS.
https://doi.org/10.1371/journal.pntd.0007489.g008 suggesting PpSP12 and PpSP14 are not under positive selection pressures [14]. Nor do the median joining networks utilizing these genes indicate any clear population structuring. PpSP12, PpSp14, and PpSP15 belong to the family of small odorant-binding proteins (OBP) but their specific functions are unknown. Phlebotomus OBPs are related to the D7 protein family that includes PpSP28 and PpSP30 and may have arose from a duplication event of a D7 gene [51]. The high degree of conservancy among the OBPs demonstrated in this study mimics similar conservation of salivary proteins in geographically distant populations of P. duboscqi in Kenya and Mali [54]. The amplicon region for SP30 is 98.91% identical between the P. papatasi reference sequence and one P. duboscqi sequence, so based on the sequencing here, we cannot be absolutely sure that this sequence is from P. papatasi. However, because we did not collect any P. duboscqi and the other salivary gland genes, amplified from the same flies, exhibited more identity to P. papatasi genes than P. duboscqi genes, we are confident that these sequences are not from P. dubsocqi. P. duboscqi and P. papatasi belong to the same subgenus and are both known vectors of Le. major. The use of highly conserved salivary proteins across sand fly species to elicit a cross-protective effect would make the ideal vaccine, and  cross protection against Le. major using salivary gland homogenate from P. papatasi and P. duboscqi using a murine model has been demonstrated [55]. Even though species specificity exists, cross protection may be possible across species that vector the same Leishmania parasites, i.e. Le. major vectored by P. papatasi, P. duboscqi, and P. bergeroti. Gene expression is another important consideration in vaccine design as it relates to antigen dosage [56]. Salivary protein genes that are constitutively expressed are viable vaccine targets more so than those genes that change due to seasonality or other factors. We previously assessed the expression of the nine genes analyzed here from the same collections [26] and found that PpSP12 was significantly upregulated in September for the PPJS population but no significant change occurred in the other populations. PpSP14 did not experience a significant change in expression during the sampling season [26]. Sugar content in plants from dry habitats, like Swaimeh, Jordan, varies in comparison to plants found in irrigated areas like Aswan, Egypt, suggesting that sugar source may be a principle factor in the differential expression demonstrated by PpSP12. We also have demonstrated that gene expression of PpSP12 and PpSP14 is influenced by diet and senescence [57]. In colony-reared, 3-day old sand flies, a 3.95 and 2.18-fold change was observed in blood-fed and sugar-fed flies respectively, compared to nonfed sand flies for PpSP12. For PpSP14, there was a 3.05-fold change in blood fed females compared to nonfed female sand flies. There was similar upregulation of both PpSP12 and PpSP14 at day 5 and day 9 post-emergence. Though diet and senescence may influence salivary gland gene expression, environmental factors play a much larger role in gene expression regulation in wild populations [26]. Both PpSP12 and PpSP14 were expressed throughout seasonal trappings and when specifically tested for age or diet. Although these genes are not considered constitutively expressed like PpSP32, they do not experience downregulation providing further evidence of their potential to provide a high enough antigen dose to prime the immune system for protection [26,57].
Another key aspect to vaccine development is the potential to elicit an immune response in human hosts. If certain salivary proteins are not predicted to interface with the appropriate human immune cells, then those salivary proteins should be excluded from further study. Both mature PpSP12 and PpSP14 proteins have multiple promiscuous MHC class II epitopes identified for presentation to T-cell receptors with limited variation in the potential epitope regions. Conversely, PpSP28 demonstrates high variability in predicted epitope regions decreasing the bonding likelihood with MHC class II receptors. We also identified the MHC class II alleles expected to recognize the salivary protein epitopes and investigated the predominant alleles of human populations living in Egypt and Jordan. The MHC class II alleles with strong binding affinities for PpSP12 that are also prevalent in Egyptian and Jordanian human populations include: DRB1_0301, DRB1_040X, DRB1_110X, DRB1_1301, and DRB1_150X [58][59][60]. The MHC class II alleles identified for PpSP14 include: DRB1_1101, DRB1_0301, DRB1_040X, DRB1_11XX, DRB1_13XX, DRB1_15XX, and to a lesser extent DRB1_070X [58][59][60]. The six remaining salivary proteins were predicted to bind to MHC class II alleles with varying affinity. PpSP36, PpSp42, and PpSP44 demonstrated greater binding affinities to multiple regions for each predicted protein structure but were not predicted to bind with the most prevalent alleles in the human populations from Egypt and Jordan (Supl. data). PpSP29, PpSP30, and PpSP32, displayed fewer predicted binding regions with lower affinities for those regions (Supl. data).
Of critical importance is whether these salivary proteins are recognized by human plasma. Although PpSP12 and PpSP14 are smaller in size than the other salivary proteins analyzed, Fig 10. PpSp28 secondary structure, polymorphic sites, and MHC class II epitope predictions. The mature PpSP28 amino acid sequence predicted secondary structure. Yellow highlighted amino acids indicate the predicted MHC class II predicted promiscuous peptides. Individual amino acids underlined in black indicate unique polymorphic sites. Predicted secondary structure based on sequence accession #AGE83090 and #AGE83091 [51].
https://doi.org/10.1371/journal.pntd.0007489.g010 they are less variable overall as there is less opportunity for mutations to occur. Even though larger proteins might be more immunogenic, our data, supported by previous studies, indicate that PpSP12 and PpSP14 will be recognized by alleles circulating in the study areas [16,[58][59][60]. Both PpSP12 and PpSP14 are recognized by the immune system but antibody specificity differs among the human populations tested [16]. Our previous assessment of human responses included Egyptian and Jordanian residents (MENA donors) from the same study sites analyzed here and U.S. military personnel deployed overseas [16]; MENA donors displayed antibody responses to PpSP12, PpSP26, PpSP30, PpSP38, and PpSP44 but not PpSP14, whereas U.S. military displayed specificity to PpSP14 and PpSP38 but not PpSP12 [16]. In an independent study, it was shown that plasma antibody specificity of 200 Tunisian children ages 6 to 12 years old reacted to PpSP12, PpSP15, PpSP21, PpSP28, PpSP30, PpSP36, and PpSP44, but not to PpSP14 [61], emphasizing the impact of prolonged exposure to sand fly bites versus naïve individuals traveling to sand fly endemic areas [16,61]. Interestingly, PpSP12 and PpSP14 were also shown immunoreactive in unexposed control donors and that the circulating antibodies against these specific salivary proteins could be the result of exposure to other hematophagous arthropod species [16].
Specific antibody response also factors into the polarization of the immune response to a Th1-mediated or Th2-mediated response. The polarization to Th1 or Th2 responses result in protection against CL or a disease exacerbation effect, respectively [10,15]. In our previous work, total P. papatasi salivary gland homogenate elicited IgG4 specificity as the dominant isotype and subclass circulating in human donors and positively correlated with IgE concentrations [16]. IgG4 and IgE are hallmarks of a Th2 and allergic hypersensitivity response [62]. Another study demonstrated that whole salivary gland homogenate upregulates interleukin 4 (IL-4) while inhibiting interleukin 12 (IL-12) and IFN-γ skewing to a Th2 response in the murine model [63]. Th1/Th2 polarization is also dependent on no exposure or pre-exposure to sand fly bites (as reviewed in [64]). The antibody response to individual salivary protein antigens was characterized [61]. PpSP12 was recognized predominately by IgG1 and IgG2 and not IgG4 nor IgE indicating its potential to polarize to a protective Th1 response. PpSP14 was not characterized as it did not demonstrate antibody specificity, but in another study produced a humoral response [5,61].
Taken together, our results and those of others demonstrate the potential of PpSP12 and PpSp14 as vaccine targets. Further testing needs to be conducted to more specifically determine the Th1/Th2 response of PpSP12 and PpSP14 as well as determine if these proteins would confer protection in individuals living in endemic regions as well as naïve populations who may work or travel to endemic areas.
Supporting information S1