Sequence Conservation in Plasmodium falciparum α-Helical Coiled Coil Domains Proposed for Vaccine Development

Background: The availability of the P. falciparum genome has led to novel ways to identify potential vaccine candidates. A new approach for antigen discovery based on the bioinformatic selection of heptad repeat motifs corresponding to α-helical coiled coil structures yielded promising results. To clarify the relationship between the coiled coil motifs and their sequence conservation, we have assessed the extent of polymorphism in putative α-helical coiled coil domains in culture strains, in natural populations and in the single nucleotide polymorphism data available at PlasmoDB.

Methodology/Principal Findings: 14 α-helical coiled coil domains were selected based on preclinical experimental evaluation. They were tested by PCR amplification and sequencing of different P. falciparum culture strains and field isolates. We found that only 3 out of 14 α-helical coiled coils showed point mutations and/or length polymorphisms. Based on promising immunological results, 5 of these peptides were selected for further analysis. Direct sequencing of field samples from Papua New Guinea and Tanzania showed that 3 out of these 5 peptides were completely conserved. An in silico analysis of polymorphism was performed for all 166 putative α-helical coiled coil domains originally identified in the P. falciparum genome. We found that 82% (137/166) of these peptides were conserved, and for only one peptide did the detected SNPs substantially decrease the probability score for α-helical coiled coil formation. More SNPs were found in arrays of almost perfect tandem repeats. In summary, the coiled coil structure prediction was rarely modified by SNPs. The analysis revealed a number of peptides with strictly conserved α-helical coiled coil motifs.

Conclusion/Significance: We conclude that the selection of α-helical coiled coil structural motifs is a valuable approach to identify potential vaccine targets showing a high degree of conservation.


Introduction
The majority of known malaria antigens are highly polymorphic [1]. Tandem repeats are found in central domains of many antigens, giving rise to extensive length polymorphism (LP) [2]. In addition, single nucleotide polymorphisms (SNPs) are abundant in antigenic genes, with 65% of SNPs on a genome-wide scale being non-synonymous (i.e., the nucleotide substitution results in an amino acid change) [3]. The genetic diversity of new vaccine candidates is generally determined in the preclinical characterization of the candidate. High levels of polymorphism in malaria antigens are thought to be part of the parasite's strategy to avoid destruction by the host's immune defense. Including polymorphic sequences in a malaria vaccine elicits variant-specific immune responses. As a consequence, alleles distinct from the vaccine molecule will be favored by selective advantage, giving rise to escape variants. This situation was observed by molecular and immunological monitoring in the Phase I/IIb trial of the malaria vaccine Combination B that, in addition to two other components, contained almost the full length of the merozoite surface protein 2 (MSP2) allele of the 3D7 cloned parasite line [4]. In vaccine recipients, a lower prevalence of the 3D7-type genotype was observed, and genotypes belonging to the alternative allelic family were responsible for a higher incidence of malaria episodes [5]. A significant strain-specific humoral response was directed against the repetitive and family-specific MSP2 domains, whereas only low antibody titres were observed against conserved domains of MSP2 [6]. Similarly, a strain-specific response was observed in a challenge trial in Aotus monkeys with the two alleles of MSP1-42 [7]. There are also contrasting findings from a clinical trial of RTS,S, where no selection was observed in break-through infections for SNPs in the circumsporozoite protein T-cell-epitope regions [8].
The question remains whether the inclusion of more than one allelic form of an antigen can compensate for substantial polymorphism [9]. As for MSP2, the inclusion of two variants into a vaccine has been proposed for MSP3 [6,10]. So far there is little experimental evidence that multi-allele vaccines actually reduce morbidity in contrast to single antigen vaccines [4]. Another interesting aspect of immune evasion is that naturally occurring variants of the same epitope can prevent memory T cell effector functions, a phenomenon referred to as "altered peptide ligand" antagonism [11,12].
The above examples highlight a major obstacle for vaccine development posed by polymorphism in vaccine candidates. By using non-polymorphic domains of antigens, selection of vaccine escape variants could be avoided. A further important consideration in vaccine development is the complexity of candidate molecules in the vaccine formulation. If more variants are required in order to cover the major alleles found world-wide, highly complex mixtures, particularly for multi-component vaccines, would result; thus risking high costs and potential antagonistic effects [4].
Our approach to discover novel vaccine candidates is based on the selection of protein segments with defined structural motifs, with emphasis on identifying conserved domains of antigens. A genome-wide bioinformatic approach was taken to identify potential candidates that contain an α-helical coiled coil motif [13]. The α-helical coiled coils share an (abcdefg)n motif containing hydrophobic residues at positions a and d and generally polar residues in the remaining positions. Chemically synthesized short peptides consisting of this motif can fold into their native structure. This is an appealing characteristic and represents a new approach to malaria vaccine development. The use of synthetic peptides over recombinantly expressed proteins in vaccines is advantageous because no expression or elaborate purification system is required, making the development process much less tedious and time consuming [14]. A further advantage of α-helical coiled coil motifs is that they are recognized by conformation-dependent antibodies [15]. These coiled coil motifs are highly abundant in the eukaryotic cell. They are found in about 10% of all protein sequences [16]. This widespread occurrence in nature is explained by the broad range of functions pertaining to the specific design of their coiled coil domains [17]. The crucial biological function of this domain has been investigated in numerous proteins. Generally, α-helical coiled coil domains serve as oligomerization motifs in proteins.
The rationale for our focus on peptides with little or no polymorphism was that these coiled coil regions were immunogenic in mice and well recognized by naturally occurring antibodies [13] (Olugbile et al., manuscript in preparation). In addition, affinity purified antibodies against these peptides killed parasites in vitro, as shown by an antibody-dependent cellular inhibition assay [13,18,19]. The presence of hydrophobic residues in the a and d positions is important for formation of the critical interhelical interactions, while the mostly hydrophilic residues in the remaining positions are exposed on the surface of the α-helical coiled coil motif and are assumed to function as sites for protein interaction. Such structural and functional constraints associated with coiled coil domains suggest that these motifs are under purifying selection, which led us to expect and investigate sequence conservation.
In an attempt to elucidate the relationship between coiled coil structure and sequence conservation, we analyzed the polymorphism in 166 synthetic peptides previously identified in a genome-wide selection process [13]. Many of these molecules have undergone immunological testing [13] and some have successfully entered the vaccine development pipeline. Fourteen of the peptides included in the analysis were further assessed in 13 culture strains. The sequence diversity of 5 of these 14 peptides was also investigated in parasite populations from endemic countries.

Ethics Statement
Research clearance for blood sampling and genetic analysis of parasites was granted by the Tanzanian Commission for Science and Technology and by the Medical Research and Advisory Committee of the Ministry of Health in Papua New Guinea.

Parasite culture
The culture strains were grown in 10 cm Petri dishes and cultured by standard methods in an atmosphere of 93% N2, 4% CO2, 3% O2 at 37 °C as described previously [20]. The culture medium was RPMI 1640 10.44 g/L, supplemented with Hepes 5.94 g/L, Albumax II 5 g/L, hypoxanthine 50 mg/L, sodium bicarbonate 2.1 g/L and neomycin 100 mg/L.

Polymorphism study in culture strains
Genetic diversity of 14 peptides spanning the α-helical coiled coil region of 10 hypothetical proteins was assessed in 13 in vitro culture strains (3D7, W2mef, HB3, ITG2F6, IFA18, FVO, 7G8, K1, RO33, MAD20, FCR3, RFCR3 and FC27). The geographical origin of each strain is listed in Table 1. Genomic DNA was isolated with phenol/chloroform extraction. PCR primers used to amplify the α-helical coiled coil region are listed in Table S1. For the field samples, 63 blood samples were derived from 1-5 year old children from Ifakara, Tanzania, with uncomplicated acute malaria. These samples were collected in the course of an antimalarial drug trial [21]. A further 19 samples were asymptomatic community samples from Mugil village, Papua New Guinea [22]. Genomic DNA was isolated with phenol/chloroform extraction or the QIAamp DNA Blood Mini Kit 250 (Qiagen). PCR conditions consisted of denaturation at 94 °C for 5 min followed by 35 cycles of denaturation (94 °C for 1 min), annealing (50 °C for 1 min) and extension (72 °C for 1 min). The reaction products were then incubated at 72 °C for 10 min to ensure complete DNA extension. The PCR products were directly sequenced and aligned using Auto Assembler software to screen for SNPs and LP within the sequences corresponding to the peptides.
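As a quick check of the cycling program described above, the hold times can be summed in a few lines. This is only an illustrative sketch: the constant and function names are hypothetical, and ramp times between temperature steps, which depend on the thermocycler, are ignored.

```python
# Hold times taken from the PCR conditions stated in the text (minutes).
INITIAL_DENATURATION_MIN = 5   # 94 °C
CYCLES = 35
CYCLE_STEPS_MIN = {"denaturation": 1, "annealing": 1, "extension": 1}
FINAL_EXTENSION_MIN = 10       # 72 °C


def total_hold_time_min():
    """Sum of programmed hold times, excluding instrument ramping."""
    per_cycle = sum(CYCLE_STEPS_MIN.values())
    return INITIAL_DENATURATION_MIN + CYCLES * per_cycle + FINAL_EXTENSION_MIN


print(total_hold_time_min())
```

Under these assumptions the program amounts to two hours of programmed hold time per run.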

RNA isolation and cDNA synthesis
Due to large introns in the genomic DNA corresponding to P38 and P77, cDNA was produced for sequencing of the 13 culture strains (3D7, W2mef, HB3, ITG2F6, IFA18, FVO, 7G8, K1, RO33, MAD20, FCR3, RFCR3 and FC27). 10 ml of parasite culture of 5% mixed stage parasites were lysed in 3 ml of Trizol (Invitrogen) and RNA was extracted with 0.2 volumes of chloroform and precipitated with 0.8 volumes of isopropanol. The extraction was repeated in half of the original Trizol volume to reduce contamination with gDNA. Residual gDNA was digested twice with RQ1 DNase (Promega) according to the manufacturer's protocol in a total volume of 50 ml. RNA was dissolved in 25 ml of 5 mM Tris/0.5 mM EDTA and 9.5 ml RNA was used for the reverse transcription by AffinityScript Multiple Temperature Reverse Transcriptase (Stratagene) with random primers (Invitrogen) as described by the manufacturer. P38 and P77 sequences were amplified from cDNA with Advantage cDNA polymerase (BD Biosciences) using the primers listed in Table S1.
To test for gDNA contamination, the corresponding peptide sequences were amplified with Advantage Taq from RNA processed simultaneously without the addition of reverse transcriptase.

Identification of α-helical coiled coil motif
For the original genome-wide selection of coiled coil domains, we generated a 25-residue-long α-helical coiled coil profile [13] using the pftools package [23]. This profile was constructed from a multiple alignment of amino acid sequences corresponding to the known α-helical coiled coil domains found in the Protein Data Bank (PDB) [24], release 2006. In this work, the profile was updated by adding new sequences of known coiled coil structures from PDB (release 2008). The score of this profile reflects the level of similarity of an analyzed amino acid sequence to the typical coiled coil motif deduced from the alignment of the known coiled coils. Tests of this profile against a sequence database of proteins with known 3D structures showed that (1) scores above 3.0 corresponded exclusively to coiled coil structures; (2) some coiled coil structures may have scores above 2.1. The 2.1 cut-off level was chosen for the first stage of the identification procedure to include most of the putative coiled coils. Subsequently, the selected coiled coil regions were tested manually for the presence of the characteristic heptad repeats. Although all putative α-helical coiled coil domains identified in the P. falciparum genome share the heptad repeat sequence motif, they can be distinguished by the fidelity of the heptad repetitions. We subdivided the analyzed coiled coil regions into two groups. The first group contains peptides with perfect or, in the case of one or more SNPs, almost perfect tandem repetition of a certain sequence motif. The length of the perfect repeat either coincides with the length of the 7-residue coiled coil repeat or is divisible by 7. The second group of imperfect repeats is characterized by the repetition of amino acid residues with similar physico-chemical properties rather than by repeat units of exactly the same amino acid residues.
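The two-stage use of the profile score and the definition of a perfect repeat can be sketched as follows. This is a simplified illustration, not the pftools procedure itself: the profile score is taken as a given number, the function names are hypothetical, and the handling of scores exactly equal to a cut-off is an assumption.

```python
STRICT_CUTOFF = 3.0    # scores above this corresponded exclusively to coiled coils
LENIENT_CUTOFF = 2.1   # first-stage cut-off, followed by manual heptad inspection


def classify_score(score):
    """Sort a profile score into the tiers described in the text."""
    if score > STRICT_CUTOFF:
        return "coiled coil"
    if score >= LENIENT_CUTOFF:   # boundary handling is an assumption
        return "putative, inspect heptads manually"
    return "rejected"


def shortest_exact_period(seq):
    """Smallest p with seq[i] == seq[i + p] for every valid i, else None."""
    for p in range(1, len(seq) // 2 + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return None


def is_perfect_heptad_repeat(seq):
    """True if the sequence is an exact tandem repeat whose unit length is 7 or a multiple of 7."""
    period = shortest_exact_period(seq)
    return period is not None and period % 7 == 0
```

For example, three copies of the heptamer DMNIKEN form a perfect repeat under this definition, whereas a single substitution in one copy breaks exactness and would place the peptide in the "almost perfect" category, which this sketch does not attempt to detect.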
Both types of repeats contain hydrophobic residues at positions a and d of the heptads and polar residues in the remaining positions. In this work, we assessed the extent of polymorphism in the identified α-helical coiled coil domains and examined the polymorphism in perfect or almost perfect repeats as opposed to that in imperfect repeats.
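The manual heptad check described above can be approximated by labelling residues with positions a to g and asking how often positions a and d are hydrophobic. The sketch below is an illustration under stated assumptions: the hydrophobic residue set is one common choice rather than the authors' definition, the function names are hypothetical, and the test sequence is invented.

```python
HEPTAD = "abcdefg"
HYDROPHOBIC = set("AVLIMFWC")   # assumption: one common hydrophobic residue set


def heptad_labels(length, start="a"):
    """Heptad position of each residue when residue 0 sits at `start`."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(offset + i) % 7] for i in range(length)]


def core_hydrophobic_fraction(seq, start="a"):
    """Fraction of a/d positions occupied by hydrophobic residues."""
    labels = heptad_labels(len(seq), start)
    core = [res for res, pos in zip(seq, labels) if pos in "ad"]
    if not core:
        return 0.0
    return sum(res in HYDROPHOBIC for res in core) / len(core)


def best_register(seq):
    """Starting position that maximizes hydrophobic occupancy of a and d."""
    return max(HEPTAD, key=lambda s: core_hydrophobic_fraction(seq, s))


example = "LKELEDK" * 3   # hypothetical idealized heptad, not one of the 166 peptides
print(best_register(example), core_hydrophobic_fraction(example, "a"))
```

Scanning all seven registers is needed because a peptide excised from a protein does not necessarily begin at position a.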

Results
The extensive genetic diversity of blood stage antigens is one of the key challenges in vaccine development against malaria. After the selection of 166 novel blood stage vaccine candidates, all harbouring an α-helical coiled coil motif, we undertook a comprehensive in silico analysis of these domains. In addition, we performed an in-depth molecular epidemiological analysis on selected peptides that proved to be the most promising vaccine candidates according to an immunological evaluation process [13].
SNP data for a maximum of 15 P. falciparum culture strains are currently available in the PlasmoDB 5.4 database (http://PlasmoDB.org), the official database of the P. falciparum genome sequencing consortium. Unfortunately, PlasmoDB 5.4 does not incorporate information on insertions or deletions, although insertions and deletions are thought to provide as much diversity as SNPs in P. falciparum [25]. In order to determine the full extent of diversity in recently identified vaccine candidates [13], we have analyzed the polymorphism of the selected α-helical coiled coil regions in 13 different culture strains and in cross-sectional field samples from malaria endemic regions. Direct sequencing made it possible to detect both SNPs and LP.

Genetic diversity in in vitro culture strains of P. falciparum
Table 1 lists the 13 strains that were analyzed for SNPs and LP by PCR and direct sequencing of the corresponding 14 most promising peptides selected from the preclinical evaluations. Primers used for PCR amplification and sequencing are listed in Table S1. Polymorphism results are presented in Table 2. In addition, we have included all information on polymorphism that is publicly available at PlasmoDB 5.4. Peptides P1, P14 and P83 showed either LP alone (P1, Figure 1) or both types of polymorphism, SNP plus LP (P14, Figure 2 and P83, Figure 3). Peptides showing LP contained tandem repeats, and their alleles differed from each other by 1 to 3 heptad repeat units. Therefore, these mutations do not introduce a frame shift into the coiled coil motif. For example, P83, corresponding to the α-helical coiled coil domain of the gene product of PFC0345w, shows a duplication of the heptamer DMNIKEN between amino acids N276 and D277, which was detected in 4 culture strains, a frequency of 0.31 (4/13) (Table 2).
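The statement that insertions of whole heptad units do not shift the coiled coil register can be illustrated with a short sketch. The flanking sequence, the choice of D as position a, and all function names are hypothetical; only the heptamer DMNIKEN and the count 4/13 come from the text.

```python
HEPTAD = "abcdefg"


def heptad_labels(length, start="a"):
    """Heptad position of each residue when residue 0 sits at `start`."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(offset + i) % 7] for i in range(length)]


def insert_unit(seq, at, unit):
    """Insert `unit` before residue index `at`."""
    return seq[:at] + unit + seq[at:]


def register_preserved(seq, at, unit, start="a"):
    """True if residues downstream of the insertion keep their heptad labels."""
    before = heptad_labels(len(seq), start)[at:]
    after = heptad_labels(len(seq) + len(unit), start)[at + len(unit):]
    return before == after


def allele_frequency(carriers, total):
    """Frequency rounded to two decimals, as reported in the text."""
    return round(carriers / total, 2)


reference = "DMNIKEN" * 2   # hypothetical stretch containing the repeated heptamer
print(register_preserved(reference, 7, "DMNIKEN"))   # whole heptad inserted
print(register_preserved(reference, 7, "DMN"))       # partial unit inserted
print(allele_frequency(4, 13))
```

Any insertion whose length is a multiple of 7 leaves the downstream a and d positions in place, which is why length variants of 1 to 3 heptads are compatible with the coiled coil motif.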
Some SNPs were prevalent in our culture strains. For P83, the T→A SNP at nt 863 occurred with a frequency of 0.38 (5/13). This SNP was also described in PlasmoDB 5.4, where it was observed with a frequency of 0.43 (6/14). Two additional SNPs were found for P83. A G→A substitution at nt position 927 resulted in a non-synonymous change from methionine (M) to isoleucine (I) at amino acid 309. This SNP was only found once and only in our culture strains. An additional SNP at nt position 999, arising from an A→C conversion, results in an amino acid change from glutamic acid (E) to aspartic acid (D), and was also found only once. These SNPs were also reported in PlasmoDB. With respect to the underlying heptad repeat positions labeled a to g, all SNPs detected in P83 were at the hydrophobic positions a and d; however, these SNPs do not affect the coiled coil formation significantly, and the antibody epitopes on the surface of the coiled coil likely remain conserved. It is worth mentioning that, in principle, one point mutation can change the oligomerization state of the coiled coil. However, the general rules governing the stoichiometry of coiled coil structures are still largely unknown. Because most of the observed non-synonymous SNPs represent changes to residues with similar physico-chemical properties (e.g. hydrophobic to hydrophobic), we assume that the oligomerization states of the coiled coils also remain conserved. One SNP of P83 leads to a change of a hydrophobic methionine (M288) to a charged lysine (K) in the d-position of the heptamer unit underlying coiled coil structural motifs (Table 2). The charged residue in the d-position favors a coiled coil dimer. However, we do not know the
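The nucleotide positions and amino acid changes reported for P83 can be cross-checked with simple codon arithmetic. The sketch below rests on assumptions that the text does not state explicitly: nucleotide positions are taken as 1-based coordinates in the coding sequence, the reference codons are inferred (ATG is the only methionine codon, and GAA is the glutamate codon for which a third-position A→C change yields aspartate), and the pairing of nt 863 with the M288 to K change is inferred from the arithmetic. The function names are hypothetical.

```python
# Minimal codon table: only the codons needed for this example.
CODON_TABLE = {"ATG": "M", "AAG": "K", "ATA": "I", "GAA": "E", "GAC": "D"}


def locate(nt_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[:pos_in_codon - 1] + new_base + codon[pos_in_codon:]


# (nt position, inferred reference codon, new base)
SNPS = [(863, "ATG", "A"), (927, "ATG", "A"), (999, "GAA", "C")]

for nt_pos, ref_codon, new_base in SNPS:
    codon_no, pos = locate(nt_pos)
    alt_codon = substitute(ref_codon, pos, new_base)
    print(nt_pos, codon_no, CODON_TABLE[ref_codon], "->", CODON_TABLE[alt_codon])
```

Under these assumptions, nt 863 falls in codon 288 and nt 927 in codon 309, in agreement with the residue numbers M288 and amino acid 309 given in the text.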

Diversity in parasite populations
Five vaccine candidates, prioritized according to their performance in immunological and functional assays, were further analyzed in a small-scale molecular epidemiological survey. The basis for selection and results of the preclinical evaluation process were published previously [13,14]. The five candidates were peptides P8, P27, P77, P83 and P90. In addition to showing promising results, all were conserved or showed limited polymorphism in culture strains (Table 2). The extent of sequence conservation was determined in field samples from two malaria endemic areas: Tanzania (TZ) and Papua New Guinea (PNG) (Table 3). P27 and P77 were found to be completely conserved. P90 is also conserved on the amino acid level in 23 samples from TZ and 31 samples from PNG, with only a synonymous SNP (A→G at nt 195) in samples from TZ with a frequency of 0.17 (4/23). This SNP was also reported in PlasmoDB 5.4 to occur in the GHANA1 strain (Table 2). The P8 sequence was conserved in culture strains, but in the field samples 2 SNPs were detected. 4 SNPs (SNP1-4) and 2 LP (LPT2, LPT26) were detected for P83 in both populations examined (Figure 4). Thus, in field samples 3/5 peptides showed complete conservation on the amino acid level and only minor polymorphism was observed for the remaining 2 candidates. This is in line with sequence diversity detected in

Effects of polymorphism on the probability of α-helical coiled coil structure formation
A comprehensive in silico analysis was performed on 166 selected coiled coil sequences to determine the effects of SNP and LP on the probability to form an α-helical coiled coil. Our approach allowed the prediction of structure modifications caused by the known SNPs within our peptides, which are recorded in the SNP database in PlasmoDB. Overall, we detected a high degree of sequence conservation in 166 predicted α-helical coiled coil domains. Only 29/166 peptides showed limited polymorphism.
In one of the 29 polymorphic peptides the score fell below the cut-off (altered score bolded in Table 4). In contrast to the above result on P83 that had shown SNPs exclusively at hydrophobic residues, the majority of the SNPs in the 166 peptides were found at the surface positions b, c, e, f and g within the heptad repeat and are unlikely to destabilize the coiled coil structures.
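The distinction between SNPs at core positions (a and d) and at surface positions (b, c, e, f and g) can be tallied once a heptad register has been assigned to a peptide. The sketch below is illustrative only: it assumes the register is already known, uses 0-based residue indices within the peptide, and the function names and example indices are hypothetical.

```python
from collections import Counter

HEPTAD = "abcdefg"
CORE_POSITIONS = {"a", "d"}


def heptad_position(residue_index, register_start="a"):
    """Heptad position of a 0-based residue index, given the position of residue 0."""
    return HEPTAD[(HEPTAD.index(register_start) + residue_index) % 7]


def site_class(residue_index, register_start="a"):
    """Classify a substituted residue as buried core or exposed surface."""
    position = heptad_position(residue_index, register_start)
    return "core" if position in CORE_POSITIONS else "surface"


def tally(snp_indices, register_start="a"):
    """Count core and surface substitutions for one peptide."""
    return Counter(site_class(i, register_start) for i in snp_indices)


print(tally([1, 4, 10, 12]))   # hypothetical SNP positions in one peptide
```

A peptide whose substitutions all fall in the "surface" class keeps its hydrophobic a/d pattern intact, which is the situation the text describes for most of the polymorphic peptides.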
For the wild type forms of peptides P17 and P23 (in culture strain 3D7; Table 4), the P17 mutant (in culture strain GHANA1), and the P23 mutant (in culture strains 7G8, FCB, K1 and GHANA1; Table 4), the synthesized peptide was too short to be analyzed by the 25-residue-long profile. These short peptides were therefore analyzed manually, and it was shown that the mutations do not affect the heptad pattern and therefore do not prevent α-helical coiled coil formation.

Discussion
One of the hurdles in vaccine development against erythrocytic stages of the parasite is the extensive polymorphism observed in blood stage antigens. A function of polymorphic epitopes may be to divert the effective response. In natural and artificially induced humoral responses, the polymorphic regions of antigens were found to be immunodominant [2,10,26], but it is not known whether polymorphic regions are better or worse than conserved regions as targets of protective immunity.
The fact that polymorphism is maintained in populations leads to the question of whether this is due to immune selection through allele-specific protective responses. For several polymorphic antigens immune selection has been confirmed [10,27]. Similarly, SNPs were demonstrated to be under balancing selection, and the frequency of SNPs as a signature of selection was used to identify new vaccine targets in known antigens [27] or in the entire Plasmodium falciparum genome [3].
Conserved regions were found to be less antigenic and immunogenic than polymorphic regions [6,10,26,28]. To investigate whether conserved regions can elicit adequate protection, the effect of antibody responses to both the conserved and polymorphic regions of MSP3 was measured [10]. Antibodies against both regions were associated with a reduced risk of developing clinical malaria [10]. Moreover, antibodies against the conserved epitopes elicited in humans inhibited parasite growth in vitro, as shown by the antibody-dependent cellular inhibition assay [29], and led to rapid parasite clearance after injection into humanized mice [29,30].
During preclinical evaluation of new vaccine candidates, both antigenicity and sequence conservation are generally determined. We have shown that the majority of our candidates were both conserved and antigenic. In sero-epidemiological surveys, most peptides were found to be recognized by sera of adults from malaria endemic countries [13] (own unpublished results). Immunogenicity of most of the peptides investigated in more detail was confirmed in mice or rabbits [13] (Olugbile, unpublished results). It remains to be shown whether our described strategy to select non-polymorphic epitopes for inclusion in a vaccine will lead to greater efficacy in a field trial.
We analyzed the degree of conservation in predicted α-helical coiled coil regions of all proteins expressed in the blood stages of the parasite. Sequencing revealed that SNPs observed in field samples did not seem to disturb the heptad motif and thus do not destabilize the coiled coil structure. SNPs mostly occurred at hydrophilic surface positions of the coil except for P83. It is likely that SNPs located at surface positions result in a decreased antibody response to the variant epitope and may lead to immune evasion. However, the extent of polymorphism detected in our candidates was very limited and thus might not create a major limitation for vaccine efficiency.
Our sequence analysis revealed that both SNPs and LP were preferentially observed in the α-helical coiled coil motifs containing almost perfect tandem repeats. The perfect repeat units either coincide with the 7-residue coiled coil repeat (e.g. P14, P45, P50, P51, P64, P81, P83, P144, P166) or cover two or more heptad repeats (e.g. P1, P94). This correlation between repeat perfection and polymorphism leads to a practical recommendation for selection of vaccine candidates: when searching for α-helical coiled coil regions with a reduced level of polymorphism, one should avoid regions with almost perfect tandem repeats.
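The recommendation to avoid almost perfect tandem repeats could be turned into a simple screening step. The following is a rough sketch rather than the procedure used in this study: the identity measure, the 0.8 threshold, the periods tested and the function names are all assumptions, and the imperfect example sequence is invented.

```python
def repeat_identity(seq, period=7):
    """Fraction of positions i where seq[i] equals seq[i + period]."""
    pairs = len(seq) - period
    if pairs <= 0:
        return 0.0
    matches = sum(seq[i] == seq[i + period] for i in range(pairs))
    return matches / pairs


def flag_tandem_repeat(seq, periods=(7, 14), threshold=0.8):
    """True if the sequence looks like a perfect or almost perfect tandem repeat."""
    return any(repeat_identity(seq, p) >= threshold for p in periods)


perfect = "DMNIKEN" * 3                 # heptamer reported for P83, repeated
imperfect = "LKELEDKIRALEEKVKSLENE"     # hypothetical heptad pattern without exact repeats
print(flag_tandem_repeat(perfect), flag_tandem_repeat(imperfect))
```

Candidates that are flagged would be deprioritized, while imperfect repeats that retain the hydrophobic a/d pattern without exact residue repetition would pass the screen. Short peptides give few comparison pairs at longer periods, so the threshold would need tuning on real data.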
α-Helical coiled coil domains were found to be crucial for the biological function of various proteins. Coiled coils have been shown to be involved in oligomerization, protein-protein interaction and complex formation. These features support many cellular processes such as membrane fusion, protein transport and cell motility [17]. For the proteins investigated here, however, it is not known whether the putative coiled coil domains are of any functional importance. If these domains play a role in protein function, purifying selection might counteract diversification and polymorphism. Sequence conservation due to functional constraints was reported for viral envelope proteins, where the conserved region was found at those positions of the coiled coil that are responsible for protein-protein interaction [31].
Extensive polymorphism has been an issue for the most promising blood stage vaccine candidates, such as apical membrane antigen 1, MSP1 and MSP2 [6,32-35]. In the past, vaccine research was focused on a limited number of vaccine candidates. Due to recent disappointing results from clinical trials, where a number of vaccine candidates were not found to be immunogenic, safe, or protective against artificial challenge [36-38], new emphasis is laid upon the discovery of novel target antigens. If more of the current candidates fail, additional antigens are required to supplement the vaccine pipeline. There is a great demand to identify new antigens that are both immunogenic and conserved. It remains to be shown whether the strategy to include non-polymorphic antigens in a vaccine formulation will increase protection.