A Novel Candidate Vaccine for Cytauxzoonosis Inferred from Comparative Apicomplexan Genomics

Cytauxzoonosis is an emerging infectious disease of domestic cats (Felis catus) caused by the apicomplexan protozoan parasite Cytauxzoon felis. The growing epidemic, with its high morbidity and mortality, points to the need for a protective vaccine against cytauxzoonosis. Unfortunately, the causative agent has yet to be cultured continuously in vitro, rendering traditional vaccine development approaches beyond reach. Here we report the use of comparative genomics to computationally and experimentally interpret the C. felis genome to identify a novel candidate vaccine antigen for cytauxzoonosis. As a starting point we sequenced, assembled, and annotated the C. felis genome and the proteins it encodes. Whole genome alignment revealed considerable conserved synteny with other apicomplexans. In particular, alignments with the bovine parasite Theileria parva revealed that a C. felis gene, cf76, is syntenic to p67 (the leading vaccine candidate for bovine theileriosis), despite a lack of significant sequence similarity. Recombinant subdomains of cf76 were challenged with survivor-cat antiserum and found to be highly seroreactive. Comparison of eleven geographically diverse samples from the south-central and southeastern USA demonstrated 91–100% amino acid sequence identity across cf76, including a high level of conservation in an immunogenic 226 amino acid (24 kDa) carboxyl terminal domain. Using in situ hybridization, transcription of cf76 was documented in the schizogenous stage of parasite replication, the life stage that is believed to be the most important for development of a protective immune response. Collectively, these data point to identification of the first potential vaccine candidate antigen for cytauxzoonosis. Further, our bioinformatic approach emphasizes the use of comparative genomics as an accelerated path to developing vaccines against experimentally intractable pathogens.


Introduction
Cytauxzoon felis is a protozoan parasite of felids that causes cytauxzoonosis, an emerging disease in domestic cats. Without treatment, nearly all cats die within three to five days of the onset of clinical symptoms. There are currently no effective means to prevent cytauxzoonosis, and even with treatment costing thousands of dollars, up to 40% of cats still succumb [1,2]. First described in Missouri in 1976, C. felis has an expanding geographic range and has now been diagnosed in domestic cats in one-third of US states (Figure 1) [1,3,4,5,6,7,8,9,10,11]. Expansion of the geographic range is presumed to be due to changes in climate, urbanization, and increased exposure to the bobcat (Lynx rufus) reservoir host and the tick vector (Amblyomma americanum). Bobcats experience a transient schizogenous tissue phase of limited pathogenicity followed by chronic erythroparasitemia. In domestic cats, the disease is usually characterized by a lethal acute schizogenous tissue phase. For animals that survive, a fairly innocuous chronic erythroparasitemia ensues (Figure 2). The high mortality, growing epidemic and cost of care point to vaccination as the most practical control strategy. Prior studies documenting the development of a protective immune response against C. felis imply that vaccine development is feasible [12,13,14,15]. However, the inability to culture C. felis in vitro has been a major barrier to discovery of protective antigens [14], and no vaccines against C. felis exist. In order to overcome experimental limitations and facilitate the rapid identification of vaccine candidate antigens, we sequenced the entire 9.1 Mbp C. felis genome and identified approximately 4,300 protein-coding genes, each of which represents a potential protective antigen.
We used leading vaccine candidates from other apicomplexans as a guide to search for orthologues within the C. felis gene complement. Cytauxzoon felis is closely related to the apicomplexans Theileria parva and Theileria annulata, the etiologic agents of East Coast fever (ECF) and tropical theileriosis in cattle, respectively [16,17,18]. The leading vaccine candidate for T. parva, p67, has conferred substantial protection against ECF in clinical trials. Immunization of cattle with p67 reduced the incidence of severe ECF by 49% during field tick challenge trials in Kenya [19]. The T. annulata homologue of p67, SPAG-1, includes neutralizing epitopes on the carboxy terminus that are cross-reactive with p67, and SPAG-1 has been shown to confer protection to homologous species challenge [20,21]. The functions of p67 and SPAG-1 have not been definitively identified although they are proposed to be involved in host cell recognition and invasion [20,22].
Although T. parva p67 shares only 47% amino acid sequence identity with SPAG-1 [23], these two loci reside within a syntenic block of genes highly conserved between the two Theileria species, consistent with their orthology. We searched for C. felis orthologues of p67 and SPAG-1 but found no sequences with significant amino acid similarity. Therefore, guided by the approach used to identify the p67/SPAG-1 orthologue in Babesia bovis (BOV57), we used conserved genome synteny to expose the C. felis orthologue of p67/SPAG-1, which we call cf76 [24]. Here we report our assessment of three criteria likely to be important in determining suitability of cf76 as a vaccine candidate: 1) recognition by the feline immune system, 2) degree of sequence similarity among C. felis isolates, and 3) expression in the C. felis life stage that is believed to be critical for the development of a protective immune response.

Materials and Methods
Sequence and assembly of the Cytauxzoon felis genome and comparison of the C. felis genome with related apicomplexans
Whole blood (80 ml) was collected by sterile methods into citrate phosphate dextrose adenine (CPDA-1) anticoagulant immediately post-mortem from a domestic cat that died of acute C. felis infection. Acute infection was confirmed by microscopic observation of numerous C. felis schizonts in tissue imprints of liver, lung, and spleen. The blood was leuko-reduced to remove host nucleic acid contamination and isolate merozoite-infected erythrocytes using a Purecell NEO Neonatal High Efficiency Leukocyte Reduction Filter for Red Cell Aliquots (PALL Corp., Port Washington, NY). Cytauxzoon felis genomic DNA was purified from leuko-reduced blood using the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA).
We sequenced the C. felis genome using a 454 Genome Sequencer FLX (Roche, Indianapolis, IN) with Titanium chemistry and the standard Roche protocol. The sequence was assembled using Newbler 2.0 with a minimum overlap requirement of 90% identity over 30 bases. Resulting contigs were compared to the Felis catus genome and contaminating cat sequences (<2% of total reads) were removed.
Cytauxzoon felis tRNA and mRNA were isolated from purified merozoites. Whole blood (80 ml) was collected and leuko-reduced as described above. Following leuko-reduction, erythrocytes were lysed and tRNA and mRNA were purified using the Ribopure Blood Kit and PolyAPurist Mag Kit respectively (Ambion, Grand Island, NY) [25]. A cDNA library was constructed using the SMARTer PCR cDNA Synthesis Kit (Clontech, Mountain View, CA) and generation of expressed sequence tags (ESTs) was completed using a 454 Genome Sequencer FLX (Roche, Indianapolis, IN) with Titanium chemistry and the standard Roche protocol. ESTs were assembled with Newbler and the EST assembly was aligned to the genome using GeneDetective (Time Logic, Carlsbad, CA).
GeneMark-ES 2.5 (http://exon.gatech.edu/genemark_prok_gms_plus.cgi), which utilizes a Gibbs sampling algorithm to self-train for gene prediction, was deployed to create an initial computationally derived proteome. A combination of hand-curated EST data and GeneMark results was used to create a training set for GlimmerHMM (http://www.cbcb.umd.edu/software/glimmer) to provide a second predicted proteome. Results from the EST comparisons, GlimmerHMM and GeneMark, as well as sequence similarity searches against protein data from B. bovis, T. parva, P. falciparum and NCBI's non-redundant protein dataset, were integrated into a generic Genome Browser (GBrowse). The C. felis genome and EST sequences are deposited with NCBI under BioProject PRJNA196611.
Predicted protein-coding genes shared among the different organisms were determined by an all-against-all blastp search. Hits with at least 60% similarity across at least 30% of the query protein were accepted as matches. Proteins shared between multiple species were calculated based on the intersection of protein identifiers in pair-wise comparisons. No attempts were made to collapse paralogues.

Identification and amplification of C. felis cf76
A C. felis gene, cf76 (GenBank Accession KC986871), syntenic to p67/SPAG-1/BOV57 was identified in silico. p67, SPAG-1, and BOV57 are each located downstream of the same three highly conserved genes and upstream of the same two highly conserved genes (Figure 3). A BLAST search was used to identify the same highly conserved genes upstream and downstream of cf76. Amino acid and nucleotide alignments of cf76, p67, SPAG-1, and BOV57 were performed using ClustalW [26]. Total RNA was extracted from C. felis merozoite and schizont-laden splenic tissue collected immediately post-mortem from a domestic cat that died of acute C. felis infection using the Trizol LS reagent (Sigma, St. Louis, MO), following the manufacturer's methods. Total RNA (10 µg/reaction) was treated twice with DNA-free DNase Treatment and Removal Reagent (Ambion, Grand Island, NY). Prior to generation of cDNA, the absence of contaminating DNA in the purified RNA was confirmed by PCR for C. felis 18S rRNA genes [3]. Cytauxzoon felis cDNA was produced using random hexamer primers (Promega, Madison, WI) and Smartscribe reverse transcriptase (Clontech, Mountain View, CA). PCR to amplify the C. felis syntenic gene ORF with primers designed from the predicted flanking sequences was performed using previously published conditions with 25 pmol of each primer (5′ ATTGGATAGTAAATTAGGTTATAAG 3′ and 5′ GGAATTAATTCAGTTGGAATTTG 3′) and template (50 ng of C. felis splenic RNA, 1 µl of C. felis splenic cDNA, 16 ng of C. felis gDNA, or 1 µl of water) [3]. PCR products were analyzed by agarose gel electrophoresis. Potential signal sequences and transmembrane domains were predicted using SignalP Server v.4.0 and TMHMM Server v.2.0 from the Center for Biological Sequence Analysis. The GPI-anchor predictor PredGPI was employed to predict GPI-anchored protein sequence.

Cloning and in vitro expression of cf76 and cf76 fragments
The cf76 ORF (2172 bp) and three overlapping subdomains of cf76 including the N-terminal region (720 bp), the central region (828 bp), and the C-terminal region (675 bp) were amplified from C. felis cDNA using primer pairs (Table 1) with a 20 bp adapter sequence at the 5′ and 3′ ends homologous to cloning sites of a linearized acceptor vector pXT7, to allow for directional cloning. PCR was performed with previously published conditions using 0.05 U/µl High Fidelity Expand Plus Taq DNA polymerase (Roche, Indianapolis, IN), 25 pmol of each primer (Table 1), and 5 ng of C. felis cDNA template [27,28,29].
Each amplified cf76 PCR product was cloned into a pXT7 vector containing an N-terminus 10× histidine (HIS) tag and a C-terminus hemagglutinin (HA) tag using homologous recombination as previously described [27,28,29] and all clones were sequenced bi-directionally. In vitro transcription and translation reactions (IVTT) were performed with purified recombinant plasmids using the RTS 100 E. coli HY kit (5 PRIME, Gaithersburg, MD).

Purification of cf76 and cf76 subdomains
IVTT reaction components containing the entire cf76 protein and cf76 subdomains were purified using the N-terminal HIS tag under native and denaturing conditions with Qiagen Ni-NTA Magnetic Agarose Beads (Qiagen, Valencia, CA). Purity and quantity were assessed by western blot analysis in duplicate using secondary antibodies against the N-terminal HIS tag and the C-terminal HA tag (mouse anti-poly-HIS monoclonal IgG2a antibody, Anti-His6 (2), and mouse anti-poly-HA monoclonal antibody, clone 12CA5, respectively; Roche, Indianapolis, IN) as described below.
The immune response to purified cf76 and cf76 subdomains was assessed by western blotting using pooled sera from 10 domestic cats that survived natural C. felis infection, as well as 10 naïve cats. To determine C. felis infection status, genomic DNA (gDNA) was purified using the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA) and real-time PCR for C. felis 18S and for the feline house-keeping gene GAPDH was performed using previously published methods [3,30]. Western blots of purified protein were prepared and blocked as described with the addition of 5% goat serum. Cat sera were diluted 1:500 in blocking buffer containing 1.5 mg/ml E. coli lysate (MCLAB, South San Francisco, CA) and incubated for 2 h at room temperature to adsorb E. coli binding proteins in feline sera that may contribute to background. Membranes were incubated with pre-adsorbed cat sera for 2 h at room temperature, washed 5 × 5 min in PBST,
For tree construction, DNA sequences for the 11 cf76 sequences were aligned using ClustalX [26]. Phylogenetic reconstruction was performed via MrBayes [31] using a GTR model with gamma-distributed rate variation across sites. A total of 100,000 generations were run on 2 chains with a sample frequency of 100. A burn-in of 100 data points was more than sufficient to reach convergence.
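As an illustration of how sequence conservation can be summarized once a set of sequences has been aligned, the following minimal Python sketch computes pairwise percent identity over an alignment and reports the range across all pairs. It is not the procedure used in this study: the function names, the toy sequences, and the convention of skipping any column in which either sequence has a gap are assumptions made for the example, and other conventions for handling gaps exist.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption for this sketch: columns where either sequence has a gap
    are excluded from both the numerator and the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_range(aligned):
    """Return (lowest, highest) pairwise percent identity in an alignment."""
    values = [
        percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    ]
    return min(values), max(values)


# Toy aligned sequences (hypothetical; not cf76 data).
toy_alignment = {
    "sample_1": "ACGT-ACGTA",
    "sample_2": "ACGTTACGTA",
    "sample_3": "ACGTTACCTA",
}
low, high = identity_range(toy_alignment)
print(f"{low:.1f} to {high:.1f}% identity")
```

The same functions work on amino acid alignments, since they only compare characters column by column.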

Transcription of cf76 in schizonts
C. felis infected lung tissues were harvested and formalin fixed immediately post-mortem from a cat that died of acute cytauxzoonosis. Hematoxylin and eosin (H&E) stained sections were examined for the presence of schizonts. The C-terminal region (678 bp) of cf76 was amplified by PCR, cloned into the pGEM-T Easy vector (Promega, Madison, WI) and sequenced bidirectionally. A negative sense digoxigenin-labeled riboprobe was generated and in situ hybridization was performed as previously described on infected lung tissue, including the use of a nonspecific (avian viral pathogen derived sequence) digoxigenin-labeled negative sense probe [32].

Ethics Statement
This study was conducted in strict accordance with the recommendations of North Carolina State University Institutional Animal Care and Use Committee (NCSU IACUC) approved protocol 09-067-O. The samples used for this publication were either scheduled to be discarded from a previous study (collected with owner consent under NCSU IACUC approved protocol 09-067-O) or were diagnostic samples that were scheduled to be discarded.

Results and Discussion
Sequence and assembly of the Cytauxzoon felis genome and comparison of the C. felis genome with related apicomplexans
The C. felis sequence assembled into 361 contigs spanning 9.1 megabases (Mb) of genomic DNA after removal of contaminating F. catus sequence (Table 2). The largest contiguous stretch of genomic sequence was 183 kb, with an N50 of greater than 70 kb. These genomic data were used to establish an initial computationally predicted proteome of 4,323 genes using the self-training program GeneMark.hmm-ES (v2.5). The C. felis EST data assembled to 962 contigs covering 547 kb of gene space (Table 2). These contigs were used in a BLASTX search against the NCBI non-redundant database to identify contigs that likely represent close to full-length genes. A GeneDetective search of the ESTs against the genomic data provided information about gene structure (Time Logic, Carlsbad, CA).
A set of 100 randomly selected GeneMark predictions and 57 hand-curated full-length ESTs were then used as a training set for GlimmerHMM (v.3.02). Based on that training set, GlimmerHMM predicted 4,378 genes (Table 3). Although there was some slight variation between the two computationally derived gene sets (Table 3), approximately 25% of the genes are identical between the two, and a further 50% differ only in ascribing the 5′-most or 3′-most exons; such discrepancies were typically straightforward to resolve with manual curation.
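For readers unfamiliar with the N50 statistic quoted for the assembly, the short Python sketch below shows how it is conventionally computed from a list of contig lengths: contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of the assembled bases. The function name and the toy lengths are illustrative assumptions, not values from the C. felis assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    The N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy contig lengths in bp (hypothetical; not the C. felis contigs).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # 80: 100 + 80 = 180 bp reaches half of 300 bp
```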
In comparing the C. felis genome (GenBank Accession PRJNA196611) to the genomes of three related apicomplexans, T. parva (GenBank Accession PRJNA16136), B. bovis (GenBank Accession PRJNA20343) and P. falciparum (GenBank Accession PRJNA148), attributes such as genome size, %GC content, average protein length and number of protein-coding genes most closely resemble T. parva and are most different from P. falciparum (Table 3). A comparison of predicted genes between these sets reveals that C. felis has the most similar genes in common with T. parva. A total of 914 similar genes are present in all four apicomplexans, and 2,420 are shared by C. felis, B. bovis and T. parva but are not found in P. falciparum. Note that the numbers in each sector of the Venn Diagram are not strictly additive due to the variation in size of different gene families within each of the respective genomes, but provide an overall indication as to the relatedness between the organisms as a whole (Figure S1).
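The shared-gene counts above rest on the criterion given in the Methods: a hit is accepted when it shows at least 60% similarity across at least 30% of the query protein, and proteins shared by several species are taken as the intersection of query identifiers across pairwise comparisons. A minimal Python sketch of that logic follows. The `Hit` record, its field names, the function names and the toy identifiers are all hypothetical and chosen for illustration; the sketch does not reproduce the authors' actual scripts or any particular BLAST output format.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One blastp hit (hypothetical record layout for this sketch)."""
    query: str             # query protein identifier
    subject: str           # matched protein in the other species
    pct_similarity: float  # percent similar (positive-scoring) positions
    query_aligned: int     # number of query residues covered by the hit
    query_length: int      # full length of the query protein


def accepted_queries(hits: Iterable[Hit],
                     min_similarity: float = 60.0,
                     min_coverage: float = 0.30) -> Set[str]:
    """Queries with at least one hit passing both thresholds."""
    return {
        h.query
        for h in hits
        if h.pct_similarity >= min_similarity
        and h.query_aligned / h.query_length >= min_coverage
    }


def shared_with_all(hits_by_species: Dict[str, Iterable[Hit]]) -> Set[str]:
    """Intersect accepted query identifiers across pairwise comparisons."""
    sets = [accepted_queries(hits) for hits in hits_by_species.values()]
    return set.intersection(*sets) if sets else set()


# Toy data (hypothetical identifiers and values).
toy_hits = {
    "T. parva": [
        Hit("cf_0001", "tp_a", 72.0, 250, 400),
        Hit("cf_0002", "tp_b", 65.0, 90, 400),   # covers only 22.5% of query
    ],
    "B. bovis": [
        Hit("cf_0001", "bb_a", 61.0, 130, 400),
        Hit("cf_0003", "bb_c", 80.0, 300, 350),
    ],
}
print(sorted(shared_with_all(toy_hits)))  # ['cf_0001']
```

Because paralogues are not collapsed, a query with hits to several members of a gene family in another species still counts once, which is one reason the sectors of a Venn diagram built this way need not be strictly additive.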

Identification and characterization of C. felis cf76
Given that C. felis is most closely related to Theileria spp., a BLAST search was used to identify a C. felis orthologue to p67/SPAG-1. However, no C. felis genes with significant identity to p67 or SPAG-1 were identified within the C. felis genome. Therefore we used genome synteny as a guide, and identified a 2172 bp single copy C. felis gene syntenic to p67. Theileria parva p67, T. annulata SPAG-1, and B. bovis BOV57 antigens are encoded by genes that reside within a syntenic block that is highly conserved between the three species, and a similar syntenic block of C. felis genes was identified in silico (Figure 3). cf76, p67, SPAG-1, and BOV57 are 723aa, 752aa, 907aa, and 494aa in length respectively. The cf76 protein of the C. felis isolate submitted for genome sequencing was slightly larger than that of the majority of geographically diverse isolates sequenced, which were approximately 706aa. Similar to BOV57, cf76 contains a single exon, while p67 and SPAG-1 contain two exons. Consistent with the BLAST result, cf76 only shared 22%, 23%, and 23% nucleotide identities with p67, SPAG-1, and BOV57 respectively (Figure S2) and 13%, 13%, and 14% amino acid similarities with p67, SPAG-1, and BOV57 (Figure S3). Based on the mean predicted molecular weight across isolates sequenced (75,557.78 Da) we designated the gene that is syntenic to p67/SPAG-1/BOV57 as cf76. Similar to p67, SPAG-1, and BOV57, cf76 encodes a protein with a predicted signal peptide sequence at the amino terminus, suggesting that this protein may be secreted. In contrast to p67/SPAG-1, cf76 does not have a transmembrane domain, suggesting that it is unlikely to be membrane bound. Also unique to cf76 is a putative glycosylphosphatidylinositol (GPI) anchor. GPI anchors are glycolipids that anchor membrane proteins and have been associated with immunoreactivity in some protozoan pathogens [33].
While it is difficult to definitively establish orthology of cf76 with p67/SPAG-1 using sequence identity, we speculate that conserved synteny combined with a lack of conserved sequence identity may indicate an orthologous gene that is under extreme pressure from the host immune response.
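The synteny-guided search described in this section can be sketched in a few lines of Python. The idea is to walk along an ordered list of genes on a contig and report any gene that sits immediately downstream of the same three conserved marker genes and immediately upstream of the same two conserved marker genes, regardless of its own sequence. The function name, the marker labels and the toy contig are hypothetical, and the sketch assumes the genes are listed in a single orientation with no intervening genes; a real analysis would also need to handle the reverse strand and local rearrangements.

```python
def find_syntenic_candidates(gene_order, upstream, downstream):
    """Return genes flanked by the expected conserved neighbours.

    gene_order: list of (gene_id, marker_label) tuples in contig order,
                where marker_label is None for genes with no conserved
                orthologue assignment.
    upstream:   marker labels expected immediately before the candidate.
    downstream: marker labels expected immediately after the candidate.
    """
    labels = [label for _, label in gene_order]
    n_up, n_down = len(upstream), len(downstream)
    candidates = []
    for i in range(n_up, len(gene_order) - n_down):
        before = labels[i - n_up:i]
        after = labels[i + 1:i + 1 + n_down]
        if before == list(upstream) and after == list(downstream):
            candidates.append(gene_order[i][0])
    return candidates


# Toy contig (hypothetical gene identifiers and marker labels).
toy_contig = [
    ("gene_01", "up_1"),
    ("gene_02", "up_2"),
    ("gene_03", "up_3"),
    ("gene_04", None),      # no detectable sequence similarity
    ("gene_05", "down_1"),
    ("gene_06", "down_2"),
]
print(find_syntenic_candidates(
    toy_contig,
    upstream=["up_1", "up_2", "up_3"],
    downstream=["down_1", "down_2"],
))  # ['gene_04']
```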

Feline humoral immune response to recombinant cf76
Western blot analysis using pooled sera from 10 cats surviving C. felis infection revealed strong seroreactivity to HIS-purified recombinant cf76 and the C-terminal region. In comparison, lower intensity signal was detected against the central and N-terminal regions of cf76 with immune sera. Substantial reactivity was not observed using pooled sera from 10 cats that tested negative for C. felis, and observed signal was attributed to low levels of cross-reacting antibodies unrelated to C. felis infection (Figure 4). The apparent molecular masses of full length cf76, the N-terminal region, the central region, and the C-terminal region were approximately 100 kDa, 42 kDa, 35 kDa and 37 kDa, despite predicted molecular masses of 81.6 kDa, 27.1 kDa, 33 kDa, and 26.8 kDa respectively (Figure 5). Production and co-purification of partial transcripts as well as putative degradation products were observed for western blots probed with anti-HIS antibodies, while only complete proteins were observed on blots probed with anti-HA antibodies. Collectively, these data indicate that the C-terminus of cf76 is highly immunogenic during natural infection with C. felis. Future experiments evaluating sera from individual cats will provide more information on the percentage of cats surviving C. felis infection that have robust antibody responses against cf76.

cf76 sequence is conserved between samples from different geographic regions
In order to assess the degree of conservation amongst C. felis parasite samples from different geographic regions, we amplified and sequenced cf76 from eleven different samples from eight states in the southeastern and south-central United States, revealing a high degree of conservation (92.2 to 100% identity) (Figure S4). Evidence of phylogenetic divergence is seen between the isolates obtained from the two states furthest east (NC and VA) and the other states (Figure S5). Preliminary epitope mapping revealed that high levels of feline antibodies are developed against linear epitopes present in the C-terminal region (Figure 4). This region is highly conserved amongst samples. The only variation in this region was that ten of eleven samples had a tandem repeat of a 30 bp sequence while the remaining sample had this 30 bp sequence only once.

cf76 is expressed in the C. felis life-stage associated with immune protection
Cytauxzoon spp. have a complex life cycle with three life stages in the mammalian host: sporozoites, schizonts, and merozoites (Figure 2). Of these, schizonts have been associated with a protective immune response. Solid immunity to C. felis was observed in cats that had previously survived the schizogenous phase of cytauxzoonosis [12,14,34]. These cats survived challenge infection with no signs of illness while naïve control cats died of cytauxzoonosis. In contrast, direct inoculation with C. felis merozoites alone has not conferred protective immunity. Collectively, these data suggest antigens associated with schizonts are vaccine targets for C. felis. Based on these findings we investigated expression of cf76 in the schizont stage of C. felis using in situ hybridization. Using a labeled negative sense riboprobe directed against cytoplasmic mRNA, we found robust levels of cf76 transcripts in the schizogenous tissue stage of C. felis (Figure 6), further supporting consideration of this antigen as a vaccine candidate.

Conclusions
Prior to our work no protein coding genes from C. felis had been characterized. Based on a full genome sequence we have now identified approximately 4,300 protein coding genes and characterized the first vaccine candidate for C. felis. Specifically, our work demonstrates the potential of cf76 as a vaccine candidate antigen for cytauxzoonosis as it is: 1) recognized by the feline humoral immune system, 2) highly conserved amongst isolates and 3) transcribed in the life stage of C. felis shown to confer protective immunity. To substantiate the efficacy of cf76 as a vaccine antigen, significant reduction in morbidity and mortality of cytauxzoonosis must be demonstrated in immunization and challenge trials.
Our bioinformatic approach provides an example of how comparative genomics can provide an accelerated path to identify vaccine candidates in experimentally intractable pathogens. In addition to identification of specific candidate genes, this approach provides a valuable resource for future comparative genomic and proteomic studies to accelerate identification of additional vaccine candidates and drug targets for C. felis and related apicomplexans.

Figure S1 Four way Venn Diagram: Protein coding genes of Cytauxzoon felis and related apicomplexan parasites.