Diversity and Epidemiology of Mokola Virus

Mokola virus (MOKV) appears to be exclusive to Africa. Although the first isolates were from Nigeria and other Congo basin countries, all reports over the past 20 years have been from southern Africa. Previous phylogenetic studies analyzed few isolates or used partial gene sequence for analysis since limited sequence information is available for MOKV and the isolates were distributed among various laboratories. The complete nucleoprotein, phosphoprotein, matrix and glycoprotein genes of 18 MOKV isolates in various laboratories were sequenced either using partial or full genome sequencing using pyrosequencing and a phylogenetic analysis was undertaken. The results indicated that MOKV isolates from the Republic of South Africa, Zimbabwe, Central African Republic and Nigeria clustered according to geographic origin irrespective of the genes used for phylogenetic analysis, similar to that observed with Lagos bat virus. A Bayesian Markov-Chain-Monte-Carlo- (MCMC) analysis revealed the age of the most recent common ancestor (MRCA) of MOKV to be between 279 and 2034 years depending on the genes used. Generally, all MOKV isolates showed a similar pattern at the amino acid sites considered influential for viral properties.

The first isolations of MOKV were made in 1968 and 1969 from organ pools of shrews (Crocidura flavescens manni) in Ibadan, Nigeria [5,6,7]. The only isolations from humans were in 1968 and 1971 from two girls from Nigeria [6,8,9]. However, there were no classical signs of rabies in either of these cases. Whilst the 1968 isolation was made from the cerebrospinal fluid of a girl who presented with fever and convulsions but fully recovered with no neurological damage, the 1971 isolate was from the brain of a girl who died of a poliomyelitis-like encephalitic disease. A further isolation was made in 1974 from a shrew (Crocidura spp.) in Yaounde, Cameroon [10]. The only isolation from a rodent (Lophuromys sikapusi) was in 1981, from Bangui, Central African Republic [11]. MOKV was also isolated from other animal species including companion animals. A survey on lyssaviruses undertaken in Zimbabwe between 1981 and 1984 revealed six isolations of MOKV from domestic animals, namely a dog and cats that had been previously vaccinated against rabies and unvaccinated cats [12,13]. In 1989 MOKV was isolated from a cat in Addis Ababa, Ethiopia [14]. No further isolation of MOKV was made in Zimbabwe until 1993, when the virus was again isolated from a domestic cat [15]. In the Republic of South Africa, the first isolation was made in 1970 from a domestic cat in Umhlanga Rocks, Kwa-Zulu Natal Province (KZN) [16]. At the time the isolate was assumed to be RABV and the isolate was only identified retrospectively using antigenic typing with monoclonal antibodies during the discovery of MOKV in Zimbabwe in the 1980s [17]. Twenty five years later, in 1995, MOKV was isolated from a domestic cat in South Africa, this time from Mdantsane in the Eastern Cape Province (EC) [17]. Two more isolations followed in 1996, one each in KZN and EC and both from domestic cats of which one was vaccinated against rabies [18,19]. In 1997 and 1998 three more isolations were made from rabiesvaccinated cats in KZN [18,19]. Following several years in which MOKV was not encountered, two isolations were from rabiesvaccinated domestic cats in 2006 and 2008 from the EC province and these are the most recent known isolations of this virus [20,21]. From South Africa all isolations of MOKV were from a domestic cat. Viral RNA was detected by PCR from a domestic dog (in 2005) from Mpumalanga Province of South Africa, virus isolation was unsuccessful in this case [20]. A summary of all MOKV isolates and the approximate geographic location of their origin are presented in Table 1 and Fig. 1. Generally, MOKV infected domestic animals were not observed to be particularly aggressive, but displayed other rabies-like signs that included dehydration, unusual behavior, hypersensitivity, neurological disturbance and salivation [19]. Despite MOKV being isolated from a variety of mammal species, this species is the only lyssavirus never to have been isolated from bats.
Cross protection of WHO and OIE recommended rabies vaccines against various rabies-related lyssavirus species have been reported in a number of studies [22,23,24,25,26]. However, no rabies vaccine provided complete protection against MOKV [27,28,29,30,31,32]. More evidence that RABV derived vaccines do not protect against MOKV infection is shown by circumstantial evidence of the fatal infections of numerous domestic animals that had been vaccinated against RABV [13,18,19,20,21]. Given this scenario and the apparent obscurity of MOKV, we argue that much more information is needed to improve our scant understanding of the epidemiology, disease dynamics and the ecology of this virus. Some phylogenetic studies have been undertaken on MOKV [18,20,21,33,34], but these studies were invariably performed on a smaller number of isolates and limited to partial gene sequences. Despite these limitations, these studies provided some evidence of the existence of different virus clusters, delineated according to geographical incidence. Generally, the genetic variance was shown to be inversely related to the spatial distribution of isolates. For example, South African MOKV isolates were shown to be closely related, but distinguishable based on province and as a cluster more distant from those made in a neighboring country, Zimbabwe [21,33,34]. Such patterns of genetic diversity may indicate extended periods of isolated evolution, as have been reported for terrestrial rabies virus variants [35]. An exception appeared to be a grouping that included one isolate from Cameroon and one from Ethiopia.
The study reported here involved a multi-disciplinary collaborative effort amongst various laboratories in order to generate for the first time a comprehensive dataset of all the known MOKV isolates available. We have shown that most, but not all of the viruses mentioned in literature could be tracked and that some contamination or misnaming occurred. Given a final cohort of eighteen MOKV isolates, the objective of the study was to sequence full nucleoprotein (N), phosphoprotein (P), matrix (M) and glycoprotein (G) genes. The estimation of viral lineage divergence times and subsequent application of a molecular clock is dependent on an accurate estimation of the rate of nucleotide substitution. Bayesian techniques using the Markov Chain Monte Carlo (MCMC) methods have been successfully applied to estimate the evolutionary rate and divergence times from dated sequences of RABVs [36,37,38,39,40]. This study applied a relaxed molecular clock to N-, M-, P-and G-gene datasets to obtain estimates of the time to the most recent common ancestor (MRCA) and rate of evolution for MOKV. The subsequent analysis allowed for study of the phylogeny and diversity within this African lyssavirus species.

Virus isolates
MOKV included in this study were comprised of archived isolates. Information on the geographic location, year of isolation, species origin and references of those MOKV isolates is presented in Table 1. The isolates were either passaged several times (passage number unknown) in suckling mice or in tissue culture, or both. Total RNA was extracted from the samples using the TRIzolH method (Invitrogen) according to the manufacturer's instructions.

Primer design, RT-PCR and sequencing
The N, P, M and G genes were sequenced using different primer combinations and cycling conditions available from the authors upon request. All PCR products were analyzed by agarose gel electrophoresis and subsequently purified (Wizard PCR Preps DNA Purification System; Promega). The purified PCR products were sequenced with BigDye Termination Cycle Sequencing Ready Reaction Kit 3.1 (Applied Biosystems) according to the manufacturer's protocol and analyzed on an ABI Prism 3100 DNA sequencer (Applied Biosystems). Within the duration of this project next generation sequencing technology became available and was applied on a selection of samples. Complete genome sequence was obtained directly from brain tissue for four MOKV isolates (RV4, RV1017, RV1021 and RV1035) (Marston, unpublished). Briefly, TRIzol (Invitrogen) extracted viral RNA was depleted of host genomic DNA using RNase-free DNAse (Qiagen, UK) and host ribosomal RNA was depleted using Terminator 59-Phosphate-Dependent Exonuclease (Epicentre Biotechnologies). The RNA was fragmented, a random-primed cDNA library was made and run using the Roche 454 GS FLX System. The sequencing data were assembled in the GS de novo assembly software (Roche). The de novo assembled contigs for each isolate were individually aligned using Seqman (DNAStar) using reference sequence EU293117 and/or specific isolate sequences where available. The resulting consensus sequences were used in GS Reference Mapper (Roche) to obtain further sequence reads from the original raw data for each isolate. All four complete genome sequences were obtained, apart from the extremities of the genome (UTRs). The UTRs were inferred from the previously determined MOKV UTR sequences by using RT-PCR primers situated at the beginning and end of the genome (Marston, unpublished).

Author Summary
According to the World Health Organization, rabies is considered both a neglected zoonotic and tropical disease. Among all the lyssavirus species known to exist today, Mokola virus is unique and appears to be exclusive to Africa. In contrast to all other known virus species in the genus Lyssavirus of the family Rhabdoviridae, its reservoir host has not been identified yet. As only limited sequence information is available, this study significantly contributes to the understanding of the genetic diversity and relatedness of Mokola viruses. In a collective approach, the complete nucleoprotein, phosphoprotein, matrix, and glycoprotein genes of all Mokola viruses isolated to date were sequenced in various rabies laboratories across the world. A phylogenetic analysis was undertaken and the most recent common ancestor was determined. Subsequently, results were linked to epidemiological background data. We also conducted a comparative study of distinct antigenic sites considered influential for viral properties within those genes.

Phylogenetic analyses
Nucleotide sequences were assembled and edited using Vector NTI 9.1.0 (Invitrogen). Multiple sequence alignments were generated using ClustalX and exported in FASTA format. Phylogenetic and evolutionary analyses were conducted using Mega 5.05 [41] for a variety of data sets, i.e. the N, P, M and G gene nucleotide sequences as well as the concatenated sequence. The p-distances between MOKV N gene nucleotide and amino acids sequences were also calculated.
The Maximum Clade Credibility (MCC) phylogenetic tree, estimates of the rate of molecular evolution (substitutions per site per year) and the most recent common ancestor (MRCA) for MOKV were inferred using a Bayesian Markov Chain Monte Carlo (MCMC) method in the BEAST package (BEAST and associated programmes are available via http://beast.bio.ed.ac. uk/) [42]. For this analysis, an input file for BEAST was generated using the BEAUti programme. For MCMC analysis of the concatenated gene sequence dataset (N, P, M and G genes) partitioning into genes was implemented The analysis utilized the general time reversible model with gamma distribution and proportion of invariable sites (GTR+G+I) with site heterogeneity [43] and population histories were constructed using the Bayesian skyline plot [44]. The relaxed (uncorrelated lognormal) molecular clock was chosen as demographic model. The statistical uncertainty in the data for each parameter estimate is reflected by the value of the 95% highest posterior density (HPD  [42]. Posterior probability values represent the degree of support for each node on the tree.

Results
We have generated a comprehensive dataset of all available isolates of MOKV, however, we were unable to trace some, as indicated in Table 1. Of the 24 reported detections of MOKV over the past 50 years, only 18 isolates could be included in this study. We were unable to track virus isolates for four cases reported in literature. Three of these were historical cases from Nigeria, the existence of which are now uncertain viz. two human isolates and a further isolate from a shrew [6,8,9]. The fourth was a dog-associated case reported in recent times from South Africa  [20], for which an isolate was never produced. Isolates RV39 (Cameroon, 1974) and RV610 (Ethiopia, 1990) were excluded from the MCC phylogenetic analyses as sequence data indicated that these isolates, in our hands, were likely to be the same virus (Fig. 2). A small number of nucleotide differences on some genes were believed to be due to mutations introduced from multiple cell culture passages over years of laboratory maintenance.
A number of domains on the lyssavirus genome have been implicated in the varying degrees of virulence between virus isolates of a lyssavirus species, as well as between virus isolates of different species of the Lyssavirus genus. A comparison of these amino acid positions is provided in Table 2 Fig. S4) were used. The same tree topology was observed for both nucleotide and amino acids sequence analysis (data not shown). The MCC trees indicated that isolates grouped according to geographic location. Phylogenetic analysis of MOKV isolates from South Africa and Zimbabwe demonstrated geographic clustering consistent with previous findings [18,34]. The South African isolates formed two clusters consisting of KZN and EC provinces respectively. The Zimbabwean isolates from the 1980s (all from Bulawayo) formed a single cluster, distinct from the single 1993 isolate (from Selous). The same grouping was demonstrated for these isolates (South African and Zimbabwean) irrespective of the gene used for phylogenetic analysis.
The Central African Republic and Nigeria isolates formed independent clusters irrespective of the gene used for analyses.
The P-distance comparison between different MOKV isolates was performed using the N gene nucleotide and amino acid sequences ( Table 3). Comparison of the nucleotide sequences indicated the difference between the MOKV isolates to be between 0 and 15% (85% nt seq identity), with the highest value (15%) being between U22843 (Zimbabwe) and RV4 (Nigeria). The highest nucleotide difference between South African isolates was 5.7% (226/08 and 229/97) while for Zimbabwean isolates it was 12.3% (between U22843 and RV1017/21846). Collectively, the nucleotide difference between South African isolates and Zimbabwean isolates was 14.4% (U22843 and 226/08). When comparing amino acid differences between MOKV isolates the same trend was observed, with MOKV isolates displaying an overall intragenotypic amino acid variation of 6.4%.
In order to investigate the evolutionary relationship of MOKV, a MCMC analysis was used to estimate the rate of nucleotide substitution calculated in substitutions/site/year as well as the time of the most recent common ancestor (MRCA) of MOKV (Table 4). When analyzing the N and G gene datasets, the mean nucleotide substitution rate was estimated to be 2.172610 24 (N) and 2.123610 24 (G). This is in agreement with previously published nucleotide substitution rate estimates (N gene: 1.  Fig. S2 and S3) with 95% HPD ranges much wider that the estimates for the N and G gene datasets. This is possibly due to the more variable nature of the M-and P-genes. Estimates based on the concatenated sequences (N, P, M and Ggene coding regions) yielded estimates within the ranges of the other genes (1157 years old, 95%HPD 413-2034 years) (Fig. 3).
The successful use of NGS on four of the isolates (RV4, RV1017, RV1021 and RV1035) enabled full genome consensus sequences to be obtained without the use of specific primers. In comparison to the work involved to obtain gene specific sequences for N, P, M and G on each of the MOKV isolates this approach was relatively simple and time efficient. Of the total number of reads obtained from the brain RNA preparations, between 0.25 and 1% were viral equating to between 294 and 1006 reads.

Discussion
This study was aimed at producing further insights into the phylogeny and diversity within a unique African lyssavirus species, MOKV. It was our objective to include all MOKV's encountered in history, but the existence or identity of several reported viruses and/or isolates could not be corroborated (Table 1). These included 3 virus isolates that were reported from Nigeria, the existence of which is now doubtful [6,8,9] and an isolate reported recently from South Africa [20]. It was also unfortunate that isolates from Cameroon and Ethiopia had to be excluded from this study, as these viruses, in our hands, were likely of the same original stock.
Nevertheless, a panel of 18 MOKV isolated over a period of nearly 50 years (Table 1) could be included and thus represents the most comprehensive phylogenetic analysis of full N, P, M and G genes of the MOKV species.
The monophyletic grouping of isolates from the KZN province of South Africa, isolated over a period of 28 years, indicates the continual presence and stability of the same viral lineage in this geographical domain. This KZN group could be distinguished from the other South African MOKV group, from the EC province, but the time point of divergence is rather recent with the MCRA for these two MOKV groups in the order of 150 years. The sequence diversity observed also seems to determine biological properties of the isolates.
Parallel experimental infection studies in mice showed that the pathogenicity of MOKV (isolates 12341, 252/97, Table 1) had been underestimated, although specific markers could not be determined [49]. Our analysis using a more comprehensive set of sequences corroborated these results ( Table 2), but the relevance need to be confirmed by further studies.
Previously, when lyssaviruses were still classified according to genotypes, it was proposed that a new genotype is defined by .80% nucleotide differences and .92-93% amino acid differences [50,51]. Although this classification is no longer used, the p-distance analysis in this study indicated the MOKV isolates fall within the defined ranges. The MOKV isolates also displayed less sequence divergence than that seen among LBV isolates (20.9% nucleotide sequence difference and 6.7% amino acid sequence difference between LBV isolates). However, the delineation between LBV and MOKV using maximum clade credibility appears not as robust as when using M-gene where LBV isolates rather cluster with MOKV (supplementary material, Fig. S3).
MRCA estimates of MOKV utilizing different genes ranged widely from 591 to 1883 years (HPD 214-4318 years). Although these dates for the MOKV MRCA correspond with the timeframe estimated by Bourhy et al. [47] for the emergence of RABV associated with non-flying mammals (749 years ago, 95% HPD 363-1215 years), it must be noted that the small sample size of MOKV could also influence the robustness of the results. Also, purifying selection can mask the ancient age of viruses that appear to have recent origins as shown for other RNA-viruses [52,53], thus making it difficult to objectively model the evolutionary history of MOKV.
Use of NGS technologies to obtain four of the MOKV genomes directly from RNA preparations without amplification using specific primers was highly successful. Unlike the approach taken for the other isolates where often primers had to be designed for each specific isolate due to the high divergence seen in the sequences between the MOKV viruses, for the NGS approach    294 and 1006 reads per position the inclusion of sequencing errors are highly unlikely. Twenty three MOKV isolations and one PCR-based identification have been made to date from six African countries in a period of more than 40 years. Since these African countries are from regions in Africa that are far apart, it is likely that MOKV is present in many others African countries and spread over vast portions of the continent. Moreover, over the past almost 20 years MOKV has only been isolated from South Africa (Fig. 1). Since it is known that MOKV is not only limited to South Africa, the lack of isolation from elsewhere is reflective of the non-existence of appropriate surveillance, including for rabies virus, across Africa. Limited diagnostic capabilities (e.g. typing or sequencing of rabies cases/specimens/isolates) across the continent, remains a key factor. Such enhanced surveillance would likely result in the discovery of more isolates and therefore, a higher diversity of MOKV and would thus improve our understanding of MOKV incidence and circulation. Since rabies vaccines do not offer protection against MOKV, a case can be made for the relative importance of a better understanding of the ecology of MOKV [32]. One of the limiting factors in studying MOKV is the fact that the reservoir species for this virus is not known. Although shrews have been implicated, it remains speculative. Lyssaviruses have a strong association with bats and it seems peculiar that MOKV may be the only exception in this regard -among all the other members of the genus. Indeed, virus neutralizing antibodies (VNA), neutralizing both LBV and MOKV have been detected in sera from frugivorous bats (Rousettus aegyptiacus and Eidolon helvum) [54,55,56]. However, belonging to the same phylogroup II, LBV and MOKV have been reported to cross react in serological assays [7,24,45,57]. Since there have been repeated reports of LBV isolations from fruit bats [58,59], the neutralizing activity of bat sera to MOKV apparently does not confirm the circulation of MOKV in those bat species. However, it cannot be excluded that other yet unidentified African bats may act as reservoir for MOKV. On the other hand, consistent encounters of MOKV in domestic cats and small mammalian species invite speculation along the lines of a prey-to-predator transmission pathway. For MOKV, the estimated MRCA from our study coincides and provides support for the timeframe suggested for the emergence of terrestrial rabies [47]. It is possible that MOKV remained stable in an extant African host environment, while RABV evolution was vastly accelerated given a plethora of host opportunities and global distribution.