African swine fever (ASF) is a highly lethal disease of domestic pigs caused by the only known DNA arbovirus. It was first described in Kenya in 1921 and since then many isolates have been collected worldwide. However, although several phylogenetic studies have been carried out to understand the relationships between the isolates, no molecular dating analysis has been performed so far. In this paper, comprehensive phylogenetic reconstructions were made using newly generated and publicly available sequences of hundreds of ASFV isolates from the past 70 years. Analyses focused on B646L, CP204L, and E183L genes from 356, 251, and 123 isolates, respectively. Phylogenetic analyses were performed using maximum likelihood and Bayesian coalescence methods. A new lineage-based nomenclature is proposed to designate 35 different clusters. In addition, dating of ASFV origin was carried out from the molecular data sets. To avoid bias, diversity due to positive selection or recombination events was neutralized. The molecular clock analyses revealed that ASFV strains currently circulating have evolved over 300 years, with a time to the most recent common ancestor (TMRCA) in the early 18th century.
Citation: Michaud V, Randriamparany T, Albina E (2013) Comprehensive Phylogenetic Reconstructions of African Swine Fever Virus: Proposal for a New Classification and Molecular Dating of the Virus. PLoS ONE 8(7): e69662. https://doi.org/10.1371/journal.pone.0069662
Editor: Maureen J. Donlin, Saint Louis University, United States of America
Received: January 31, 2013; Accepted: June 11, 2013; Published: July 25, 2013
Copyright: © 2013 Michaud et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was financially supported by the Wellcome Trust (N°210183. 183, AHDW/03/04)(http://www.wellcome.ac.uk/) and the European Community's Seventh Framework Programme (FP7/2007–2013) (http://cordis.europa.eu/fp7/home_en.html) under grant agreement KBBE- 211691- ASFRISK, partially funded by the European Community's Seventh Framework Programme (FP7/2012–2015) (http://cordis.europa.eu/projects/rcn/105070_en.html) under grant agreement KBBE - 311931 - ASForce, and also partially funded by the European Union through the Network of Excellence EPIZONE (number FOOD-CT-2006-016236)(http://www.epizone-eu.net/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
African swine fever (ASF) is an infectious and contagious hemorrhagic disease of domestic pigs. It is highly lethal, causing up to 100% mortality in naive animals, with devastating effects on pig production and animal trade, and major economic losses in affected countries. First described by Montgomery in 1921 in Kenya, ASF has since been observed in most sub-Saharan countries, where it has often become endemic. From Africa, it reached Europe, i.e. Portugal in 1957 and again in 1960, from where it spread to Spain, France, and Belgium. From there, the virus reached Latin America during the 1970s–1980s. In Europe, ASF remained endemic in the Iberian Peninsula until the mid-1990s and the disease is still present in Sardinia. Recently, it has been re-introduced on the borders of Europe, in Georgia in 2007, and then it extended to the Caucasus and Russia. No vaccine is available and disease control is based only on quarantine and animal slaughtering. In this context, its great ability to spread makes the ASF virus one of the most important infectious threats for the domestic pig industry worldwide. African swine fever virus (ASFV) is a large icosahedral and enveloped dsDNA virus; it is the only recognized DNA arbovirus and also the only member of the Asfarviridae family and Asfivirus genus. However, ASFV shares characteristics with the other members of the Nucleo-Cytoplasmic Large DNA virus (NCLDV) family, suggesting that they all may have had a common ancestor.
ASFV is believed to be an ancestral virus of soft ticks (Ornithodoros genus) that infects wild swine such as warthogs (Phacochoerus africanus), bushpigs (Potamochoerus porcus), and giant forest hogs (Hylochoerus meinertzhageni) asymptomatically. The virus replicates in ticks and is then transmitted to wild swine during blood feeding; wildlife is considered the natural reservoir of the virus. The virus can persist in ticks for years, even in quiescent ticks waiting for host feeding. The sylvatic cycle of ASFV established between ticks and wild suids can be maintained indefinitely. This cycle allows the maintenance of virus circulation and probably enables the persistence of ancient viruses and the emergence of new variants. At the laboratory level, virus variants were initially characterized by genome size and enzymatic restriction profiles. A high level of variability is observed mainly within the 35 kb at the 3′ end and the 15 kb at the 5′ end of the genome (170–190 kb). These two regions contain the multigene families (MGF), which vary in number between isolates and enable virus variability by homologous recombination between genes. Moreover, variability is also generated by a change in the number of amino-acid repeats in 14 proteins, including the envelope protein p54 encoded by the E183L gene. More recently, gene sequencing and analysis were introduced to increase differentiation between ASFV isolates collected worldwide. The first group used phylogenetic reconstructions based on the partial sequence of the B646L gene coding for the major capsid protein (MCP) VP72. Their trees showed a very close relationship between West African, European, and South American isolates, all clustered in genotype 1. Despite more than 50 years of circulation in three continents, the limited accumulation of genetic changes has made it impossible to discriminate isolates within genotype 1.
In contrast, eastern and southern African isolates are more diverse and segregate into 21 additional genotypes. This could be explained by the fact that these viruses are propagated within a sylvatic cycle, in contrast to viruses of genotype 1 that mainly replicate in domestic pigs, although they were secondarily detected in European soft ticks O. erraticus and wild boars in Spain and Portugal. This supports the assumption that the virus diversity may be generated during the sylvatic cycle of the virus. Other genes or genome sequences have been used successfully to discriminate ASFV isolates collected at a regional level. For instance, the B602L gene from the central variable region of the genome (CVR, coding J9L protein), the CP204L gene (coding the phospho-protein P32), and the E183L gene (envelope protein p54) have been used to further split the local isolates.
The aim of this study was to reassess the phylogenetic reconstructions and nomenclature of ASFV by including recent sequences and to explore the evolution of the virus based on a comprehensive analysis of the available sequence data sets. Accordingly, three genes were targeted because they are the most frequently sequenced and deposited in public databases. The B646L, E183L, and CP204L genes belong to the most conserved central part of the genome and encode the structural virus proteins VP72 (capsid), p54 (membrane protein), and p32 (membrane protein), respectively. They are also known to induce antibodies in pigs. The origin and the evolution of the virus were inferred from these three genes.
Materials and Methods
A large collection of ASFV isolates was included (Table S1). The majority of ASFV sequences used were downloaded from GenBank (http://www.ncbi.nlm.nih.gov) and a CISA-INIA web site data bank (http://wwwx.inia.es/cisa/asfv/). Additional sequences of Madagascar isolates were generated after virus isolation on pig alveolar macrophages from pigs sampled during outbreaks between 1998 and 2008. These sequences are interesting for the study of ASFV evolution since they are considered to have derived from a unique introduction of the virus in 1998: twenty-one samples were selected to cover the whole territory and the 1998–2008 period. PCRs were performed using the following primers: VP72-d (5′-GGCACAAGTTCGGACATGT-3′) and VP72-U (5′-GTACTGTAACGAAGCAGCACAG-3′), E183L for p54 (5′-GGTTGGTTTTCAAATGTTGGCGAAGGTA-3′) and E183Lrev p54 (5′-CCATAAATTCTGTAATTTCATTGCGCCACAAC-3′), and p30/32-P1 (5′-TGCCAAGCATACATAAGTTG-3′) and p30/32-P2 (5′-ATTTTGCTGTTTATGAATCC-3′) for the amplification of B646L, E183L, and CP204L genes, respectively. PCR products were cloned in E. coli, and sequences from these clones were generated by a private company (Cogenics, France). Sites with mutations were specifically checked for sequencing errors: only bases confirmed by two-direction reading were retained as mutations. In all, the analyses were performed on 356 sequences (399 nt long), 251 sequences (480 nt long), and 123 sequences (543 nt long) for B646L, E183L, and CP204L genes, respectively.
Sequences were aligned by ClustalW with default parameters and then scrutinized and edited using MEGA version 5 software. From the multiple sequence alignments, an index of substitution saturation to estimate the degree of sequence information was calculated using DAMBE software. DNA polymorphism was also analyzed. The average diversity per site between two sequences (π) and the number of segregating sites (i.e. the number of sites where one or several substitutions occurred) were obtained by DnaSP version 5 software. In the segregating sites, the ratio of transitions and transversions was assessed. The average number of nucleotide differences (k) between two sequences was also determined. All this information was used to check the quality of the sequences.
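To make the polymorphism statistics concrete, the following is a minimal sketch of how the number of segregating sites, the average number of pairwise differences (k), and the per-site diversity (π) can be computed from an alignment. It is not the DnaSP implementation; the function name `polymorphism_stats` and the toy sequences are illustrative assumptions, and columns containing gaps or ambiguous bases are simply skipped.

```python
from itertools import combinations

def polymorphism_stats(seqs):
    """Return (S, k, pi) for equal-length aligned sequences.

    S  = number of segregating sites
    k  = average number of nucleotide differences between two sequences
    pi = k divided by the number of sites compared
    Columns with gaps or ambiguous bases are skipped (an assumption of this sketch).
    """
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    # Keep only columns where every sequence carries an unambiguous base
    cols = [i for i in range(length) if all(s[i] in valid for s in seqs)]
    # A site is segregating when more than one base is observed in the column
    seg_sites = sum(1 for i in cols if len({s[i] for s in seqs}) > 1)
    pairs = list(combinations(seqs, 2))
    total_diff = sum(sum(1 for i in cols if a[i] != b[i]) for a, b in pairs)
    k = total_diff / len(pairs)
    pi = k / len(cols)
    return seg_sites, k, pi

# Hypothetical toy alignment, not ASFV data
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
S, k, pi = polymorphism_stats(toy)
print(S, round(k, 4), round(pi, 4))
```

On real data the same quantities would be read from the DnaSP output; the sketch only shows what the numbers mean.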
Test for recombination.
The presence of sequence recombination events in the data set was assessed in the multiple alignments with RDP3 package version 3 using the default setting for all recombination tests applied on linear sequences (RDP, GENECONV, MAXCHI, BOOTSCAN/RESCAN, and SISCAN).
Maximum likelihood reconstructions, which generate the trees that best fit the evolution of a set of sequences under a probabilistic model of evolution, were done using TREEFINDER (version of March 2011). The evolution model was selected according to the Akaike information criterion (AIC), the corrected AIC (AICc), and the Bayesian information criterion (BIC), with the number of gamma rate categories fixed at 5. The consensus model given by the three information criteria or, alternatively, the simplest model was selected for the reconstruction. Thus, the B646L tree was constructed under the HKY+Γ5 model. For E183L and CP204L, HKY+Γ5 and HKY on the one hand and HKY+Γ5 and TN+Γ5 models on the other hand were selected. The most complex model, GTR, was also systematically included and compared with the others. All the reconstructions were done on 1,000 replicates and bootstraps were approximated using the Expected-Likelihood Weights defined by Strimmer and Rambaut (2002) applied on local rearrangements (LR-ELW) as implemented in TREEFINDER.
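The three information criteria used for model selection can be written out explicitly. The sketch below is not TREEFINDER's procedure; it only applies the standard formulas AIC = 2k − 2lnL, AICc = AIC + 2k(k+1)/(n − k − 1), and BIC = k·ln(n) − 2lnL. The log-likelihoods and parameter counts are hypothetical values chosen for illustration, and using the alignment length as the sample size n is an assumption.

```python
import math

def information_criteria(log_likelihood, n_params, n_sites):
    """Return AIC, AICc, and BIC for one candidate substitution model."""
    aic = 2 * n_params - 2 * log_likelihood
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)
    bic = n_params * math.log(n_sites) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Hypothetical candidates: (log-likelihood, number of free model parameters).
# Branch lengths are ignored here because they are shared by all candidates.
candidates = {
    "HKY": (-1500.0, 4),
    "HKY+G": (-1480.0, 5),
    "GTR+G": (-1478.0, 9),
}
n_sites = 399  # assumed sample size: length of the B646L alignment

scores = {name: information_criteria(lnl, k, n_sites)
          for name, (lnl, k) in candidates.items()}

# For each criterion, the preferred model is the one with the lowest score
best = {crit: min(scores, key=lambda name: scores[name][crit])
        for crit in ("AIC", "AICc", "BIC")}
print(best)
```

When the three criteria agree, as in this toy case, that model is the consensus; when they disagree, the text above falls back on the simplest model.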
Bayesian phylogenetic inference was performed using Markov chain Monte Carlo (MCMC) sampling implemented in MrBayes version 3.1 software. According to the best fit models proposed by TREEFINDER, MrBayes was set with HKY+Γ5, HKY and HKY+Γ5, and HKY+Γ5 for B646L, E183L, and CP204L, respectively. The GTR model was also used for each gene. MCMC was run for a maximum of 10 million generated trees or, alternatively, until the run reached stationarity, as measured by a standard deviation of split frequencies either becoming lower than 0.01 or fluctuating randomly above 0.01 for at least 500,000 generated trees. Consensus trees were generated after discarding the first 25% of the MCMC samples as the burn-in phase.
Tree congruence with data sets was tested by submitting them to the ELW statistical test implemented in TREEFINDER. The tree selected for each gene was the one with the highest ELW score.
Since ASFV is the only member of the Asfarviridae family, 37 outgroup viruses for tree rooting were selected in the closest related DNA virus families, the NCLDVs. Because of the high level of nucleotide divergence, multiple sequence alignments were done on the complete amino-acid sequences of the major capsid protein of both outgroup viruses and ASFV isolates (equivalent to B646L protein) using MEGA5 software (see File S1). Tree reconstructions were performed on 1,000 replicates using the maximum likelihood method set with WAG+G+I+F and WAG+G+I models using “all sites” and “complete deletion” options, respectively. The topology of the resulting rooted tree was subsequently applied for placing roots on the B646L, E183L, and CP204L trees.
Analysis of selection pressure.
Codons under positive selection pressure in DNA coding sequences may evolve faster than the natural evolutionary rate of the virus genome. To avoid bias in the molecular clock analysis, the selection pressure acting on the targeted genes was assessed. The ratio of non-synonymous (dN) to synonymous (dS) substitutions per site (dN/dS ratio) was calculated and the codons under positive selection pressure were identified using the Codeml program implemented in the PAML 4 package.
Isolate genotyping was assessed by comparing the genetic distance between all B646L sequences. Average intra- and inter-branch distances were globally compared to determine the strength of cluster segregation. Additionally, a haplotype network of the isolates was constructed using TCS1.21 software to identify relationships between isolates potentially poorly represented by conventional phylogenetic tree reconstruction. Lastly, specific nucleotide signatures of the different ASFV clusters were searched using multiple sequence alignments containing only the 67 unique B646L sequences. The three approaches were finally combined to draw conclusions about ASFV genotyping.
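The comparison of intra- and inter-cluster distances can be illustrated with a short sketch. The distance model used in the study is not stated here, so the example swaps in the uncorrected p-distance (proportion of differing sites) as a stand-in; the function names, cluster labels, and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in sites) / len(sites)

def mean_within_between(seqs, labels):
    """Average pairwise distance inside clusters and between clusters."""
    within, between = [], []
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        d = p_distance(a, b)
        # A pair contributes to "within" only when both isolates share a label
        (within if labels[i] == labels[j] else between).append(d)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)

# Hypothetical toy data: two clusters of two sequences each
toy_seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCAAAAA", "CCCCCAAAAT"]
toy_labels = ["A", "A", "B", "B"]
intra, inter = mean_within_between(toy_seqs, toy_labels)
print(round(intra, 3), round(inter, 3))
```

A cluster assignment is well supported when the mean distance between clusters is clearly larger than the mean distance inside them.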
Two methods were used in parallel and compared to determine the evolutionary rate and the time to the most recent common ancestor (TMRCA) of circulating ASFV isolates. The first was based on the maximum likelihood method Baseml implemented in the PAML 4 package and the second on the Bayesian MCMC implemented in BEAST package version 1.6.2. Codons under positive selection (dN/dS >1) and recombined sequences were removed from the multiple sequence alignments to avoid bias in the substitution rate determination and consequently in the TMRCA estimation. The best fit tree generated in the phylogenetic reconstructions was used to perform the Baseml analysis, using HKY+Γ5 as the evolution model for B646L, CP204L, and E183L genes. Strict and relaxed molecular clock hypotheses were used to generate dated trees for all genes. These two trees were individually compared with the tree generated without a clock constraint to accept or reject the molecular clock hypothesis. A likelihood ratio test (LRT) and a χ2 comparison were performed to support this analysis. For the relaxed molecular clock, branches delineating the different genotypes were individually relaxed.
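The likelihood ratio test mentioned above compares the log-likelihood of the clock-constrained tree with that of the unconstrained tree: the statistic 2(lnL_free − lnL_clock) is referred to a χ2 distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below is a generic illustration using only the Python standard library, not the PAML workflow; the log-likelihood values and the degrees of freedom are hypothetical and must be supplied by the analyst.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{j=0}^{m-1} y^j / j!
        return sum(math.exp(j * math.log(y) - math.lgamma(j + 1) - y)
                   for j in range(df // 2))
    # Odd df: Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{m} y^(j - 1/2) / Gamma(j + 1/2)
    m = (df - 1) // 2
    tail = sum(math.exp((j - 0.5) * math.log(y) - math.lgamma(j + 0.5) - y)
               for j in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + tail

def likelihood_ratio_test(lnl_constrained, lnl_free, df):
    """Return the LRT statistic and its chi-square p-value."""
    stat = 2.0 * (lnl_free - lnl_constrained)
    return stat, chi2_sf(stat, df)

# Hypothetical values: clock tree lnL = -1010, unconstrained tree lnL = -1000, df = 2
stat, p_value = likelihood_ratio_test(-1010.0, -1000.0, 2)
print(stat, p_value)  # the clock hypothesis is rejected when p_value is small
```

The terms are summed in log space so that the function remains usable when the degrees of freedom are large, as happens with trees of several hundred tips.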
All analyses performed with the BEAST package were done under an uncorrelated lognormal relaxed clock model. Considering that at least 20% of our sequences were from isolates persisting in wildlife, a constant population size prior was selected. The initial value and the range of substitution rates were estimated from preliminary analyses and entered into the model of evolution. For each gene, two independent runs of 100 million steps were performed, with one tree sampled every 10,000 steps. MCMC samples were examined using Tracer version 1.4; the first 25% of samples in the chain were discarded as the burn-in phase. The consensus tree was generated as the maximum clade credibility (MCC) tree using TreeAnnotator version 1.4.7. Only posterior probabilities higher than 0.90 are indicated.
All trees were represented and edited in FigTree version 1.3.1 developed by Andrew Rambaut (http://tree.bio.ed.ac.uk/software/figtree/).
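The sampling scheme described above (100 million steps, one sample every 10,000 steps, first 25% discarded) leaves a fixed number of posterior samples per run. The following sketch shows the arithmetic and a simple way to summarize a sampled parameter. It does not reproduce Tracer; the helper names are hypothetical, and the highest posterior density (HPD) interval is included as an assumption, since it is a commonly reported summary that this text does not mention explicitly.

```python
import math

def discard_burn_in(samples, fraction=0.25):
    """Drop the first `fraction` of the chain as burn-in."""
    start = int(len(samples) * fraction)
    return samples[start:]

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    window = max(1, math.ceil(mass * n))
    # Slide a window of fixed sample count and keep the narrowest one
    best = min(range(n - window + 1), key=lambda i: xs[i + window - 1] - xs[i])
    return xs[best], xs[best + window - 1]

# Number of samples implied by the run settings given in the text
n_steps, sample_every = 100_000_000, 10_000
n_samples = n_steps // sample_every
n_retained = len(discard_burn_in(list(range(n_samples))))
print(n_samples, n_retained)

# Hypothetical toy chain: the early values are far from the stationary region
chain = [9.0, 8.0, 7.0, 6.0] + [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 0.9, 1.2, 1.0]
kept = discard_burn_in(chain)
print(sum(kept) / len(kept), hpd_interval(kept))
```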
Comprehensive Phylogenetic Inference of ASFV Depicts 4 Major Lineages
Before phylogenetic inference, data sets and multiple sequence alignments were thoroughly examined to eliminate misalignments and ensure correct framing of coding sequences. All gaps were considered as missing information to avoid artificial nucleotide divergence. None of the different methods used in the RDP3 package identified recombination events in B646L and CP204L sequences. In contrast, several recombination events were detected among E183L sequences. A total of 17 isolates were subsequently removed from the E183L multiple sequence alignments: 16 were Italian isolates (24/Or/04, 26/Ss/04, 30/Ol/04, 48/Ss/08, 5/Ca/02, 04/Ol/02, 3/Og/98, 1/Nu/97, 46/Ca/08, 25/Nu/04, 43/Og/07, 42/Og/0, 22/Nu/04, 23/Or/04, 41/Og/07 and 36/Ss/05) and one was a South African isolate (RSA/85/1). The recombination events were all identical for Italian isolates (Figure 1). There were no saturated codons in our alignments (DAMBE, p-value << 0.03), thus indicating that the genetic information in the data sets was suitable for phylogenetic studies.
Sixteen Italian isolates and one South African isolate were identified as recombinant. The Italian isolates are linked and, because the recombination events take place in the same region of the sequence, these isolates have probably emerged from a common ancestor.
To check the nucleotide composition of the alignments, statistical tests were performed using DnaSP software. The tests gave the number of nucleotide substitutions, the average diversity per site between two sequences (π), and the average nucleotide difference between two sequences (k). The diversity of B646L and CP204L was approximately half that of E183L (Table 1). In addition, E183L showed a clear bias toward non-synonymous mutations. Based on the observed nucleotide substitutions, the minimum and the maximum evolutionary rates were also calculated from each multiple alignment (Table 1). We determined the dN/dS of each gene (Table 1) and the amino acids under positive selection in the alignments. This led to removing 6 nt (2 aa: His4 and Thr28) from B646L alignments, 9 nt (3 aa: Glu31, Pro123 and Leu176) from CP204L alignments, and 27 nt (9 aa: Tyr10, Thr23, Asp100, Thr104, Ser122, Pro140, Val142, Glu143 and Ser149) from E183L alignments for subsequent molecular dating analyses.
The outgroup-rooted trees constructed from the multiple sequence alignments of the major capsid protein amino acid sequences of 30 ASFVs and 37 outgroup viruses from the NCLDV family showed that the common ancestor of all these viruses connects the ASFV group within eastern African isolates, more precisely between genotypes VIII, IX and X on the one hand and genotype I and the other genotypes on the other hand (Figure 2). Accordingly, the root on all subsequent trees was placed in this position. This reconstruction also shows that the Asfarviridae family is rather divergent from the other NCLDV families.
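Removing positively selected codons before molecular dating amounts to deleting whole codon columns from the in-frame alignment. The sketch below illustrates this for a single sequence; the function name `drop_codons` is hypothetical, and treating the amino-acid numbers (for example 4 and 28 for B646L) as 1-based codon positions within the aligned fragment is an assumption, since the numbering scheme is not stated.

```python
def drop_codons(seq, codon_positions):
    """Remove the codons at the given 1-based positions from an in-frame sequence."""
    drop = set(codon_positions)
    # Split the sequence into consecutive triplets
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(c for n, c in enumerate(codons, start=1) if n not in drop)

# Toy example: the second codon (AAA) is removed
print(drop_codons("ATGAAACCCGGG", [2]))

# Length check on a placeholder 399-nt sequence with two codons removed
placeholder = "ACG" * 133
trimmed = drop_codons(placeholder, [4, 28])
print(len(placeholder), len(trimmed))
```

Applied to every sequence in the alignment, this shortens a 399-nt fragment by 6 nt when two codons are excluded, in line with the figures reported above for B646L.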
The tree was constructed under a WAG+G+I+F model and maximum likelihood method with 1,000 bootstrap resampling. Numbers indicate the statistical value (Expected-Likelihood Weight) of internal nodes, given in percentages (only numbers above 70% are indicated). The outgroup connects ASFV group by the branch from genotypes VIII (MwLil20/1), IX (UgH03), and X (Kenya1950, kn66 and Uganda) to other genotypes.
Phylogenetic trees constructed with B646L sequences showed four major lineages (L) (Figure 3): L1 includes the previously described genotypes I, II, XVII, and XVIII, and L2 includes genotypes III, IV, V, VI, VII, XIX, XX, XXI, and XXII and an ungenotyped isolate (Cro3.5). L3 includes genotypes VIII, XI, XII, XIII, XV, and XVI and one isolate, TAN/08/MAZIMBU, previously included within genotype XV. L4 gathers genotypes IX and X. Interestingly, the NYA/1/2 isolate ascribed to genotype XIV is the only isolate that does not segregate within one of the four lineages. However, the bootstrap value of its branch is <70%, which makes it difficult to draw any conclusion about this isolate. Further clustering of the isolates within these four lineages becomes difficult because of the presence of long branches and multifurcation for some isolate groups or sub-lineages. The TCS network analysis showed that conventional phylogenetic reconstruction based on bifurcations may fail to explain the complex relationships between some isolates (Figure 4). The TCS network confirms the existence of the four lineages that include the same isolates as in bifurcated reconstructions. However, the TCS network seems to better explain the relationships of isolates within a given genotype (e.g. genotype I or X) or between distinct genotypes (e.g. between genotypes III, IV, XIX, XX, and XXI, or between genotypes IX and X). In these cases the pattern of isolate relationships is not strictly bifurcative. Three paths exist between genotype XIX and genotype XX: through genotypes III or IV and/or XXI, which represent internal nodes of the tree, and two paths exist between genotypes IX and X. Within genotype X, several isolates are internal nodes of the tree, meaning that an isolate can have more than one ancestor, which is inconsistent with bifurcative relationships between isolates.
In an attempt to refine the clustering of ASFV isolates, the multiple sequence alignments containing 67 unique B646L sequences were searched for specific molecular signatures (Figure 5). Lineage 1 is characterized by 2 nt, and L2, L3, and L4 by 4, 6, and 12 nt, respectively. Genotype XIV, which is not included in one of the four lineages, is characterized by 8 nt (G88, G93, G162, T214, C240, T258, T333, and T348). However, this is the only virus generating this branch, which in addition is not supported by a high bootstrap value (<70%). Therefore, it cannot yet be considered as a fifth lineage. Lineages can be subsequently sub-divided into sub-lineages: 4 for lineage 1, 3 for lineage 2, 7 for lineage 3, and 2 for lineage 4 (Figure 5). Further sub-divisions can be drawn from the molecular signatures (Figure 5) and all are supported by the evolutionary distance matrix, except for some sub-lineages within L1-1, L1-2, L1-3, L2-2, L2-3, and L4-2-2 (Table 2). The average evolutionary distances inside and between all sub-lineages were 0.0023 and 0.055, respectively. L4 is the most complex lineage, composed of isolates from countries of the Great Lakes Region in Africa (Tanzania, Uganda, Burundi, and Kenya) and divided into several sub-lineages. Sub-lineage L4-2 (including isolates belonging to former genotype X) is the most diverse, with isolates clustering into seven sub-lineages (from L4-2-1 to L4-2-2-2-3). This new clustering into lineages almost perfectly overlays the previous genotype discrimination, with the exception of the Cro3.5 isolate, which forms a new cluster within L2 (sub-lineage L2-3-4), and the TAN/08/MAZIMBU isolate, which splits from genotype XV to form a new sub-lineage of L3 (L3-7).
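A molecular signature of the kind listed above (for example G88, a base followed by its position) can be understood as an alignment position where all members of a lineage share one nucleotide that no other lineage carries. The sketch below illustrates one possible way to search for such positions; it is an assumption about the criterion, not the authors' procedure, and the function name and toy lineages are hypothetical.

```python
def lineage_signatures(seqs_by_lineage):
    """Map each lineage to {1-based position: base} for bases that are
    fixed within the lineage and absent from every other lineage."""
    length = len(next(iter(seqs_by_lineage.values()))[0])
    signatures = {}
    for name, seqs in seqs_by_lineage.items():
        others = [s for other, group in seqs_by_lineage.items()
                  if other != name for s in group]
        sig = {}
        for i in range(length):
            inside = {s[i] for s in seqs}      # bases seen in this lineage
            outside = {s[i] for s in others}   # bases seen elsewhere
            if len(inside) == 1 and not (inside & outside):
                sig[i + 1] = next(iter(inside))
        signatures[name] = sig
    return signatures

# Hypothetical toy alignment grouped by lineage
toy = {
    "L1": ["ACGT", "ACGT"],
    "L2": ["ACTT", "ACTA"],
    "L3": ["GCTT"],
}
result = lineage_signatures(toy)
print(result)
```

In this toy case L1 is marked by G at position 3 and L3 by G at position 1, while L2 has no position that is both fixed and exclusive.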
The tree was constructed under the HKY85+Γ5 evolutionary model with 1,000 bootstrap resampling. Numbers indicate the statistical value (Expected-Likelihood Weight) of internal nodes, given in percentages (only numbers higher than 70 are indicated). Lineages were collapsed for improved tree visibility. The tree shows four main lineages (L1 to L4).
The network shows the same four main lineages that were observed in the bifurcative phylogenetic tree constructed in maximum likelihood under the HKY+Γ5 model, but clearly demonstrates that relationships between some ASFV isolates are too complex to be resolved by only bifurcations.
Corresponding genotypes are indicated in the right column. Non synonymous substitutions are labeled with “*”. NA: non assigned.
The trees generated with CP204L and E183L genes (data not shown) confirmed the existence of four lineages including the same genotypes. However, the E183L gene tree shows some differences in the clustering: SPEC/205 belonged to L1.1.1 lineage with B646L while it moves to L126.96.36.199 (genotype XI) with E183L. NYA/1/2, the sole member of former genotype XIV in B646L classification and which segregated between lineages L1, L2, and L3, is placed within lineage L3 in E183L classification. Whether these modifications may be ascribed to inter-gene recombination events remains unclear.
Molecular Dating Leads to a Most Recent Common Ancestor of about 300 Years
The E183L gene was removed from the molecular dating analyses because several recombination events and a non-synonymous bias were detected in the gene alignment, both attributed to strong positive selection exerted by the immune system on this gene. The strict molecular clock hypothesis, meaning an equal substitution rate along every branch of the tree, was rejected for the other two genes by the maximum likelihood analysis performed by Baseml in the PAML software suite. In PAML, the branches were individually relaxed in the tree submitted to the analysis. Several trees with different numbers of relaxed branches were tested. The resulting TMRCAs for B646L and CP204L genes were highly variable: from 1597 BC to 700 AD or even an undetermined date (because of a tree likelihood value of zero at the beginning of the analysis). This high level of heterogeneity in the TMRCA using the maximum likelihood method led us to select the Bayesian approach in the BEAST package. The Bayesian MCMC inference of the two data sets performed with the BEAST package showed a satisfactory convergence in the posterior statistic estimates of the substitution rate. Preliminary analyses were used to set the initial value of µ, the parameter of substitution/site/year (data not shown). Accordingly, the prior distribution of this parameter was set from 0.1×µ to 1×µ. Thus, calibrations of molecular clocks were set at 5.3×10⁻³ substitution/site/year [5.3×10⁻⁴–1.4×10⁻¹] for the B646L gene and 5.36×10⁻³ [5.36×10⁻⁴–1.99×10⁻¹] for the CP204L gene. With these priors, the mean estimates of substitution rates for each gene were finally calculated by BEAST and ranged from 6.6×10⁻⁴ (CP204L) to 6.9×10⁻⁴ (B646L) subst/site/year (Table 3). These results are robust in terms of clock model, rate distribution, and population size parameters. The dated trees generated four lineages as previously described (Figure 6) and, again, the same isolates were found within these lineages.
The four lineages were organized differently for the two genes: for the B646L gene, L1 and L2 grouped together on the one hand and L3 and L4 on the other hand, whereas the CP204L gene tree rendered different connections, with L1, L2, and L4 together on the one hand and L3 on the other hand. In both cases, the oldest lineage (TMRCA = 111 years) was L4, which gathers isolates from eastern Africa, the presumed birthplace of ASFV. It was followed by L1 (104 years), L2 (74 years), and L3 (47 years).
Lineages and corresponding genotypes are indicated. The tree was constructed with BEAST software and the MCMC was run for 10⁸ steps. The TMRCA for all isolates is 1712. TMRCAs of lineage L1-1 (genotype I) and lineage L1-2 (genotype II) are 1943 and 1990, respectively.
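To give a sense of scale for the substitution rates reported above, the following back-of-the-envelope sketch converts a rate in subst/site/year into an expected amount of change along a single lineage, and converts a pairwise distance back into a divergence time. It assumes a strict clock and ignores multiple substitutions at the same site, so it is only a rough illustration and not the relaxed-clock inference performed by BEAST; the function names and the 393-nt length (399 nt minus the 6 nt removed from B646L) are illustrative assumptions.

```python
def expected_substitutions(rate_per_site_per_year, years, n_sites):
    """Expected substitutions per site and in total along one lineage."""
    per_site = rate_per_site_per_year * years
    return per_site, per_site * n_sites

def divergence_time(pairwise_distance, rate_per_site_per_year):
    """Years since two sequences shared an ancestor under a strict clock.

    The distance accumulates along both lineages, hence the factor 2.
    """
    return pairwise_distance / (2.0 * rate_per_site_per_year)

# Rate reported for B646L, applied over roughly three centuries
per_site, total = expected_substitutions(6.9e-4, 300, 393)
print(round(per_site, 3), round(total, 1))

# Hypothetical pairwise distance of 0.0138 subst/site
print(round(divergence_time(0.0138, 6.9e-4), 1))
```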
Because the localization of the major capsid protein VP72 in the virus core prevents exposure to circulating neutralizing antibodies, the corresponding B646L gene is not expected to be subjected to immune system pressure. Accordingly, only two amino-acid positions were detected as being under positive selection, suggesting no real impact on the evolutionary force. Therefore, the rate of substitutions of VP72 probably bears the information needed to estimate natural virus evolution. The VP72 homologues of closely related virus families have already been used in evolutionary studies and for a decade in ASFV phylogenetic reconstructions. In contrast, P54 is an envelope protein and the pressure of the immune system on E183L evolution is revealed by nine amino-acid positions placed under positive selection, a strong non-synonymous bias, and recombination events within the gene sequence. P32 is also an envelope protein but is involved in translation of viral genes through its interactions with the hnRNP cellular protein. In this context, mutations may be detrimental and, thus, the gene may be subjected to purifying selection, as corroborated by the detection of only three amino-acid positions under positive selection.
The evolution of ASFV mapped through partial genomic sequences and phylogenetic reconstructions shows a certain degree of complexity that may not be well represented by bifurcative methods. However, both bifurcative and network analyses in this study provided a clear clustering into four major lineages (L1 to L4), while only three have been described so far. Within these lineages, molecular signatures of the twenty-two already described genotypes were established and two new sub-lineages can be proposed, that is, the Cro3.5 isolate and TAN/08/MAZIMBU, previously ascribed to genotype XV. Molecular signatures do not rely on the same number of substitutions and do not have an equal weight. For instance, L1 is characterized by 2 specific nucleic acid positions and 4 synonymous substitutions while L4 is defined by 12 sites and 13 nucleotide substitutions, of which 3 are non-synonymous. Within the L1 lineage, genotype I, which is the most represented in terms of sequences (Europe, West Africa, Caribbean and South America), is characterized by only one synonymous substitution (A216). This mutation, however, leads to an increase in ASFV codon preference for alanine (GCG to GCA) (http://www.kazusa.or.jp/codon/), which may have helped to fix the substitution in the lineage for almost 60 years in three continents. Besides the molecular signature, the distance matrix also supports our proposal for a new ASFV classification, which includes the previous genotype subdivision and additional sub-clustering.
ASFV shows a high evolutionary rate relative to that of other DNA viruses. Consequently, this high substitution rate led to very recent TMRCAs: the most recent common ancestor of the ASFV strains currently circulating emerged around three centuries ago, in about 1700. It is commonly agreed that ASFV is native to East Africa, as the disease was first described in Kenya in 1921 after a first outbreak in 1903. Over the following decades, ASFV showed a great ability to spread worldwide following major trade routes. In the wild, the virus is thought to be originally a tick virus, as it infects argasid ticks of the Ornithodoros genus. Ornithodoros, which infest warthogs’ burrows, are endophilic ticks, meaning that they need stable temperature and humidity. They are also photophobic, so they do not spread out over long distances. ASFV is transmitted horizontally and vertically between ticks and between ticks and juvenile wild swine that stay in and close to their burrows. Under such circumstances, the virus is not expected to spread much and its genetic drift over long periods may have resulted in isolated spots of diversity maintained by the sylvatic cycle with only few entries of new strains. In contrast, the domestic pig cycle is short, with a dead-end disease essentially transmitted by contact with pigs or pig meat and rarely by tick bites. Accordingly, the phylogenetic trees constructed in this study showed higher diversity within lineages of eastern and southern African isolates maintained in a sylvatic cycle than in lineages of domestic pigs from other regions. New variants are not easy to characterize because of the lack of sequence data from their parent lineage. For example, the TAN/08/Mazimbu isolate collected in Tanzania in 2008 and originally placed in genotype XV constitutes in this study a sub-lineage of L3. Thus, it should not be considered as a re-emergence of the TAN/01/1 isolate collected during an outbreak in Tanzania in 2001.
Sixteen Italian isolates showed recombination events in the E183L gene and were subsequently removed from the corresponding reconstruction. This does not change the affiliation of these isolates to L1, as demonstrated by B646L and CP204L reconstructions (not shown). However, since all these isolates are linked together and show the same recombination events, assuming they all have emerged from a common recombined ancestor, the possibility that they will form a new sub-lineage within L1 has to be considered.
Two different genes and two methods were used to consolidate the TMRCA estimation. The maximum likelihood method, run with the PAML package, showed that a strict molecular clock could not be validated for our set of genes. Under a relaxed clock, however, it did not provide consistent results either, with TMRCAs ranging from −12000 to 1500. In contrast, the Bayesian approach generated consistent results: the B646L and CP204L analyses both dated the TMRCA to around 1700 AD, with an estimated rate of around 6.7 × 10^−4 subst/site/year. As illustrated by the E183L gene analysis, the pressure of the immune system on sequence variability may influence the evolution of some ASFV genes and may consequently bias the TMRCA (data not shown for the E183L gene in this paper). Therefore, the natural evolution of the virus may be well represented by the B646L and CP204L genes, in which neither recombination events, nor non-synonymous bias, nor too many codons under positive selection were detected. The TMRCA scale going back to 1700 AD for all ASFV isolates can be considered with confidence since, within this scale, the TMRCAs of lineages L1-1 and L1-2 were 1943/1955 (for the B646L/CP204L genes) and 1990 (for both genes), respectively. L1-1 is thought to have emerged in the late 1950s, and L1-2 includes mainly isolates from Madagascar, where the virus was first introduced in 1998. The substitution rates determined in this study were much higher than expected relative to other large dsDNA viruses such as the gamma-herpesviruses of vertebrates (10^−9 subs/site/year), or even small dsDNA viruses such as the John Cunningham polyomavirus (10^−7 subs/site/year). With a substitution rate between 10^−4 and 10^−5 subs/site/year, ASFV approaches RNA viruses, which usually show 10^−2 to 10^−5 subs/site/year.
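As a rough illustration of the orders of magnitude quoted above, the following sketch multiplies each substitution rate by an elapsed time to obtain the expected number of substitutions per site accumulated along a single lineage. It assumes a constant rate (a strict-clock simplification that the analyses above did not support for these genes) and ignores saturation. The function name and the 300-year window are illustrative choices, not part of the study's pipeline.

```python
# Illustrative only: expected divergence = rate x time under a constant rate.
# Rates are the orders of magnitude quoted in the text; the 300-year window
# is an assumption approximating the span from the ~1700 TMRCA to the present.

def expected_subs_per_site(rate_per_site_per_year: float, years: float) -> float:
    """Expected substitutions per site along one lineage (no saturation correction)."""
    return rate_per_site_per_year * years

RATES = {
    "ASFV (B646L/CP204L, Bayesian estimate)": 6.7e-4,
    "John Cunningham polyomavirus": 1e-7,
    "gamma-herpesviruses": 1e-9,
}

YEARS = 300  # hypothetical window chosen for illustration

for name, rate in RATES.items():
    d = expected_subs_per_site(rate, YEARS)
    print(f"{name}: {d:.1e} subs/site over {YEARS} years")
```

Under these simplifying assumptions, the ASFV rate corresponds to roughly 0.2 substitutions per site over three centuries, several orders of magnitude more than the other dsDNA viruses listed would accumulate over the same period.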
Like many other large dsDNA viruses, ASFV may have co-evolved with its host. This would imply a long and ancient history of the virus in the wild. Yet a high substitution rate combined with recent TMRCAs is not consistent with ancient co-evolution of viruses and their hosts, which should instead lead to a low rate of substitution. However, for a virus that replicates at a high level in its host, a low rate of subst/site/replication can still lead to an increased accumulation of diversity, which in turn generates high rates of subst/site/year. This has been described for highly contagious viruses that induce acute forms of infection and show a higher observed rate of subst/site/year. In contrast, an asymptomatic infection of the host may not allow an exponential replication rate. ASFV presents both characteristics, being asymptomatic in native African wild swine and soft ticks, and highly contagious and lethal in domestic pigs. Consequently, a stochastic event may have occurred around 300 years ago that would explain the emergence of an ancestor common to all ASFV isolated so far in domestic and wild pigs. Our assumption is based on the introduction of domestic pigs into Africa. Domestic pigs originate from Eurasian and North African ancestral wild boar. Although Plug (2001) claimed that pigs were introduced into South Africa between the 3rd and 7th centuries, Swart (2010) believes domestic pigs were not present among eastern and southern African livestock because of the nomadic lifestyle of pastoralists at that time. Domestic pigs may have been brought first by the Chinese around 600 years ago, then by the Portuguese 300 to 400 years ago, in both cases during periods of exploration and conquest in search of trade opportunities. The assumption of pig introduction from Europe and the Far East was confirmed by phylogenetic analysis revealing contributions of both origins to the genetic pattern of local African pigs.
Following the circumnavigation of Africa by European nations during the 15th to 17th centuries, pig breed types were introduced during the 16th and 17th centuries, mainly by the Portuguese to the East African coast via Goa. Pig breeding then diffused slowly northward from Mozambique. The Portuguese did not colonize Kenya for settlement but as a staging post on the route to India, and they definitively left the country in 1720 after being defeated by the Arabs in 1698. Despite Arab colonization and the pig-eating taboo, domestic pigs have been eaten since the 16th century by ethnic groups such as the Waata in southern Kenya, who were called Walyankuru: “those who eat pig”. This may have enabled the virus to spread silently among susceptible pig species. Kenya was then colonized by the British. At the end of the 19th century, an extensive pig industry started in the native region of ASFV after a massive loss of cattle due to a rinderpest outbreak. Pigs were massively imported for breeding by colonizers, from the Seychelles in 1904 and from England in 1905. Pig farming was free-ranging at that time, and the first outbreak of ASF was reported in 1907. Trade routes and the resistance of the virus in the environment then enabled further spread of ASFV.
List of ASFV and NCLDV isolates and corresponding genes used in this study.
The authors wish to acknowledge la Direction de la Santé Animale et du Phytosanitaire du Ministère de l’Agriculture, de l’Elevage et de la Pêche of Madagascar for permission to use Madagascar isolates in this study. François Roger and Renaud Lancelot are warmly thanked for their help in Madagascar, and Gaël Thébaud for scientific advice.
Conceived and designed the experiments: VM EA. Performed the experiments: VM. Analyzed the data: VM EA. Contributed reagents/materials/analysis tools: TR. Wrote the paper: VM EA.
- 1. Penrith ML, Vosloo W (2009) Review of African swine fever: transmission, spread and control. J S Afr Vet Assoc 80: 58–62.
- 2. Costard S, Wieland B, de Glanville W, Jori F, Rowlands R, et al. (2009) African swine fever: how can global spread be prevented? Philos Trans R Soc Lond B Biol Sci 364: 2683–2696.
- 3. Montgomery R (1921) On a form of swine fever occurring in British East Africa (Kenya colony). J Comp Pathol 34: 159, 191, 243–262.
- 4. Penrith ML, Thomson GR, Bastos ADS (2004) African swine fever. In: Coetzer JAW, Tustin RC, editors. Infectious diseases of livestock, vol 2. Oxford, UK: Oxford University Press: 1088–1119.
- 5. Rowlands RJ, Michaud V, Heath L, Hutchings G, Oura C, et al. (2008) African swine fever virus isolate, Georgia, 2007. Emerg Infect Dis 14: 1870–1874.
- 6. Gulenkin VM, Korennoy FI, Karaulov AK, Dudnikov SA (2011) Cartographical analysis of African swine fever outbreaks in the territory of the Russian Federation and computer modeling of the basic reproduction ratio. Prev Vet Med 102: 167–174.
- 7. Dixon LK, Escribano JM, Martins C, Rock DL, Salas ML, et al. (2005) Asfarviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA, editors. Virus Taxonomy, VIIIth report of the ICTV. London (UK): Elsevier/Academic Press: 135–143.
- 8. Ogata H, Toyoda K, Tomaru Y, Nakayama N, Shirai Y, et al. (2009) Remarkable sequence similarity between the dinoflagellate-infecting marine girus and the terrestrial pathogen African swine fever virus. Virol J 6: 178.
- 9. Iyer LM, Balaji S, Koonin EV, Aravind L (2006) Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res 117: 156–184.
- 10. Iyer LM, Aravind L, Koonin EV (2001) Common origin of four diverse families of large eukaryotic DNA viruses. J Virol 75: 11720–11734.
- 11. Plowright W (1977) Vector transmission of African swine fever virus. In: Seminar on Hog cholera, classical swine fever and African swine fever, 575–587. Eur 5904EN, commission of the European communities.
- 12. Blasco R, Aguero M, Almendral JM, Vinuela E (1989a) Variable and constant regions in African swine fever virus DNA. Virology 168: 330–338.
- 13. Wesley RD, Tuthill AE (1984) Genome relatedness among African swine fever virus field isolates by restriction endonuclease analysis. Prev Vet Med 2: 53–62.
- 14. Blasco R, de la Vega I, Almazan F, Aguero M, Vinuela E (1989b) Genetic variation of African swine fever virus: variable regions near the ends of the viral DNA. Virology 173: 251–257.
- 15. Sun H, Jacobs SC, Smith GL, Dixon LK, Parkhouse RM (1995) African swine fever virus gene j13L encodes a 25–27 kDa virion protein with variable numbers of amino acid repeats. J Gen Virol 76 (Pt 5): 1117–1127.
- 16. Bastos AD, Penrith ML, Cruciere C, Edrich JL, Hutchings G, et al. (2003) Genotyping field strains of African swine fever virus by partial p72 gene characterisation. Arch Virol 148: 693–706.
- 17. Lubisi BA, Bastos AD, Dwarka RM, Vosloo W (2005) Molecular epidemiology of African swine fever in East Africa. Arch Virol 150(12): 2439–52.
- 18. Boshoff CI, Bastos AD, Gerber LJ, Vosloo W (2007) Genetic characterisation of African swine fever viruses from outbreaks in southern Africa (1973–1999). Vet Microbiol 121: 45–55.
- 19. Dixon LK, Wilkinson PJ (1988) Genetic diversity of African swine fever virus isolates from soft ticks (Ornithodoros moubata) inhabiting warthog burrows in Zambia. J Gen Virol 69 (Pt 12): 2981–2993.
- 20. Gallardo C, Mwaengo DM, Macharia JM, Arias M, Taracha EA, et al. (2009) Enhanced discrimination of African swine fever virus isolates through nucleotide sequencing of the p54, p72, and pB602L (CVR) genes. Virus Genes 38: 85–95.
- 21. Nix RJ, Gallardo C, Hutchings G, Blanco E, Dixon LK (2006) Molecular epidemiology of African swine fever virus studied by analysis of four variable genome regions. Arch Virol 151: 2475–2494.
- 22. Neilan JG, Zsak L, Lu Z, Burrage TG, Kutish GF, et al. (2004) Neutralizing antibodies to African swine fever virus proteins p30, p54, and p72 are not sufficient for antibody-mediated protection. Virology 319: 337–342.
- 23. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol 28: 2731–2739.
- 24. Xia X, Xie Z (2001) DAMBE: software package for data analysis in molecular biology and evolution. J Hered 92: 371–373.
- 25. Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451–1452.
- 26. Heath L, van der Walt E, Varsani A, Martin DP (2006) Recombination patterns in aphthoviruses mirror those found in other picornaviruses. J Virol 80: 11827–11832.
- 27. Martin D, Rybicki E (2000) RDP: detection of recombination amongst aligned sequences. Bioinformatics 16: 562–563.
- 28. Padidam M, Sawyer S, Fauquet CM (1999) Possible emergence of new geminiviruses by frequent recombination. Virology 265: 218–225.
- 29. Smith JM (1992) Analyzing the mosaic structure of genes. J Mol Evol 34: 126–129.
- 30. Martin DP, Posada D, Crandall KA, Williamson C (2005) A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res Hum Retroviruses 21: 98–102.
- 31. Gibbs MJ, Armstrong JS, Gibbs AJ (2000) Sister-scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics 16: 573–582.
- 32. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17: 368–376.
- 33. Jobb G, von Haeseler A, Strimmer K (2004) TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 4: 18.
- 34. Akaike H (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control 19: 716–723.
- 35. Sugiura N (1978) Further analysis of the data by Akaike's information criterion and the finite corrections. Communication in Statist Theor Meth 7: 13–26.
- 36. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6: 461–464.
- 37. Hasegawa M, Kishino H, Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22: 160–174.
- 38. Yang Z (1994) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol 39: 306–314.
- 39. Rodriguez F, Oliver JL, Marin A, Medina JR (1990) The general stochastic model of nucleotide substitution. J Theor Biol 142: 485–501.
- 40. Strimmer K, Rambaut A (2002) Inferring confidence sets of possibly misspecified gene trees. Proc Biol Sci 269: 137–142.
- 41. Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754–755.
- 42. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.
- 43. Delaroque N, Wolf S, Muller DG, Knippers R (2000) Characterization and immunolocalization of major structural proteins in the brown algal virus EsV-1. Virology 269 (1): 148–155.
- 44. Garcel A, Crance JM, Drillien R, Garin D, Favier AL (2007) Genomic sequence of a clonal isolate of the vaccinia virus Lister strain comparison to other orthopoxviruses. J Gen Virol 88 (PT 7): 1906–1916.
- 45. Schnitzler P, Handermann M, Szepe O, Darai G (1991) The primary structure of the thymidine kinase gene of fish lymphocystis disease virus. Virology 182 (2): 835–840.
- 46. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24: 1586–1591.
- 47. Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7: 214.
- 48. Zuckerkandl E, Pauling L (1965) Molecules as documents of evolutionary history. J Theor Biol 8: 357–366.
- 49. Rambaut A (2003) Tracer [computer program]. http://evolve.zoo.ox.ac.uk/software/.
- 50. Giammarioli M, Gallardo C, Oggiano A, Iscaro C, Nieto R, et al. (2011) Genetic characterisation of African swine fever viruses from recent and historical outbreaks in Sardinia (1978–2009). Virus Genes 42(3): 377–87.
- 51. Zsak L, Borca MV, Risatti GR, Zsak A, French RA, et al. (2005) Preclinical diagnosis of African swine fever in contact-exposed swine by a real-time PCR assay. J Clin Microbiol 43 (1): 112–119.
- 52. Misinzo G, Magambo J, Masambu J, Yongolo MG, Van Doorsselaere J, et al. (2010) Genetic characterization of African swine fever viruses from a 2008 outbreak in Tanzania. Transbound Emerg Dis 58: 86–92.
- 53. Tidona CA, Schnitzler P, Kehm R, Darai G (1998) Is the major capsid protein of iridoviruses a suitable target for the study of viral evolution? Virus Genes 16: 59–66.
- 54. Hernaez B, Escribano JM, Alonso C (2008) African swine fever virus protein p30 interaction with heterogeneous nuclear ribonucleoprotein K (hnRNP-K) during infection. FEBS Lett 582: 3275–3280.
- 55. Duffy S, Shackelton LA, Holmes EC (2008) Rates of evolutionary change in viruses: patterns and determinants. Nat Rev Genet 9: 267–276.
- 56. Hess WR, Endris RG, Lousa A, Caiado JM (1989) Clearance of African swine fever virus from infected tick (Acari) colonies. J Med Entomol 26: 314–317.
- 57. Plowright W, Perry CT, Greig A (1974) Sexual transmission of African swine fever virus in the tick, Ornithodoros moubata porcinus, Walton. Res Vet Sci 17: 106–113.
- 58. Gonzague M, Roger F, Bastos A, Burger C, Randriamparany T, et al. (2001) Isolation of a non-haemadsorbing, non-cytopathic strain of African swine fever virus in Madagascar. Epidemiol Infect 126: 453–459.
- 59. Hanada K, Suzuki Y, Gojobori T (2004) A large variation in the rates of synonymous substitution for RNA viruses and its relationship to a diversity of viral infection and transmission modes. Mol Biol Evol 21: 1074–1080.
- 60. Holmes EC (2004) The phylogeography of human viruses. Mol Ecol 13: 745–756.
- 61. Holmes EC, Drummond AJ (2007) The evolutionary genetics of viral emergence. Curr Top Microbiol Immunol 315: 51–66.
- 62. Hughes AL, Irausquin S, Friedman R (2009) The evolutionary biology of poxviruses. Infect Genet Evol 10: 50–59.
- 63. Firth C, Kitchen A, Shapiro B, Suchard MA, Holmes EC, et al. (2010) Using time-structured data to estimate evolutionary rates of double-stranded DNA viruses. Mol Biol Evol 27: 2038–2051.
- 64. Gifford-Gonzalez D, Hanotte O (2011) Domesticating Animals in Africa: Implications of Genetic and Archaeological Findings. J World Prehist 24: 1–23.
- 65. Plug I, Badenhorst S (2001) The distribution of mammals in Southern Africa over the past 30,000 years. Transvaal Museum Monograph 13, South Africa.
- 66. Swart T, Kotze A, Olivier PAS, Grobler JP (2010) Microsatellite-based characterization of Southern African domestic pigs (Sus scrofa domestica). South African Journal of Animal Science 40: 121–132.
- 67. Levathes LE (1994) When China Ruled the Seas: The Treasure Fleet of the Dragon Throne, 1405–1433. New York: Oxford University Press.
- 68. Blench RM (1999) A history of pigs in Africa. In: Blench RM, MacDonald K, editors. Origins and development of African livestock: archaeology, genetics, linguistics and ethnography. Florence, KY: Routledge Books: 335–367.
- 69. Ramirez O, Ojeda A, Tomàs A, Gallardo D, et al. (2009) Integrating Y-chromosome, mitochondrial, and autosomal data to analyze the origin of pig breeds. Mol Biol Evol 26: 2061–2072.
- 70. Kusimba CM, Kusimba SB (2000) Hinterlands and cities: Archaeological investigations of economy and trade in Tsavo, south-eastern Kenya. Department of Anthropology, The Field Museum, Chicago, Illinois, 606005 54: 13–24.