Time-Calibrated Phylogenomics of the Classical Swine Fever Viruses: Genome-Wide Bayesian Coalescent Approach

The phylogeny of classical swine fever virus (CSFV), the causative agent of classical swine fever (CSF), has been investigated extensively. However, no evolutionary research has been performed using the whole CSFV genome. In this study, we used 37 published genome sequences to investigate the time-calibrated phylogenomics of CSFV. In phylogenomic trees based on Bayesian inference (BI) and Maximum likelihood (ML), the 37 isolates were categorized into five genetic types (1.1, 1.2, 2.1, 2.3, and 3.4). Subgenotype 1.1 is divided into 3 groups and 1 unclassified isolate, 2.1 into 4 groups, 2.3 into 2 groups and 1 unclassified isolate, and subgenotype 1.2 and 3.4 consisted of one isolate each. We did not observe an apparent temporal or geographical relationship between isolates. Of the 14 genomic regions, NS4B showed the most powerful phylogenetic signal. Results of this evolutionary study using Bayesian coalescent approach indicate that CSFV has evolved at a rate of 13×.010-4 substitutions per site per year. The most recent common ancestor of CSFV appeared 2770.2 years ago, which was about 8000 years after pig domestication. The effective population size of CSFV underwent a slow increase until the 1950s, after which it has remained constant.


Introduction
Classical swine fever (CSF), also known as hog cholera is a highly contagious viral disease of domestic pig and wild boar that causes watery diarrhea and weakness. The high mortality rate of CSF leads to significant economic losses in the global swine industry [1]. From the 1990s to the early 2000s, sporadic outbreaks of CSF in European swine industries were reported [2][3][4]; for instance, 117, 48, and 429 farms were contaminated in Germany, Belgium, and the Netherlands respectively. In the case of the Netherlands, economic losses were calculated to be almost 2.3 billion US dollars [5,6]. Many countries are currently considered to be CSF free or low risk by the United States Department of Agriculture (USDA), including Australia, Canada, and several European countries. Given the potential for devastating economic impact of a CSF outbreak on the swine industry, circumspect governmental regulations are in place to prevent CSF exposure in large, dense populations of pigs. Nevertheless, CSF outbreaks in small populations of wild boar are still observed in several regions like India [7].
As populations in the swine industry become denser and larger, more government attention to CSF is needed. Additionally, countries that are not CSF free, such as India or China, account for a large portion of the global swine industry [8], which means that global attention to the CSF is necessary as well.
Classical swine fever virus (CSFV), the causative agent of CSF, is a member of the genus Pestivirus within the family Flaviviridae [1,9]. In viral epidemiology, mortality rate is defined as the number of deaths over the number of individuals infected during a specific period. [10]. While the mortality rate of CSF is high, the severity differs by case, especially depending on the age of the animal and virulence of the virus. While symptoms in older animal tend to be milder, the mortality rate of young pigs is almost 90 percent [1,9]. CSF can often be confused with American swine fever (ASF), which produces clinical symptoms similar in pigs to that of CSF, including haemorrhagic fever. However, despite these similarities, CSF virus is a small singlestranded RNA virus, while ASF virus is a large double-stranded DNA virus of genus Asfivirus in the family Asfarviridae [11]. While it is difficult to diagnose these viruses from clinical signs and lesions alone, confirmatory experiments like RT-PCR is necessary [12]. CSFV is also structurally close to the bovine viral diarrhea (BVD) virus and the border disease (BD) virus, which also belong to the genus Pestivirus [9]. Although these viruses are capable of infecting pigs, they do not have the ability to spread out without their original hosts and can easily be genetically differentiated from CSFV [13].
To fully comprehend the phylogeny and evolutionary dynamics of CSFV, as mentioned in previous phylogenomic studies [35], it is necessary to conduct a genomic scale analysis. Here, we performed genome wide study on the time-calibrated phylogenomics of the virus. We analyzed 37 available public genome sequences that sampled during a time period of 68 years (1945-2012). The objectives of this study are (1) to analyze the characteristics of CSFV genome sequences; (2) to reconstruct the genome wide phylogeny of the global CSFVs using two different analyses methods, Bayesian inference (BI) and maximum likelihood (ML); and (3) to elucidate the evolutionary mechanisms such as selection pressure, substitution rate, divergence times, and effective population size changes using Bayesian coalescent approach.
To evaluate selective pressure acting on CSFVs, the relative rates of nonsynonymous and synonymous substitution (ω = dN/dS) across coding region of the viral genome were also estimated using ClustalX 1.81 [40] PAL2NAL [41], and CodeML of PAML 4.7 package [42]. A dN/dS ratio of < 1 indicated purifying selection, dN/dS = 1 suggested an absence of selection (i.e., neutral evolution), and dN/dS > 1 indicated positive selection.

Phylogenomic tree reconstruction
Phylogenomic trees were reconstructed using two different analytical methods-BI and ML. The best-fit model for the CSFV whole genome was determined using Akaike's information criterion (AIC) within Modeltest 3.7 [39]. BI analysis using MrBayes 3.1.2 [43] was performed with following options: nst = 6, rates = invgamma, number of generation = 20,000,000, and burn-in = 20,000. Bayesian posterior probability (BPP) values were shown on the BI tree [44]. ML analysis using Phyml 3.0 [45] was also conducted with the following parameters: model of nucleotide substitution = GTR, replicates = 500, pinvar = estimated, and number of substitution rate categories = 6. All trees were visualized in Figtree 1.4 [46].
In order to screen for congruent tree topologies with genome tree topologies, all 14 genomic regions were analyzed in the same methods as genome data. The best-fit models for each region used in analysis are summarized in Table 2.
Estimation of substitution rate, divergence times, and population size changes Using BEAST 1.7.4 [47], the mean rate of nucleotide substitution, time of the most recent common ancestor (tMRCA), and change in effective population size of the CSFV were estimated with the result of the model test. We combined three molecular clock models (strict, relaxed uncorrelated exponential, and relaxed uncorrelated log-normal) with five demographic models (constant size, exponential growth, expansion growth, logistic growth, and Bayesian skyline) to make 15 datasets for simulation. Next, each dataset was simulated with the following options: generation = 400,000,000, burn-in of 10%, and ESSs > 100. By comparing the highest Bayesian factors (log 10 Bayesian factor > 2) which were based on the relative marginal likelihood of 15 models, the relaxed uncorrelated exponential clock and expansion growth population model was selected as the best-fit evolutionary model. The resulting convergence was analyzed using Tracer 1.5 [48]. Trees were summarized as maximum clade credibility (MCC) tree using the TreeAnnotator 1.7.4 [49] and visualized using Figtree 1.4 [46]. For each tree node, estimated divergence times and 95% highest posterior density (HPD) intervals, which summarize the statistical uncertainties, were indicated. The change of effective population size was plotted using Bayesian skyline plot (BSP) analyses [50].

Sequence and selection pressure analyses
Features of the entire genome and 14 individual regions of the 37 CSFVs are summarized in Table 3. The whole genome alignment (including insertions) was 12,301 bps in length, and revealed relatively low similarities; 7,645 (62.1%) of the nucleotide sites were conserved. The differences in nucleotide and amino acid sequence at each site of the CSFV genome alignment are shown in Fig. 1. Both nucleotide and amino acid variations were evenly distributed throughout the genomes, though higher amino acid similarities were observed in three regions (NS3, NS4A, and NS4B). Pairwise comparisons also revealed that the average identities among the The mean ratio of nonsynonymous/synonymous substitution (dN/dS) derived from entire data sets was calculated and indicated that purifying selection acted on the CSFV genome sequences (Table 3, Fig. 2). The dN/dS value of entire coding sequences was 0.067 and all values for each component gene were lower than 1. Particularly, the highest dN/dS ratio was observed in N pro (0.137), while the lowest one was shown in NS3 (0.020).
Phylogenomic tree reconstruction 37 CSFV isolates were classified into one of five subgenotypes (1.1, 1.2, 2.1, 2.3, and 3.4). BI and ML methods presented similar configurations regarding the phylogenomics of CSFV, and also supported the topology of the maximum clade credibility (MCC) tree (Fig. 3). The phylogenomic tree revealed two sister-group relationships, subgenotype 1.1 + 1. To identify the most significant phylogenetic marker of CSFV among the 14 regions, phylogenetic trees for the 14 regions were reconstructed based on both BI and ML methods, and these individual trees were compared to the whole genome trees. NS4B gene trees were the most similar to the genomic tree, thus, NS4B gene was chosen as the most significant phylogenetic marker for CSFV (Fig. 4). The overall tree topologies of E2 gene, which was used as the   Time-Calibrated Phylogenomics of CSFV marker of CSFV phylogeny, were very different from those of genome tree (S1 and S2 Figs.); most groupings were disrupted in the E2 trees.

Discussion
Our research aimed to characterize the CSFV genome and elucidate its evolutionary features using genomic data. Our results revealed a high degree of genetic diversity between the 37 CSFV genome sequences. This feature confirmed the views of Domingo et al. [51] who proposed that RNA viruses have the high level of mutation rates of 10 -3 to 10 -5 nucleotide substitutions per site per replication cycle due to their inaccurate RNA replication. Of the 14 genomic regions, our analyses showed that E2 was significantly (p-value < 2.2 × 10 -16 ) variable, while 5'UTR was the most conserved. The E2 protein, a glycoprotein formed on the virus membrane, has been suggested as a virulence determinant [52,53]. In CSFV, it acts as an immunogenic protein which induces the neutralizing antibodies and protection reactions of the host [54]. 5'UTR of CSFV contains an internal ribosome entry site (IRES) where translation initiation Time-Calibrated Phylogenomics of CSFV occurs, which is indispensable for the virus [55,56]. Domains in IRES tend to be highly conserved between several viruses in genus Pestivirus, including CSFV [55,57]. These two regions (E2 and 5'UTR) have been considered as the most important targets for CSFV studies such as PCR detection and immunology as well as evolutionary phylogenetics. Because the sequence dissimilarity may affect PCR diagnosis and vaccination efficiency as well as investigations on immunology, epidemiology, and phylogenetic evolution, it is necessary to continuously monitor the genome sequence variability.
In agreement with the interpretations of Ji et al. [33], our results on the basis of complete coding genome sequences were also indicative of purifying selection acting on the CSF viruses; the mean ratio of nonsynonymous/synonymous substitution (ω = dN/dS) values for each genomic region were low in all cases (all, ω < 1). Of those, the N pro presented the highest dN/dS values, while the NS3 indicated the lowest ones.
Next, the phylogeny of CSFV was studied to provide further information regarding epidemiology and evolution. Despite of the importance of CSFV to the pig industry and recent rapid increase of CSFV data, most phylogenetic studies describing the virus have focused only on limited genome region sequences such as E2 [29,30,58,59], 5'UTR [25,26,29,30], and NS5B [31,60]. However, expanding information of genome-wide phylogeny has potential to improve our knowledge on its emergence and transmission. Our phylogenomic analysis of 37 CSFV genome sequences revealed that no apparent correlation between time and/or country and the evolution of the virus. Within the subgenotype 1.1, the isolates appeared in samples taken in four European countries (Denmark, France, Germany, and Italy) and two Asian countries (China and Japan) during 1945-2009. Subgenotype 2.1 viruses originated from six countries (South Korea, China, Taiwan, Germany, and Lithuania) during 1996-2012. Moreover, isolates of subgenotype 2.3 were found in five countries (France, Germany, Spain, Croatia, and Bulgaria) during 1980-2009. These configurations were also in concordance with the views of other authors [23,35] and may be largely due to their rapid spread via the frequent international trade in livestock. The mixed population structure can make vaccine development and local regulation more difficult. Thus, it is essential to continuously monitor the structural changes of the mixed population.
In order to trace the most appropriate phylogenetic marker, we compared 14 individual gene trees with complete genome trees. Although E2, NS5B, and 5'UTR are generally considered suitable for elucidating the CSFV phylogeny, our findings postulated that topologies of NS4B trees were the most similar to those of the complete genome trees, rather than E2, NS5B, and 5'UTR. Thus, we suggest that NS4B has the strongest signal to infer the genetic relationships of these viruses. Among three frequently used markers, topologies of E2 tree was more apparent than those of the other two regions. Given that research has shown that the complete E2 sequence is suitable for phylogenetic analysis of CSFV [61], comparison of topologies between NS4B and E2 full-length sequences can support the discriminatory power of NS4B as a phylogenetic marker (S1 and S2 Figs.). Both of those regions are regarded as significant determinant in viral activity. That is, E2 is related to viral entry to the target cell [62], E2 has been known as a virulence determinant of CSFV, and as the most immunogenic factor among the viral protein [63,64]. The nonstructural protein translated from NS4B was reported to contain Toll/Interleukin-1 receptor (TIR) domain in Brescia strain which is considerably virulent, and the viral replication of Brescia strain was shown to be decreased by mutation in that domain [65]. Additionally, after artificial re-injection of the GPE-vaccine, amino acid sequence changes in NS4B contributed to the recovery of virulence of the virus [66]. Thus, NS4B was considered to be essential in viral replication and to have interaction with immune system, and mutations in NS4B significantly affected the virulence of CSFV [65].
Finally, we attempted to elucidate the evolutionary mechanisms and features of CSFV including estimation of evolutionary rate, divergence times, and population size changes. Regarding the evolutionary dynamics of CSFV, there was only one previous study [34]. They utilized only three genes (5'UTR, N pro , and E2) sequences of 35 pestiviruses including six CSFVs, and reported that the most recent common ancestor of CSFV existed 1825 years ago; no analysis of substitution rate and population size changes was performed. In contrast to this small number of genes analyzed from a limited selection of representative viruses, we analyzed the complete genome sequences with an available year of isolation in order to co-estimate an overall substitution rate, population size changes as well as divergence times for CSFVs.
The mean evolutionary rate estimated in the present study was 1.03×10 -4 (95% HPD 2.03×-2.61×10 -4 ) substitutions/site/year; that calculation was within the range of 10 -2 to 10 -5 nucleotide substitutions/site/year for nearly all RNA viruses [67]. However, this value was lower than the results from previous studies based on E2 sequences; the mean evolutionary rates were calculated as 3.3×10 -3 and 2.41×10 -3 substitutions/site/year in two previous studies [18,62]. Because E2 was relatively variable and less conserved compared to the whole genome, the mean rate of substitution of the whole genome could be lower than that of E2. The fast evolutionary rates of RNA viruses including CSFV are affected by a combination of forces such as lack of proof-reading, small genome size, short generation times, and the extent of natural selection [68]. As a result, it is possible that RNA virus raises viral population adaptation, survival, and fitness, allowing them to rapidly spread to new hosts and novel environments [69]. The tMRCA of CSFV was 2770.2 (95% HPD 223.5-8611.6) years ago which was about 8000 years later than the domestication of wild boar [70,71].
In terms of the effective population size changes of CSFVs, our bayesian skyline plot (BSP) analysis depicted that a population increase until the middle of 1990s, after which population size appears to have evolved at a near constant rate population size until the present. The trend of effective population size can be attributed to the introduction of vaccine to prevent further global CSF outbreaks [72]. This plateau after 1950s also can be explained by huge slaughter of pigs during global outbreak of porcine reproductive and respiratory syndrome (PRRS) and swine influenza (SI) [73][74][75].
The present study is the first of its kind to use the complete CSFV genome to investigate the temporal dynamics of the virus. CSFV is still one of the most acute pathogens in the global swine industry. Accordingly, global strategies are essential for prevention and control of this virus. Results of the present study expand the limited information available on CSFV evolutionary dynamics, which may be crucial for the control of this virus, as well as improve our knowledge of its epidemiology and evolution.