Genetic diversity and evolution of Hantaan virus in China and its neighbors

Background Hantaan virus (HTNV; family Hantaviridae, order Bunyavirales) causes hemorrhagic fever with renal syndrome (HFRS), which has raised serious concerns in Eurasia, especially in China, Russia, and South Korea. Previous studies reported genetic diversity and phylogenetic features of HTNV in different parts of China, but the analyses from the holistic perspective are rare. Methodology and principal findings To better understand HTNV genetic diversity and gene evolution, we analyzed all available complete sequences derived from the small (S) and medium (M) segments with bioinformatic tools. Eleven phylogenetic groups were defined and showed geographic clustering; 42 significant amino acid variant sites were found, and 19 of them were located in immune epitopes; nine recombinant events and eight reassortments with highly divergent sequences were found and analyzed. We found that sequences from Guizhou showed high genetic divergence, contributing to multiple lineages of the phylogenetic tree and also to the recombination and reassortment events. Bayesian stochastic search variable selection analysis revealed that Heilongjiang, Shaanxi, and Guizhou played important roles in HTNV evolution and migration; the virus may originate from Zhejiang Province in the eastern part of China; and the virus population size expanded from the 1980s to 1990s. Conclusions/significance These findings revealed the original and evolutionary features of HTNV, which will help to illustrate hantavirus epidemic trends, thus aiding in disease control and prevention.


Introduction
Hantaviruses have gained worldwide attention as etiological agents of emerging zoonotic diseases-namely, hemorrhagic fever with renal syndrome (HFRS) in Eurasia and hantavirus cardiopulmonary syndrome in the Americas-with fatality rates ranging from <10% up to 60% [1]. Among all countries, China is the most seriously affected and accounts for over 90% of the total HFRS cases around the world [2,3]. It has been reported that the HFRS death rate in China was 2.89% during the years 1950-2014 [4]. Although a declining HFRS trend has been observed at a global scale in China, there still exist certain local regions that continue to display increasing HFRS trends [5]. However, the causative agent of the disease remained unknown until the early 1970s, when Lee and colleagues reported on Hantaan virus (HTNV) present in the lungs of its natural reservoir, the striped field mouse (Apodemus agrarius) [6].
HTNV, as one of the pathogenic hantaviruses of HFRS, is a member of genus Orthohantavirus, family Hantaviridae, in the order Bunyavirales. The virus genome consists of three separate segments of negative-stranded RNA, referred to as small (S), medium (M), and large (L) segments, which encode nucleocapsid protein (NP), two envelope glycoproteins (Gn and Gc), and viral RNA-dependent RNA polymerase, respectively [7]. Recent studies [8][9][10] reported genetic diversity and phylogenetic features of HTNV in different parts of China, but the analyses from the holistic perspective are rare. Also, the mechanisms underlying the emergence and evolution of HTNV are poorly understood. Understanding the phylogenetic factors contributing to HTNV transmission has important implications for our understanding of its epidemiology and provides insights about surveillance as well as outbreak predictions and preparation.
In this study, we focused on the genetic diversity and evolutionary history of HTNV. Whole-genome sequences were analyzed with bioinformatic tools. The results revealed the phylogenetic relationships among HTNV strains with both Bayesian and maximum-likelihood (ML) methods. To deduce geographic origins and migration patterns of HTNV epidemics, Bayesian evolutionary analysis sampling trees (BEAST) software was used in this study. Furthermore, the recombination and reassortment events were also detected.
viprbrc.org). Only one sequence was retained for the identity sequences with the same strain names. It should be mentioned that sequences from Shandong Province (accession no. KY639536-KY639711) were not defined as HTNV by both ViPR and Genebank but can be organized in H. orthohantavirus (see below). These sequences were still analyzed in this study. All the sequences were aligned using MAFFT version 7 [11] with default settings followed by manual refinement. The coding sequences were retained and used for the following analyses [12].

Recombination detection
As recombination seriously affects phylogenetic inference, the whole data set was tested for the presence of recombination signals using RDP4 [13]. RDP, GENECONV, Maximum Chisquared, Chimaera, Bootscan, Sister Scanning, and 3Seq were used only on events with P values < 0.01 that were confirmed by four or more methods, which were considered recombination events.

Phylogenetic analyses and amino acid substitution analyses
It is reported that recombination sequences will bias the shape of the inferred phylogenetic tree, the branch length, and reconstruction of ancestral sequences [13][14][15]. To reduce the potential bias, the recombination segments were first removed from the data sets. The phylogenetic relationships of the complete S and M gene sequences were estimated using a Bayesian Markov chain Monte Carlo (MCMC) method as implemented in MrBayes v3.2.2 [16] and an ML phylogenetic inference as implemented in IQ-TREE v1.6.8 [17]. Dobrava virus, Puumala virus (PUUV), Sin Nombre virus, and Thottapalayam virus were designated as the out-group.
For Bayesian MCMC analysis, the suitability of substitution models for our data sets were assessed using jModelTest v2.1.10 [18], which performed a statistical model selection procedure based on the Akaike information criterion. It identified the best-fitting substitution model GTR+Γ for both data sets of complete S and M gene sequences. The MCMC settings consisted of two simultaneous, independent runs with four chains each, which were run for 20 million generations and sampled every 200 generations with a 25% burn-in.
For the ML analysis, the best-fitting nucleotide substitution model that minimizes the Bayesian information criterion score was selected by ModelFinder [19], implemented in IQ-TREE. An ML tree was constructed using the best-fitting nucleotide substitution model, and statistical robustness of the branching order within the tree was assessed using ultrafast bootstrap support values [20] (1,000 replicates) and the SH-like approximate likelihood ratio test [21] (SH-aLRT, 5,000 replicates). The trees were visualized in FigTree v1.4.4 (http://tree. bio.ed.ac.uk/software/figtree/).
The Metadata-Driven Comparative Analysis Tool, supplied by ViPR, was used to analyze the variation of amino acid between the different phylogenetic groups, hosts, and sample collection years. The P value threshold was set to 0.05. If a specific amino acid exists only in one species or one group but not in other species or groups, this amino acid is considered as a "significant amino acid" marker (or synapomorphy).

Natural selection analysis
Natural selection analysis was assessed using the Datamonkey web-based facility [22]. Site-specific selection pressure was assessed using four varying methods, including single-likelihood ancestor counting (SLAC), fixed-effects likelihood (FEL), mixed-effects model evolution (MEME), and fast unconstrained Bayesian approximation (FUBAR) [23][24][25]. Significance levels were set to P < 0.05, and posterior probability (PP) > 0.9. The SLAC, FEL, and MEME methods were used to estimate the nonsynonymous to synonymous (dN/dS) ratio, which represents the differential effect of natural selection on these two types of mutations. The lower dN/dS values indicate stronger negative selection against amino acid change. To examine whether episodic diversifying selection had occurred on individual branches, we utilized the adaptive Branch-Site Random Effects Likelihood (aBSREL) [26] method, which is also available on the Datamonkey web server.

Coalescent and evolution analyses
Complete S gene sequences were used to deduce the evolutionary history of HTNV. To avoid potential biases due to sampling heterogeneity, the data set was reduced using CD-HIT [27] by clustering together sequences with a nucleotide sequence identity threshold of 99%. Temporal evolutionary signal in the ML tree was evaluated by TempEst v1.5.1 [28], which plots sample collection dates against root-to-tip genetic distances obtained from the ML phylogeny tree.
The most recent common ancestor (tMRCA), substitution rates, and evolutionary history were estimated using a Bayesian serial coalescent approach implemented in BEAST v1. 10.4 [29]. jModelTest v2.1.10 was used to determine the model of nucleotide substitution that best fit the data set, and the data set was subsequently run using GTR+I+Γ. Both strict and relaxed (uncorrelated exponential and uncorrelated lognormal) molecular clocks were used for the analysis, combined with different tree priors (constant population size, GMRF Bayesian Skyride, and Bayesian skyline were used in this study). Timescale was inferred using an informative substitution rate prior (a lognormal distribution with 95% of the density lying between 1 × 10 −4 and 1 × 10 −3 substitutions per site per year) previously estimated for the N gene of rodent-borne hantavirus [30]. Models were compared pairwise by estimating the log marginal likelihood via path sampling and stepping stone analysis [31][32][33]. For each model, an MCMC was run for 100 million generations, with sampling every 10,000 steps. All other priors were left on their defaults. Posterior probabilities were calculated using the program Tracer1.7 after 10% burn-in. The results were accepted only if all the parameters have effective sample sizes over 200.
The selected molecular clock/demographic model (strict clock with Bayesian skyline prior in this study) was then used for the Bayesian phylogeographic inference based on the continuous-time Markov chain process over discrete sampling locations to estimate the diffusion rates among locations, with Bayesian stochastic search variable selection. The MCMC was run for 300 million generations, with sampling every 30,000 generations. Convergence was assessed by estimating the effective sampling dates using Tracer and accepting effective sample size values of 200 or more for all the parameters. A maximum clade credibility (MCC) tree with the phylogeographic reconstruction was selected from the posterior tree distribution after a 10% burn-in using the Tree Annotator v1.10 and was manipulated in FigTree v1.4.4. The migration routes were visualized using SPREAD3 [34]. Migration pathways were considered to be important when they yielded a Bayes factor greater than 15 and when the mean posterior value of the corresponding migration event was greater than 0.50. Bayes factors were interpreted according to the guidelines of Kass and Raftery [35].

Recombination events detection
Recombination events were detected among all the 225 complete S gene sequences and 180 complete M gene sequences from 238 HTNV strains collected in this study. The RDP analysis suggested nine recombination events in six HTNV isolates, and all the recombination events occurred in the M gene segment ( Table 2). Similar recombination events were detected in three isolates (CGAa31MP7, CGAa31P9, CGHu2). All three isolates were collected from 2004 to 2005 in Guizhou Province [8]. This would imply that these isolates have the same ancestor, which has experienced recombination previously. Although only five methods suggested the recombination event occurred in strain JN131026, the P values were all lower than 0.01. Considering the consensus score was higher than 0.6 (0.646), we defined it as a recombinant isolate. P values of the event tested in strain A16 were in the high credibility level.

Phylogenetic analyses and amino acid substitution analyses
The recombination isolates were excluded first. Our phylogenetic analysis showed that 11 groups were defined with the S segment, each with a high degree of support (Fig 2). The isolates were mainly clustered together by region. The isolates from South Korea all belonged to group A. All the sequences from Jiangsu Province and one strain from Sichuan (S85_46) Province clustered in group A with sequences from South Korea. Group B was formed with sequences collected from Russia and the northeast of China. The isolates collected in Shaanxi were all clustered in group C, except one strain, 84FLi, which was isolated years ago in 1984.
All the three isolates from Hubei constituted group D, and all the isolates from Shandong clustered as a subgroup in group E. The other subgroup of group E contains strains collected from Anhui, Hunan, Shaanxi, Sichuan, Guangdong, and Yunnan. The isolates collected from Zhejiang constitute group G and group H. Group J contains isolates all from Jiangxi. Group K was formed with three isolates from Russia. We noticed that the isolates collected from Guizhou Province were highly divergent. They dispersed in group B, C, F and I, indicating that HTNV has spread in Guizhou for a long period. There were 10 distinct phylogroups in the M gene tree. The structure of the phylogenetic tree reconstructed by the M gene was identical to the S gene tree, except for seven isolates (CGRn15, CGAa2, CGAa75, CGRn2616, CGRn2618, CGRn45, CGHu3) derived from Guizhou Province. They were classified into group I, but in the S gene tree, they were included in group C. Additionally, isolate N8, collected form Jiangxi, was classified into group G in the M tree but group J in the S tree. This suggested that reassortment events occurred in these strains. The branching order within the lineages differed in S and M trees, which indicates the different evolutionary history between the M and S segments. It should be clarified that all the branching orders were well supported.
We analyzed the variation of amino acids between the different phylogenetic groups. We noticed that five amino acid sites on NP gene and 37 on Gn/Gc gene were group specific and could be markers to distinguish different groups (Fig 3). No significant variant was found at the highly conserved pentapeptide motif WAASA, which is thought of as the cleavage site on glycoprotein precursor (GPC) polypeptide into the Gn and Gc transmembrane proteins [36].
To determine whether any of these amino acid sites were located in any known immune epitopes, we compared these significant positions against information about experimentally determined immune epitopes curated by the Immune Epitope Database. We found a total of 19 positions that were located in known immune epitopes (Table 3). No significant amino acid marker was found among different hosts or collected times.

Natural selection analysis
The natural selection analysis showed that purifying selection plays a dominant role in HTNV evolution, with very low dN/dS values and an abundance of negatively selected sites ( Table 4). The dN/dS values for Gn were the highest, but they still show little evidence of positive selection. In total, 10 positively selected sites in the Gn protein and 11 in the Gc protein were found by MEME. One and 14 positively selected sites in N protein were found by FUBAR and MEME, respectively. We noticed that none of the significant amino acid substitutions sites (Fig 3) were being subjected to positive selection. Evidence of episodic diversifying selection was detected in two branches (CGRn53, HoJo) in the phylogeny of M segment by aBSREL.

Coalescent analyses and evolution of HTNV
To avoid potential biases in the phylogeographic reconstructions due to sampling heterogeneity, we obtained a "nonredundant" subset including 51 clusters. In order to make our data set represent all 15 regions (there was no sequence with exact collected year from Anhui, Sichuan, and Guangdong), isolate E142, Hunan03, and JS10 were added to the data set. Temporal evolutionary signal analyses showed a positive correlation between genetic divergence and sampling time (S1 Fig). It should be mentioned that though the data set we used showed positive correlation by root-to-tip analyses, it failed to pass the date-randomization test (DRT, S2 Fig) [37]. As DRT is stricter than root-to-tip analyses in clock signal evaluation, we think our data set still shows clock signal, which is not so strong. The strict clock with Bayesian skyline prior model yielded a higher log marginal likelihood than the others, indicating the best-fit model for our data set (S1 Table).
The results of our Bayesian phylogenetic analysis showed that HTNV probably first emerged in Zhejiang Province of China (root PP = 0.28), with a most recent common ancestor in 1214 (95% credibility interval 590-1592; Fig 4A). The root PP of Jiangxi, Shaanxi, and Guizhou were also in higher levels. The MCC tree of HTNV obviously showed two separate clusters. One was composed of isolates from the northeast of China and its surroundings, and the other was composed of isolates from the south and middle of China. These indicated that HTNV has been spreading in China for a long time. Reconstruction of the demographic history using the Bayesian skyline plot revealed that the HTNV population size was relatively

PLOS NEGLECTED TROPICAL DISEASES
Genetic diversity and evolution of Hantaan virus constant from 1500 to the 1970s, expanded between the 1980s and 1990s, and then stayed steady from 2000 until now (Fig 3B). The mean substitution rate of 2.0185 × 10 −4 substitution per site per year, with a 95% highest posterior density (HPD) that ranged from 9.3435 × 10 −5 to 3.2011 × 10 −4 , was estimated.
The spatiotemporal linkage of HTNV is shown in Fig 5. Significant location transitions were found mainly in two regions, the northeast and the middle of China (Fig 5A). This indicated that Heilongjiang may be the radiation center of the northeast regions of China and the surrounding areas. Significant migration events were found from Heilongjiang to Liaoning and the Far East of Russia. The migration from Heilongjiang to Khabarovsk was detected at a high credibility level (Bayes factor = 163.0). Shaanxi was assumed as the radiation center of the middle of China. Virus migration from Shaanxi to Tianjin, Yunnan, Hunan, and Guizhou was observed. The number of observed state changes pinpointed Shaanxi as the main source of HTNV epidemics in China, at least in the middle of China, and Heilongjiang as the dominating source of HTNV in northeast of China (Fig 5B).

Discussion
In this study, we used the whole-genome sequences of segment S and M obtained from ViPR to analyze the genetic diversity and evolution of HTNV. Our analyses showed that HTNV can be divided into 11 groups. We named the groups based on the S tree for NP gene, which is more stable and has more sequences available. The isolates were geographically clustered. Sequences obtained from the same geographic area clustered together. It is worth noting that sequences from Guizhou were more diverse. We will discuss this later.
A total of 42 significant amid acid variant sites were found among the different groups. More significant variant sites and greater nonsynonymous variations were found for Gn than Gc, which is consistent with membrane-distal localization and supported the assumption that Gn was subjected to selective pressure of the humoral immune response [38]. But selection analyses showed none of these sites were being subjected to positive selection. We noticed more significant variants in group I. Isolates contained in group I were all from Guizhou. The collection dates were from the 1980s to 2000s. This also indicated high genetic diversity of HTNV in Guizhou. In total, 19 significant amid acid variant positions were found located at 25 known immune epitopes. Three of the epitopes are B-cell epitopes, and 21 are T-cell epitopes. It should be mentioned that epitope 807 was predicted by bioinformatics approaches and needs to be validated further. Liang and colleagues [39] reported peptide 437-GQRKVILTKLVIGQ-451 (although the epitope is not recorded in Immune Epitope Database), which showed high antibody binding and replacement at any position of the sequence LTKTLVIGQ (amino acid 443-451), leading to a substantial decrease in reactivity. We found a mutation L447M in group E, which contains isolates collected from Shandong, Shaanxi, and Sichuan. It is also reported [40] that amino acid exchanges of epitope (954-LVTKDIDFD-963) could lead to a loss in reactivity of binding with recombinant human antibodies. We found a mutation, D963E, in group F, which contains isolates from Guizhou, Jiangsu, and Hubei. This might be a clue to the possibility of immune escape in these regions. We found an epitope (ID 742143) located on signal peptide, which showed high binding affinity to HLA-A � 0201 molecules and frequencies of epitope-specific cytotoxic T lymphocytes in the peripheral blood mononuclear cells of patients with HFRS and could induce CD8+ T-cell responses to inhibit HTNV replication [41]. Experimentation is needed to determine whether peptides containing these residues can confer escape or can be used to develop serotype-specific reagents for serology-based diagnostics. No significant amino acid marker was found among different hosts and periods, indicating that geographic region is the main factor affecting the genetic diversity of HTNV. The topologies between the M and S segments are different. We inferred that this was due to the different evolutionary history, which was in accordance with others' assumptions [42,43], but this still needs to be validated in the future. Our analyses showed that both reassortment and recombination play important roles in HTNV evolution. A number of studies have revealed that genetic reassortment can occur naturally or experimentally between the members of the Bunyavirales [44][45][46][47][48][49]. Seven of eight reassortment events occurred in M segments found in our study are consistent with Zou and colleagues [8]. We found nine recombination events in our study. All the recombination events occurred in segment M. The reassortment and recombination events found in Gn/Gc envelope glycoproteins were consistent with their functional roles in the viral escape from immunological responses. It should be noted that recombinant A16, isolated in Shaanxi, has both a major parent, H8205, and a minor parent, SN7. Strain H8205 has been classified into Amur virus, which indicates that recombination occurred across different species in hantavirus. Strain SN7 was isolated in Sichuan, which implies the relationship of the evolution of HTNV between Shaanxi and Sichuan. Evidence of recombination has also been reported for Tula orthohantavirus [50] and Puumala orthohantavirus (PUUV) [51]; even the recombination in hantavirus is rare [52].
We investigated the evolution and migration of HTNV in China and its surrounding countries using Bayesian phylogeographic inference. Our phylogenetic analyses placed the root of the tree for HTNV in Zhejiang with strong support (Fig 3B). But no migration event was found from Zhejiang to other regions. This may result from the lack of whole-genome data from Zhejiang and its surrounding areas. But the reassortment event of strain N8 indicates the correlation of the isolates in Zhejiang and Jiangxi. Markov jump estimates between different locations pinpointed Heilongjiang as an important source of HTNV epidemics in northeast China and the Far East of Russia. This finding may explain the high level of epidemics in these areas since 1931, when HFRS was first recognized in northeast China [53,54], due to the migration from Heilongjiang to its surroundings. In a similar pattern, Shaanxi was recognized as the origin of HTNV spread in the middle of China. These can be explained by a scenario in which the virus was first introduced into these areas and then expanded there. But the transmission sources to Heilongjiang and Shaanxi are still unclear. The samples collected from South Korea have been studied before [55][56][57], so we do not discuss them too much here. But the migration of HTNV from South Korea to Jiangsu indicates the possibility of HTNV import from other countries.
Our analyses showed the high diversity of isolates from Guizhou Province. Phylogenetic analyses showed that isolates collected from Guizhou Province clustered in group B, C, F, and I. Seven of eight reassortment events and seven of nine recombination events occurred in Guizhou. We found the same recombination pattern in strain CGAa31MP7, CGAa31P9, and CGHu2. All of these indicate that different sources of HTNV were transmitted to Guizhou and evolved for a long time in this place [58,59]. This assumption could be supported by our Markov jump estimate (Fig 5). But our Markov jump estimate as well as the phylogenetic correlation with other regions also indicated the dispersion from Guizhou to other areas, even though no significant migration was found by phylogeographic inference. Furthermore, the PP that Guizhou was the root location was high, so we cannot deny the possibility of Guizhou to be the place of HTNV's common ancestor. With its rich mountainous topography, Guizhou is more than 1,000 meters above sea level, and as much as 92.5% of the province's total area is characterized by mountains. This makes A. agrarius, the main host of HTNV, sympatric in rural resident areas more common. The Hengduan Mountains region was hypothesized to have played an important role in the evolutionary history of Apodemus since the Pleistocene era [60,61]. The Yunnan-Guizhou Plateau, which connects and overlaps with the Hengduan Mountains region, is also thought to play an important role in HTNV evolution. But we only found acceptable migration from Shaanxi to Guizhou in our study. This may result from the lack of sequences in Guizhou. To reduce the bias, recombinant and reassortment isolates were removed before the phylogeographic analysis. We also got rid of similar sequences with identity more than 99%. Thus, limited sequences from Guizhou were used for evolutionary and phylogeographic analyses. More gene information should be collected in order to know more about the HTNV evolution and spread in this area.
The most recent common ancestor of HTNV was determined about more than 800 years ago in our study, which is much earlier than when the first patient with HFRS was reported, in the early 1930s [62]. However, HFRS-like disease was described in Chinese writings about 1,000 years ago [63], so we believe HTNV has spread in China for more than 1,000 years. The most common recent ancestor according to Ramsden and colleagues [64] existed 859 years before the present for all rodent hantaviruses estimated and 202 years before the present for Murinae viruses, with a mean substitution rate for rodent hantavirus of 6.67 × 10 −4 substitutions per site per year with a 95% HPD that ranged from 3.86 × 10 −4 to 9.8 × 10 −4 substitutions per site per year. Partial sequences (275 nt) were used in this study, which can explain the difference in tMRCA estimation. A recent study [65] showed that the estimated rate of nucleotide substitutions for the N gene of all rodent-borne hantavirus was 6.8 × 10 −4 substitutions per site per year, which is similar to 6.67 × 10 −4 substitutions per site per year. Our results implied that HTNV evolved at a lower speed compared with the other rodent-borne hantaviruses. The estimated rate of nucleotide substitutions of Dobrava virus and PUUV, 3 × 10 −4 and 5.5 × 10 −4 substitutions per site per year, respectively, supports our hypothesis [30].
The relatively recent origin of HTNV apparently contradicts the virus-host codivergence theory. Previous estimates of evolutionary dynamics in hantaviruses were based on the critical assumption that the congruence between hantavirus and rodent phylogenies reflects codivergence between these two groups because of the divergence of the rodent genera Mus and Rattus, approximately 10−40 million years ago, which indicates a mean substitution rate in the range of 10 −8 substitutions per site per year [51,66,67]. However, the observation of hostpathogen phylogenetic congruence does not necessarily indicate codivergence. Phylogenetic congruence between a parasite and its host can also arise from delayed cladogenesis, in which the parasite phylogeny tracks that of the host but without temporal association [68]. This could occur if hantaviruses largely evolve host associations by cross-species transmission and related species tend to live in the same area, in which case a pattern of strong host-pathogen phylogenetic congruence could be observed in the absence of codivergence. Our evolutionary rates were estimated directly from primary sequence data sampled at known dates so that they more closely reflect the evolutionary changes undergone by the virus, at least in the short term. And with a mean substitution rate that is closer to other RNA virus, we consider our results to be more reliable than codivergence theory.
It should be noted that the substitution rate and tMRCA of HTNV estimated is not precise and could be improved in the future. As we show in the Results, the data set used for the molecular clock rate and phylogeography analyses did not pass the DRT, which revealed that the molecular clock rate of the data set is potentially unreliable. Therefore, a more accurate substitution rate and tMRCA would be estimated in the future when more sequences with stronger temporal structure are available. Furthermore, the selection pressure analyses showed that purifying selection plays a dominated role in HTNV evolution. It is reported [69] that the presence of strong purifying selection can lead to an underestimation of branch lengths. In our study, we calculated the total branch length of the MCC tree and aBSREL tree using the compute.brlen() function [70] supplied by ape package in R (http://ape-package.ird.fr/). The results showed that the total branch length for the MCC tree is smaller than the aBSREL tree (8.604 to 14.698), indicating the underestimation of the substitution rate and tMRCA of HTNV.
Our demographic analyses revealed that the HTNV population had expanded from the 1980s to 1990s. Because HTNV is a kind of rodent-borne virus, the population size of HTNV is relevant to rodent populations. Previous studies have revealed that climatic factors can influence HFRS transmission through their effects on the reservoir host (mostly rodents of the family Muridae) and environmental conditions [71][72][73]. It is reported that global annual average temperatures increased by more than 1.2˚F (0.7˚C) from 1986 to 2016 relative to temperatures seen from 1901 to 1960 [74]. Global warming may affect rodent winter survival through winter temperatures by a complicated process, and it may also influence the magnitude of HFRS outbreaks through summer climatic conditions (both temperature and rainfall). Additionally, as global climate change accelerates, the amount of annual rainfall increased accurately [75]. Moreover, agriculture improved rapidly in China during the same period, leading to more available food, which increases the reproduction rates of rodents. The steady effective population from 2000 until now has resulted from the successful strategy for HTNV prevention in China; however, it is worth noting that the uncertainty of demographic analyses increases toward the root of the genealogy, where population history is reconstructed from fewer lineages, and a majority of samples in our data set were collected in recent years. Thus, the population size of the viruses may not be precise, but the increase from the 1980s to 1990s and the constant size after 2000 could be used as a reference for HTNV demographic inference.
However, some caution must be taken when interpreting our results. We have estimated the age of genetic diversity based only on N gene sequences; the evolutionary rate of HTNV may be underestimated. The Gn/Gc envelope glycoproteins are more diverse and considered to evolve at a higher speed. Furthermore, there may be geographic biases due to the imbalance in the collection of sequences from different regions. Because we did the research using wholegenome sequences, partial sequences are not comparable; for instance, partial sequences from different research may be located in different regions of whole genome. The lack of wholegenome sequences in some regions may result in the lack of knowledge about HTNV migration. In addition, lack of sequences of L gene can be obtained now. More gene information should be collected and shared to solve these problems.
In conclusion, our study revealed 11 groups of HTNV and 43 significant amino acid markers for different groups which could be connected with different regions. We hope it could be an indicator of immune effect evaluation of vaccine in different regions. Both recombination and reassortment events can be detected in HTNV. The origin and migration of HTNV were also shown by our analyses. We found that Heilongjiang, Shaanxi, and Guizhou played important roles in HTNV evolution and migration. It is crucial to pay more attention to HFRS prevention and control in these areas. Because rodent populations and activity influence the spread and increase of HTNV, rodent prevention and control is an effective way to reduce the incidence of diseases, which has also been proved by other studies [76,77]. And a steady effective population from 2000 until now indicates that this is a successful strategy for HTNV prevention. These data provide important insights for better understanding the genetic diversity and spatiotemporal dynamics of HTNV and would be useful for disease prevention and control.