Conceived and designed the experiments: JNY HL QT XYT SMW GDL. Performed the experiments: JNY XYT HL. Analyzed the data: JNY QT SR JA NH WF XYT ZYG HZL. Contributed reagents/materials/analysis tools: QT GDL SR NH HZL. Wrote the paper: JNY SR NH JA XYT QT.
The authors have declared that no competing interests exist.
Recent years have seen a rapid increase in the number of rabies cases in China and an expansion in the geographic distribution of the virus. In spite of the seriousness of the outbreak and increasing number of fatalities, little is known about the phylogeography of the disease in China. In this study, we report an analysis of a set of Nucleocapsid sequences consisting of samples collected through the trial Chinese National Surveillance System as well as publicly available sequences. This sequence set represents the most comprehensive dataset from China to date, comprising 210 sequences (including 57 new samples) from 15 provinces and covering all epidemic regions. Using this dataset we investigated genetic diversity, patterns of distribution, and evolutionary history.
Our analysis indicates that the rabies virus in China is primarily defined by two clades that exhibit distinct population subdivision and translocation patterns and that contributed to the epidemic in different ways. The younger clade originated around 1992 and has properties that closely match the observed spread of the recent epidemic. The older clade originated around 1960 and has a dispersion pattern that suggests it represents a strain associated with a previous outbreak that remained at low levels throughout the country and reemerged in the current epidemic.
Our findings provide new insight into factors associated with the recent epidemic and are relevant to determining an effective policy for controlling the virus.
Rabies is a major problem in developing countries and is responsible for more than 55,000 deaths annually. More than half of the cases occur in Asia, and China has the second highest incidence of rabies after India. Human rabies cases in China decreased during the early 1990s, but the virus began to re-emerge in the latter half of the decade and spread rapidly across the country with a corresponding increase in cases. To learn more about the epidemic, in 2006 the government implemented a trial surveillance program to sample and screen canine populations in locations where human cases were reported. In this work we selected a subset of samples (representative of the entire epidemic region) for sequencing, investigated the history and origin of the virus in China, and examined the variation from a geographical perspective. Our results indicate that the epidemic is primarily composed of a younger strain, with a geographical dispersion consistent with the recorded spread of the virus, and a second older strain that corresponds to a previous epidemic. This second group exhibits a different geographical pattern, and it appears that this strain remained at low levels throughout the country and was able to re-emerge as the epidemic took hold.
Rabies is an enzootic disease that causes severe dysfunction to the central nervous system
There are already many published reports on the phylogenetic relationship amongst strains isolated in China that have primarily focused on sample classification and estimation of features such as date of the most recent common ancestor (TMRCA)
In 2005, in order to improve rabies control and prevention, the Chinese government implemented a trial surveillance program to monitor rabies at the national level in an attempt to obtain a more comprehensive epidemiological dataset. In addition to recording statistics on human cases, the Institute for Viral Disease Control and Prevention of China CDC cooperated with the provincial CDC laboratories and began collecting samples from dog populations in regions where human rabies cases had been reported; these samples were then screened for presence of the rabies virus by both DFA and RT-PCR detection. The positive samples were then submitted for DNA sequencing and combined with a second subset of selected sequences from publicly available sequences.
Although dogs remain the major infection source, contributing 85%–95% of human cases in China
In this study we use this dataset to investigate the dissemination of the virus across China as the epidemic took hold and we analyze the sequence set in terms of genetic diversity, patterns of distribution and evolutionary history.
Data on human rabies cases in China between 1996 and 2008 were collected from the annual reports of the Chinese Center for Disease Control and Prevention (China CDC). Human rabies cases in China are defined according to clinical symptoms and the subject's case history (such as a record of close contact with infected hosts via bites or scratches) and are confirmed by laboratory testing where possible. Although not all cases are confirmed by laboratory testing, the typical symptoms of human rabies are very distinct and misdiagnosis is unusual.
Cases are recorded as part of the national infectious disease reporting system set up by the Chinese government for monitoring several diseases including rabies. If a subject at a local health service center, hospital or other health institute is diagnosed with rabies as outlined above, the institute is responsible for immediately reporting the case to the local CDC, which liaises with the national headquarters.
Samples were collected as part of a national surveillance program. In this program, reported human cases of rabies were followed up by visits by provincial CDC laboratories to the region. Since 85%–95% of laboratory-confirmed human rabies cases in China could be associated with a dog bite
Additional rabies sequences were downloaded from GenBank and a subset was selected based on the following criteria: (1) that the sequence spanned the 720 nt region of the N gene from 636 nt to 1353 nt; (2) the full background information (isolation time/host/location) was available. Finally, T-COFFEE was used to identify samples with the greatest nucleotide diversity. This reduced the original set of 176 published sequences to a subset of 153 sequences that provided the greatest coverage of geographical regions and host species. These sequences were isolated from dogs, cats, deer, raccoon dogs, striped field mice (
Phylogenetic trees were constructed based on the 720-nt N-gene sequence (nt704–1423) using the Maximum Likelihood (ML) method implemented in the PHYML
PHYML does not use an outgroup, so trees were estimated with and without the Australian bat rabies virus sequence ABL1996. The topologies of all the PHYLIP and PHYML trees were compared for consistency and no significant differences were observed. The HKY model was selected using MODELTEST
To investigate the patterns of distribution and geographical structure of the rabies virus in China, isolates in the constructed ML tree were assigned a state according to the province in which they were collected and the tree was examined for discordant sample locations. In previous phylogeographic studies these events are referred to as migration events but, for consistency with common rabies terminology, we refer to them in this paper as translocation events. Translocation events were inferred using a parsimony method with DELTRAN optimization
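As an illustration of how state changes along a tree can be counted by parsimony, the following minimal sketch applies the bottom-up pass of Fitch parsimony to a toy tree. It is not the DELTRAN procedure used in the study: Fitch's pass only gives the minimum number of state changes, whereas DELTRAN additionally decides on which branches those changes are placed. The tree, tip labels and province assignments are hypothetical.

```python
def fitch_changes(tree, leaf_state):
    """Bottom-up Fitch pass on a rooted binary tree given as nested tuples.

    Returns (candidate_states, minimum_number_of_state_changes).
    """
    if isinstance(tree, str):                      # a tip label
        return {leaf_state[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, leaf_state)
    right_set, right_changes = fitch_changes(right, leaf_state)
    shared = left_set & right_set
    if shared:                                     # children agree: no change needed here
        return shared, left_changes + right_changes
    # children disagree: keep all candidates and count one change
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical five-tip tree and province assignments (illustration only).
toy_tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
toy_province = {
    "s1": "Hunan", "s2": "Hunan", "s3": "Guangxi",
    "s4": "Jiangsu", "s5": "Guizhou",
}

root_states, n_events = fitch_changes(toy_tree, toy_province)
print(n_events)  # minimum number of province changes on the toy tree
```

With four provinces represented among the tips, at least three changes are required, which is the count the sketch returns.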
To examine the relationship between locations the UniFrac software package was used to generate a distance matrix between all pairs of communities (i.e. provinces) based on an estimation of the fraction of the branch lengths of the tree which is unique to each community. Principal Component Analysis (PCA) was then used to examine the geographical structure of the data by transforming the matrix such that the greatest variation occurred in the initial principal components
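The branch-length fraction described above can be sketched as follows. This is a minimal illustration of an unweighted UniFrac-style distance, not the UniFrac software package itself; the tree encoding (each node is a `(branch_length, tip_label_or_children)` pair), the tip names and the community labels are all assumptions made for the example.

```python
def unifrac_distance(tree, community_of, a, b):
    """Fraction of branch length leading to tips from only one of communities a and b.

    Branches whose descendants belong to neither community are ignored.
    """
    unique = 0.0
    total = 0.0

    def visit(node):
        nonlocal unique, total
        length, content = node
        if isinstance(content, str):               # tip
            found = {community_of[content]} & {a, b}
        else:                                      # internal node
            found = set()
            for child in content:
                found |= visit(child)
        if found:
            total += length
            if len(found) == 1:                    # branch unique to one community
                unique += length
        return found

    visit(tree)
    return unique / total if total else 0.0


# Hypothetical tree: the root has branch length 0 and two internal children.
toy_tree = (0.0, [
    (1.0, [(1.0, "t1"), (1.0, "t2")]),
    (2.0, [(1.0, "t3"), (1.0, "t4")]),
])
toy_community = {"t1": "A", "t2": "B", "t3": "B", "t4": "C"}

d_ab = unifrac_distance(toy_tree, toy_community, "A", "B")
d_ac = unifrac_distance(toy_tree, toy_community, "A", "C")
```

In the toy tree, communities A and C share no branches of positive length, so their distance is 1, whereas A and B share one internal branch and are therefore closer.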
Evolutionary history, including evolutionary rates of populations (nucleotide substitutions per site per year), TMRCA and population growth models, was inferred using the Bayesian Markov chain Monte Carlo (MCMC) method implemented in the BEAST software package
Demographic histories were inferred by Bayesian skyline reconstruction and statistical uncertainty was expressed as 95% Highest Posterior Density (HPD) intervals. The constant population size, exponential population growth and logistic population growth models were considered in turn and compared using Bayes Factors
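For illustration, a model comparison by Bayes factors reduces to differences of log marginal likelihoods once those have been estimated. The sketch below assumes such estimates are already available; the numerical values are hypothetical and are not results from this study, and the function name is introduced only for the example.

```python
def rank_models(log_marginal_likelihoods):
    """Return the best-supported model and the log Bayes factor of the best model
    against every candidate (ln BF = ln ML_best - ln ML_candidate)."""
    best = max(log_marginal_likelihoods, key=log_marginal_likelihoods.get)
    best_value = log_marginal_likelihoods[best]
    log_bf = {
        name: best_value - value
        for name, value in log_marginal_likelihoods.items()
    }
    return best, log_bf


# Hypothetical log marginal likelihoods for the three demographic models.
example = {"constant": -3210.0, "exponential": -3204.5, "logistic": -3206.0}
best_model, log_bf_vs_best = rank_models(example)
```

A larger log Bayes factor indicates stronger support for the best model over the candidate in question; the threshold at which support is considered decisive is a matter of convention.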
To investigate whether the observed phylogeographic structure was simply a consequence of sample size or sampling bias, we defined a series of distance matrices between locations based on differences in (i) sample size, (ii) geographical distance between locations, (iii) number of translocation events, (iv) UniFrac distance, (v) Net Relatedness Index (NRI) and (vi) Nearest Taxon Index (NTI). NRI provides a measure of the dispersion of a locality throughout a tree, whereas NTI is a measure of the clustering at the leaf nodes
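Correspondence between two such distance matrices can be assessed with a permutation (Mantel) test. The following is a minimal sketch using Spearman rank correlation; it is not the software used in the study, and the matrices, the number of permutations and the random seed are hypothetical choices made for the example.

```python
import math
import random


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel_spearman(m1, m2, permutations=999, seed=0):
    """One-sided Mantel test: Spearman correlation and permutation p-value."""
    identity = list(range(len(m1)))
    x = average_ranks(upper_triangle(m1, identity))
    observed = pearson(x, average_ranks(upper_triangle(m2, identity)))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)                          # relabel locations in m2 only
        if pearson(x, average_ranks(upper_triangle(m2, perm))) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical 4-location matrices; m2 is a monotone transform of m1.
m1 = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
m2 = [[10 * value for value in row] for row in m1]
rho, p_value = mantel_spearman(m1, m2)
```

Rows and columns of the second matrix are permuted together so that each permutation corresponds to a relabelling of locations, which preserves the dependence structure within each matrix.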
ML reconstruction of the 211 RABV partial N gene sequences collected in China divided the isolates into four major clades; clade I, clade II, clade III, and clade IV (
Bootstrap values are indicated at the main nodes. Most sequences are contained in clade I and clade II. Underlined provinces are from the southwest, provinces with a line above the name are from the east. New isolates are marked with a blue diamond, human isolates are marked with a green cross. Clade I shows statistically significant geographic subdivision (see
All the sequences in clade I are from dogs. Although the clade contains sequences covering all the sampled regions, the older sequences are almost exclusively from the southwest whereas the younger sequences are from the east. This is consistent with the recorded spread of the virus (i.e., number of human cases) from the southwest to the east (
Points represent counties where human cases were reported.
Clade III (n = 24) corresponds to the cosmopolitan branch and represents a more general group of strains that includes isolates from dogs, rats, deer and raccoon dogs; it shows no clear geographical segregation. Clade IV is confined to samples from northeastern China and forms the arctic-related branch.
Spatial dynamic analysis was used to identify structure in the geographic diffusion of the rabies virus in China at the provincial level
Statistically significant (p<0.05) predicted translocation events for (a) clade I and (b) clade II. The left hand side of the figure shows the estimated BEAST tree for each clade. The branches are color coded by location (see legend on right). For clarity, Shanghai, Zhejiang, Fujian, Anhui and Shandong are grouped together as they are bordering provinces and there are no translocation events amongst them. The arrows at the bottom of the tree show the location of sequences with an ancestral sequence predicted to originate in Jiangsu (JS) province. The map on the right shows the translocation events predicted to originate from Jiangsu. These translocation events can also be seen in the marked region on the tree which, in contrast to other parts of the tree, contains multiple branches with two different colors. The map in (b) shows no clear centers for translocation, but a statistically significant translocation event is predicted for wildlife (ferret badger) from Jiangsu to Zhejiang province (dashed arrow on right of map).
UniFrac is a method that was originally developed to calculate a distance measure between bacterial communities based on the dispersion of the two communities within an estimated phylogenetic tree. The program finds taxa in the tree that contain samples from the two communities and counts the number of branches that are shared by both, or that are unique to one or the other community. To determine which provinces share similar evolutionary patterns, UniFrac was used to analyze the geographical structure of the tree by generating a distance matrix between all location pairs. PCA was then used to transform the matrix such that the greatest variation occurs in the first component, the next greatest variation in the second component and so on. The first two principal components explained 45% and 63% of the total variation for clade I and clade II respectively. The first two principal components for clade I and clade II are shown in
First two principal components for UniFrac metric for (a) clade I and (b) clade II. Clade I shows a strong division between east and southwest provinces in China. The eastern provinces are closely grouped together, with the exception of Jiangsu province which is a source of multiple translocation events to southwestern provinces. Clade II shows no apparent geographical division.
Using a Bayesian relaxed clock method, exponential population growth and constant population size were determined to be the most appropriate population models for clade I and clade II respectively. The evolutionary rates of each clade based on the selected population model were 1.274×10⁻³ (95% HPD: 8.3705×10⁻⁴–1.2515×10⁻³) substitutions per site per year for clade I and 9.629×10⁻⁴ (95% HPD: 3.519×10⁻⁴–1.628×10⁻³) substitutions per site per year for clade II. The corresponding TMRCA estimates for clade I, clade II and clade III were 15.5 years (about 1992; 95% HPD: 10.5–20.1 years), 48.0 years (about 1960; 95% HPD: 16.1–112 years) and 117 years (about 1891; 95% HPD: 75–211 years) respectively; because there were only two sequences for clade IV, the TMRCA was not estimated. For clade I, the virus spread from SW to E China, constantly encountering new hosts, whereas it seems that clade II was already distributed throughout the country, suggesting it was present at low levels and reemerged more gradually. Thus, the pattern of spread was very different for the two clades, and this may explain the differences in the selected population models. The Skyline plots (
Bayesian skyline plots showing the evolutionary and transmission histories of (a) clade I and (b) clade II and their corresponding trees. (a) also shows the number of human rabies cases recorded by year (bottom) and (b) shows the skyline plot for clade I on the same time scale (insert bottom right). Clade I shows greater variation in genetic diversity compared to clade II. Although both clades show a drop in genetic diversity around 2003 ((a) blue arrow on left), this is not correlated with the number of human cases, as they were still increasing rapidly and did not peak until 2007 (red arrow on right of (a)). However, the drop appears to coincide with the introduction of translocation events, (a) and (b) top, as at this time multiple events appear in the trees.
To investigate whether the observed results were due to sampling bias we generated six distance matrices based on differences between the locations and performed a pairwise Mantel test to test for correspondence (
As a further test of whether clade I and clade II possess distinct geographical structures we formed a two-way contingency table for the sample data based on sample location (SW or E) and sample date (2003–2005 & 2006–2008) and performed a chi-squared test on the sample data within and between clades (
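A chi-squared test on a two-way table of this kind can be sketched as follows. This is a minimal illustration of the Pearson chi-squared test for a 2×2 table without continuity correction; the counts are hypothetical and are not the sample counts from this study.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows are regions (SW, E), columns are periods
# (2003-2005, 2006-2008).
hypothetical_counts = [[20, 5], [10, 25]]
chi2, p = chi_squared_2x2(hypothetical_counts)
```

A small p-value for such a table would indicate that the regional composition of the samples differs between the two sampling periods.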
We have performed the most comprehensive study to date of the spatiotemporal dynamics of the rabies virus in China. While previous studies of the current rabies epidemic in China have focused on the phylogenetic relationship amongst canine rabies isolates, we also attempted to investigate the possible role of wildlife.
Our identification of two major clades, clade I and II, is consistent with results from previous studies
One commonly voiced concern is that the observed increase in the number of cases in China might simply be attributed to an improved surveillance program and misdiagnosis of rabies cases
What is still uncertain is the degree to which rabies is present in canines and wildlife. Currently, there is no national or local surveillance system for monitoring dog and wildlife rabies and previous estimates are based on case reports which are inconsistent and clearly underestimate the incidence in these populations. The sampling across 15 provinces in this study, although limited, is informative. Of the 3275 samples collected, 58 tested positive for the virus, corresponding to 2.8% of the dataset. As the goal of the surveillance program was to collect dog brain samples from areas where human rabies cases had been reported, this does not necessarily reflect the situation at the national level. Nevertheless, this percentage is consistent with earlier studies in China
Our results indicate the growth of clade I coincided with the spread of the epidemic, whereas clade II was already present throughout the sampled regions at the earliest stages. This suggests that clade II is from an earlier outbreak and existed at low levels throughout the country. This is also consistent with the earlier TMRCA for this clade and the difference in the distributions of branch lengths for the two clades.
Our results reveal the existence of both geographic dispersal and translocation events, and statistical tests indicate that it is improbable that the events are a consequence of sampling bias. Given the relatively small number of identified translocation events, it appears that geographic dispersal plays the major role in the spread of the virus. This is also supported by the observation that the branch order in the tree coincides with epidemiology data that shows that the neighboring provinces of Hunan, Guangxi and Guizhou experienced rabies outbreaks sequentially. In southwest China, Hunan seems to serve as a major source of geographic dispersal as these sequences are widely distributed among the southwestern sub-clades. The identification of translocation hotspots for clade I suggests that this mechanism also aids dissemination of the virus, although the reason why Jiangsu should act as a major translocation source is unclear. Also, because there were already cases reported in all the translocation regions, it is difficult to be certain how much translocation contributed to the epidemic. As more samples become available through the national surveillance program, it will be possible to further investigate these factors.
We also investigated the relevance of wildlife in the spread of the virus and there were a number of curious results from our study. Firstly, our phylogenetic analyses placed ferret badger sequences at the top of two distinct sub-clades of samples isolated from dogs. If the rabies in wildlife was a consequence of spillover from dogs, then we would expect to find the wildlife isolates mixed in with the dog samples. This has not been reported in previous studies, which have either focused on dogs and only contained one or two wildlife samples
Previous studies have investigated rabies in ferret badgers in southeast China. While the number of isolated samples is small
There are insufficient samples to draw any definitive conclusions as to whether wildlife plays a significant role in the spread of rabies in China, but our results are nevertheless interesting and further studies would be worthwhile. However, given the size of rural China, obtaining sufficient positive samples remains a formidable challenge.
It is worth noting that the current epidemic and associated increase in human cases was coincident with many social changes in the country that facilitated the spread of the disease. Firstly, vaccination represents the most effective approach to controlling rabies
This analysis of the population dynamics and the patterns of distribution and differentiation of the virus may help the development of a program for the prevention and control of rabies in China. Specifically, the identification of translocation hotspots suggests that these regions should be given priority in order to reduce the likelihood of reintroducing the virus into vaccinated areas. Additionally, as our results indicate that clade II is evidence of a previous epidemic, the virus appears to have persisted at low levels throughout the country for an extended period and was able to rapidly reemerge when suitable conditions prevailed. The presence of these two distinct components in the epidemic needs to be taken into consideration when attempting to implement WHO recommendations
Background information of rabies sequences used in this study. Sequences are grouped according to their assigned clade in the tree shown in
(DOC)
Details of MigraPhyla analysis to detect significant translocation events for (a) clade I and (b) clade II amongst the Chinese provinces from which samples were collected in this study. The top table shows the number of translocation events predicted between pairs of provinces; the lower table shows statistical support for the events with P<0.05.
(DOC)
Pairwise Mantel test results for correspondence using Spearman correlation ranks amongst six different distance matrices for (a) clade I and (b) clade II.
(DOC)
Details of analysis of geographical composition of Clade I and Clade II.
(DOC)
We would like to thank the staff of the provincial CDCs (Guangxi, Hunan, Guizhou, Zhejiang, Jiangsu, Shanghai, Shandong, Anhui and Yunnan) for their help with field investigations and sample collection.