HCV 6a Prevalence in Guangdong Province Had the Origin from Vietnam and Recent Dissemination to Other Regions of China: Phylogeographic Analyses

Background Recently in China, HCV 6a infection has shown a fast increase among patients and blood donors, possibly due to IDU linked transmission. Methodology/Findings We recruited 210 drug users in Shanwei city, Guangdong province. Among them, HCV RNA was detected in 150 (71.4%), both E1 and NS5B genes were sequenced in 136, and 6a genotyped in 70. Of the 6a sequences, most were grouped into three clusters while 23% represent emerging strains. For coalescent analysis, additional 6a sequences were determined among 21 blood donors from Vietnam, 22 donors from 12 provinces of China, and 36 IDUs from Liuzhou City in Guangxi Province. Phylogeographic analyses indicated that Vietnam could be the origin of 6a in China. The Guangxi Province, which borders Vietnam, could be the first region to accept 6a for circulation. Migration from Yunnan, which also borders Vietnam, might be equally important, but it was only detected among IDUs in limited regions. From Guangxi, 6a could have further spread to Guangdong, Yunnan, Hainan, and Hubei provinces. However, evidence showed that only in Guangdong has 6a become a local epidemic, making Guangdong the second source region to disseminate 6a to the other 12 provinces. With a rate of 2.737×10−3 (95% CI: 1.792×10−3 to 3.745×10−3), a Bayesian Skyline Plot was portrayed. It revealed an exponential 6a growth during 1994–1998, while before and after 1994–1998 slow 6a growths were maintained. Concurrently, 1994–1998 corresponded to a period when contaminated blood transfusion was common, which caused many people being infected with HIV and HCV, until the Chinese government outlawed the use of paid blood donations in 1998. Conclusions/Significance With an origin from Vietnam, 6a has become a local epidemic in Guangdong Province, where an increasing prevalence has subsequently led to 6a spread to many other regions of China.


Introduction
Because of frequent blood exposures and unsafe injection behaviors, hepatitis C virus (HCV) infection has become highly prevalent among injection drug users (IDU) in China [1][2][3][4]. Based on a recent meta-analysis, the pooled HCV prevalence among IDUs was about 67.6%, and there were large geographic variations in genotype distribution. Generally, 6a was predominant, accounting for 36.7%. This was followed by 3b, 1a, 3a, and 1b, accounting for 16.4%, 14.6%, 12.8%, and 12.4%, respectively [5]. Among IDUs in the Guangxi and Hubei Provinces, and in Hong Kong and Taiwan as well, 6a was the main HCV strain [5]. However, this was not consistent with the predominant strains among clinical patients. The high proportion of 6a among IDUs in Guangxi has been linked to the fact that HCV was transmitted through unsafe injection by users along the drug trafficking route from Southeast Asia [6].
The Guangdong Province is located in the southern part of China, neighboring Guangxi. Historically, it has been an internal and international crossroads for cultural and commercial exchanges. Since 1978, Guangdong has been assigned as the first region of China to adopt a free market economy and open door policy. Recently, as economic development in China accelerates, this province has become a ''World Production Center''. A powerful economy has stimulated a large population of drug users and a rapid increase in drug trafficking and trading. In addition, there has been an explosion of the illegal sex industry in the province and this has been associated with millions of migrants who are on business or pleasure or temporarily hired as laborers in the myriad of factories that lack good hygiene conditions [7]. Collectively, these factors have created an environment to inbreed HCV prevalence.
Recently, we have studied HCV infection among patients and blood donors in the Guangdong Province and found that 6a has accounted for an increasing proportion [8][9]. We tend to believe that this 6a prevalence could have resulted from IDU linked transmission. In our phylogenetic analysis, 6a appeared to show its origin from Vietnam, because Vietnamese isolates were the most diverse and genetically distant. Considering the historical event in which the Guangdong, Guangxi, and Hainan provinces had accepted approximately 300,000 refugees from Vietnam during the late 1970 s [9], we hypothesized that 6a may have since been introduced into Guangdong and become a local epidemic to seed the IDU network. In recent decades, 6a has been spread to other regions by IDUs or by infected people among the migrants. In this study, coalescent and phylogeographic analyses were performed to test this hypothesis.

Prevalence of anti-HCV and HCV RNA
For routinely screening blood donations, an automated, integrated NAT (Nucleic Acid Testing) platform has been implemented at the Guangzhou Blood Center. Therefore, samples from this study were first applied to this platform. After testing for HCV RNA, the samples were subjected to PCR. This resulted with only 198 samples left for anti-HCV assay. Consequently, 191/198 (94.5%) users were anti-HCV positive and 150/210 (71.4%) were HCV RNA positive (Table 1).
1b sequences were isolated from 13 users. Among them one was classified into a previously designated ''A'' cluster, three into a ''C'' cluster [9], and one was grouped with reference Chinese isolates. In contrast, eight formed their own cluster designated ''SW''. The SW had a bootstrap score of 95% in the E1 and 89% in the NS5B tree ( Figure S1). 3a sequences were isolated from 42 users. They formed a monophyletic cluster in both the E1 and NS5B trees ( Figure S2). Genetically, they looked distinct from those isolated from IDUs in the Hubei [2] and Yunnan Provinces [10] but with no significant bootstrap support. 6a sequences were isolated from 70 users. Among them, 28, 5, and 17 were classified into clusters I, II, and III; 8 each were grouped into clusters VI and VII ( Figure 1). Clusters I, II, and III have been recently described and included many reference sequences [2,3,6,8,9]. Clusters VI and VII were newly designated and contained sequences only from this study (accounting for 23%). Both the E1 and NS5B sequences were analyzed and classified isolates consistently. Therefore, identical isolates were placed in similar positions in both trees. The E1 tree further showed bootstrap scores of 82%, 95%, 88%, 92%, and 90% for clusters I, II, III, VI, and VII, respectively. However, no significant scores were shown in the NS5B tree.

Multiple HCV infections
Directly sequencing amplicons yielded ambiguous results for eight users. Molecular cloning followed by sequencing revealed that they had multiple infections with different HCV strains. This was confirmed by sequencing both E1 and NS5B genes from two users, sequencing both core and E1 from one, and sequencing single NS5B from five. SW33, SW62, SW64, SW109, and SW179 each had co-infection with two 6a strains, of which SW64 and SW109 also had a third strain of 1b. SW122 had co-infection with

Coalescent analyses of 6a sequences
(1) 285 dated sequences. Many 6a sequences have been reported among IDUs in China [2,3,6,10]. We have also characterized a number of 6a sequences from patients and blood donors in the Guangdong Province [8,9]. Jointly, they represented 139 retrieved sequences. Here, 6a sequences were further determined among 21 blood donors from Vietnam, 22 blood donors from 12 provinces of China, and 36 IDUs [3] from Liuzhou City in Guangxi Province. They were co-analyzed with those from 70 IDUs in Shanwei city. In total, 285 isolates were represented, each with 519 nt of E1 gene.
(2) Model test. Prior to coalescent analysis [11], model test was performed [12]. Based on BIC (Bayesian information criterion) and AIC (Akaike information criterion), the GTR+I+C model was found to be the best. However, based on AICc (Akaike information criterion, corrected), the K80+C model was the best. Compared to AIC and BIC, AICc favors simpler models by scoring higher penalties for extra parameters. Since in the BEAST package [13] the model complexity is endeavored and K80+C is not implemented, we selected the GTR+I+C model.
(4) Estimating rates as priors. To obtain more valid rates to be used as priors in subsequent analysis for better representing the Chinese isolates, 139 sequences were selected with exclusion of all the Vietnamese strains. The 139 sequences were from our studies to ensure correct sampling dates and provide good temporal and phylogenetic structures. Under three models: BSP + uncorrected exponential, BSP + uncorrected lognormal, and BSP + strict clock, the substitution rates, 2.737610 23 (95% CI: However, sufficient ESS was achieved for all statistics. BF analysis confirmed that the exponential model was better than lognormal (BF = 57.75) and lognormal better than the strict (BF = 8.42).
(5) Coalescent analyses using rates as priors. Using the rate 2.737610 23 (95% CI: 1.792610 23 to 3.745610 23 , standard error: 1.507610 25 ) as a prior under the BSP + uncorrected exponential model, a coalescent analysis was performed for the 285 sequences (Table 4). Transforming the log file into trace distributions revealed a substitution rate of 2.7346610 23 (95% CI: 2.7055610 23 to 2.7637610 23 ), in which the mean is similar to the prior but the 95% CI (confidence interval) is narrower. Parameter a is inversely related to the extent of rate variation among sites [15]. Here, the estimated a was 0.8933 (95%CI: 0.8887 to 0.8978). It is #1 and means that most sites have very low rates while mutational ''hot spots'' existed in the sequences that changed at very high rates. Compared to that previously estimated for core (a = 0.22) and NS5B (a = 0.32) genes [14], a higher a suggests a weaker ''hot spot'' pattern for the analyzed E1 gene. The pInv statistic shows the proportion of invariable sites and lower value means more variable nucleotides [16]. Here, a moderate pInv 0.4684 (95%CI: 0.4674 to 0.4695) was inferred. COV represents rate heterogeneity among lineages, which abuts zero when a strict molecular clock is followed [17]. For the 285 sequences a high COV, 0.8419 (95% CI: 0.8402 to 0.8437), was estimated. This is higher than 0.37 previously inferred for concatenated core and NS5B genes (95% CI: 0.28 to 0.45) [14] and strongly indicates a relaxed molecular clock. In addition, a very small covariance parameter, 5.652610 24 (95% CI: 20.0011 to 2.2368610 23 ), was observed. This value measures a degree to which among-lineage rate variation is randomly distributed across the phylogeny rather than being restricted to specific clusters. Using different rates as priors, coalescent analyses were also performed under two other models, BSP + uncorrected lognormal and BSP + strict clock, and the results were detailed in Table 4.
(6) Phylogeography. Figure 2 shows the MCC (maximum clade credibility) tree, a phylogeography reconstructed from the trees sampled under the exponential model (Table 4). Seven 6a clusters shown in Figure 1 are all maintained and supported with high posterior probabilities (P$0.97). Vietnamese isolates are positioned at the base. The base contains the most direct descendents of the earliest 6a common ancestor dated around 1962-1963(95% CI: 1915-1980, which is thought to be the origin of all 6a strains in China. Upward from the base, there appeared a trend of 6a migration first to Yunnan, to Guangxi, Since most were from blood donors [9], a transition from IDU to non-IDU dissemination is suggested. Remarkably, isolates in cluster I had the largest number size (135 isolates) and showed the widest range of spread (11 provinces). It started from a common ancestor dated around 1990 (95% CI: 1986-1993) and is characteristic of geographic lineages. For example, sequences from three cities of Guangxi Province [3,6] formed a noticeable group analogous to four smaller neighboring groups that contain sequences mainly from Guangdong and occasionally from other provinces [8,9]. At the top of the tree, there is a large group containing two additional geographic lineages, one from Bingyang City in the Guangxi Province [6] and the other from Wuhan City in the Hubei Province [2], separated by a bunch of sequences from Guangdong [8,9]. This large group can be the evidence to support that the IDU networks in cities of Guangxi, Guangdong and Hubei provinces had been exchanged. Although Guangxi is thought to be the origin of 6a in Guangdong, migration from Guangdong back to Guangxi is also suggested. As 6a in  Figure 1. Subtype 6a phylogenies estimated from (A) E1 and (B) NS5B region sequences, corresponding to H77 nucleotide positions of 869-1289 and 8276-8615, respectively. Subtype 6b sequence D84262 was used as an outgroup. Green pies label sequences from our previous studies [8,9]. Red and yellow pies label sequences from this study, in which yellow pies mark isolates from IDUs with multiple HCV infections. Sequences without pies were retrieved from Genbank. In each tree, five rectangles highlight the further classification of 6a isolates into I, II, III, VI, and VII clusters. Scale bar represents 0.02 nucleotide substitutions per site. doi:10.1371/journal.pone.0028006.g001 Guangdong has become increasingly prevalent, there appeared a recent trend of radiation to other regions. This is brightly portrayed in cluster I, which showed 6a migrations from Guangdong to Yunnan, Fujian, Jiangxi, Hainan, Hunan, Xinjiang, Shanghai, and Shanxi Provinces, and this is associated with blood donors. Similar phylogeographies were also plotted under two other models (Table 4 & Figure S3, S4), and collectively, the 6a migration routes in China were suggested ( Figure 3).
(7) Migration test. To test for the presence of phylogeographic structures, an additional test was performed to estimate two statistics: AI (Association Index) and PS (Parsimony Score statistic). AI is the sum across all the internal nodes of a phylogeny, which explicitly takes into account the shape of the phylogeny by measuring the imbalance of the internal nodes [18]. The estimation of PS, however, uses a parsimony approach to reconstruct the character states at ancestral nodes and calculate the number of state changes in the phylogeny [19]. According to sampling origins, the 285 sequences were divided into five states (Guangdong, Guangxi, Yunnan, Others, and Vietnam). Correlating the five states with phylogenetic positions showed the AI and PS statistics both strongly rejecting the null hypothesis of panmixis [20]. Therefore, association between lineages and geographic origins was supported.
(8) Bayesian Skyline Plots (BSPs). Figure 4 shows the overlapped BSPs estimated based on three models (Table 4). These are flexible, non-parametric estimates of past changes in population size [11], plotted using the ''log'' files and ''trees'' files resulting from three coalescent procedures. Estimated under different models, surprisingly similar patterns of growth were shown. Let's first describe the BSP from the exponential model. A relatively constant population size was maintained from 1980 to around 1994. However, this was breached by a fast exponential growth until 1998. Since 1998, the growth was suddenly slowed and then kept slightly upward till the present. The rapid growth from 1994 to 1998 corresponded to a period when contaminated plasma collection was common in China, which led to about 500,000 blood donors infected with HCV [21,22] and 300,000 donors infected with HIV [23,24]. The abrupt slowing around 1998, however, was concurrent to a time point when the Chinese government outlawed the use of paid blood donors, which went into effect that year [25]. Using slightly lower substitution rates (Table 4), the BSPs under two other models showed both the start and end of the rapid growth a little earlier (Figure 4). However, they remained coincident with the period when for-profit plasma collection was common in China, which began in the early 1990 s and was officially forbidden in 1996 [23,24].

Discussion
In this study, both E1 and NS5B sequences of HCV were determined among 136 drug users. It resulted that 6a was detected in 70 users (51.5%). The identified 6a predominance is consistent with that from previous reports [2,3,6,10,[26][27][28]. These previous reports have collectively shown increasing 6a predominance accounting for a pooled rate of 36.7% [5]. Similar to that we have recently described, the majority of 6a isolates in this study were classified into three clusters, I, II, and III, with more in cluster III than in cluster II and more in cluster I than in both clusters II and III. In addition, two new clusters, VI and VII, were designated with the isolates exclusively from the present study. Including the reference sequences to indicate their geographic origins showed that isolates in clusters II and III were mainly from Guangdong and occasionally from other regions. Isolates in cluster I were not only from a variety of people (IDUs, patients, and blood donors) in Guangdong, but also from blood donors in 12 other provinces: Guangxi, Hainan, Yunnan, Fujian, Jiangxi, Hunan, Hubei, Jiangsu, Zhejiang, Shanghai, Shanxi, and Xinjiang [2,3,6,8,9]. With the addition of clusters IV and V, the distribution of 6a in China is illustrated. However, Vietnam was estimated to be the origin of 6a in China. Phylogeographic analysis indicated that the Guangxi Province, which borders Vietnam, could have been the first region to accept 6a for circulation. Migration from Yunnan, which also borders Vietnam, might be equally important, but was only detected among IDUs in limited regions. From Guangxi, 6a  could have further spread to the Yunnan, Guangdong, Hainan, and Hubei provinces. However, there is evidence to show that only in Guangdong has 6a become a local epidemic, which has made Guangdong the second source region to disseminate 6a strains to other regions. A map of 6a migration in China is therefore portrayed (Figure 3). The importation of 6a from Vietnam to Guangxi could be attributed to the growing communications among residents living on both sides of the Guangxi-Vietnam border, and more importantly, the free exchange of IDU networks in Guangxi and Southeast Asian countries through the known drug trafficking routes. Apart from this, there are three other possibilities: (1) during the late 1970 s, the Guangdong, Guangxi, and Hainan Provinces accepted over 300,000 refugees from Vietnam, of which a fraction of people might have been infected [9]; (2) 6a may be indigenous in Guangdong and Guangxi, since the two regions are geographically adjacent and historically linked to Vietnam; (3) some 6a strains might have come from Hong Kong, which had an alternative route of 6a introduction [29]. In this study, phylogenetic and phylogeographic analyses both revealed a certain degree of differences among seven 6a clusters and several branches, but not comparable with those existing among the Vietnamese isolates at the tree base. Such differences are not sufficient to validate possibilities (1) and (2). Unfortunately, due to lack of a time-stamped sequence dataset from Hong Kong, at the moment we were unable to test possibility (3). The common ancestor of 6a strains in China was dated around 1978. Concurrently, this was the time point when the China-Vietnam  Table 4). In addition to the isolates presented in Figure 1A, sequences determined among 21 blood donors from Vietnam, 22 blood donors from the 12 provinces of China, and 36 IDUs from Liuzhou City in the Guangxi Province [3] were also included. Branches were colored according to their respective geographic origins. Posterior probabilities greater than 0.9 were shown at the respective nodes, while the estimated time points of the origin for the nodes were parenthesized. Below the tree is a time scale from 1960-2010, which measures 6a origin and evolution. doi:10.1371/journal.pone.0028006.g002 Figure 3. Map of HCV 6a migration in China. With origin from Vietnam, 6a was spread to the neighboring Guangxi and Yunnan Provinces for circulation (shown in pink arrows). From Guangxi, 6a was further spread to the Guangdong, Hainan, Yunnan, and Hubei provinces (shown in green arrows), while from Yunnan, 6a was brought by IDUs to the central part of China (shown in a blue arrow). For unknown reasons, 6a has become a local epidemic in Guangdong and caused infections not only among IDUs, but also among blood donors and patients. As the prevalence is increasing, Guangdong has become the second source region to radiate 6a to other regions (shown in red arrows). doi:10.1371/journal.pone.0028006.g003 war occurred and when Guangdong was assigned as the first region of China to adopt a free market economy and open door policy. These could be two historical events to facilitate 6a strains imported from Vietnam (via Hong Kong or other Southeastern countries?) to China (first to Guangxi, Guangdong, or Yunnan?) and implanted in an HCV transmission network (via blood donor or IDU network?). During the decades after 1978, Guangdong has led the country in fast economic development and has now become a ''World Production Center''. The powerful economy has stimulated a large population of IDUs and the prosperity of drug trafficking and trading, as well as countless visitors for business and pleasure. More importantly, tens of millions of migrants or ''farm workers'' have been attracted from rural areas to work in the myriad of factories in the Pearl River Delta, a central region of the Guangdong Province. However, due to a lack of optimal working and living conditions, these migrants are more vulnerable to viral infections such as HCV and HIV [23]. Some of them may have also been involved in a thriving illegal sex industry [7] and drugs, while a few may have been HCV carriers from earlier due to previous contaminated blood donations. During holidays and busy farming seasons, these migrants have frequently traveled back and forth between Guangdong and their rural hometowns; 6a strains may have been widely transported. Blood transfusion was a major risk factor for HCV infection before the virus was identified in 1989. This risk remained high in China even during the mid 1990 s and peaked in 1995. Marked with a scandalous plasma campaign during 1994-1996 in the Henan province, it led to more than 500,000 blood donors that were infected with HCV and 300,000 donors infected with HIV [21][22][23][24][25]. Most of the infections were acquired at paid blood donation stations, some operated by local health officials, where many donors' red blood samples were mixed, the plasma extracted, and the pooled red blood cells transfused into the donors. Similar contaminated donations have been also reported in seven other provinces [23]. This disastrous event in China was vitally indicated in the BSPs, which showed a rapid growth in the HCV-infected population size during 1994-1998. In 1998, the Chinese government outlawed the use of paid blood donors to eliminate this risk [25]. Correspondingly, this was reflected in BSPs with an abrupt slowing in the 6a-infected population size, starting from 1998. However, in the poor rural communities a huge infected population has been established, of whom many have now moved with millions of migrant laborers to cities. Through drugs and sex, they still provide a bridge to the general population, and in the BSPs this could be represented with a slightly increasing slope.

Subjects and specimens
Serum samples were collected in Nov 2010 from 210 drug users retained in a mandatory detoxification center in Shanwei City, Guangdong Province, China. The ethical review committee of the Guangzhou Blood Center had approved this study. Guidelines set by this committee were strictly followed and written informed consent was obtained from each participant. Before bleeding, a questionnaire was answered by each user. Questions included the home address, age, gender, marital status, education, ethnic and geographic origins, histories of blood transfusion and surgery and other exposures, time of starting, the duration, and the method of drug use.

Anti-HCV and NAT assays
Anti-HCV was assayed as previously described [9]. NAT for HCV RNA was tested on the Procleix TIGRIS system (Novartis Diagnostics, Emeryville, CA, USA) following the manufacturer's protocol. DNA amplification, sequencing, and phylogenetic analysis HCV E1 and NS5B genes were amplified and sequenced as previously described [8,9]. To confirm multiple HCV infections, amplicons were cloned into a pT-Adv vector (Advantage PCR cloning kit; Clontech). A number of clones were picked and plasmids sequenced [30]. According to HCV subtypes, the resulting sequences were aligned with references using the CLUSTAL_X program. Phylogenetic trees were estimated using the maximum likelihood method under the GTR+I+C 6 substitution model.

Evolutionary analysis of 6a sequences
For coalescent analyses, time-stamped E1 sequences were further determined among 21 blood donors from Vietnam, 22 blood donors from 12 provinces of China, and 36 IDUs from Liuzhou City in the Guangxi Province [3]. In addition, 139 sequences were retrieved from Genbank. Together with those determined in this study, a total of 285 6a isolates were represented. They all had their sampling dates (from 1992 to 2010) and geographic origins recorded and E1 gene sequences (up to 519 nt) available to compile the dataset. Information from Genbank, from original publications, and from our personal communications were used to verify the sampling dates, to ensure that no sequences were from experimental animals, engineering, no multiple quasispecies were included, nor those that were from single individuals at different time points. Because the change in HCV-infected population size was an object of study, closely related 6a sequences from single epidemic scenarios were not excluded except when they were identical. After alignment and visual corrections, the most appropriate site model was determined using the program jModelTest [12].
Based on the 285 sequences, the evolutionary rate and epidemic history was estimated using the BEAST package [13]. Each MCMC sampling was performed for 200 million states, sampling a tree every 20,000 states. Prior to MCMC, selection was made for (1) site model, (2) demographic model, and (3) clock model. For (1), the GTR+I+C model was revealed to be the best among 24 models, based on the criteria from jModelTest. For (2), although the constant size, exponential growth, logistic growth, expansion growth, and BSP all are selectable, a previous study has shown that, excluding BSP, other models consistently performed poorly with HCV sequences [14]. Therefore, we did not test these models but only used the BSP in this study. For (3), we tested all three models: exponential, lognormal, and strict, in combination with BSP respectively, for selecting the best-fitting combination. To analyze the results from the MCMC process, a program Tracer was used [13]. We used it to estimate the marginal likelihoods of the trace, check the trace distribution for convergence, and determine if sufficient ESS (.200) was achieved. We also used it to compute a BF for each pair of models for selecting the statistically best. In addition, we used it to reconstruct BSP to show the epidemic history of HCV. Through the MCMC process, the BSP generalizes skyline plots from the targeted sequence dataset, then samples the distribution of the generalized skyline plots, and combines the plots to yield a posterior distribution of effective population size through time. Since credibility intervals are provided for effective population size at every point in time, from the present back to the most recent common ancestor of the sampled sequences, a past history of HCVinfected population size is inferred [11].
After discarding 10% of burn-in, a time-scaled phylogeny was estimated from a posterior distribution of trees that were generated under a given model combination. We used the TreeAnnotator program to construct this phylogeny, which best summarizes the set of credible trees and is called the MCC tree. Since a molecular clock was used in the MCMC process, the branch lengths and node height of the tree are in units of years. Phylogeographic structure was then displayed using the FigTree program, and clades and lineages were colored according to their geographic origins [13].
In addition, the Befi-BaTS program was used to test for the presence of statistically significant phylogeographic structure (www.lonelyjoeparker.com/?tag = befi-bats). This was done through a parametric bootstrap process, which randomizes the association a large number of times, calculates statistics from each randomization, and provides a null distribution of the statistics with 0.95 credible intervals. Against this level the observed statistics are compared. Here, the null hypothesis of panmixis assumes that there is no correlation between phylogeny and taxa location [19], and the randomization is performed against a series of tree-shaped statistics. We performed randomizations across a posterior distribution of trees generated from the MCMC process under a given coalescent model. During the bootstrapping process, the phylogenetic uncertainties were correctly incorporated and phylogeographic structure tested [20].

Nucleotide sequence accession numbers
The sequences reported in this paper have been deposited in GenBank with the following accession numbers: JF720958-JF721322.

Supporting Information
Figure S1 Subtype 1b phylogenies estimated from (A) E1 and (B) NS5B region sequences, corresponding to H77 nucleotide positions of 869-1289 and 8276-8615, respectively. Subtype 1a sequence M62321 was used as an outgroup. Green pies label sequences from our previous studies [8,9]. Red and yellow pies label sequences from this study, in which yellow pies mark isolates from IDUs with multiple HCV infections. Sequences without pies were retrieved from Genbank. In each tree, six rectangles highlight the further classification of 1b isolates into A, B, C, D, E, and SW clusters. Scale bar represents 0.05 nucleotide substitutions per site. Bootstrap support values are shown in italics. (TIF) Figure S2 Subtype 3a phylogenies estimated from (A) E1 and (B) NS5B region sequences, corresponding to H77 nucleotide positions of 869-1289 and 8276-8615, respectively. Subtype 3b sequence D49374 was used as an outgroup. Two geographic clusters were shown with sequences from Hubei [2] and Yunnan [10] provinces to compare with a geographic cluster from Guangdong. Otherwise, all labels are the same as those described in the Figure 1