Phylogeographic Analyses Reveal a Crucial Role of Xinjiang in HIV-1 CRF07_BC and HCV 3a Transmissions in Asia

Background China faces an increasing prevalence of two HIV-1 circulating recombinant forms (CRFs) 07_BC and 08_BC. Both CRFs_BC were previously demonstrated to originate in Yunnan and spread to Liaoning from Yunnan via injection drug use (IDU) in China. Supposing it is true, we are unable to answer why only CRF07_BC, rather than both CRFs_BC together, was transmitted to Xinjiang. Methodology/Principal Findings We investigated the phylogeography of CRF07_BC and CRF08_BC using multiple HIV-1 genomic regions with Bayesian phylogeography method. Phylogenetic reconstructions showed that all CRF07_BC sequences were divided into two clades, Yunnan and Xinjiang, and all strains from other regions of mainland China clustered within the Xinjiang clade. Significant geographic diffusion links of Xinjiang with other regions (including Liaoning, Beijing, Jiangsu and Guangdong) were supported by Bayes factor tests. The temporal dynamics analyses showed that CRF07_BC spread from Xinjiang to Liaoning in 1996.10, and to Jiangsu in 2000.9. The analyses of CRF08_BC not only confirmed the previous conclusion on temporal and spatial dynamics of CRF08_BC, but also indicated that the CRF08_BC strains from Guangdong and Shanghai originated from Yunnan. The analyses of HCV 3a showed that it was introduced into Xinjiang in the early 1980s, and spread from Xinjiang to Yunnan in 1990.10 and to Jiangsu in 1999.2, and further from Yunnan to Guangxi in 1995.3. The temporal and spatial dynamics of HCV 3a were similar to some extent to that of HIV-1 CRF07_BC and/or CRF08_BC, suggesting a possible association in migration patterns between HCV and HIV-1 through IDU. In addition, HCV 3a spread from Xinjiang to Pakistan, implying a drug trafficking route linking them. Conclusions/Significance Xinjiang, as the most important transfer station for drug trafficking from Golden Crescent to other regions of China, plays a very crucial role in the transmission of viruses (e.g., HIV-1 and HCV) through IDU in Asia.


Introduction
Yunnan Province of China borders the 'Golden Triangle' region (one of the world's largest heroin-producing areas) and is the most important epicenter for heroin smuggling into China and other Asia countries or regions [1,2]. Because of the special location of Yunnan, human immunodeficiency virus type 1 (HIV-1) subtypes B' (Thai-B subtype) and C that originated in Thailand and India, respectively, were introduced into Yunnan via drug-trafficking routes and resulted in the initial HIV-1 outbreak among injection drug users (IDUs) in Yunnan in the late 1980s [3,4,5]. The co-circulation of both subtypes further resulted in the on-going generation of HIV-1 B'/C intersubtype recombinants (referred to as unique recombinant form (URF) or circulating recombinant form (CRF)) in Yunnan [6,7,8]. For example, several new B'/C recombinants were identified in southwestern region (Dehong Prefecture) of Yunnan that borders Myanmar [6,7,8]. HIV-1 CRF07_BC and CRF08_BC are the descendents of Thai subtype B' and India subtype C [9,10]. Both of them were originally found among IDUs in Xinjiang and Guangxi, China, in 1997, respectively [9,10]. Since then, both CRF07_BC and CRF08_BC were demonstrated to be predominant strains co-circulating in Yunnan [6,8]. Based on these observations, both CRFs_BC were naturally thought to originate in Yunnan.
To address the hypothesis that HIV-1 CRF07_BC and CRF08_BC originated in Yunnan,  investigated the temporal and spatial dynamics of both CRFs_BC across East Asia [11,12,13]. Based on the estimation of the time to the most recent common ancestor (tMRCA) and the reconstruction of ancestral relationships, they clearly confirmed the migration pattern of CRF08_BC, in which CRF08_BC originated in Yunnan in 1990 and further spread northeastward to Liaoning and eastward to Guangxi around 1995. In addition, they inferred that CRF07_BC also originated in Yunnan in 1993, and further concluded that the migration of CRF07_BC from Yunnan to Xinjiang (northwest), Liaoning (northeast), and Taiwan (east) were via different drug trafficking routes [11,12].
The origin and migration route of CRF07_BC, however, still remain elusive. If the previous conclusions on the migration routes of CRF07_BC and CRF08_BC described by Tee et al. [11,12] are correct, there remain two questions that cannot be answered objectively. Firstly, both CRFs_BC could be transmitted northeastward to Liaoning from Yunnan. However, why was only CRF07_BC transmitted northwestward to Xinjiang, but not CRF08_BC? Secondly, CRF08_BC originated in Yunnan in 1990, earlier than CRF07_BC. Until 1995, CRF08_BC started to spread to Guangxi (the neighboring province of Yunnan) and Liaoning. However, why CRF07_BC was able to spread rapidly to the regions (Xinjiang and Liaoning) far away from Yunnan in the same year (1993), shortly after when it originated in 1993?
Bayesian phylogeography method, a probabilistic method developed recently by Lemey et al., is more capable of describing the most plausible scenario of geographic migration than the previous methods [14]. To address this issue, here we used the Bayesian phylogeography method to revisit the origin and phylogeography of HIV-1 CRF07_BC and CRF08_BC based on multiple HIV-1 genomic regions with more tip-dated sequences. Our results show that CRF07_BC spread from Xinjiang, rather than Yunnan, to other regions of mainland China, and the geographic origin of CRF07_BC is more complex than previously thought. In addition, additional analysis shows that hepatitis C virus (HCV) 3a also spread from Xinjiang to other regions of China and even to Pakistan. These results indicate that Xinjiang plays a very crucial role in transmission of some viruses through injection drug use in Asia.

Results
HIV-1 CRF07_BC and CRF08_BC have subtype C backbones with several insertions of subtype B'. In previous studies, only one fragment of subtype C origin was selected to investigate the spatial and temporal spread of CRF07_BC and CRF08_BC [11,12]. In order to obtain overall insight into the origin and migration of CRF07_BC and CRF08_BC in Asia, here we selected multiple non-recombinant genomic regions of subtype C or B' origin, located in gag, gag-pol and env gene regions, to perform phylogeographic analyses. Two subtype C-originated fragments in gag region of CRF07_BC (HXB2 793-1211 nt and 1462-2061 nt) are separated by a short subtype B' insertion. Since more informative sites in analyzed sequences can enhance the accuracy of phylogenetic analysis, we performed the partition homogeneity test using PAUP* [15] to evaluate whether two split subtype C fragments (HXB2 793-1211 nt and 1462-2061 nt) can be merged into one continuous sequence. The results show that the two split fragments of CRF07_BC were homogeneous (p = 0.64). A similar result (p = 0.57) was also observed in two fragments (HXB2 794-1209 nt and 1709-2826 nt) of CRF08_BC. These results indicate that the two subtype C fragments in CRF07_BC or CRF08_BC can be merged into one fragment in phylogenetic analyses.

Phylogeography of HIV-1 CRF07_BC
Phylogenetic reconstruction of the merged gag region of subtype C origin (Fig. 1A) shows that all CRF07_BC sequences were divided into two clades. All strains (red) from Yunnan clustered together, forming the Yunnan clade. Except Yunnan strains, only one strain from Taiwan was included in this clade, indicating a close genetic relationship between Yunnan and Taiwan strains. Intriguingly, all strains from other regions of mainland China, including Xinjiang, Liaoning and Jiangsu, clustered together and formed another clade with a root in Xinjiang (black). It indicates that the strains from at least Liaoning and Jiangsu have close evolutionary associations with Xinjiang strains. To test whether there are some statistically significant geographic diffusion links (BF.3) between these strains, the Bayes factor (BF) test was performed. Three significant geographic links were clearly observed between Xinjiang and Liaoning (BF = 234.944), between Liaoning and Jiangsu (BF = 18.592), and between Yunnan and Taiwan (BF = 7.040) ( Table 1). These results strongly argue against the previous conclusion on the migration route of CRF07_BC from Yunnan to Xinjiang and Liaoning.
The genome of HIV-1 CRF07_BC contains some inserted fragments of subtype B'. Like the subtype C fragments, the subtype B' fragments should also retain the evolutionary characteristics reflecting the migration routes of CRF07_BC. Therefore, we extended our analyses to the gag-pol region of subtype B' origin ( Fig. 1B). Similar to the results of subtype C-originated fragment, two clades, the Xinjiang and Yunnan clades, were also observed in the tree of the subtype B' fragment of CRF07_BC. Except one Liaoning strain that clustered within the Yunnan clade, all Liaoning and Jiangsu strains clustered closely within the Xinjiang clade, supporting the close evolutionary associations of the Liaoning and Jiangsu strains with the Xinjiang strains. Further BF test shows that geographic diffusion links between Xinjiang and Liaoning (BF = 28.695), between Xinjiang and Yunnan (BF = 7.209), and between Yunnan and Taiwan (BF = 5.619) are statistically significant ( Table 1).
The results of both subtype B' and C fragments in gag-pol region clearly indicate that the CRF07_BC strains circulating in Liaoning and Jiangsu originated from Xinjiang, rather than from Yunnan. Because of the limited gag-pol sequences available for CRF07_BC, however, it is unclear whether CRF07_BC from other regions of mainland China also originated from Xinjiang. Since a great deal of C2V3 sequences were available for a relatively wide range of locations across mainland China, to address this issue, we further extended our analyses to the C2V3 fragment in env region of CRF07_BC. These locations cover Xinjiang, Yunnan, Jiangsu, Beijing, Guangdong, Heilongjiang, and Taiwan.
Phylogenetic reconstruction of the env region shows that the CRF07_BC sequences were still divided into two clades of Xinjiang and Yunnan. All strains from Taiwan still clustered with the Yunnan strains, firmly confirming the migration route of CRF07_BC from Yunnan to Taiwan. Except the strains from Yunnan and Taiwan, all strains from other regions of mainland China clustered within the Xinjiang clade and were rooted with the Xinjiang strains (Fig. 1C). The BF values show statistically significant geographic diffusion links between Xinjiang and Jiangsu (BF = 280.206), and Beijing (BF = 25.129), and between Yunnan and Taiwan (BF = 34.702) ( Table 2). In this analysis, only 2 Yunnan sequences were isolated in and before 2000, whereas 34 Xinjiang sequences were isolated before 2000. Because the use of more early CRF07_BC sequences from Xinjiang than from Yunnan may lead to a bias in the ancestral state inference of CRF07_BC strains in phylogeographic analysis, we excluded 34 Xinjiang sequences before 2000 and re-performed the analysis. The results are consistent with that obtained using all sequences of env fragment ( Fig. S1A and Table 2), supporting above observation.
All above results not only strongly indicate that unlike previously thought, CRF07_BC spread to other regions of mainland China from Xinjiang, rather than from Yunnan, but also imply that Xinjiang plays a more important role than Yunnan at least in the spatial dynamics of CRF07_BC in mainland China.

Phylogeography of HIV-1 CRF08_BC
HIV-1 CRF07_BC and CRF08_BC are genetically closely related and were considered to be derived from a putative common ancestral B'/C recombinant occurring in Ruili city of Yunnan [7,16]. In order to test whether the migration route of CRF08_BC changed as that of CRF07_BC, we extended phylogeographic analyses to CRF08_BC datasets, which cover the sequences from more regions of China.
The MCC trees constructed based on the gag-pol region of subtype C origin ( Fig. 2A) and the gag region of subtype B' origin ( Fig. 2B) consistently show that all CRF08_BC sequences were grouped in a single clade with high posterior probabilities (PP = 1.00). The strains from Guangxi, Liaoning, and Gansu were rooted with the Yunnan strains. The BF tests show the geographic diffusion of CRF08_BC in Guangxi, Liaoning, and Gansu were significantly linked with Yunnan (BF.3) ( Table 3). These results clearly indicate that CRF08_BC spread to Guangxi and Liaoning from Yunnan.
Because a large number of the subtype C fragments in pol region of CRF08_BC were available, we reconstructed the MCC tree of     (Table 4). These results suggest that besides Liaoning and Guangxi, other regions of China, such as Gansu, Guangdong, Shanghai and Hainan, also have significant geographic diffusion links with Yunnan in the migration of CRF08_BC.

Origin of HIV-1 CRF07_BC and CRF08_BC
Although we demonstrated that CRF07_BC spread to other regions of mainland China from Xinjiang, the geographic origin of CRF07_BC still remains elusive. In ancestral reconstructions, the subtype C fragment in gag region and the subtype B' fragment in gag-pol region show that the geographic origin of CRF07_BC was Yunnan ( Fig. 1A and B), whereas the subtype C fragment in env region shows that the geographic origin was Xinjiang ( Fig. 1C and S1A). The discordance in the ancestral reconstructions of CRF07_BC between different genomic regions is also reflected by the root state posterior probabilities for the locations (Fig. 3A). These results at least suggest that the CRF07_BC strains circulating in China are unlikely to simply derive from a common ancestor that existed in Yunnan in 1993.
Different from CRF07_BC, all ancestral state reconstructions of four analyzed CRF08_BC fragments show that the geographic origin of the CRF08_BC strains circulating in China were pointed toward Yunnan (Fig. 3B). These results firmly confirm the previous conclusion that CRF08_BC originated in Yunnan.

Evolutionary rates and temporal Spread of HIV-1 CRF07_BC and CRF08_BC
We used BEAST package to estimate the evolutionary rates and the dates of tMRCA of HIV-1 CRF07_BC and CRF08_BC. The results show that the evolutionary rates of different genomic regions of CRF07_BC or CRF08_BC were various (Table 5). Since env gene is the most hypervariable region of the HIV-1 genome, the env fragments (HXB2 7095-7328n) of both CRF07_BC and CRF08_BC unsurprisingly exhibited the highest evolutionary rates [17]. In addition, the evolutionary rates of the subtype B' fragments were higher than those of the merged subtype C fragments. The dates of tMRCA inferred based on different fragments also appear to be different. For example, the dates of tMRCA inferred for CRF07_BC and CRF08_BC based on the subtype B' fragments were later than those based on the merged subtype C fragments. The difference in the dates of tMRCA may be ascribed to the intersubtype recombination.

Phylogeography and temporal dynamics of HCV 3a
The transmissions of HIV-1 CRF07_BC and CRF08_BC in China are closely associated with drug trafficking routes [1,18]. Like HIV, HCV can be transmitted by injection drug use (IDU), sexual contact and mother-to-child infection, and IDU is the predominant mode of HCV transmission in Asia [19,20,21,22]. Our recent study [23] revealed that the HCV 3a strains circulating among Jiangsu IDUs were evolutionarily associated with the strains from Xinjiang IDUs, suggesting a potential geographic diffusion links between Xinjiang and Jiangsu, which is similar to the observation in HIV-1 CRF07_BC migration described here. Because of sharing a common transmission pattern (i.e. IDU), the coinfection with HIV-1 and HCV is very common among IDUs [24]. Therefore, we performed phylogeographic analysis using the E1/E2 fragment of HCV 3a to test the potential association in migration between HCV 3a and HIV-1 CRF07_BC and CRF08_BC.    The MCC tree shows that the HCV 3a strains from Jiangsu, Yunnan, and Guangxi clustered within the Xinjiang strains (Fig. 4). The ancestral state reconstruction indicates that Xinjiang was the geographic origin of the HCV 3a strains circulating in China (Fig. 4). The BF tests significantly support the geographic diffusion links between Xinjiang and Jiangsu (BF = 7.180) and between Yunnan and Guangxi (BF = 4.448) ( Table 6). These results at least reveal two migration routes of HCV 3a from Xinjiang to Jiangsu and from Yunnan to Guangxi. In fact, the routes from Xinjiang to Jiangsu and from Yunnan to Guangxi were also observed on HIV-1 CRF07_BC and CRF08_BC migrations, respectively.
In addition, we used BEAST package to estimate the evolutionary rate and the dates of tMRCA of HCV 3a. Since the genomic region of HCV 3a used in the Bayesian coalescence inference located in the hypervariable region, a high evolutionary rate of 7.157610 23 substitutions/site/year was estimated under a constant size population model (  7-1996.9), respectively (Fig. 5).

Discussion
Currently, China faces a rapidly increasing prevalence of two HIV-1 CRFs 07_BC and 08_BC [19,25]. The investigation on temporal and spatial dynamics of both CRFs_BC may provide valuable information for HIV-1 epidemiology, prevention strategy, and vaccine development [12]. Because of the special importance of Yunnan in heroin smuggling and HIV-1 transmission [1,2,4,20], almost all previous studies agreed that both CRFs_BC originated in Yunnan, and were transmitted to other regions of China via different drug-trafficking routes [11,12,13,26].
In this study, we investigated the origin and phylogeography of HIV-1 CRF07_BC based on multiple genomic regions with more tip-dated sequences using Bayesian phylogeographic inference framework [14,27]. Our results show that HIV-1 CRF07_BC spread to other regions (including Liaoning, Jiangsu, Heilongjiang, Beijing, and Guangdong) of mainland China from Xinjiang, and only the CRF07_BC circulating in Taiwan was from Yunnan (Fig. 6A). The dates of CRF07_BC spreading from Xinjiang to Liaoning and from Liaoning to Jiangsu were estimated at 1996 and 2000, respectively. Because HIV-1 CRF07_BC spread from Xinjiang to other regions (especially Liaoning) of mainland China, the two unanswerable scientific questions mentioned in the Introduction, namely why was only CRF07_BC transmitted northwestward to Xinjiang, but not CRF08_BC and why CRF07_BC was able to spread rapidly to the regions (incl. Xinjiang) far away from Yunnan so shortly after it originated are resolved. In addition, the analyses of the gag fragment of subtype C origin and the gag-pol fragment of subtype B' origin show that the geographic origin of CRF07_BC was Yunnan, whereas the analyses of the env fragment of subtype C origin suggest that Xinjiang was the origin region of CRF07_BC. These indicate that besides Yunnan, Xinjiang might also be the candidate of the geographic origin of CRF07_BC. In spite of this, the geographic origin of HIV-1 CRF07_BC still needs to be determined with more HIV-1 CRF07_BC strains from the regions (i.e. Gansu and Sichuan) between Xinjiang and Yunnan.
Although these results do not concur with Tee et al. [11,12], in which CRF07_BC was considered to originate in Yunnan and spread from Yunnan to other regions of China (i.e. Xinjiang, Liaoning and Taiwan), we believe that our results are more reliable and accurate. First, the Bayesian method used in this study has been demonstrated to be more robust of describing the most plausible scenario of geographic migration of infectious disease than the model-free heuristic approaches implemented in PAUP* [14], although the latter was previously used to investigate the spatial dynamics of viruses, such as HIV-1 [28], HCV 1b [29], and influenza A H5N1 virus [30,31]. Different from previous studies, in which the inferences of origin and phylogeography were only based on the phylogenetic trees [11], here we determined the origin and phylogeography of CRF07_BC and CRF08_BC via the reconstruction of ancestral geographic states and the BF tests.
Second, the selection of the representative sequences is very important for phylogeographic study. In this study, we added more tip-dated CRF07_BC sequences representing a relatively wide range of locations across China into our analyses. These sequences include CRF07_BC strains sampled in the early CRF07_BC epidemic and a pure subtype C sequence isolated in the early HIV-1 epidemic in Ruili, Yunnan [7]. Third, in order to obtain a more comprehensive scenario of the geographic migrations of CRF07_BC and CRF08_BC, we selected multiple non-recombinant genomic regions (gag, gag-pol and env) of subtype C or B' origin to perform phylogeographic analyses, compared with the single genomic region to be used in the previous studies. Further, two loci of subtype C origin located in the gag or gag-pol region were merged into one fragment to enhance the accuracy of phylogenetic analyses.
We performed phylogeographic analyses of CRF08_BC using the same method. The results not only clearly confirm the previous conclusion that CRF08_BC originated from Yunnan in the early 1990s and spread to Guangxi and Liaoning in mid-1990s, but also show that the CRF08_BC isolates from Guangdong and Shanghai have close genetic relationships with the strains from Yunnan ( Fig. 2C and 6B), further strengthening the importance of Yunnan in the transmission of CRF08_BC in China. Furthermore, we also analyzed the temporal and spatial dynamics of HCV 3a in Asia because it shares same transmission patterns with HIV-1 and is predominantly transmitted through IDU in Asia [19,20]. The results show that HCV 3a was initially introduced into Xinjiang in 1981.3 and subsequently dispersed into Yunnan and Jiangsu in 1990.10 and 1999.2, respectively (Fig. 6B). Although HCV 3a and HIV-1 CRF07_BC emerged in Xinjiang at different times (1981 and 1993, respectively), both of them migrated to Jiangsu during a very short time interval (Fig. 5). In addition, the migration date of HCV 3a from Yunnan to Guangxi is also very close to that of HIV-1 CRF08_BC (Fig. 5). The similarities in temporal and spatial dynamics between HCV 3a and HIV-1 CRF07_BC and/ or CRF08_BC suggest a possible association in migration patterns between different viruses through same transmission patterns (e.g. IDU). On the other hand, the fact that both HIV-1 CRF07_BC and HCV 3a were transmitted from Xinjiang to other regions of  China indicates that besides Yunnan, Xinjiang also plays a crucial role in the transmission of viruses (at least HIV-1 CRF07_BC and HCV 3a) in Asia through IDU. Xinjiang borders India and Pakistan and is near the 'Golden Crescent' region (another of the largest heroin-producing areas in the world). Like Yunnan, Xinjiang also possesses a geographic precondition to be an important entry point for heroin smuggling into China and other Asia countries or regions [2,20]. However, people often associated the HIV-1 transmission among IDUs with overland heroin trafficking routes through Yunnan, and rarely recognized the role that Xinjiang plays in drug-trafficking and even the transmission of infectious viruses. The reason might be that Yunnan borders the 'Golden Triangle' region and in 1989, the first HIV-1 outbreak was detected among IDUs in Ruili city of Yunnan [4,32]. From the MCC tree of HCV 3a, we found that two strains isolated from Pakistan in 2008 clustered within the Xinjiang clade (Fig. 4). This implies that there is a drug trafficking route linking Xinjiang and Pakistan (near the Golden Crescent), and along this route, HCV 3a was transmitted from Xinjiang to Pakistan. Furthermore, the phylogeography of HIV-1 CRF07_BC and HCV 3a clearly show that the HIV-1 CRF07_BC and HCV 3a strains circulating in other regions of mainland China were transmitted from Xinjiang, supporting the presence of multiple drug trafficking routes linking Xinjiang and other regions of mainland China. Therefore, as the most important transfer station for drug trafficking from Golden Crescent to China and other Asia countries or regions, the role of Xinjiang in transmission of infectious viruses such as HIV-1 and HCV in Asia should be especially considered.
Nine pure subtype C sequences from India, 1 from Myanmar and 1 from Ruili city of Yunnan province were used as the outgroup founder strains in the analyses of subtype C parts of CRF07_BC and CRF08_BC. Moreover, 9 pure subtype B' sequences from Thailand and 1 from Myanmar were used as the outgroup founder strains in the analyses of subtype B' part of CRF07_BC and CRF08_BC.
The datasets of HIV-1 CRF07_BC, CRF08_BC and HCV 3a in this study are available as Dataset S1, Dataset S2 and Dataset S3, respectively.

Phylogenies and temporal dynamics analyses
All sequences were aligned using CLUSTAL W program implemented in MEGA 4.0 [34] and then manually edited. Maximum clade credibility (MCC) trees were constructed using a MCMC (Markov Chain Monte Carlo) method implemented in the BEAST v1.5.4 package, which incorporates time-of-sampling information and returns rooted trees [27]. All of these trees were viewed and edited using FigTree v1.3.1 (tree.bio.ed.ac.uk/ software/figtree/). BEAST was also used to estimate the evolutionary rates and the dates to tMRCA of various nodes on the MCC tree [35]. A relaxed molecular clock with an uncorrelated log-normal distribution [36], a Hasegawa-Kishino-Yano (HKY) nucleotide substitution model [37] with a gammadistributed model of among site rate variation using four rate categories (C4) [38], and a constant population size model were used in the Bayesian coalescence analyses [39]. Statistical uncertainty in parameter estimates was reflected by the values of the 95% highest posterior density (HPD) credible region (CR). Each MCMC analysis was run for at least 20 million generations, with sampling every 10,000 generations. The initial 25% of the samples were discarded as burn-in, leaving 1501 trees per run, when we summarized the trees using TreeAnnotator implemented in the BEAST v1.5.4 package. Posterior probabilities for the internal nodes were calculated from the posterior density of trees. The program Tracer v1.4.1 (tree.bio.ed.ac.uk/software/tracer/) was used to check for the convergence and to determine whether effective sample size (ESS) .200. The posterior densities were calculated with 10% burn-in using Tracer v1.4.1. If the effective sample size is less than 200, the MCMC chain length would be elongated to 200 million.

Phylogeographic analyses
Each sequence was assigned a character state reflecting its sampling location firstly. The Bayesian phylogeographic inference framework was performed to analyze the strength of the movement between geographic locations using a geographically explicit Bayesian MCMC method implemented in the BEAST v1.5.4 package [14,27]. This method can be used to infer the location state of the ancestral branch over the whole tree and to build a reversible diffusion rate matrix between previously defined locations accompanied with the evolutionary and coalescent parameters [14,40,41,42,43,44,45]. We set up a discrete phylogeographic analysis using a standard continuous-time Markov chain [46]. To investigate the phylogeographic diffusion process along the posterior sets of the trees, the relationships between these locations were identified using the Bayesian stochastic variable search selection. The Bayes factor (BF) test that determines the statistically significant phylogeographic links was performed using the RateIndicatorBF tool, which was recently added into the BEAST code [14]. If BF.3, the phylogeographic link between two locations was considered to be statistically significant [14]. Moreover, the posterior probabilities for the ancestral geographic states were also calculated from the posterior density of trees summarized by using TreeAnnotator implemented in the BEAST v1.5.4 package [27]. Figure S1 The env MCC trees of HIV-1 CRF07_BC and CRF08_BC sequences. Ancestral geographic states were reconstructed using Bayesian phylogeographic inference framework implemented in the BEAST v1.5.4 package. The tree branches are colored according to their respective geographical locations. The purple solid nodes on the trees represent the most recent common ancestor (MRCA) of CRF07_BC or CRF08_BC.