Evolutionary history and spatiotemporal dynamics of the HIV-1 subtype B epidemic in Guatemala

Different explanations exist on how HIV-1 subtype B spread in Central America, but the role of Guatemala, the Central American country with the highest number of people living with the virus, in this scenario is unknown. We investigated the evolutionary history and spatiotemporal dynamics of HIV-1 subtype B in Guatemala. A total of 1,047 HIV-1 subtype B pol sequences, from newly diagnosed ART-naïve, HIV-infected Guatemalan subjects enrolled between 2011 and 2013 were combined with published subtype B sequences from other Central American countries (n = 2,101) and with reference sequences representative of the BPANDEMIC and BCAR lineages from the United States (n = 465), France (n = 344) and the Caribbean (n = 238). Estimates of evolutionary, demographic, and phylogeographic parameters were obtained from sequence data using maximum likelihood and Bayesian coalescent-based methods. The majority of Guatemalan sequences (98.9%) belonged to the BPANDEMIC clade, and 75.2% of these sequences branched within 10 monophyletic clades: four also included sequences from other Central American countries (BCAM-I to BCAM-IV) and six were mostly (>99%) composed by Guatemalan sequences (BGU clades). Most clades mainly comprised sequences from heterosexual individuals. Bayesian coalescent-based analyses suggested that BGU clades originated during the 1990s and 2000s, whereas BCAM clades originated between the late 1970s and mid 1980s. The major hub of dissemination of all BGU, and of BCAM-II, and BCAM-IV clades was traced to the Department of Guatemala, while the root location of BCAM-I and BCAM-III was traced to Honduras. Most Guatemalan clades experienced initial phases of exponential growth (0.23 and 3.6 year-1), followed by recent growth declines. Our observations suggest that the Guatemalan HIV-1 subtype B epidemic is driven by dissemination of multiple BPANDEMIC founder viral strains, some restricted to Guatemala and others widely disseminated in the Central American region, with Guatemala City identified as a major hub of viral dissemination. Our results also suggest the existence of different sub-epidemics within Guatemala for which different targeted prevention efforts might be needed.


Introduction
It is commonly accepted that the HIV-1 subtype B epidemic likely moved out of Africa via a single introduction to Haiti in the mid 1960s [1]. This event was followed by a wide spread of phylogenetically distinct Caribbean-specific lineages (B CAR ) throughout the Caribbean islands [1][2][3], with non-pandemic B CAR lineages accounting for more than 40% of infections in the region [3]. A pandemic B lineage (B PANDEMIC ), encompassing most subtype B infections around the world, subsequently emerged in the late 1960s [1], spreading from the Caribbean to North America and later expanding to other parts of the world [1,4,5], including possible re-introductions to the Caribbean [6].
The precise routes of introduction of HIV-1 subtype B to Central America remain to be clarified. One study suggests that a single early introduction of a B PANDEMIC strain possibly to Honduras during the 1960s originated a Central American lineage (B CAM ) that accounts for approximately 60% of current subtype B cases in the region [7]. Other sub-epidemics originating mostly during the 1970s and early 1980s were generated by the spread of additional founder viruses that tended to cluster according to the country of origin [6,7]. A more detailed study in Panama, however, showed that the subtype B epidemic in this country is mostly driven by the expansion of multiple founder B PANDEMIC and B CAR strains, that have remained mostly restricted to this country [8].
Guatemala is the country comprising the highest number of individuals estimated to be living with HIV (49,000) in Central America, with prevalence in the general population of 0.5% [9]. The Guatemalan HIV epidemic is concentrated in most-at-risk groups, with 8.9% prevalence in men who have sex with men (MSM), 23.8% in transgender women and 1.0-2.6% in female sex workers (FSW) [10]. It is estimated that men who have sex with men comprise approximately 30% of the total estimated number of persons living with HIV in Guatemala and that heterosexual transmission accounts for 68% of cases [11]. The first AIDS case in Guatemala was recorded in 1984 in a young homosexual man who had resided in the United States, and cases in women were not reported until 1986 [12]. Geographic distribution of the Guatemalan HIV epidemic reflects economic development with the involvement of departments with higher commercial activity and with marked migratory routes, including Guatemala, Escuintla, San Marcos, Retalhuleu, Quetzaltenango, Izabal, Petén and Suchitepéquez, which concentrate 76% of notified cases [12]. A higher proportion of cases is found in Ladino (Mestizo) populations (77%) compared to Maya (indigenous) populations (20%), and 60% of infected individuals are male (approximately 60% of the general population is considered Ladino and 40% indigenous) [10]. Nevertheless, the molecular epidemiological history of HIV has not been assessed in Guatemala. For rapidly evolving pathogens such as HIV, phylogenetic methods can contribute to understand the complex dynamics of viral spread at the population level. Moreover, molecular epidemiology studies have the potential to better inform targeted public health interventions to contain the HIV epidemic at the local, national, or regional context, by identifying key clinical, demographic and geographical correlates of transmission. This information can help to focus limited resources to more effective HIV prevention efforts, including for example, the active search for new infection cases to link them to care and the roll out of pre-exposure prophylaxis [13].
The aim of the present study was to describe the spatiotemporal dynamics of dissemination of HIV subtype B in Guatemala, assessing its possible evolutionary relations with other viruses circulating in the Americas. The study was performed on a large HIV pol sequence dataset from individuals enrolled from 2010 to 2013 at the Roosevelt Hospital, one of the national HIV reference centers.

Ethics statement
This study was evaluated and approved by the Ethics Committees of the National Institute of Respiratory Diseases (INER) in Mexico City (E06-09), and the Roosevelt Hospital in Guatemala City, and was conducted according to the principles of the Declaration of Helsinki. All participants gave written informed consent before blood sample donation.

Guatemalan HIV-1 pol sequences
Blood samples from 1,083 HIV-1-infected Guatemalan individuals were collected at the Infectious Diseases Clinic of the Roosevelt Hospital in Guatemala City from October 2010 to December 2013 as part of an HIV pre-treatment drug resistance study [14]. Briefly, antiretroviral treatment-naïve individuals were enrolled in a cross-sectional study according to new clinical diagnoses at spontaneous demand to the clinic, also including individuals on followup visits, previous to starting antiretroviral treatment. Every eligible person attending the clinic was invited to participate in the study. No exclusion criteria were applied except for previous exposure to antiretroviral drugs. After giving written informed consent, participants donated a single blood sample. The complete protease (PR) and the first part of the reverse transcriptase (RT) of the pol gene (nucleotides 2253 to 3275 of reference strain HXB2) were amplified and sequenced using an in-house developed methodology as previously described [14]. Sequences used in the study are available at S1 Data. Serving 30% of all HIV-infected persons in Guatemala, the Roosevelt Hospital is the largest of a total of 18 clinics providing antiretroviral treatment in the country [15]. Although, the Roosevelt Hospital is located in the Department of Guatemala, the persons that initiate ART in this institution come from all over the country [16], with 45% of clients from other departments [14]. The place of residence distribution of Roosevelt Hospital's clients reflects the national distribution of HIV cases [11,16]. Also, sociodemographic characteristics (including female-to-male ratio, mean age, race, marital status and literacy) of Roosevelt Hospital's clients is similar to those of the national HIV-infected population [16,17].

HIV-1 subtype B pol reference sequences
The HIV-1 subtype B pol sequences from Guatemala were aligned with: 1) subtype B pol sequences representative of the B PANDEMIC /B CAR clades (US/France = 809/Caribbean = 238) [3,8]; 2) Panamanian subtype B pol sequences representative of the country-specific clades (n = 583) [8]; and 3) all available subtype B pol sequences from other Central American countries (n = 694) and Mexico (n = 824) that matched the above selected genomic region (S1 Table). All HIV-1 subtype B pol reference sequences used in this study were available at Los Alamos HIV Sequence Database (www.hiv.lanl.gov) by December 2014.

Sequence alignment and subtype assignment
Sequences were aligned using the ClustalW program implemented in Mega 6.0 software [18,19]. The subtype assignment of all sequences included was confirmed using REGA HIV subtyping tool v.2 [20] and by performing Neighbor-Joining (NJ) phylogenetic analyses with HIV-1 group M reference sequences downloaded from Los Alamos HIV Sequence Database. The NJ phylogenies were constructed under the Tamura-Nei evolutionary model and the reliability of tree topologies was assessed by bootstrap analysis with 500 replicates using the MEGA 6.0 software package [19]. Bootstrap values above 75% were considered significant. All sites associated with major antiretroviral drug resistance mutations in PR (30,32,46,47,48,50

Phylogenetic analysis
Maximum Likelihood (ML) phylogenetic trees were inferred under GTR+I+Γ 4 nucleotide substitution model selected using the jModeltest program [21]. The ML tree was reconstructed with the PhyML program [22] using an online web server [23]. Heuristic tree search was performed using the SPR branch-swapping algorithm and the reliability of the obtained topology was estimated with the approximate likelihood-ratio test (aLRT) [24] based on the Shimodaira-Hasegawa-like procedure. aLRT values above 0.8 were considered significant. The ML trees were visualized using the FigTree v1.4.0 program [25].

Analysis of spatiotemporal dispersion patterns
The evolutionary rate (nucleotide substitutions per site per year, subst./site/year), the age of the most recent common ancestor (T mrca, years) and the spatial diffusion of HIV-1 Guatemalan clades were jointly estimated using the Bayesian Markov Chain Monte Carlo (MCMC) approach as implemented in BEAST v1.8 [26,27] with BEAGLE [28]. Analyses were performed using the GTR+I+Γ 4 nucleotide substitution model and the temporal scale of evolutionary process was directly estimated from the sampling dates of the sequences using a relaxed uncorrelated lognormal molecular clock model [29]. Migration events throughout the phylogenetic histories were reconstructed using a reversible discrete phylogeography model [30]. Adequate chain mixing and uncertainty in parameter estimates were assessed by calculating the effective sample size (ESS) and the 95% Highest Posterior Density (HPD) values respectively using the TRACER v1.5 program [31]. Maximum clade credibility (MCC) trees were summarized with TreeAnnotator v1.8.1 and visualized with FigTree v1.4.0. Migratory events were summarized using the SPREAD application [32].

Reconstruction of demographic history
The mode and rate (r, years -1 ) of population growth of major HIV-1 clades circulating in Guatemala was also estimated using BEAST v1.8 software. Analyses were also performed using the GTR+I+Γ 4 nucleotide substitution model and the temporal scale of evolutionary process were estimated using a relaxed uncorrelated lognormal molecular clock model. Changes in effective population size through time were initially estimated using a Bayesian Skyline coalescent tree prior [33] and estimates of the population growth rate were subsequently obtained using the parametric model (logistic, exponential or expansion) that provided the best fit to the demographic signal contained in datasets. Comparison between demographic models was performed using the log marginal likelihood (ML) estimation based on path sampling (PS) and stepping-stone sampling (SS) methods [34]. MCMC chains were run for 50-500 × 10 6 generations. Convergence and uncertainty of parameter estimates were assessed as explained above.

Statistical analysis
Demographic and clinical variables of individuals belonging to the different clades were compared using linear regression or Chi-square test according to variable type. False Discovery Rate (FDR) for multiple comparisons was estimated with Storey q-values. The threshold for significance was set at p<0.05 and q<0.2. Analysis was performed using STATA/SE version 14 (StataCorp, College Station, TX).

Most HIV-1 subtype B Guatemalan sequences belong to the B PANDEMIC clade
Of the 1,083 HIV-1 pol Guatemalan sequences here analyzed, we selected 1,047 subtype B sequences geographically distributed among the 22 departments of Guatemala. The remaining 36 sequences were excluded because of possible quality issues (unusual indel mutations/rare polymorphisms) (n = 21) or non-B subtype classification (n = 15). In order to better understand the origin of the Guatemalan HIV-1 subtype B epidemic, Guatemalan sequences were combined with representative sequences of the B PANDEMIC and B CAR lineages (S1 Table) to perform a phylogenetic analysis. The ML analysis revealed that, as expected, the B CAR sequences occupied the deepest branches within the subtype B phylogeny; whereas the B PANDEMIC sequences branched as a well-supported (aLRT = 0.88) monophyletic subgroup nested within the B CAR clades (Fig 1). Of the 1,047 sequences analyzed from Guatemala, 1,036 (98.9%) branched within the B PANDEMIC clade and only 11 sequences (1.1%) branched among the B CAR lineages (Fig 1). This analysis also revealed that 787 (75.2%) of the Guatemalan sequences branched in 10 country-specific monophyletic clades of large size (29 n 209; B GU-I to B GU-X ), 226 (21.6%) branched in 43 country-specific clusters of small size (2 n 17), and the remaining 34 (3.2%) represented non-clustered sequences (Fig 1).

Identification of country-and region-specific HIV-1 subtype B clades
HIV-1 subtype B sequences from the largest Guatemalan clades here identified were next aligned with subtype B sequences from Mexico, Panama and other Central American countries (S1 Table). ML analysis revealed that all sequences from Panama and Costa Rica and most sequences from Mexico (98%) branched outside the major Guatemalan clades (Figs 2 and 3). By contrast, a large proportion of sequences from Honduras (92%), El Salvador (54%), and Belize (22%) branched within the previously defined Guatemalan clades B GU-IV , B GU-VII , B GU-IX and B GU-X that were thus reclassified as Central American clades B CAM-IV , B CAM-II , B CAM-III and B CAM-I , respectively (Fig 2). The prevalence of each major Central American clade greatly varied among countries (S2 Table). In Guatemala, the predominant clade was B CAM-I (19%), followed by clades B CAM-II (12%), B CAM-IV (10%) and B CAM-III (8%). In Honduras, the most prevalent clade was B CAM-I (73%), followed by clades B CAM-III (12%) and B CAM-II (7%). In El Salvador, the predominant clade was B CAM-II (34%), followed by clades B CAM-IV (12%) and B CAM-I (8%). In Belize, B CAM-I and B CAM-II clades seemed to circulate with similar prevalence (11%). Guatemalan clades (B GU-I , B GU-II , B GU-III, B GU-V , B GU-VI and B GU-VIII ) were mostly (>99%) composed by sequences from that country and were thus classified as countryspecific clades. Among subtype B sequences from Guatemala included in the study (n = 1,047), 49.3% (n = 516) belonged to Central America-specific clades and 23.8% (n = 249) belonged to Guatemala-specific clades.

Epidemiological characteristics of Guatemalan subjects across major HIV-1 subtype B clades circulating in Central America
We then analyzed the demographic and clinical characteristics of individuals included in the different Guatemalan and Central American clades (Table 1). Significant differences in all the variables analyzed were observed between the B GU-II clade and the other B CAM and B GU lineages. The B GU-II clade was mostly transmitted among young (median age 27 years), recently infected (59%), MSM (46%), with higher education level (71% with bachelor degree and 16% with high-school degree) and employment (64%) rates. In sharp contrast, all other B CAM and B GU clades were mostly composed of older (median age >31 years), heterosexual (>79%) individuals, with long-standing infection (>78%), and lower education (degree <21%) and employment (<53%) rates than those observed in individuals infected by the B GU-II clade. Guatemala City was the residence for most (>48%) individuals of major B CAM and B GU clades, with exception of B GU-V (24%) and B CAM-III (34%). The B GU-V clade stands out due to a lower mean CD4 T cell count. This clade was composed by persons from outside the Department of Guatemala (77%), with a significantly lower level of education (35% with none and 53% with elementary degree) and heterosexual risk of transmission. Late arrival to clinical care in populations with lower education and socio-economic status is a common feature of many Latin American countries [35]. This may explain the common feature of low CD4 T cell counts at the time of HIV diagnosis.

Spatiotemporal dissemination of major HIV-1 subtype B clades circulating in Central America
The median estimated evolutionary rates at HIV-1 pol gene for major Guatemalan and Central American HIV-1 clades ranged from 2.1 x 10 −3 subst./site/year to 2.9 x 10 −3 subst./site/year with a great overlap of 95% HPD intervals, whereas the coefficient of variation ranged from 0.33 (0.14-0.50) to 0.55 (0.34-0.76), thus supporting the use of a relaxed molecular clock model ( Table 2). The HIV-1 subtype B Guatemalan clades B GU-III , B GU-V , B GU-VI and B GU-VIII arose between the early and the middle 1990s, followed by clades B GU-I and B GU-II between the early and the middle 2000s ( Table 2). The phylogeographic analysis showed that the department of Guatemala was the most probable epicenter of dissemination of all those Guatemalan clades (PSP!0.86). B GU-I was the most widely disseminated lineage, being detected in 15 out of the 22 Guatemalan departments, followed by clade B GU-V detected in 11 departments and El Salvador; clade B GU-VIII detected in 11 departments, clade B GU-VI in 10 departments; clade B GU-II in seven departments, and clade B GU-III in six departments and Mexico (Fig 4). The departments of Alta Verapaz, Quiche, Sacatepequez and Santa Rosa were also detected as secondary disseminating locations of clades B GU-V , B GU-I /B GU-II /B GU-VIII , B GU-III and B GU-II , respectively (Fig 4).
The root location of Central American clades B CAM-I and B CAM-III was most probably traced to Honduras (Posterior State Probability, PSP !0.99) around the middle 1980s ( Table 2). The most widely disseminated viral clade, B CAM-I, migrated from Honduras to the department of Guatemala, El Salvador and Mexico during the early 1990s and later moved to other Guatemalan departments around 2010 (Fig 5). From the department of Guatemala, clade B CAM-I was disseminated to other 16 Guatemalan departments and also to Mexico, Belize, and back to Honduras. El Salvador and the Guatemalan department of Escuintla were also secondary hubs of dissemination of clade B CAM-I . Clade B CAM-III moved from Honduras to several Guatemalan departments (Guatemala, Zacapa, Sacatepéquez, Izabal and Chiquimula) and to Mexico since 1997 (Fig 5). The department of Guatemala acted as a secondary hub of dissemination of B CAM-III sending viruses to other Guatemalan departments from 2001 onwards.
The root location for clades B CAM-II and B CAM-IV was most probably traced to the department of Guatemala (PSP = 0.47 and 0.99, respectively) between the late 1970s and the early 1990s ( Table 2). Clade B CAM-II moved from the department of Guatemala to Honduras and El Salvador around the late 1970s, to the Guatemalan departments of Solola and Chiquimula during the 1990s, and to other 11 Guatemalan departments, Belize and Mexico since the early 2000 ( Fig 5). Secondary hubs of dissemination of clade B CAM-II were El Salvador from where the virus has spread to Honduras, back to the department of Guatemala, and to other seven Guatemalan departments since the early 1990s, and Honduras from where the virus has spread back to the department of Guatemala and Mexico since the early 1980s. The department of Guatemala was the major epicenter of clade B CAM-IV moving from the capital to other 15 Guatemalan departments, as well as to El Salvador and Mexico since the middle 1980s. From El Salvador, clade B CAM-IV moved back to the department of Guatemala.

Demographic history of major HIV-1 subtype B clades circulating in Guatemala
To reconstruct the population dynamics of the HIV-1 subtype B epidemic in Guatemala, we used the Guatemalan sequences from those clades that arose in Guatemala (B GU-I , B GU-II , B GU-III , B GU-V , B GU-VI and B GU-VIII , and B CAM-IV ) and well supported (aLTR !80) Guatemalaspecific sub-clades within clades B CAM-I (B CAM-I/GU-I = 51 sequences and B CAM-I/GU-II = 37) and B CAM-III (B CAM-III/GU = 71 sequences). Bayesian skyline plot (BSP) analyses suggest that most clades experienced an initial phase of exponential growth followed by a decline in growth rate during the 2000s, leading to stabilization of the effective population size (Ne) (Fig 6). Some clades (B GU-II , B GU-III , B CAM-IV and B CAM-I/GU-II ), however, seem to have experienced more complex dynamics characterized by phases of expansion interleaved by periods of slow growth (Fig 6). Model comparison indicated that the best-fit demographic model was the logistic for clades B GU-II, B CAM-IV, B GU-VI , B CAM-I/GU-I and B CAM-I/GU-II , while both the logistic and exponential models fit equally well the demographic signal in clades B GU-I, B GU-III, B GU-V , B GU-VIII and B GU-IX (S3 Table). According to these parametric models, the mean estimated growth rate of major clades circulating in Guatemala ranged from 0.23 year -1 for clade B GU-III to 3.59 year -1 for clade B GU-II (Table 3).

Discussion
The HIV-1 subtype B epidemic in Guatemala is mostly related to the B PANDEMIC clade (98.9%), as described for the majority of the Central [7,8,36], North, and South [3,37] American countries. The proportion of Guatemalan HIV-1 subtype B viruses belonging to the B CAR clades (1.1%) was similar to the prevalence reported for Honduras, El Salvador, and Mexico  HIV-1 subtype B molecular epidemiology in Guatemala (<1.1%) [37], but lower than the one reported in Panama (5.5%), the country where B CAR clade circulation was first demonstrated in Central America [8].
Our phylogeographic analysis identified both country-and region-specific HIV-1 subtype B clades involved in the Guatemalan HIV epidemic, contrasting with the Panamanian Table 1 [7]. The fact that the Central American clades BCAM-I-B CAM-IV differ in root location (B CAM-I and B CAM-III traced to Honduras, and B CAM-II and B CAM-IV traced to Guatemala), also refute the Murillo et al assertion that a single viral introduction event was responsible for most of the HIV-1 subtype B infections in Central America. This change in the previously described HIV-1 epidemiological history of subtype B in Central America is expected, considering the fact that the highest number of individuals estimated to be living with HIV in Central America are from Guatemala [9], and this country was not included in previous analyses. Nearly half of all Guatemalan HIV-1 subtype B sequences included in our study branched within Central American B PANDEMIC clades (67%), indicating an unrestricted viral flow between main transmission networks from Guatemala and the neighboring Central American countries of Honduras and El Salvador. This result is consistent with a previous study that suggested an HIV-1 subtype B epidemiological link between the Guatemalan neighboring countries Belize, Honduras and El Salvador [6]. A much more restricted viral flow, by contrast, was observed between Guatemala and Mexico (despite their close geographic proximity) and between Guatemala and Panama. Viral flux between Guatemala and Belize, Nicaragua and Costa Rica was not possible to estimate because of the paucity of HIV-1 pol sequences from these countries.

All clusters B GU-I B GU-II B GU-III B GU-V B GU-VI B GU-VIII B CAM I B CAM II B CAM III B CAM IV
About one fourth of all Guatemalan HIV-1 subtype B sequences included in our study branched within six country-specific clades of large size (B GU-I , B GU-II , B GU-III, B GU-V , B GU-VI , and B GU-VIII ). The analysis of clinical and demographic characteristics, however, reveals no major difference between B CAM and B GU clades, with the exception of clade B GU-II that differed from all other clades mostly comprising mainly young, recently infected MSM with higher education and employment rates. In the last 10 years, HIV prevalence among MSM has remained higher than 8% in Guatemala, with most of the cases (75%) related to MSM younger than 36 years with higher education [38]. Major differences, by contrast, were detected between B CAM and B GU clades, with regards to the estimated origin of the different lineages: B GU clades displayed a much more recent median T MRCA (between the 1990s and 2000s) than HIV-1 subtype B molecular epidemiology in Guatemala the B CAM clades (between the 1970s and 1980s), which may have had a great impact in the geographical expansion of each lineage.
The spatiotemporal dissemination analysis of B GU and B CAM clades evidenced complex migration patterns characterizing the HIV-1 epidemic in the region, involving Guatemalan departments and the neighboring countries Honduras, El Salvador, Belize and Mexico. Guatemalan internal and external migration has been attributed mostly to economic factors, with population mobility from poor rural areas to agriculture or industrial developing areas [39], and the country's civil war that lasted 32 years, from 1960 to 1996 [40]. The department of Guatemala and, to a lesser extent, Escuintla, Izabal and Petén have been the main destinations for internal-migration from the eastern and western departments of the country, with dynamic bidirectional population mobility [41,42]. Regarding external migration, 50% of immigrants to Guatemala correspond to people of other Central American countries, mostly from El Salvador, Nicaragua and Honduras [42], yet the Guatemalan net migration rate is negative [43] with emigration flows mostly to the United States of America, and the neighboring countries of Mexico, Honduras, El Salvador and Belize [42]. Our phylogeographic reconstructions revealed that the department of Guatemala was the dissemination epicenter of the B GU , B CA-M-II and B CAM-IV clades to the rest of Guatemalan departments and to the Guatemalan neighboring countries. Additionally, the departments of Guatemala and Escuintla served as secondary hubs of dissemination of the B CAM-I and B CAM-III clades, which emerged in Honduras, to other Guatemalan departments and to the four bordering countries in the case of B CAM-I . The estimated T MRCA of B GU clades, ranging between the early 1990s and the middle 2000s, also coincides with an increased repatriation and deportation rate of Guatemalans from Mexico and the United States of America, respectively [42,44], although we found no significant evolutionary links between B GU clades and the Mexican subtype B epidemic.
The demographic analysis of the B GU-I , B GU-V , B GU-VI , B GU-VIII , and B CAM-I/GU-I clusters suggested that, even though all these clades had an initial phase of exponential growth, the dissemination of these clades declined during the 2000s. On the other hand, the B GU-II , B GU-III , B CAM-IV and B CAM-I/GU-II clades evidenced a more complex growth dynamics with phases of expansion interleaved by periods of slow growth. The B GU-II clade, characterized mostly by a population of young MSM, evidenced the highest mean estimated logistic growth rate (3.6 year -1 ) among the major clades circulating in Guatemala. The B GU-II clade growth rate was remarkably higher than those previously estimated for other country-specific B clades mainly or exclusively restricted to MSM (ranging from 0.5 to 1.6 year -1 ) [45][46][47][48]. Furthermore, the proportion of strains that corresponds to new infections within the B GU-II clade was higher (59.1%) compared with the rest of the B GU and B CAM clades (ranging from 6.9% of B GU-III to 22.1% of B CAM-III ). As the HIV epidemic in Guatemala is concentrated in high-risk key populations, with 8.9% prevalence in MSM [10], an increase in size and proportion of the B GU-II clade would be expected in the current Guatemalan HIV epidemic. These results, however, should be interpreted with caution because the exponential growth model, that was equally supported as the logistic one, showed a much lower mean growth rate (0.4 year -1 ) for the B GU-II clade. Despite the fact that other major subtype B clades in Guatemala mostly circulate  among individuals infected by the same transmission route (heterosexual transmission), the mean estimated growth rate of these lineages varied considerably, ranging from 0.2 year -1 for clade B GU-III to 2.1 year -1 for clade B CAM-I/GU-II . This range is even higher than that estimated for country-specific B PANDEMIC clades circulating in the general population in different Latin American countries (0.2-0.9 year -1 ) [7,8,36].
It is important to acknowledge some limitations of our study design. Even when the study sample size is large, sequences were obtained from a single HIV clinic. Additionally, the time frame for collecting HIV sequences was limited (2010-2013). Nevertheless, the Roosevelt Hospital is the largest of a total of 18 HIV clinics in the country, providing antiretroviral treatment for 30% of all registered HIV-infected persons in Guatemala [15], and the demographic characteristics (including female-to-male ratio, mean age, race, marital status, literacy) and geographical distribution of persons attending this clinic are highly representative of the whole population living with HIV in Guatemala [11,16,17]. Indeed, among the Roosevelt Hospital's clients, 45% reside outside the department of Guatemala [14,16]. Our study group is highly representative of the Roosevelt Hospital cohort, as more than 80% of all adults diagnosed at this clinic during the enrolment period were included in the study (expecting approximately 450 total new cases per year [16]). Given these characteristics of the Roosevelt Hospital's cohort, we would not expect our main observations to vary significantly if additional participants enrolled in clinics from around the country were to be included.

Conclusions
In summary, our observations suggest that the Guatemalan HIV-1 subtype B epidemic is being driven by dissemination of multiple B PANDEMIC founder viral strains, some of them mostly restricted to Guatemala and others widely disseminated in the Central American region. Moreover, our results identify Guatemala and Honduras as hubs that have played a central role in dissemination of B PANDEMIC viral strains within Central America and further suggest the existence of different sub-epidemics with distinct evolutionary and demographic