A Newly Emerging HIV-1 Recombinant Lineage (CRF58_01B) Disseminating among People Who Inject Drugs in Malaysia

The HIV epidemic is primarily characterised by the circulation of HIV-1 group M (main) comprising of 11 subtypes and sub-subtypes (A1, A2, B–D, F1, F2, G, H, J, and K) and to date 55 circulating recombinant forms (CRFs). In Southeast Asia, active inter-subtype recombination involving three main circulating genotypes—subtype B (including subtype B′, the Thai variant of subtype B), CRF01_AE, and CRF33_01B—have contributed to the emergence of novel unique recombinant forms. In the present study, we conducted the molecular epidemiological surveillance of HIV-1 gag-RT genes among 258 people who inject drugs (PWIDs) in Kuala Lumpur, Malaysia, between 2009 and 2011 whereby a novel CRF candidate was recently identified. The near full-length genome sequences obtained from six epidemiologically unlinked individuals showed identical mosaic structures consisting of subtype B′ and CRF01_AE, with six unique recombination breakpoints in the gag-RT, pol, and env regions. Among the high-risk population of PWIDs in Malaysia, which was predominantly infected by CRF33_01B (>70%), CRF58_01B circulated at a low but significant prevalence (2.3%, 6/258). Interestingly, the CRF58_01B shared two unique recombination breakpoints with other established CRFs in the region: CRF33_01B, CRF48_01B, and CRF53_01B in the gag gene, and CRF15_01B (from Thailand) in the env gene. Extended Bayesian Markov chain Monte Carlo sampling analysis showed that CRF58_01B and other recently discovered CRFs were most likely to have originated in Malaysia, and that the recent spread of recombinant lineages in the country had little influence from neighbouring countries. The isolation, genetic characterization, and evolutionary features of CRF58_01B among PWIDs in Malaysia signify the increasingly complex HIV-1 diversity in Southeast Asia that may hold an implication on disease treatment, control, and prevention.


Introduction
According to the Joint United Nations Program on HIV/AIDS (UNAIDS), approximately 34 million people were living with HIV worldwide by the end of 2011. Within the same year, 2.5 million new HIV infections were also reported across the globe, attributing to an adult HIV prevalence rate of 0.8% [1]. In Malaysia, a total of 94,841 cases of HIV infections had been reported since the country's first HIV epidemic began in 1986, among which 14,986 AIDS-related deaths were recorded. The high-risk practice of injecting drug use was especially prominent in Malaysia with the highest HIV prevalence rate at 70% compared to other risk groups and causing more than half of AIDS-related deaths in the country during the last two decades [2].
In Southeast Asia, the first HIV/AIDS epidemic occurred in Thailand in the late 1980s where two genetically distinct HIV-1 genotypes were co-circulating in the country, namely the circulating recombinant form (CRF) 01_AE (CRF01_AE) and subtype B (including subtype B9, the Thai variant of subtype B). However CRF01_AE and subtype B9 had circulated among distinct risk groups, where CRF01_AE propagated among those engaged in heterosexual activities as compared to subtype B9 circulating among people who inject drugs (PWIDs) [3,4]. By mid-1990s, it was observed that the distribution of CRF01_AE was no longer confined among the heterosexuals when Tovanabutra et al. identified the circulation of CRF01_AE among 80% of PWID in Thailand [5]. Coupled with the rampant illegal drug trafficking activities in the region [6], CRF01_AE was soon disseminating among PWIDs in the vicinity including Cambodia, Vietnam, Malaysia, China, Taiwan, Korea, Japan and various countries in Southeast and East Asia [3,7].
In the following years, in addition to the extensive genetic diversity of HIV-1 [8], the wide co-circulation and dual infection of CRF01_AE and subtype B9 among various risk populations in Southeast Asia have led to the emergence of various unique recombinant forms (URFs) and ultimately, CRFs as defined by the identification and characterisation of near full length HIV-1 sequences which display an identical mosaic genome isolated from three or more epidemiologically-unlinked persons [9]. At present, 55 CRFs have been characterised (http://www.hiv.lanl.gov/) and altogether they comprise an estimated 16% of HIV-1 infections reported worldwide [10]. In Southeast Asia, a recent study documented the massive expansion of CRF33_01B among PWIDs in Malaysia and its endemicity in various HIV-1 infected populations including children who acquired infections through their mothers -further highlighting the increasing transmission of CRF33_01B to the general population [11]. The CRF33_01B lineage is also reported to be actively recombining with the main circulating genotypes in the region, consequently generating multiple novel and genetically distinct clades including CRF48_01B and CRF53_01B [12,13], each sharing one or more recombination features with CRF33_01B [14].
In addition to PWIDs, earlier studies reported the widespread dissemination of CRF33_01B at a significant prevalence among homosexuals and heterosexuals in Malaysia [14,15] and also in neighbouring countries, in particular Singapore [16,17], Indonesia [18] and Hong Kong [19], further demonstrating the establishment of the relatively new CRF33_01B lineages across Asia. The co-circulation of the previously identified CRFs and URFs, in addition to HIV-1 CRF01_AE, subtype B9 and other infrequent imported genotypes (e.g. subtype C, CRF02_AG [20]) may indeed increase the genetic complexity of HIV-1 in Southeast Asia. Furthermore, in view of the increasing epidemiological impact of HIV-1 recombinants, for example CRF33_01B [11] in Southeast Asian countries, it is highly presumptive that novel recombinants (CRF) could arise especially among the high risk injecting drug population. In this study, as a result of continuous molecular surveillance recently conducted among PWIDs between 2009 and 2011 in Malaysia [11], we report the emergence of a newly emerging novel HIV-1 CRF, designated as CRF58_01B characterised by the near full length recombinant genomes sequenced from six epidemiologically-unlinked PWIDs.

Ethics Statement
The study was approved by the University Malaya Medical Centre (UMMC) Medical Ethics Committee. Standard, multilingual consent forms allowed by the Medical Ethics Committee were used. Written consent was obtained from all willing study participants. Being an especially vulnerable population, all interviews and data collected were kept confidential. All potential participants who declined to participate in the study were not in any way disadvantaged from receiving treatment and care.

Study Subjects and HIV-1 near Full Length Genome Amplification
All six study subjects were recruited during a molecular epidemiological study conducted during 2009-2011 among inmates of a prison and attendees of a needle syringe exchange program (n = 258) in Kuala Lumpur, Malaysia [11] based on initial HIV-1 gag-RT genes amplification and sequencing. Plasma samples were collected from all subjects, serologically determined to be HIV-1 positive and stored at 280uC until further processed.
HIV-1 Viral RNA was extracted from plasma samples using the NucliSENS easyMAG automated platform (bioMerieux, Durham, North Carolina, USA) according to the manufacturer's recommendation and reverse transcribed into cDNA using SuperScript III RNase H 2 Reverse Transcriptase (Invitrogen, Carlsbad, California, USA) and random hexamers (Applied Biosystems, USA) according to the manufacturer's instructions. Nested PCR was performed using QIAGEN HotStarTaq Plus DNA polymerase (Qiagen, Hilden, Germany) to amplify 10 overlapping fragments corresponding to the near full length HIV-1 genome using primers listed in Table S1. Purified PCR products were directly sequenced using ABI PRISM 3730XL DNA Analyzer (Applied Biosystems, Foster City, California, USA) and assembled to produce near full length genomes (,9 kb).

Phylogenetic and Recombination Analysis
Near full length genomes were aligned with the HIV-1 reference subtypes and CRFs of epidemiological significance in Southeast Asia downloaded from the Los Alamos HIV database (http:// www.hiv.lanl.gov/) using ClustalX 2.0. Nucleotide sequences were then manually adjusted using BioEdit 7.0 with reference to the HIV Sequence Compendium 2011 (http://www.hiv.lanl.gov/) to ensure accurate codon alignment. In addition, newly published CRFs of regional significance such as CRF48_01B and CRF51_01B to CRF55_01B were included in the alignment. Phylogenetic trees were constructed by the neighbour-joining method based on the Kimura two-parameter model with a transition-transversion ratio of 2.0 using MEGA 5.05 [21]. The reliability of the branching orders were analysed by bootstrap analysis of 1000 replicates. Bootscanning analysis [22] was performed using SimPlot version 3.5.1 [23] and followed by informative site analysis to identify the specific recombination  breakpoints shared by the novel CRF candidates. All possible parental reference strains were included in the similarity plot to identify the closely related parental strains prior to subsequent analyses. Sub-region trees were constructed to confirm the parental origin of each segment in the near full length genome. Such methods have been established and widely used in various studies to characterize recombinant genomes, including HIV-1.
All sequences reported in this study have been deposited in GenBank with accession numbers reported in Table 1.

Results and Discussions
Phylogenetic and Recombination Analysis Identified a Novel HIV-1 CRF58_01B Genotype In the present study, we sequenced the near full length genomes of HIV-1 isolated from six epidemiologically unlinked PWIDs recruited as part of a recent molecular epidemiological surveillance conducted between 2009 and 2011 in Kuala Lumpur, Malaysia. These strains were selected for near full length sequencing based on the preliminary screening of the gag-RT genes which revealed the sharing of unique recombination structures and breakpoints [11]. All subjects were antiretroviralnaïve males with a mean age of 40.566.9 years old from various ethnic groups (Malays, Indian and Kadazan) with a history of unsafe injecting drug use ( Table 1). In these six strains (09MYPR37, 10MYKJ036, 10MYPR87, 11MY1ZK731, 11MY1RJ704 and 11MY1EP794), the coding regions of HIV-1 were amplified, spanning the gag, pol, env, tat, rev, vif, vpr, vpu, and nef genes and subsequently analysed for genetic evidence of similar mosaic recombination structures using SimPlot (bootscanning and informative sites analyses). Bootscanning and informative sites analyses of the near full length genomes were performed using HIV-1 CRF01_CM240 and B9_RL42 as the putative parental genotypes, revealing similar mosaic recombination structures in all six strains (Figure 1 and Table S2). In these mosaic genomes involving HIV-1 CRF01_AE and subtype B9,  ( Figure 1). Regions I, III, V and VII (1263 bp, 505 bp, 4180 bp and 1223 bp, consecutively) were grouped within CRF01_AE and regions II, IV and VI (570 bp, 220 bp and 585 bp, consecutively) were of subtype B9 origin. Of interest, the first recombination breakpoint (HXB2: 2053 to 2095 nt) shared among all six strains was similarly identified in CRF33_01B [14,24], CRF48_01B [12] and CRF53_01B [13], firstly identified in Malaysia. Moreover, the sixth unique recombination breakpoint of all six strains was located closely with that of CRF15_01B, originally identified in Thailand [25] in the env region (approximately 12 bp in distance relative to the recombination breakpoint of CRF15_01B, reported at position 831761 nt by the Los Alamos National Laboratory, LANL). Taken together, the near full length genomes of all six strains formed a highly-supported novel cluster, genetically distinct from other established subtypes and CRFs reported worldwide and therefore assigned as CRF58_01B by the LANL, in compliance with the HIV-1 nomenclature system (Figure 2). Intra-genotype pairwise nucleotide distance analysis of CRF58_01B and its parental strains (CRF01_AE and B/B9) based on a larger alignment data set containing CRF01_AE (n = 59) and B/B9 (n = 77) reference sequences has shown that both parental strains had greater nucleotide diversity across the genomes compared to CRF58_01B, although the estimated nucleotide diversity within CRF58_01B sequences was considerably high (Table S2). This may imply that CRF58_01B could have recombined years ago and not recently. There are perhaps several reasons why CRF58_01B was not discovered during the early years: a) CRF58_01B was circulating at a very low level and concentrated among the PWIDs, b) insufficient sampling in previous molecular epidemiology studies, and c) mis-classification of CRF58_01B due to limited, partial genome analysis (often only involving the prot-RT genes).

Evolutionary Analysis of CRF58_01B and other Established CRFs in Malaysia and Thailand
Following the near full length genome characterization of HIV-1 CRF58_01B, we further distinguished the probable genetic relationships between CRF58_01B and other established CRFs in Malaysia and Thailand on the basis of the two shared unique recombination breakpoints with CRF33_01B, CRF48_01B and CRF53_01B in the gag gene, and CRF15_01B in the env gene. We hypothesize that CRF58_01B may be ancestrally linked to the established CRFs mentioned herein, therefore phylogenetic signal within genetic regions adjacent to the breakpoints may possibly reveal the shared evolutionary history among these lineages [26]. Briefly, maximum clade credibility (MCC) phylogenetic reconstructions were performed for the three shared sub-regions: region I (of CRF01_AE origin, HXB2: 790-2052 nt), VI (subtype B9 origin, HXB2: 7743-8327 nt) and VII (CRF01_AE origin, HXB2: 8385-9607 nt) in the gag, env and nef-39 LTR genes, respectively ( Figure 3). Reference strains of CRF01_AE (sampled from 1990 to 2009 in Malaysia, Thailand, China and Japan) and B/B9 (sampled from 1983 to 2011 in France, United States of America, Japan, Thailand and Myanmar) were downloaded from the Los Alamos HIV database. In addition, reference strains of CRF15_01B (n = 5), CRF33_01B (n = 15), CRF48_01B (n = 3), CRF53_01B (n = 4) and the putative parental genotypes of these recombinants, CRF01_AE and subtype B (including B9) from Southeast and East Asia, were also retrieved from the online database. A coalescent-based Bayesian Markov chain Monte Carlo (MCMC) sampling method was performed in BEAST v1.7.4 [27] with an uncorrelated lognormal relaxed molecular clock and appropriate evolutionary parameters incorporated, as described previously [26].
First, MCC phylogeny analysis of the gag gene (region I) showed that almost all CRFs isolated from Malaysia (CRF33_01B, CRF48_01B, CRF53_01B and the novel CRF58_01B) were intermingled within a robust monophyletic cluster distinct from other CRF01_AE and CRF15_01B strains ( Figure 3). This could possibly be explained by the high rates of recombination that may confound the evolutionary history and classification of HIV-1 sequences [28]. Recombination event especially intra-subtype recombination (recombination involving closely related lineages of the same subtype within a single individual) is common among HIV-1. Such mechanism could lead to possible discrepancies on linkage disequilibrium, including the possible loss of phylogenetic correlation within the genome. Most of the analysed CRFs (CRF33/48/53/58) were likely to have emerged some years ago as indicated by the high intra-subtype genetic distances (Figure 3), thus increasing the likelihood of intra-subtype recombination that may lead to such discrepancy [29]. Such genetic pattern indicated that a common ancestor was shared among these CRFs, and recombination events that generated these clades were likely to be traced to Malaysia. In addition, the findings probably showed that recent emergence of novel CRFs, which is thought to be the major driving force of the regional epidemic, have had little influence from neighbouring countries. The spatial and temporal structure in the gag gene however was less informative to elucidate clearly the divergence times for each CRF lineage. Lastly, in order to examine the evolutionary relationship between CRF58_01B and CRF15_01B (mainly circulating in Thailand), MCC tree analysis for sub-regions VI and VII showed that all CRF58_01B strains were grouped together but distantly located from all CRF15_01B sequences ( Figure 3). The results disproved the possible genealogical relationship between CRF58_01B and CRF15_01B, although both genotypes shared a common recombination ''hotspot'' in the env gene.
The increasing genetic diversity of HIV-1 in Southeast Asia was attributed to the growing emergence of distinct CRFs circulating in the HIV-1 infected population in the region. In recent years (2005-2011), a total of five novel and genetically distinct CRFs, namely CRF33_01B [14,24], CRF48_01B [12], CRF52_01B [30], CRF53_01B [13] and CRF54_01B [31] (Figure 2) had been identified among various HIV-1 infected populations, mostly involving PWIDs, heterosexuals and men who have sex with men (MSM) in Malaysia whereby active inter-subtype recombination between subtype B9, CRF01_AE and CRF33_01B lineages has been on-going. The identification of CRF58_01B may represent Recently isolated among PWIDs in Malaysia, CRF58_01B shared two unique recombination breakpoints (indicated in dashed lines) with other established CRFs in the region: CRF33_01B, CRF48_01B and CRF53_01B in the gag gene at HXB2: 2053 to 2095 nt, and CRF15_01B (from Thailand) in the env gene. Maximum clade credibility (MCC) phylogenies were constructed for the shared sub-regions I, VI and VII between CRF58_01B and the established CRFs by including CRF01_AE (sampled from 1990 to 2009 in Malaysia, Thailand, China and Japan) and B/B9 reference strains (sampled from 1983 to 2011 in France, United States of America, Japan, Thailand and Myanmar) downloaded from the Los Alamos HIV database (labelled in black) to discern their evolutionary relationship. Monophyletic clusters (with a posterior probability of 1.0) were indicated with an asterisk (*) at the branch nodes. SIVcpz and CRF01_AE or B/B9 reference strains were included as outgroups but not shown for simplicity. The subregion trees were scaled in units of time (years). doi:10.1371/journal.pone.0085250.g003 one of the recently emerging recombinant strains circulating at a low prevalence (2.3%, 6/258) among PWIDs in the region, as indicated by its apparent absence in other molecular epidemiological studies conducted prior to year 2009 [14,24,[32][33][34][35]. The steady emergence of CRF58_01B between 2009 and 2011 however was not readily explained and remains unclear, although it is possible that the increased detection of CRF58_01B especially in year 2011 may be due to active transmission among PWIDs.
In summary, the present study identified a newly emerging HIV-1 CRF58_01B among PWIDs in Malaysia and in addition to other co-circulating genotypes adds further challenges to the development of an HIV vaccine and HIV-1 control in general. Table S1 HIV-1 near full length primer sequences (in 59 to 39 direction). (DOCX)  Figure 1). (DOCX)