Characterization of HIV-1 CRF90_BF1 and putative novel CRFs_BF1 in Central West, North and Northeast Brazilian regions

The Brazilian AIDS epidemic has been characterized by an increasing rate of BF1 recombinants, and so far eight circulating recombinant forms (CRFs_BF1) have been described countrywide. In this study, pol sequences (protease/PR, reverse transcriptase/RT) of 87 BF1 mosaic isolates identified among 828 patients living in six Brazilian States from three geographic regions (Central West, North, Northeast) were analyzed. Phylogenetic and bootscan analyses were performed to investigate the evolutionary relationship and mosaic structure of BF1 isolates. Those analyses showed that 20.7% of mosaics (18 out of 87) were CRFs-like isolates, mostly represented by CRF28/CRF29_BF-like viruses (14 out of 18). We also identified five highly supported clusters that together comprise 42 out of 87 (48.3%) BF1 sequences, each cluster containing at least five sequences sharing a similar mosaic structure, suggesting possible new unidentified CRFs_BF1. The divergence time of these five potential new CRFs_BF1 clusters was estimated using a Bayesian approach and indicates that they probably originated between the mid-1980s and the mid-1990s. DNA was extracted from whole blood and four overlapping fragments were amplified by PCR, providing full/near full length genomes (FLG/NFLG) and partial genomes. Eleven HIV-1 isolates from Cluster # 5 identified in epidemiologically unlinked individuals living in Central West and North regions provided FLG/NFLG/partial genome sequences with identical mosaic structure. These viruses differ from any known CRF_BF1 reported to date and were named CRF90_BF1 by the Los Alamos National Laboratory. This is the 9th CRF_BF1 described in Brazil and the first one identified in Central West and North regions. Our results highlight the importance of continued molecular screening and surveillance studies, especially of full genome sequences, to understand the evolutionary dynamics of the HIV-1 epidemic in a country of continental dimensions such as Brazil.

Introduction
Human Immunodeficiency Virus-1 (HIV-1) is a highly polymorphic and fast evolving pathogen [1]. Worldwide, HIV-1 can be classified into groups (M, N, O and P), and the pandemic group M is classified in subtypes (A-D, F-H, J and K) and sub-subtypes (A1-A4, F1-F2) [2,3]. While mutation rates are similar to other RNA viruses, HIV-1 has a high recombinogenic capacity and intersubtype recombination events are frequent in coinfected or superinfected individuals from areas where two or multiple variants cocirculate [4]. Recombinant strains exhibiting identical mosaic patterns identified in at least three epidemiologically unlinked individuals have been classified as circulating recombinant forms (CRFs), while the ones displaying unique mosaic structures or only infecting individuals with epidemiological link are known as unique recombinant forms (URFs) [5,6]. Recombination has been recognized as a driving force in shaping the diversity of HIV-1 globally since the mid-1990s [7]. Currently, 88 CRFs have been assigned and 81 of them have been published with public data available at the Los Alamos HIV database [http://www.hiv.lanl.gov/content/sequence/HIV/CRFs/CRFs.html]. CRFs together with URFs are estimated to account for at least 20% of HIV-1 infections worldwide [8].

DNA sequencing
The amplified DNA fragments from the nested-PCR products were separated by gel electrophoresis, purified (QIAquick PCR Purification Kit, Qiagen, Hilden, Germany) and sequenced with the Big Dye Terminator Sequencing Kit v. 3.1 (Applied Biosystems, Foster City, CA) in an automated ABI Prism 3100 Genetic Analyzer (Applied Biosystems, USA). Chromatograms were analyzed and edited using the SeqMan software from the DNASTAR Lasergene package (MA, USA).

Phylogenetic and recombination analyses
Sequences were aligned using Clustal X 2.0 implemented in the BioEdit 7.2.0 program [41]. Reference sequences of HIV-1 group M subtypes (A-D, F-H, J and K) and CRF_BF1 sequences were obtained from the Los Alamos HIV database (http://hiv.lanl.gov/). Phylogenetic trees were generated using the neighbor-joining (NJ) method [42] under the Kimura two-parameter model [43] using MEGA 6.0 software [44]. Bootstrap values (BP, 1,000 replicates) above 70% were considered significant. Recombination analyses were performed on all viral isolates using bootscan implemented in Simplot v3.5.1 software with the following parameters: 200nt or 300nt window, 20nt increments, NJ method under Kimura's two-parameter correction with 100 bootstrap replicates [45]. The sliding window used for bootscan analyses of recombinant viruses depended on fragment size: a 200nt window was used for pol fragments (998nt), whereas a 300nt window was adopted for the larger near full-length genome fragments (>6670nt). To better characterize the recombination breakpoints suggested in the previous analyses, the putative recombinants were subjected to informative site analyses as described elsewhere [39]. For this purpose, consensus sequences from Brazilian HIV-1 subtypes B and F were generated in the DAMBE program [46]. Fragments of sequences assigned to specific HIV-1 subtypes were finally confirmed by separate NJ phylogenetic analysis as described above.
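The window layout and the distance correction named above can be illustrated with a minimal Python sketch. It is not the Simplot or MEGA implementation: a real bootscan also builds bootstrapped NJ trees in every window, which is omitted here. The function names `sliding_windows` and `k2p_distance` are hypothetical, and the sketch assumes windows are placed only where a full window fits in the alignment.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def sliding_windows(alignment_length, window, step):
    """Return 0-based, half-open (start, end) pairs for every full window."""
    last_start = alignment_length - window
    return [(s, s + window) for s in range(0, last_start + 1, step)]


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Window layout for a 998nt pol fragment with a 200nt window and 20nt steps.
pol_windows = sliding_windows(998, 200, 20)
```

Under these assumptions, each window of a query sequence would be compared against subtype B and F1 references, and a change in the closest reference between adjacent windows would point to a candidate breakpoint.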
Representative samples from the HIV-1 BF1 Brazilian clusters herein identified were submitted to a Basic Local Alignment Search Tool (BLAST) analysis in order to recover other Brazilian sequences with high similarity (>95%) and a probably similar recombination profile. The BLAST analysis was done using sequences obtained from the Los Alamos HIV database (http://hiv.lanl.gov/).

Evolutionary analyses of BF1 recombinants
The time of the most recent common ancestor (TMRCA) of HIV-1 BF1 clades was estimated using a Bayesian Markov Chain Monte Carlo (MCMC) approach implemented in BEAST v1.8 [47,48] with BEAGLE to improve run-time [49]. Analyses were performed using the GTR+I+G nucleotide substitution model, a Bayesian Skyline coalescent tree prior [50] and a relaxed uncorrelated lognormal molecular clock model [51] with an informative uniform prior interval (1.0-3.0 × 10⁻³ nucleotide substitutions per site per year). One MCMC chain was run for 1 × 10⁷ generations. Convergence and uncertainty of parameter estimates were assessed by calculating the effective sample size (ESS) and the 95% highest posterior density (HPD) values, respectively, using Tracer v1.6 [52]. The maximum clade credibility (MCC) tree was summarized with TreeAnnotator v1.8 and visualized with FigTree v1.4.0.
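To make the ESS criterion concrete, the following Python sketch estimates the effective sample size of a single parameter trace from its autocorrelation. This is an illustration only: the estimator used by Tracer differs in detail, the function name `effective_sample_size` is hypothetical, and truncating the sum at the first non-positive autocorrelation is an assumption made here for simplicity.

```python
def effective_sample_size(trace):
    """Rough ESS estimate: N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is <= 0
    (an assumption of this sketch, not Tracer's rule).
    """
    n = len(trace)
    if n < 2:
        raise ValueError("trace needs at least two samples")
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("constant trace has no defined ESS")
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# A strongly autocorrelated toy trace: its ESS is far below its 100 samples.
sticky_trace = [0.0] * 50 + [1.0] * 50
sticky_ess = effective_sample_size(sticky_trace)
```

A low ESS relative to the number of logged samples means successive MCMC states are strongly correlated, so the chain carries less independent information about the posterior than its length suggests.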

Data availability
All HIV-1 sequences generated in this study were deposited in the GenBank database (KY628215-KY628225).

Phylogenetic and evolutionary analyses of BF1 pol recombinants
Initial phylogenetic analyses of 87 HIV-1 isolates previously characterized as BF1 recombinants in the PR/RT region (S1 Table) classified 18 (21%) sequences as CRF_BF-like (14 CRF28/CRF29_BF-like, two CRF17_BF-like, one CRF12_BF-like and one CRF47_BF-like) and 27 (31%) sequences as URFs_BF (Fig 1). The remaining 42 (48%) sequences were distributed in five clusters comprising between five and 22 sequences, sharing the same mosaic structure, and were classified as potential new CRFs_BF1 (Fig 1). Clusters # 1, 3 and 4 displayed high supports (BP ≥ 99%) at initial analysis. For Clusters # 2 and 5, however, high supports were obtained only after exclusion of the URFs_BF MS251, BRGO3127 and BRGO4162 sequences (Fig 1). Cluster # 1 had six sequences from three different States (two from Goiás, three from Maranhão and one from Piauí). Cluster # 2 had five sequences, all from Goiás State. Cluster # 3 comprised four sequences from two States (two from Mato Grosso and two from Goiás). Cluster # 4 had five sequences from three States (one from Goiás, three from Maranhão and one from Piauí). Cluster # 5 contained 22 sequences from three States (20 from Goiás, one from Mato Grosso and one from Tocantins).
A BLAST search analysis was performed to identify sequences similar to the five potential new CRF_BF1 Brazilian clusters. The recovered sequences were included in the phylogenetic and recombination analyses, which verified bootstrap values higher than 87% and mosaic profiles similar to those previously classified in Clusters # 3, 4 and 5 (Fig 2). Eighteen sequences branching within Cluster # 3 were recovered from patients recruited in four States from the North region (seven from Amazonas, five from Rondônia, three from Roraima and one from Acre) along with two sequences from the South region (Paraná) (Fig 2). Two sequences from the North region (Amapá) were classified in Cluster # 4, and three sequences recovered from patients from the North region (Rondônia) were classified in Cluster # 5 (Fig 2).
Fig 1. Bootscanning analyses of BF1 inter-subtype recombinant clusters (# 1-5) are represented. The five clusters identified in our study are indicated by different colors: Cluster # 1: purple, Cluster # 2: blue, Cluster # 3: pink, Cluster # 4: green and Cluster # 5: red. Bootscan analysis was performed in a 200nt sliding window advanced in 20nt step size increments (1,000 replicates). All CRF_BF depicting recombination breakpoints in the pol region were included in the analysis. In the mosaic structure representations of BF1 isolates, the breakpoint positions according to HXB2 genome numeration are shown on the right and left sides of the clusters; blue stands for subtype B and green stands for subtype F.
https://doi.org/10.1371/journal.pone.0178578.g001
The Bayesian MCC tree displayed the same topology as the NJ tree, thus confirming the five BF1 phylogenetic clusters initially described (Fig 3). According to this analysis, the median TMRCA of the five potential new Brazilian CRFs_BF identified was estimated between the mid-1980s and the mid-1990s (Fig 3).
Fig 2. Phylogenetic tree of study BF1 isolates from Central West, North, Northeast and South Brazil and BF1 sequences from GenBank sharing over 95% similarity with study isolates. Trees were constructed using MEGA 6.0 software with the neighbor-joining method under the Kimura two-parameter model (bootstrap values over 70%). The sequences described in our study are distinguished from the sequences retrieved from GenBank by a diamond symbol. The five clusters identified in our study are indicated by different colors: Cluster # 1: purple, Cluster # 2: blue, Cluster # 3: pink, Cluster # 4: green and Cluster # 5: red.
https://doi.org/10.1371/journal.pone.0178578.g002

Analysis of FLG, NFLG and partial genomes
Phylogenetic (Fig 4) and bootscan analyses of six full length genomes (BRGOAP801, BRGO6043, BRTO10_66, BRGO4141, BRGO3145 and BRGO3047) obtained from isolates classified in Cluster # 5 allowed the description of a new recombinant lineage designated CRF90_BF1 by the Los Alamos HIV Sequence Database (Los Alamos National Laboratory) according to the standardized nomenclature [2]. We also obtained one NFLG and four partial genomes for isolates from this Cluster that share the same mosaic structure (Fig 5). The mosaic structures inferred from the analyses of these FLG, NFLG and partial genomes showed a genome predominantly of subtype B, which can be divided into seven subregions alternating subtypes B and F1. These seven subregions were named I (626-2661), II (2662-2971), III (2972-4295), IV (4296-4759), V (4760-8671), VI (8672-9492) and VII (9493-9612), all positions relative to the HXB2 genome. Subregion NJ analyses also confirmed the putative parental HIV-1 subtype (Fig 5). Fully coincident intersubtype breakpoint locations at subregions I-III were also observed in the NFLG of the BRGO4188 isolate and in the partial genome sequences of the BRMT508, BRGO3027, BRGO3059 and BRGO6048 isolates (Fig 5 and Table 1).
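The seven HXB2-numbered subregions listed above form a contiguous map, which can be sketched as a small lookup table in Python. The coordinates are taken from the text; the names `SUBREGIONS`, `is_contiguous` and `subregion_at` are hypothetical, and the sketch deliberately does not assign a subtype (B or F1) to each subregion because that assignment is given in Fig 5 rather than in the text.

```python
from bisect import bisect_right

# (name, first HXB2 position, last HXB2 position), both ends inclusive.
SUBREGIONS = [
    ("I", 626, 2661),
    ("II", 2662, 2971),
    ("III", 2972, 4295),
    ("IV", 4296, 4759),
    ("V", 4760, 8671),
    ("VI", 8672, 9492),
    ("VII", 9493, 9612),
]


def is_contiguous(regions):
    """True when each subregion starts right after the previous one ends."""
    return all(
        regions[i][2] + 1 == regions[i + 1][1] for i in range(len(regions) - 1)
    )


def subregion_at(position):
    """Return the subregion name covering an HXB2 position, or None."""
    starts = [start for _, start, _ in SUBREGIONS]
    index = bisect_right(starts, position) - 1
    if index < 0 or position > SUBREGIONS[index][2]:
        return None
    return SUBREGIONS[index][0]
```

For example, `subregion_at(4500)` falls in subregion IV, while positions before 626 or after 9612 lie outside the described mosaic and return `None`.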
The epidemiological features of the 11 patients presenting the newly described CRF90_BF1 lineage included six females (four of them pregnant) and five males (two of them prisoners) (Table 1). The prevailing risk category was heterosexual sex, reported by nine patients, while one prisoner patient reported intravenous drug use. Six patients were ARV naïve and five had been exposed to ARV drugs either as highly active antiretroviral therapy (HAART) or temporary mother-to-child-transmission (MTCT) prophylaxis. Most patients were from the Central West region (Goiás State: isolates BRGO3027, BRGO3047, BRGO3145, BRGO3059, BRGO4188, BRGO4141, BRGO6048, BRGOAP801 and BRGO6043; Mato Grosso State: isolate BRMT508) and one patient lived in the North region (Tocantins State: isolate BRTO10_66).

Discussion
In this study, we report the characterization of a novel HIV-1 CRF_BF1, named CRF90_BF1, based on six FLG, one NFLG and four partial genome sequences. These isolates shared identical mosaic structures and were identified in individuals without any epidemiological link who live in two distinct geographic regions in Brazil (Central West and North) located around 800-900 km apart. These criteria fulfill the requirements to define a new CRF, which is circulating in distant interior urban areas in Brazil. This novel CRF is the 9th CRF involving subtypes B and F1 described in Brazil and the 14th reported in South America. The estimated frequency of CRF90_BF1 in our sample set was 1.3% (11/828), with predominant detection in the Central West region. However, the actual prevalence of this new CRF in these geographic regions cannot be accurately estimated since there is limited molecular data on HIV-1 isolates, especially from the States of Mato Grosso, Mato Grosso do Sul and Tocantins.
CRF12_BF, the first CRF identified in the Americas, was described in 2001 in patients from Argentina and Uruguay and its origin was estimated around the early 1980s [53,54], while BF1 recombinants were first reported in Brazil in the early 1990s [16,17]. Patients harboring CRF90_BF1 were diagnosed between 2002 and 2011. The median estimated TMRCA of CRF90_BF1 and of the other putative CRF_BF1 clusters identified in our study is not recent and ranges from the mid-1980s to the mid-1990s, similar to that previously estimated for Brazilian CRF28_BF and CRF29_BF [55]. These estimates indicate that CRFs_BF1 have probably been circulating in Brazil for three to four decades.
Besides its early generation, we have evidence, as shown by BLAST search analyses, that CRF90_BF1 and the other putative CRFs_BF1 clades identified here have a wide geographic circulation (Fig 6). The CRF90_BF1 that we identified in Central West (Goiás and Mato Grosso) and North Brazil (Tocantins) is probably also circulating in Rondônia, another State in the North region which borders Bolivia in the South/East. HIV-1 BF1 isolates with the same recombination pattern as isolates from Cluster # 3 detected in Central West were also identified in several North Brazilian States (Amazonas, Rondônia, Roraima and Acre), and in the South State of Paraná. Isolates with a recombination profile similar to that of isolates from Cluster # 4 were also identified in the North region (Amapá State) besides the Central West (Goiás) and Northeast Brazil (Maranhão and Piauí). These results suggest the existence of novel CRFs_BF1 circulating in Brazil.
CRF28_BF and CRF29_BF, described in the Southeast in 2006 (Santos, São Paulo State), represent the first Brazilian CRFs, and their origin dates to 1988-1989 [18,55]. Studies have shown a low prevalence of CRF28_BF and CRF29_BF outside São Paulo [14,56], except in Salvador, Bahia State, Northeast, where prevalence ranged from 10% to 21% [57,58]. Among all BF1 isolates identified in our study, we found a moderate rate of CRF28/CRF29_BF-like isolates (16.1%, 14 out of 87) and an overall rate of 1.7% (14 out of 828), which represents one of the highest frequencies of these CRFs identified outside São Paulo State.
Despite the predominance of subtype B in most geographic Brazilian regions, except in the South where subtype C prevails, studies have shown that the prevalence of non-B subtypes, particularly URFs_BF1 and URFs_BC, has increased in the last decade [15,25,40,59,60]. Our studies have shown a significant percentage of recombinant BF1 forms (3.7-25.9%) in the Central West, North and Northeast Brazilian regions [11,27-37]. The most recently described Brazilian CRFs_BF1 (CRF70_BF1 and CRF71_BF1) were identified among blood donors from Pernambuco State, Northeast region [22]. CRF72_BF1 was identified among blood donors from five public blood banks in Minas Gerais State, Southeast region [21]. These recent data point to the increasing generation and spread of CRFs, especially those involving subtypes B and F1, which play an important role in the Brazilian AIDS epidemic. However, the number of complete genome sequences available is still limited, especially sequences from areas away from the epicenter, such as our study areas (Central West, North and Northeast), suggesting that [...]
Figure caption (fragment): only significant bootstrap values >70% are shown at the corresponding nodes. The genetic distance corresponding to the length of the branches is shown by the line at the bottom. The red color represents the CRF90_BF1 identified in this study.

Conclusions
In summary, we identified the novel CRF90_BF1 among heterosexual patients living in two geographic regions in Brazil, away from the epicenter of the epidemic. This is the 9th CRF_BF1 described in Brazil, indicating that continued molecular screening and surveillance are necessary to fully understand the evolutionary dynamics of the HIV-1 epidemic in a country of such continental dimensions. Our results also underscore the importance of full-length genome sequencing of HIV-1 isolates obtained from patients infected by different transmission routes and in different regions of the country to fully understand the diversity and complexity of the HIV-1 epidemic in Brazil.