HIV-1 genetic diversity and demographic characteristics in Bulgaria

HIV-1 strain diversity in Bulgaria is extensive and includes contributions from nearly all major subtypes and the Circulating Recombinant Forms (CRF): 01_AE, 02_AG, and 05_DF. Prior to this study, HIV-1 sequence information from Bulgaria has been based solely on the pro-RT gene, which represent less than 15% of the viral genome. To further characterize HIV-1 in Bulgaria, assess participant risk behaviors, and strengthen knowledge of circulating strains in the region, the study “Genetic Subtypes of HIV-1 in Bulgaria (RV240)” was conducted. This study employed the real time-PCR based Multi-region Hybridization Assay (MHA) B/non-B and HIV-1 sequencing to survey 215 of the approximately 1100 known HIV-1 infected Bulgarian adults (2008–2009) and determine if they were infected with subtype B HIV-1. The results indicated a subtype B prevalence of 40% and demonstrate the application of the MHA B/non-B in an area containing broad HIV-1 strain diversity. Within the assessed risk behaviors, the proportion of subtype B infection was greatest in men who have sex with men and lowest among those with drug use risk factors. During this study, 15 near full-length genomes and 22 envelope sequences were isolated from study participants. Phylogenetic analysis shows the presence of subtypes A1, B, C, F1, and G, CRF01_AE, CRF02_AG, CRF05_DF, and one unique recombinant form (URF). These sequences also show the presence of two strain groups containing participants with similar risk factors. Previous studies in African and Asian cohorts have shown that co-circulation of multiple subtypes can lead to viral recombination within super-infected individuals and the emergence of new URFs. The low prevalence of URFs in the presence of high subtype diversity in this study, may be the result of successful infection prevention and control programs. Continued epidemiological monitoring and support of infection prevention programs will help maintain control of the HIV-1 epidemic in Bulgaria.


Introduction
The HIV-1 epidemic in Bulgaria is notable for its high level of strain diversity and relatively low amount of new and existing infections. Recent studies using pro-RT sequences have reported the presence of subtypes A1, B, C, F1, and H as well as CRFs: 01_AE, 02_AG, 05_DF, and URFs in Bulgaria [1,2]. Such broad diversity is perhaps unexpected given new HIV-1 diagnoses varying between 171 in 2009 to 247 in 2014 [3] and an HIV-1 prevalence of 0.015% (1109 known infections) in 2009 [4], all of which are well below the rates of Western & Central Europe (estimated new infections: 29,000 and prevalence: 0.2% in 2012) or North America (estimated new infections: 48,000 and prevalence: 0.5% in 2012) [5]. However, while the rates of infection appear low, Bulgaria's geographic position along major transcontinental shipping routes and its role as a point of origin for human trafficking [6] makes characterizing and monitoring the high level of HIV-1 strain diversity within Bulgaria an important concern. Owing to its location in the eastern part of the Balkan Peninsula, Bulgaria is bordered on three sides by countries that contain very different collections of HIV-1 strains.
To the north of Bulgaria, Romania contends with an HIV-1 epidemic that over the past decade has been found to contain largely subtype F1 strains (>70%) followed by subtype B (6-14%) and subtype C (4%) [7][8][9]. To the west, Serbia has an epidemic that is predominantly subtype B (>90%) with a small number of infections caused by subtypes A1, F1, CRF01_AE, and CRF02_AG [9,10]. The other western neighbor, the former Yugoslav Republic of Macedonia, is known to have active HIV-1 and AIDS cases [3,11], however reported cases are few in number (less than 200 as of 2014) and have yet to be molecularly characterized [12] or sequenced and published as of August, 2018. Bulgaria's southwestern neighbor, Greece, has an HIV-1 epidemic that is mostly non-subtype B [13], with subtype B represented at less than 46% and the balance containing subtype A1 (41%) and a mix of subtypes C, F1, G, and recombinants including CRFs: 01_AE, 02_AG, 04_cpx, and 11_cpx [9]. To the southeast, Turkey has an epidemic that contains a similar proportion of subtype B (45%) as Greece, but differs in its remaining constituents, which are mostly B/F recombinants (22%), subtypes A1 (12%) and F1 (7%) along with smaller amounts of CRF02/B recombinants and subtypes C, G [9,14]. Within those settings, this study performed the systematic and detailed characterization of HIV-1 subtype diversity of circulating strains in Bulgaria.
The RV240 Genetic Subtypes of HIV-1 in Bulgaria 2008-2009 study, in a collaboration between the Bulgarian Ministry of Health, the Bulgarian National Center for Infectious and Parasitic Diseases, the Naval Medical Research Unit No. 3, and the U.S. Military HIV Research Program performed characterization of circulating HIV-1 strain diversity by implementing a high-throughput real-time PCR based assay that distinguishes subtype B from non-subtype B strains. The Multi-region Hybridization Assay (MHA) B/non-B [15] uses subtype B specific probes to identify subtype B strains through probe hybridization to six regions across the HIV-1 genome. Additionally, since current sequence knowledge of HIV-1 in Bulgaria is limited to the protease and RT genes, complete envelope gene and near full-length HIV-1 genome sequencing was conducted within subsets of study participants for phylogenetic analysis and MHA assay validation. Herein, we describe high-throughput measurements of strain diversity using the MHA B/non-B assay as well as the molecular epidemiology of env gene and full- length genomes of circulating strains in Bulgaria. The combined results indicate a diverse collection of subtype and recombinant strains and provide a previously unavailable sequence and molecular epidemiology status of HIV-1 in Bulgaria.

Study participants
Between

MHA B/non-B primer and probe testing
Details of the initial testing of primers and probes for each region have been described previously [16][17][18][19]. Briefly, primers were tested with a panel of full-length HIV-1 DNA generated from previously characterized samples. Probes were tested for functionality with DNA fragments that have a 5-10 nucleotide overhang on each end with respect to the probe length. For each probe and target pair, a reaction mix containing 2x TaqMan SYBR Green PCR Master Mix (Applied Biosystems Inc., Foster City, CA), 1 μM probe, 1 μM target, and dH 2 O was prepared in duplicate. Dissociation curves were obtained with a 384-well ABI PRISM 7900HT Sequence Detection System (Applied Biosystems Inc., Foster City, CA) using the following conditions: 95˚C for 15 s, 30˚C for 15 s, and 95˚C for 15 s at a 10% ramp rate. The melting temperature of the duplexes was estimated as the maximum of the peak in the-dRn/dT vs. temperature graph.

MHA data processing
During the reaction, fluorescence intensity was monitored and analyzed using the SDS v2.1 software (Applied Biosystems Inc.). Positive results were defined as having a threshold cycle lower than 35 cycles and having displayed an exponential increase in the normalized fluorescence intensity for five or more consecutive cycles. The PCR positive control for each sample was conducted in parallel with the other reactions to assess whether negative real-time PCR results were due to a lack of template amplification or probe hybridization. Control reactions contained 2x SYBR Green PCR Master Mix (Applied Biosystems Inc.) and similar amounts of inner primers and template as described above. Within the control reactions, amplicon production was confirmed using melting curves (95˚C 15 s, 30˚C 15 s, and 95˚C 15 s at 10% ramp rate) to distinguish the expected PCR products from primer-dimers of lower thermal stability.

Sequencing
Fifteen of the subtyped samples were subjected to near full-length sequencing and 22 other samples were subjected to env sequencing. PBMC DNA provided the template to generate full-length and env sequences using amplification and sequencing methods as previously described [20]. Sequence data were assembled and manually edited using Sequencher 5.0 (Gene Codes Corporation, USA).

Phylogenetic analysis
The initial HIV-1 genotype was assigned for each sequence in this study using the HIV-1 Genotyping Tool at the National Center for Biotechnology Information (NCBI) [21]. A multiple sequence alignment of the sequences from this cohort including subtype reference sequences was constructed with HIVAlign [22] and refined using MEGA version 5 [23]. Neighbor-Joining trees were constructed and bootstrap values (maximum likelihood, 100 replicates) supporting relevant branches were calculated with DIVEIN [24] using the estimated GTR+I+G model. Phylogenetic trees generated from the analysis were visualized using FigTree version 1.4.2 (http://tree.bio.ed.ac.uk) and analyzed for genotype assignment. Informative site analysis, visual inspection, bootstrap analysis of subgenomic trees, and the jumping profile Hidden Markov Model analytical tool [25] were used to precisely map breakpoints within the final genome structures of inter-subtype recombinants [26][27][28].

Results
Validation of the MHA probes and primer sequences was conducted using a subtype reference panel of 35 full-length HIV-1 genomes representing subtypes A (n = 4), B (n = 7), C (n = 8), D (n = 6), G (n = 2), CRF01_AE (n = 5), CRF02_AG (n = 2), and one A1G recombinant as listed in the S2 Table. Scoring of the validation results was assessed per amplified region and included amplification success, probe reactivity for the known subtype B genomes, and probe specificity which quantified the number of correct probe reactions in the validation panel. Across the six genomic regions: amplification success ranged from 83-100%, probe sensitivity ranged from 80-100%, and probe specificity ranged from 88-100% as detailed in Table 1. MHA B/non-B performance within the study cohort was confirmed by testing assay performance on a selection of 15 full-length HIV-1 genomes from participant samples which had been subjected to fulllength HIV-1 sequencing. Among those samples, amplification success ranged from 87-93%, probe sensitivity ranged from 80-100%, and probe specificity ranged from 85-100%, Table 2.

HIV-1 subtype distribution and phylogeny in Bulgaria
From the 343 participant PBMC fractions, only 276 contained sufficient viral DNA to produce genomic template for characterization. Of those, 61 (22%) did not amplify at least four of the six targeted regions in the HIV-1 genome and were therefore classified as non-typeable. The results from the remaining 215 typeable samples (166 Sofia, 31 Varna, 18 Plovdiv) revealed an overall subtype B proportion of 39%, which ranged from 33% to 41% between the three study sites (Fig 2) and was nonuniformly distributed between the male and female participants (Table 3). During the MHA's development, 15 full-length HIV-1 genome sequences were produced using samples from early enrollees. Once the MHA was completed, 22 additional nonsubtype B samples were selected for sequencing of the env gene to obtain sequence information that would be useful for future vaccine development in Bulgaria and the surrounding region. Phylogenetic analysis of the env sequences revealed an extensive diversity of strains, which include subtypes A1, C, F1, G, CRF01_AE, and CRF02_AG. Within the env dataset, a group of CRF01_AE strains was observed with intra-group genetic distances of less than 3% from a trio of males with self-identified intravenous drug use (IDU) and non-IDU risk behaviors, Fig 3. Among the full-length sequences, phylogenetic analysis revealed the presence of a closely

Demographic attributes
In this study, male participants outnumbered females by more than 2-fold. The ratio of male-tofemale participants from the main study sites were: Plovdiv 20/13, Varna 35/19, and Sofia 194/87; the nine participants from other cities were male. Within the subtyped samples, 150 were from male participants (median age: 39 years, interquartile range [IQR]: 32-47 years) and 65 were from female participants (median age: 33 years, IQR: 28-40 years). Among those groups, 79% of the males reported having one or more of the listed risk behaviors whereas only 23% of the females acknowledged a risk behavior. In general, most participants who reported any risk factor, typically selected more than one. The queried risk factors were: commercial sex work (CSW) received or paid, intravenous drug use (IDU) or non-IDU, men having sex with men (MSM), sex while travelling (casual or with a travel companion), and transfusion. In this report, the CSW and sex while travelling risk factors are grouped respectively into CSW and travel sex. The number of typeable participants self-indicating one or more risk factors, their gender, and the detected infection subtype are shown in Table 4. The low number of risk responses from the female participants prevents dissection of female risk behaviors in relation to the infecting subtypes. However, among all subtyped females 22% are infected with subtype B HIV-1 strains, whereas 47% of the males are infected with B strains. Dissection of the male self-reported risks by subtype shows IDUs (13%) and non-IDUs (35%) having the lowest percentage of subtype B infections and MSM having the highest percentage of subtype B infections (70%). Within the transfusion, CSW, and travel sex risk categories (males only) the percentage of subtype B infections detected were 40%, 42%, and 56%, respectively. Further dissection of the male risks, Table 5, shows significant (p values < 0.02) age differences of greater than 10 years between the drug use category and the CSW, travel sex, and transfusion risk categories. Conversely, within each behavior category, small differences in age (less than 4 years) were observed between subtype B or non-B infected male participants. For both genders, the proportion of subtype B infections above and below the median ages (males: 39 years, females: 33 years) were 49% and 45% for males and 21% and 22% for females, respectively, with no statistically significant differences.

Discussion
This study describes the characterization of HIV-1 genetic diversity in Bulgaria using a combination of the high-throughput real-time PCR based Multi-region Hybridization Assay (MHA) B/non-B and env/full-length genome sequencing. Although 343 participant samples were received and analyzed, 67 did not produce genomic HIV-1 template for subsequent assay steps, this was likely due to the use of HIV-1 therapy clinics as the primary recruitment sites and the enrollment of long-term viral suppressed participants. In addition, another 61 samples were classified as non-typeable since they did not amplify at least four of the six targeted regions. This was potentially due to low viral loads, but could also result from the presence of recombinants containing genomic contributions from less common CRFs that were not included during the design of the universal primers. For the remaining 215 samples, the MHA B/non-B assay demonstrated target amplification, probe sensitivity and probe specificity across six regions of the HIV-1 genome (gag, RT, int, tat, env and nef). This strategy provides for subtype determination based on multiple distant regions of the HIV-1 genome instead of relying on partial sequencing or amplification and analysis of a single subgenomic region. MHA B/ non-B analysis revealed an overall subtype B prevalence of 39%, which is less than the proportion of prot-RT based subtype B strains (50%) currently available in the HIV-1 database (hiv. lanl.gov) for Bulgaria during the study enrollment period. The difference between these proportions is likely due to the limited genomic information obtained when analyzing a single region such as prot-RT. When stratified by gender, subtype B infections were detected in 47% of the subtyped males and 21% of the subtyped females. Further stratification of the males by self-reported risk factors disclosed a nonuniform distribution wherein males reporting activities such as MSM or travel sex had the highest percentage of subtype B infections (70% and 56%, respectively), while males reporting drug use activities (IDU or non-IDU) have the lowest percentage of subtype B infections (13% and 35%, respectively). The high percentage of subtype B strains among the MSM risk group and low percentage of subtype B strains among the females are both consistent with previously identified trends in European countries [29,30]. The lower percentage of subtype B infections in the male IDU risk group (13%) sampled in this study is intermediate to that seen in previous reports of strain diversity in IDU from general European (61.4%) [30] and Bulgarian (8.5%) [2] cohorts. Overall, these results are consistent with the finding that MSM have larger proportions of subtype B infections than IDU or heterosexual risk groups in Europe.
When examined in terms of participant age, the proportion of subtypes were equivalent above and below the median ages regardless of gender. Combined with the low number of self-identified risk behaviors among the females, these results support the premise that males are the primary link between these risk behaviors and their female partners [4]. In addition, within the risk categories there were no significant differences in the mean ages of participants infected with subtype B or non-B strains. This suggests a consistent and established HIV-1 subtype distribution within the behavioral groups queried by this study. In contrast, there were differences in the ages of males between the risk categories. Significant differences in male participant ages were observed between the drug use category (mean ages of 29 and 29.1 years for IDU and non-IDU, respectively) vs the CSW (p = .017), travel sex (p = .002), and transfusion (p = .013) categories with mean ages of 36, 40, and 43 years, respectively. Although slightly older, the relative youth of participants in this study's drug use category is similar to a recent report, which found a higher proportion of youths among the people who inject drugs demographic with a mean age of 26.4 years for both genders [2]. Analysis of the env and full-length sequences identified two small groups of participants with similar risk factors: a triplet of drug users infected with CRF01_AE (Fig 3) and a pair MSM infected with subtype A1 , Fig 4. Although these were small sample subsets, the presence of closely related strain groups (env genetic distance < 3%) suggests these transmission events occurred within Bulgaria and are not simply the result of infection during foreign travel. Given the high diversity of HIV-1 strains in Bulgaria, the number of unique recombinant forms (URF) observed in this study was surprisingly low. Since previous analyses were limited to pro-RT sequences, it was expected that multiple URFs would be observed once longer env or full-length sequences were obtained. Other countries with comparable diversity [31,32], displayed high proportions of URF (approaching half of the sequences collected), which should have been apparent within the 37 sequences collected here. Aside from potential sampling bias, the most likely explanation for the low number of URFs is the low prevalence of HIV-1 in Bulgaria, which would minimize the opportunity for super-infection and recombinant generation. In either instance, these sequences are a valuable reference for future epidemiology studies and vaccine development.
In conclusion, this study sampled and subtyped HIV-1 strains from 215 of the approximately 1100 known Bulgarian HIV-1 infections. The MHA B/non-B assay provides a highthroughput and high-resolution means for subtyping HIV-1 infections. Compared to analysis of a single genomic region, the resolution generated by assaying multiple regions of the HIV-1 genome, provides greater confidence in subtype assignments in countries where multiple subtypes or recombinants are circulating. However, any subtyping assay that makes sub-genomic measurements has the potential to miss and under-report recombination events that occur outside of the observed regions (a limitation only overcome by resource intensive full-length sequencing). The high diversity of HIV-1 strains sequenced during this study indicate multiple introductions of HIV-1 into the Bulgarian populace and are representative of strains found among its neighboring countries as well as distant continents. Fortunately, new infection rates in Bulgaria are low compared to Europe in general. However, if new infection rates increase to levels where super-infections become common, then the established HIV-1 subtype diversity can be expected to contribute to the production of new URFs. In the context of vaccine development and monitoring of the epidemic, increased strain diversity complicates the immunogen or reagent selection process. Given the high level of strain diversity already present in the Bulgarian epidemic, continued monitoring for changes in subtype distribution and continued support for testing, treatment, and prevention programs will be necessary to maintain control of HIV-1 in Bulgaria.

Sequence data
The 37 HIV-1 sequences produced during this study have been submitted to GenBank and are available under accession numbers MH746230-MH746267.

S1 Fig. Genomic structures of observed URF_A1/C/D and CRF05_DF strains.
Genomic subtype breakpoints of this study's A1/C/D URF and CRF05_DF strains relative to the position of HIV-1 genes. (TIF) S1