First complete-genome documentation of HIV-1 intersubtype superinfection with transmissions of diverse recombinants over time to five recipients

Human immunodeficiency virus type 1 (HIV-1) recombinants in the world are believed to be generated through recombination between distinct HIV-1 strains in coinfected or superinfected individuals. However, direct evidence to support transmission of HIV-1 recombinants from a coinfected/superinfected donor to a putative recipient is lacking. Here, we report on the origin and evolutionary relationship between a set of recombinants from a CRF01_AE/CRF07_BC superinfected putative donor and diverse CRF01_AE/CRF07_BC recombinants from five putative recipients. Interviews on sociodemographic characteristics and sexual behaviors for these six HIV-1-infected men who have sex with men showed that they had similar ways of partner seeking: online dating sites and social circles. Phylogenetic and recombination analyses demonstrated that the near-full-length genome sequences from six patients formed a monophyletic cluster different from known HIV-1 genotypes in maximum likelihood phylogenetic trees, were all composed of CRF01_AE and CRF07_BC fragments with two common breakpoints on env, and shared 4–7 breakpoints with each other. Moreover, 3' half-genomes of recombinant strains from five recipients had recombinant structures identical or similar to those of strains in longitudinal samples from the superinfected donor. Recombinants from the donor were paraphyletic, whereas those from the five recipients were monophyletic or polyphyletic in the maximum clade credibility tree. Bayesian analyses estimated the time to the most recent common ancestor (tMRCA) of the CRF01_AE and CRF07_BC strains of the donor at 2009.2 and 2010.7, respectively, both earlier than the emergence of recombinants from the five recipients. Our results demonstrated that the closely related unique recombinant forms of HIV-1 might be the descendants of a series of recombinants generated gradually in a superinfected patient. This finding highlights the importance of early initiation of antiretroviral therapy as well as tracing and testing of partners in patients with multiple HIV-1 infection.

Identification of HIV-1 recombinants from primary infected individuals usually indicates that these recombinants are spreading in a population, but their origin and transmission history are incompletely understood [9]. Deciphering the origin of a group of recombinants with high genetic similarity using phylogenetic analyses is difficult. Studies have reported that recombinants sharing some breakpoints might have a common recombinant ancestor (e.g., CRF07_BC and CRF08_BC in China [10,11] and HIV-1 BF intersubtype recombinant viruses in Argentina [12]), that there might be a direct parental/progenitor relationship between them (e.g., CRF48_01B and CRF74_01B were probably descended from CRF33_01B in Malaysia [13,14]), or that they may be evolutionarily unrelated. In the last case, common breakpoints may be attributable to fragile sites or hairpin structures of genomic RNA, pause sites, high-pairing probability, or sequence similarity during reverse transcription, and the breakpoints often occur in well-conserved regions of viral genomes. These potential mechanisms of recombination have been supported by in vitro experiments and mathematical models, but they may be more complex in vivo [15][16][17][18]. Therefore, elucidation of the recombination mechanisms of HIV-1 may help to guide the surveillance and prevention of HIV spread.
Infection of the same person with multiple HIV-1 strains, either concurrently ("coinfection") or one after another ("superinfection"), is believed to be the prerequisite for the generation of recombinant strains [19][20][21], which is supported indirectly by the overlap of "hot areas" for HIV recombinants and multiple infections [22][23][24]. Moreover, recombinants composed of more than two parental viruses have also been found in HIV-1 superinfected cases [25,26]. However, there is no direct evidence to support the transmission of HIV-1 recombinants from coinfection/superinfection cases to a putative recipient with recombinant descendants, let alone evidence on the origin and transmission history of a group of HIV recombinant strains with genetic similarity.
Here, we describe the origin and evolutionary relationship among a group of closely related CRF01_AE/CRF07_BC URFs (0107 URFs) from an HIV-1 intersubtype superinfected donor and five putative CRF01_AE/CRF07_BC-infected recipients. This finding emphasizes the importance of early initiation of antiretroviral therapy (ART) as well as tracing and testing of partners of patients with multiple HIV-1 infection to prevent the spread of recombinant strains.

Sociodemographic characteristics and sexual behaviors of six HIV-1-infected men who have sex with men (MSM)
Previously, we identified six patients infected with HIV-1 CRF01_AE/CRF07_BC strains in a newly diagnosed cohort of MSM in Liaoning, northeast China. Among them, the donor was diagnosed with recent HIV-1 infection on 3 March 2010 and started ART on 7 January 2014. His viral load was well-controlled to <100 copies/mL after that. He is a high-earning businessman and self-reported as seeking younger male partners via online dating sites and social circles. He reported sexual behaviors with one regular partner and ≥10 casual partners in the 3 months before the diagnosis of HIV infection. Insertive and receptive positions were adopted when he had sex with other males without a condom. Moreover, methamphetamine and "rush poppers" were used constantly during sex.
All five recipients were diagnosed with HIV infection between 2013 and 2014. Among them, recipients 1, 2, 3, and 4 had been diagnosed with a recent HIV infection according to the results of limiting antigen (LAg)-avidity enzyme immunoassay (EIA). Recipient 5 was estimated to have become infected with HIV before the end of 2013. All five recipients were 10 years younger than the donor and had a lower income. They usually sought male partners via online dating sites and social circles. They self-reported having had sex with 3-15 male partners within the 3 months before the diagnosis of HIV infection, except recipient 3, a "rent boy," who had 80 commercial partners and one regular partner (recipient 5). When they had sex with other males, recipients 2, 3, and 4 adopted insertive and receptive positions; recipients 1 and 5 preferred insertive and receptive positions, respectively. Recipients 2 and 4 engaged in condom-less anal sex. In addition, recipients 1, 2, and 3 had a history of substance abuse. The sociodemographic characteristics and sexual behaviors of the six HIV-1-infected MSM are outlined in Table 1.

Identification of a lineage of HIV-1 CRF01_AE/CRF07_BC URFs among MSM in Liaoning
To validate the lineage of new HIV-1 recombinants, the near-full-length genome (NFLG) (HXB2: 790-9601 bp) was used for phylogenetic analyses (Fig 1A). The NFLGs of the six patients formed a distinct monophyletic cluster that was separate from any other known subtypes, CRFs, and URFs in the maximum likelihood (ML) phylogenetic tree with a posterior probability of 1, suggesting that this was a lineage of new HIV-1 URFs.

Six HIV-1 CRF01_AE/CRF07_BC URFs showed similar recombination forms and homologous parental strains
The recombination forms along the whole genomes of the six strains were first screened with the Recombinant Identification Program (RIP) and jumping profile hidden Markov model (jpHMM), and then validated with Simplot. The six strains had similar (but not identical) recombination forms (Fig 1B). First, all six strains were CRF01_AE/CRF07_BC recombinants with a CRF01_AE backbone and three or four CRF07_BC insertions in pol, vpr, tat/rev, env, and nef. Second, all six strains shared two identical breakpoints (the fifth and sixth, in env); five strains (all except the donor) shared the first and second breakpoints in pol; and four strains (all except recipient 1 and the donor) shared the seventh and eighth breakpoints in nef. Recipients 1, 2, and 3 and the donor shared the third and fourth breakpoints in vpr and the first exon of tat/rev. Third, the main difference among recipients 3, 4, and 5 was the length of the CRF07_BC segment IV, which was 404 bp, 970 bp, and 644 bp, respectively.
Table 1 footnotes: a LAg-Avidity EIA, limiting-antigen avidity enzyme immunoassay; the testing results were used to determine whether patients had a recent infection (<180 days after seroconversion) or a long-term infection (LT, >180 days after seroconversion). b NA, data were not available because of loss to follow-up.
To determine the evolutionary relationship and potential parental strains of the six strains, we undertook phylogenetic analyses on sub-regions between breakpoints (Fig 2). In general, the six strains displayed monophyletic clustering in all the sub-region trees of CRF01_AE and CRF07_BC segments. In trees I, III, V, and VII+IX, the CRF01_AE segments from the six strains belonged to CRF01_AE lineage 4 among the Chinese MSM population (posterior probability = 1) (Fig 2A). In trees II, IV, VI, and VI+VIII, the CRF07_BC segments of the six strains belonged to the CRF07_BC lineage predominant among the Chinese MSM population (posterior probability ≥0.96) (Fig 2B). Taken together, this high genetic similarity and the homologous parental strains suggested a close evolutionary relationship among the six CRF01_AE/CRF07_BC recombinant strains.

Most likely origin of the series of HIV-1 CRF01_AE/CRF07_BC URFs
Among the six patients, according to clinical records and LAg-avidity EIA results, the donor was diagnosed with an HIV-1 CRF01_AE infection on 3 March 2010 (Table 1). Moreover, the donor had been identified as superinfected with a CRF07_BC strain in our previous study [27]. To determine the recombination process between the primary infecting CRF01_AE strain and the superinfecting CRF07_BC strain in the donor, the 3' half-genome was obtained from longitudinal samples by a single-genome amplification (SGA) strategy and used for recombination analyses (Fig 3). The superinfecting CRF07_BC strain was first detected by SGA in the donor sample of 9 December 2010. Recombination between CRF01_AE and CRF07_BC was first detected ~3 months after superinfection. The predominant CRF01_AE/CRF07_BC recombinants were detected ~7 months and ~12 months after superinfection, respectively. Although the quasispecies and composition of the CRF01_AE/CRF07_BC recombinants among longitudinal samples of the donor changed continuously, some recombinants could be detected at ≥2 time points. Also, some identical/similar breakpoints were detected between distinct recombinants at different time points (S1A and S2 Figs), which suggested continuous evolution of CRF01_AE/CRF07_BC recombinants in vivo under immune selection from the host.
We further compared the recombination forms of the 3' half-genomes from the donor's longitudinal samples with those from the five recipients at baseline or after seroconversion (Fig 3). The viral quasispecies of the donor were more complex than those of the five recipients, who had only one or two recombination forms. More importantly, each recombinant strain from the five recipients had at least one recombination form similar or identical to that of a strain from a donor sample at an earlier time point (Figs 3, S1B, and S2). For example, two forms of recombinants from recipient 1 (1 April 2013) were similar to recombinants from donor samples on 13 July 2011

Temporal evolutionary relationships among HIV-1 CRF01_AE/CRF07_BC URFs
To investigate the possible evolutionary relationship between these closely related CRF01_AE/CRF07_BC recombinants from the six patients, the concatenated CRF01_AE segments and CRF07_BC segments were analyzed by Bayesian molecular clocks (Fig 4A and 4B). In the maximum clade credibility (MCC) trees of CRF01_AE and CRF07_BC, the sequences from the six patients formed a monophyletic cluster within all reference sequences, respectively. Within each cluster, the initial infecting CRF01_AE strains and the superinfecting CRF07_BC strains from the donor were located at the root. The strains of the five recipients formed internal branches following the CRF01_AE strains and CRF07_BC strains from the donor, respectively. Moreover, the strains from the donor were paraphyletic, whereas the strains from the five recipients were monophyletic or polyphyletic in the MCC tree. This result was further supported by the evolutionary ML trees constructed with concatenated CRF01_AE and CRF07_BC segments, respectively (S3 Fig). The estimated time to the most recent common ancestor (tMRCA) for the concatenated initial infecting CRF01_AE strain and superinfecting CRF07_BC strain of the donor was dated to 2009.2 (95% highest posterior density 2008.8-2009.7) and 2010.7 (2010.5-2010.9), respectively (Fig 4A and 4B). These observations were consistent with the results using LAg-avidity EIA and SGA on estimation and identification of initial infection and superinfection. The tMRCA of the concatenated CRF01_AE segments in the donor (13 July 2011, 29 November 2011, 22 January 2013, and 23 September 2013) was earlier than that of the corresponding similar/identical recombinants of the five recipients. The tMRCA of concatenated CRF01_AE 2), respectively (Fig 4A). Similar results were found for the tMRCA of CRF07_BC segments of the six patients (Fig 4B).
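The tMRCA estimates above are given as decimal years, whereas the clinical events are given as calendar dates. A minimal sketch of the conversion follows, assuming that the fractional part of a decimal year maps linearly onto the days of that year; the function name `decimal_year_to_date` is illustrative and is not part of BEAST or of the original analysis. With one decimal place the result is only approximate.

```python
from datetime import date, timedelta


def decimal_year_to_date(value):
    """Convert a decimal year (e.g. 2009.2) to an approximate calendar date.

    Assumption: the fractional part is a linear fraction of the year's length.
    """
    year = int(value)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=round((value - year) * days_in_year))


if __name__ == "__main__":
    # Point estimates reported in the text for the donor's two strains.
    for label, estimate in [("CRF01_AE", 2009.2), ("CRF07_BC", 2010.7)]:
        print(label, decimal_year_to_date(estimate).isoformat())
```

Under this assumption, 2009.2 falls in mid-March 2009 and 2010.7 in September 2010, which can be read against the diagnosis date (3 March 2010) and the first SGA detection of the CRF07_BC strain (9 December 2010) reported above.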

Discussion
The rapid increase and high diversity of HIV-1 recombinants pose a great challenge for the prevention and surveillance of HIV infection globally. The co-circulation of various strains and a high rate of multiple infection are prerequisites for the generation of new HIV-1 recombinants. We identified a lineage of new 0107 URFs with homologous parental strains and similar recombination forms among six HIV-1-infected MSM in northeast China. Epidemiology, quasispecies diversity, timelines, and phylogenetics supported the notion that the potential origin of the ancestral virus of a series of closely related 0107 URFs could be traced back to an HIV-1-superinfected individual.
First, HIV-1-infected MSM carry a higher risk of multiple infection compared with heterosexuals [28]. Junjie Xu and colleagues reported that ~35% of MSM had >5 male sexual partners in the last 12 months in Shenyang, Liaoning [29]. In our study, all six patients sought multiple male sexual partners mainly through online dating sites and social circles in the same area. Moreover, most of them seldom used condoms, admitted substance abuse during anal/oral sex with their casual male sexual partners, and had a history of syphilis, which further increased the potential for HIV infection and acquisition of multiple viruses [30][31][32][33][34].
Second, the donor was an HIV-1 CRF01_AE/CRF07_BC-superinfected patient. This patient developed a series of CRF01_AE/CRF07_BC recombinants with 1-5 common breakpoints but non-identical recombination forms in longitudinal samples within a treatment-naïve period of ~4 years. This finding is consistent with the observation by McCutchan and colleagues that various recombinants from a heterosexual superinfected individual in Tanzania had common breakpoints in gag, env, and gp41/nef regions in serial samples within 30 months [26]. In contrast to the method used by McCutchan and colleagues, which amplified three regions of HIV (multiple-region hybridization assays), we used the SGA method to amplify the relatively long fragments of the 3' half-genome, which could fully reflect the complex structure of the recombinants in the six patients [35]. The five recipients in our study had one or two genetically homogeneous CRF01_AE/CRF07_BC recombinants at baseline or after seroconversion, which showed less quasispecies complexity than that in the donor. Surprisingly, the recombination forms of all strains from the five recipients resembled those at different sampling dates from the donor. Studies have demonstrated a loss of HIV genetic diversity from donors to new hosts upon sexual transmission [36] and mother-to-child transmission [37], which may result from a "transmission bottleneck". These data imply that this HIV-1-superinfected patient might have been the donor who transmitted a series of 0107 URFs to new hosts.
Third, according to clinical records and the results of LAg-avidity EIA, we found that the putative donor was diagnosed with HIV infection about 3-4 years earlier than the other five patients, and was the first to be detected with CRF01_AE/CRF07_BC recombinants. The tMRCA of the recombinants of the putative donor was also earlier than that of the other five patients. Furthermore, the putative donor did not start ART until 2014 (i.e., ~4 years after the diagnosis).
Finally, based on Bayesian analyses, the topology of the MCC tree provided compelling evidence of the source of these 0107 URFs. Strains from the putative donor and five recipients formed paraphyletic-polyphyletic or paraphyletic-monophyletic donor-recipient joint phylogeny. Studies have reported that phylogenetic methods might be used to infer the transmission history of epidemiologically linked hosts in an HIV-mono-infected population [38]. Paraphyletic-polyphyletic trees support direct transmission and are believed to exclude intervening transmission and a common source. Typically, paraphyletic-monophyletic trees result from direct or indirect transmission [39].
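The distinction between monophyletic and non-monophyletic groupings in a donor-recipient tree can be illustrated with a small sketch. It assumes a rooted tree written as nested tuples with tip labels as strings; the labels (`D1`, `R1`, `REF`) and the toy topology are hypothetical and are not taken from Fig 4. The check only tests monophyly (a group is monophyletic when the smallest clade containing all of its tips contains nothing else); it does not by itself distinguish paraphyly from polyphyly.

```python
def tips(node):
    """Return the set of tip labels under a node (a string or a nested tuple)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= tips(child)
    return found


def smallest_clade(node, group):
    """Descend to the smallest subtree that still contains every tip in `group`."""
    if not isinstance(node, str):
        for child in node:
            if group <= tips(child):
                return smallest_clade(child, group)
    return node


def is_monophyletic(tree, group):
    """True when the smallest clade holding `group` contains no other tips."""
    group = set(group)
    return tips(smallest_clade(tree, group)) == group


if __name__ == "__main__":
    # Hypothetical rooted tree: recipient tips nested inside donor diversity.
    toy_tree = (("D1", ("D2", ("R1", "R2"))), "REF")
    print("recipient monophyletic:", is_monophyletic(toy_tree, {"R1", "R2"}))
    print("donor monophyletic:", is_monophyletic(toy_tree, {"D1", "D2"}))
```

In this toy topology the recipient tips form a single clade, whereas the donor tips do not, because the recipient clade is nested among them; this corresponds to the paraphyletic-monophyletic pattern described above.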
In recent years, several HIV-1 CRF01_AE and B-related second-generation recombinants have been identified in Asia [40][41][42][43][44], some of which (e.g., CRF55_01B and CRF59_01B) have spread widely around China [45]. More recently, numerous new recombinants composed of CRF01_AE and CRF07_BC (the two predominant strains among MSM populations) have also been reported around China [46][47][48][49]. However, most of them were phylogenetically unrelated, and for the few that clustered, the origin and evolutionary relationship among them were not clear. We demonstrated, for the first time, that a group of closely related new HIV-1 recombinants in MSM may derive from one superinfection case, which supports the model proposed for the generation of HIV-1 BF intersubtype recombinants with coincident breakpoints in South America [50].
Recently, early initiation of ART has become a worldwide public-health prevention strategy for HIV transmission. In China, the standards of ART initiation have been updated several times in national treatment guidelines [51][52][53][54]. In 2014, it was suggested that people with HIV infection with CD4+ T-cell count <500 cells/μL should receive ART. In 2016, it was suggested that all HIV-infected people, regardless of the CD4+ T-cell count, should be treated. Therefore, the putative donor reported in the present study did not start ART until 2014 according to the policy in China at that time. During the treatment-naïve period (~4 years), he became superinfected with another HIV strain and developed a series of CRF01_AE/CRF07_BC recombinants, then infected the other five recipients directly or indirectly.
The treatment-as-prevention approach has been shown to reduce the risk of HIV transmission in serodiscordant couples [55], and has been proposed and implemented in many countries (including China). Therefore, the prevalence of superinfection (such as in the donor in the present study) might be reduced. However, the increasing detection of URFs in areas with multiple HIV strains suggests that there are many undiagnosed multiple-infected cases. Hence, if a multiple-infected case is diagnosed, not only should ART be started early to reduce the transmission risk, but strengthened tracing of partners should also be done immediately. In this way, persons infected with complicated HIV-1 strains and carrying a higher risk of further transmission can be diagnosed rapidly.
Due to protection of personal privacy, we did not have sufficient epidemiological data to determine the transmission chain among the six patients. However, we provided evidence from different perspectives to suggest there might be a direct or indirect transmission relationship among our six patients. We also inferred that the patient with HIV-1 superinfection might be the source of a lineage of closely related 0107 URFs (Fig 5).
Our study suggests an important role of HIV-1 superinfection in the generation and transmission of new recombinants. Furthermore, recombinants with high genetic similarity (but distinct recombination forms) could share a common origin. This observation provides a new perspective to infer the evolutionary relationship between HIV-infected individuals harboring recombinants. The present study also calls for greater attention to the monitoring, early ART, and strengthened management of multiple-infected individuals, including the tracing and testing of partners.

Ethics statement
This study was approved by the Ethics Committee of the First Affiliated Hospital of China Medical University ([2018] 2015-140-5). Written informed consent to participate in this study was obtained from all patients before sample collection.

Study participants
The six study participants were newly diagnosed HIV-1-infected patients from a cohort of MSM in a voluntary HIV counseling and testing clinic of the First Affiliated Hospital of China Medical University. They were found to be infected with a lineage of CRF01_AE/CRF07_BC recombinant strains through phylogenetic analyses on pol sequences from routine genotypic testing for resistance to common anti-HIV drugs. The related laboratory testing has been described previously [56,57]. Donor (LNA819) was diagnosed with HIV-1 infection on 3 March 2010 and identified as having HIV-1 superinfection by next-generation sequencing [27]. Due to ART initiation or loss to follow-up, serial plasma samples between 2010 and 2013 were collected from the donor. One plasma sample at baseline was collected from recipient 3 (LN328575), recipient 4 (LN301538), and recipient 5 (LN328576), respectively. One plasma sample at the first or second HIV-positivity time-point after seroconversion was collected from recipient 1 (LN320639) and recipient 2 (LN320392), respectively.

HIV-1 limiting-antigen avidity enzyme immunoassay (LAg-avidity EIA)
Plasma samples were tested for recent HIV infections with LAg-avidity EIA (Maxim Biomedicals, Rockville, MD, USA) according to manufacturer instructions. Normalized optical density (ODn) of 2.0 was used as a threshold cutoff to distinguish long-term HIV infection from recent HIV infection. Plasma with ODn upon initial screening >2 was classified as "long-term infection", whereas that with ODn ≤2 was retested in triplicate for confirmation. Plasma with median ODn >1.5 was classified as "long-term HIV infection", whereas that with ODn >0.4 but ≤1.5 was classified as "recent seroconversion"; for ODn ≤0.4, a serology confirmation test was necessary to further ensure that the plasma sample was HIV-positive.
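The classification rule described above can be written as a short decision function. This is a minimal sketch rather than the manufacturer's algorithm: it assumes that the lower class boundaries are inclusive (ODn of 2.0, 1.5, and 0.4 fall in the lower category) and that the median of the triplicate retest is used for every confirmatory comparison. The function name and return strings are illustrative.

```python
from statistics import median


def classify_lag_avidity(initial_odn, retest_odns=None):
    """Classify a plasma sample from LAg-avidity EIA ODn values.

    initial_odn: ODn from the initial screening run.
    retest_odns: ODn values from the triplicate confirmatory run, if done.
    Thresholds (2.0, 1.5, 0.4) are those stated in the text; inclusive
    lower bounds are an assumption of this sketch.
    """
    if initial_odn > 2.0:
        return "long-term infection"
    if not retest_odns:
        return "retest in triplicate"
    confirmatory = median(retest_odns)
    if confirmatory > 1.5:
        return "long-term infection"
    if confirmatory > 0.4:
        return "recent seroconversion"
    return "serology confirmation required"


if __name__ == "__main__":
    # Hypothetical ODn values, for illustration only.
    print(classify_lag_avidity(2.5))
    print(classify_lag_avidity(1.0, [0.9, 1.1, 1.0]))
    print(classify_lag_avidity(0.3, [0.2, 0.3, 0.4]))
```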

Amplification and sequencing of near-full-length genomes (NFLGs) and 3' half-genomes
NFLGs and 3' half-genomes were amplified and sequenced directly according to methods described previously [58]. In brief, HIV-1 RNA was extracted from 140-μL plasma sample

Phylogenetic and recombination analyses
Sequences were assembled with Sequencher 5.4.6 (Gene Codes, Ann Arbor, MI, USA), aligned using Gene Cutter within HIV databases (www.hiv.lanl.gov), and then adjusted manually with BioEdit 7.0 (www.mbio.ncsu.edu/BioEdit) [59]. To rule out potential cross-contamination during the experiment, all sequences obtained in this study were compared by BLAST against the local sequence library of our research team and against the HIV databases (www.hiv.lanl.gov). Reference sequences were downloaded from the Los Alamos HIV Database (www.hiv.lanl.gov). Maximum likelihood phylogenetic trees (ML trees) of the aligned NFLGs, 3' half-genomes, and concatenated CRF01_AE and CRF07_BC segments were constructed with FastTree [60] and edited with FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree). Recombination analyses were first done with the Recombinant Identification Program (RIP) [61] and jumping profile Hidden Markov Model (jpHMM) within HIV databases (www.hiv.lanl.gov) [62]. Further confirmation was achieved with bootscanning in Simplot 3.5.1 [63] to define the recombination structures (window size: 350 nt; step size: 50 nt; bootstrap replicates: 250).
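The sliding-window idea behind the recombination scan (350-nt window, 50-nt step) can be sketched as follows. This is a simplified similarity scan, not Simplot's bootscanning, which builds a bootstrapped tree in each window; it assumes pre-aligned sequences of equal length, and the reference names and toy sequences are hypothetical.

```python
def sliding_windows(length, window=350, step=50):
    """Yield (start, end) half-open windows along an alignment of `length` columns."""
    for start in range(0, max(length - window, 0) + 1, step):
        yield start, min(start + window, length)


def percent_identity(a, b):
    """Percent of identical positions, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def similarity_scan(query, references, window=350, step=50):
    """Return [(window midpoint, {reference name: percent identity})] for each window."""
    profile = []
    for start, end in sliding_windows(len(query), window, step):
        scores = {
            name: percent_identity(query[start:end], ref[start:end])
            for name, ref in references.items()
        }
        profile.append(((start + end) // 2, scores))
    return profile


if __name__ == "__main__":
    # Toy alignment: the query matches one reference, then switches to the other.
    query = "A" * 400 + "C" * 400
    references = {"ref_01AE": "A" * 800, "ref_07BC": "C" * 800}
    for midpoint, scores in similarity_scan(query, references):
        best = max(scores, key=scores.get)
        print(midpoint, best, round(scores[best], 1))
```

In the toy example the best-matching reference changes along the genome, and the position where it changes approximates a breakpoint; a real analysis would rely on the bootscan support values rather than raw identity.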

Bayesian Markov Chain Monte Carlo (MCMC) evolutionary analyses
To explore the phylogenetic relationship and the time of the most recent common ancestor (tMRCA) of viruses from the six participants, Bayesian phylogenetic analyses were done using the MCMC inference implemented in BEAST v2.5.1 [64]. For concatenated CRF01_AE segments, strict molecular-clock analyses were undertaken under the general time reversible (GTR) + I + G model of nucleotide substitution. For concatenated CRF07_BC segments, relaxed molecular-clock analyses were undertaken under the Tamura-Nei 93 (TN93) model of nucleotide substitution. The MCMC chains were run for 200 million steps and sampled every 20,000 steps. The output was tested for convergence using Tracer v1.6, and related parameters were estimated with an effective sample size (ESS) of more than 200. Phylogenetic trees were summarized using TreeAnnotator (with 10% burn-in) and then edited using FigTree v1.4.2.
The NFLG and 3' half-genome sequences reported here are available in GenBank under accession numbers KX434794, KX434795, KX434797, KX434798, KX434799, MT857722, MW287665-MW287747, and MW344769-MW344807.

Supporting information
S1 Fig. Highlighter plots of HIV-1 3' half-genome diversity in six HIV-1-infected MSM. The initial strain (CRF01_AE) and superinfected strain (CRF07_BC) from the donor were chosen as master sequences and are colored light-coral and slate-blue, respectively. The x-axis represents the base number. The y-axis represents the sampling dates of donor or recipients. The 3' half-genome sequences obtained from the donor and five recipients are shown in panel A and B, respectively. Some recombination key sites were marked with ".