HIV-1 variants are archived throughout infection and persist in the reservoir

The HIV-1 reservoir consists of latently infected cells that persist despite antiretroviral therapy (ART). Elucidating the proviral genetic composition of the reservoir, particularly in the context of pre-therapy viral diversity, is therefore important to understanding reservoir formation and the persistence of latently infected cells. Here we investigate reservoir proviral variants from 13 acutely infected Zambian individuals with additional pre-therapy sampling for a unique comparison to the ART-naïve quasispecies. We identified complete transmitted/founder (TF) viruses from seroconversion plasma samples, and additionally amplified and sequenced HIV-1 from plasma obtained one year post-infection and just prior to ART initiation. While the majority of proviral variants in the reservoir were most closely related to viral variants from the latest pre-therapy time point, we also identified reservoir proviral variants dating to or near the time of infection, and to intermediate time points between infection and treatment initiation. Reservoir proviral variants differing by five or fewer nucleotide changes from the TF virus persisted during treatment in five individuals, including proviral variants that exactly matched the TF in two individuals, one of whom had remained ART-naïve for more than six years. Proviral variants during treatment were significantly less divergent from the TF virus than plasma variants present at the last ART-naïve time point. These findings indicate that reservoir proviral variants are archived throughout infection, recapitulating much of the viral diversity that arises throughout untreated HIV-1 infection, and strategies to target and reduce the reservoir must therefore permit the clearance of proviruses encompassing this extensive diversity.


Introduction
Although over 23 million individuals living with HIV were receiving ART by the end of 2018, only two have been cured following stem-cell transplantation of HIV-1-resistant cells [1][2][3][4]. ART alone is not curative due to the persistence and proliferation of latently infected CD4+ T cells harboring intact but quiescent proviruses unaffected by antiretrovirals that target stages of active viral replication [5][6][7][8][9][10][11]. This long-lived and potentially self-renewing population of latently infected cells can serve as a reservoir for viral rebound in the event of treatment cessation [12][13][14][15][16], and efforts to understand the reservoir are therefore essential to HIV-1 cure strategies. Genetic approaches investigating the reservoir have sequenced rebounding virus in HIV-1 patients undergoing treatment interruption, reactivated virus from latently infected cells stimulated in vitro, and proviral populations during ART [14][15][16][17][18][19][20][21][22][23][24][25][26]. These studies provided critical insights into the complex nature of provirus remaining during treatment, only a small fraction of which is capable of replicating, and a further subset of which reactivates with treatment interruption [22]. The sources and establishment of this reservoir are of considerable interest, and although the reservoir is seeded beginning in very early stages of infection [5,12,14,20,27,28,29], few investigations have explored the relationship of the reservoir to early infection, pre-therapy viral variants. The extent to which these variants may persist in the reservoir during virologically suppressive ART is incompletely understood.
Recent studies examining associations between transmitted virus, its descendant quasispecies in chronic ART-naïve infection, and the reservoir include genetic analyses of amplified virus by Brodin et al. [30] and Abrahams et al. [31], particularly from quantitative viral outgrowth assay (QVOA) as in Abrahams et al. [31], while additional work from Jones et al. [32] infers the age of latent proviral genes in relation to pre-therapy plasma variants. All three groups describe heterogeneous populations of proviral sequences that do not indicate ongoing evolution during virologically suppressive ART [30][31][32]. Furthermore, all three studies observe proviral variants that are inferred to be most closely related to sequences circulating in the plasma immediately prior to the start of treatment as well as variants contemporaneous with the earliest pre-therapy sample [30][31][32]. The frequencies of proviral variants dating to particular pre-therapy eras are distinct in each study, with Brodin et al. [30] and Abrahams et al. [31] describing 60% and 71%, respectively, of proviral variants during treatment as most closely related to pre-therapy sequences from immediately prior to treatment initiation; these frequencies are higher than that described in Jones et al. [32]. Given the interpatient variability in proviral population structure present in these studies, discrepancies between their findings are perhaps unsurprising.
In this study, we examined reservoir proviral sequences in the context of pre-therapy plasma HIV-1 RNA diversity and evolution in 13 Zambian seroconvertors. Critically, our reconstruction of within-host HIV-1 evolution includes the inference of the near full-length transmitted/founder virus from single genome amplification, allowing us to investigate the possible long-term persistence of this sequence within the reservoir. Utilizing the phylogenetic approaches developed by Jones et al. [32] and additional analyses to assess the reservoir during short-term ART, our findings indicate that latent proviral diversity broadly reflects plasma HIV-1 RNA diversity during the period of pre-therapy infection. A majority of variants appear most closely related to those circulating in plasma near the time of ART initiation, but the reservoir quasispecies can in some individuals include variants present at the time of transmission, and demonstrates persistence of variants archived throughout ART-naïve infection.

Participant selection and sampling methods
We identified 13 Zambian seroconvertors from the Zambia-Emory HIV Research Project for study according to the following criteria: ART-naïve infection of at least two years, subsequent ART with viremia <50 copies/mL at one or more time points following therapy initiation, and sample availability during treatment (Table 1). All participants received combination ART per country guidelines. We amplified and sequenced a minimum of seven near full-length genomes (NFLGs) by single genome amplification (SGA) from the earliest HIV+ plasma sample available for each participant (seroconversion sample), which was collected a median of 44 days post-estimated date of infection (EDI) (Table 1 and Fig 1). Sequencing was performed using Pacific Biosciences Single Molecule, Real-Time (SMRT) sequencing [33], and the transmitted/founder (TF) virus was inferred as the consensus of the low-diversity NFLGs (S1A Fig). Participant Z1658F was determined to have been infected with two TF viruses (S1 Fig). We additionally utilized SGA and SMRT sequencing to amplify and sequence an approximately 3.6 kb amplicon spanning the vpu, env, and nef genes from plasma samples collected approximately one year following the EDI, as well as from the last pre-therapy time point available (Fig 1). For six individuals, we additionally amplified and sequenced proviral DNA from the last pre-therapy time point to assess the divergence of sequences from the TF virus when collected from the cellular compartment rather than plasma.
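Inferring the TF virus as the consensus of low-diversity sequences can be illustrated with a minimal sketch. The function names, the majority-rule column vote, and the toy six-nucleotide sequences below are assumptions made for illustration only; they are not the pipeline used in this study, which operated on aligned near full-length genomes.

```python
from collections import Counter

def consensus(aligned_seqs):
    """Majority-rule consensus of equal-length, pre-aligned sequences."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Take the most common character in every alignment column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))

def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical toy alignment standing in for seven seroconversion genomes.
seqs = ["ACGTAC", "ACGTAC", "ACGTTC", "ACGTAC", "ACCTAC", "ACGTAC", "ACGTAC"]
tf = consensus(seqs)
distances = [hamming(tf, s) for s in seqs]
```

In a low-diversity population such as this toy example, most sequences sit zero or one change away from the consensus, which is the pattern expected shortly after infection with a single variant.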
For all participants, the vpu-env-nef amplicon was also generated by SGA from DNA of peripheral white blood cells collected during treatment; for four individuals, proviral sequences were also sampled at a second, later time point during ART (Table 1 and Fig 1). All sequences analyzed across time points were free of obvious defects such as nonsense mutations, INDELs resulting in disruption of an open reading frame, or in-frame INDELs of more than 90 nucleotides (up to 90 accepted to accommodate Env variable loops). In total, 1,275 sequences, excluding APOBEC hypermutants, were generated and analyzed for all participants and time points (S1 Table).
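The exclusion criteria above can be expressed as a small filter. This is a hedged sketch: the 90-nucleotide threshold comes from the text, but the function names, the way INDEL lengths are supplied, and the stop-codon check on a single ungapped reading frame are hypothetical simplifications, since the detection procedure itself is not described here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_INFRAME_INDEL = 90  # nucleotides; threshold stated in the text

def has_premature_stop(orf):
    """True if a stop codon occurs before the final codon of an ungapped ORF."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 3, 3)]
    return any(c in STOP_CODONS for c in codons)

def indel_status(indel_length):
    """Classify one INDEL by its length in nucleotides."""
    if indel_length % 3 != 0:
        return "frameshift"        # disrupts the open reading frame
    if indel_length > MAX_INFRAME_INDEL:
        return "large in-frame"    # in frame, but longer than accepted
    return "accepted"              # e.g. Env variable loop length variation

def passes_filter(orf, indel_lengths):
    """Keep a sequence only if it has no nonsense mutation and no rejected INDEL."""
    if has_premature_stop(orf):
        return False
    return all(indel_status(n) == "accepted" for n in indel_lengths)
```

For example, `passes_filter("ATGAAATAA", [3, 90])` keeps the sequence, whereas a 93-nucleotide in-frame deletion or a 4-nucleotide insertion would exclude it.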

Infection date estimates
Although participants studied in this investigation were routinely tested for HIV, providing narrow windows for infection between the last HIV negative and first positive tests, estimation of the infection date within these time frames is calculated as the midpoint of the antibody test dates. To corroborate our clinically-based EDIs from HIV testing in the single-variant infections, we used Bayesian approaches to infer the mean root date, a proxy for infection date, directly from sequence data. We used all pre-therapy env sequences without evidence of hypermutation or recombination for the analysis, annotated with the sample collection dates. Though Bayesian-estimated mean root dates predated the clinical EDI in all cases, the 95% confidence intervals of the Bayesian root dates overlapped the clinical EDIs in all but four infections (Fig 2). These results confirm the early nature of the infections when sampled at the seroconversion time point.

Reservoir variants are distributed throughout the phylogenetic trees of viral sequences from each individual
For all 13 participants, a maximum-likelihood phylogeny was inferred from within-host alignments of plasma RNA and proviral DNA vpu-env-nef sequences from all time points, where the tree was rooted on the inferred TF virus (Fig 3 and S2 Fig). This permits insight into the divergence of all descendant variants in each population from the TF virus, as well as the inferred evolutionary relationships between them.

Although proviral variants closely related to and including the TF virus persist in the reservoir, reservoir variants are distributed throughout the viral phylogenies for each individual. Plasma variants from one year post-EDI and the last pre-therapy time point exhibited a "ladder-like" topology characteristic of within-host phylogenies [34], where plasma sequences from a given time point formed distinct clades in all intrapatient trees, and reservoir variants fell within or between these clades. In an initial analysis, we classified individual reservoir variants as being most closely related to variants of the clade within which they fall, or as

Dating of provirus integration indicates variant archiving throughout infection
While visualization of within-host phylogenies inferred from pre-therapy plasma sequences and proviruses persisting during ART allowed us to estimate the era in which the latter integrated into the reservoir, to more precisely estimate the age of proviral variants with respect to the ART-naïve infection, we applied the method developed by Jones et al. [32]. This method utilizes pre-therapy plasma sequences to develop a model of within-host evolution relative to sampling time, and places reservoir variants at a distinct date along the infection history, rather than into a broad category based on relatedness to discrete pre-therapy sampling. For this analysis, we inferred maximum-likelihood phylogenies from pre-therapy plasma and reservoir proviral env sequences, where the root was placed at the location that maximized the correlation between the divergence from the root and sample collection date of the pre-therapy plasma sequences (Fig 5A and 5D, S5 Fig).

Consistent with reservoir proviral variants being seeded at various times spanning infection to treatment initiation, there is considerable discrepancy between sample collection dates and inferred integration dates for proviral sequences sampled during ART, with some variants estimated to have been integrated near the time of seroconversion. In a representative case, the point estimates of integration dates for Z1165M indicate a variant was archived within three months of the root date of Feb 21, 2006. In contrast, several variants displaying considerably higher divergence from the root were present as well, including those with estimated integration dates consistent with the last ART-naïve plasma variant date estimates. In participant Z634F, there are two variants dating to approximately one year post-infection, but no earlier variants, and only provirus dating to shortly prior to the initiation of therapy was detected in participants Z1044M and Z1808F (S5 Fig). Interestingly, where cells were sampled at the last ART-naïve time point, such as for participant Z1165M, the integration date estimates for provirus of these cells typically fell slightly after that of the proviral variants persisting during treatment. Across the group of participants, however, and consistent with our initial analysis in Fig 4C, integration date estimates for proviral sequences supported periodic seeding of variants in the reservoir throughout the infection.
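The regression step of this dating approach can be sketched as follows. The sketch assumes that root-to-tip divergences have already been extracted from a rooted tree; the function names, the toy root date, and the evolutionary rate of 2e-5 substitutions per site per day are hypothetical values chosen for illustration, and the confidence intervals reported in the study are not reproduced.

```python
from datetime import date, timedelta

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def date_for_divergence(divergence, slope, intercept):
    """Invert the fitted line: the date at which this divergence is expected."""
    return date.fromordinal(round((divergence - intercept) / slope))

# Toy pre-therapy plasma data: (collection date, divergence from the root).
root = date(2010, 1, 1)
plasma = [(root + timedelta(days=d), 2e-5 * d) for d in (40, 365, 1000)]

# Train the linear model on pre-therapy plasma sequences only.
slope, intercept = fit_line([d.toordinal() for d, _ in plasma],
                            [y for _, y in plasma])

# Date with zero expected divergence (the root) and a provirus at 0.01.
root_estimate = date_for_divergence(0.0, slope, intercept)
integration_estimate = date_for_divergence(0.01, slope, intercept)
```

A provirus sampled during ART is thus dated by its divergence rather than by its collection date, which is why the two can differ by years.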

Repeated sampling during ART and persistence of early infection variants
The dynamics of proviral decay during short-term ART influence the results of this investigation, as proviral DNA decays most rapidly within approximately the first one to two years of therapy [35][36][37][38]. By sampling during this time frame, we therefore may be sampling provirus that persists only transiently rather than comprising the more stable population of latently infected cells with a slower decay rate. To determine if within-host proviral composition was influenced by the relatively short time on treatment, we sampled an additional time point six months to a year later in four participants. All proviral variants without APOBEC hypermutation were included in phylogenetic trees along with all pre-therapy sequences, and phylogenies were again rooted on the TF virus (Fig 6).

To extend this analysis, we formally compared proviruses sampled at both time points during ART with respect to their genetic divergence from the TF virus. To facilitate combining data across participants, root-to-tip or patristic distances of each reservoir sequence were normalized to the participant's total tree height. Comparison of these scaled root-to-tip distances by sampling time point during ART revealed shorter mean distances for the later reservoir samples compared to the earlier ones (unpaired t test, p = 0.1054, Fig 7). This suggests that, with ongoing treatment, viral variants are not continuing to evolve, as this would bring about an increase in patristic distance. Furthermore, it suggests that early viral variants may be enriched in the reservoir during ART.
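The normalization used to pool participants can be illustrated with a short sketch. The participant labels, tree heights, and distances below are invented toy values, and the statistical test itself is left out; only the scaling and pooling steps are shown.

```python
from statistics import mean

def scale_to_tree_height(distances, tree_height):
    """Express root-to-tip distances as a proportion of the tree height."""
    if tree_height <= 0:
        raise ValueError("tree height must be positive")
    return [d / tree_height for d in distances]

# Hypothetical root-to-tip distances for two participants, each sampled
# at a first and a second time point during ART.
participants = {
    "P1": {"height": 0.08, "first": [0.02, 0.06], "second": [0.01, 0.04]},
    "P2": {"height": 0.04, "first": [0.03, 0.04], "second": [0.02, 0.01]},
}

# Scale within each participant, then pool across participants.
pooled = {"first": [], "second": []}
for data in participants.values():
    for time_point in pooled:
        pooled[time_point].extend(
            scale_to_tree_height(data[time_point], data["height"])
        )

mean_first = mean(pooled["first"])
mean_second = mean(pooled["second"])
```

Scaling first matters because participants differ in total divergence; without it, individuals with longer ART-naïve infection would dominate the pooled comparison.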

Discussion
We observed that proviral sequences from 13 individuals who had undergone short-term ART were distributed among pre-therapy sequences in phylogenetic trees, with the majority of proviral sequences most closely related to variants from the last ART-naïve time point. However, as analysis of the estimated time of integration for proviral sequences indicates, there is archiving of variants throughout ART-naïve infection, from the earliest time of infection to treatment initiation. This finding is consistent with previous work by Jones et al. [32], but extends the stages of pre-therapy infection explored to acute infection. We identified TF viruses from acute infections with longitudinal follow-up through chronic ART-naïve infection and treatment initiation, while the pre-therapy samples of the two HIV-1 infections investigated in Jones et al. are from chronic infections [32]. Archiving of variants throughout ART-naïve infection is complementary to the observations that the reservoir is smaller and less diverse in individuals beginning treatment early in infection versus during chronic infection [37,39,40], since preventing replication with ART ensures a halt in viral evolution and concomitant latent infection with progressively more diverse variants.
Within the diverse populations of proviral sequences we observed, we identified variants that were identical to or contemporaneous with the TF virus after as many as six years of ART-naïve infection and following six to 24 months of ART. These very early, TF-related sequences were observed in five of the 13 individuals sampled and represented from 2.6% to 7.5% of all reservoir variants in those individuals. It is clear that these very early viral sequences can persist for several years in the absence of therapy, consistent with their integration in long-lived CD4+ T cells. Persistence of ancestral variants is not unprecedented, as several studies assessing drug resistance in patients receiving virologically suppressive ART after a history of non-suppressive therapy found that both ancestral, drug susceptible virus and variants with resistance mutations persist during years of effective treatment [41][42][43][44].
Recent studies have shown that a majority of proviruses persisting during ART exhibit large internal deletions or other defects, such as nonsense mutations resulting from APOBEC-induced hypermutation, which render the provirus defective [22,24]. Due to sample limitations, we assessed approximately one-third of the genome encompassing the vpu, env, and nef genes, and thus cannot exclude the possibility that sequences we have observed as exact matches to the TF virus in this amplicon might contain differences elsewhere in the genome, including mutations and/or deletions that would prevent viral replication. Nevertheless, all of the sequences used for analysis do represent biologically functional gene regions, since sequences with frameshifting INDELs or nonsense mutations were excluded. Unlike Abrahams et al. [31], who used QVOA to characterize sequences reactivated in vitro, we are not exclusively addressing the replication-competent reservoir. However, QVOA is known to underestimate the size of the reservoir, as the bulk of replication-competent proviruses are not induced with single or successive rounds of stimulation [22]. Phylogenetic assessment of HIV-1 DNA during virologically suppressive ART serves to address the broad population of persistent provirus within which the replication-competent reservoir is contained, and address its relationship to pre-therapy virus.

as estimated from the placement of reservoir sequence in the phylogeny. The mean values of proportions for all 13 volunteers are shown, while proportions for each participant are given in S4 Fig. A majority of variants are most closely related to variants present at the last ART-naïve time point. The percentage of variants demonstrating APOBEC hypermutation is also indicated, though these sequences were excluded from analysis. https://doi.org/10.1371/journal.ppat.1008378.g004

In addition to containing TF virus or very early infection variants in some individuals, reservoir proviral populations were overall less evolved from the TF virus than the sequences at the last ART-naïve time point (Fig 4). This finding may be influenced by the short duration of treatment, as all individuals studied here received ART for less than three years at the time of sample collection during treatment, and three participants were sampled within six months of treatment initiation while the reservoir is less stable. However, we did find that early infection variants persisted with continued time on treatment in individuals sampled twice while receiving ART. Furthermore, sequential sampling indicated that with continued time on treatment, the distance of reservoir variants from the TF virus decreased (Fig 7), indicating a potential enrichment for variants dating to earlier in the course of the infection. As viremia rapidly declines in the first phase of viral decay following treatment initiation, followed by a second, slower decay phase [45], latently infected cells decay in stages [35][36][37][38], perhaps with those infected most recently by variants circulating in the plasma just prior to treatment initiation decaying first. This mechanism would be consistent with the observation that CD4+ central memory T cells from four to eight years of ART harbor HIV-1 DNA more closely related to early infection sequences than HIV-1 DNA of shorter-lived CD4+ effector memory T cells, in which there is a more prominent decline of HIV-1 DNA with continued time on treatment [23]. However, the relationship between CD4+ differentiation status and the ages of proviruses persisting during ART is by no means clear [46]. Further studies must address the phylogenetic influence of latently-infected cell decay.
As HIV-1 prevention and treatment efforts are scaled up globally, research efforts to reduce and/or eliminate the reservoir in pursuit of an HIV-1 cure are expanding as well. Towards this goal, it is critical to characterize the genetic diversity of the reservoir to assess the variants that HIV-1 eradication strategies must target. Our findings indicate that virus is archived throughout infection, and cure strategies should therefore address the genetic diversity of reservoir proviral quasispecies with many unique variants, including those dating back to the time of infection.

Human subjects
Zambian volunteers were enrolled as heterosexual couples in Couples Voluntary Counseling and Testing (CVCT), with HIV testing and counseling of both partners conducted upon enrollment. Follow-up HIV testing was conducted approximately every three months for the negative partners of serodiscordant couples, and blood samples were collected from both partners in the event of a positive test as a component of the Zambia-Emory HIV Research Project (ZEHRP). Dates for ART initiation were self-reported, and clinically-based estimated dates of infection (EDIs) were calculated with the appropriate formula of the three following: 1. Midpoint of dates for the last antibody negative and first antibody positive test; 2. Fourteen days prior to the first p24 antigen positive, antibody negative test; 3. Ten days prior to the first viral load >1600 copies/mL, antibody negative test. All participants had antibody positive tests for their first HIV+ test, with the exception of participants Z1123M, Z1047M, and Z1808F. EDIs for participants Z1047M and Z1808F were estimated as in formula 2 above, and Z1123M EDI was calculated according to formula 3. For participant Z1047M, plasma from the first HIV+ test was not available for HIV sequence amplification, and therefore a sample from 10 days later on 24 Aug 2007 was used as the seroconversion sample. All other seroconversion samples were collected the day of the first HIV+ test.
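The three EDI formulas above amount to simple date arithmetic, sketched here. The function names and the example dates are hypothetical; only the midpoint rule and the 14-day and 10-day offsets are taken from the text.

```python
from datetime import date, timedelta

def edi_midpoint(last_ab_negative, first_ab_positive):
    """Formula 1: midpoint of last antibody-negative and first antibody-positive tests."""
    return last_ab_negative + (first_ab_positive - last_ab_negative) // 2

def edi_p24(first_p24_positive_ab_negative):
    """Formula 2: 14 days before the first p24-positive, antibody-negative test."""
    return first_p24_positive_ab_negative - timedelta(days=14)

def edi_viral_load(first_vl_positive_ab_negative):
    """Formula 3: 10 days before the first viral load >1600 copies/mL, antibody-negative test."""
    return first_vl_positive_ab_negative - timedelta(days=10)

# Hypothetical test dates, not taken from any study participant.
example_midpoint = edi_midpoint(date(2007, 1, 1), date(2007, 4, 1))
example_p24 = edi_p24(date(2007, 8, 14))
example_vl = edi_viral_load(date(2007, 8, 14))
```

With roughly three-monthly testing, the midpoint rule places the estimated infection date at most about six weeks from either bounding test.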
with the Wizard SV Gel and PCR Clean-Up System (Promega) according to the manufacturer's instructions, eluting in H2O.

Next-generation sequencing
All sequencing was performed with Pacific Biosciences SMRTbell sequencing on the RS II, with individual DNA libraries run on a single SMRT cell. Libraries were generated with 30-60 NFLG amplicons combined at equimolar concentrations and identified by nucleotide barcode following reamplification of first round PCR products with barcoded primers, or by barcoded adapter from the SMRTbell Barcoded Adapter Complete Prep Kit-96 (Pacific Biosciences). Libraries for vpu, env, and nef amplicons were generated with 80-100 amplicons per library and identified with barcoded adapters. Libraries were made according to the manufacturer's instructions, followed by appropriate size selection with the BluePippin (Sage Science). We are greatly appreciative of library size selection and quality control, as well as sample run of libraries on the RS II performed at the University of Delaware Sequencing and Genotyping Center.
Amplicon reads generated from the SMRT sequencing were analyzed with a unique algorithm to perform read phasing and error correction in generation of final sequences [33]. Libraries generated using Pacific Biosciences barcoded adapters were first analyzed with PacBio SMRT analysis software PB Barcode to separate reads by barcoded adapter prior to additional read phasing and error correction with the algorithm described in Dilernia et al. [33].

Phylogenetic trees and reservoir variant dating
Maximum-likelihood phylogenetic trees for complete amplicons lacking frameshifting INDELs, APOBEC hypermutation, or other deleterious mutations were made with the PhyML plugin [48] of Geneious software v9.0.4 [49] using a general time reversible model with six nucleotide substitution categories and gamma distribution parameter with 100 bootstraps. APOBEC hypermutants were first removed from analysis with the LANL Hypermut v2.0 tool [50] with the appropriate transmitted/founder (TF) virus as the reference sequence, with all sequences of p<0.05 considered hypermutated. Trees were rooted on the appropriate TF virus sequence trimmed to the vpu, env, and nef gene amplicon. Patristic distances from the TF virus were extracted from the distance matrix. Trees were edited for visualization with MEGA v7.0.26 [51]. Statistics for patristic distance from phylogenetic trees were performed using Prism v8.3.0.

For all 12 participants with single-variant infections, we estimated the root date of their within-host plasma HIV-1 RNA sequences using established Bayesian methods. Briefly, within-host pre-therapy plasma HIV-1 env sequences were first screened for hypermutation (using Hypermut v2.0) and recombination (RDP v4.95 [52]) and any hypermutated or within-host recombinant sequences were removed. Sequences with ambiguous bases were also excluded, and identical sequences discarded but for one sequence from the earliest time point at which it was sampled. We ran two parallel chains of 100,000,000 states, sampling every 10,000 states, in the software package BEAST v1.10.4 [53] for each participant. Posterior distributions for the root date were estimated using the unlinked SRD06 substitution model [54], the uncorrelated relaxed lognormal clock models [55], and the coalescent GMRF Bayesian skyride tree model [56,57]. After discarding 10-30% of the initial run as burn-in, the chains from parallel runs were combined with LogCombiner v2.5.2 [58] and analyzed in Tracer v1.7.1 [59] to ensure convergence and verify that effective sample size values were >200 for all parameters.
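The pre-processing rule for ambiguous and duplicate sequences can be sketched as below. This is an illustrative assumption about how such a step might be written, not the code used in the study: the function name, the set of characters treated as unambiguous (including the alignment gap), and the toy records are all hypothetical.

```python
from datetime import date

UNAMBIGUOUS = set("ACGT-")  # assumption: gaps are allowed, IUPAC codes are not

def collapse_identical(records):
    """Drop ambiguous sequences and keep each unique sequence once,
    labelled with the earliest date at which it was sampled."""
    earliest = {}
    for seq, sampled in records:
        key = seq.upper()
        if not set(key) <= UNAMBIGUOUS:
            continue  # contains an ambiguous base such as N or R
        if key not in earliest or sampled < earliest[key]:
            earliest[key] = sampled
    return earliest

# Hypothetical (sequence, collection date) records.
records = [
    ("ACGTAC", date(2008, 3, 1)),
    ("ACGTAC", date(2007, 1, 15)),  # duplicate seen earlier: this date is kept
    ("ACGTNC", date(2007, 1, 15)),  # ambiguous base: excluded
    ("ACGTTC", date(2008, 3, 1)),
]
unique = collapse_identical(records)
```

Keeping the earliest sampling date for each unique sequence preserves the temporal signal that the dated analyses rely on.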
Proviral variant integration dates were estimated as previously described [32] for the 12 participants infected by a single transmitted/founder virus. Briefly, env genes were trimmed from seroconversion and all other pre-therapy variants, as well as proviral variants from samples collected during treatment. Any sequences demonstrating hypermutation, ambiguous bases, or recombination were excluded from analysis, and only the earliest variant of duplicate sequences was kept for analysis. Maximum-likelihood trees were generated with RAxML v8.2.12 [60] and trees were rooted with root-to-tip regression (RTT) using the R package ape v5.3 [61] to maximize the correlation between the divergence from root and the sample collection date of the pre-therapy sequences. The pre-therapy variants were used to train a linear model of the divergence from root and the sample collection date. Finally, the integration dates and confidence intervals of the proviral variants were estimated from this model.

distance from the TF virus or root is expressed as a proportion of the greatest patristic distance or branch length in a given participant's maximum-likelihood tree. Means are shown in horizontal black bars. The proportional or scaled distances of sequences during treatment are significantly lower than sequences from either the cells or plasma at the last ART-naïve time point (Mann-Whitney tests).
, Zambia; Z = Lusaka, Zambia; M = Male; F = Female; SC = seroconversion. a See materials and methods for calculation of EDI. https://doi.org/10.1371/journal.ppat.1008378.t001

Divergence of plasma sequences from the TF virus by patristic (root-to-tip) distances increases significantly from one year post-infection to the last ART-naïve time point, yet there is a significant decrease in the patristic distances of proviral sequences sampled during treatment compared to those sampled at the last ART-naïve time point (Fig 4B). This decrease does not appear to be an artifact of analyzing sequences from cells versus plasma alone, as ART sequences are significantly closer to the TF virus than sequences from cells at the last ART-naïve time point (S3 Fig). Sequences with minimal divergence from the TF virus are persisting in the reservoir, as can be seen for participants Z1123M and Z326M (Fig 3A and 3C), where sequences identical to the TF were recovered in the reservoir, having persisted for over two years and six years of ART-naïve infection, respectively.

Fig 1. Sampling strategy for study participants. Viral RNA was reverse transcribed, amplified, and sequenced from pre-therapy plasma samples, while proviral DNA was amplified from cells collected during treatment. For six individuals (IDs Z1094F, Z1123M, Z1788F, Z1047M, Z1165M, and Z1658F), we additionally amplified and sequenced proviral DNA from cells at the last pre-therapy time point (red) to assess divergence from the transmitted/founder virus of contemporaneous sequences from plasma vs. cells. EDI = Estimated Date of Infection. https://doi.org/10.1371/journal.ppat.1008378.g001

Fig 2. Bayesian root date estimation. Bayesian inference was used to estimate the root date, or time of infection, from pre-therapy plasma env sequences. The 95% highest posterior density intervals surround the mean estimated root dates of the Bayesian inference (circles). The clinically estimated infection dates (EDIs) are depicted as stars. https://doi.org/10.1371/journal.ppat.1008378.g002

(Fig 5A and 5D, S5 Fig). The pre-ART plasma variants were then used to train a linear model that related their divergence from the root and their sample collection dates. The linear model was used to infer the integration date and 95% confidence interval of the age of each proviral variant based upon its divergence from the root (Fig 5B, 5C, 5E and 5F, S5 Fig).

Early variants were observed throughout the repeated sampling during ART, as proviral variants classified as seroconversion variants were in both the first and second time point during treatment for participant N133M (Fig 6C). For participant Z1165M, a single proviral sequence most closely related to seroconversion sequences was observed in the second time point during treatment (Fig 6D), and in participant Z1788F, seroconversion and early infection variants differing from the TF virus by up to approximately 30 nucleotides were found during the first and second time points following treatment (Fig 6A). Overall, however, proviral sequences from both time points during treatment were intermingled with each other and the sequences sampled prior to treatment.

Fig 3. Maximum-likelihood (ML) trees for viral and proviral variants in four individuals. Representative ML phylogenetic trees for participants Z1123M (A), Z1124F (B), Z326M (C), and Z1047M (D) rooted on the respective transmitted/founder (TF) virus (grey) identified from the seroconversion sample and depicting all viral variants from one year post-infection (blue), the last ART-naïve sample (red), and during treatment (purple diamonds). Variants from cells collected at the last ART-naïve time point are shown in open red diamonds, while all plasma variants are in filled circles. https://doi.org/10.1371/journal.ppat.1008378.g003

Fig 5. Regression-based inference of time of provirus integration. Representative figures for participant Z1165M (A-C) and participant Z634F (D-F). Maximum-likelihood trees of the env gene for pre-therapy variants (circles), including individual seroconversion variants (grey), one year sequences (blue), last pre-therapy sequences (red circles, plasma and open red diamonds, cells), and proviral variants (filled diamonds) in A and D. Two samples during treatment were assessed for both participants, with the first in purple, and subsequent in black. Trees were rooted to optimize the correlation between root-to-tip distance and sampling time for all pre-therapy plasma variants. The linear model relating root-to-tip distances to sampling time is shown in the dashed lines of figures B and E, with the pre-therapy variants denoted as colored dots, and the phylogenetic relationships between them denoted as faint grey lines. Proviral variants from samples collected during treatment are shown in filled diamonds in the same manner. The estimated integration dates of the proviral variants and 95% confidence intervals are shown in the plots C and F. Figures for additional individuals are in S5 Fig. https://doi.org/10.1371/journal.ppat.1008378.g005

Fig 6. Maximum-likelihood (ML) trees of all variants for participants with two samples during treatment. ML phylogenetic trees for all four participants: Z1788F (A), Z634F (B), N133M (C), and Z1165M (D) rooted on the respective transmitted/founder virus (grey) and depicting all viral variants from one year post-infection (blue), the last ART-naïve sample (red), and during treatment (purple and black diamonds, with second sample in black). Sequences from cells collected at the last ART-naïve time point are in open red diamonds, while plasma sequences are depicted in circles. https://doi.org/10.1371/journal.ppat.1008378.g006

Figures were made using Prism or, for Fig 1, JMP Pro 14 v14.2.
of reservoir variants for each participant. Where sequences of the given era were not present and the percentage of the reservoir proviral population was therefore zero, the classification is omitted from the pie chart. (PDF)

S5 Fig. Proviral variant integration date estimates for each participant. All trees, linear models, and variant integration date estimates not shown in Fig 5 are provided here. (PDF)