A spatio-temporal assessment of simian/human immunodeficiency virus (SHIV) evolution reveals a highly dynamic process within the host

The process by which drug-resistant HIV-1 arises and spreads spatially within an infected individual is poorly understood. Studies have found variable results relating how HIV-1 in the blood differs from virus sampled in tissues, offering conflicting findings about whether HIV-1 throughout the body is homogeneously distributed. However, most of these studies sample only two compartments and few have data from multiple time points. To directly measure how drug resistance spreads within a host and to assess how spatial structure impacts its emergence, we examined serial sequences from four macaques infected with RT-SHIVmne027, a simian immunodeficiency virus encoding HIV-1 reverse transcriptase (RT), and treated with RT inhibitors. Both viral DNA and RNA (vDNA and vRNA) were isolated from the blood (including plasma and peripheral blood mononuclear cells), lymph nodes, gut, and vagina at a median of four time points and RT was characterized via single-genome sequencing. The resulting sequences reveal a dynamic system in which vRNA rapidly acquires drug resistance concomitantly across compartments through multiple independent mutations. Fast migration results in the same viral genotypes present across compartments, but not so fast as to equilibrate their frequencies immediately. The blood and lymph nodes were found to be compartmentalized rarely, while both the blood and lymph node were more frequently different from mucosal tissues. This study suggests that even oft-sampled blood does not fully capture the viral dynamics in other parts of the body, especially the gut where vRNA turnover was faster than the plasma and vDNA retained fewer wild-type viruses than other sampled compartments. Our findings of transient compartmentalization across multiple tissues may help explain the varied results of previous compartmentalization studies in HIV-1.


Introduction
The development of drug resistance in human immunodeficiency virus type 1 (HIV-1) to antiretroviral therapies represents a major public health obstacle. The prevalence of drug resistance in many regions of the world is now above 5% and increasing, driven in part by drug resistance transmitted from one infected individual to another [1]. Transmitted drug resistance has been observed at frequencies between 2.8% and 11.5% in different regions of the world, rendering standard first line therapies ineffective and necessitating costlier therapies for treatment-naïve patients. Despite the increasing prevalence of drug resistance, we still have an incomplete understanding of how drug resistance emerges and spreads spatially within an individual.
A comprehensive understanding of the drug resistance spatial landscape, such as when and where drug resistance arises and how it spreads within a patient, could inform administration practices of current therapy and the design of new antiretroviral therapies. Previous studies have suggested that HIV-1 drug resistance originates primarily in tissues with low or absent drug concentration (i.e., drug sanctuaries) [2-4]. If this holds true, then efforts should focus on improving drug penetration in these tissues.
Understanding how drug resistance spreads within the body could improve understanding of the ecology of the virus within a patient, which, in turn, could help determine the best way to sample the viral population, monitor transmission chains, and design eradication strategies. Although nearly all sampling of HIV-1 is done in the blood plasma, it is unclear how well the plasma represents the broader HIV-1 population. Differences between viruses from plasma and tissues could drive misdiagnoses of pre-existing drug-resistant variants, resulting in incorrectly applied therapies and selection for further drug resistance. If virus in the genital tract forms a different population than virus circulating in the blood, this should be taken into account when sampling in order to allow better reconstruction of transmission chains and epidemiological tracking of transmitted drug resistance. Finally, understanding HIV-1 genetic variation across the body might contribute to our understanding of viral reservoirs and the potential for virus eradication.
It has long been recognized that it is important to understand the relationship between virus in the blood and virus in different tissues, and many studies have examined this. However, the results are equivocal: some studies find evidence for compartmentalization, whereas others do not (see Table 1 for a summary of some of the major studies which examine the same tissues as this study). Nearly all studies that compared compartments across time found that compartments were only different transiently [5-12]. Many of these studies with multiple time points examined the relationship among different T cell subsets in the blood [7-9, 13, 14], but there is recent interest in examining temporal data in tissues as well [5,6,10,12,15]. Many studies have focused on the comparison between blood and only one type of tissue (though see [10,12,16-18]), and thus cannot give us information on the relationships between several compartments. Taken together, previous studies suggest the presence of a complicated spatial landscape on which viral evolution occurs, and much still needs to be done to connect not only how other compartments relate to the blood plasma, but also how they relate to each other.
To increase our understanding of how drug resistance spreads through space and how different compartments in a host are connected to each other, we obtained comprehensive single-genome sequence (SGS) data from four different compartments and the blood, at four (median) time points, in four macaques infected with a simian immunodeficiency virus containing HIV-1 reverse transcriptase (RT-SHIV) [38,39]. The macaques were treated with monotherapy to induce the emergence of two of the most prevalent drug resistance mutations (DRMs) in HIV-1 reverse transcriptase (RT): K103N and M184V [1]. We sampled viral RNA (vRNA) from plasma and four different compartments (peripheral blood mononuclear cells, lymph node, vagina, and gut) before, during, and after the detection of drug-resistant variants in the blood. Like HIV-1, RT-SHIV is a lentivirus that produces a double-stranded DNA genome after infection of host cells, and we also sampled reverse transcribed viral DNA (vDNA) from the same four compartments.

Table 1 (excerpt). Female Genital Tract: Overbaugh, et al, 1996 [22].
(approximately 84 generations). Two macaques were treated with the nucleoside analog (NRTI) emtricitabine (FTC; macaques T98133 and A99039) and two received the non-nucleoside RT inhibitor (NNRTI) efavirenz (EFV; macaques A99165 and A01198). Treatment was administered for 8 weeks, followed by a 6 week treatment interruption, and then combination therapy: 1) FTC, tenofovir (TFV), and EFV (macaques A99165 and A01198; which includes a recycled drug to which virus may be resistant) or 2) tenofovir, L-870812, and DRV/r (macaques T98133 and A99039; to ensure complete virus suppression by all 3 drugs). RT-SHIV populations were sampled three times during monotherapy for each macaque (weeks 13, 15/16 and 20), once after treatment interruption (week 26), and once after 12-18 weeks of combination therapy (between weeks 38-44). See the black boxes on the bottom of Fig 1 for a summary of sampling and treatments. Samples were taken at the different time points from the blood, LN, gut, and vagina. The first 770 nucleotides of reverse transcriptase (RT) were sequenced from vRNA and vDNA using SGS (S1 Table). We did not find that diversity changed significantly over time except in the vRNA of A01198 and the vDNA of A99165 (S1 Fig, S2 Table), and values of average pairwise diversity (π) ranged between 0.2 and 0.6%, similar to what is seen in HIV-1 at a comparable amount of time after infection [40].
Drug resistance emergence and persistence during and after monotherapy
Overview of drug resistance emergence in plasma viral RNA over time. In all animals, after challenge with RT-SHIV, but prior to the start of monotherapy, plasma viremia increased and a set point was reached at approximately 4 weeks post-infection (10^5−10^6 viral copies/mL plasma, Fig 1). After initiation of monotherapy, plasma viremia transiently decreased in all animals (ranging from 33- to 566-fold) before rebounding to near pre-therapy levels. In both FTC-treated macaques (T98133 and A99039), the expected DRMs M184V/I emerged between weeks 13-15 (1-3 weeks post-treatment), concomitantly with viral rebound and FTC concentrations in the plasma and tissues (S2A Fig). The rate at which viruses carrying DRMs displaced wild-type (WT) viruses suggests a selection coefficient for M184V/I of 0.74 in T98133 and 0.41 in A99039 (i.e., relative fitness benefits of 74% and 41%, respectively). In the EFV-treated macaques, however, the expected DRM (K103N) emerged but did not reach fixation in one macaque (A99165) and did not arise at all in the other macaque (A01198). In the animal in which K103N arose (A99165), plasma viremia initially declined but rebounded around week 15, coinciding with the spread of resistance. We suspect that the expected resistance mutation did not fix in A99165 and A01198 because selection pressure was weak due to the rapid metabolism of NNRTIs in macaques (S2B Fig), which we have observed previously [41,42]. Because EFV was not present at consistently high concentrations, we did not estimate a selection coefficient s for K103N.
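A selection coefficient of the kind quoted above can be illustrated with a simple two-point estimate. The sketch below is not necessarily the estimator used in this study: it assumes that the odds of the resistant frequency, p/(1−p), grow by a factor (1+s) each generation, and it assumes one viral generation per day. The function names and the example frequencies are hypothetical.

```python
def selection_coefficient(p_start, p_end, generations):
    """Per-generation s, assuming the odds p/(1-p) grow by (1+s) each generation."""
    if not (0 < p_start < 1 and 0 < p_end < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    if generations <= 0:
        raise ValueError("generations must be positive")
    odds_ratio = (p_end / (1 - p_end)) / (p_start / (1 - p_start))
    return odds_ratio ** (1 / generations) - 1


def frequency_after(p_start, s, generations):
    """Forward prediction under the same model (the inverse of the estimator)."""
    odds = p_start / (1 - p_start) * (1 + s) ** generations
    return odds / (1 + odds)


# Hypothetical example (not data from the study): a DRM rising from 5% to 95%
# over two weeks, assuming one generation per day, i.e. 14 generations.
s = selection_coefficient(0.05, 0.95, 14)
print(f"estimated s = {s:.2f}")
print(f"predicted frequency after 14 generations = {frequency_after(0.05, s, 14):.2f}")
```

Because the estimate depends on the assumed generation time, the same frequency change implies a different s if generations are longer or shorter than one day.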
After monotherapy treatment was interrupted, drug-resistant plasma vRNA persisted between weeks 20-26 in both FTC-treated macaques (T98133 and A99039) (Fig 1), which suggests that the cost of drug resistance in the absence of the drug is insufficient for it to notably decrease in frequency over 6 weeks. In the EFV-treated macaque A99165, drug resistance was not observed in the plasma at the end of the treatment interruption, but in this case, the decline in frequency had already started at week 20, likely due to a decrease in EFV concentrations in plasma and tissues. Drug resistance remained absent in A01198.
Combination therapy suppressed RT-SHIV in A99039, and no further drug resistance emerged. In this macaque M184V/I were undetectable in plasma samples taken during virologic suppression (week 38), which suggests that the cost of FTC resistance in the absence of drug is significant on this timescale or that the viruses that were sampled when viremia was suppressed stem from a reservoir that had always remained WT. A99165 and A01198 both began combination therapy that included FTC, but viral replication was not suppressed in either animal. Drug resistance to FTC emerged (in the form of M184I and M184V, respectively), but did not fix in either macaque, reaching frequencies of 4% and 27% in the plasma, respectively.
Drug resistance emerges simultaneously in viral RNA from multiple tissues. In a clinical setting, drug resistance is assessed using samples taken from the blood plasma, but how representative these samples are of levels of drug resistance in vRNA in PBMC and tissues is unclear. Here, we compared the kinetics and composition of drug-resistant variants across multiple anatomic compartments (blood, lymph node, gut, and vagina). Across the three monkeys in which drug resistance to monotherapy was observed, DRMs occurred at similar time points in the plasma and other compartments (Fig 2). In T98133, all compartments had drug-resistant vRNA detected at approximately the same time point (week 15/16). A99039 had drug-resistant vRNA first in the plasma at week 15, and then only later in the gut (week 20) and the lymph node (week 27). Although we have no vRNA samples from the vagina in this animal, the vDNA from the vagina contained drug resistance at week 15, suggesting that M184I/V emerged early in that compartment. In A99165, drug resistance was observed first at low frequency in the lymph node at week 13 and then simultaneously in gut and plasma vRNA two weeks later.
Although drug resistance appeared at similar times across compartments, there was some evidence that drug resistance was maintained at different frequencies in different compartments. Among all paired vRNA samples from the same macaque at the same time point (±1 week) where DRMs were neither at 0% nor at 100% frequency in both compartments, drug resistance was at a lower frequency in the lymph node vRNA than the gut vRNA (5/6 comparisons), PBMC vRNA (5/6 comparisons), the plasma vRNA (5/8 comparisons) and vagina vRNA (4/5 comparisons). None of these comparisons were significant, but they suggest that drug-resistant viruses may not have replicated as rapidly in the lymph node as in other sampled compartments. This could be due to weaker selective pressure consistent with previous reports of poor drug penetrance to the lymph nodes [43,44]. We also observed that a greater proportion of vDNA from the gut contained DRMs than vDNA from other compartments (Fig 2). Among paired vDNA samples from the same macaque at the same time point (±1 week) where DRMs were neither at 0% nor at 100% frequency in both compartments, drug resistance was at a higher frequency in the gut vDNA than the LN vDNA (6/6 comparisons), PBMC vDNA (7/7 comparisons), and vagina vDNA (7/7 comparisons).
Not only did drug resistance emerge at similar times across different compartments, but identical viral genotypes were observed at similar times in different compartments. Figs 3, 4 and 5 show the frequencies of different viral genotypes throughout time in different compartments in each of the three monkeys that had drug resistance (T98133, A99165, and A99039, respectively). Each color represents a different genotype and the WT is depicted in grey. Genotypes with DRMs are hatched. This visualization clearly shows that genotypes are usually not restricted to single compartments, but are observed in several compartments.
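The paired frequency comparisons reported above (for example, 5/6 or 7/7 comparisons in one direction) can be put on a common scale with an exact sign test. The sketch below is an assumption for illustration: the text does not state which test was applied, and the function name is hypothetical. It treats each paired comparison as an independent coin flip under the null hypothesis of no difference between compartments.

```python
from math import comb


def sign_test_two_sided(successes, n):
    """Exact two-sided binomial (sign) test against p = 0.5."""
    if not 0 <= successes <= n or n == 0:
        raise ValueError("need 0 <= successes <= n and n > 0")
    k = max(successes, n - successes)            # the more extreme direction
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts quoted in the text (lymph node vRNA lower than another compartment).
for label, k, n in [("LN vs gut", 5, 6), ("LN vs PBMC", 5, 6),
                    ("LN vs plasma", 5, 8), ("LN vs vagina", 4, 5)]:
    print(label, round(sign_test_two_sided(k, n), 3))
```

With so few paired samples, a direction has to be observed in every comparison (for example 6/6 or 7/7) before a sign test of this kind falls below 0.05, which shows how little power the small numbers of comparisons provide.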
For each monkey, the most frequent drug-resistant genotypes are often observed across vRNA from all compartments. For example, in T98133, at week 15/16 after initial infection, vRNA sampled from the plasma, lymph node, gut, and vagina all had two different genotypes with DRM M184V (one genotype with only M184V in orange and one genotype with an additional synonymous substitution at the 255th amino acid of RT, N255N, in light blue) (Fig 3A). Three of the four compartments also contained vRNA with DRM M184I. The similarity of vRNA sequences found in each compartment, particularly vRNA with synonymous mutations (i.e., M184V linked to N255N), suggests that migration was responsible for spreading viral genotypes from one compartment to another, as opposed to independent origins of each genotype in each compartment. The frequencies of genotypes remain similar between the compartments over time (Fig 3A), suggesting that substantial migration was ongoing. Similarly, in A99165 and A99039 (Figs 4A and 5A, respectively), the same drug-resistant genotypes (including linked mutations) are observed at similar time points in the vRNA in different tissue compartments.
DRMs develop on multiple viral genomes. If mutation generates DRMs fast enough (mutation rate multiplied by population size larger than 1), we would expect multiple independent origins of DRMs on multiple backgrounds that spread simultaneously (i.e., both present at intermediate frequencies before either reaches 100% frequency) in what is known as a soft sweep. Alternatively, if DRMs arise only rarely, a single DRM may increase to 100% frequency in the population due to selection before a second DRM arises on a different background in the population. This is called a hard sweep [45].
There are two methods we can use to determine how many times drug resistance developed on an individual viral genome in the macaques. First, we can determine whether different DRMs (distinguishable at either the amino acid or nucleotide level) are observed simultaneously in the same population. Second, by SGS we can determine whether the same DRM occurred on different genetic backgrounds. If drug resistance increases in frequency due to a hard sweep, we should see only one DRM on one genetic background spreading (although the mutational background can later diversify due to development of new mutations). If drug resistance increases in frequency due to a soft sweep, we may see several different DRMs or we may see only one DRM that is present on different genetic backgrounds.
We first observe that drug resistance occurs multiple times through mutations distinguishable from their encodings (i.e., the DRMs themselves have different sequences). In FTC-treated macaques, RT-SHIV developed both M184V and M184I, similar to what has been observed in HIV-infected individuals [46,47]. In both cases, M184V was more common than M184I: 91.5% and 97.0% of drug-resistant vRNA variants had M184V in T98133 and A99039, respectively (Figs 3 and 5), which is similar to what has been shown in HIV-infected individuals due to a fitness advantage of M184V over M184I [47]. The macaque treated with EFV (A99165) only acquired one DRM (K103N). However, K103N can be caused by mutations either from A to U or from A to C at the third position of codon 103, and these can both be present in HIV-infected individuals [48]. Both of these mutations were observed within A99165 (Fig 4). In both of the above cases, the encoding of two different amino acid mutations (M184V/I) or the two different nucleotide-level encodings of the same amino acid (K103N) demonstrates that at least two independent mutations must have occurred within each of the three macaques with drug resistance. This shows that soft sweeps are occurring and that the mutation rate multiplied by the effective population size is larger than 1 in all three monkeys.
However, SGS allowed further resolution to determine the number of times each mutation arose. In T98133, DRM M184V was seen at high frequency both with and without a synonymous N255N mutation in the sampled vRNA (Fig 3). This suggests a second origin of drug resistance in which the M184V mutation occurred on a background that already had the synonymous N255N mutation. In principle, the N255N mutation may have arisen shortly after drug resistance began to spread, but given that the M184V+N255N lineage reaches a high frequency, it is more likely that M184V occurred on a genotype that already had the N255N mutation. Combined with the observation of M184I, at least three origins of resistance were likely in T98133. One genotype containing a DRM is observed in the vDNA but was not detected in the sampled vRNA (S3A Fig). In A99039, the DRM M184V occurred on three distinct backgrounds in sampled vRNA (Fig 5): 1) with L187L+M245T, 2) with L205L+R277K, and 3) with K223K+L109L+V75L. As M184I was also observed at low frequency, at least four origins of resistance were likely in A99039. As in T98133, one additional background is also observed in the vDNA but not in the vRNA (S3C Fig). In A99165, the DRM K103N emerged on three distinguishable backgrounds in sampled vRNA (Fig 4): 1) with no other high frequency mutations, 2) with a synonymous L187L mutation, and 3) with four non-DRMs (K249K, L187L, L205L, Q174R). All three of those backgrounds were observed before detection of K103N. Further, both the nucleotide level encodings of K103N (A→U and A→C) were observed on the L187L background, suggesting at least four origins of resistance were likely in A99165. The vDNA displayed both nucleotide level encodings of K103N on three backgrounds each (S3B Fig).
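The counting logic in the two paragraphs above can be sketched as follows: distinct codons at the DRM site give one lower bound on the number of origins, and distinct combinations of codon and linked background mutations give a second, finer one. This is a minimal illustration rather than the authors' pipeline; the function name is hypothetical, and the assignment of the AAU encoding to the first and third backgrounds in the example is an assumption (the text only states that both encodings occurred on the L187L background).

```python
def count_origins(resistant_sequences):
    """Return (distinct DRM codons, distinct codon+background genotypes).

    resistant_sequences: iterable of (drm_codon, background_mutations) pairs,
    one per drug-resistant sequence.
    """
    codons = set()
    genotypes = set()
    for codon, background in resistant_sequences:
        codons.add(codon)
        genotypes.add((codon, frozenset(background)))
    return len(codons), len(genotypes)


# Illustrative K103N sample loosely modeled on the description of A99165.
sample = [
    ("AAU", ()),                                    # no linked mutations (assumed AAU)
    ("AAU", ("L187L",)),
    ("AAC", ("L187L",)),                            # second encoding, same background
    ("AAU", ("K249K", "L187L", "L205L", "Q174R")),  # assumed AAU
    ("AAU", ("L187L",)),                            # repeated genotype, not a new origin
]
by_codon, by_genotype = count_origins(sample)
print(by_codon, by_genotype)  # 2 4
```

The genotype-based count is only a lower bound on origins if the background mutations predate the DRM; a background mutation that arises after resistance has begun to spread would inflate it, which is the caveat raised for N255N in T98133.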
Even A01198, the macaque in which drug resistance to monotherapy did not arise, had DRMs appearing on multiple backgrounds at the final time point when examining the vRNA and vDNA phylogenies, with six unique backgrounds for DRMs visible in the vDNA (S4 Fig).
We used the occurrence of multiple DRMs on different backgrounds to estimate the effective population size (following [49]) at the point of initial treatment. The idea behind this approach is that the number of times a DRM occurs in a population depends on the size of the population. In a large population, we expect DRMs to occur more often than in smaller populations. We differentiated independent origins by examining drug-resistant viral sequences with different codons at a particular DRM (e.g., K103N encoded by AAU or AAC) and different identities by genotype (e.g., a DRM linked to another high frequency mutation), and calling the resulting estimates N_e,DRM and N_e,geno, respectively. We performed both these estimates for just vRNA and combined vRNA and vDNA (see Materials and Methods). We find estimates of N_e among the vRNA between 1.17 × 10^4 and 5.71 × 10^5 depending on the macaque and which technique was used (Table 2). vRNA and vDNA combined estimates are similar. These estimates were between two and three orders of magnitude higher than
Fig 3. [...] indicates the frequency of the sample with an exact set of mutations (i.e., genotype) at the time of sampling. Grey indicates WT virus, and striations on the background of a color indicate a drug-resistant genotype. Common mutations are labeled, and known DRMs are labeled in pink. For plasma/PBMC, plasma vRNA and PBMC vDNA are shown. Two sequences are considered the same genotype when all mutations found above frequency 1% in the macaque are shared. Sample sizes at each sampling point are given above each vertical line. (C) Time-structured phylogenies reveal relationships between vRNA sequences (shown as leaf nodes). Branch lengths indicate sequence sampling time, and branches are colored to match (A) and (B). Sampling location is indicated for each sequence, and the identity of all mutations at frequency >1% is shown to the right. Colors indicating mutation type (synonymous, nonsynonymous, DRM, stop or nonsense/missense) are shown in the legend.
https://doi.org/10.1371/journal.ppat.1006358.g003
Two other mutations are notable in relation to the spread of drug resistance. T98133 had the polymorphic mutation V179I emerging on multiple backgrounds (Fig 3). This mutation can cause resistance to some NNRTIs, which T98133 did not receive. V179I appeared on most drug-resistant backgrounds (M184V, M184V+N255N, and M184I), possibly indicating a selective benefit. Also notable was that M184V+N255N+V179I appeared first in the vaginal vRNA and appeared only later across other tissues. This is one of the few instances in these data where a simple migration direction is suggested. Similarly, A99165 had the mutation L74V rapidly reach high frequency across tissues (Fig 4), despite L74V conferring resistance to select NRTIs, which A99165 was not given until later time points. However, L74V has been shown to improve fitness of HIV-1 by increasing levels of RT in virions [50]. Both of these mutations support the view that the selective landscape within RT is composed of more than known DRMs.

Frequencies of common viral variants in viral RNA suggest compartments are not completely well-mixed
We have shown that the same genotypes containing DRMs appeared simultaneously across multiple compartments, suggesting a substantial amount of migration. However, these same genotypes are found at different frequencies in different compartments (e.g., the frequency of M184V+N255N across compartments in Fig 3). We asked whether the observed differences are due to real differences between the compartments or whether they are due to random sampling effects and fairly small sample sizes. In this context, we define two compartments to be compartmentalized if their vRNA genotype frequencies are significantly different when using a randomization test, and well-mixed otherwise.
To assess the extent of compartmentalization, we compared the frequency of vRNA genotypes in different compartments using a variance partitioning statistic, K_ST [51], and two other tests: AMOVA [52] and the Slatkin-Maddison test [53]. Between each pair of compartments sampled at the same time point (±1 week) in each macaque, we repeatedly subsampled 10 sequences from each compartment without replacement and computed the test statistic. The significance of each test statistic was assessed via permutation test, with larger values of K_ST and AMOVA's φ and lower values of Slatkin-Maddison inferred migration events indicating evidence of more compartmentalization. The proportion of tests significant at the 5% level is reported. Therefore, the reported value is the probability of a significant test when drawing 10 sequences randomly. Note that we subsample compartments to the same number of sequences to equalize power across comparisons. The subsample size of 10 was chosen as a tradeoff between power to detect compartmentalization and the number of comparisons that can be computed given that some of the sample sizes are very small (see Materials and Methods).
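The subsample-and-permute procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes aligned sequences of equal length, uses Hamming distance as the pairwise difference, computes K_ST as 1 − K_S/K_T with within-compartment diversities weighted by sample size (the weighting is an assumption), and all function names and default parameter values are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise(seqs):
    """Average pairwise difference within a set of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def k_st(pop1, pop2):
    """K_ST = 1 - K_S / K_T, with size-proportional weights (an assumption)."""
    k_t = mean_pairwise(pop1 + pop2)
    if k_t == 0:
        return 0.0
    n1, n2 = len(pop1), len(pop2)
    k_s = (n1 * mean_pairwise(pop1) + n2 * mean_pairwise(pop2)) / (n1 + n2)
    return 1 - k_s / k_t


def permutation_p(pop1, pop2, n_perm, rng):
    """P-value: share of label shufflings with K_ST at least as large as observed."""
    observed = k_st(pop1, pop2)
    pooled = pop1 + pop2
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if k_st(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def prob_compartmentalized(comp1, comp2, size=10, n_sub=100, n_perm=200,
                           alpha=0.05, seed=0):
    """Proportion of equal-size subsamples whose permutation test is significant."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sub):
        sub1 = rng.sample(comp1, size)   # sampling without replacement
        sub2 = rng.sample(comp2, size)
        if permutation_p(sub1, sub2, n_perm, rng) < alpha:
            significant += 1
    return significant / n_sub


# Toy data (not from the study): two compartments with no shared genotypes.
gut = ["AAAA"] * 12
plasma = ["TTTT"] * 12
print(prob_compartmentalized(gut, plasma, n_sub=20, n_perm=100))
```

Subsampling both compartments to the same size means that a large sample cannot appear more compartmentalized simply because it offers more power, which is the rationale given in the text for fixing the subsample size at 10.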
Across all pairwise tests where each compartment had at least 10 sequences, the probability of rejecting the compartments as well-mixed using a K_ST permutation test was 0.47 (Fig 6). This result shows that although DRMs appear to spread quickly from one compartment to another, migration is not pervasive enough to make the populations entirely well-mixed.
Fig 4. [...] indicates the frequency of the sample with an exact set of mutations (i.e., genotype) at the time of sampling. Grey indicates WT virus, and striations on the background of a color indicate a drug-resistant genotype. Common mutations are labeled, and known DRMs are labeled in pink. For plasma/PBMC, plasma vRNA and PBMC vDNA are shown. Two sequences are considered the same genotype when all mutations found above frequency 1% in the macaque are shared. Sample sizes at each sampling point are given above each vertical line. (C) Time-structured phylogenies reveal relationships between vRNA sequences (shown as leaf nodes). Branch lengths indicate sequence sampling time, and branches are colored to match (A) and (B). Sampling location is indicated for each sequence, and the identity of all mutations at frequency >1% is shown to the right. Colors indicating mutation type (synonymous, nonsynonymous, DRM, stop or nonsense/missense) are shown in the legend. Note, except for WT, the coloration is not preserved between figures (i.e., the yellow in Fig 3 does not represent the same genotype as the yellow in Fig 4). Rx1 is treatment FTC+TFV+EFV.
https://doi.org/10.1371/journal.ppat.1006358.g004
We find that compartmentalization is dynamic through time. While some pairs of compartments in some macaques have a consistently high probability of observing compartmentalization (e.g., A99165 gut and lymph node vRNA) or a consistently low one (e.g., A01198, plasma and PBMC vRNA), most often, compartmentalization is transient through time (Fig 6). The Slatkin-Maddison test and AMOVA produced similar results (S5 Fig and S6 Fig, respectively).
We also characterized which vRNA samples appeared most similar to each other throughout time (Fig 7). The blood (plasma and PBMC) and lymphoid systems appear well connected with few rejections of being well-mixed in pairwise comparisons. The most compartmentalized comparisons were between the lymph nodes and mucosal tissues (gut and vagina) or between mucosal tissues themselves. Similar results hold using the Slatkin-Maddison test and AMOVA (Fig 7). Biologically, these observations are in line with expectations about system connectivity.

The differences and similarities between vRNA and vDNA
We have characterized the dynamics of the vRNA in different compartments, but this may or may not be representative of the viral reservoirs. We therefore also sampled vDNA in all compartments, which allows us to compare vDNA and vRNA from the same location. There were marked differences in genetic composition between vRNA and vDNA. Most notably, the vDNA retained a higher percentage of WT viruses than the vRNA in T98133 and A99165 (Figs 3, 4 and 5). In A99039, very little WT vRNA or vDNA was present, even at the initial time point, and the population was dominated by viruses with synonymous variants (L187L, K223K, L205L and combinations thereof). However, there were many fewer DRMs in the vDNA samples than in the vRNA samples, suggesting that, like in the other monkeys, the vDNA is changing more slowly than the vRNA (Fig 5).
To more directly test whether the rate of change was higher in vRNA than vDNA, we computed K_ST between consecutive time points separately from both the vRNA and vDNA from the same compartment of each macaque (Fig 8). All pairs of adjacent time points were treated equivalently, even though they were separated by different amounts of time. However, because the average difference in weeks between time points was not significantly different between vRNA and vDNA (7.39 weeks versus 6.53 weeks respectively, t-test p-value = 0.50), we could still compare the number of significantly compartmentalized comparisons as we did in Fig 7. As in the pairwise compartment analysis, we subsampled comparisons to 10 sequences per time point 1000 times and report the proportion of those tests resulting in a permutation test p-value < 0.05.
Fig 5. [...] indicates the frequency of the sample with an exact set of mutations (i.e., genotype) at the time of sampling. Grey indicates WT virus, and striations on the background of a color indicate a drug-resistant genotype. Common mutations are labeled, and known DRMs are labeled in pink. For plasma/PBMC, plasma vRNA and PBMC vDNA are shown. Two sequences are considered the same genotype when all mutations found above frequency 1% in the macaque are shared. Sample sizes at each sampling point are given above each vertical line. (C) Time-structured phylogenies reveal relationships between vRNA sequences (shown as leaf nodes). Branch lengths indicate sequence sampling time, and branches are colored to match (A) and (B). Sampling location is indicated for each sequence, and the identity of all mutations at frequency >1% is shown to the right. Colors indicating mutation type (synonymous, nonsynonymous, DRM, stop or nonsense/missense) are shown in the legend. Note, except for WT, the coloration is not preserved between figures. Rx2 is the treatment TFV+L870812+DRV/r.
We find that the probability of a significant test between consecutive vRNA time points from the same compartment (indicating a different viral composition) was 0.55 as compared to 0.25 between consecutive vDNA comparisons. A t-test between the relative probabilities provides evidence that vDNA changes more slowly than vRNA (p = 6 × 10^−4). This approach also revealed that vRNA from successive time points in the gut was more often compartmentalized than that from the plasma (0.85 probability of significant test compared to 0.37, p = 2.92 × 10^−3). We then characterized the relationship between the sampled vRNA and vDNA variants by examining viral variants found in the vRNA but not the vDNA and vice versa. We compared all time points for which there were vDNA and vRNA samples (n ≥ 3 for both vRNA and vDNA) from a single compartment in a macaque. The proportion of vRNA observed without corresponding vDNA was 25% on average across macaques (S7 Fig), suggesting a large portion of the vRNA was transcribed from vDNA at a frequency below the limit of sampling (< ~1/30). A similar proportion of vDNA is sampled without any concurrent vRNA. This may mean that the provirus is latent (i.e., not currently producing vRNA) or replication incompetent.
In three of the four macaques (T98133, A99165, A99039) vDNA became less representative of the vRNA over time (i.e., a larger proportion of vDNA was sampled without vRNA) (S7 Fig), although this effect appeared to plateau in A99165 and A99039. This is consistent with the vDNA building up a "fossil record" or "seed bank" of proviruses that are not transcribed at a given time but that may later be reactivated.
We also examined the extent of G to A hypermutation and premature stop codons found in vRNA versus vDNA sequences. We found that vDNA sequences were approximately five times more likely to have >2% of available G residues mutated to A when compared to vRNA sequences (Fisher's Exact Test odds-ratio: 4.79, p(true OR = 1) < 9.066 × 10^-11). vDNA sequences were approximately three times as likely to have a premature stop codon as vRNA sequences (Fisher's Exact Test OR: 2.98, p(true OR = 1) < 1.68 × 10^-5). As expected, this suggests a considerably higher level of selective constraint on the vRNA as compared to the vDNA.

Discussion
In this paper, we present high-resolution sequencing data from a SIV-HIV chimeric virus (RT-SHIV) sampled across space and time in infected macaques. Although our data are sampled from RT-SHIV-infected macaques, this system corresponds well to HIV-1 in humans in many ways. First, when untreated, RT-SHIV viremia in macaques plateaus at a similar set point as HIV-1 in many humans, suggesting that census population sizes are similar between the two systems. Second, RT-SHIV reaches levels of genetic diversity that are similar to levels in HIV-1, as quantified by average pairwise diversity [40]. Third, drug resistance arises via the same DRMs that are common in HIV-1 (M184V/I and K103N) and, fourth, as in HIV-1, the mutations spread via soft sweeps after occurring independently on many backgrounds [49,54]. These observations suggest that an intra-host RT-SHIV population closely mimics that of an intra-patient HIV-1 population undergoing monotherapy, which may occur during drug nonadherence.
Similarities between RT-SHIV in macaques and HIV-1 in humans make RT-SHIV a good model system for HIV-1. In addition, it is possible to sample RT-SHIV much more intensively than HIV-1, including from multiple, longitudinal tissue biopsies. Thanks to viral RNA and viral DNA single-genome sequences from different time points and different compartments, we have detailed information about evolution in time and space, which allows us to make more detailed observations about the dynamics of the system. We have data from 4-5 sampled time points from blood, gut, lymph node, and vagina (both vRNA and vDNA) from four infected macaques. This level of detail allows us to explore several open questions concerning intrahost spatial dynamics and drug resistance.
First, we can ask how well virus from the blood plasma represents the remainder of the intra-host population. This is important because often only the blood is sampled in HIV-infected individuals. We find that while the blood plasma often has the same viral variants as other compartments, it is not necessarily representative of the dynamics occurring in other compartments. For example, the virus population in the gut changes faster than that in the plasma, both among the vRNA and the vDNA. This may be because the CD4+ T cell composition of the gut is different from that of the blood. Specifically, the gut has a higher proportion of effector memory CD4+ T cells (T_EM) than the blood or lymphoid tissues [55,56]. Infection with a CCR5-tropic SIV, including the SIV mne027 backbone used in this study [57], leads to significant loss of CD4+ T_EM cells in mucosal tissues, such as the gut [58,59]. These CD4+ T_EM cells are also shorter-lived than the CD4+ central memory T cells (T_CM) found at higher frequencies in the plasma and lymphoid tissues [58,60]. This may explain the faster rate of change in the expressed virus (vRNA) and in virally infected cells (vDNA) in the gut. Our data also suggest that drug-resistant genotypes may spread more slowly among the lymph node vRNA than the vRNA from other compartments. Previous studies have found that antiretroviral drug concentrations are lower in the lymph node than other tissues [43,44]. Although our sample sizes are underpowered to test for transient DRM frequency differences and we were unable to measure drug concentrations from the lymph node, these results suggest that unequal drug penetration may be another source of differing dynamics among compartments. This is in contrast to a rhesus macaque study showing higher frequencies of LN vDNA with cytotoxic T cell epitope escape mutants compared to PBMC vDNA, as LNs are main sites of both virus replication and cellular immune pressure [12]. Consistent with our results, their data show mutation frequency is correlated with selective pressure. More broadly, these findings suggest that while surveying the plasma may accurately reveal the types of mutations present in the HIV-1 quasispecies, it does not necessarily reflect the frequency or dynamics of these mutations in other compartments.
Second, we sampled RNA and DNA, which allows us to observe not just the part of the viral population that is actively transcribing but also the part of the viral population that is not expressed and may be latent (DNA not represented in RNA). This is of interest because it is not well known how the viral reservoir changes in patients undergoing monotherapy. Previous studies of HIV-1 or RT-SHIV populations during suppressive combination antiretroviral therapy (ART) have shown that the reservoir changes very little, suggestive of little or no ongoing replication during ART [18,61-63]. While studies of actively replicating RT-SHIV populations have suggested that the reservoir is changing through examining changes in vRNA [18,64,65], this study directly measures changes in vDNA over time. We find that while new vDNA is produced, the viral composition of the reservoir still remains relatively stable compared to the vRNA. While the majority of vRNA from different compartments of the macaques can contain DRMs, only a small proportion of the vDNA acquires drug resistance, and the majority remains WT or genotypes that were present before treatment. This observation suggests that only a small proportion of the vDNA is actively producing virus, and the majority is defective or latent, even during active replication.
Third, our detailed sampling over space and time helps explain results from existing studies on compartmentalization in HIV-1. As summarized in Table 1, most studies that characterize within-patient population structure within the tissues sampled in this study find evidence for compartmentalization in some comparisons and not in others. In addition, those studies that sample at different points in time often observe mixed results over time [5-7,12]. Similarly, our data also show that compartmentalization is a transient occurrence in the blood and tissues we sampled, one that arises as the population adapts and declines through frequent and fast migration. This transiency may contribute to the equivocal findings across HIV-1 compartmentalization studies. In fact, studies that found either ample support for compartmentalization or no evidence for compartmentalization may have found the opposite result had they sampled at a different time.
Fourth, because our study sampled both across time and from multiple compartments, we were able to determine when and where compartmentalization is most likely. We find that particular compartments, specifically the blood and the lymph nodes, usually harbor the same mutant frequencies, likely because of high rates of migration of infected cells between those compartments. We rarely find evidence for compartmentalization between those compartments. On the other hand, some compartments, specifically the blood and the mucosal tissues, harbor significantly different mutant frequencies, likely because of limited migration of cells, such as CD4+ T_EM and tissue-resident T cells, between those compartments. We usually observe evidence for compartmentalization between those compartments.
In addition to answering these open questions, our data are well suited to provide insight into population genetic parameters of viral populations. Knowing these parameters is important in order to use existing theory to make predictions about HIV-1 evolution, which could help improve antiretroviral therapies. For example, because most theory assumes that beneficial mutations are rare, it is important to know if this holds true for HIV-1, including for the emergence and dissemination of drug resistance. Next to the rate at which mutations occur in the population, the migration rate and selection strength are important parameters in population genetic models. Together, these parameters control the dynamics of mutation frequencies over space and time.
In the RT-SHIV sequence data, we observe that both migration and selection are powerful drivers of allele frequency changes. Selection causes drug-resistant variants to reach high frequencies in the course of just weeks, and migration causes these same variants to be present throughout the body. However, it is interesting to note that neither force overwhelms the other. If selection was much stronger than migration, beneficial variants should reach high frequencies in their originating compartment before migrating to other compartments, but we observe the same beneficial variants in all compartments nearly simultaneously. On the other hand, if migration was a much stronger force than selection, we would expect to see very similar allele frequencies in all compartments. Instead we observe transient but significant compartmentalization, particularly after the onset of a selective pressure. This suggests that while there are instances where selection or migration may be acting more quickly, neither force is much stronger than the other. Since we can estimate the selection coefficient of certain mutations to be around 0.5 (0.41-0.7), migration must be of the same order of magnitude. Taken together, the balanced forces of strong selection and rapid migration create an incredibly dynamic system.
The number of beneficial mutations that arise per generation in the population is an important driver of how fast a population will evolve. This rate (the population mutation rate or θ) is the product of the number of genomes that can mutate (i.e., the population size) and the beneficial mutation rate. If the population mutation rate is much less than 1 mutation/generation, a long period of time will be required for a beneficial mutation to occur. If the population mutation rate is much larger than 1, mutations will happen often and we will observe multiple beneficial mutations spreading in the population before any mutation reaches 100% frequency (i.e., sweeps will be soft). Since the mutation rate is fixed, observing the number of adaptive mutations can inform our estimates of the population size.
For HIV-1, estimates for the census population size are very large (10^7-10^8 productively infected cells [66,67]). Given the mutation rate of HIV-1, this would imply a large enough population mutation rate for beneficial mutations to occur on hundreds of backgrounds, and sweeps should therefore be exceptionally soft. We do not observe this in our data, which implies that the census population size does not fully determine population dynamics. It is therefore useful to estimate an 'effective' population size (N_e) that better captures the number of replicating viruses over a certain timescale. One common approach to estimate N_e is by using the amount of neutral diversity, which led to the first estimates of N_e in HIV-1 of < 10^3 [68]. Indeed, when we use population genetic diversity to estimate population size in our data, we also estimate values of N_e on this order of magnitude. However, if N_e were this small, the multiple beneficial mutations we observe spreading simultaneously in our data would be extremely unlikely. This shows that, similar to the census population size, the diversity-based N_e also does not explain the dynamics of drug resistance emergence. We believe that the diversity-based N_e captures a long-term effective population size, which may be very small because it is strongly influenced by bottlenecks and selective sweeps, including the strong population bottleneck of initial infection. It may therefore greatly underestimate the size of the population that actually contributes to resistance evolution.
To estimate an N_e that appropriately captures the population dynamics at the time of drug resistance emergence, we use the number of origins of resistance, following Pennings, et al. [49]. Given that we observe 3-4 different drug-resistant backgrounds per population, we estimate that effective population size is on the order of 10^5, which is much lower than the census population size but higher than the diversity-based effective population size. The estimate is similar to the estimate of the HIV-1 effective population size [49], which further shows how RT-SHIV and HIV-1 have similar dynamics. Note that our estimate is a lower bound, given that we cannot distinguish when the same DRM occurs on the same background repeatedly, and so are likely undercounting the number of independent origins of drug resistance. In macaques like T98133, where there is little pre-existing variation before drug treatment, many origins may be missed. In macaques with much pre-existing variation, such as A99039, our estimates for the number of independent origins are likely to be more accurate, given that each drug-resistant genotype contains several unique non-DRMs.
Our findings of soft sweeps also support previous theoretical results about populations undergoing evolutionary rescue (i.e., populations decreasing in size that can only be 'rescued' by a beneficial mutation that makes the growth rate positive). Wilson et al. predict that when there is a high probability that the population acquires an adaptive mutation before it becomes very small (i.e., rescue is likely), the adaptive mutation is likely to spread via a soft sweep [69]. In the RT-SHIV populations during drug pressure, all three macaques experienced rescue by development of DRMs and all three experienced soft sweeps.
As described above, we believe that the characterization of the RT-SHIV dynamics is relevant for HIV-1 populations, and specifically those HIV-1 populations treated with ineffective therapy, but this may not extend to suppressive ART. Treating HIV-1 with multiple drugs may cause viral populations to evolve with many fewer beneficial mutations. This can change the dynamics through which drug resistance spreads within a patient. An example of this was previously demonstrated, in which it was shown that more effective combinations of drugs decrease the number of origins of DRMs at treatment failure [54]. Further studies will be needed to determine if the level of dynamism that we observe here is also found in HIV-infected individuals treated with combinations of effective drugs.
Future studies have much to contribute to our growing understanding of intra-patient evolutionary dynamics. Single-genome sequencing provides full linked genotypes, which allow for the evolutionary origin of new mutations to be tracked. However, next generation sequencing could provide a greater depth of data with more sensitive allele frequency estimates, which in turn could be used to better differentiate population compositions across compartments. In addition, isolating and sequencing different types of CD4+ cells could reveal that different cell types form sub-compartments within those examined in this paper, as previous studies have suggested they might (Table 1). To determine the physical source of DRMs, samples would likely need to be taken on shorter intervals and at greater depth to reveal very low frequency variants. Finally, in order to be able to observe additional origins of drug resistance, it may be possible in the future to use barcoded initial viral strains [70].
Our study reveals a dynamic intra-patient evolutionary process in RT-SHIV, featuring abundant mutations, fast migration, and strong selection. As we strive for better care for HIV-infected individuals, a thorough understanding of this process will be essential. RT-SHIV is a powerful tool for quantifying how and where drug resistance emerges, and locating potential drug sanctuary sites, which may accelerate the evolution of resistance. Further, the ability to sample in space presents new opportunities to characterize the latent reservoir and ultimately develop strategies to find a cure.

Ethics statement
All animal-related work was conducted according to the Public Health Services (http://grants.nih.gov/grants/olaw/references/PHSPolicyLabAnimals.pdf). Washington National Primate Research Center (WaNPRC) is accredited by the Association for the Assessment and Accreditation of Laboratory Animal Care (AAALAC) International and registered as a USDA Class R research facility. WaNPRC is certified by the NIH Office of Laboratory Animal Welfare (OLAW A3464-01). All animal-related experiments were performed under protocol 2370-25, approved by the University of Washington Institutional Animal Care and Use Committee. Animals were under the care of a licensed veterinarian and all efforts were made to minimize animal pain and suffering, in accordance with the recommendations of the Weatherall report (http://www.acmedsci.ac.uk/viewFile/publicationDownloads/1165861003.pdf). Temperature in animal quarters is maintained at 75-78 °F and humidity and air quality are controlled. Animals were monitored by veterinary staff at least once daily. Prior to infection, animals were pair housed. Following infection, macaques were housed separately to prevent cross-transmission but were provided regular visual, aural and olfactory contact with other macaques. Waste trays and cages were cleaned and sanitized daily and biweekly, respectively. Animals were fed three meals daily of a commercial monkey chow, which was supplemented with fresh fruits and vegetables, and given access to water ad libitum. Environmental enrichment was provided by a biweekly rotation of toys, use of foraging boards, social housing where permitted and/or positive human interaction, including food treats, with veterinary and research staff. All procedures were conducted while the animals were sedated with intramuscular injection of Telazol (tiletamine/zolazepam, 3-6 mg/kg) for minor surgeries (e.g. biopsies) or with Ketamine (10-15 mg/kg) for blood draws and other minor procedures.
Animals that were diagnosed by the veterinarian to be experiencing more than momentary pain and distress were evaluated and treated with appropriate analgesic drugs as indicated. Animals scheduled for necropsy were euthanized by intravenous injection of pentobarbital (80 mg/kg) in the saphenous vein after sedation with 3 mg/kg Telazol.
Blood was obtained in EDTA-treated tubes weekly or bi-monthly for peripheral lymphocyte counts, peripheral blood mononuclear cell (PBMC) pellets, and plasma pelletable vRNA quantification, as previously described [71-73]. The lower limit for quantification was 30 vRNA copies/ml plasma. Tissue biopsies were also taken at multiple time points throughout the study. Superficial LNs (i.e. axillary, inguinal) were taken through a <1 inch incision in the skin and CD4+ T cells were isolated from LNs by magnetic separation (Miltenyi). Rectal (gut) pinch biopsies were taken as previously described [74]. Briefly, sterile 2.0 mm biopsy forceps collected biopsies via an 8.9 mm diameter video gastroscope. Vaginal pinch biopsies (not full thickness) were obtained with a rigid vagiscope using biopsy forceps for a maximum of 4-6 biopsies (approximately 2-3 mm diameter) per time point. Plasma, PBMC, LN CD4+ T cells, and mucosal biopsies were stored at -80 °C until use for nucleic acid extraction or drug concentration measurements.

Viral DNA and RNA isolation and single-genome sequencing (SGS)
Viral RNA and viral DNA were isolated from frozen samples as previously described [75]. SGS was performed on vRNA and vDNA by limiting dilution of cDNA made from RNA or DNA, as previously described [18,39,62]. Briefly, cDNA produced from viral RNA or viral DNA was diluted to a single copy and an amplicon of the first 770 bp of RT was generated and sequenced, with a median of 28 sequences per sample (among samples with sequences) (S1 Table). Sequences containing mixtures were discarded. Sequences that identified more than one genome were not included in the analysis. DRMs were determined from the IAS-USA 2015 Update of the Drug Resistance Mutations in HIV-1 [76]. DRMs listed for drugs not being used to treat macaques were not assessed as DRMs.

Antiretroviral drug concentrations
Drug concentrations were quantified by LC-MS/MS analysis using methods similar to those previously published [77,78]. Tissues were homogenized in 1 mL of 70:30 acetonitrile-1 mM ammonium phosphate (pH 7.4) with a Precellys 24 tissue homogenizer (Bertin Technologies, Montigny-le-Bretonneux, France). Plasma underwent protein precipitation with an organic solution containing a stable-labeled internal standard. For all compounds, a Shimadzu high-performance liquid chromatography system was used for separation, and an AB SCIEX API 5000 mass spectrometer (AB SCIEX, Foster City, CA, USA) equipped with a turbo spray interface was used as the detector. Samples were analyzed with a set of calibration standards (0.02 to 20 ng/mL for tissue and 5 to 5,000 ng/mL for plasma) and quality control (QC) samples. The precision and accuracy of the calibration standards and QC samples were within the acceptable range of 15%. Tissue density was assumed to be 1.06 g/cm^3. All LC-MS/MS data underwent QC by a designated individual not directly involved in this study to ensure accuracy.

Muller plots
Muller plots show the change in frequency of different sampled genotypes over time separately by compartment and macaque. Time points were included when at least five sequences of vRNA or vDNA were available in a compartment within a macaque (black vertical lines); time points with fewer than five sequences were excluded from the plot. Genotypes were called using all positions with minor allele variants at frequency higher than 1% within each macaque (across all compartments and time points). Genotypes with at least one DRM are marked with black hatching.
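The genotype-calling rule for the Muller plots can be illustrated with a short sketch. It assumes aligned, equal-length sequences given as strings; the function names (`variable_positions`, `call_genotypes`, `genotype_frequencies`) and the toy alignment are hypothetical and are not taken from the original analysis code.

```python
from collections import Counter


def variable_positions(seqs, min_freq=0.01):
    """Positions where at least one non-majority base exceeds min_freq."""
    n = len(seqs)
    positions = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        minor = counts.most_common()[1:]  # everything except the most common base
        if any(count / n > min_freq for _, count in minor):
            positions.append(pos)
    return positions


def call_genotypes(seqs, min_freq=0.01):
    """Label each sequence by its bases at the variable positions."""
    positions = variable_positions(seqs, min_freq)
    return ["".join(s[p] for p in positions) for s in seqs]


def genotype_frequencies(genotypes):
    """Frequency of each genotype label in a sample."""
    n = len(genotypes)
    return {g: c / n for g, c in Counter(genotypes).items()}


# Toy alignment (hypothetical): 200 sequences pooled across all compartments
# and time points of one macaque. The singleton at the last position (0.5%)
# falls below the 1% threshold and is ignored when calling genotypes.
pooled = ["ACGT"] * 151 + ["ATGT"] * 48 + ["ACGA"] * 1
labels = call_genotypes(pooled)
print(variable_positions(pooled))
print(genotype_frequencies(labels))
```

In practice the variable positions would be determined once per macaque from the pooled sequences, and the genotype frequencies would then be tabulated separately for each compartment and time point.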

Phylogenetic analysis
Phylogenetic analysis of base pairs 135-900 was performed using BEAST [79] with an HKY substitution model [80], estimated base frequencies, a strict clock and no site heterogeneity model. Default priors were used as provided by BEAUti 1.8.3, with a uniform clock rate prior of [0, 1] and a starting frequency of 0.005. The chain length was 5 × 10^7 with parameters recorded every 2.5 × 10^4 states, resulting in 2000 trees from which maximum clade credibility trees were derived using the option "keep target heights." This process was repeated separately for vRNA and vDNA from each macaque. The trees were plotted using ggtree [81] and annotated with substitutions as described in Materials and Methods: Muller Plots and the location from which the samples were taken (i.e., PBMC, plasma, vagina, gut and LN). Branches were colored to match Muller plot coloration and internal branches with no predicted state were plotted in black.

Assessing viral compartmentalization
Compartmentalization of viral genotypes and change in their frequencies over time is assessed using K_ST [51,82]. K_ST is computed as follows:
$$K_{ST} = 1 - \frac{\pi_s}{\pi_t},$$
where the total average pairwise diversity of the sample is given by
$$\pi_t = \sum_i \sum_j x_i x_j \delta_{ij}$$
and the average pairwise diversity within each of $k$ compartments is given by
$$\pi_s = \frac{1}{k} \sum_k \sum_i \sum_j x_{i,k} x_{j,k} \delta_{ij},$$
and $x_{i,k}$ is the frequency of genotype $i$ in compartment $k$, $x_i = \sum_k x_{i,k}$ is the frequency of genotype $i$ across all compartments and $\delta_{ij}$ is the fraction of sites at which $i \neq j$.
The significance of K_ST was assessed via permutation test, in which compartmental identities were repeatedly reassigned without replacement. Labels were permuted 1,000 times, and significance was assessed with a one-sided test with larger values of K_ST indicating stronger evidence for compartmentalization. Because the power of this test was found to vary when compartments were subsampled to different sample sizes (S8 Fig), we performed a subsampling procedure to make compartmentalization results more comparable across compartments. We used a sample size of 10, which allowed us to retain much of the data for analysis while still having power to detect signals of compartmentalization visible at larger sample sizes (S8 Fig). Each compartment was subsampled without replacement 1000 times, and these subsampled compartments were used to perform the compartmentalization tests. When compartments had fewer than 10 sequences at a given time point, they were excluded from the compartmentalization analysis. We report the proportion of these 1000 tests that result in a permutation test p-value < 0.05. This can be interpreted as the probability of a pairwise significant test if 10 sequences are drawn randomly from each compartment. K_ST permutation testing subsampled to 10 sequences was also used to assess compartmentalization between subsequent time points taken from the same compartment (following Achaz, et al. [83]).
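The subsample-then-permute procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: K_ST is taken in its usual form, 1 minus the ratio of mean within-compartment diversity to total diversity; the permutation p-value is the fraction of label shuffles with K_ST at least as large as observed; and all function names and the toy sequences are hypothetical.

```python
import random


def hamming_fraction(a, b):
    """Fraction of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def kst(dist, groups):
    """K_ST = 1 - pi_s / pi_t from a distance matrix and per-compartment index lists."""
    pooled = [i for g in groups for i in g]
    pi_t = sum(dist[i][j] for i in pooled for j in pooled) / len(pooled) ** 2
    if pi_t == 0:
        return 0.0  # no diversity at all: treat as no compartmentalization
    pi_s = sum(
        sum(dist[i][j] for i in g for j in g) / len(g) ** 2 for g in groups
    ) / len(groups)
    return 1 - pi_s / pi_t


def permutation_p_value(seqs_a, seqs_b, n_perm=1000, rng=random):
    """One-sided p-value: share of label permutations with K_ST >= observed."""
    pooled = list(seqs_a) + list(seqs_b)
    n_a = len(seqs_a)
    dist = [[hamming_fraction(a, b) for b in pooled] for a in pooled]
    idx = list(range(len(pooled)))
    observed = kst(dist, [idx[:n_a], idx[n_a:]])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # reassign compartment labels without replacement
        if kst(dist, [idx[:n_a], idx[n_a:]]) >= observed:
            hits += 1
    return hits / n_perm


def proportion_significant(comp_a, comp_b, size=10, n_sub=1000,
                           n_perm=1000, alpha=0.05, seed=0):
    """Proportion of subsampled comparisons with permutation p-value < alpha."""
    if len(comp_a) < size or len(comp_b) < size:
        return None  # compartments with too few sequences are excluded
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sub):
        sub_a = rng.sample(comp_a, size)
        sub_b = rng.sample(comp_b, size)
        if permutation_p_value(sub_a, sub_b, n_perm, rng) < alpha:
            significant += 1
    return significant / n_sub


# Toy data (hypothetical): two compartments fixed for different genotypes.
gut = ["AAAAAAAAAA"] * 12
plasma = ["AAAAACCCCC"] * 12
print(proportion_significant(gut, plasma, n_sub=5, n_perm=100))
```

The example call uses far fewer subsamples and permutations than the 1000 of each described above, purely to keep the run time short. The same functions apply unchanged when the two inputs are consecutive time points from one compartment instead of two compartments.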
For verification, compartmentalization tests were also performed using AMOVA [52] and the Slatkin-Maddison test [53]. AMOVA was performed in R pairwise on compartments subsampled to 10 sequences 1000 times using the package poppr [84] with 1,000 permutations and a quasi-euclidean transformation to convert distance matrices. AMOVA was found to have a low false positive rate and power at a sample size of 10 (S9 Fig).
The Slatkin-Maddison test was performed with 1,000 permutations as implemented in HyPhy [85] from trees constructed via UPGMA. For the Slatkin-Maddison test, we excluded duplicate sequences before performing the tests, following other papers [5,33,86]. When duplicate sequences were not removed, the test had a high false positive rate in assessing compartmentalization between two samples from the same compartment. Because compartments were required to have 10 sequences for comparison, not all tests performed with AMOVA and K_ST could be performed with the Slatkin-Maddison test (i.e., there were ≥10 sequences in a given compartment but <10 unique sequences). Because sample sizes of unique sequences were smaller than full sample sizes, only 100 sets of sequences of size 10 were used to perform the Slatkin-Maddison test. The resulting test was found to have a low false positive rate and had power to detect compartmentalization in at least a subset of comparisons at a subsample size of 10 (S10 Fig).

Estimating selection coefficients for DRMs
Selection coefficients were estimated by assuming logistic growth of the drug-resistant variants starting at the first treatment administration time point (week 12). This relies on the assumption that DRMs had reached establishment frequency (1/(N_e s)) before treatment. We assumed a generation time of 1 day, similar to HIV-1 and as previously published [41], an effective population size (N_e) of 1.5 × 10^5 [49], and logistic growth of the allele frequency $x$ as follows:
$$x(t) = \frac{x_0 e^{st}}{1 - x_0 + x_0 e^{st}}, \qquad x_0 = \frac{1}{N_e s},$$
where $t$ is measured in generations. We fit s values using two time points: absence (or near absence) of DRMs at week 13 and the DRM frequency at the next available time point (week 15 for T98133 and A99165 and week 16 for A99039). In cases where multiple DRMs were present (e.g., M184V and M184I in T98133), a single s value was fit for both DRMs.
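A numerical sketch of this fit is given below. It assumes the standard logistic form x(t) = x0 e^(st) / (1 - x0 + x0 e^(st)) with starting frequency x0 = 1/(N_e s), and it counts generations in days from the start of treatment; both are assumptions about how the description translates into a formula. The function names, the bisection solver and the example frequency of 0.5 are hypothetical.

```python
import math


def logistic_frequency(s, t, n_e=1.5e5):
    """Allele frequency after t generations of logistic growth with selection s."""
    x0 = 1.0 / (n_e * s)  # assumed establishment frequency at treatment start
    growth = x0 * math.exp(s * t)
    return growth / (1.0 - x0 + growth)


def fit_selection_coefficient(x_obs, t, n_e=1.5e5, s_max=5.0, tol=1e-10):
    """Solve logistic_frequency(s, t) = x_obs for s by bisection.

    The predicted frequency increases with s for s > 1/t, so the search
    is bracketed between 1/t and s_max.
    """
    lo, hi = 1.0 / t, s_max
    if not logistic_frequency(lo, t, n_e) < x_obs < logistic_frequency(hi, t, n_e):
        raise ValueError("observed frequency is outside the bracketed range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if logistic_frequency(mid, t, n_e) < x_obs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical example: a DRM at 50% frequency three weeks (21 generations)
# after the start of treatment.
s_hat = fit_selection_coefficient(0.5, 21)
print(round(s_hat, 3))
```

Bisection is used here only because the starting frequency itself depends on s, so the logistic equation cannot be inverted for s in closed form.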

Viral diversity measurements
Intra-compartment diversity was measured for each compartment using the average pairwise diversity (see the definition for π_t in Assessing Viral Compartmentalization above). The relationship between diversity and time within each macaque was fit with a linear regression of intra-compartment diversity predicted by time (i.e., weeks after infection), separately for vDNA and vRNA. The difference between vRNA and vDNA was assessed by fitting two nested linear models of time on diversity: one in which an indicator variable discriminated between whether a diversity observation was from the vDNA or the vRNA, and one without an indicator variable. An ANOVA was performed to assess the significance of the indicator variable.

Estimating effective population size (N_e) of RT-SHIV
We used both nucleotide level diversity and genotype structure to estimate the effective population size of the intra-macaque RT-SHIV population. Pairwise nucleotide diversity relates to effective population size N_e via the formula $2N_e\mu = \pi_t$, where π_t is the average pairwise diversity as described in Assessing Viral Compartmentalization above. Following Leigh-Brown [68], we estimated N_e using vRNA sequences taken from the blood plasma. It is well known that the estimate of N_e is much lower than the census population size, not only in HIV-1 but also in humans and fruit flies.
Aware of the problems with using neutral diversity to estimate N_e, Pennings et al. identified an alternative approach that can be applied directly to the nucleotide-level encodings of DRMs and should yield an N_e estimate much closer to the short-term N_e that is relevant for drug resistance evolution [49]. Briefly, the diversity of different DRM variants i through j can be summarized by the complement of their summed squared frequencies (or heterozygosity):
$$H = 1 - \sum_{l=i}^{j} p_l^2,$$
where $p_l$ is the frequency of DRM variant $l$. N_e can then be estimated from the heterozygosity:
$$N_e = \frac{H}{2\mu(1 - H)}.$$
We calculate the heterozygosity using two different definitions of DRM variants: 1) DRM variants that are different by state (e.g., AAU versus AAC for K103N) and 2) DRM variants that can be distinguished by linkage to a common segregating variant (non-DRM mutation at a frequency of at least 10%). We label the resulting estimates of the effective population size N_e,DRM and N_e,geno respectively. Both N_e,DRM and N_e,geno are computed among the vRNA variants only and among all sampled variants (vRNA and vDNA) at the first time the vRNA is over 90% drug-resistant, or maximally drug-resistant if 90% drug resistance is never achieved. These times for T98133, A99165 and A99039 are weeks 15/16, week 15 and weeks 20/21, respectively.
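The two kinds of N_e estimate can be compared in a few lines. This sketch assumes that heterozygosity relates to the population mutation rate as H = θ/(1 + θ) with θ = 2 N_e μ, which is one common way to link the two and is not confirmed here as the exact formula used; the diversity-based estimate uses 2 N_e μ = π_t as stated in this section. Function names and the example counts are hypothetical.

```python
def heterozygosity(counts):
    """Complement of the summed squared frequencies of the DRM variants."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def ne_from_heterozygosity(h, mu):
    """N_e from heterozygosity, assuming H = theta / (1 + theta), theta = 2 * N_e * mu."""
    if not 0.0 <= h < 1.0:
        raise ValueError("heterozygosity must be in [0, 1)")
    theta = h / (1.0 - h)
    return theta / (2.0 * mu)


def ne_from_diversity(pi_t, mu):
    """N_e from average pairwise diversity, using 2 * N_e * mu = pi_t."""
    return pi_t / (2.0 * mu)


# Hypothetical example: four drug-resistant variants sampled 10, 8, 7 and 5
# times, with the transition mutation rate of 1.4e-5 quoted in this section.
h = heterozygosity([10, 8, 7, 5])
print(h, ne_from_heterozygosity(h, 1.4e-5))
```

With these made-up counts the heterozygosity-based estimate comes out on the order of 10^5, which illustrates how a handful of co-occurring resistant variants implies a much larger N_e than typical diversity-based values.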
All estimates rely on the HIV-1 mutation rate of μ = 1.4 × 10^-5 for transitions (such as M184V) and 2 × 10^-6 for transversions (both forms of K103N) [87].
The average pairwise diversity (π) increases over time in both the vRNA (top row) and vDNA (bottom row) in each macaque (shown in the columns). Each colored line marks a different compartment with the diversity of all compartments combined shown in black. The asterisk marks where diversity decreases at the final sampling point for A99039, and corresponds to viral suppression (see Fig 1). Grey shading indicates monotherapy or combination therapy, as indicated below the x-axis. Rx1 is treatment FTC+TFV+EFV and Rx2 is the treatment TFV+L870812+DRV/r. (TIF)

S2 Fig. FTC and EFV plasma and tissue concentrations detected during monotherapy.
FTC and FTC-TP (A) and EFV (B) concentrations were measured in limited plasma (top) and PBMC and mucosal biopsy samples (bottom) taken from all animals at multiple time points. Grey shading indicates monotherapy (weeks 12-20). Open symbols represent measurements that were below the lower limit of quantitation. NA indicates sample was not available. Unfortunately LN tissues were not available for drug measurements.