
The authors have declared that no competing interests exist.

Conceived and designed the experiments: SL VVG. Performed the experiments: SL. Analyzed the data: SL VVG. Contributed reagents/materials/analysis tools: SL VVG. Wrote the paper: SL VVG.

Recent studies have highlighted the ability of HIV to escape from cytotoxic T lymphocyte (CTL) responses that concurrently target multiple viral epitopes. Yet, the viral dynamics involved in such escape are incompletely understood. Previous analyses have made several strong assumptions regarding HIV escape from CTL responses such as independent or non-concurrent escape from individual CTL responses. Using experimental data from evolution of HIV half genomes in four patients we observe concurrent viral escape from multiple CTL responses during early infection (first 100 days of infection), providing confirmation of a recent result found in a study of one HIV-infected patient. We show that current methods of estimating CTL escape rates, based on the assumption of independent escapes, are biased and perform poorly when CTL escape proceeds concurrently at multiple epitopes. We propose a new method for analyzing longitudinal sequence data to estimate the rate of CTL escape across multiple epitopes; this method involves few parameters and performs well in simulation studies. By applying our novel method to experimental data, we find that concurrent multiple escapes occur at rates between 0.03 and 0.4 day^{−1}, a relatively broad range that reflects uncertainty due to sparse sampling and wide ranges of parameter values. However, we show that concurrent escape at rates 0.1–0.2 day^{−1} across multiple epitopes is consistent with our patient datasets.

Since the early 1990s, cytotoxic T lymphocytes (CTLs) have been known to play an important role in HIV infection with CTLs targeting HIV epitopes and, in turn, HIV escapes arising through mutations in the targeted epitopes. Over the past decade, studies have shown that CTL responses concurrently target multiple HIV epitopes, yet the effect of concurrent responses on HIV dynamics and evolution is not well understood. Through an analysis of patient datasets and a novel statistical method, we show that during early HIV infection concurrent CTL responses drive concurrent HIV escapes at multiple epitopes with significant pressure, suggesting a complex picture in which HIV simultaneously explores multiple mutational pathways to escape from broad and potent CTL response.

During Human Immunodeficiency Virus 1 (simply HIV hereafter) infection, cytotoxic T lymphocyte (CTL) responses play a significant role in shaping viral dynamics and evolution [

HIV-specific CTL responses first arise about 3 weeks into infection, several days prior to peak viral load, and initially target roughly 3–5 epitopes [

Viral escape from CTL-mediated killing follows a temporal pattern similar to CTL response, although not all responses elicit an escape and escapes do not always occur in the same order as the CTL responses [

Previous studies have measured the rate of CTL escape as a means of quantifying the strength of CTL response [

A common approach for estimating escape rates, introduced in [

In this work we consider the CTL response associated with the viral escapes that occur during the first 2–3 months of infection in four previously analyzed patient datasets from the Center for HIV/AIDS Vaccine Immunology (CHAVI) cohort: CH40, CH58, CH77, and CH256 [

Pandit and De Boer [

Using specific examples and simulations, we show that estimates of escape rates based on the logistic model can lead to bias in the presence of multi-epitope escape, with the direction of bias depending on the structure of the escape. Furthermore, we show that application of the logistic model when CTL escape is sampled at a single time point leads to estimates that are severely downward biased (i.e., are underestimates). Importantly, these biases are unrelated to the presence of replicative fitness costs. Instead, bias associated with the logistic model is due to ignoring concurrent escapes and making strong assumptions regarding mutant frequency when mutants are undetectable.

To address these limitations in the logistic model, we introduce a novel method for estimating escape rates that removes bias associated with multi-epitope escape by applying the logistic model to pairs of variants, thereby generalizing the notion of wild type and mutant to the setting of multiple epitopes. This novel method still suffers from bias when CTL escape is captured at a single time point, so we introduce a further extension, involving the introduction of three parameters _{I}, and _{I} is the time at which the first escape mutation occurs, and _{I} and _{I} and

Although we are unable to give a narrow range for the rate at which escapes proceed, our results show that escape rates in the range of 0.1–0.2 day^{−1} across multiple epitopes are consistent with our dataset. Such significant escape rates across multiple epitopes would reflect a broad yet still relatively strong CTL killing. If replicative fitness costs associated with escape exist, CTL kill rates are above the range 0.1–0.2 day^{−1}, suggesting even stronger CTL killing. Importantly, the wide range of escape rates contained within our lower and upper bound reflects a range of model assumptions consistent with our datasets. More accurate escape rate estimates require either more model assumptions, corresponding to narrower choices for _{I} and

Goonetilleke et al. [

In each of our four patient datasets, two sample time points fall within the first 2–3 months of infection: we label these sample times t_{1} and t_{2}. To analyze early CTL escape, we consider only putative epitopes with mutation in at least one sequence collected at t_{1} and t_{2}. Further, to avoid variation unassociated with CTL response, we only consider putative epitopes that meet at least one of two criteria: 1) the putative epitope is supported by ELISpot assays in [t_{1} and t_{2} (i.e., different sequences collected at t_{1} and t_{2} had different mutations on the putative epitope) and the putative epitope was eventually lost as infection progressed [

Following [

Vertices of an escape graph represent viral variants that are part of the escape pathway and edges correspond to epitope mutations needed to change one variant into another. Numbering below each vertex indicates whether a particular putative epitope is wild-type (0) or escape (1). Numbers above each vertex that are separated by a slash represent the percent of this particular variant at the two time points t_{1} and t_{2}, respectively. For example, in Fig 1A, the vertex numbered below by 111100 corresponds to a viral variant mutated at putative epitopes NEF185, GAG113, GAG395, and VPR74, but not at putative epitopes POL80 and VIF57, and comprises 0% and 35% of the sequences sampled at t_{1} and t_{2}, respectively. Yellow and red vertices represent initial variants and expansion variants, respectively; in all patients, initial mutations lead to the first escape, which is followed by the expansion of viral variants. CTL responses to putative epitopes are supported by ELISpot assays (underlined epitopes), HLA association (non-underlined, black text epitopes), and multiple mutant haplotypes (red text epitopes). See t_{1} and t_{2} values and [

The sample frequencies of each variant in the escape graph at _{1} and _{2} are shown in

The form of the escape graphs reflects the concurrent nature of HIV escape in the four patients and is in line with the previous results of Pandit and De Boer [t_{2} the CH256 escape involves four variants that are all children of variant 100000: 110000, 101000, 100100, and 100010. Escape through these four variants involves mutation at distinct epitopes, reflecting concurrent escape from CTL responses to POL393, NEF185, ENV799, and VIF169 through multiple pathways. For NEF185, escape occurs concurrently through variants 101000 and 101001, with variant 101001 linking escape at NEF185 and ENV606. Similar patterns are seen in the other patients. As shown in

At sample times after t_{2}, a new set of vertices arises from the expansion vertices, and the expansion vertices collapse to low frequency or are no longer sampled. For sample times extending up to 6 months, most variants are seen once at intermediate frequencies and then disappear at the next sample time, as occurs for the initial vertices in moving from t_{1} to t_{2}. This observation has been made previously, although not in the context of linked data [

Current methods for estimating escape rates consider escape at each epitope separately by grouping variants into wild type and mutant according to the absence or presence, respectively, of mutation solely at the given epitope [_{WT}(_{MT}(_{1} and _{2}.

Escape rate estimates based on _{1} and _{2},

In the presence of multi-epitope escape, variants within and across the wild type and mutant groups can differ at epitopes other than the one used to form the groupings, leading to bias. As a concrete example, consider the hypothetical escape graph shown in panel A of ^{−1} at epitopes a, b, and c, respectively. Under these assumptions, the escape rate at each epitope should equal the CTL kill rate at the epitope. However, applying ^{−1} for epitopes a, b, and c respectively.

We simulate dynamics of viral escape from constant CTL response with killing rates of 0.4, 0.3, and 0.5 day^{−1} against epitopes a, b, and c, respectively. All variants have equal replicative fitness, mutation is ignored, and dynamics are generated deterministically according to t_{1} = 0 with the initial frequencies shown to the left of the slashes and run to t_{2} = 10. In this case t_{1} = 0 does not model initial infection; instead it simply serves as the initial time point. The escape graph obtained by sampling the virus population at times t_{1} = 0 and t_{2} = 10 days with

The overestimate of the escape rate at epitope a (0.69 instead of the true 0.4) is caused by the groupings: the wild type group is composed of variant 000 and the mutant group is composed of variants 100, 110, and 101. The escape of mutant variants 110 and 101 is driven by CTL responses to epitopes a, b and c, but is erroneously attributed solely to CTL response to epitope a, leading to the overestimate. Similarly, the underestimate of the escape rate at epitope b (0.25 versus the true value of 0.3) results from a wild type group that includes variant 101 while the mutant group contains variant 110. The escape rate of mutant variant 110 relative to wild type variant 101 is slowed by response at epitope c and
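The grouping bias described above can be reproduced with a small deterministic calculation. The sketch below is illustrative only: the initial frequencies are hypothetical (they are not the values from the figure), and the single-epitope estimator is written in the commonly used log-ratio form, which we assume matches the logistic model referred to in the text. With equal replicative fitness and no mutation, a variant's frequency is proportional to its initial frequency times exp(−kill × t), where the kill rate is summed over the epitopes that are still wild type.

```python
import math

# Kill rates (day^-1) at epitopes a, b, c, as in the hypothetical example.
KILL = {"a": 0.4, "b": 0.3, "c": 0.5}
EPITOPES = "abc"

# Hypothetical initial frequencies at t1 = 0 (an assumption, not the figure's values).
INITIAL = {"000": 0.90, "100": 0.06, "110": 0.02, "101": 0.02}


def kill_rate(variant):
    """Total kill rate on a variant: sum over epitopes still wild type ('0')."""
    return sum(KILL[e] for e, state in zip(EPITOPES, variant) if state == "0")


def frequencies(initial, t):
    """Deterministic frequencies at time t (equal replicative fitness, no mutation)."""
    weights = {v: f * math.exp(-kill_rate(v) * t) for v, f in initial.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def grouped_estimate(f1, f2, t1, t2, epitope_index):
    """Single-epitope estimate: pool variants into wild type and mutant at one epitope."""
    def mutant_to_wild_type(freqs):
        mt = sum(f for v, f in freqs.items() if v[epitope_index] == "1")
        wt = sum(f for v, f in freqs.items() if v[epitope_index] == "0")
        return mt / wt
    return math.log(mutant_to_wild_type(f2) / mutant_to_wild_type(f1)) / (t2 - t1)


t1, t2 = 0.0, 10.0
f1 = frequencies(INITIAL, t1)
f2 = frequencies(INITIAL, t2)
estimates = {e: grouped_estimate(f1, f2, t1, t2, i) for i, e in enumerate(EPITOPES)}

for e in EPITOPES:
    print(f"epitope {e}: true rate {KILL[e]:.2f}, grouped estimate {estimates[e]:.2f}")
```

Under these assumed frequencies the grouped estimate is too high at epitopes a and c and too low at epitope b, the same directions of bias discussed in the text; the exact magnitudes depend on the initial frequencies chosen.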

A second problem associated with _{1} or _{2}. This occurs when no mutations have occurred by time _{1} or _{2}, when the sweep of the mutants has completed and no wild types exist, or when the modest number of samples possible in SGA fails to capture mutant or wild type variants. In such cases _{2} (

Finally, _{1} and _{2}, mutation plays a minor role. However, in situations when the mutant population is still small at _{1}, mutations significantly increase the number of mutants at _{2} relative to _{1} and

_{1} and _{2}. All three biases discussed above are in effect, but the dominant bias is the small frequency of mutants at _{1}, leading to samples in which the mutants are not present and the 1/(^{−1} for a true escape rate of 0.6 day^{−1}.

We ran stochastic simulations of viral escape assuming CTL response at 6 epitopes with an early CTL response at a single epitope followed roughly a week later by CTL response at five additional epitopes. For each simulation, we estimate the escape rate at the 5 additional epitopes using t_{1} = 30 and t_{2} = 60 and then calculate the relative error of the estimated escape rate, (estimate − true rate)/true rate, based on sampling t_{1} and t_{2} (column “sampled freq”), based on the exact frequencies of wild type and mutant variants at t_{1} and t_{2} (column “exact freq”), and based on exact frequencies as well as a model in which no mutations occur after t_{1} (column “exact freq/no mutation”). Due to the later CTL response, the 5 additional epitopes correspond to expansion variants. We show relative errors of escape rate estimates under linear and full escape graphs. The linear escape graphs can include only the variants 000000, 100000, 110000, …, 111111. The full escape graphs can include all 2^{6} possible haplotypes formed by wild type and mutants at the 6 epitopes. Strong and weak CTL response reflect simulations in which the killing rate at the 5 additional epitopes had a maximum value of ^{−1} and ^{−1}, respectively, with the exact kill rate varying across simulations (see

CTL response | graph | sampled freq | exact freq | exact freq/no mutation |
---|---|---|---|---|
strong | linear | -0.65 (-0.89,-0.49) | 0.52 (0.31,1.07) | 0.37 (0.24,0.87) |
weak | linear | -0.96 (-0.98,-0.75) | 1.29 (0.82,2.42) | 0.85 (0.56,1.49) |
strong | full | -0.64 (-0.84,-0.47) | -0.13 (-0.3,0.02) | -0.28 (-0.62,-0.09) |
weak | full | -0.95 (-0.97,-0.74) | 0.16 (0.02,0.43) | -0.19 (-0.36,-0.07) |

Column _{1} and _{2}, thereby removing the effect of sampling variance and the 1/(

Finally, to form the column _{1} and _{2} and then applied

To account for concurrent multi-epitope escape our approach is to associate an escape rate with each edge of the escape graph. Intuitively, each parent-child vertex pair represents a “competition assay” measuring the selective advantage of the child relative to the parent and since there are only two variants considered, the bias associated with multi-epitope escape in _{P}(_{C}(_{WT}(_{MT}(_{1}, _{2}, but in _{P}(_{C}(
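The edge-wise idea can be sketched as follows: for each parent-child pair, the log-ratio estimator is applied to the child and parent frequencies only, so that each pair behaves like a two-variant competition assay. This is a minimal illustration under stated assumptions, not the authors' implementation: the kill rates, initial frequencies, and edge list are hypothetical, dynamics are deterministic with equal replicative fitness and no mutation, and the function names are ours.

```python
import math

# Hypothetical kill rates (day^-1) and initial frequencies; not taken from patient data.
KILL = {"a": 0.4, "b": 0.3, "c": 0.5}
EPITOPES = "abc"
INITIAL = {"000": 0.90, "100": 0.06, "110": 0.02, "101": 0.02}

# Parent -> child edges of a hypothetical escape tree.
EDGES = [("000", "100"), ("100", "110"), ("100", "101")]


def kill_rate(variant):
    """Total kill rate on a variant: sum over epitopes still wild type ('0')."""
    return sum(KILL[e] for e, state in zip(EPITOPES, variant) if state == "0")


def frequencies(initial, t):
    """Deterministic frequencies at time t (equal replicative fitness, no mutation)."""
    weights = {v: f * math.exp(-kill_rate(v) * t) for v, f in initial.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def mutated_epitope(parent, child):
    """Name of the single epitope that differs between parent and child."""
    (index,) = [i for i, (p, c) in enumerate(zip(parent, child)) if p != c]
    return EPITOPES[index]


def edge_escape_rate(f1, f2, t1, t2, parent, child):
    """Log-ratio estimate using only the child and parent frequencies on one edge."""
    ratio_1 = f1[child] / f1[parent]
    ratio_2 = f2[child] / f2[parent]
    return math.log(ratio_2 / ratio_1) / (t2 - t1)


t1, t2 = 0.0, 10.0
f1 = frequencies(INITIAL, t1)
f2 = frequencies(INITIAL, t2)
edge_rates = {
    mutated_epitope(p, c): edge_escape_rate(f1, f2, t1, t2, p, c) for p, c in EDGES
}

for epitope, rate in edge_rates.items():
    print(f"edge mutating epitope {epitope}: estimate {rate:.3f}, true {KILL[epitope]:.3f}")
```

In this idealized setting each edge estimate recovers the kill rate at the epitope mutated along that edge, because parent and child differ at exactly one epitope and the pooled groups never enter the calculation. The sketch requires both variants to have nonzero frequency at both time points; it does not address the zero-frequency case.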

We use the same simulations as described in

CTL response | graph | sampled freq | exact freq | exact freq/no mutation |
---|---|---|---|---|
strong | linear | -0.92 (-1.24,-0.71) | 0.03 (0.01,0.07) | -0.01 (-0.02,-0.01) |
weak | linear | -1.57 (-1.95,-1.32) | 0.16 (0.1,0.3) | -0.04 (-0.06,-0.02) |
strong | full | -0.83 (-0.99,-0.68) | 0.02 (0,0.05) | -0.01 (-0.03,0.01) |
weak | full | -1.49 (-1.84,-1.22) | 0.18 (0.11,0.33) | 0.01 (-0.01,0.04) |

Zero frequencies are particularly problematic in the context of _{1}, while in

For our patient escape graphs, most vertices are expansion vertices and have zero frequency at t_{1}. This difficulty exists in our simulations as well, where roughly 90% of all variants have a true frequency of less than 0.1 at t_{1} = 30 days, reflecting the time needed for mutant variants to arise and expand to significant frequencies. As a result,

To estimate escape rates for edges pointing to expansion vertices, we replace _{2}:

In _{I} is the time of the first parent to child variant mutation, and _{C}(_{2}), _{P}(_{2}) and a choice for the parameters _{I} and _{I}, _{2}] rather than [_{1}, _{2}]. In practice, since

To explain _{I}, we consider different models for the ratio _{C}(_{2})/_{P}(_{2}). Let _{P}(_{C}(

In _{2} under the initial condition _{C}(0) = 0 and _{P}(0) = _{0} gives

_{I} = 0, reflecting the assumptions implicit in _{P}(

Starting mutation at _{cutoff} before which no mutations occur and after which mutations occur at rate _{P}(_{I} = _{cutoff}.

Other models of parent-child variant dynamics and mutation are possible. If we assume that parent variants mutate into child variants at a rate _{P}(_{I}, but that the actual number of mutations varies around this average due to mutational stochasticity, we arrive at

_{I}, _{I} and _{I} and ^{−4} for all parent-child pairs. As the table shows, changing any one of the parameters within a reasonable range can lead to estimates that are upper or lower bounds, reflected by a positive or negative relative error, respectively.

We use the same simulations as described in t_{I} on escape rate estimates using t_{I} to a value, as shown in the column labels, and set the other two values to the true value of the escape, which we record during the simulations, and then compute the relative error of the escape rate estimate. Changing the value of each of the three parameters within a reasonable range can lead to underestimates or overestimates.

CTL response | graph | t_{I} = 0 | t_{I} = 30 | ^{−5} | ^{−4} | | |
---|---|---|---|---|---|---|---|
strong | linear | 0.07 | -0.35 | -0.38 | 0.43 | 0.22 | -0.33 |
weak | linear | 0.12 | -0.82 | -0.43 | 0.56 | 0.48 | -0.69 |
strong | full | 0.13 | -0.28 | -0.38 | 0.39 | 0.2 | -0.29 |
weak | full | 0.18 | -0.74 | -0.42 | 0.57 | 0.44 | -0.65 |

We do not know the ‘true’ _{I} for a parent-child pair within a given patient and _{I} values. Previous approaches for estimating escape rates across multiple epitopes using the standard model and/or birth death processes can be viewed in the context of _{I} and

To apply _{I} and

In our simulations, _{I} ∈ [16, 34] for roughly 75% of expansion variants. Since peak viral load generally occurs 21 days post infection [_{I} falls between 5 days prior and two weeks after peak viral load. As an additional verification of the lower bound on _{I}, the CTL response likely arises roughly a week prior to peak viral load [^{−5}, 3 × 10^{−4}]. Given a per nucleotide, per reverse transcription mutation rate of 3 × 10^{−5} [_{I} five days prior to peak viral load and ^{−4} and with _{I} two weeks post peak viral load and ^{−5}, respectively, gives us lower and upper bounds for the escape rate. We also consider intermediate values for our parameters determined by the medians of _{I}: _{I} four days post peak viral load. We choose ^{−4} as an intermediate value of

We use the same simulations as described in t_{I} = −5 days (5 days before peak viral load), ^{−4} and t_{I} = 14 days post peak viral load, ^{−5}, respectively. The intermediate values are calculated using t_{I} = 4 days post peak viral load, and ^{−4}. Lower and upper bounds provide an estimated range for the escape rate, while the intermediate estimate reflects escape rates assuming less extreme parameter choices.

CTL response | graph | lower bound | upper bound | intermediate |
---|---|---|---|---|
strong | linear | -0.56 (-0.84,-0.42) | 1.09 (0.95,1.3) | 0.14 (0.09,0.2) |
weak | linear | -1.27 (-1.57,-1.09) | 1.79 (1.51,2.27) | 0.13 (0.06,0.22) |
strong | full | -0.5 (-0.74,-0.39) | 1.01 (0.81,1.21) | 0.15 (-0.01,0.22) |
weak | full | -1.16 (-1.52,-0.93) | 1.83 (1.56,2.47) | 0.21 (0.14,0.34) |

While the parameter ranges we have chosen are relatively broad, the possibility exists that we have missed the true parameter range of HIV infection. Our method does not eliminate parameter dependence, but by choosing a broad range of parameter values, we are more likely to have captured true dynamics than if we simply specified particular values for _{I} and

_{1} and _{2} (_{1}, _{2} and eight different haplotypes as infection progressed, dynamics suggestive of epitope shattering [_{I} and

We use experimental data on kinetics of HIV escape from multiple CTL responses (see ^{−1} units.

patient | epitopes | lower bound | intermediate | upper bound | single epitope | previous |
---|---|---|---|---|---|---|
CH40 | POL80 | 0.08 | 0.17 | 0.27 | -0.01 | 0.02 |
CH40 | VIF57 | 0.03 | 0.11 | 0.2 | 0.09 | 0.03 |
CH58 | ENV830 | 0.11 | 0.22 | 0.34 | 0.17 | 0.12 |
CH58 | GAG240 | 0.07 | 0.17 | 0.28 | 0.08 | 0.08 |
CH58 | NEF105 | 0.05 | 0.15 | 0.26 | 0.09 | 0.07 |
CH77 | ENV350 | 0.28 | 0.77 | 3.22 | 0.21 | 0.36 |
CH77 | NEF17 | 0.11 | 0.49 | 2.34 | 0.04 | 0.30 |
CH77 | VPU57 | 0.15 | 0.55 | 2.53 | 0.01 | 0.05 |
CH77 | NEF73 | 0.22 | 0.66 | 2.87 | 0.06 | 0.29 |
CH77 | ENV605 | 0.22 | 0.66 | 2.87 | 0.01 | 0.01 |
CH256 | VIF169 | 0.02 | 0.08 | 0.14 | 0 | 0.04 |
CH256 | NEF185 | 0.06 | 0.13 | 0.19 | 0 | 0.08 |
CH256 | ENV799 | 0.03 | 0.09 | 0.15 | 0.07 | 0.03 |
CH256 | POL393 | 0.03 | 0.09 | 0.15 | -0.02 | 0.03 |
CH256 | ENV606 | 0.03 | 0.1 | 0.16 | 0.02 | 0.03 |

For CH58 and CH77, the lower bounds demonstrate that escape can proceed concurrently, with rates exceeding 0.05 day^{−1} at multiple epitopes. The lower bounds are less informative for CH40 and CH256. For both these patients, a single epitope has a lower bound escape rate exceeding 0.05 day^{−1}, but the other bounds are lower, roughly 0.03 day^{−1}.

Across all patients, the upper bound escape rates are significant, allowing for the possibility of fast escape at multiple epitopes. However, with the exception of patient CH77, the upper bounds do not exceed 0.4 day^{−1}, roughly the upper range seen in previous studies that considered escape separately at each epitope [

The large difference between our lower and upper bounds reflects a large range of dynamics consistent with our patient datasets. Therefore, current acute HIV infection datasets based on SGA sampling and sparse temporal sampling cannot accurately estimate escape rates without significant modeling assumptions. Our intermediate estimates provide one case of such modeling assumptions. The estimates, formed by typical values for t_{I} seen in simulation, demonstrate that significant escape rates of roughly 0.1–0.2 day^{−1} across multiple epitopes are consistent with our data and reflect reasonable assumptions.

The single escape rate estimates tend to fall near our lower bound estimates, possibly reflecting a downward bias of ^{−1} and our novel estimates suggest at least an escape rate of 0.22 day^{−1}, a 22-fold increase. Since single estimates of escape rates are based on samples over a longer time period, their downward bias relative to our estimates suggests that escape rates may indeed slow down as time progresses as has been suggested previously [

Overall, our analysis strongly suggests that escape in acute HIV infection occurs concurrently from multiple CTL responses and proceeds at significant rates. However, whether escape rates reach the ranges of 0.1–0.2 day^{−1} suggested by our intermediate values cannot be determined without more assumptions (leading to narrower rate for parameters _{I}, and

We have presented a novel method for estimating the rate of concurrent escape of HIV from multiple CTL responses that can be applied to our dataset and potentially other datasets. The method is based on an escape graph representing the mutation pathways through which HIV evades CTL response. Our method extends the logistic model of [

Our results suggest that CTL escape can occur concurrently at multiple epitopes, with escape rates ranging between 0.03 and 0.4 day^{−1} across multiple epitopes. The upper bound of 0.4 day^{−1} is in line with upper bounds for CTL escapes found using the logistic model [

Pandit and De Boer [

Kessinger et al. [

Da Silva calculated an effective population size (_{e}) for HIV of 10^{2}–10^{3} during early infection based on a census population on the order of 10^{7}. In simulations of CTL escape based on a Wright-Fisher model with _{e} of 10^{2}–10^{3}, da Silva observed that escapes do not happen concurrently. Since our simulations assume census population sizes similar to da Silva’s (i.e. 10^{7}), combining our results with those of da Silva raises the possibility that a Wright-Fisher model does not accurately reflect HIV escape dynamics, as has been suggested by previous authors [

Concurrent viral escape variants may affect each other indirectly through competition for target cells. Previous authors have considered such interactions and the potential role of clonal interference on viral evolution, see the reviews [_{I} and

Our approach and results come with the statistical caveat that a biased escape graph will produce biased estimates. For example, the escape graph will be biased when many low frequency variants are present. To see why, consider an extreme example of CTL escape at 100 epitopes and imagine that 100 variants exist in the viral population, each variant mutated at a single epitope and each variant at frequency 1%. If we form an escape graph based on sampling 10 sequences, the most likely outcome is 10 different variants, each with sample frequency 10% but with true frequency 1%. Dataset CH77 has a sampling pattern consistent with such biasing. In general, formation of the escape graph is statistically complex and other forms of bias may exist, although such biases did not arise in our simulations. Exploration of this issue requires further work.
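The extreme example above can be checked with a short calculation. The sketch below assumes, as in the example, 100 variants each at a true frequency of 1% and 10 sequences drawn independently; the function name is ours. It computes the probability that all 10 sampled sequences are distinct variants, in which case every sampled variant has a sample frequency of 1/10 = 10% despite a true frequency of 1%.

```python
from math import prod


def prob_all_distinct(n_variants, n_samples):
    """Probability that n_samples independent draws from n_variants
    equally frequent variants are all different from one another."""
    return prod((n_variants - i) / n_variants for i in range(n_samples))


n_variants, n_samples = 100, 10
p_distinct = prob_all_distinct(n_variants, n_samples)

true_frequency = 1 / n_variants      # 1% for every variant
sample_frequency = 1 / n_samples     # 10% for each variant seen exactly once

print(f"P(all {n_samples} sampled sequences are distinct) = {p_distinct:.3f}")
print(f"true frequency {true_frequency:.0%}, sample frequency {sample_frequency:.0%}")
```

Under these assumptions the probability is roughly 0.63, so sampling 10 distinct variants is indeed the single most likely outcome, and each sampled variant's frequency is overstated tenfold.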

Much of our approach follows from the sparse sampling of current datasets. While the rise of deep sequencing datasets addresses the shallowness of sampling, understanding the complexity of early escape requires linkage information and would benefit from more sampling time points. Our methods and results highlight the importance of better sampled datasets for understanding early HIV dynamics and evolution.

The left escape graph of

See text for details of simulations and

Our goal in estimating escape rates is to quantify the CTL killing rates associated with CTL response at different epitopes. When multiple paths exist between two vertices, multiple edges must correspond to escape at the same epitope: here edges

Ideally, we would estimate escape rates at all edges and this would provide information about CTL response in the context of different haplotypes. For example, in

In our datasets, the escape graphs of patients CH40 and CH77 require no pruning to reduce them to escape trees. The escape graph of patient CH58 requires the removal of two vertices, but the associated variants are at very low frequency at t_{1}, 0.02 and 0.04 respectively, and are not sampled at t_{2}. In CH256 we remove three vertices to produce an escape tree, with all three corresponding variants unsampled at t_{1} and sampled at 0.04 frequency at t_{2}, see

To perform simulations of viral escape, we use the model of Batorsky et al. [

For each simulation we track the CTL kill rate at a given epitope, providing us with the true escape rate (i.e. _{1} = 30 and _{2} = 60, form an escape graph based on the simulated sequences, and then estimate the escape rate,

_{on}, _{max,i}, _{i}, _{max,i} shown are for the strong CTL response simulation. Weak CTL response simulation differ only in _{max,i} = .12_{rec}, and

parameter | meaning (units) | Dominant | Subdominant |
---|---|---|---|
 | number of epitopes | 1 | 5 |
_{on} | time response initiates (day) | 14 | 20 + 10 × |
_{max,i} | maximum kill rate (day^{−1}) | 0.4 | 0.3 × |
_{i} | saturation constant (log10(infected cells)) | 3 | 3 |
 | proliferation rate (day^{−1}) | 2 | 1.2 |
 | contraction rate (day^{−1}) | 0.4 | 0.4 |
 | recombination rate per nucleotide (day^{−1}) | 1.4/5 × 10^{−5} | |
_{rec} | breakpoints per recombination | 5 | |
 | mutation rate per epitope (day^{−1}) | 10^{−4} | |

To model CTL response at _{i}(_{i}(_{on,i}, where _{on,i} represents a time at which CTL response to epitope _{on,i}, _{i}(_{i}(_{on,i}) = 0.01 giving the initial condition of the response. (See _{i}(_{max,i}. Rates of expansion and contraction for the immune response (

At time 0, the population starts with 1 variant possessing all 6 epitopes. The population size ^{7} within the first 21 days, following estimates of the total number of infected cells in the body, and then collapses to 10^{4.5} over the following two weeks and subsequently holds steady [^{7} to 10^{4.5} variants is larger than estimated in [^{−4}.

Recombination occurs at rate (_{A} _{B} with ^{−5} and where _{A}, _{B} are the frequencies of the recombining variants

For full details regarding patient datasets CH40, CH58, CH77, and CH256 see [

For patients CH40 and CH58, we apply our methods to the first two time points sampled after the onset of symptoms. Escape in CH77 was very fast and broad, encompassing 9 loci by the first time point sampled, day 14 post symptoms. To study this early escape, we assume that 10 days prior to symptoms, a time likely about 5 days prior to peak viral load, sampling would have been homogeneous for the founder variant. We then use 10 days prior to symptoms as our first time point and day 14 post symptoms as our second time point. For CH256, the first time point sampled was homogeneous for the founder variant and the second time point possessed variation at a single locus, so we use the second and third time points as our two sample times.

To choose lower bounds for t_{I}, we estimated the patients’ peak viral load times and picked a time 5 days earlier. t_{I}, t_{1}, and t_{2} that we use for each patient dataset along with the number of samples available for the 5’ and 3’ end at each time point.

For CH77, all of the putative epitopes are on the 3’ end, so construction of the escape graph follows directly from the data. For the other three patients, we construct a full escape graph by attaching 5’ edges onto the 3’ escape graphs. CH58 and CH256 each have only a single putative epitope on the 5’ end, meaning we form the full escape graph by adding a single edge to the 3’ escape graph. CH40 has putative epitopes evenly split between the 5’ and 3’ ends. Since we lack linkage information, the full escape graph represents a guess on our part. However, we attach 5’ escapes to parts of the 3’ escape graphs near the root vertex; if the 5’ escapes were actually attached further from the root, our estimated escape rates would be higher.


Shown are CTL kill rate profiles targeting 6 viral epitopes (panel A), the epitope mutation frequencies (panel B), the variant frequencies (panel C), and the rate at which each variant population produces mutants (panel D). The kill rate for a given variant was the sum of the kill rates across all epitopes in the variant haplotype, meaning that we assumed additive killing across epitopes for which there was a ‘0’ in the variant label shown in the legend. Epitope mutation frequencies were computed by summing up the frequencies of all variants mutated at the given epitope. The simulation was run with t_{1} = 30 and t_{2} = 60. The census population size ^{7} over the first 3 weeks of infection, collapsed to 10^{4.5} over the next two weeks, and then held steady. We assumed no fitness cost of the escape mutations in these simulations (i.e., same replicative fitness for all variants). In Panel D, the rate (day^{−1}) at which 000000 variants mutate rises to roughly 1000; we plot on a more modest scale to make the other variant mutation rates visible. The sudden changes in slope seen in Panel D for variant 100000 at times 21 and 35 reflect the sudden change in the


We simulated HIV evolution using a stochastic model as described in the Methods and graph the pathways of viral escape from 6 CTL responses. Panel A shows the escape graph generated by considering all variants with frequencies greater than 0.01 at either t_{1} or t_{2}, and panel B shows the escape graph generated by random sampling of 15 sequences at times t_{1} and t_{2}. For example, 2 of the 15 samples at t_{1} were viral variant 100000, which is a frequency of 13% as shown in panel B. Edges in the escape graph give the epitope mutated in moving from parent to child. Initial and expansion variants are colored red and yellow, respectively.


Shown are the total number of nucleotide sites spanned by all such epitopes and, in parentheses, the percentage of the viral genome covered by such epitopes (epitope sites); the total number of variable sites within all such epitopes and, in parentheses, the percentage of these variable sites relative to the number of variable sites across the viral genome (epitope variable sites); and the p-value assuming all sites across the genome are equally likely to be variable (p-value).


All times are in units of days since the onset of symptoms. 5’ samples and 3’ samples give the number of sequences sampled for each half genome at t_{1} and t_{2}, respectively.


SL thanks NIMBIOS members and staff for gracious hospitality and support throughout the short-term visit during which this work was initiated.