
Conceived and designed the experiments: NMD. Performed the experiments: RB VS ARS. Analyzed the data: RB NMD. Contributed reagents/materials/analysis tools: RB VS ARS NMD. Wrote the paper: NMD.

The authors have declared that no competing interests exist.

Whether HIV-1 evolution in infected individuals is dominated by deterministic or stochastic effects remains unclear because current estimates of the effective population size of HIV-1, N_{e}, vary widely. Models assuming neutral evolution estimate N_{e} ∼ 10^{2}–10^{4}, smaller than the inverse mutation rate of HIV-1 (∼10^{5}), implying the predominance of stochastic forces. In contrast, a model that includes selection estimates N_{e} > 10^{5}, suggesting that deterministic forces would hold sway. The consequent uncertainty in the nature of HIV-1 evolution compromises our ability to describe disease progression and outcomes of therapy. We perform detailed bit-string simulations of viral evolution that consider large genome lengths and incorporate the key evolutionary processes underlying the genomic diversification of HIV-1 in infected individuals, namely, mutation, multiple infections of cells, recombination, selection, and epistatic interactions between multiple loci. Our simulations describe quantitatively the evolution of HIV-1 diversity and divergence in patients. From comparisons of our simulations with patient data, we estimate N_{e} ∼ 10^{3}–10^{4}, implying predominantly stochastic evolution. Interestingly, we find that the estimate of N_{e} > 10^{5} obtained with selection reduces as the frequencies of multiple infections of cells and recombination assumed increase. Our simulations with N_{e} ∼ 10^{3}–10^{4} may be employed to estimate markers of disease progression and outcomes of therapy that depend on the evolution of viral diversity and divergence.

The within-host genomic evolution of HIV-1 is driven by both deterministic forces such as selection and stochastic forces such as random genetic drift. The large census population of HIV-1 infected cells, ∼10^{7}–10^{8} in a typical patient, would suggest deterministic evolution. Current estimates of the effective population size, N_{e}, however, range from ∼10^{2} to more than 10^{5} cells, which leaves unclear the nature of HIV-1 evolution …
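Whether evolution is predominantly stochastic hinges on how N_{e} compares with the inverse mutation rate. The sketch below makes that comparison explicit. It assumes the mutation rate of 3×10^{−5} substitutions per site per replication quoted in this paper and uses N_{e}μ < 1 as a rough criterion for the stochastic regime; the helper names are ours, and the sharp cutoff at 1 is a simplification, not a result of the paper.

```python
MU = 3e-5  # substitutions per site per replication (value quoted in the text)


def n_e_mu(n_e: float, mu: float = MU) -> float:
    """Expected number of new mutations at a given site per generation across N_e genomes."""
    return n_e * mu


def regime(n_e: float, mu: float = MU) -> str:
    """Rough classification: 'stochastic' when N_e * mu < 1, i.e. N_e below 1/mu."""
    return "stochastic" if n_e_mu(n_e, mu) < 1.0 else "deterministic"


if __name__ == "__main__":
    print(f"inverse mutation rate 1/mu = {1 / MU:.2g}")
    for n_e in (1e2, 1e3, 1e4, 1e5):
        print(f"N_e = {n_e:.0e}: N_e*mu = {n_e_mu(n_e):.2g} -> {regime(n_e)}")
```

Under these assumptions, N_{e} = 10^{3}–10^{4} gives N_{e}μ ≈ 0.03–0.3 (stochastic), whereas N_{e} = 10^{5} gives N_{e}μ ≈ 3. Since 1/μ ≈ 3×10^{4}, the threshold lies between 10^{4} and 10^{5}, consistent with the order-of-magnitude value of ∼10^{5} for the inverse mutation rate.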


To estimate N_{e}, … studies obtained N_{e} ∼ 10^{2}–10^{4}. These latter studies employed several tests to ascertain the predominant neutrality of HIV-1 evolution. More recent evidence, however, points to significant selective pressures on both the … . A model that includes selection estimates N_{e} > 10^{5}. The latter model, however, did not include recombination. Growing evidence …

Substantial efforts are ongoing to describe HIV-1 evolution in the presence of recombination …

Our aim is to employ a model of HIV-1 evolution that accurately mimics viral genomic diversification in infected individuals as a function of the population size and to estimate N_{e} …

We perform simulations to predict the evolution of viral diversity and divergence …

We employ parameter values representative of HIV-1 infection, including a mutation rate of 3×10^{−5} substitutions per site per replication and a recombination rate of the order of 10^{−4} crossovers per site per replication, … N_{e} ∼ 10^{2}–10^{4} …


The evolution of (A) viral diversity and (B) divergence …

In a previous study, we examined in detail the influence of variations in parameter values on the evolution of viral diversity and divergence …

Of importance here is that viral diversity and divergence … N_{e} …

In one of the most comprehensive longitudinal studies, Shankarappa et al. determined the evolution of diversity and divergence of the C2-V5 region of the env gene …

With data from each patient, we compare our predictions of viral diversity and divergence … to estimate N_{e} …
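The comparison amounts to choosing the candidate parameters that minimize the sum of squares of the errors (SSE) between patient data and predictions. A minimal sketch, assuming predicted diversity and divergence are already tabulated at the sampling times of the data for each candidate N_{e} (or each (N_{e}, τ) pair), and that the two SSEs are simply added with equal weight; the helper names and all numbers are hypothetical placeholders, not patient data.

```python
from typing import Hashable, Mapping, Sequence, Tuple


def sse(data: Sequence[float], prediction: Sequence[float]) -> float:
    """Sum of squares of the errors between observed and predicted values."""
    if len(data) != len(prediction):
        raise ValueError("data and prediction must have the same length")
    return sum((d - p) ** 2 for d, p in zip(data, prediction))


def best_fit(
    data_diversity: Sequence[float],
    data_divergence: Sequence[float],
    predictions: Mapping[Hashable, Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[Hashable, float]:
    """Return the candidate key (e.g. N_e, or an (N_e, tau) pair) with the smallest total SSE.

    Each prediction is a (diversity, divergence) pair sampled at the same times as the data.
    """
    scores = {
        key: sse(data_diversity, div) + sse(data_divergence, dvg)
        for key, (div, dvg) in predictions.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not patient data.
    data_div = [0.0, 0.010, 0.020]
    data_dvg = [0.0, 0.015, 0.030]
    predictions = {
        1_000: ([0.0, 0.012, 0.021], [0.0, 0.014, 0.031]),
        10_000: ([0.0, 0.020, 0.040], [0.0, 0.016, 0.032]),
    }
    print(best_fit(data_div, data_dvg, predictions))
```

Keying the predictions by (N_{e}, τ) pairs lets the same function scan both parameters at once.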

Sum of squares of the errors (SSE) between data from patients and our predictions of viral diversity and divergence … N_{e} …

Best-fit predictions of our simulations (solid lines) presented with experimental data (symbols) of the evolution of viral diversity and divergence … N_{e} …

Patient | Effective population size, N_{e} (…) | Viral generation time, τ (days) (…) | Effective population size, N_{e} (…) | Viral generation time, τ (days) (…) | Disease progression time (months) |
--- | --- | --- | --- | --- | --- |
1 | 400 | 0.8 | 1500 | 1.0 | 78 |
2 | 1000 | 1.0 | 5000 | 1.2 | 96 |
3 | 1000 | 1.1 | 5000 | 1.3 | 84 |
5 | 500 | 0.7 | 1500 | 0.7 | 72 |
6 | 400 | 0.7 | 1500 | 0.9 | 60 |
7 | 1500 | 1.0 | 5000 | 1.0 | 78 |
8 | 1500 | 1.2 | 10000 | 1.4 | 72 |
9 | 5000 | 1.4 | 10000 | 1.3 | 132 |
11 | 10000 | 1.7 | 100000 | 1.8 | 144 |

Remarkably, we find that both N_{e} and τ are correlated with the disease progression time, i.e., the time for the CD4^{+} T cell count to drop below 200 cells/µL (…) … N_{e} …
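A rough check of this correlation against the table of best-fit values is sketched below. It assumes Pearson's correlation coefficient, uses the first pair of N_{e} and τ columns, and correlates log10 N_{e} rather than N_{e}; these are our choices and not necessarily the statistic used in the paper.

```python
import math
from statistics import mean

# (N_e, tau in days, disease progression time in months) for patients
# 1, 2, 3, 5, 6, 7, 8, 9, 11, taken from the first N_e / tau columns of the table.
ROWS = [
    (400, 0.8, 78), (1000, 1.0, 96), (1000, 1.1, 84),
    (500, 0.7, 72), (400, 0.7, 60), (1500, 1.0, 78),
    (1500, 1.2, 72), (5000, 1.4, 132), (10000, 1.7, 144),
]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


log_ne = [math.log10(n) for n, _, _ in ROWS]
tau = [t for _, t, _ in ROWS]
progression = [p for _, _, p in ROWS]

r_ne = pearson(log_ne, progression)
r_tau = pearson(tau, progression)
print(f"r(log10 N_e, progression time) = {r_ne:.2f}")
print(f"r(tau, progression time) = {r_tau:.2f}")
```

Both coefficients come out positive: in these nine patients, smaller N_{e} and shorter τ accompany shorter disease progression times.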

Correlation of (A) N_{e} and (B) τ with the time for the CD4^{+} T cell count to fall to 200 cells/µL.

We next examine the effect of a smaller frequency of multiple infections, where … . … We find N_{e} ∼ 10^{3}–10^{4} except for one patient (Patient 11), for whom N_{e} ∼ 10^{5}, indicating the predominance of stochastic forces underlying HIV-1 evolution. Again, we find that both N_{e} and τ …

The evolution of (A) viral diversity and (B) divergence …

Sum of squares of the errors (SSE) between data from patients and our predictions of viral diversity and divergence … N_{e} …

Best-fit predictions of our simulations (solid lines) presented with experimental data (symbols) of the evolution of viral diversity and divergence … N_{e} …

We examine next whether the nature of the fitness landscape has any influence on our estimates of N_{e}. … N_{e} …

The evolution of viral diversity …

Our estimates of N_{e} … N_{e} …

Rouzine and Coffin argue that deep in the stochastic regime, one of the four haplotypes in a two-locus/two-allele model is expected to be underrepresented because of the strong influence of drift. As N_{e} … N_{e} …

To validate our results against those of Rouzine and Coffin, we first perform simulations under the conditions they employ. We let …^{2} relative to the wild-type. We let the founder sequence be the double mutant. Each cell is infected with a single virion (…).

Following Rouzine and Coffin, we perform simulations over a range of values of … ; … (…^{−5}) yields an equivalent population size corresponding to the HIV mutation rate of 3×10^{−5} substitutions per site per replication. … N_{e} … N_{e} > 10^{5} for evolution with selection, as deduced by Rouzine and Coffin. We note, as recognized by Rouzine and Coffin, that the mean experimental … N_{e} … 10^{5}, whereas the 95% confidence limit on the experimental data extends up to N_{e} … ; … 10^{5} is a lower bound and N_{e} … 10^{6}.
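The Rouzine and Coffin test can be illustrated with a two-locus/two-allele Wright–Fisher model. A minimal sketch, assuming one virion per cell (so no recombination), the double mutant as the founder, a multiplicative fitness cost s per mutant allele (so the double mutant has relative fitness (1 − s)^{2}), symmetric mutation at each locus, and multinomial sampling of n genomes per generation. The values of s and mu below are illustrative and are not those of Rouzine and Coffin or of the paper.

```python
import numpy as np

# Haplotypes at two biallelic loci; 1 marks the mutant allele.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def least_abundant_frequency(n, s=0.1, mu=1e-3, generations=2000, burn_in=500, seed=0):
    """Time-averaged frequency of the rarest of the four haplotypes.

    Wright-Fisher dynamics with selection, mutation, and multinomial sampling of
    n genomes per generation; no recombination (one virion per cell).
    """
    if burn_in >= generations:
        raise ValueError("burn_in must be smaller than generations")
    rng = np.random.default_rng(seed)
    n_mutant = np.array([a + b for a, b in HAPLOTYPES])
    fitness = (1.0 - s) ** n_mutant  # assumed multiplicative cost per mutant allele
    # mut[i, j]: probability that haplotype i is copied as haplotype j.
    dist = np.array([[sum(x != y for x, y in zip(hi, hj)) for hj in HAPLOTYPES]
                     for hi in HAPLOTYPES])
    mut = mu ** dist * (1.0 - mu) ** (2 - dist)
    p = np.array([0.0, 0.0, 0.0, 1.0])  # founder: the double mutant
    rarest = []
    for g in range(generations):
        p = p * fitness
        p = p / p.sum()                 # selection
        p = p @ mut                     # mutation
        p = p / p.sum()
        p = rng.multinomial(n, p) / n   # random genetic drift
        if g >= burn_in:
            rarest.append(p.min())
    return float(np.mean(rarest))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, least_abundant_frequency(n))
```

The mutation rate here is far above the HIV value of 3×10^{−5} to keep runs short, and the rescaling to an equivalent population size used by Rouzine and Coffin is not reproduced; the sketch only shows how the time-averaged frequency of the rarest haplotype can be tracked as n varies.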

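The frequency of the least abundant haplotype in a two-locus/two-allele model determined from our simulations (solid symbols) and by Rouzine and Coffin (…), with the rate constant of each successive infection equal to that of the infection of uninfected cells (green) or scaled relative to that of the preceding infection (orange). Error bars represent standard deviations. Values of N_{e} …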

Recombination can increase … N_{e} … , where …^{−4} crossovers per position per replication is the recombination rate. A cell infected by proviruses with mean fitness … . We perform simulations with a mutation rate of 3×10^{−5} substitutions per site per replication over a range of values of N_{e} … N_{e} ∼ 10^{2}–10^{4} (corresponding to the 95% confidence interval on …) … N_{e} …
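Template switching between two proviruses in a multiply infected cell can be sketched with a fixed crossover probability per site. A minimal sketch, assuming bit-string genomes stored as lists of 0/1, a per-site crossover probability rho (the text quotes a recombination rate of the order of 10^{−4} crossovers per position per replication), and a randomly chosen starting template; `recombine` is a hypothetical helper, not the paper's code.

```python
import random
from typing import List


def recombine(parent_a: List[int], parent_b: List[int], rho: float,
              rng: random.Random) -> List[int]:
    """Copy one parent site by site, switching template with probability rho between adjacent sites."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    parents = (parent_a, parent_b)
    current = rng.randrange(2)  # template used for the first site
    child = []
    for site in range(len(parent_a)):
        if site > 0 and rng.random() < rho:
            current = 1 - current  # crossover: switch to the other template
        child.append(parents[current][site])
    return child


if __name__ == "__main__":
    rng = random.Random(1)
    a, b = [0] * 20, [1] * 20
    print(recombine(a, b, rho=0.1, rng=rng))
```

For a genome of L sites, the expected number of crossovers per replication in this sketch is rho × (L − 1); a crossover changes the progeny only where the two parental genomes differ.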

Next, we let the rate constant of infection of already infected cells equal that of the infection of uninfected cells, so that ∼70% of the infected cells are singly infected, ∼21% are doubly infected, and so on. Then, we find that N_{e} is of the order of 10^{4} cells, higher than the estimate with … . We next let the rate constant of each infection be scaled relative to that of the preceding infection (…), which reduces the frequency of multiple infections even further, so that ∼77% of the infected cells are singly infected, ∼19% are doubly infected, and so on. With this distribution, we find N_{e} > 10^{5}, consistent with the estimate of Rouzine and Coffin. Note that when we ignore multiple infections of cells (…), N_{e} …

The widely varying prevalent estimates of N_{e} … N_{e} … . We estimate N_{e} ∼ 10^{3}–10^{4}, substantially smaller than the inverse mutation rate of HIV-1, implying the predominance of stochastic forces underlying HIV-1 evolution …

The best-fit values of the viral generation time, τ, that we obtain agree well with prevalent estimates. For the nine patients we consider, the mean τ is 1.1–1.2 days (range 0.7–1.8 days). Previous studies estimate τ to be ∼1.2 days (range 0.65–2.97 days) using a coalescent approach.
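The quoted mean and range can be recomputed directly from the two τ columns of the table of best-fit values; this is plain arithmetic on the tabulated numbers.

```python
from statistics import mean

# Best-fit viral generation times (days) for patients 1, 2, 3, 5, 6, 7, 8, 9, 11,
# read from the two tau columns of the table.
TAU_FIRST = [0.8, 1.0, 1.1, 0.7, 0.7, 1.0, 1.2, 1.4, 1.7]
TAU_SECOND = [1.0, 1.2, 1.3, 0.7, 0.9, 1.0, 1.4, 1.3, 1.8]

for label, taus in (("first tau column", TAU_FIRST), ("second tau column", TAU_SECOND)):
    print(f"{label}: mean = {mean(taus):.2f} days, range = {min(taus)}-{max(taus)} days")

pooled = TAU_FIRST + TAU_SECOND
print(f"pooled range = {min(pooled)}-{max(pooled)} days")
```

The first column averages about 1.07 days and the second about 1.18 days, which round to the 1.1–1.2 days quoted, and the pooled range is 0.7–1.8 days.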

We find, remarkably, that N_{e} correlates with disease progression: a smaller N_{e} … leads to a shorter time for the CD4^{+} T cell count to drop below 200 cells/µL, and thus to faster disease progression. A small τ implies fast viral replication and hence rapid disease progression. The origins of the correlation between N_{e} and disease progression …

Our simulations offer an explanation for the wide variation in the prevalent estimates of N_{e}, ∼10^{2}–10^{4} … and >10^{5} … : … N_{e} decreases from >10^{5}, in agreement with Rouzine and Coffin, to ∼10^{2} (…) …

We note that with the same frequency of multiple infections (…), … N_{e} of the order of 10^{4} (…) … viral diversity and divergence … N_{e} …

Several factors may underlie the small values of N_{e} …

Our study has limitations. First, the fitness landscape employed in our simulations (…) …

The relative fitness of a genome as a function of its Hamming distance from sequence F, with a minimum fitness of 0.24 … .

Second, in some of the nine patients we consider, viral diversity rises to a peak and then drops to a plateau, whereas the best fits of our simulations predict a monotonic rise of diversity to the plateau (…).

Nonetheless, our simulations incorporate the key evolutionary forces that govern HIV-1 diversification.

The predominance of stochastic forces predicted by our estimates of N_{e} …

We consider a fixed population, …

We let infections occur in discrete generations. In any generation, each cell is infected by …
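One generation of a bit-string simulation with a fixed population size can be sketched as fitness-weighted resampling of parent genomes followed by point mutation. A minimal sketch, assuming one virion per cell (the multiple infections and recombination of the paper's model are left out), a caller-supplied fitness function, and independent bit flips with probability mu per site; `next_generation` is a hypothetical name.

```python
import random
from typing import Callable, List

Genome = List[int]


def next_generation(pop: List[Genome], fitness: Callable[[Genome], float],
                    mu: float, rng: random.Random) -> List[Genome]:
    """Wright-Fisher step: resample parents in proportion to fitness, then mutate."""
    weights = [fitness(g) for g in pop]
    # Sample as many parents as there are genomes, so the population size stays fixed.
    parents = rng.choices(pop, weights=weights, k=len(pop))
    offspring = []
    for parent in parents:
        # Flip each bit independently with probability mu.
        child = [bit ^ 1 if rng.random() < mu else bit for bit in parent]
        offspring.append(child)
    return offspring


if __name__ == "__main__":
    rng = random.Random(42)
    length, n = 100, 50
    pop = [[0] * length for _ in range(n)]  # start from a homogeneous founder
    for _ in range(10):
        pop = next_generation(pop, lambda g: 1.0, mu=1e-3, rng=rng)  # neutral fitness
    print(len(pop), len(pop[0]))
```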

In each generation, we compute the average diversity from the Hamming distances between pairs of sequences i and j, … .
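Diversity and divergence of a bit-string population can be computed from Hamming distances. A minimal sketch, assuming diversity is the mean pairwise Hamming distance per site over all distinct pairs of sequences and divergence is the mean Hamming distance per site from the founder; these definitions and the function names are our assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of sites at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def diversity(pop: List[Sequence[int]]) -> float:
    """Mean pairwise Hamming distance per site over all distinct pairs."""
    if len(pop) < 2:
        raise ValueError("need at least two sequences")
    length = len(pop[0])
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * length)


def divergence(pop: List[Sequence[int]], founder: Sequence[int]) -> float:
    """Mean Hamming distance per site from the founder sequence."""
    return sum(hamming(g, founder) for g in pop) / (len(pop) * len(founder))


if __name__ == "__main__":
    founder = [0, 0, 0, 0]
    pop = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
    print(diversity(pop), divergence(pop, founder))
```

The all-pairs loop costs O(n^{2}L) for n sequences of length L, which is fine for a sketch; for large populations, per-site allele counts give the same mean pairwise distance more cheaply.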

Several realizations of the infection process are averaged to obtain the expected evolution of viral diversity and divergence.

For selection, we employ the recently determined experimental fitness landscape for HIV-1, which quantifies the fitness of a genome, i, as a function of its Hamming distance from sequence F … . The minimum fitness, 0.24, is that of sequences at arbitrarily large absolute Hamming distances from F; …_{50} … .
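The functional form of the landscape is not recoverable from this text. Purely for illustration, a smooth decreasing function with the two properties stated here, fitness 1 at zero Hamming distance from F and a floor of 0.24 at large distances, could look like the Hill-type curve below. The Hill form, the steepness `n`, and the half-way distance `h50` are assumptions of ours, not the experimentally determined landscape.

```python
F_MIN = 0.24  # minimum fitness quoted in the text


def relative_fitness(h: float, h50: float, n: float = 2.0, f_min: float = F_MIN) -> float:
    """Hypothetical Hill-type landscape: 1 at h = 0, approaching f_min as h grows.

    h is the Hamming distance from sequence F; at h = h50 the fitness is halfway
    between 1 and f_min. This is NOT the paper's landscape.
    """
    if h < 0 or h50 <= 0:
        raise ValueError("need h >= 0 and h50 > 0")
    return f_min + (1.0 - f_min) / (1.0 + (h / h50) ** n)


if __name__ == "__main__":
    for h in (0, 5, 10, 20, 100):
        print(h, round(relative_fitness(h, h50=10.0), 3))
```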

Patient | Synonymous substitution rate (×10^{−4} per site per month) | Non-synonymous substitution rate (×10^{−4} per site per month) |
--- | --- | --- |
1 | 3.5 | 5.4 |
2 | 3.1 | 8.7 |
3 | 3.7 | 14.1 |
5 | 7.9 | 8.5 |
6 | 4.5 | 5.2 |
7 | 2.2 | 9.5 |
8 | 3.7 | 9.2 |
9 | 1.6 | 5.3 |

To estimate the frequency of multiple infections of cells from viral dynamics, we consider the following model:

Here, uninfected CD4^{+} cells, … , are infected with a second-order rate constant … . The latter infections produce singly infected cells, which in turn are lost either by further infections, with a second-order rate constant, or by death … .

We solve these equations using parameter values representative of HIV-1 infection (…^{5} cells/ml/day, … , … = 2.4×10^{−8} ml/day, δ = 1/day, …^{3} virions/cell, and …). … = 0.7 … captured … . Cell-to-cell transmission could also result in a high frequency of multiply infected cells (…).

At long times following the onset of infection, the above equations predict that infection reaches a steady state. The steady-state populations of … . In contrast, when …, at steady state ∼77% of the infected cells are singly infected, ∼19% are doubly infected, ∼3.4% are triply infected, ∼0.5% are quadruply infected, ∼0.04% are quintuply infected, and ∼0.002% are sextuply infected.
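The steady-state distributions follow from setting the time derivatives of the infected-cell populations to zero. A minimal sketch, assuming the usual form of such multiple-infection models: cells carrying i proviruses, T_{i}, die at a common rate δ and are infected again with a second-order rate constant k_{i} by virions at a fixed steady-state concentration V (T_{i}, k_{i}, and V are our notation). With a_{i} = k_{i}V/δ, steady state gives T_{i}/T_{i−1} = a_{i−1}/(1 + a_{i}) for i ≥ 2; if all a_{i} equal a, the fractions are geometric, f_{i} = (1 − r)r^{i−1} with r = a/(1 + a).

```python
from typing import Callable, List


def infected_fractions(a: Callable[[int], float], i_max: int = 50) -> List[float]:
    """Steady-state fractions of infected cells carrying 1..i_max proviruses.

    a(i) is the dimensionless reinfection rate k_i * V / delta (assumed notation).
    Uses T_i / T_{i-1} = a(i-1) / (1 + a(i)) for i >= 2; T_1 is set to 1
    because only relative sizes matter after normalization.
    """
    sizes = [1.0]
    for i in range(2, i_max + 1):
        sizes.append(sizes[-1] * a(i - 1) / (1.0 + a(i)))
    total = sum(sizes)
    return [x / total for x in sizes]


if __name__ == "__main__":
    # Equal rate constants for all infections: pick a so that r = a / (1 + a) = 0.3.
    a_equal = 0.3 / 0.7
    fractions = infected_fractions(lambda i: a_equal)
    print([round(x, 3) for x in fractions[:4]])
```

Choosing r = 0.3 reproduces the ∼70% singly and ∼21% doubly infected split quoted for equal rate constants (the triply infected fraction is then 6.3%). The second distribution depends on how the rate constants fall with i, which is not recoverable here.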
