## Abstract

The ability to accelerate the accumulation of favorable combinations of mutations renders recombination a potent force underlying the emergence of forms of HIV that escape multi-drug therapy and specific host immune responses. We present a mathematical model that describes the dynamics of the emergence of recombinant forms of HIV following infection with diverse viral genomes. Mimicking recent in vitro experiments, we consider target cells simultaneously exposed to two distinct, homozygous viral populations and construct dynamical equations that predict the time evolution of populations of uninfected, singly infected, and doubly infected cells, and homozygous, heterozygous, and recombinant viruses. Model predictions capture several recent experimental observations quantitatively and provide insights into the role of recombination in HIV dynamics. From analyses of data from single-round infection experiments with our description of the probability with which recombination accumulates distinct mutations present on the two genomic strands in a virion, we estimate that ∼8 recombinational strand transfer events occur on average (95% confidence interval: 6–10) during reverse transcription of HIV in T cells. Model predictions of virus and cell dynamics describe the time evolution and the relative prevalence of various infected cell subpopulations following the onset of infection observed experimentally. Remarkably, model predictions are in quantitative agreement with the experimental scaling relationship that the percentage of cells infected with recombinant genomes is proportional to the percentage of cells coinfected with the two genomes employed at the onset of infection. Our model thus presents an accurate description of the influence of recombination on HIV dynamics in vitro. 
When distinctions between different viral genomes are ignored, our model reduces to the standard model of viral dynamics, which successfully predicts viral load changes in HIV patients undergoing therapy. Our model may thus serve as a useful framework to predict the emergence of multi-drug-resistant forms of HIV in infected individuals.

## Author Summary

Retroviral recombination, a process akin to sexual reproduction in higher organisms, may accelerate the accumulation of mutations and the development of multi-drug resistance in HIV patients. Recombination occurs when the enzyme reverse transcriptase switches between the two RNA strands of a virion, yielding a provirus that is a mosaic of the two strands. The latter strands are often distinct, thereby allowing recombinational diversification, when multiple viruses infect individual cells. The enormous HIV recombination rate and recent evidence of frequent multiple infections of cells render recombination a powerful force underlying the development of multi-drug resistance in vivo. The dynamics of the emergence of recombinant genomes, however, remains poorly understood. Recent experiments allow a closer look at HIV recombination: cells are exposed to two kinds of reporter viruses and the frequency of recombinant proviruses is detected, which enables direct quantification of the extent of recombination. The observations, however, are not described by available models, leaving a gap in our understanding of HIV recombination. We present a model that describes HIV dynamics with multiple infections of cells and recombination, captures several recent experimental observations quantitatively, provides insights into HIV recombination, and presents a framework for describing the development of multi-drug resistance in HIV patients.

**Citation:** Suryavanshi GW, Dixit NM (2007) Emergence of Recombinant Forms of HIV: Dynamics and Scaling. PLoS Comput Biol 3(10): e205. doi:10.1371/journal.pcbi.0030205

**Editor:** Sebastian Bonhoeffer, Swiss Federal Institute of Technology Zurich, Switzerland

**Received:** May 7, 2007; **Accepted:** September 5, 2007; **Published:** October 26, 2007

**Copyright:** © 2007 Suryavanshi and Dixit. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Funding:** This work was supported by US National Institutes of Health grant AI065334.

**Competing interests:** The authors have declared that no competing interests exist.

**Abbreviations:** CFP, cyan fluorescent protein; GFP, green fluorescent protein; YFP, yellow fluorescent protein

## Introduction

During the reverse transcription of HIV in an infected cell, the viral enzyme reverse transcriptase switches templates frequently from one genomic RNA strand of a virion to the other, yielding a recombinant proviral DNA that is a mosaic of the two parent genomes. If one strand contains a mutation that confers upon HIV resistance to one administered drug and the other strand resistance to another drug, recombination may bring the two mutations together and give rise to progeny genomes resistant to both those drugs [1,2]. Recombination may thus accelerate the emergence of multi-drug resistance in infected individuals. A prerequisite for recombination to induce genomic diversification is the presence of heterozygous virions [3], which contain nonidentical genomic RNA strands and are formed when individual cells are infected by multiple virions. Recent experiments present evidence of the predominance of multiple infections of cells both in vitro and in vivo [4–7]: infected splenocytes from two HIV patients, for instance, were found to harbor up to eight proviruses, with three to four proviruses per cell on average [6]. The high incidence of multiple infections of cells coupled with the high recombination rate of HIV, estimated to be several times greater than the HIV point mutation rate [7–10], sets the stage for recombination to act as a powerful agent driving the emergence of multi-drug-resistant forms of HIV in patients undergoing therapy. In addition, recombination may serve to preserve diversity in genomic regions not affected by bottlenecks introduced by drug therapy or host immune responses, and improve the adaptability of HIV to new environments [11]. Indeed, in addition to several circulating recombinant forms of HIV, a large number of recombinant forms unique to individuals have been identified [12]. It is of great importance, therefore, to understand how recombinant forms of HIV arise in infected individuals.

Remarkable insights into HIV recombination emerge from recent in vitro experiments,
in which target cells were simultaneously exposed to two kinds of reporter viruses,
and cells infected with recombinant proviruses detected [3–5,7,13,14]. Rhodes et al.
determined using single-round infection assays that the likelihood of the
accumulation by recombination of distinct mutations present on the two viral genomes
in a virion increases with the separation between the mutations and reaches an
asymptotic maximum at a separation of ∼1,500 base pairs [14]. Further, Rhodes et
al. found that the cell type employed for
infection—CD4^{+} T cells or macrophages, for
instance—does not influence the recombination rate. In contrast, Levy et
al. argue that subtle virus–cell interactions cause recombination to occur
at different rates in different types of cells [7]. Further, using replication-competent
reporter viruses, Levy et al. investigated the dynamics of the emergence of
recombinant genomes in vitro and in SCID-hu mice [7]. Interestingly, Levy et al. observed
two scaling patterns. First, the percentage of cells coinfected with both the
reporter genomes employed for infection was proportional to the total percentage of
infected cells. Second, the percentage of cells infected with recombinant proviruses
was linearly proportional to the percentage of coinfected cells and,
correspondingly, proportional to the square of the total percentage of infected
cells. Further, Levy et al. found that the scaling patterns were independent of the
initial viral load, the time following the onset of infection, and whether the
experiments were conducted in vitro or in SCID-hu mice.

Standard models of viral dynamics, which successfully describe short-term (a few weeks) viral load changes in patients undergoing therapy, are predicated on the infection of individual cells by single virions and ignore recombination [15–17]. Recent modeling advances and simulations incorporate descriptions of multiple infections of cells and recombination, and present insights into the subtle interplay between mutation, recombination, fitness selection, and random genetic drift that underlies the genomic diversification of HIV in vivo [18–23]. Bretscher et al. developed a model of HIV dynamics that includes mutation, double infections of cells, and recombination and found that for infinitely large cell populations, the influence of recombination on the development of drug resistance depends sensitively on epistasis, i.e., on the nature of fitness interactions between mutations [20]. Bretscher et al. argue that phenotypic mixing—the assortment of viral proteins arising from different proviral genomes within a multiply infected cell during the assembly of progeny virions—compromises the selective advantage of the fittest strains and enhances the relative abundance of less fit strains: less fit strains piggyback on fitter strains. At the same time, in a two-locus/two-allele model, recombination, which breaks nonrandom associations of mutations and hence lowers linkage disequilibrium, enhances the relative abundance of single mutant strains compared to wild-type and of double mutant strains when fitness interactions result in positive epistasis, i.e., when mutations interact antagonistically in lowering viral fitness. Bretscher et al. predict, contrary to the prevalent paradigm, that phenotypic mixing and positive epistasis together result in a deceleration of the growth of drug-resistant viruses upon increasing the recombination rate [20]. 
Recent experimental evidence points to a mean positive epistasis underlying fitness interactions in HIV-1, which, following the predictions of Bretscher et al. [20], raises questions about the benefits of recombination to HIV-1 and, more generally, of the evolutionary origins of recombination and sexual reproduction [24].

Fraser presents a detailed model of HIV dynamics considering up to three infections of cells, mutation, recombination, fitness selection, and different dependencies of the frequency of multiple infections of cells on the viral load [22]. In agreement with Bretscher et al. [20], Fraser found that recombination inhibits the development of drug resistance during antiretroviral therapy and, further, that this effect is modulated not only by epistasis but also by the dependence of the frequency of multiple infections on viral load [22].

In a more recent study, Althaus and Bonhoeffer extend the description of Bretscher et
al. [20] to
finite population sizes, where the relative abundance of different mutant strains
may be determined stochastically rather than deterministically [18]. Interestingly,
Althaus and Bonhoeffer found that even when positive epistasis governs fitness
interactions between resistance mutations, recombination may significantly
accelerate the development of drug resistance when the effective population size is
∼10^{4}–10^{5} [18]. Using bit-string simulations,
Bocharov et al. found that multiple infections of cells and recombination act in
synergy to enhance viral genomic diversity [19], which in turn may increase the
likelihood of the emergence of drug resistance. Bocharov et al. note, however, that
the time for the selection of fitter genomes that contain multiple mutations may be
highly variable, indicative of the stochastic nature of viral evolution in vivo
[19].
Rouzine and Coffin developed a description of viral evolution with fitness
selection, recombination between multiple loci, and random genetic drift, and
predict that below a critical population size, the viral population within an
individual may converge to a clone in the absence of mutation, leaving little scope
for recombination to introduce genomic diversification [23]. Rouzine and Coffin
suggest therefore that a reduction of the viral population in an infected individual
by combination antiretroviral therapy may decelerate significantly the emergence of
drug resistance [23]. The effective population size in vivo remains to be
established [25].

Currently available models thus make valuable predictions of the influence of recombination on HIV dynamics and the emergence of drug resistance in infected individuals undergoing therapy. The predictions, however, are diverse and have not been compared with available experimental data [3–5,7,13,14]. An important gap thus exists in our understanding of HIV recombination. Consequently, for instance, the origins of the experimental scaling and dynamical patterns associated with HIV recombination [7] remain poorly understood. Similarly, the recombination rate, or the frequency of template switching events during reverse transcription, remains to be established [7,14].

One limitation of currently available models lies in the approximate descriptions of the dynamics of multiple infections of cells employed. For example, the frequency with which cells are doubly infected is assumed either to be constant [18,20], proportional to the viral load, or proportional to the square of the viral load [22]. Because multiple infections of cells are a prerequisite for the formation of heterozygous virions, an accurate description of HIV dynamics with recombination depends critically on the underlying description of multiple infections, as suggested also by Fraser [22]. The frequency of multiple infections depends not only on the viral load, but also on CD4 down-modulation induced by viral gene expression following the first infection of a cell [26–28], which lowers the susceptibility of the cell to further infections. In a recent study, Dixit and Perelson developed a model that explicitly accounts for CD4 down-modulation and presents a rigorous description of the orchestration of multiple infections of cells by free virions [29]. The model elucidates the origins of one of the two scaling relationships observed by Levy et al. [7]: that the number of doubly infected cells is proportional to the square of the total number of infected cells. Levy et al. note that at the onset of infection in their experiments, because equal numbers of the two kinds of reporter viruses are employed, the probability that a cell is infected with both the reporter genomes is the product of the probabilities that the cell is infected independently with each of the genomes, which explains the origin of the observed scaling early during infection. Later in the infection process, multiple infections of a cell need not be simultaneous and may be sequential. Yet, the quadratic scaling persists. Levy et al. speculate that the absence of any functional hindrance to multiple infections may underlie the persistence of the scaling throughout the infection period. 
The inhibition of multiple infections by CD4 down-modulation, however, may not be negligible. Dixit and Perelson consider the expected inhibition of multiple infections by CD4 down-modulation and identify conditions under which the observed scaling relationship may hold [29].

The latter model, however, does not distinguish between different viral genomes that infect cells and thereby precludes a description of recombination. Thus, for instance, the dynamics of the emergence of recombinant genomes and the origins of the second scaling relationship observed by Levy et al.—that the percentage of cells infected by recombinant genomes is linearly proportional to the percentage of cells coinfected with both the reporter genomes employed for infection—remain to be elucidated.

The analysis of Dixit and Perelson [29] provides insights into the origins of the latter scaling. According to the analysis, one set of conditions under which the quadratic scaling between the population of coinfected cells and that of all infected cells holds occurs at time periods that are long compared to the characteristic viral production and clearance times, so that viral populations are in pseudo equilibrium with the infected cell populations [29]. Under the same conditions, when two distinct reporter genomes are employed for infection, the population of heterozygous virions containing a copy each of the two reporter genomes is expected to be in pseudo equilibrium with, and hence proportional to, the population of cells coinfected with both the reporter genomes. Further, for time periods long compared to the lifetimes of infected cells and the characteristic timescale of the infection of uninfected cells, the production and death rates of infected cells are expected to exhibit a pseudo steady state. The number of cells infected with recombinant proviruses would then be proportional to the population of heterozygous virions and hence to the population of coinfected cells, which may explain the origins of the second scaling relationship observed by Levy et al. [7]. (We present more mathematical arguments below.)

Whether currently available models of HIV dynamics that include infections by distinct viral genomes and recombination validate the above arguments and predict the observed scaling remains unclear. Dixit and Perelson predict the existence of the above scaling relationships under certain parameter regimes and, importantly, that the scaling relationships may depend on the length of time following the onset of infection [29]. In contrast, Levy et al. found that the scaling is independent of the time following the onset of infection [7]. Further, Levy et al. found that for the different initial viral loads employed, which varied well over two orders of magnitude, the parametric plots of different cell populations defining the scaling relationships superimpose remarkably tightly [7], whereas a similar superimposition at all times following the onset of infection is not apparent from the above scaling arguments, which hold after the establishment of pseudo equilibrium between viral production and clearance. A comprehensive model of HIV dynamics with recombination that quantitatively captures available experimental data is currently lacking.

In this work, we develop a detailed model of HIV dynamics that considers multiple infections of cells by distinct viral genomes and describes recombination. Our model captures several recent in vitro experimental findings quantitatively and provides key insights into the mechanisms underlying the emergence of recombinant forms of HIV. At the same time, our model is consistent with the standard model of viral dynamics, which successfully captures viral load changes in patients following the onset of antiretroviral therapy, and may therefore be extended to describe HIV dynamics with recombination in vivo.

## Results

### Model Formulation

We consider in vitro experiments where a population of uninfected
CD4^{+} cells, *T*, is exposed
simultaneously to two populations, *V*_{11} and
*V*_{22}, of homozygous virions containing genomes 1
and 2, respectively. The genomes 1 and 2 are assumed to be distinct at two
nucleotide positions, *l*_{1} and
*l*_{2}, a distance *l* apart on the
viral genome, with genome 1 having a mutation at position
*l*_{1} and genome 2 at
*l*_{2} (Figure 1A). Following exposure, target cells become infected singly
or multiply with one of the genomes, or coinfected with both genomes. Coinfected
cells produce heterozygous virions, *V*_{12}, which
contain a copy each of genomes 1 and 2.

Figure 1. Viral genomes 1 and 2 employed at the onset of infection (A) and the four genomes resulting from the recombination of genomes 1 and 2 (B).

Infection of target cells by the virions *V*_{12} yields
two kinds of “recombinant” genomes depending on the template
switching events during reverse transcription (Figure 1B). When the mutations at positions
*l*_{1} and *l*_{2} are both
included in the resulting proviral DNA, a recombinant genome that we denote as
genome 3 is formed. When both the mutations are excluded, the other recombinant,
genome 4, results. When one of the two mutations is included but not the other,
genomes 1 and 2 are recovered. Thus, four kinds of viral genomes, 1, 2, 3, and
4, eventually infect cells.

We distinguish infected cells by the proviral genomes they contain. We denote by *T*_{i} cells containing a single provirus *i*, where *i* ∈ {1, 2, 3, 4} represents the four genomes above. Thus, cells *T*_{1} contain a single provirus 1, cells *T*_{2} contain a single provirus 2, and so on. We denote by *T*_{ij}, where *i* and *j* ∈ {1, 2, 3, 4}, cells that contain two proviruses. Thus, cells *T*_{11} contain two copies of provirus 1, and cells *T*_{12} contain a copy of provirus 1 and a copy of provirus 2. Because cells *T*_{ij} are indistinguishable from cells *T*_{ji}, we subject *i* and *j* to the constraint *i* ≤ *j*, resulting in ten kinds of doubly infected cells: *T*_{11}, *T*_{12}, *T*_{13}, *T*_{14}, *T*_{22}, *T*_{23}, *T*_{24}, *T*_{33}, *T*_{34}, and *T*_{44}. Extending the description, cells *T*_{ijk} are infected with three proviruses, and so on. Our aim is to describe the dynamics of recombination observed in experiments that employ two kinds of reporter viruses to infect cells. In these experiments, the number of cells infected with more than two genomes is estimated to be small [7]. Therefore, and for simplicity, we restrict our model to single and double infections of cells.

Random assortment of viral RNA produced in infected cells gives rise to ten different viral populations, which we denote *V*_{ij}, where *i* ≤ *j* and *i* and *j* ∈ {1, 2, 3, 4}, based on the viral genomes, *i* and *j*, contained in the virions. Thus, for instance, virions *V*_{34} contain a copy each of genomes 3 and 4. Cells infected with a single kind of provirus, *T*_{i} and *T*_{ii}, give rise to homozygous virions, *V*_{ii}. Cells coinfected with distinct proviruses, *T*_{ij}, produce the homozygous virions *V*_{ii} and *V*_{jj} and the heterozygous virions *V*_{ij}.

Below, we write equations to describe changes in the various cell and viral populations following the onset of infection.
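The bookkeeping of cell and virion types described above can be sketched in a few lines of Python. This is only an illustration of the index conventions (unordered pairs with *i* ≤ *j*); the names `GENOMES`, `unordered_pairs`, and `virions_produced` are hypothetical and do not come from the paper.

```python
from itertools import combinations_with_replacement

GENOMES = (1, 2, 3, 4)  # genomes 1 and 2 plus the recombinants 3 and 4


def unordered_pairs(genomes=GENOMES):
    """Return every index pair (i, j) with i <= j.

    The same pairs label the doubly infected cells T_ij and the virions V_ij.
    """
    return list(combinations_with_replacement(genomes, 2))


def virions_produced(cell):
    """Return the set of virion types released by an infected cell.

    `cell` is (i,) for a singly infected cell T_i, or (i, j) with i <= j
    for a doubly infected cell T_ij.
    """
    if len(cell) == 1 or cell[0] == cell[1]:
        i = cell[0]
        return {(i, i)}  # only homozygous virions V_ii
    i, j = cell
    # Coinfection with distinct proviruses yields V_ii, V_jj, and V_ij.
    return {(i, i), (j, j), (i, j)}
```

Calling `unordered_pairs()` lists the ten pairs that index both the doubly infected cells and the viral populations, and `virions_produced((1, 2))` shows that only coinfected cells contribute heterozygous virions.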

### Dynamical Equations

#### Uninfected cells.

The in vitro dynamics of the uninfected cell population is governed by [29]

$$\frac{dT}{dt} = (\lambda - \mu)\,T - k_0 V T, \tag{1}$$

where *λ* and *μ* are the proliferation and death rates of uninfected cells in vitro, *k*_{0} is the second-order infection rate of uninfected cells, and

$$V = \sum_{i=1}^{4}\sum_{j=i}^{4} V_{ij}$$

is the total viral load. Equation 1 is constrained by the initial condition that the uninfected cell population at the onset of infection (*t* = 0) is *T*_{0}.

#### Infected cells.

The singly infected cell subpopulations are determined by the integral equations:

$$T_i(t) = \int_0^t k_0\,T(t-s)\left[\sum_{j=1}^{4}\sum_{h=j}^{4} R_i(jh)\,V_{jh}(t-s)\right] e^{-\delta s}\,M(i,t\,|\,i,t-s)\,ds. \tag{2a}$$

Here, *k*_{0}*T*(*t − s*)*V*_{jh}(*t − s*)*ds* is the number of uninfected cells that are first infected by virions *V*_{jh} in an infinitesimal interval of time *ds* near time *t − s* ≥ 0, where *t* = 0 marks the onset of infection. We define *R*_{i}(*jh*) as the probability that provirus *i* results from the recombination of genomes *j* and *h*, where *i*, *j*, and *h* ∈ {1, 2, 3, 4} and *j* ≤ *h*. Thus, *k*_{0}*T*(*t − s*)*R*_{i}(*jh*)*V*_{jh}(*t − s*)*ds* is the expected number of uninfected cells first infected by virions *V*_{jh} in the interval *ds* near time *t − s* and in which recombination results in provirus *i*. We assume that reverse transcription occurs rapidly following infection. Summation of *k*_{0}*T*(*t − s*)*R*_{i}(*jh*)*V*_{jh}(*t − s*)*ds* over *j* and *h* therefore yields the total number of uninfected cells that are first infected with a single provirus *i* in the interval *ds* near *t − s*. The probability that these latter cells survive until time *t* is exp(−*δs*), where *δ* is the death rate of infected cells. We define *M*(*i*,*t* | *i*,*t − s*) as the probability that a cell that is first infected with provirus *i* at time *t − s* remains singly infected with the provirus *i* at time *t* given that the cell survives the intervening interval of duration *s*. The integrand in Equation 2a thus represents the number of uninfected cells that are first infected in the infinitesimal interval *ds* near time *t − s* and survive with a single provirus *i* at time *t*. Integration over *s* from 0 to *t* gives the total number of cells containing a single provirus *i* at time *t*.

The doubly infected cell subpopulations are determined in an analogous manner:

$$T_{ii}(t) = \int_0^t k_0\,T(t-s)\left[\sum_{j=1}^{4}\sum_{h=j}^{4} R_i(jh)\,V_{jh}(t-s)\right] e^{-\delta s}\,M(ii,t\,|\,i,t-s)\,ds, \tag{2b}$$

where *M*(*ii*,*t* | *i*,*t − s*) is the probability that a cell that is first infected with provirus *i* at time *t − s* contains an additional provirus *i* at time *t* given that the cell survives the intervening interval of duration *s*.

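For cells coinfected with two different kinds of proviruses, we write

$$T_{ij}(t) = \int_0^t k_0\,T(t-s)\left[\sum_{m=1}^{4}\sum_{h=m}^{4} R_i(mh)\,V_{mh}(t-s)\right] e^{-\delta s}\,M(ij,t\,|\,i,t-s)\,ds + \int_0^t k_0\,T(t-s)\left[\sum_{m=1}^{4}\sum_{h=m}^{4} R_j(mh)\,V_{mh}(t-s)\right] e^{-\delta s}\,M(ji,t\,|\,j,t-s)\,ds, \tag{2c}$$

where *i* ≠ *j*, the sums run over all virion types *V*_{mh} with *m* ≤ *h*, and *M*(*ij*,*t* | *i*,*t − s*) is the probability that a cell that is first infected with provirus *i* at time *t − s* contains an additional provirus *j* at time *t* given that the cell survives the intervening interval of duration *s*. The two integrals in Equation 2c correspond to the two ways of acquiring the two proviruses: *i* followed by *j* and *j* followed by *i*, respectively.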

#### Multiple infections.

To evaluate the conditional probabilities *M*, which characterize multiple infections, we consider a cell first infected with provirus *i* at time *t − s*. For times *τ* > *t − s*, the rate of infection of the cell reduces exponentially because of CD4 down-modulation [28], so that [29]

$$k(\tau) = k_0 \exp\left(-\frac{\tau - (t - s)}{t_d}\right), \tag{3}$$

where *k*_{0} is the infection rate of an uninfected cell and *t*_{d} is the timescale of
CD4 down-modulation. Three viral genes, *nef*,
*env*, and *vpu*, acting via independent
pathways, together induce nearly 100% down-modulation of CD4
molecules from the surface of an infected cell [26]. Of the
three genes, the predominant influence is by *nef*
[26], which induces rapid down-modulation of CD4 receptors
following infection [28]. The latter down-modulation profile is
well-described by an exponential decline (of timescale
*t*_{d}) and is extended to include the influence
of *env* and *vpu* [29]. How the
susceptibility of a cell to new infections declines with CD4 down-modulation
remains unknown. Here, we follow Dixit and Perelson [29] and assume
that the infection rate *k* is directly proportional to the
CD4 expression level and hence declines exponentially with time following
the first infection.

Assuming that the cell, following its first infection with provirus *i* at time *t − s*, does not die, the probability that it contains the provirus *i* alone at time *τ* is by definition *M*(*i*,*τ* | *i*,*t − s*). In a subsequent infinitesimal interval Δ*τ*, in which at most one infection may occur, the probability that the cell is not infected is (1 − *kV*(*τ*)Δ*τ*), where *k* is given by Equation 3 and

$$V(\tau) = \sum_{m=1}^{4}\sum_{h=m}^{4} V_{mh}(\tau)$$

is the total viral load. The probability that the cell remains singly infected with provirus *i* at time *τ* + Δ*τ* is therefore

$$M(i,\tau+\Delta\tau\,|\,i,t-s) = M(i,\tau\,|\,i,t-s)\,\bigl(1 - kV(\tau)\Delta\tau\bigr).$$

Subtracting *M*(*i*,*τ* | *i*,*t − s*), dividing by Δ*τ*, and letting Δ*τ* → 0, we obtain

$$\frac{dM(i,\tau\,|\,i,t-s)}{d\tau} = -kV(\tau)\,M(i,\tau\,|\,i,t-s), \tag{4a}$$

with the initial condition that *M*(*i*,*t* − *s* | *i*,*t* − *s*) = 1.

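Alternatively, a cell first infected with provirus *i* at time *t − s* may contain two proviruses, *i* and *j*, at time *τ* + Δ*τ* if it contains the provirus *i* alone at time *τ* and acquires an additional provirus *j* in the interval Δ*τ*, or if it contains both the proviruses *i* and *j* at time *τ*. (We ignore more than two infections of cells.) In the interval Δ*τ*, the probability that the cell acquires a second provirus *j* is

$$k\sum_{m=1}^{4}\sum_{h=m}^{4} R_j(mh)\,V_{mh}(\tau)\,\Delta\tau$$

(see above), so that

$$M(ij,\tau+\Delta\tau\,|\,i,t-s) = M(i,\tau\,|\,i,t-s)\,k\sum_{m=1}^{4}\sum_{h=m}^{4} R_j(mh)\,V_{mh}(\tau)\,\Delta\tau + M(ij,\tau\,|\,i,t-s).$$

Subtracting *M*(*ij*,*τ* | *i*,*t − s*), dividing by Δ*τ*, and letting Δ*τ* → 0, we obtain

$$\frac{dM(ij,\tau\,|\,i,t-s)}{d\tau} = k\sum_{m=1}^{4}\sum_{h=m}^{4} R_j(mh)\,V_{mh}(\tau)\,M(i,\tau\,|\,i,t-s), \tag{4b}$$

with the initial condition *M*(*ij*,*t* − *s* | *i*,*t* − *s*) = 0.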

Substituting *j* by *i* in Equation 4b
yields the corresponding evolution equation for
*M*(*ii*,*τ* |
*i*,*t* − *s*)
with the initial condition
*M*(*ii*,*t* −
*s* | *i*,*t* −
*s*) = 0.
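As a rough illustration of how CD4 down-modulation limits further infections, the sketch below evaluates the probability that a surviving cell is still singly infected a time *s* after its first infection. It rests on two labeled assumptions that go beyond the text: the infection rate decays as *k*_{0} exp(−elapsed/*t*_{d}), and the total viral load *V* is held constant over the interval, which gives the closed form exp[−*k*_{0}*V t*_{d}(1 − exp(−*s*/*t*_{d}))]. The function names and the viral load used in any call are hypothetical.

```python
import math


def infection_rate(elapsed, k0, td):
    """Infection rate of a cell `elapsed` days after its first infection.

    Assumes an exponential decline with the CD4 down-modulation timescale td.
    """
    return k0 * math.exp(-elapsed / td)


def prob_still_single(s, viral_load, k0, td):
    """Closed-form probability of remaining singly infected after time s.

    Valid only under the simplifying assumption of a constant viral load.
    """
    return math.exp(-k0 * viral_load * td * (1.0 - math.exp(-s / td)))


def prob_still_single_euler(s, viral_load, k0, td, steps=100_000):
    """Same probability from the step-by-step argument in the text.

    In each small interval dt the cell escapes infection with
    probability 1 - k * V * dt.
    """
    dt = s / steps
    m = 1.0
    for n in range(steps):
        m *= 1.0 - infection_rate(n * dt, k0, td) * viral_load * dt
    return m
```

Because the rate decays, the probability does not fall to zero at long times but levels off at exp(−*k*_{0}*V t*_{d}) under these assumptions; since more than two infections are ignored, one minus this value is the corresponding probability of having acquired a second provirus.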

#### Recombination.

We next determine the probability *R*_{i}(*jh*) that provirus *i* results from the recombination of genomes *j* and *h*. For homozygous virions, *V*_{ii}, reverse transcription yields the genome *i* alone so that

$$R_i(jj) = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$$

(Note that we ignore mutations here.) For heterozygous virions, we consider first those combinations where the two genomes *j* and *h* differ in a single position, which happens when *j* is either 1 or 2 and *h* is either 3 or 4 (Figure 1B). Because the difference is in a single position, reverse transcription yields either of the two genomes with equal probability. Thus,

$$R_i(jh) = \begin{cases} 1/2, & i = j \text{ or } i = h \\ 0, & \text{otherwise,} \end{cases} \qquad j \in \{1, 2\},\; h \in \{3, 4\}.$$

Finally, when *j* = 1 and *h* = 2 and when *j* = 3 and *h* = 4, the two genomes differ in two positions; we consider these combinations explicitly. Let *j* = 1 and *h* = 2. Recall that genome 1 has a mutation at position *l*_{1} and genome 2 at *l*_{2} and that *l*_{2} − *l*_{1} = *l* (Figure 1A). Recombination between genomes 1 and 2 yields genome 1 if the mutation on genome 1 is included in the resulting provirus and that on genome 2 is excluded (Figure 1B). Because reverse transcription is equally likely to begin on either genome, the probability that reverse transcriptase is on genome 1 at the position *l*_{1}, i.e., the probability that the mutation on genome 1 is included in the resulting provirus, is 1/2. Given that the mutation on 1 is included, the mutation on 2 will be excluded if an even number of crossovers occurs between *l*_{1} and *l*_{2}.

Let *n* be the average number of crossovers during reverse transcription of the viral genome of length *L*. We define the crossover frequency, or the recombination rate, as *ρ* = *n*/*L* crossovers per position. Assuming that crossovers occur independently, the probability *P*(*x*) that *x* crossovers occur in a length *l* of the genome follows the Poisson distribution [14,30]

$$P(x) = \frac{(\rho l)^{x}\, e^{-\rho l}}{x!},$$

where *x*! = *x*(*x* − 1)⋯2·1. The probability that an even number of crossovers occurs in the length *l* is therefore the sum

$$\sum_{x = 0, 2, 4, \ldots} P(x) = e^{-\rho l}\cosh(\rho l).$$

Thus, the probability that genome 1 results from recombination of genomes 1 and 2 is

$$R_1(12) = \frac{1}{2}\, e^{-\rho l}\cosh(\rho l).$$

Similarly, genome 2 results from genomes 1 and 2 if the mutation on genome 1
is excluded and an even number of crossovers occurs between
*l*_{1} and *l*_{2} so
that the mutation on 2 is included. It follows that
*R*_{2}(12) =
*R*_{1}(12).

Genome 3 results from genomes 1 and 2 if the mutation on genome 1 is included and an odd number of crossovers occurs between *l*_{1} and *l*_{2} so that the mutation on genome 2 is also included. Following the above arguments, we find that

$$R_3(12) = \frac{1}{2}\, e^{-\rho l}\sinh(\rho l)$$

and that *R*_{4}(12) = *R*_{3}(12).
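The even/odd crossover argument can be checked numerically. The sketch below, a minimal illustration under the stated Poisson assumption, sums the Poisson probabilities of even and odd crossover counts directly and compares them with the closed forms (1 ± exp(−2*ρl*))/2; the function names and the series cutoff `terms` are hypothetical choices made here.

```python
import math


def poisson(x, mean):
    """Poisson probability of exactly x crossovers when `mean` are expected."""
    return math.exp(-mean) * mean**x / math.factorial(x)


def recombination_probs_12(rho, l, terms=60):
    """Probabilities that genomes 1-4 result from a heterozygous virion V_12.

    rho is the crossover frequency per position and l the separation between
    the two mutations. The series is truncated after `terms` crossover counts.
    """
    mean = rho * l
    p_even = sum(poisson(x, mean) for x in range(0, terms, 2))
    p_odd = sum(poisson(x, mean) for x in range(1, terms, 2))
    # The factor 1/2 is the chance that reverse transcription is on a given
    # genome at position l_1.
    return {1: 0.5 * p_even, 2: 0.5 * p_even, 3: 0.5 * p_odd, 4: 0.5 * p_odd}


def recombination_probs_12_closed(rho, l):
    """Closed forms of the same probabilities, using
    exp(-m) * cosh(m) = (1 + exp(-2m)) / 2 and
    exp(-m) * sinh(m) = (1 - exp(-2m)) / 2."""
    decay = math.exp(-2.0 * rho * l)
    parental = 0.25 * (1.0 + decay)
    recombinant = 0.25 * (1.0 - decay)
    return {1: parental, 2: parental, 3: recombinant, 4: recombinant}
```

The four probabilities sum to one, and each recombinant probability rises from 0 at zero separation toward 1/4 at large *ρl*, which is consistent with an asymptotic maximum in the recombinant fraction as the mutations move apart.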

#### Virions.

Finally, we write equations for the time evolution of the various viral populations:

$$\frac{dV_{ii}}{dt} = N\delta\left(T_i + T_{ii} + \frac{1}{4}\sum_{j<i} T_{ji} + \frac{1}{4}\sum_{j>i} T_{ij}\right) - cV_{ii} \tag{6a}$$

and

$$\frac{dV_{ij}}{dt} = \frac{1}{2}N\delta\,T_{ij} - cV_{ij}, \qquad i < j, \tag{6b}$$

where we recognize that cells *T*_{i} and *T*_{ii} produce homozygous virions *V*_{ii}, and cells *T*_{ij} produce homozygous virions *V*_{ii} and *V*_{jj} and heterozygous virions *V*_{ij} in the proportions 1/4, 1/4, and 1/2, respectively. *N* is the viral burst size and *δ* is the death rate of infected cells, both assumed to be independent of the multiplicity of infection [29]; *c* is the clearance rate of free virions. Equations 6a and 6b are constrained by the initial condition that the viral population at the onset of infection is composed of equal subpopulations of the homozygous virions *V*_{11} and *V*_{22} alone, i.e., *V*_{11} = *V*_{22} = *V*_{0} at *t* = 0.
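The production terms described above can be illustrated with a short sketch that distributes the virions released by each infected cell type among homozygous and heterozygous populations in the stated proportions. It covers production only, not clearance, and the function name and the dictionary layout (keys `(i,)` for *T*_{i} and `(i, j)` with *i* ≤ *j* for *T*_{ij}) are hypothetical conventions chosen for this example.

```python
def virion_production(cells, burst_size, delta):
    """Production rate of each virion type V_ij from the infected cells.

    `cells` maps (i,) or (i, j) with i <= j to the number of such cells.
    Each cell is assumed to release burst_size * delta virions per unit time,
    independent of the number of proviruses it carries.
    """
    rates = {}

    def add(key, amount):
        rates[key] = rates.get(key, 0.0) + amount

    for cell, count in cells.items():
        output = burst_size * delta * count
        if len(cell) == 1 or cell[0] == cell[1]:
            # T_i and T_ii produce only homozygous virions V_ii.
            add((cell[0], cell[0]), output)
        else:
            i, j = cell
            # Random assortment in T_ij: 1/4 V_ii, 1/4 V_jj, 1/2 V_ij.
            add((i, i), 0.25 * output)
            add((j, j), 0.25 * output)
            add((i, j), 0.5 * output)
    return rates
```

Summed over all virion types, the production equals the burst size times *δ* times the total number of infected cells, which is why the total viral load follows the same production term regardless of how genomes are distributed among cells.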

Equations 1–6 present a model of HIV dynamics with multiple infections of cells and recombination.

### Model Predictions

We solve Equations
1–6 (Methods) using the following
parameter estimates drawn from in vitro studies [29,31]: the birth and death rates of
target cells, *λ* = 0.624
d^{−1} and *μ* = 0.018
d^{−1}; the death rate of infected cells,
*δ* = 1.44 d^{−1}; the
viral burst size, *N* = 1,000; and the clearance rate
of free virions, *c* = 0.35 d^{−1}.
We let an initial target cell population, *T*_{0}
= 10^{6}, be exposed to two equal viral populations,
*V*_{11} = *V*_{22}
= *V*_{0}, which we vary over the experimental
range, 2*V*_{0} = 10^{6} to
10^{10} [7]. The infection rate constant,
*k*_{0}, the timescale of CD4 down-modulation,
*t*_{d}, and the recombination rate,
*ρ*, are not well-established, and we vary these
parameters over ranges that define their best current estimates. We choose
*l*, the separation between the mutations on genomes 1 and 2,
in accordance with experiments (see below).

#### Virus and cell dynamics.

In Figure 2, we present the time evolution of uninfected cells, *T*, the total infected cell population, *T^{*}*, and the total viral load, *V*, for the parameter values 2*V*_{0} = 10^{8}, *k*_{0} = 2 × 10^{−10} d^{−1}, *t*_{d} = 0.28 d, *ρ* = 8.3 × 10^{−4} crossovers per position (see below), and *l* = 408 base pairs. We find that *T*, *T^{*}*, and *V* evolve in two dominant phases, an initial rise and a subsequent fall. The initial rise in *T* is due to the net proliferation of uninfected cells at the rate (*λ* − *μ*)*T* (Equation 1), which in the initial stages of infection is large compared to the loss of uninfected cells by infection at the rate *k*_{0}*VT*. The latter infection process causes *T^{*}* to rise. The rise in *T^{*}* and hence viral production results in an increase in *V*. When *V* becomes large, the loss of uninfected cells by infection dominates cell proliferation and induces a decline in *T*. In Figure 2, *T* reaches a maximum at time *t* ≈ 6 d after the onset of infection. The decline in *T* lowers the formation of infected cells and *T^{*}* decreases at the death rate *δ*. Finally, the loss of *T^{*}* lowers viral production and induces a decline in *V* at the clearance rate *c*. This overall two-phase dynamics is similar to the T cell dynamics observed in vitro [7].

The time evolution of the number of uninfected cells, *T*, the total number of infected cells, *T^{*}*, and the total viral load, *V*, following the onset of infection obtained by the solution of Equations 1–6 with the following parameter values: the initial target cell number, *T*_{0} = 10^{6}; the initial viral load, 2*V*_{0} = 10^{8}; the birth and death rates of uninfected cells, *λ* = 0.624 d^{−1} and *μ* = 0.018 d^{−1}; the death rate of infected cells, *δ* = 1.44 d^{−1}; the viral burst size, *N* = 1,000; the clearance rate of free virions, *c* = 0.35 d^{−1}; the infection rate constant of uninfected cells, *k*_{0} = 2 × 10^{−10} d^{−1}; the CD4 down-modulation timescale, *t*_{d} = 0.28 d; the recombination rate, *ρ* = 8.3 × 10^{−4} crossovers per position; and the separation between the mutations on genomes 1 and 2, *l* = 408 base pairs.

In Figure 3A, we present the distribution of the infected cells, *T^{*}*, in Figure 2 into various singly and doubly infected cell subpopulations. We find that the various infected cell subpopulations also follow the two-phase dynamics above. The relative prevalence of the latter subpopulations is coupled to that of the corresponding viral subpopulations, which we present in Figure 3B. Because virions *V*_{11} and *V*_{22} alone are employed at the onset of infection, their numbers are larger than those of other viral subpopulations. When target cells are abundant, CD4 down-modulation ensures that singly infected cells occur more frequently than doubly infected cells. Thus, during the first phase of the dynamics following the onset of infection, cells singly infected with the infecting genomes, i.e., *T*_{1} and *T*_{2}, are the most prevalent. Note that because *V*_{11} = *V*_{22} = *V*_{0} at the time of infection, at all subsequent times *T*_{1} = *T*_{2}. Next in prevalence are cells infected twice with genome 1 and/or 2. Because coinfection by genomes 1 and 2 is twice as likely as double infection by either 1 or 2, cells *T*_{12} are more prevalent than *T*_{11} (= *T*_{22}). The population of heterozygous virions, *V*_{12}, increases because of viral production from the coinfected cells *T*_{12}.

The time evolution of the various singly (solid lines) and doubly
(dashed lines) infected cell (left panels) and homozygous (solid
lines) and heterozygous (dashed lines) viral subpopulations (right
panels) following the onset of infection. Note that
*T*_{1} =
*T*_{2}, *T*_{11}
= *T*_{22},
*T*_{3} =
*T*_{4}, *T*_{33}
= *T*_{44},
*T*_{13} =
*T*_{23} =
*T*_{14} =
*T*_{24}, *V*_{11}
= *V*_{22},
*V*_{33}
*= V*_{44}, and
*V*_{13}
*= V*_{23}
*= V*_{14}
*= V*_{24}. The parameter values
employed are the same as those in Figure 2 except that
*t*_{d} = 2.8 d in (C) and (D)
and *ρ* =
10^{−3} crossovers per position in (E) and
(F).

Infections by *V*_{12} give rise to cells
*T*_{3} and *T*_{4},
infected singly with the recombinant genomes, which in turn produce virions
*V*_{33} and *V*_{44},
respectively. Coinfection by genomes 1 and 3 yields cells
*T*_{13}, whose numbers are larger than those of
the doubly infected cells *T*_{33} (=
*T*_{44}) because of the small population of
*V*_{33} compared to
*V*_{11}. Again, because coinfection by genomes 3 and
4 is twice as likely as double infection by either 3 or 4, cells
*T*_{34} are larger in number than
*T*_{33}. Yet, homozygous virions
*V*_{33} are more prevalent than heterozygous
virions *V*_{34} because cells
*T*_{3}, *T*_{33}, and
*T*_{34} produce *V*_{33},
whereas cells *T*_{34} alone produce
*V*_{34}.

In the second dynamical phase, infected cell subpopulations decline because
of cell death at rate *δ*. Singly infected cell
subpopulations decline additionally because of second infections. Viral
populations decline at the clearance rate *c*. The overall
two-phase dynamics and the relative prevalence of various infected cell
subpopulations are again in agreement with in vitro experiments
[7].

Changes in the initial viral load, 2*V*_{0}, or the
infection rate, *k*_{0}, do not alter the dynamics
above qualitatively (unpublished data) [7,29]. Importantly, the CD4
down-modulation timescale, *t*_{d}, does not
influence the overall dynamics in Figure 2 (unpublished data). We assume
here that viral production from cells is independent of the number of
infections cells suffer, which is expected when viral production is limited
by cellular rather than viral factors. Changes in
*t*_{d} then alter the distribution of infected cells
into various multiply infected cell subpopulations but do not alter the
total population of infected cells, *T ^{*}*, or,
consequently, the overall viral dynamics [29]. In Figure 3C and 3D, we present the calculations in Figure 3A and 3B with *t*_{d} = 2.8 d. A higher value of *t*_{d} implies slower CD4 down-modulation, which renders infected cells susceptible to further infections for longer durations and hence increases the relative prevalence of multiply infected cells. Accordingly, we find that doubly infected and coinfected cell subpopulations are higher in Figure 3C than in Figure 3A. (The faster decline of singly infected cells in the second phase in Figure 3C compared to that in Figure 3A is due to increased second infections in the former.) Correspondingly, the relative prevalence of heterozygous and recombinant virions increases upon increasing *t*_{d} (Figure 3B and 3D).

Interestingly, the recombination rate, *ρ*, also does
not influence the dynamics in Figure 2 (unpublished data). We assume here that viral fitness
is not affected by the mutations at the positions
*l*_{1} and *l*_{2}.
Consequently, an increase in *ρ* increases the
relative prevalence of recombinant genomes in the viral population but not
the total viral load or the frequency of multiple infections. In Figure 3E and 3F, we present the
calculations in Figure
3A and 3B with
*ρ* = 10^{−3}
crossovers per position. Note that the numbers of cells singly and doubly
infected with genomes 1 and/or 2 are identical to those in Figure 3A, indicating that
the frequency of multiple infections remains unaltered by the increase in
*ρ*. The relative prevalence of cells infected
with genomes 3 and 4, however, and that of the recombinant virions
*V*_{33} (=
*V*_{44}) is higher in Figure 3E and 3F than in Figure 3A and 3B, respectively, because of the enhanced
frequency of recombination in the former.

#### Scaling.

We examine next whether the above dynamics captures the scaling relationships between the different infected cell subpopulations observed experimentally [7]. In Figure 4A, we present parametric plots of the percentage of cells coinfected with genomes 1 and 2, *p*_{12} = 100*T*_{12}/(*T^{*}* + *T*), versus the total percentage of infected cells, *p^{*}* = 100*T^{*}*/(*T^{*}* + *T*), for different initial viral loads and with the parameter values employed in Figure 2. Remarkably, we find that for all the initial viral loads considered, *p*_{12} is proportional to (*p^{*}*)^{2}. The scaling behavior is observed over the entire period of infection (*t* = 10 d) including both the phases of the overall dynamics of Figure 2. Further, the parametric plots of *p*_{12} versus (*p^{*}*)^{2} for different viral loads are superimposed, in agreement with the robust scaling observed in experiments [7].

Parametric plots of (A) the percentage of cells coinfected with genomes 1 and 2, *p*_{12}, versus the total percentage of infected cells, *p^{*}*, and (B) the percentage of cells infected with the recombinant 4, *p*_{4}, versus *p*_{12}, obtained by solving Equations 1–6 for different initial viral loads, 2*V*_{0} = 10^{6} (green), 10^{7} (cyan), 10^{8} (blue), 10^{9} (purple), and 10^{10} (red). The dashed lines are scaling patterns predicted by Equation 7. The insets show the parametric plots for the individual cases, 2*V*_{0} = 10^{6} (green) and 10^{7} (cyan).

In Figure 4B, we present
the corresponding variation of the percentage of cells infected with the
recombinant 4, *p*_{4},
with the percentage of coinfected cells, *p*_{12}.
Interestingly, we find two scaling regimes. When
*p*_{12} is small, *p*_{4} is
proportional to (*p*_{12})^{2}, and the
parametric plots are distinct for different values of
*V*_{0}. For larger values of
*p*_{12}, *p*_{4} is
linearly proportional to *p*_{12} and independent of
*V*_{0}. Thus, the parametric plots in the latter
regime are again superimposed, as observed in experiments [7].

We explain the origins of the above scaling regimes by considering two
limiting scenarios in our model (Methods). First, for times small compared to the CD4 down-modulation
timescale, i.e., *t* ≪ *t*_{d}, and when changes
in viral and cell numbers are small, we obtain the scaling relations of Equation 7a. Second, for times large compared to the time required for
viral load evolution to reach pseudo steady state, i.e., *t* ≫ *t*_{eq}, we obtain the scaling relations of Equation 7b, where we define *k*_{1} as the mean
rate of the second infection of singly infected cells. As we show in Figure 4, Equations 7a and
7b capture
the scaling regimes predicted by our simulations.

Remarkably, the small-time scaling, *p*_{12} = (*p^{*}*)^{2}/400 (Equation 7a), is independent of model parameters and viral and cell numbers. Further, the quadratic scaling between *p*_{12} and *p^{*}* continues to hold for *t* > *t*_{eq} (Equation 7b), with the proportionality constant lower than 1/400 by a factor *k*_{1}/*k*_{0}. We notice thus that a transition from the small time (*t* ≪ *t*_{d}) scaling, *p*_{12} = (*p^{*}*)^{2}/400, to the large time (*t* ≫ *t*_{eq}) scaling, *p*_{12} = (*k*_{1}/*k*_{0})(*p^{*}*)^{2}/400, occurs in the parametric plots in Figure 4A. The transition occurs at larger values of *p^{*}* with increasing initial viral load. We find that a value of *k*_{1} = 1.4 × 10^{−10} d^{−1} captures the long-time scaling for all initial viral loads considered. On the other hand, the scaling between *p*_{4} and *p*_{12} for *t* ≪ *t*_{d}, *p*_{4} ∝ (*p*_{12})^{2}, depends on model parameters and the initial viral load (Figure 4B). Interestingly, however, the linear scaling between *p*_{4} and *p*_{12}, *p*_{4} ∝ *p*_{12}, for *t* > *t*_{eq} is independent of the initial viral load.
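
The constant 1/400 in the small-time scaling can be checked with a short numerical experiment. The sketch below is not the full model of Equations 1–6; it assumes constant viral loads *V*_{11} = *V*_{22} = *V*_{0}, no CD4 down-modulation (every infection occurs at rate *k*_{0}), no cell proliferation or death, and at most two infections per cell. The function name, the variable names, and the forward Euler scheme are illustrative choices.

```python
def early_time_ratio(k0=2e-10, V0=5e7, T0=1e6, t_end=0.1, dt=1e-4):
    """Integrate a reduced early-time model and return p12 / (p*)^2."""
    a = k0 * V0                 # infection rate per cell by each viral population (1/d)
    T, T1, T11, T12 = T0, 0.0, 0.0, 0.0   # T1 = T_1 = T_2 and T11 = T_11 = T_22 by symmetry
    for _ in range(int(round(t_end / dt))):
        dT = -2.0 * a * T                 # uninfected cells lost to either virus
        dT1 = a * T - 2.0 * a * T1        # first infection minus second infections
        dT11 = a * T1                     # second infection by the same genome
        dT12 = 2.0 * a * T1               # T_1 infected by V_22 plus T_2 by V_11
        T += dT * dt
        T1 += dT1 * dt
        T11 += dT11 * dt
        T12 += dT12 * dt
    T_star = 2.0 * T1 + 2.0 * T11 + T12   # total infected cells
    p_star = 100.0 * T_star / (T_star + T)
    p12 = 100.0 * T12 / (T_star + T)
    return p12 / p_star**2


if __name__ == "__main__":
    print(early_time_ratio() * 400)  # close to 1 at small times
```

In this reduced setting the ratio *p*_{12}/(*p^{*}*)^{2} stays close to 1/400 at small times regardless of the chosen *k*_{0} and *V*_{0}, consistent with the parameter independence noted above.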

For very large viral loads (2*V*_{0} ≥
10^{10}) and/or infection rates (*k*_{0}
≥ 2 × 10^{−9} d^{−1};
unpublished data), we find that rapid infection and the consequent death of
infected cells preempt the establishment of pseudo steady state between
viral production and clearance in the first phase of infection, so that the
linear scaling relationship between *p*_{4} and
*p*_{12} is not observed (Figure 4B). Below, we compare model
predictions with experiments.

### Comparison with Experiments

Available in vitro experiments, where cells are simultaneously exposed to two distinct kinds of viral genomes, may be segregated into two categories. First, single-round infection experiments employ replication-incompetent (heterozygous) virions to infect cells, and measure the fraction of cells that contain recombinant proviral genomes [3,7,13,14]. Second, viral dynamics experiments employ replication-competent (homozygous) virions and determine the time evolution of populations of cells infected with recombinant genomes [7]. We employ our description of the recombination probability (Equation 5b) to predict data from single-round infection experiments and our entire model (Equations 1–6) to describe the latter viral dynamics experiments.

#### Single round of infection.

We consider single-round infection experiments, where target cells are
exposed to a mixed viral population comprising homozygous virions,
*V*_{11} and *V*_{22}, and
heterozygous virions, *V*_{12}, in the proportions
1/4, 1/4, and 1/2, respectively. Small viral loads are employed so that
multiple infections of cells are rare. Following infection, cells in which
recombinant proviruses result are identified. Rhodes et al. [14] varied the
separation *l* between the distinguishing mutations on
genomes 1 and 2 (Figure
1A) and measured the fraction, *f*, of infected cells
that contained the recombinant genome 4, which carries neither of the
distinguishing mutations on genomes 1 and 2 (Figure 1B). Rhodes et al. report the
latter fraction as a percentage of the theoretical maximum fraction,
*f*_{max}, attained at arbitrarily large
separations and/or recombination rates (see below). We reproduce the
experimental data of Rhodes et al. in Figure 5A.

(A) The ratio of the percentage of cells infected with the
recombinant 4, *f*, and the theoretical maximum
percentage, *f*_{max}, as a function of the
separation, *l*, between the mutations on genomes 1
and 2 (see Figure
1) determined by Rhodes et al. [14]
(circles) and by Equation 8 (line) with
*ρ* = 8.3 ×
10^{−4} crossovers per position.

(B) The percentage of GFP^{+} cells as a function of
the crossover frequency, *n*, determined by Equation
9 (line), on which are mapped the experimental percentages
(circles) obtained by Levy et al. [7] with HeLa CD4,
Jurkat, and primary T cells (PBL). The inset shows the prediction of
Equation
9 over a larger range of values of
*n*.

We estimate the percentage of cells infected with genome 4 in the experiments
of Rhodes et al. as follows. We recognize that cells infected with
heterozygous virions *V*_{12} alone may possess the
recombinant provirus 4. With the above distribution of the viral
subpopulations, the probability that an infection is due to a heterozygous
virion is 1/2. Following infection by a heterozygous virion, the probability
that recombination yields genome 4 is *R*_{4}(12)
(Equation
5). Thus, the fraction, *f*, of infected cells that
contain genome 4 is expected to be (1/2)*R*_{4}(12)
=
(1/4)exp(−*ρl*)sinh(*ρl*).
This fraction attains a maximum value, *f*_{max}, of
1/8 (or 12.5%) as *ρl* →
∞. (When *ρl* → ∞, a
large number of crossovers occurs between *l*_{1} and
*l*_{2}; the mutations at
*l*_{1} and *l*_{2} are
then selected independently, each with a probability 1/2, so that
*R*_{4}(12) → 1/4.) Thus, according to
our model, *f*/*f*_{max} =
[(1/2)*R*_{4}(12)] / (1/8),
which upon combining with Equation 5 yields *f*/*f*_{max} = 2exp(−*ρl*)sinh(*ρl*) = 1 − exp(−2*ρl*) (Equation 8).

We fit predictions of Equation 8 to the experimental data of *f*/*f*_{max} versus *l* using *ρ* as an adjustable parameter (Figure 5A). Our model provides a good fit to the data, representing a successful test of our description of the recombination probabilities *R*_{i}(*jh*) (Equation 5). The best-fit estimate of *ρ* = 8.3 × 10^{−4} crossovers per position indicates that *n* = *ρL* ≈ 8 crossovers occur on average (95% confidence interval: 6–10) in a genome of length *L* = 9,700 nucleotides. This estimate of *n* is in excellent agreement with a direct estimate from sequence analysis of ∼7.5 crossovers in a genome of 9,700 nucleotides [7]. We employ the best-fit estimate of *ρ* = 8.3 × 10^{−4} crossovers per position in our calculations above.
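
The single-round prediction can be written compactly. Starting from *f*/*f*_{max} = [(1/2)*R*_{4}(12)]/(1/8) with *R*_{4}(12) = (1/2)exp(−*ρl*)sinh(*ρl*), the ratio simplifies to 1 − exp(−2*ρl*). The sketch below evaluates this expression and the implied number of crossovers per genome; the function name and the example separations are illustrative and are not the data points of Rhodes et al.

```python
import math

RHO = 8.3e-4          # best-fit crossovers per position reported in the text
GENOME_LENGTH = 9700  # nucleotides


def f_over_fmax(rho: float, l: float) -> float:
    """Fraction of the theoretical maximum recombinant yield at separation l."""
    return 1.0 - math.exp(-2.0 * rho * l)


if __name__ == "__main__":
    n = RHO * GENOME_LENGTH  # mean crossovers per genome
    print(f"n = {n:.2f} crossovers per genome")
    for l in (100, 408, 1000, 3000):  # hypothetical separations in base pairs
        print(l, round(100 * f_over_fmax(RHO, l), 1), "% of maximum")
```

The ratio rises linearly as 2*ρl* at small separations and saturates at 1 for large separations, so the data constrain *ρ* mainly where the curve bends.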

Levy et al. [7] also performed single-round infection experiments, where they exposed target cells simultaneously to homozygous reporter viruses containing either the cyan fluorescent protein (CFP) gene or the yellow fluorescent protein (YFP) gene, and heterozygous viruses with one strand containing the CFP gene and the other the YFP gene. The CFP and YFP genes were obtained by introducing specific mutations in the green fluorescent protein (GFP) gene. Thus, recombination events between the CFP and YFP genes that omit both the CFP and the YFP mutations yield the GFP gene. In addition, Levy et al. [30] observed that the CFP gene has certain distinguishing mutations between nucleotide positions ∼440 and ∼500. When recombination includes both the critical CFP and YFP mutations, and also the latter distinguishing mutations on the CFP gene, the resulting genome exhibits green fluorescence. When the latter mutations are not included, however, the resulting genomes remain undetected. Levy et al. determined the percentage of infected cells that exhibited green fluorescence as a measure of the recombination rate.

To compare the observations of Levy et al. with our model predictions, we let
genome 1 (Figure 1A)
represent the reporter virus with the CFP gene carrying the critical CFP
mutation at *l*_{1} = 201 and genome 2 the
virus with the YFP gene carrying the critical YFP mutation at
*l*_{2} = 609, so that
*l* = *l*_{2} −
*l*_{1} = 408 [7]. We redefine
genome 4 to encompass all genomes capable of green fluorescence. Thus,
genome 4 includes genomes with the GFP gene, which contains neither of the
mutations at *l*_{1} and
*l*_{2}, and also those genomes that contain both the
mutations at *l*_{1} and
*l*_{2} and the contents of genome 1 from positions
440 to 500. Accordingly, genome 3 includes those genomes that carry both the
mutations at *l*_{1} and
*l*_{2} but not all of the contents of genome 1 from
positions 440 to 500. With these definitions of genomes 3 and 4, we
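recalculate the recombination probabilities of Equation 5d and find *R*_{4}(12) = (1/2)exp(−*ρl*)sinh(*ρl*) + (1/2)exp(−*ρl*)cosh(*ρl*_{a})sinh(*ρl*_{b}) (Equation 9a), where *l*_{a} and *l*_{b} are defined below. The first term on the right-hand side of Equation 9a is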
the probability that recombination excludes both the mutations at
*l*_{1} and *l*_{2} and is
given by Equation
5d. The second term represents the contribution to
*R*_{4}(12) that arises from recombination events
that include both the mutations at *l*_{1} and
*l*_{2} and the contents of genome 1 from
positions 440 to 500. The latter contribution is determined as follows. The
probability that reverse transcription begins on genome 1 at position
*l*_{1} = 201 is 1/2. Given that the
mutation at *l*_{1} is included, reverse
transcriptase would be on genome 1 at position 440 if an even number of
crossovers occurred between positions *l*_{1} and
440, which happens with the probability
exp(−ρ*l*_{a})cosh(ρ*l*_{a}),
where *l*_{a} = 440 –
*l*_{1}. For the contents of genome 1 between
positions 440 and 500 to be included in the resulting provirus, no
crossovers must occur between positions 440 and 500, the probability of
which is exp(−ρ60). Finally, the mutation at
*l*_{2} = 609 on genome 2 is included
if an odd number of crossovers occurs between positions 500 and
*l*_{2}, which happens with the probability
exp(−ρ*l*_{b})sinh(ρ*l*_{b}),
where *l*_{b} =
*l*_{2} – 500. Multiplying the latter
probabilities and recognizing that *l*_{a} +
*l*_{b} + 60 =
*l* yields the second contribution to
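*R*_{4}(12) above. Similarly, we find that *R*_{3}(12) = (1/2)exp(−*ρl*)sinh(*ρl*) − (1/2)exp(−*ρl*)cosh(*ρl*_{a})sinh(*ρl*_{b}) (Equation 9b).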

We determine the fraction of infected cells that fluoresce following exposure
of cells to homozygous CFP and YFP virions and heterozygous CFP/YFP virions
in the proportions 1/4, 1/4, and 1/2, respectively, as follows. (Fluorescent
cells are detected in the experiments as infected.) When single infections
of cells predominate, half of the infections are due to homozygous virions,
which cause cells to fluoresce regardless of recombination. The other half
of the infections, which are due to heterozygous virions, induce
fluorescence when recombination yields genome 1, 2, or 4. Levy et al. ignore
GFP^{+} cells in their estimate of the total fraction
of infected cells [30]. The latter fraction is thus 1/2 +
(1/2)(*R*_{1}(12) +
*R*_{2}(12)). The experimentally determined
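fraction, *f*_{g}, of infected cells that are
GFP^{+} is therefore
(1/2)*R*_{4}(12)/[(1/2) +
(1/2)(*R*_{1}(12) + *R*_{2}(12))],
which, because the four probabilities sum to 1, simplifies to *f*_{g} = *R*_{4}(12)/[2 − *R*_{3}(12) − *R*_{4}(12)] (Equation 9c), where *R*_{4}(12) and
*R*_{3}(12) are determined using Equations 9a and
9b,
respectively.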

Levy et al. [7] report the mean percentage of infected cells that are
GFP^{+} to be 8.0 with Jurkat T cells, 5.5 with HeLa CD4 cells, and
9.1 with primary CD4^{+} T cells. We compare these
percentages with our prediction of *f*_{g} and
estimate the recombination rate in the respective cell types (Figure 5B). We find that
the mean number of crossovers in a genome of 9,700 nucleotides is 7.1 in
Jurkat T cells, 4.6 in HeLa CD4 cells, and 8.3 in primary
CD4^{+} T cells. Direct sequence analysis from Jurkat T
cells showed a mean crossover frequency of 7.5 (range 3–13)
[7], in excellent agreement with the estimate obtained here
and from our analysis of the experiments of Rhodes et al. [14] above.
Whereas the mean crossover frequency in HeLa cells is lower, that in primary
CD4^{+} T cells is again in excellent agreement with
the estimate for Jurkat T cells and that from the data of Rhodes et al.
[14].
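
The crossover estimates above can be approximately reproduced by inverting the expression for *f*_{g}. The sketch below is our reading of the text rather than the authors' code: it takes the second contribution to *R*_{4}(12) as the product of 1/2, exp(−*ρl*_{a})cosh(*ρl*_{a}), exp(−60*ρ*), and exp(−*ρl*_{b})sinh(*ρl*_{b}), assigns the remainder of the odd-crossover probability to *R*_{3}(12), and uses *f*_{g} = *R*_{4}(12)/[2 − *R*_{3}(12) − *R*_{4}(12)]. Function names and the bisection bracket are assumptions.

```python
import math

L1, L2 = 201, 609            # positions of the critical CFP and YFP mutations
SEG_START, SEG_END = 440, 500  # CFP segment that must be copied intact
GENOME_LENGTH = 9700


def p_even(x: float) -> float:
    """exp(-x) * cosh(x), written in an overflow-safe form."""
    return 0.5 * (1.0 + math.exp(-2.0 * x))


def p_odd(x: float) -> float:
    """exp(-x) * sinh(x), written in an overflow-safe form."""
    return 0.5 * (1.0 - math.exp(-2.0 * x))


def f_g(rho: float) -> float:
    """Predicted fraction of detected infected cells that are GFP+."""
    l, la, lb = L2 - L1, SEG_START - L1, L2 - SEG_END
    neither = 0.5 * p_odd(rho * l)  # neither mutation included
    both_green = (0.5 * p_even(rho * la)
                  * math.exp(-rho * (SEG_END - SEG_START))
                  * p_odd(rho * lb))  # both mutations plus intact segment
    r4 = neither + both_green
    r3 = neither - both_green
    return r4 / (2.0 - r3 - r4)


def crossovers_from_fg(target: float, lo=0.0, hi=0.007, iters=60) -> float:
    """Bisect on the rising branch of f_g and return n = rho * L."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * GENOME_LENGTH


if __name__ == "__main__":
    for cell, pct in (("Jurkat", 8.0), ("HeLa CD4", 5.5), ("primary CD4+", 9.1)):
        print(cell, round(crossovers_from_fg(pct / 100.0), 1))
```

The bisection is restricted to *ρ* ≤ 0.007 crossovers per position because *f*_{g} is not monotonic in *ρ*; within that bracket each observed percentage maps to a single crossover frequency.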

With macrophages, Levy et al. [7] found that
∼29% of infected cells are GFP^{+}. We
note that *f*_{g} defined in Equation 9c is a
non-monotonic function of *ρ*: increasing
*ρ* increases the probability of the accumulation
of both the critical YFP and CFP mutations at *l*_{1}
and *l*_{2} but lowers the probability that no
crossovers occur between the positions 440 and 500 on genome 1. As a result,
the second contribution to *R*_{4}(12) in Equation 9a
increases first and then decreases upon increasing
*ρ*. Thus, upon increasing
*ρ*, *f*_{g} increases
(*f*_{g} = 0 when
*ρ* = 0), reaches a maximum value of
∼21% at *ρ* = 0.007
crossovers per position (∼68 crossovers in 9,700 nucleotides), and
declines to an asymptotic value of ∼16.7% as
*ρ* → ∞ (Figure 5B, inset). The 29%
GFP^{+} cells observed with macrophages is thus higher
than the maximum value of *f*_{g} predicted by our
model. We note that a higher percentage of GFP^{+} cells
than the theoretical maximum of ∼21% may result if cells
are multiply infected, which we ignore in our description of single-round
infection experiments. Indeed, Levy et al. [30] observed that a large
percentage of macrophages are coinfected despite the low viral loads
employed. (Levy et al. reanalyzed their experiments [7] by accounting
for double and triple infections of cells and estimated
*ρ* [30]; the differences in their
estimates of *ρ* and our estimates above may be
attributed to the occurrence of multiple infections, which we ignore.) In
contrast, Chen et al. and Rhodes et al. found no significant distinction
between different cell types in their experiments [13,14]. Whether nonrandom infection
processes [4,5] favored enhanced multiple infections of macrophages in
the experiments of Levy et al. [7] remains to be
ascertained.
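
The non-monotonic dependence of *f*_{g} on *ρ* can be examined with a simple grid scan. As before, this is a sketch under our reading of the text: the second contribution to *R*_{4}(12) is taken as (1/2)exp(−*ρl*_{a})cosh(*ρl*_{a}) · exp(−60*ρ*) · exp(−*ρl*_{b})sinh(*ρl*_{b}), and the grid range and resolution are arbitrary choices.

```python
import math


def f_g(rho: float, l1=201, l2=609, seg_start=440, seg_end=500) -> float:
    """Predicted GFP+ fraction among detected infected cells."""
    l, la, lb = l2 - l1, seg_start - l1, l2 - seg_end

    def p_even(x):  # exp(-x) * cosh(x) without overflow
        return 0.5 * (1.0 + math.exp(-2.0 * x))

    def p_odd(x):   # exp(-x) * sinh(x) without overflow
        return 0.5 * (1.0 - math.exp(-2.0 * x))

    neither = 0.5 * p_odd(rho * l)
    both_green = (0.5 * p_even(rho * la)
                  * math.exp(-rho * (seg_end - seg_start))
                  * p_odd(rho * lb))
    r4 = neither + both_green
    r3 = neither - both_green
    return r4 / (2.0 - r3 - r4)


# Scan rho from 1e-5 to 0.05 crossovers per position.
GRID = [i * 1e-5 for i in range(1, 5001)]
RHO_AT_MAX = max(GRID, key=f_g)

if __name__ == "__main__":
    print("rho at maximum:", RHO_AT_MAX)
    print("maximum f_g (%):", 100 * f_g(RHO_AT_MAX))
    print("large-rho limit (%):", 100 * f_g(10.0))
```

The scan locates a shallow maximum near *ρ* ≈ 0.007 crossovers per position and a large-*ρ* limit of 1/6, so an observed value of ∼29% cannot be matched by any *ρ* in this single-infection description.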

#### Dynamics and scaling.

We next compare our predictions with the dynamical and scaling patterns that
Levy et al. [7] observed in their experiments with replication-competent
viruses. Levy et al. employed equal populations of homozygous CFP and YFP
reporter viruses to infect ∼10^{6} CD4^{+}
T cells and detected the total percentage of cells infected (i.e., that
fluoresced), *p^{*}*, the percentage of cells that were coinfected with CFP and YFP genomes, *p*_{12}, and the percentage of cells that were GFP^{+}, *p*_{4}, with time following the onset of infection. The quantities evolved in two distinct phases: an initial rise and a subsequent fall. Our model captures the two-phase dynamics qualitatively, as we demonstrate in Figure 4 (see “Model Predictions” above), and elucidates the origins of the two phases and of the observed relative prevalence of different infected cell subpopulations. Quantitative comparisons with the dynamical data are precluded by the possible presence in the experimental cultures of cells not susceptible to infection, which we discuss below. We focus here on the corresponding scaling relationships observed by Levy et al. [7]. In Figure 6A, we reproduce the experimental scaling relationship observed between *p*_{12} and *p^{*}*, and in Figure 6B, the relationship between *p*_{4} and *p*_{12}.

Model predictions (thick lines) obtained by solving Equations
1–6, but with Equation
5d replaced by Equations 9a and 9b,
compared with experimental scaling relationships (symbols) between
(A) the percentage of coinfected cells
(YFP^{+}/CFP^{+}) and the
total percentage of infected cells, and (B) the percentage of
GFP^{+} cells and the percentage of coinfected
cells. The different symbols represent experiments conducted with
cells from different donors [7]. Parameters employed
for calculations are identical to those in Figure 2 except that for the red
lines *t*_{d} = 2.8 d in (A) and
*ρ* =
10^{−3} crossovers per position in (B). The thin
black line in (A) is the experimental best-fit line [7].

In Figure 6, we also present model predictions of *p*_{12} versus *p^{*}* and *p*_{4} versus *p*_{12} for the initial viral load 2*V*_{0} = 10^{8} and with the parameters employed in Figure 4. In the calculations in Figure 6, however, we replace Equation 5d for *R*_{3}(12) and *R*_{4}(12) by Equations 9a and 9b and ignore cells infected with genome 3 alone, i.e., *T*_{3} and *T*_{33}, in our count of the total number of infected cells, *T^{*}*, because the latter cells do not fluoresce and remain undetected in the experiments [7]. We recognize that unlike in single-round infection experiments, the other recombination probabilities involving genome 4, *R*_{i}(*j*4) in Equation 5, must also be altered to differentiate between the two kinds of GFP^{+} genomes (see above) present in the experiments. Based on the relative magnitudes of the two contributions to *R*_{4}(12) in Equation 9a (for *ρ* = 8.3 × 10^{−4} crossovers per position, the values of the two terms are ∼0.12 and ∼0.03, respectively), we expect, however, that a majority of the GFP^{+} genomes are those that contain neither of the mutations on genomes 1 and 2, i.e., as shown in Figure 1B. We therefore employ, as an approximation, the remaining recombination probabilities as defined in Equation 5.

We find that our model captures the quadratic scaling, *p*_{12} ∼ (*p^{*}*)^{2}, qualitatively. Our model predicts that for small values of *p^{*}*, the scaling relationship *p*_{12} = (*p^{*}*)^{2}/400 holds (Equations 9a and 7a), and that it transitions to *p*_{12} = (*k*_{1}/*k*_{0})(*p^{*}*)^{2}/400 for larger values of *p^{*}* (Equation 7b). Because *k*_{1}/*k*_{0} < 1, the parametric plot of *p*_{12} versus *p^{*}* exhibits a parallel shift to lower values of *p*_{12} at large values of *p^{*}*. Indeed, this shift is also observed in experiments, where the data lie on the experimental best-fit scaling relationship, *p*_{12} = (*p^{*}*)^{2}/40, for small *p^{*}*, but below the best-fit line for large *p^{*}*.

Quantitatively, our model underpredicts the percentage of coinfected cells
*p*_{12} compared to the experiments: the
experimental proportionality constant relating
*p*_{12} and *p ^{*}*, 1/40, is
an order of magnitude larger than that estimated by our model, 1/400. One
reason for this discrepancy might be the presence in the experimental
cultures of cells not susceptible to infection. Hypothesize, for instance, the presence of a population, *T*_{ns}, of non-susceptible cells in culture. The percentage of infected cells, *p^{*}*, then becomes 100*T^{*}*/(*T^{*}* + *T* + *T*_{ns}), where *T* is now the susceptible target cell population governed by Equation 1, and *T^{*}* is the total population of infected cells. Similarly, the percentage of coinfected cells, *p*_{12}, becomes 100*T*_{12}/(*T^{*}* + *T* + *T*_{ns}). The resulting proportionality constant, *p*_{12}/(*p^{*}*)^{2}, is greater than that determined by our model (*T*_{ns} = 0) by the factor 1 + *T*_{ns}/(*T^{*}* + *T*). An estimate of the latter factor is obtained by noting that the maximum percentage of cells infected in experiments is ∼20% for the two highest initial viral loads employed [7]. At the peak infection, we may assume that nearly all susceptible cells are infected, i.e., *T* ≈ 0, so that *T^{*}*/(*T^{*}* + *T*_{ns}) ≈ 1/5. Thus, the factor above, 1 + *T*_{ns}/*T^{*}* ≈ 5, explains at least in part the difference between the experimental proportionality constant and that derived from our model. Further, uncertainties exist in our knowledge of the CD4 down-modulation timescale, *t*_{d}, of the cells in culture [26,28,29]. A larger value of *t*_{d} may enhance the frequency of multiple infections and increase *p*_{12} for a given value of *p^{*}*. Indeed, our model predictions assuming *t*_{d} = 2.8 d appear to be in better agreement with the experimental scaling between *p*_{12} and *p^{*}* (Figure 6A). We note, however, that *t*_{d} = 2.8 d implies that *k*_{1} ≈ *k*_{0} throughout, so that several assumptions underlying the scaling relations in Equation 7 are not expected to hold. In particular, we find that for large *p^{*}*, the proportionality constant relating *p*_{12} and *p^{*}* is higher than 1/400, the value of the constant for small *p^{*}*, in contrast to that predicted for smaller values of *t*_{d} and observed in the experimental data. Nonetheless, quantitative comparison with the experimental scaling between *p*_{12} and *p^{*}* requires a description of the dynamics of the non-susceptible cell population including possible transitions from non-susceptibility to susceptibility and vice versa due to stimulation by regular IL-2 addition and loss of cell activation, respectively, which is beyond the scope of the present paper.
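
The size of the non-susceptible cell correction follows from simple arithmetic, sketched below. The only inputs are the ∼20% peak infected percentage and the constants 1/400 and 1/40 quoted in the text; the function name is hypothetical, and the calculation assumes *T* ≈ 0 at peak infection as stated above.

```python
def nonsusceptible_factor(max_infected_fraction: float) -> float:
    """Return 1 + T_ns/T*, assuming T ~ 0 at peak infection so that
    T*/(T* + T_ns) equals the maximum infected fraction."""
    return 1.0 / max_infected_fraction


FACTOR = nonsusceptible_factor(0.20)        # about 5
MODEL_CONSTANT = 1.0 / 400.0                # small-time model scaling constant
CORRECTED_CONSTANT = FACTOR * MODEL_CONSTANT
EXPERIMENTAL_CONSTANT = 1.0 / 40.0          # best-fit experimental constant

if __name__ == "__main__":
    print("correction factor:", FACTOR)
    print("corrected constant: 1/%.0f" % (1.0 / CORRECTED_CONSTANT))
    print("remaining gap:", EXPERIMENTAL_CONSTANT / CORRECTED_CONSTANT)
```

With these numbers the corrected constant is 1/80, which narrows the gap to the experimental value of 1/40 to a factor of about 2 rather than closing it, in line with the statement that non-susceptible cells explain the discrepancy only in part.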

The presence of non-susceptible cells, however, does not influence the linear
scaling relationship between *p*_{12} and
*p*_{4}. Given that *p*_{4} and *p*_{12} are percentages of the same total cell population,
*T^{*}* + *T* + *T*_{ns},
the proportionality constant for the linear scaling,
*p*_{4}/*p*_{12},
is independent of *T*_{ns}. Our model predicts that
for small *p*_{12}, *p*_{4} is
proportional to (*p*_{12})^{2} and for large
*p*_{12}, *p*_{4} is
proportional to *p*_{12} (Equation 7;
Figure 4B). In the
experiments, the quadratic scaling at small *p*_{12}
is not observed [7]. For low initial viral loads, the transition from the
quadratic to the linear scaling regime occurs at small values of
*p*_{12} that may lie below experimental
detection limits (Figure
4B). Upon increasing the initial viral load, the value of
*p*_{12} at the transition increases but the
quadratic scaling is short-lived. Thus, for larger viral loads, the
transition to the linear scaling regime appears to occur before the first
measurement following the onset of infection is made (at *t
≈* 2 d). Consequently, the linear scaling regime may alone be
accessed in experiments.

We find remarkably that our model quantitatively captures the experimental
linear scaling between *p*_{4} and
*p*_{12} (Figure 6B). For small values of
*p*_{12}, the model is in excellent agreement
with the data. Interestingly, the same recombination rate
(*n* = 8 crossovers in a genome of 9,700
nucleotides) obtained from single-round infection experiments is employed in
the latter predictions. (The latter predictions, however, are not adequately
sensitive to changes in the recombination rate; calculations with a higher
recombination rate, *ρ* = 0.001 crossovers
per position [*n* = 10 crossovers in a
genome of 9,700 nucleotides], yield only a marginal improvement in
the comparison between model predictions and experiment [Figure 6B].) For
large values of *p*_{12}, the model slightly
underpredicts the experimental data, possibly because of the increased
likelihood of more than two infections of cells, which we ignore.
Nonetheless, the quantitative agreement between model predictions and the
experimental scaling relationship and the consistency of the predictions
with the recombination rate estimated from independent single-round
infection assays indicate that our model accurately captures the underlying
dynamics of recombination during HIV infection.
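
The two recombination rates compared above can be related by simple arithmetic. The following sketch assumes that crossovers are distributed uniformly along the genome, so that the per-position rate is the mean number of crossovers divided by the genome length; the function name is illustrative and does not come from our program.

```python
def crossover_rate_per_position(n_crossovers: float, genome_length: int = 9700) -> float:
    """Per-position crossover rate, assuming crossovers are uniform along the genome."""
    return n_crossovers / genome_length


# Rates corresponding to the two values of n compared in Figure 6B.
rho_8 = crossover_rate_per_position(8)
rho_10 = crossover_rate_per_position(10)
print(f"n = 8:  rho = {rho_8:.2e} crossovers per position")
print(f"n = 10: rho = {rho_10:.2e} crossovers per position")
```

Under this assumption, *n* = 8 corresponds to roughly 0.0008 crossovers per position and *n* = 10 to roughly 0.001 crossovers per position.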

## Discussion

The emergence of recombinant forms of HIV that are resistant to multiple drugs often underlies the failure of current antiretroviral therapies for HIV infection. Yet, the dynamics of the emergence of recombinant genomes in individuals infected with HIV remains poorly understood. Current models of HIV dynamics are unable to explain available experimental data of the frequency of occurrence and the time evolution of recombinant HIV genomes quantitatively. We developed a model that describes the dynamics of the emergence of recombinant forms of HIV and quantitatively captures key experimental observations. Mimicking recent experiments [5,7,14], we considered target cells exposed simultaneously to two kinds of homozygous virions. We constructed integral equations that predict the time evolution of the population of cells coinfected with both kinds of viruses. Following the first infection of a cell, viral gene expression induces CD4 down-modulation, which lowers the susceptibility of the cell to further infections. Because cells are infected asynchronously, determination of the frequency of multiple infections requires accounting for the different susceptibilities of individual cells to further infections at any given time based on the different times elapsed from their respective first infections, which is accomplished by the integral equation formalism [29]. Coinfected cells produce heterozygous progeny virions, which infect cells and yield recombinant proviral genomes. We developed a probabilistic description of template switching during reverse transcription and predicted the frequency with which heterozygous virions give rise to recombinant genomes. 
We integrated our descriptions of multiple infections of cells and recombination into standard models of HIV dynamics [15–17] and formulated dynamical equations that predict the time evolution of the populations of uninfected, singly infected, and multiply infected cells, and of homozygous, heterozygous, and recombinant viruses.

Model predictions are in agreement with the T cell dynamics observed in vitro. Levy
et al. [7]
found that following the onset of infection, the infected cell subpopulations evolve
in two phases, an initial rise and a subsequent fall. Further, the percentage of
cells infected by recombinant genomes, *p*_{4}, is a small
fraction of the percentage of coinfected cells, *p*_{12},
which in turn is a small fraction of the total percentage of cells infected,
*p ^{*}*. The two-phase dynamics and the relative
prevalence of various infected cell subpopulations are in agreement with our model
predictions. Our model also captures the scaling patterns relating the frequency of
infection, coinfection, and recombination observed experimentally. Levy et al.
[7] found
remarkably that *p*_{12} is proportional to (*p*^{*})^{2} and that *p*_{4} is proportional to *p*_{12}, independent of the initial viral load and the time following the onset of infection. Our model predicts both these scaling patterns and that the patterns are independent of the initial viral load and the time following the onset of infection. Quantitative comparison between our model predictions and the experimental scaling relationship between *p*_{12} and (*p*^{*})^{2} is precluded by the poorly characterized dynamics of cells not susceptible to infection by HIV that may be present in the experimental cultures. We showed, however, that the presence of non-susceptible cells does not influence the linear scaling relationship between *p*_{4} and *p*_{12}. Indeed, our model predictions are in quantitative agreement with the experimental scaling relationship between *p*_{4} and *p*_{12}. The quantitative agreement indicates that our model captures the underlying dynamics of HIV recombination accurately.

Our model also captures data from single-round infection experiments on the frequency
of the accumulation by recombination of distinct mutations present on the two RNA
strands within a virion. From comparisons of model predictions with the experiments
of Rhodes et al. [14], we estimate that ∼8 template switches, or crossovers,
occur on average (95% confidence interval: 6–10) during the
reverse transcription of an entire HIV genome of ∼10^{4}
nucleotides. This number is in agreement with independent estimates from direct
sequence analysis by Levy et al. [7], who observed ∼7.5 crossovers
(range 3–13) on average. Comparison of our model predictions with the
single-round infection assays performed by Levy et al. yields crossover frequencies
of ∼7.1 in Jurkat T cells, ∼4.6 in HeLa CD4 cells, and ∼8.3
in primary CD4^{+} T cells. Whereas the crossover frequency in HeLa
cells is lower, the frequency in the other two cell types is in agreement with the
estimate obtained from our analysis of the experiments of Rhodes et al.
[14].
Further, the scaling relationship between *p*_{4} and
*p*_{12} described above is also consistent with a
recombination rate of ∼8 crossovers per ∼10^{4} nucleotides.

The power law scaling that the number of doubly infected cells is proportional to the
square of the total number of infected cells is also predicted by the model of HIV
dynamics with multiple infections developed by Dixit and Perelson [29]. In contrast to
experiments, however, the predicted scaling is dependent on the time following the
onset of infection and the initial viral load. Here, by considering percentages
rather than numbers of infected cells and by distinguishing between cells doubly
infected by a single kind of genome and coinfected with distinct genomes, we mimic
experimental quantities more accurately and find that the scaling relationship is
independent of the time following infection or the initial viral load, as observed
in experiments [7]. Fraser argues that the quadratic scaling observed between the
percentage of doubly infected cells and the total percentage of infected cells, the
latter predominantly singly infected, may imply a deviation from mass action
kinetics for the second (and perhaps further) infections of cells and suggests,
motivated by the scaling, that the rate of second infection of cells may be
proportional to the square of the viral load (*r* ∼
*kV*^{2}) [22]. Here, we find that the quadratic
scaling emerges without deviations from mass action kinetics (*r*
∼ *kV*). Further, to address possible differences between the
rates of first, second, and third infections, due, for instance, to CD4
down-modulation, Fraser postulates the use of different values of the rate
constants, *k*, for successive infections. In our model, the
differences in the rate constants for multiple infections follow naturally from our
description of CD4 down-modulation (Equation 3). The latter description facilitates
accurate estimation of the frequency of multiple infections under varying viral
loads: when the viral load is high, for instance, the second infection of a cell may
occur rapidly after its first infection, in which case the rate constants for the
first and second infections are expected to be similar due to negligible CD4
down-modulation in the intervening interval. A fixed rate constant for second
infection (independent of the viral load) would then tend to underestimate the
frequency of double infections. The dependence on the viral load of the variation of
the apparent infection rate constant with the number of infections implies that when
viral load changes are rapid, the likelihood of a cell suffering multiple infections
would depend on the instant of its first infection. For instance, in patients
undergoing efficacious antiretroviral therapy, a cell first infected at the start of
therapy, when the viral load is large, is expected to have a higher rate constant
for second infection than a cell that is first infected a day after the onset of
therapy, when the viral load is significantly reduced. Our integral equation
formalism, which accounts for the asynchronous first infections of cells, allows
accurate determination of the frequency of multiple infections and consequently the
influence of recombination throughout the infection period.
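
The dependence of the apparent rate constant for second infection on the viral load can be illustrated with a short calculation. The sketch below rests on assumptions that are not taken from the model equations: susceptibility is assumed to decay exponentially after the first infection, *k*(*τ*) = *k*_{0} exp(−*τ*/*t*_{d}), the viral load *V* is held constant, and cell death is neglected. Under these assumptions the mean value of *k*/*k*_{0} at the moment of second infection, conditional on a second infection occurring, has a closed form in terms of *U* = *k*_{0}*Vt*_{d}. All names and parameter values are hypothetical.

```python
import math


def apparent_rate_ratio(V: float, k0: float, t_d: float) -> float:
    """Mean k/k0 at the moment of second infection, given that one occurs.

    Assumes k(tau) = k0 * exp(-tau / t_d), constant viral load V > 0,
    and no cell death (illustrative assumptions only).
    """
    U = k0 * V * t_d                           # cumulative infection hazard over an infinite horizon
    p_second = -math.expm1(-U)                 # P(a second infection ever occurs) = 1 - exp(-U)
    mean_u = 1.0 - math.exp(-U) * (1.0 + U)    # integral of u * exp(-u) from 0 to U
    return 1.0 - mean_u / (U * p_second)


# Hypothetical parameter values, chosen only to span low to high viral loads.
k0, t_d = 2e-7, 1.0
for V in (1e3, 1e5, 1e7, 1e9):
    print(f"V = {V:.0e}: apparent k / k0 = {apparent_rate_ratio(V, k0, t_d):.3f}")
```

In this simplified setting the ratio rises from about 1/2 at low viral loads toward 1 at high viral loads, consistent with the argument that a fixed rate constant for second infection would underestimate double infections when the viral load is high.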

If the distinction between different viral genomes is ignored, our model reduces to the model of HIV dynamics with multiple infections developed by Dixit and Perelson [29] when more than two infections of cells are rare. Indeed, changes in the total viral load, target cell numbers, and total infected cell numbers predicted by our model (Figure 2) are identical to those predicted by the latter model. Importantly, the latter model reduces to the standard model of HIV dynamics [15–17] when viral production from cells is independent of the number of infections cells suffer. Our model is thus consistent with the standard model of HIV dynamics, which successfully predicts viral load changes in patients following the onset of antiretroviral therapy [15–17]. Further, infections in SCID-hu mice also show scaling patterns similar to those observed in vitro [7] and predicted by our model, reinforcing the notion that our model may be applied to describe the dynamics of recombination in vivo.

Several advances of our model are essential, however, to predict the emergence of recombinant genomes in vivo. First, that infected splenocytes from two HIV patients harbored 3–4 proviruses per cell on average [6] suggests that multiple infections of cells may be more prevalent in vivo than is assumed in our model. Second, multiple infections in vivo may be orchestrated by cell–cell transmission as well as by free virions [32,33]. Third, HIV has a high mutation rate [34], which introduces genomic variations in vivo that may subsequently be accumulated by recombination. Indeed, the high mutation and turnover rates of HIV in vivo [17,35] suggest that the likelihood of the preexistence of individual drug resistance mutations in patients is high [36]: approximately half of the HIV patients in the United States are estimated to be infected with genomes that possess resistance to at least one of the currently available drugs [37]. Our model describes how preexisting mutations may become associated by recombination. Determination of the existence of individual mutations, however, requires a description of the HIV mutation process, which we ignore. Fourth, fitness interactions between mutations [24] modulate the relative prevalence of recombinant genomes, whereas we assume all viral genomes to be equally fit. We note that incorporating fitness selection enables our present model to describe additionally in vitro serial passage experiments of the emergence of drug resistance via recombination [2,38]. Finally, whereas we consider two loci, a description of recombination between more than two loci is essential in vivo, as more than two mutations are typically responsible for resistance to individual drugs [37]. With the above advances, some of which are suggested in currently available models [18,20,22,23], our model may facilitate prediction of the emergence of multi-drug–resistant strains of HIV in infected individuals.

## Methods

### Solution of dynamical equations.

We non-dimensionalize Equations 1–4 and 6 using the following dimensionless quantities: and obtain and

We solve the dimensionless Equations 5 and 11–15 as follows. We recognize that
the equations are strongly coupled because of the integral equation formalism
(Equation 12)
employed; for instance, evaluation of the integral in Equation 12a to
determine
requires knowledge of
at all times from 0 to
,
which in turn depends on
through Equation
15. Using the initial conditions, we first integrate the differential
equations for
and
(Equations 11
and 15) for a
small time step *θ*, i.e., from
to
.
Next, we integrate the differential equations for the conditional probabilities
*M* (Equation 14) by discretizing
,
which can vary from 0 to *θ*, into intervals of length
*θ _{m}* and determining
,
where *α* assumes integer values from 0 to *θ/θ*_{m}, by linear interpolation between and . We then evaluate the integrals in Equation 12 to determine , , and . We march forward in time and evaluate and by integrating Equations 11 and 15 from time to , integrate Equation 14 by allowing to vary from 0 to 2*θ*, and evaluate the integrals in Equation 12 to determine , , and . We repeat the procedure until (i.e., *t* = 10 d). The solution is implemented by a computer program written in Fortran 90.
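
The marching procedure described above can be sketched on a toy problem. The code below is not our Fortran 90 program and does not solve Equations 11–15. It assumes a simplified system with target cells *T*, a single viral population *V*, an exponentially decaying susceptibility to second infection, and one step size in place of the two step sizes *θ* and *θ*_{m}, so no interpolation is needed. All equations, names, and parameter values in it are hypothetical. It shows only the order of operations: update the conditional probabilities *M*, advance the differential equations, then evaluate the integrals over first-infection times.

```python
import math


def march(T0=1e6, V0=1e4, k0=2e-7, t_d=0.5, delta=1.0, N=200.0, c=3.0,
          dt=0.01, t_end=2.0):
    """Toy time-marching scheme for a coupled differential/integral system.

    Returns a list of (t, T, V, T_infected, T_single) tuples.
    The simplified equations and all parameter values are hypothetical.
    """
    T, V, T_inf = T0, V0, 0.0
    first_inf_rate = [k0 * T * V]  # k0*T(s)*V(s) for the cohort first infected at s = 0
    M = [1.0]                      # M[i]: probability that cohort i is still singly infected
    out = [(0.0, T, V, 0.0, 0.0)]
    for step in range(1, int(round(t_end / dt)) + 1):
        t = step * dt
        # Step 1: update M for every existing cohort over [t - dt, t].
        for i in range(len(M)):
            age = (t - dt) - i * dt          # time since the cohort's first infection
            k = k0 * math.exp(-age / t_d)    # assumed down-modulation law
            M[i] *= math.exp(-k * V * dt)
        # Step 2: advance the differential equations for T and V by one step.
        T_new = T * math.exp(-k0 * V * dt)
        V_new = V + dt * (N * delta * T_inf - c * V)
        T, V = T_new, V_new
        # Step 3: open a new cohort for cells first infected at time t.
        first_inf_rate.append(k0 * T * V)
        M.append(1.0)
        # Step 4: evaluate the integrals over first-infection times (trapezoidal rule).
        T_inf = T_single = 0.0
        last = len(M) - 1
        for i in range(len(M)):
            w = 0.5 * dt if i in (0, last) else dt
            surviving = w * first_inf_rate[i] * math.exp(-delta * (t - i * dt))
            T_inf += surviving               # all infected cells still alive
            T_single += surviving * M[i]     # those that remain singly infected
        out.append((t, T, V, T_inf, T_single))
    return out


t, T, V, T_inf, T_single = march()[-1]
print(f"t = {t:.2f} d: singly infected fraction of infected cells = {T_single / T_inf:.3f}")
```

Because each cohort carries its own value of *M*, cells first infected at different times retain different probabilities of remaining singly infected, which is the feature of the integral equation formalism that the strong coupling arises from.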

### Scaling analysis.

We derive below the scaling relationships mentioned in Equation 7.
Following the onset of infection, for times small compared to the timescale of
CD4 down-modulation, i.e., *t ≪ t*_{d}, because
*k* ≈ *k*_{0}, we write the
dynamics of singly infected cells as where the first term on the right-hand side is the rate of
formation of *T*_{1} by the infection of uninfected
cells, and the second and third terms are the losses of
*T*_{1} due to further infections and cell death. At the
start of infection, the dominant viral populations are
*V*_{11} and *V*_{22} (Figure 3B), of which infection
by the former alone yields *T*_{1}. Further, because
*T*_{1} is small (Figure 3A), the loss terms, which are linear
in *T*_{1}, are negligible. Equation 16 then simplifies to For lengths of time short compared to the timescales over which
*V* and *T* change, we let *T*
≈ *T*_{0} and *V*_{11}
≈ *V*_{0} and integrate Equation 17 to obtain
where we use the initial condition
*T*_{1}(0) = 0. With assumptions similar to
those employed in obtaining Equation 17, we find that the coinfected cell
population evolves according to the following equation: where we recognize that *T*_{1}
= *T*_{2} and *V*_{11}
= *V*_{22} ≈
*V*_{0}. Substituting for *T*_{1}
from Equation 18
and integrating with the initial condition *T*_{12}(0)
= 0, we get The evolution of the heterozygous viral population,
*V*_{12}, is given by where we note that 1/2 of the virions produced from cells
*T*_{12} are heterozygous. We ignore viral clearance
because *V*_{12} is expected to be small. Substituting
for *T*_{12} from Equation 20 and integrating with the initial
condition *V*_{12}(0) = 0, we obtain
The time evolution of the cell population infected with
recombinants, *T*_{4}, is then given by where we recognize that because *V*_{12}
≫ *V*_{44} ≫
*V*_{14}, most of the cells
*T*_{4} are formed due to infection by
*V*_{12} followed by recombination. Substituting for
*V*_{12} from Equation 21, and integrating with the initial
condition that *T*_{4}(0) = 0, we find

Because the total infected cell population comprises largely cells
*T*_{1} and *T*_{2}, which in
turn are significantly smaller in number than uninfected cells (Figures 2 and 3A), we obtain the total
percentage of infected cells, Similarly, the percentage of coinfected cells, and the percentage of cells infected with recombinants,
Combining Equations 24–26, we obtain We thus find that early during infection, the scaling laws *p*_{12} ∼ (*p*^{*})^{2} and *p*_{4} ∼ (*p*_{12})^{2} hold.
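
The early-time argument above can be checked numerically. The sketch below uses closed-form solutions reconstructed from the verbal description of the simplified equations, namely d*T*_{1}/d*t* ≈ *k*_{0}*T*_{0}*V*_{0}, d*T*_{12}/d*t* ≈ 2*k*_{0}*T*_{1}*V*_{0}, d*V*_{12}/d*t* ≈ (1/2)*NδT*_{12}, and d*T*_{4}/d*t* ≈ *Rk*_{0}*T*_{0}*V*_{12}. The symbols *N* (viral burst size), *δ* (death rate of infected cells), and *R* (probability that infection by a heterozygous virion yields a recombinant provirus), as well as all parameter values, are assumptions made for illustration and may differ from the notation of the numbered equations.

```python
def early_time_percentages(t, k0=2e-7, T0=1e6, V0=1e4, N=200.0, delta=1.0, R=0.1):
    """Early-time (t << t_d) percentages p*, p12, p4 from reconstructed closed forms.

    All parameter values are hypothetical.
    """
    T1 = k0 * T0 * V0 * t                  # integral of k0*T0*V0
    T12 = k0 * V0 * T1 * t                 # integral of 2*k0*T1*V0 (T1 linear in t)
    V12 = 0.5 * N * delta * T12 * t / 3.0  # integral of (1/2)*N*delta*T12 (T12 ~ t^2)
    T4 = R * k0 * T0 * V12 * t / 4.0       # integral of R*k0*T0*V12 (V12 ~ t^3)
    p_star = 100.0 * 2.0 * T1 / T0         # T1 = T2 dominate the infected population
    p12 = 100.0 * T12 / T0
    p4 = 100.0 * T4 / T0
    return p_star, p12, p4


for t in (0.01, 0.05, 0.1):
    p_star, p12, p4 = early_time_percentages(t)
    print(f"t = {t}: p12/(p*)^2 = {p12 / p_star**2:.6f}, p4/(p12)^2 = {p4 / p12**2:.4g}")
```

Under these assumptions, the ratio *p*_{12}/(*p*^{*})^{2} equals 1/400 at every time point, the small-*p*^{*} constant quoted in the Results, and *p*_{4}/(*p*_{12})^{2} is independent of time.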

We next consider times longer than the timescale over which viral production and
clearance reach pseudo steady state, *t > t*_{eq}.
The magnitudes of the viral subpopulations still follow
*V*_{11} = *V*_{22}
≫ *V*_{12} ≫
*V*_{44} (Figure 3B). Similarly, for the infected cell subpopulations, we have
*T*_{1} = *T*_{2}
≫ *T*_{12} ≫ *T*_{4}
(Figure 3A). The
relevant evolution equations may then be written as
and where we let *k*_{1} be the
“mean” infection rate of singly infected cells. We note that
*k*_{1} is a function of the CD4 down-modulation
timescale, *t*_{d}. If *t*_{d} is
large, for instance, then *k*_{1} ≈
*k*_{0}. Applying the pseudo steady state
approximation for the viral populations yields Substituting for *V*_{11} from Equation 33 in Equation 28, we
obtain Assuming that changes in the target cell population,
*T*, and the total viral load, *V*, occur slowly
compared to changes in *T*_{1}, which is expected in the
initial stages of infection, we integrate Equation 34 to obtain where
and
is the value of *T*_{1} when *t*
= *t*_{eq}. Note that in Equation 35,
*t* ≫ *t*_{eq}. Substituting
for *V*_{11} and *T*_{1} in Equation 29 yields
which upon integrating with the initial condition
when *t = t*_{eq} and recognizing that
gives Combining Equations 30, 33, and 36, we get Integrating Equation 37 and assuming
yields Again, assuming that the singly infected cells are predominant in
the infected cell population, the percentage of total infected cells is
where the small percentage of infected cells allows us to write
*T*^{*} + *T* ≈ *T*. The percentage of coinfected cells is then where the last approximation follows from the sharp rise in *T*_{1} following the establishment of pseudo steady state (Figure 3A), which according to Equation 34 implies that . Finally, the percentage of cells *T*_{4} infected with recombinants is Thus, later in the infection (*t > t*_{eq}), the scaling laws *p*_{12} ∼ (*p*^{*})^{2} and *p*_{4} ∼ *p*_{12} hold.

## Author Contributions

NMD conceived and designed the experiments. GWS performed the experiments. GWS and NMD analyzed the data and contributed reagents/materials/analysis tools. NMD wrote the paper.

## References

- 1. Blackard JT, Cohen DE, Mayer KH (2002) Human immunodeficiency virus superinfection and recombination: Current state of knowledge and potential clinical consequences. Clin Infect Dis 34: 1108–1114.
- 2. Moutouh L, Corbeil J, Richman DD (1996) Recombination leads to the rapid emergence of HIV-1 dually resistant mutants under selective drug pressure. Proc Natl Acad Sci U S A 93: 6106–6111.
- 3. Rhodes T, Wargo H, Hu WS (2003) High rates of human immunodeficiency virus type 1 recombination: Near-random segregation of markers one kilobase apart in one round of viral replication. J Virol 77: 11193–11200.
- 4. Chen J, Dang Q, Unutmaz D, Pathak VK, Maldarelli F, et al. (2005) Mechanisms of nonrandom human immunodeficiency virus type 1 infection and double infection: Preference in virus entry is important but is not the sole factor. J Virol 79: 4140–4149.
- 5. Dang Q, Chen JB, Unutmaz D, Coffin JM, Pathak VK, et al. (2004) Nonrandom HIV-1 infection and double infection via direct and cell-mediated pathways. Proc Natl Acad Sci U S A 101: 632–637.
- 6. Jung A, Maier R, Vartanian JP, Bocharov G, Jung V, et al. (2002) Multiply infected spleen cells in HIV patients. Nature 418: 144–144.
- 7. Levy DN, Aldrovandi GM, Kutsch O, Shaw GM (2004) Dynamics of HIV-1 recombination in its natural target cells. Proc Natl Acad Sci U S A 101: 4204–4209.
- 8. Jetzt AE, Yu H, Klarmann GJ, Ron Y, Preston BD, et al. (2000) High rate of recombination throughout the human immunodeficiency virus type 1 genome. J Virol 74: 1234–1240.
- 9. Galetto R, Negroni M (2005) Mechanistic features of recombination in HIV. AIDS Rev 7: 92–102.
- 10. Shriner D, Rodrigo AG, Nickle DC, Mullins JI (2004) Pervasive genomic recombination of HIV-1 in vivo. Genetics 167: 1573–1583.
- 11. Charpentier C, Nora T, Tenaillon O, Clavel F, Hance AJ (2006) Extensive recombination among human immunodeficiency virus type 1 quasispecies makes an important contribution to viral diversity in individual patients. J Virol 80: 2472–2482.
- 12. McCutchan FE (2006) Global epidemiology of HIV. J Med Virol 78: S7–S12.
- 13. Chen J, Rhodes TD, Hu WS (2005) Comparison of the genetic recombination rates of human immunodeficiency virus type 1 in macrophages and T cells. J Virol 79: 9337–9340.
- 14. Rhodes TD, Nikolaitchik O, Chen JB, Powell D, Hu WS (2005) Genetic recombination of human immunodeficiency virus type 1 in one round of viral replication: Effects of genetic distance, target cells, accessory genes, and lack of high negative interference in crossover events. J Virol 79: 1666–1677.
- 15. Perelson AS (2002) Modelling viral and immune system dynamics. Nat Rev Immunol 2: 28–36.
- 16. Nowak MA, May RM (2000) Virus dynamics: Mathematical principles of immunology and virology. New York: Oxford University Press.
- 17. Simon V, Ho DD (2003) HIV-1 dynamics in vivo: Implications for therapy. Nat Rev Microbiol 1: 181–190.
- 18. Althaus CL, Bonhoeffer S (2005) Stochastic interplay between mutation and recombination during the acquisition of drug resistance mutations in human immunodeficiency virus type 1. J Virol 79: 13572–13578.
- 19. Bocharov G, Ford NJ, Edwards J, Breinig T, Wain-Hobson S, et al. (2005) A genetic-algorithm approach to simulating human immunodeficiency virus evolution reveals the strong impact of multiply infected cells and recombination. J Gen Virol 86: 3109–3118.
- 20. Bretscher MT, Althaus CL, Muller V, Bonhoeffer S (2004) Recombination in HIV and the evolution of drug resistance: for better or for worse? Bioessays 26: 180–188.
- 21. Carvajal-Rodriguez A, Crandall KA, Posada D (2007) Recombination favors the evolution of drug resistance in HIV-1 during antiretroviral therapy. Infect Genet Evol 7: 476–483.
- 22. Fraser C (2005) HIV recombination: what is the impact on antiretroviral therapy? J R Soc Interface 2: 489–503.
- 23. Rouzine IM, Coffin JM (2005) Evolution of human immunodeficiency virus under selection and weak recombination. Genetics 170: 7–18.
- 24. Bonhoeffer S, Chappey C, Parkin NT, Whitcomb JM, Petropoulos CJ (2004) Evidence for positive epistasis in HIV-1. Science 306: 1547–1550.
- 25. Kouyos RD, Althaus CL, Bonhoeffer S (2006) Stochastic or deterministic: what is the effective population size of HIV-1? Trends Microbiol 14: 507–511.
- 26. Chen BK, Gandhi RT, Baltimore D (1996) CD4 down-modulation during infection of human T cells with human immunodeficiency virus type 1 involves independent activities of vpu, env, and nef. J Virol 70: 6044–6053.
- 27. Lama J (2003) The physiological relevance of CD4 receptor down-modulation during HIV infection. Curr HIV Res 1: 167–184.
- 28. Piguet V, Gu F, Foti M, Demaurex N, Gruenberg J, et al. (1999) Nef-induced CD4 degradation: a diacidic-based motif in Nef functions as a lysosomal targeting signal through the binding of beta-COP in endosomes. Cell 97: 63–73.
- 29. Dixit NM, Perelson AS (2005) HIV dynamics with multiple infections of target cells. Proc Natl Acad Sci U S A 102: 8198–8203.
- 30. (2005) Correction for Levy et al., From the cover: dynamics of HIV-1 recombination in its natural target cells. Proc Natl Acad Sci U S A 102: 1808–1808. PNAS 2004 101: 4204-4209.
- 31. Speirs C, van Nimwegen E, Bolton D, Zavolan M, Duvall M, et al. (2005) Analysis of human immunodeficiency virus cytopathicity by using a new method for quantitating viral dynamics in cell culture. J Virol 79: 4025–4032.
- 32. Dixit NM, Perelson AS (2004) Multiplicity of human immunodeficiency virus infections in lymphoid tissue. J Virol 78: 8942–8945.
- 33. Sato H, Orenstein J, Dimitrov D, Martin M (1992) Cell-to-cell spread of HIV-1 occurs within minutes and may not involve the participation of virus-particles. Virology 186: 712–724.
- 34. Mansky LM, Temin HM (1995) Lower in-vivo mutation-rate of human-immunodeficiency-virus type-1 than that predicted from the fidelity of purified reverse-transcriptase. J Virol 69: 5087–5094.
- 35. Perelson AS, Neumann AU, Markowitz M, Leonard JM, Ho DD (1996) HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271: 1582–1586.
- 36. Ribeiro RM, Bonhoeffer S (2000) Production of resistant HIV mutants during antiretroviral therapy. Proc Natl Acad Sci U S A 97: 7681–7686.
- 37. Clavel F, Hance AJ (2004) Medical progress: HIV drug resistance. N Engl J Med 350: 1023–1035.
- 38. Kellam P, Larder BA (1995) Retroviral recombination can lead to linkage of reverse-transcriptase mutations that confer increased zidovudine resistance. J Virol 69: 669–674.