Figure 1.
Schematic diagram of the HIV-1 sequence evolution model.
(A) Each sequence is represented with a sequence index, d (the number within the circle), equal to the distance from the founder strain. N(d,t) denotes the total number of sequences at distance d at time t. F(d) is the total number of sequences produced per unit time per sequence. A sequence at distance d at time t generates, at time t+1, either a sequence at distance d+1 with probability M(d) by a mutation, or a sequence at distance d with probability 1−M(d). M(d) is the proportion of offspring that are mutants. (B) The divergence is defined as the mean and the diversity is defined as the standard deviation of the distribution P(d,t) in Eq. (2). The position of the mean (divergence) is shown as the vertical line of each P(d,t) at years 1, 4, and 10, respectively. The standard deviation (diversity) is shown as the horizontal line of each distribution. (C) The profile of M(d) for the general (full) sequence evolution model (left panel), submodel 1 (middle panel), and submodel 2 (right panel). Here ds denotes the distance from which M(d) starts to decline and dmax denotes the distance at which M(d) = 0.
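The update rule described in panel (A) and the definitions in panel (B) can be illustrated with a minimal Python sketch. It is one reading of the caption, not the authors' implementation: Eq. (2) is not reproduced here, so P(d,t) is assumed to be N(d,t) normalized over d, and the names `step`, `divergence_and_diversity`, `simulate`, and the truncation distance `d_cap` are hypothetical.

```python
import math

def step(N, F, M):
    """Advance the counts N[d] by one time step.

    Each sequence at distance d produces F(d) offspring; a fraction M(d)
    moves to distance d+1 and the rest stay at distance d.
    """
    new = [0.0] * len(N)
    for d, n in enumerate(N):
        offspring = F(d) * n
        new[d] += (1.0 - M(d)) * offspring
        if d + 1 < len(N):  # mutants beyond the truncation distance are dropped
            new[d + 1] += M(d) * offspring
    return new

def divergence_and_diversity(N):
    """Mean and standard deviation of the normalized distribution over d."""
    total = sum(N)
    P = [n / total for n in N]  # assumption: P(d,t) = N(d,t) / sum over d
    mean = sum(d * p for d, p in enumerate(P))
    var = sum((d - mean) ** 2 * p for d, p in enumerate(P))
    return mean, math.sqrt(var)

def simulate(steps, d_cap, F, M):
    """Start from a single founder at d = 0 and iterate the update rule."""
    N = [0.0] * (d_cap + 1)
    N[0] = 1.0
    for _ in range(steps):
        N = step(N, F, M)
    return divergence_and_diversity(N)
```

With constant F(d) = 1 and M(d) = m, the distribution after t steps is binomial, so the sketch should return a divergence of t·m and a diversity of sqrt(t·m·(1−m)), provided `d_cap` is at least t so that no mutants are truncated.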
Figure 2.
Two measures of diversity dynamics of 9 longitudinally followed patients.
Diversity dynamics of each subject from the definition of average pairwise distance [red line] among all the sequences sampled at the same time point and the standard deviation [blue line] of the distribution of the tree distances of all the sequences at the same time point from the founder strain multiplied by a constant factor. The constant factors are 20, 20, 25, 25, 27, 40, 40, 30, 70 for S-P1 to S-P11, respectively. The two measures of diversity are proportional to each other.
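The two diversity measures compared in this figure can be sketched as follows. This is an illustration under stated assumptions: the tree distances are taken as precomputed inputs (a symmetric pairwise matrix and a list of distances from the founder strain), the population form of the standard deviation is assumed because the caption does not specify one, and both function names are hypothetical.

```python
import math
from itertools import combinations

def mean_pairwise_distance(pairwise):
    """Average tree distance over all distinct pairs sampled at one time point."""
    n = len(pairwise)
    pairs = list(combinations(range(n), 2))
    return sum(pairwise[i][j] for i, j in pairs) / len(pairs)

def scaled_root_distance_sd(root_distances, factor):
    """Standard deviation of distances from the founder, times a constant factor.

    Assumption: population standard deviation (divide by n).
    """
    n = len(root_distances)
    mean = sum(root_distances) / n
    var = sum((d - mean) ** 2 for d in root_distances) / n
    return factor * math.sqrt(var)
```

For one time point of one subject, the first function would give a point on the red line and the second a point on the blue line, with `factor` set to that subject's constant.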
Figure 3.
The fit of the full model to dynamics of divergence and diversity.
Dynamics of divergence and diversity fitted with the full model (left panel in Figure 1C). We calculated divergence by first measuring the tree distance between a sequence sampled at time t and a strain found at the initial sample time point. Then we averaged all the pairwise tree distances between the sequences at t and the sequence sampled at the earliest time point. Likewise, the diversity was calculated from the data by averaging pairwise tree distances over all the sequences obtained at time t. We fixed the parameters m = 0.9 and f = 1, estimated ds and dmax using a non-linear least squares method based on the Levenberg-Marquardt algorithm [59], and calculated 95% C.I. of these estimated parameters based on bootstrap sampling of the residuals [60]. The result of the fit is summarized in Table 1.
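The idea of a confidence interval from bootstrap sampling of the residuals can be sketched in Python. To keep the example self-contained, a closed-form one-parameter fit (y = a·x) stands in for the Levenberg-Marquardt fit of ds and dmax used in the paper; the function names, the percentile interval, and the defaults `n_boot` and `seed` are assumptions made for illustration only.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope of y = a*x (toy stand-in for the model fit)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def residual_bootstrap_ci(xs, ys, n_boot=1000, seed=0):
    """Return (estimate, lower, upper) with a 95% percentile interval."""
    rng = random.Random(seed)
    a_hat = fit_slope(xs, ys)
    fitted = [a_hat * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    estimates = []
    for _ in range(n_boot):
        # Rebuild a synthetic data set: fitted curve plus resampled residuals.
        resampled = [f + rng.choice(residuals) for f in fitted]
        estimates.append(fit_slope(xs, resampled))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return a_hat, lower, upper
```

The same resampling loop would apply to a non-linear fit: only `fit_slope` and the computation of the fitted values would need to be replaced by the model and optimizer actually used.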
Table 1.
Model fitting to divergence and diversity dynamics.
Figure 4.
Dynamics of divergence and diversity with linear increase of fitness profile.
Divergence (A) and diversity (B) as a function of time for f(d) = f1+f2d and M(d) = m [submodel 1] for different values of f2/f1. The value of m is chosen as 0.5 and f1 = 1. The scale factors 300 and 20 for the divergence and the diversity, respectively, are introduced to make the model output comparable to the absolute values of measured divergence and diversity. Divergence (C) and diversity (D) as a function of time for f(d) = f1+f2d and M(d) = 1−d/dmax for d≤dmax and M(d) = 0 for d>dmax [submodel 2]. The value of dmax is 40 and f1 = 1. The scale factor for the divergence is 500 and that for the diversity is 50.
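The fitness and mutation profiles named in this caption can be written as small Python functions. The linear fitness and the two submodel profiles follow the formulas given above; the full-model profile is an assumption, since the captions state only that M(d) starts to decline at ds and reaches zero at dmax, and a linear decline between the two is assumed here. All function names are hypothetical.

```python
def linear_fitness(d, f1=1.0, f2=0.0):
    """f(d) = f1 + f2*d."""
    return f1 + f2 * d

def submodel1_mutant_fraction(d, m=0.5):
    """Submodel 1: constant proportion of mutant offspring."""
    return m

def submodel2_mutant_fraction(d, dmax=40):
    """Submodel 2: M(d) = 1 - d/dmax for d <= dmax, and 0 beyond."""
    return 1.0 - d / dmax if d <= dmax else 0.0

def full_model_mutant_fraction(d, m=0.9, ds=10, dmax=30):
    """Full model, ASSUMING a linear decline from m at ds to 0 at dmax.

    The default values of ds and dmax are placeholders, not fitted values.
    """
    if d <= ds:
        return m
    if d >= dmax:
        return 0.0
    return m * (dmax - d) / (dmax - ds)
```

Under this reading, submodel 2 corresponds to the full-model shape with m = 1 and ds = 0.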
Figure 5.
Dynamics of divergence and diversity with fitness reduction.
Dynamics of divergence and diversity when fitness is reduced to 50% of its original value for d>dc = 50 mutations. For d≤50, f = 1 and for d>50, f = 0.5, and M(d) = 0.5 for all d. The saturation of divergence and the decrease of diversity are observed.
Figure 6.
Dynamics of divergence and diversity with emergence of X4 viruses.
Dynamics of divergence and diversity when imposing a greater level of fitness for certain types of viruses which emerge and persist, for example, by acquiring X4 tropism. The X4 viruses appear at d = 50 with a greater level of fitness, Fhigh = 1.5, in comparison to R5 viruses with fitness F = 1. The fraction of X4 viruses out of the total virus population with d≥50 is given by α. Rapid transient increases in both divergence and diversity are observed upon the emergence of X4 viruses. The scale factor for the divergence is 500, that for the diversity is 100, and M(d) = 0.5 for all d.
Figure 7.
Evolutionary rate as a function of the distance from the root of the maximum likelihood tree of each patient.
(A) Maximum likelihood tree for the viral sequences sampled from patient S-P6 over 6 years [13]. (B) Evolutionary rate as a function of the distance from the root of the tree for 9 patients from Ref. [13] and 6 patients from Ref. [22] (black lines). The evolutionary rate between sequences i and j is estimated by the distance difference, dj−di, divided by the sampling time difference, tj−ti. The evolutionary rate at a certain distance d from the root was averaged over all possible sequence pairs (i, j) within a sliding window. The distance from the root for a particular window, d̄, is the average distance for all the sequences within that window. The size of the window (Δ) was 0.09 substitutions per site for S-P1 to S-P11 and 0.02 for W-P1 to W-P6. Error bars indicate ±1 standard deviation. The rate of evolution obtained by fitting the full model to the divergence and diversity dynamics of each patient is shown as a blue line.
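The sliding-window estimate described in panel (B) can be sketched in Python. Several details are assumptions, because the caption does not state them: each sequence is given as a (root distance, sampling time) pair, pairs sampled at the same time are skipped to avoid division by zero, and the window is advanced by a hypothetical `step`. The function names are illustrative.

```python
from itertools import combinations

def window_rate(samples, lo, width):
    """Mean pairwise rate for sequences with root distance in [lo, lo + width).

    samples: list of (root_distance, sampling_time) tuples.
    Returns (mean distance in the window, mean rate), or None if no usable pair.
    """
    inside = [(d, t) for d, t in samples if lo <= d < lo + width]
    rates = [(dj - di) / (tj - ti)
             for (di, ti), (dj, tj) in combinations(inside, 2)
             if tj != ti]  # assumption: same-time pairs are skipped
    if not rates:
        return None
    d_bar = sum(d for d, _ in inside) / len(inside)
    return d_bar, sum(rates) / len(rates)

def sliding_rates(samples, width, step):
    """Slide the window from the smallest to the largest root distance."""
    start = min(d for d, _ in samples)
    stop = max(d for d, _ in samples)
    results = []
    k = 0
    while start + k * step <= stop:
        result = window_rate(samples, start + k * step, width)
        if result is not None:
            results.append(result)
        k += 1
    return results
```

Calling `sliding_rates(samples, width=0.09, step=0.01)` would correspond to the window size quoted for S-P1 to S-P11, with the step value chosen arbitrarily for the example.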
Figure 8.
Dynamic correlation between the rate of HIV-1 evolution and the rate of CD4+ T cell count decline.
(A) Evolutionary rate and CD4+ T-cell level as a function of time relative to seroconversion. Based on the estimation of the evolutionary rate as a function of distance to the root (Figure 7B), the evolutionary rate is plotted as a function of time (average sampled time point of all the sequences within the window). Error bars indicate ±1 standard deviation. The dynamics of the evolutionary rate is linked to that of the CD4+ T-cell count: While the CD4+ T-cell level is stable, the evolutionary rate is stable or increasing; the evolutionary rate starts to decrease when the CD4+ T-cell population is depleted. In patients S-P1 to S-P11, the dashed line indicates the stage when stable CD4+ T-cell count starts to decline. CD4+ T-cell counts were provided by J. Mullins and J. Learn. Red horizontal line denotes the period of antiretroviral therapy for each patient. (B) Correlation between the slope of CD4+ T-cell count and the slope of the evolutionary rate (r = 0.68, P = 0.0014). For patients S-P1 to S-P11, the slopes are calculated separately before and after the dashed line. For W-P1 to W-P6, the slopes are measured over the whole range of the data. Note that the slope of the evolutionary rate for W-P6 is very large due to tight sampling, and the slope of the CD4+ T-cell count is also high in the corresponding time interval, leading to W-P6 becoming an outlier. The inset shows the average evolutionary rate for different rates of disease progression. Each subject's average evolutionary rate is measured as the ratio between the root distance difference and the sampling time difference, averaged over all the sequence pairs in each tree. The error bars indicate ±1 standard deviation. Because we rooted our trees using a sequence from the initial time point, and not the clade B consensus as done by Wolinsky et al. [22], our calculated evolutionary rate differs from theirs.
Subjects S-P2, S-P3, S-P7, S-P9, S-P11, W-P5, and W-P6 were classified as slow disease progressors; S-P1, S-P5, S-P6, S-P7, S-P8, W-P3, and W-P4 as intermediate progressors; and W-P1 and W-P2 as rapid progressors.
Figure 9.
Dynamics of synonymous and nonsynonymous evolutionary rates.
Synonymous (blue lines) and nonsynonymous (black lines) evolutionary rates as a function of the distance from the root of the tree for 9 patients from Ref. [13]. Synonymous and nonsynonymous rates were calculated using maximum likelihood trees based on only synonymous and only nonsynonymous substitutions, respectively, which were inferred using HyPhy with optimized MG94xREV models [30].
Table 2.
Polymorphism and population recombination parameters of the studied sequence data.
Figure 10.
Dynamics of divergence and diversity from the model when the proportion of mutant offspring is set to zero after 7 years.
Divergence and diversity dynamics calculated under an alternative model with a constant probability of mutation, 0.5, before time τ and zero after τ. Here τ is chosen as 7 years. Since evolution of the total population stops at τ, divergence and diversity remain constant afterwards.