Immune Activation Promotes Evolutionary Conservation of T-Cell Epitopes in HIV-1

HIV, unlike other viruses, may benefit from immune recognition by preserving the sequence of its T cell epitopes, thereby enhancing transmission between cells.

The reasons underlying the relatively low genetic diversity of T-cell epitopes in HIV-1 remain poorly understood. One proposed explanation is epitope detection bias [23,28], whereby mismatches between the peptides used in epitope screening studies and the actual sequence of the assayed viruses tend to produce false negative results in highly variable regions of the viral genome, creating an artificial negative association between immunogenicity and variability. It has also been suggested that epitope conservation may be determined by host factors. The immuno-proteasome preferentially processes hydrophobic residues, and these should tend to show relatively low variability because they often occupy internal regions of the protein that are important for correct folding [29,30]. Finally, it has been suggested that regions of the viral genome where functional constraint is weaker may have evolved generalized immune escape at the global host population level and thus show fewer extant epitopes than other, more constrained regions [23,31]. However, analysis of HIV-1 sequences spanning several decades was not consistent with this hypothesis [32], and there is little phylogenetic evidence supporting global escape in HIV-1 [33]. Furthermore, all the above hypotheses fail to explain why no systematic T-cell epitope conservation has been observed in other highly variable and prevalent human viruses such as influenza, hepatitis C, and dengue viruses [34][35][36][37].
Here, we first carried out a sequence variability analysis to validate and further characterize epitope conservation in HIV-1. Confirming previous findings, sites in the viral genome mapping to both T H and CTL epitopes were consistently less variable than those not mapping to any described T-cell epitopes. In contrast, T-cell epitopes tended to be associated with increased variability levels when this same analysis was carried out for hepatitis C virus (HCV). We also found that HIV-1 epitope conservation was probably determined by intrapatient evolutionary processes and was evident in Gag p24 and Nef proteins even after accounting for epitope detection bias. Based on this, we hypothesized that T-cell epitope conservation may result from the particular interactions established between HIV-1 and the immune system. Although epitope recognition triggers an anti-HIV immune response, the virus replicates more efficiently in activated T H cells [38][39][40][41][42][43][44][45][46]. Therefore, the variability of T-cell epitopes may be determined by the balance between two opposite selective pressures, one favoring immune escape and another favoring immune activation. To tackle this issue, we developed a mathematical model of the intrahost infection dynamics and T-cell responses. We found that sequence conservation may be favored at T H epitopes or CTL epitopes co-mapping with T H epitopes, whereas immune escape should be selected otherwise. The model suggested that epitopes triggering vigorous (immunodominant) T H -cell responses should be more conserved than those triggering weak or moderate responses. This is consistent with the fact that epitope conservation was better supported for highly immunogenic proteins such as Gag p24 and Nef [10,20]. Furthermore, we predict that epitope conservation may be favored if T H cells frequently become infected in the process of being activated by professional antigen-presenting cells (pAPCs) (transinfection).
Since transinfection appears to be an important mechanism for viral dissemination in the lymph nodes during the chronic stage of the disease [47][48][49], our model may help to explain why escape rates tend to slow down as the infection progresses [50][51][52]. Finally, our findings suggest that vaccines that do not elicit HIV-specific T H cell activation may have improved efficacy.

Empirical Evidence for Epitope Conservation in HIV-1
To confirm widespread T-cell epitope conservation in HIV-1, we downloaded 100 full-length subtype B sequences from different patients and 220 experimentally validated epitopes (CTL or T H) from the Los Alamos HIV-1 database. The epitope list included the "A list" of 88 best-defined CTL epitopes and also 132 T H epitopes. Using Shannon's entropy (H) to quantify variability at each amino acid site, we found that sites mapping to T-cell epitopes tended to be more conserved than those not mapping to any of these epitopes (Figure 1A). This association appeared to be mainly driven by CTL epitopes in Env (two-way ANOVA: p<0.001) and Nef (p<0.001) proteins, and by T H epitopes in Gag (p = 0.005). However, the separate effects of T H and CTL epitopes are difficult to ascertain because they tend to co-map in the HIV genome (Fisher's exact test: p<0.001) [53] and, also, because epitopes currently classified as CTL-only may actually be T H epitopes as well, since the latter group has been less extensively studied. The most consistent conservation pattern was observed when comparing sites that mapped to both CTL and T H epitopes (H = 0.146±0.016) with those not mapping to any of these epitopes (H = 0.255±0.01; nested ANOVA: p<0.001).
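The site-level comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names (`classify_sites`, `mean_sem_by_category`), the coordinate convention (1-based, inclusive ranges), and the toy numbers are all hypothetical, and the ANOVA step is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def classify_sites(n_sites, th_ranges, ctl_ranges):
    """Label each 1-based site as 'none', 'TH', 'CTL', or 'both'.

    Ranges are (start, end) tuples, inclusive, in alignment coordinates.
    """
    def covered(ranges):
        sites = set()
        for start, end in ranges:
            sites.update(range(start, end + 1))
        return sites

    th, ctl = covered(th_ranges), covered(ctl_ranges)
    labels = []
    for site in range(1, n_sites + 1):
        if site in th and site in ctl:
            labels.append("both")
        elif site in th:
            labels.append("TH")
        elif site in ctl:
            labels.append("CTL")
        else:
            labels.append("none")
    return labels


def mean_sem_by_category(entropies, labels):
    """Return {category: (mean, SEM)} for per-site entropy values."""
    groups = {}
    for h, label in zip(entropies, labels):
        groups.setdefault(label, []).append(h)
    summary = {}
    for label, values in groups.items():
        # SEM is undefined for a single observation.
        sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
        summary[label] = (mean(values), sem)
    return summary


# Toy data, invented for illustration only (not taken from the study).
entropies = [0.30, 0.20, 0.10, 0.10, 0.20, 0.40]
labels = classify_sites(6, th_ranges=[(3, 5)], ctl_ranges=[(4, 6)])
summary = mean_sem_by_category(entropies, labels)
```

In a real analysis the epitope coordinates would first have to be mapped onto the alignment relative to the reference sequence, a step this sketch takes as given.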
To check that the results were not dependent on how epitopes have been curated, we repeated the analysis using the complete list of 741 CTL epitopes instead of the "A list." This confirmed T-cell epitope conservation throughout the genome (nested ANOVA: p<0.001). Although the above analyses accounted for differences in variability across genes, we further checked whether epitope conservation may be a by-product of other selective factors in two ways. First, we included RNA structure in the analysis, a major factor constraining HIV variability [27,54]. We found that nucleotide sites mapping to T-cell epitopes were more conserved than those not mapping to these epitopes regardless of whether they were involved in establishing base-pairs in the genomic RNA structure of the virus (nested ANOVA: p<0.001). Second, we verified that epitope conservation was not a by-product of 5′→3′ variability gradients by introducing genome position as a covariate in the analysis.
To assess whether epitope conservation is determined by intrapatient or host population-level evolutionary processes, we downloaded ≥10 HIV-1 subtype B sequences from each of 100 patients and calculated the average intrapatient amino acid entropy at each amino acid site. Since HIV transmission typically involves one or a few viral particles [55][56][57], the intrapatient sequence entropy largely reflects the variability accumulated over the course of an individual infection. We again observed that sites mapping to T-cell epitopes (CTL "A list" and T H) tended to be more conserved (H = 0.012±0.001) than those not mapping to any of these epitopes (H = 0.018±0.001; nested ANOVA: p<0.001; Figure 1B). Indeed, changes in entropy associated with the presence of T-cell epitopes were qualitatively very similar to those observed at the host population level (Figure 1A versus 1B). This suggests that T-cell epitope conservation in HIV-1 is determined at the intrapatient level.
If T-cell epitope conservation were a methodological artifact (e.g., epitope detection bias) or produced by host factors (e.g., selective peptide processing), it should also be evident in other highly variable human viruses. HCV provides a convenient test case because, similar to HIV, it is a rapidly evolving pandemic virus, establishes chronic infections in humans, and is strongly targeted by T-cell immunity [58][59][60][61][62][63]. We aligned 100 HCV subtype 1a polyprotein sequences, calculated the per-site amino acid entropy as above, and downloaded experimentally defined HCV 1a T H or CTL epitopes from the Immune Epitope Database (IEDB). We found that, throughout the genome, amino acid sites mapping to at least one T-cell epitope were significantly more variable (H = 0.730±0.007) than those not mapping to any of these epitopes (H = 0.636±0.004; nested ANOVA: p<0.001), the association being most evident for genes E2 and NS4b (Figure 1C). This pattern contrasts with the results obtained for HIV.

Author Summary
A key component of the immune response against viruses and other pathogens is the recognition of short foreign protein sequences called epitopes. However, viruses can escape the immune system by mutating, so epitopes should accumulate high levels of genetic variability. This has been documented in several human viruses, but in HIV, unexpectedly, epitopes tend to be relatively conserved. Here, we propose that this is a consequence of the peculiar interactions that occur between HIV and the immune system. As with other viruses, recognition of HIV epitopes promotes the activation of cytotoxic and helper T lymphocytes, which then orchestrate a cellular immune response. However, HIV infects helper T lymphocytes as its target cell in the body and does so more efficiently when these cells have been activated to participate in an immune response. Mathematical modeling showed that, in some cases, HIV may take advantage of immune activation, thus favoring epitope conservation. This should be more likely to occur with epitopes that trigger more vigorous T-cell responses, and during the process known as "trans-infection," in which helper T lymphocytes are infected while being activated. Our results highlight the potential advantages of an HIV vaccination strategy based on epitopes that stimulate cytotoxic T lymphocytes without specifically stimulating helper T lymphocytes.
To further characterize T-cell epitope conservation in HIV, we used a dataset from a high-throughput study in which T-cell responses were determined for a large number of individuals infected with HIV-1 subtype C using the IFN-γ enzyme-linked immunospot assay [18]. Thus, epitopes were empirically verified for each patient. These assays involved a battery of synthetic peptides evenly distributed throughout the viral genome, thus eliminating potential problems of region oversampling. Furthermore, the full genome sequence of the infecting virus was available for 113 patients, allowing us to identify every mismatch between the assay peptides and the viral sequence and, thus, to systematically discard epitope detection bias. Among these 113 patients, peptides showing at least one positive immune response were less variable (H = 0.165±0.014) than nonimmunogenic peptides (H = 0.210±0.008; one-way ANOVA: p = 0.005), thus confirming epitope conservation. This difference was significant for Gag p24 (one-way ANOVA: p = 0.001) and Nef (one-way ANOVA: p<0.001), whereas it was nonsignificant for Gag p17, Pol, and Env (Figure 2). Qualitatively equivalent results were obtained using the number of amino acid substitutions per codon (dN) instead of entropy, whereas we found no association between immunogenicity and the number of synonymous substitutions (dS) (Table 1). The latter lack of association further shows that epitope conservation is unlikely to stem from selective pressures acting on RNA structure or from 5′→3′ conservation gradients. Consistently, nonimmunogenic peptides were richer in positively selected codons (dN/dS > 1) than immunogenic peptides in both Gag p24 and Nef. Finally, we note that these results are probably more reliable for Gag p24 than for Nef since they are based on a larger number of assays (1,485 versus 357).
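One way to operationalize the mismatch check described above is sketched here. The text does not state how mismatched assays were handled, so excluding them outright is an assumption of this sketch; the function names, the dictionary layout of an assay, and the toy sequences are likewise hypothetical.

```python
def peptide_mismatches(peptide, viral_protein, start):
    """Return 0-based peptide positions that differ from the patient's virus.

    `start` is the 1-based position in `viral_protein` where the assay
    peptide aligns (an assumed coordinate convention).
    """
    window = viral_protein[start - 1:start - 1 + len(peptide)]
    if len(window) != len(peptide):
        raise ValueError("peptide extends past the end of the viral sequence")
    return [i for i, (a, b) in enumerate(zip(peptide, window)) if a != b]


def matched_assays(assays, viral_protein):
    """Keep only assays whose peptide matches the autologous sequence exactly.

    Dropping mismatched assays is an assumption of this sketch, chosen so that
    a negative response cannot be blamed on a peptide/virus mismatch.
    """
    return [
        a for a in assays
        if not peptide_mismatches(a["peptide"], viral_protein, a["start"])
    ]


# Toy example: a short made-up protein fragment and two assay peptides.
viral = "MGARASVLSGGELDRWEKI"
assays = [
    {"peptide": "ARASVLSG", "start": 3, "response": True},
    {"peptide": "GGELDKWEKI", "start": 10, "response": False},  # K vs R mismatch
]
kept = matched_assays(assays, viral)
```

Here the second peptide differs from the virus at one residue, so its negative response is uninformative and the assay is set aside before immunogenicity is compared with variability.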

The Immune Activation Model
We sought to develop a model that could account for the following observations made above: T-cell epitopes are less variable than other regions of the viral genome; epitope conservation appears to be determined by intrapatient evolutionary processes; after ruling out possible confounders, the conservation signal is found mainly in highly immunogenic proteins; HCV and other human viruses do not show widespread epitope conservation. Also, since both T-cell escape and epitope conservation have been documented in HIV-1, it becomes important to identify key factors determining which of these outcomes should take place.
We suggest that, since HIV-1 replicates more efficiently in activated T H cells, epitope conservation may provide payoffs to the virus by increasing the pool of virus-susceptible cells. On the other hand, epitope conservation is costly because it triggers an anti-HIV CTL response. To explore how the complex interactions established between HIV-1 and the cellular immune system may favor or select against epitope escape, we built a mathematical model involving T H cells, CTLs, and pAPCs (Figure 3A). We included pAPCs because T H cell activation is mediated by MHC II epitopes, which are only present in pAPCs. Also, pAPCs are an important viral transmission vehicle (see below). In the model, T H cells could be activated by HIV epitopes or other, non-HIV, antigens (e.g., from microbial translocation). We denoted the latter background activation. Dendritic cells are the main type of pAPCs in the context of an HIV infection, but macrophages also fall into this category. CTL activation required recognition of an HIV epitope presented by the MHC I of an infected T H cell or a pAPC, and co-stimulatory cytokines released from active T H cells. Cytokine co-stimulation was not epitope-specific, meaning that it could also come from background-activated T H cells. Background activation of CTLs is not relevant in the context of the model and was thus not considered. Activated CTLs lysed infected T H cells after recognizing an MHC I epitope. However, error-prone replication gave rise to progeny virions carrying escape mutations in their CTL or T H epitopes. We defined CTL escape mutants as those failing to trigger MHC I-mediated CTL activation and cell killing, T H escape mutants as those failing to trigger MHC II-mediated T H cell activation, and T-cell escape mutants as those escaping both types of response.
Critically, HIV-1 replication and viral load are dependent on levels of T H cell activation [38][39][40][41]. In nonactivated cells, the efficiency of reverse transcription is lower than in activated cells because of limited deoxynucleotide availability, low ATP levels hamper nuclear transport of the pre-integration complex, and gene expression is less well supported by key transcription factors such as NF-κB and NFAT [42][43][44]. As a result, the infection cycle can be arrested at the reverse transcription step and the incomplete viral DNA degraded unless the cell undergoes activation within days following viral entry [42,45,46]. T H cells can nevertheless become infected in the process of being activated by pAPCs. One mechanism for this is transinfection, whereby dendritic cells can transmit virions bound to their DC-SIGN or L-SIGN lectins to the T H cells with which they establish MHC II-type immune synapses [48,49]. Another possible mechanism is cis-infection, whereby virions released from infected dendritic cells or macrophages are transmitted to synapsing T H cells. However, pAPCs are not major viral producers [64], and therefore, we neglected viral replication in these cells and cis-infection for simplicity. The contribution of HIV-specific immune activation and transinfection to viral spread and pathogenesis is supported by the observation that HIV-specific T H cells are more readily infected than other subpopulations of active T H cells [65], reducing their life span and compromising the generation of effective anti-HIV responses [66].
Figure 1. Mean ± SEM entropy (H) is shown for sites not mapping to any T-cell epitopes (white) and for those mapping to T H epitopes (blue), CTL epitopes (red), or both (purple). In (A) and (C) amino acid entropy was quantified at the host population level (100 sequences from different patients), whereas in (B) it was quantified at the intrapatient level (average from 100 patients containing ≥10 sequences each). For HIV, only Gag, Pol, Env, and Nef are shown because they contain the vast majority of T-cell epitopes. No significant differences in variability associated with T-cell epitopes were found in other genes. Regions with overlapping reading frames were excluded from the analysis. For HCV, only genes with at least five sites in each category were plotted. Notice that the y-axis is broken to accommodate the extremely variable epitopes in E2. doi:10.1371/journal.pbio.1001523.g001
We also built a "control" model in which the virus infects nonimmune cells (denoted C) instead of T H cells, but which was otherwise identical to the HIV model (Figure 3B). This allowed us to parallel the comparison between HIV and HCV made above (Figure 1). To address whether T H and CTL escape mutants should be capable of outgrowing the wild-type virus given the intrapatient selective forces imposed by the cellular immune response in the HIV and control models, we considered a single immune response-escape cycle. Successive cycles ultimately leading to immune exhaustion and AIDS have been modeled previously and are important for understanding the natural history of the infection and pathogenesis [67][68][69]. Some of the model parameter values could be adjusted based on available empirical evidence. We chose a cell division rate (r) of 0.05 day⁻¹ (i.e., a doubling time of 13.9 days) and a death rate (C) of 0.005 day⁻¹ for resting cells (a half-life of t1/2 = 139 days) and of 0.1 day⁻¹ (t1/2 = 6.9 days) for activated cells [70]. The homeostatic T H cell concentration was C⁰_4[r] = 1,000 cells/mL [71]. We assumed that cellular division was suppressed above the homeostatic value, and this was modeled using a unit step function (denoted H). The death rate constant of infected cells was C_4[i] = 1 day⁻¹ [72][73][74][75][76][77], and the viral production rate of infected cells (a) was chosen such that the burst size was a/C_4[i] = 5,000 particles/cell, a value which falls within the realistic range of 1×10³ to 5×10⁴ particles/cell [78][79][80][81]. The rate constant for CTL-mediated cell killing was k = 0.1 day⁻¹ = C_4[i]/10, such that at equilibrium approximately 10% of virus-induced cell death was due to CTL activity [52]. Published virion clearance constants (C_V) vary widely, from 0.3 day⁻¹ [72,76] to >30 day⁻¹ [82], and we chose an intermediate value of 15 day⁻¹.
The HIV-1 mutation rate is approximately m = 3×10⁻⁵ per nucleotide site per cell infection [83]. To consider a single escape mutant, we set the mutation rate to m = 10⁻⁵. For some parameters, such as the in vivo rate of viral absorption to T H cells (s) and pAPCs (s_D), we did not find empirical data and these were adjusted to produce realistic peak titers, set-point titers, and fractions of infected cells. Thus, wild-type infection rates decreased as the density of escape viruses increased and vice versa, reflecting competition among viruses for cells. The full list of variables and parameters and the systems of ordinary differential equations defining the models are shown in Appendix S1. We started simulations with one infected cell and homeostatic values of resting target cells (T H or C). We also provided a large pool of HIV-susceptible active T H cells for the primary infection to mimic the initial spread of HIV through the mucosa or gut-associated lymphoid tissue (GALT). The model captured the typical HIV infection dynamics, in which viral load increases rapidly until reaching a peak days or weeks after transmission, after which exhaustion of the initial pool of susceptible cells and CTL activation make the viral load drop but fail to eradicate the infection (Figure 4). A dynamic equilibrium or set point was reached in which the virus continued to replicate, the immune system remained activated, and viral loads showed stable values within the typical range of 10³ to 10⁵ viral copies/mL [84] for the parameter values used. The control model produced similar dynamics.
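The rate-to-time conversions quoted for the cell parameters follow from first-order kinetics, where a doubling time or half-life equals ln 2 divided by the rate constant. The short check below reproduces those figures; the variable names are ours and purely illustrative, and only values stated in the text are used.

```python
from math import log


def time_to_factor_two(rate_per_day):
    """Doubling time (growth) or half-life (decay) for a first-order rate."""
    return log(2) / rate_per_day


division_rate = 0.05         # day^-1, cell division rate r
death_resting = 0.005        # day^-1, resting cells
death_activated = 0.1        # day^-1, activated cells
death_infected = 1.0         # day^-1, infected cells
burst_size = 5000            # particles/cell

doubling_days = time_to_factor_two(division_rate)           # about 13.9 days
half_life_resting = time_to_factor_two(death_resting)       # about 139 days
half_life_activated = time_to_factor_two(death_activated)   # about 6.9 days

# Burst size = production rate / infected-cell death rate, so the
# production rate a follows directly from the two stated values.
production_rate = burst_size * death_infected                # particles/cell/day

# CTL killing constant, set to one tenth of the infected-cell death rate.
ctl_killing = death_infected / 10                            # day^-1
```

With an infected-cell death rate of 1 day⁻¹, the stated burst size therefore corresponds to a production rate of 5,000 particles per cell per day.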
CTL (non-T H) escape mutants always became dominant in both the HIV and control models, whereas T H (non-CTL) cell escape mutants were neutral in the control model and neutral or deleterious in the HIV model (not shown). The reason why T H cell escape does not per se provide a fitness advantage is that these mutants can be targeted by CTLs activated by the wild-type virus or other antigens. In addition, in the HIV case, T H escape mutants act as a sink of susceptible cells and are thus dependent on cells activated by the wild-type virus or on background activation for replicating, making them potentially deleterious. Therefore, considering CTLs and T H cells together, full T-cell (T H and CTL) escape may be favored or selected against depending on the balance between the benefits of CTL escape and the potential costs of T H escape. We were able to find parameter values that produced T-cell escape rates similar to those reported in studies of patient serial samples (Figure 4) [50,67]. The timing of escape could also be varied from weeks postinoculation to years. In contrast, other parameter combinations disfavored T-cell escape mutants, and epitopes remained invariant if the infection was initiated with the wild-type, or they reverted to the wild-type if the infection was initiated with escape mutants (Figure 4C, F). In these cases, epitope conservation was promoted. In the control model, by comparison, T-cell escape occurred systematically and the rate of escape was faster than in the HIV case for the same parameter values. This showed that epitope conservation can be explained by the particular nature of the HIV infection, in which T H escape can be costly for the virus.
Figure 3. (A) HIV model. T H cells, CTLs, and pAPCs divide at rates r_4, r_8, and r_D and die at rates C_4[r], C_8[r], and C_D[r], reaching homeostatic concentrations C⁰_4[r], C⁰_8[r], and C⁰_D[r], respectively. Activated T H cells become infected through contact with free virions at a rate constant s, whereas resting cells are assumed to be non-susceptible to the virus. T H cells are activated at a rate constant a_4 after contacting a pAPC with an HIV epitope or by other antigens at rate constant b (background activation). T H cells establishing synapses with pAPCs have a probability d of being concomitantly infected with the same viral type. Infected cells release virions at a rate constant a and die at a rate constant C_4[i]. CTL pre-activation occurs after contacting infected cells (a_8) or pAPCs (a_8D). A co-stimulatory signal from activated T H cells is necessary for completing CTL activation (a_8′). Infected cells are lysed by CTLs at a rate constant k. Death rate constants for activated T H cells (C_4[a]), CTLs (C_8[a]), and pAPCs (C_D[a]) and virion inactivation rates (C_V) are not shown for simplicity. The full list of variables and parameters is available in Appendix S1, which also provides references to empirical work justifying the parameter values used (see also main text). A fraction m of the virions released in each cell infection become escape mutants. Avoidance of CTL activation or CTL-mediated killing leads to CTL escape (red bars), whereas avoidance of T H cell activation leads to T H escape (blue bars). The model allows full T-cell (purple), T H -only (blue), and CTL-only (red) escape mutants. (B) Control model in which the virus targets a nonimmune cell type C (e.g., hepatocytes, epithelial cells, etc.) instead of T H cells. Two key differences with the HIV model are that viral replication is not dependent upon immune activation and that transinfection does not take place. Variables, parameters, and equations for this model are also shown in Appendix S1. doi:10.1371/journal.pbio.1001523.g003
A central goal of our HIV model was to identify factors determining whether T-cell escape or epitope conservation should take place. We found that transinfection probability and immune activation levels were two such factors (Figure 5). Since transinfection implies a temporal and spatial association between T H cell activation and infection, there should be some correlation between the type of epitope (wild-type or escape mutant) presented by a pAPC and the type of virion transmitted to synapsing T H cells. We denoted this correlation d. In the absence of transinfection or if every pAPC contained equal amounts of wild-type and T-cell escape virions, then d = 0, and therefore, T H cells activated by the wild-type virus would be fully accessible to T-cell escape viruses. As a result, T H escape should not be detrimental to the virus, and considering the benefit of evading CTLs, the net effect of T-cell escape should be positive. In the control model, since there was no possible transinfection, T-cell escape was always advantageous. In the HIV model, in contrast, as d increased, T-cell escape mutants had less and less access to T H cells activated by the wild-type virus, and since they could not produce their own pools of activated T H cells, these mutants had a selective fitness disadvantage and failed to outgrow the wild-type virus. The magnitude of this disadvantage depended inversely on levels of background activation, because the latter is a source of susceptible cells equally accessible to the wild-type and the escape mutant. Also, for d > 0, the outcome depended on the strength of epitope-specific immune activation relative to background activation. If the epitope failed to produce T-cell activation (anergy), there was obviously no advantage associated with escape.
Simulations also showed, however, that if the epitope triggered a strong T-cell activation (immunodominance), the escape mutant was disfavored too because the pool of activated T H cells to which the escape mutant had limited access represented a large portion of total susceptible cells. Therefore, the model suggests that, in HIV, immune escape should preferentially take place among epitopes triggering weak to moderate T-cell responses.
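The qualitative argument above can be illustrated with a back-of-the-envelope invasion condition. The sketch below is not the model used in this study (the full equations are in Appendix S1). It assumes, for illustration only, that both viral types infect accessible activated T H cells at the same rate constant s, that a fraction `specific_fraction` of the activated pool was activated by the wild-type epitope, that the escape mutant is excluded from a fraction d of those cells, and that only wild-type-infected cells pay the CTL killing rate k. Under these assumptions the per-day growth-rate difference (escape minus wild type) is k − s·d·f·T, where T is the total activated pool and f the HIV-specific fraction. The function name and all numerical values are hypothetical.

```python
def escape_selection_coefficient(s, total_activated, specific_fraction, d, k):
    """Growth-rate difference (escape minus wild type) in a toy model.

    Wild type:  s * T - (death + k)
    Escape:     s * T * (1 - d * f) - death
    Difference: k - s * d * f * T   (the shared death rate cancels)
    """
    cells_lost_to_escape = d * specific_fraction * total_activated
    return k - s * cells_lost_to_escape


# Illustrative values only (not fitted to data).
s = 1e-3          # infection rate constant per cell/mL per day (assumed)
T = 500.0         # total activated T H cells per mL (assumed)
k = 0.1           # CTL killing rate constant, day^-1 (value from the text)

no_transinfection = escape_selection_coefficient(s, T, 0.5, d=0.0, k=k)
strong_epitope = escape_selection_coefficient(s, T, 0.5, d=1.0, k=k)
weak_epitope = escape_selection_coefficient(s, T, 0.1, d=1.0, k=k)
```

In this toy setting escape is favored when d = 0, disfavored when d is high and the epitope-specific share of activated cells is large, and favored again when background activation dominates. The sketch ignores, among other things, that weakly immunogenic epitopes also elicit weaker CTL killing, so it does not reproduce the anergy limit discussed above.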

Conclusions
If immune avoidance is beneficial for a virus, escape mutants should tend to accumulate throughout the course of the infection unless they incur fitness costs (i.e., defects in other steps of the infection cycle not related to immune evasion) exceeding the benefits of escape. According to this, fast mutating viruses eliciting strong T-cell responses and establishing chronic infections should show the highest frequencies and rates of escape. Upon transmission to new hosts with different HLA types, however, the selective advantage of these mutants disappears and the virus should tend to revert to the wild-type if the escape mutation has some fitness cost, as has been amply documented in HIV [85][86][87][88][89][90][91]. This indeed constitutes a particular instance of antagonistic pleiotropy, a frequent process among RNA viruses whereby selectively advantageous mutations in one environment become deleterious in other environments [92]. As a result of this alternating selective regime, viral sequence variability at the population level should be promoted and epitopes should tend to be more variable than other genome regions. Previous work and our sequence data analysis ( Figure 1) show that HCV fits well into this pattern, whereas HIV-1 does not. The immune activation model provides a possible explanation for the unexpected epitope conservation in HIV-1. The model predicts that T-cell escape can be selected against depending on factors such as transinfection probability, immune activation levels, or epitope strength. Although CTL escape should per se be advantageous for the virus, CTL epitopes may be conserved if they co-map with T H epitopes. As shown here and in previous work [53], CTL/T H epitope co-mapping occurs more often than expected by chance, probably because these two types of epitopes share common cellular pathways. Depending on the above factors, thus, T-cell escape may occur only in some genome regions or only in certain individuals. 
In this sense, the model contributes to resolving the apparent paradox between T-cell epitope conservation and the large body of evidence showing that T-cell immunity is an important selective factor promoting HIV variability. These disparate findings are unlikely to result from use of different datasets or methodologies. For instance, in one study, it was found that there was a general positive association between specific HLA types and the occurrence of escape mutations, but that this association was negative in some cases (i.e., epitopes were significantly more conserved among patients with the relevant HLA type than among those with nonmatching HLAs) [93]. It is also noteworthy that some escape mutations are rapidly favored, whereas others become dominant only after years of intrahost replication [52,94]. This is often interpreted in terms of the fitness costs of escape mutations [51,67,[88][89][90]. However, the immune activation model can also account for variable rates of escape even in the absence of fitness costs. Indeed, our model did not assume any fitness costs for escape mutations. Such costs may explain why some escape mutations increase in frequency more slowly than others, fail to be selected, or do not spread in the host population [26]. However, they cannot explain why genome regions containing T-cell epitopes tend to be more conserved than those not containing epitopes, since costs should equally apply to both.
The combination of parameters for which we predicted T-cell epitope conservation should be more likely in the lymph nodes at the chronic stage of the disease, during which HIV-specific activation of T H cells contributes to sustaining the infection and pAPC-mediated coupled activation-infection of T H cells should be frequent [47][48][49]. In contrast, the GALT and other mucosa contain large pools of background-activated and recent memory T H cells which are exploited by HIV during primary infection, thus making the virus less dependent on its own ability to activate T H cells [95][96][97]. It has been previously shown that T-cell escape rates are higher during primary infection than in the chronic stage of the disease [50][51][52]. Again, this is often interpreted in terms of fitness costs, since escape mutations paying weak fitness costs should be selected faster and be detected at earlier disease stages than those paying strong costs. Another interpretation is that epitopes triggering more vigorous T-cell responses tend to experience faster escape due to the stronger selective pressure exerted. Our model offers yet another possible interpretation: T-cell escape may actually be slowed down whenever HIV depends on its own ability to activate T H cells for replicating, and this is more likely to occur during the chronic stage than during primary infection. If the timing of escape was determined by fitness costs, then late escapes should tend to revert faster than early escapes upon transmission to new individuals with different HLA types because of their greater deleteriousness, whereas if the timing of escape was determined by immune activation levels, the reverse should be true. These predictions offer a way of testing the above alternative explanations for why rates of escape differ during the primary and chronic stages of the disease.
Sequence analysis revealed that, after accounting for dataset biases, epitope conservation occurred mainly in Gag p24 and Nef (Figure 2). As expected from the immune activation hypothesis, these proteins contain several immunodominant epitopes [20,98], whereas they are not necessarily more likely to exhibit fitness costs than other HIV genes. Finally, our model suggests that vaccines based on conserved T H epitopes might be counterproductive. By creating a pool of HIV-responsive T H cells, they may pave the way for viral replication in certain body compartments. This might contribute to understanding the unexpected results of the STEP vaccine trial, in which the vaccinated group was found to be at higher risk of infection than the placebo-treated control group, although there are many other possible explanations [99,100]. According to our model, an efficient approach to HIV vaccination may be to use CTL epitopes that do not stimulate T H cells. These epitopes may be combined with non-HIV T H epitopes that would co-stimulate CTLs without providing the immune system with a pool of HIV-specific memory T H cells. The idea that immune activation can favor pathogen replication and that, consequently, vaccines based on conserved epitopes may be counterproductive has also been proposed for Mycobacterium tuberculosis [101], although the mechanisms at play have not been elucidated in this case and may potentially differ from those of HIV. Interestingly, tuberculosis-specific T H cells are also preferentially depleted in HIV-infected individuals, whereas this is not observed in cytomegalovirus, another opportunistic pathogen [102,103]. It is possible that, by triggering immune activation, M. tuberculosis may benefit from HIV-mediated depletion of T H cells and subsequent immune impairment.

Sequence Analysis
A BLAST of the entire reference subtype B sequence (HXB2) was performed using the HIV Sequence Database search tool (www.hiv.lanl.gov/components/sequence/HIV/search/search.html), restricting the search to one sequence per patient. For each of Gag, Pol, Env, Vif, Vpu, Vpr, and Nef, 100 translated sequences were aligned using the MUSCLE algorithm implemented in MEGA v5 (megasoftware.net), with HXB2 as the reference sequence. Sequences with premature stop codons or partial readings were removed. Protein Shannon entropy was calculated for each site of the alignment as $S = -\sum p \ln(p)$, where $p$ denotes the frequency of each of the amino acids present at this site. Gaps were treated as another amino acid. These calculations were carried out using the Entropy-one tool of the HIV Sequence Database with default options. To estimate synonymous (d S ) and nonsynonymous (d N ) substitution rates from nucleotide sequences, we first used the Datamonkey server (www.datamonkey.org) to select the best substitution model and to identify significant recombination breakpoints using the GARD algorithm [104], using default parameters except for the inferred substitution model. Using this output, we ran the SLAC algorithm [105] implemented in the HYPHY package [106] to identify codons under significantly positive or negative selection at a 0.1 probability threshold, and to estimate d N and d S . For the intrapatient entropy analysis, HIV-1 subtype B sequences were downloaded for each of Gag, Pol, Env, and Nef from the HIV Sequence Database using the intrapatient search tool and restricting the search to patients with at least 10 sequences available and with known time since infection/seroconversion or Fiebig stage. Entropy values were obtained as above and, for each amino acid site, the within-host entropy was averaged over 100 patients. For HCV, 100 full-length subtype 1a genomes were downloaded from GenBank and translated to polyprotein sequences. Alignments and subsequent analyses were done as above, using the H77 sequence as reference.
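The per-site entropy calculation described above can be illustrated with a minimal sketch. This is not the Entropy-one tool itself; the function names and the toy alignment are assumptions made for illustration, and only the formula (natural-log Shannon entropy, gaps counted as an additional residue) follows the text.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (natural log) of one alignment column.

    Gaps ('-') are counted as just another residue, as in the text.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def alignment_entropy(sequences):
    """Per-site entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) iterates over alignment columns
    return [site_entropy(column) for column in zip(*sequences)]

# Hypothetical toy alignment (four sequences, four sites), not real data
toy_alignment = ["MGA-", "MGAR", "MGSR", "MKSR"]
entropies = alignment_entropy(toy_alignment)
```

In this toy case the first site is invariant and has zero entropy, whereas the third site, split evenly between two residues, has entropy ln 2.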

Epitope Mapping and Analysis
HIV CTL and T H epitopes were downloaded from the HIV Molecular Immunology Database (www.hiv.lanl.gov/content/immunology/tables/tables.html). For CTL epitopes, we used the full set of 741 entries or a curated list of 88 best defined epitopes ("A list"), whereas for T H epitopes we used the 132 available entries (no curated list has been defined for this group). Each epitope was aligned to the HXB2 sequence, and HXB2 genome sites were classified based on whether they mapped to at least one epitope. HCV subtype 1a epitopes were downloaded from the IEDB (www.immuneepitope.org) selecting the specific epitope type (MHC I for CTL epitopes and MHC II for T H epitopes), peptides from proteins as structure type, and Homo sapiens as host organism. This yielded 101 CTL and 27 T H epitopes. Genome-wide differences in amino acid entropy associated with the presence of T-cell epitopes were tested using a fixed-factor nested ANOVA (presence/absence of T-cell epitope nested within gene). For HIV, the contribution of RNA structure to the observed variability was tested using nucleotide instead of amino acid sites and adding this factor to the above ANOVA design (paired/nonpaired site according to published structure, nested within gene). The effect of 5′→3′ gradients was tested by including genome position as a covariate in the model. Separate effects of CTL and T H epitopes were tested using two-way ANOVAs.

High-Throughput Dataset Analysis
We used data from a study in which 396 synthetic peptides were tested for immunogenicity in an HIV-1 subtype C cohort from South Africa [18]. The dataset is freely available at www.hiv.lanl.gov/content/immunology/hlatem/study3/index.html. We restricted the analysis to 113 individuals for which the full-length genome of the infecting virus was available. Of the total 44,748 assays considered (113 patients × 396 peptides), the peptide matched exactly the amino acid sequence of the virus only in 13,127 cases (29.3%). The straightforward correction for epitope detection bias would be to restrict the analysis to this subset. However, 118 of the total 179 positive T-cell reactions (65.9%) corresponded to nonmatching peptides. These positives may be due to cross-reactivity, but given that IFN responses can persist for years [107], they could also correspond to cases of immune escape. Therefore, to avoid missing a significant fraction of immune-driven viral variability, we also included these nonmatching T-cell-positive peptides as valid assays. If anything, this should produce an artificially positive association between variability and immunogenicity. We then classified peptides in two categories according to whether or not they produced at least one positive T-cell reaction and tested for differences between these two groups using a one-way ANOVA in which the number of valid assays per peptide was included as a covariate in the model. Amino acid entropy values for each group correspond to marginal means estimated from the ANOVA model.
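The assay filtering rule described above can be sketched in a few lines. The record layout (peptide identifier, exact-match flag, T-cell-positive flag), the function name, and the toy records are assumptions for illustration; only the rule itself (an assay is valid if the peptide matches the virus or the reaction is positive) and the reported counts come from the text.

```python
from collections import defaultdict

def summarize_peptides(assays):
    """Count valid assays per peptide and flag immunogenic peptides.

    assays: iterable of (peptide_id, matches_virus, tcell_positive).
    An assay is kept if the peptide matches the autologous virus exactly
    or if it gave a positive T-cell reaction.
    """
    summary = defaultdict(lambda: {"valid_assays": 0, "immunogenic": False})
    for peptide_id, matches_virus, tcell_positive in assays:
        if matches_virus or tcell_positive:
            entry = summary[peptide_id]
            entry["valid_assays"] += 1
            entry["immunogenic"] = entry["immunogenic"] or tcell_positive
    return dict(summary)

# Hypothetical toy records, not taken from the dataset
toy_assays = [
    ("pep1", True, False),
    ("pep1", False, True),   # nonmatching but positive: kept
    ("pep1", False, False),  # nonmatching and negative: dropped
    ("pep2", True, False),
    ("pep3", False, False),
]
toy_summary = summarize_peptides(toy_assays)

# Consistency check on the counts reported in the text
total_assays = 113 * 396
pct_matching = round(100 * 13127 / total_assays, 1)
pct_nonmatching_positives = round(100 * 118 / 179, 1)
```

The per-peptide number of valid assays produced this way corresponds to the covariate included in the one-way ANOVA.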

Immune Activation Model
We developed a system of ordinary differential equations describing how T H cell, CTL, and pAPC counts, and viral loads vary with time, as described in the text and Figure 3A. We also built a control model including a nonimmune viral target cell type C (Figure 3B). The full list of variables and parameters, a detailed description of the model, and the systems of ordinary differential equations are available in Appendix S1. Simulations were performed in Mathematica 8 (Wolfram Research). SBML files describing the model have been deposited in the BioModels Database (MODEL1302180001 for the immune activation model and MODEL1302180002 for the control model).

Supporting Information
Appendix S1 Model variables, parameters, and systems of differential equations describing the dynamics of wild-type and T-cell (T H /CTL) escape viruses for the HIV and control models. (PDF)