Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing

We used ultra-deep sequencing to obtain tens of thousands of HIV-1 sequences from regions targeted by CD8+ T lymphocytes, using longitudinal samples from three acutely infected subjects, and modeled viral evolution during the critical first weeks of infection. Previous studies suggested that a single virus established productive infection, but these conclusions were tempered by limited sampling; by modeling the diversity observed in the earliest samples from vastly more extensive sampling, we have now greatly increased our confidence in this observation. Conventional sequencing of HIV-1 from acute/early infection has shown different patterns of escape at different epitopes; here we investigated the earliest escapes in exquisite detail. Over 3–6 weeks, ultra-deep sequencing revealed that the virus explored an extraordinary array of potential escape routes while evading the earliest CD8 T-lymphocyte responses: using 454 sequencing, we identified over 50 variant forms of each targeted epitope during early immune escape, whereas conventional sequencing of the same samples detected only 2–7 variants. In contrast to the diversity seen within epitopes, non-epitope regions, including the Envelope V3 region, which was sequenced as a control in each subject, displayed very low levels of variation. In early infection, in the regions sequenced, the consensus forms did not have a fitness advantage large enough to trigger reversion to consensus amino acids in the absence of immune pressure. In one subject, a genetic bottleneck was observed: extensive diversity at the second time point narrowed to two dominant escape forms by the third time point, all within two months of infection.
Traces of immune escape were observed in the earliest samples, suggesting that immune pressure is present and effective earlier than previously reported; quantifying the loss rate of the founder virus suggests a direct role for CD8 T-lymphocyte responses in viral containment after peak viremia. Dramatic shifts in the frequencies of epitope variants during the first weeks of infection revealed a complex interplay between viral fitness and immune escape.


Section I. Basic supporting figures and tables
Integration of previously reported and newly available basic clinical data regarding SUMA, WEAU, and CH40 with the 454 sampling timeline.
Table S2. Conventional sequencing variants and previously available immunological data regarding escape.
Table S3. Aligned amino-acid sequences of the epitope regions with variant frequencies, organized by subtype, escape form, and time point.
Figure S1. Annotated example of format for Table S3.
Section II. Immune Escape Dynamics
Table S6. Estimates of accumulation rates of dominant viral variants.
Section III. Subtype B consensus: reversion and escape

Section I Basic supporting figures and tables
These tables (discussed in the main text) provide a comprehensive summary of immunological data on these patients. New data generated in the course of this study were compiled and integrated with data from previous publications [1][2][3][4][5].

Section II Immune Escape Dynamics
We applied previously developed methods to quantify rates of viral escape from the CTL response [6][7][8]. Given our estimates of the rate of accumulation of the dominant viral escape variants, we also estimated the time at which selection to avoid the CTL response began (see below for details and assumptions).
The day on which the initial frequency of the selected variant was predicted to be $5 \times 10^{-5}$ (i.e., between $10^{-5}$ and $10^{-4}$) was used as the estimate of the day (range of days) on which selection was initiated.
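The back-extrapolation used to date the start of selection can be sketched as follows. This is a minimal sketch that assumes the logistic escape model of this section, in which the ratio z = f/(1 − f) of the escape variant to all other variants grows as z0·e^{εt} under constant selection. The function names, the observed frequency, the observation day, and ε = 0.5 per day are all hypothetical values chosen for illustration, not estimates from this study. The point estimate is the day on which the predicted frequency equals 5 × 10⁻⁵, and the days of the 10⁻⁵ and 10⁻⁴ crossings give the range.

```python
import math


def logistic_frequency(t, f0, eps):
    """Logistic escape model: variant frequency at time t, given f0 at t = 0."""
    return f0 / (f0 + (1.0 - f0) * math.exp(-eps * t))


def odds(f):
    """Ratio of escape-variant frequency to all other variants, z = f / (1 - f)."""
    return f / (1.0 - f)


def selection_start_day(t_obs, f_obs, eps, f_start=5e-5):
    """Day on which the variant's predicted frequency was f_start.

    Assumes constant selection, so z(t) = z_obs * exp(eps * (t - t_obs)).
    """
    if eps <= 0:
        raise ValueError("back-extrapolation needs a positive accumulation rate")
    if not (0.0 < f_start < 1.0 and 0.0 < f_obs < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return t_obs - math.log(odds(f_obs) / odds(f_start)) / eps


# Hypothetical observation: escape variant at 50% on day 20, eps = 0.5 per day.
t_obs, f_obs, eps = 20.0, 0.5, 0.5
point = selection_start_day(t_obs, f_obs, eps, 5e-5)
early = selection_start_day(t_obs, f_obs, eps, 1e-5)  # lower end of the range
late = selection_start_day(t_obs, f_obs, eps, 1e-4)   # upper end of the range
print(f"selection start ~ day {point:.1f} (range {early:.1f} to {late:.1f})")
```

Because the fit is on the log-odds scale, a tenfold change in the assumed starting frequency shifts the estimated start by only ln(10)/ε days, which is why the 10⁻⁵ to 10⁻⁴ bracket yields a fairly narrow range of days.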
By this measure, we obtained the following estimates for the start of selection (in days relative to day 0 in our data): WEAU Env AY9, 4.9 (1.3 to 6.5; estimated escape rate $\varepsilon = 0.44\ \text{day}^{-1}$; doubling time

Table S6 shows the accumulation or loss rates of various escape forms. To calculate the 95% confidence intervals (CIs) for the estimated rates of accumulation of different escape variants, we used a bootstrap approach to regenerate escape data [9]. The model for the dynamics of escape of a virus from a single CTL response has been described in detail previously [6][7][8]. Under several assumptions, the change in the frequency of the escape variant, $f(t)$, is given by

$$f(t) = \frac{f_0}{f_0 + (1 - f_0)\,e^{-\varepsilon t}}, \qquad (1)$$

where $f_0$ is the frequency of the escape mutant in the population at some time that we arbitrarily call $t = 0$, and $\varepsilon$ is the rate of accumulation of the escape variant in the population. From Equation 1 it follows that the ratio of a given escape variant to all other variants in the population,

$$z(t) = \frac{f(t)}{1 - f(t)}, \qquad (2)$$

changes exponentially over time:

$$z(t) = z_0\,e^{\varepsilon t}, \qquad (3)$$

where $z(t)$ and $z_0$ are the ratio of the frequency of the escape mutant to the frequency of the wild-type virus in the population at time $t$ and at time $t = 0$, respectively.
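Estimating ε from Equation 3 and attaching a bootstrap CI can be sketched as below. This is a minimal sketch under stated assumptions, not necessarily the procedure of [6][7][8] or [9]: ε is fit by ordinary least squares of ln z(t) on t, samples in which the variant is absent or fixed are skipped, and the CI comes from a parametric bootstrap that redraws each sample's escape count as Binomial(n, observed frequency). Doubling time is taken here to mean ln 2/ε, the time for z to double. The function names and read counts are hypothetical.

```python
import math
import random


def fit_escape_rate(days, escape_counts, totals):
    """Fit Equation 3, ln z(t) = ln z0 + eps * t, by ordinary least squares.

    escape_counts[i] of totals[i] sequences carry the escape variant on days[i].
    Samples where z is undefined (variant absent or fixed) are skipped.
    Returns (eps, doubling_time), with doubling_time = ln 2 / eps.
    """
    pts = []
    for t, k, n in zip(days, escape_counts, totals):
        if 0 < k < n:
            f = k / n
            pts.append((t, math.log(f / (1.0 - f))))  # ln z with z = f / (1 - f)
    if len({t for t, _ in pts}) < 2:
        raise ValueError("need at least two distinct informative time points")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    eps = sxy / sxx
    doubling = math.log(2) / eps if eps > 0 else math.inf
    return eps, doubling


def bootstrap_ci(days, escape_counts, totals, n_boot=1000, seed=1):
    """Percentile 95% CI for eps from a parametric (binomial) bootstrap."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Redraw each sample's escape count as Binomial(n, observed frequency).
        resampled = [
            sum(rng.random() < k / n for _ in range(n))
            for k, n in zip(escape_counts, totals)
        ]
        try:
            estimates.append(fit_escape_rate(days, resampled, totals)[0])
        except ValueError:
            continue  # too few informative time points in this replicate
    if not estimates:
        raise ValueError("no bootstrap replicate could be fitted")
    estimates.sort()
    lo = estimates[int(0.025 * (len(estimates) - 1))]
    hi = estimates[int(0.975 * (len(estimates) - 1))]
    return lo, hi


# Hypothetical read counts (not data from this study): 1000 reads per sample.
days = [0, 7, 14]
escape = [5, 120, 900]
totals = [1000, 1000, 1000]
eps, doubling = fit_escape_rate(days, escape, totals)
lo, hi = bootstrap_ci(days, escape, totals, n_boot=200)
print(f"eps = {eps:.2f}/day (95% CI {lo:.2f} to {hi:.2f}); doubling {doubling:.1f} days")
```

Fitting on the log-odds scale turns the logistic curve of Equation 1 into the straight line of Equation 3, so a linear fit is enough. Fixing the seed makes the bootstrap interval reproducible.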
Note that this model assumes that selection is constant during the whole period of selection.

Section III Subtype B consensus: reversion and escape

As discussed in the main text, there were a number of positions where the transmitted virus did not match the subtype B consensus in these three subjects. These are indicated in the alignment in Table S3. Fig. 8 summarizes the B consensus amino acid frequencies in each patient at each time point in the epitope regions, and Table S7 provides a more explicit breakdown of the data in the epitope and V3 regions.
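The per-position consensus frequencies summarized in Fig. 8 and Table S7 amount to counting, at each aligned position, the fraction of reads carrying the B consensus residue. The sketch below is a minimal illustration that assumes reads are already aligned to the consensus and collapsed into unique sequences with counts. The function name, the toy consensus string, and the counts are hypothetical and do not come from this study.

```python
def consensus_match_frequencies(variants, consensus):
    """Per-position fraction of reads that carry the consensus residue.

    variants: dict mapping an aligned amino-acid string (same length as
    `consensus`) to the number of reads carrying it.
    """
    total = sum(variants.values())
    if total == 0:
        raise ValueError("no sequences supplied")
    matches = [0] * len(consensus)
    for seq, count in variants.items():
        if len(seq) != len(consensus):
            raise ValueError(f"{seq!r} is not aligned to the consensus")
        for i, (aa, cons_aa) in enumerate(zip(seq, consensus)):
            if aa == cons_aa:
                matches[i] += count
    return [m / total for m in matches]


# Hypothetical toy epitope and read counts (not from this study).
consensus = "ACDEFGHIK"
reads = {"ACDEFGHIK": 60, "ACDEFGHIR": 30, "TCDEFGHIR": 10}
for pos, freq in enumerate(consensus_match_frequencies(reads, consensus), start=1):
    print(f"position {pos} ({consensus[pos - 1]}): {100 * freq:.0f}% consensus")
```

Repeating this for each subject and time point gives the kind of per-position trajectories used to judge whether consensus residues are rising (reversion) or falling (escape).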
Within epitopes (shown in red), there was selection for the B consensus in 6/8 positions. Of these 6 B consensus substitutions, 4 diminished in frequency over time. There was no selection for B consensus amino acids outside of the epitopes (blue), hence no evidence for rapid reversion on this time scale. In contrast, RIER, a chronically infected subject, often carried common B consensus variants (Fig. 8, in green, and Table S7 A and B): at 3 positions, over 30% of the sequences matched the consensus, and at 3 others the consensus was found at low but clearly replicating, circulating levels (1–15%). This left only one position with undetectable levels of B consensus amino acids (green). In 4 chronic-infection patients in our earlier 454 study [10], consensus amino acid frequencies similar to those in RIER were present (see Table S8). Thus, during chronic HIV-1 infection, the B consensus amino acids are generally present and replicating even when they are not the most common form in an individual.

Tables related to experimental methods

Table S9 presents the inner PCR primers used for specific fragment amplification following half-genome RT-PCR, with multiplex sequence tags. Figure S5 summarizes the amplification protocol, which was designed to maintain diversity in the amplified DNA pools used as sequencing template.

Table S3, which includes aligned amino-acid sequences of the epitope regions with variant frequencies, organized by subtype, escape form, and time point. The subject ID and the number of variants with a given protein sequence are shown. The epitope is in bold, and the array of secondary mutations found in conjunction with the N to K substitution is shown; the secondary mutations in the dominant escape forms are consistent with a Poisson distribution, with the exception of the overlapping epitope region in SUMA Tat (Table S5). Table S3 includes complete data for all 4 epitope regions.

Figure S4.
Distributions of accumulation rates of viral variants generated during acute infection. When the frequency of a variant was below the level of detection (i.e., less than 1 sequence per sample), we added 1/N, where N is the number of sequences in that sample, to the variant frequency at that time point. Equation 3 was used to estimate the rate of escape ε for every viral variant. The distribution of escape rates is very wide: some variants have negative rates (i.e., they decline in frequency), and a very few have extremely rapid escape rates. A) WEAU Env AY9 epitope; B) CH40 Nef SR9 epitope; C) SUMA Rev QL9 epitope; D) SUMA Tat FY16 multi-epitope region.

Figure S5. Amplification protocol. The protocol was designed to reduce loss of diversity during PCR amplification by (1) limiting the number of cycles, (2) using large amounts of template, and (3) using multiple small amplification reactions that were pooled for sequencing.
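The per-variant calculation described in the Figure S4 caption can be sketched as follows. This is a minimal sketch assuming that ε for each variant is computed from Equation 3 between two samples. An undetected variant gets frequency 1/N, as the caption states. Clamping a fixed variant (no competitors detected) to 1 − 1/N is an added assumption that keeps z = f/(1 − f) finite. The function name and read counts are hypothetical.

```python
import math


def variant_rates(counts_t1, counts_t2, t1, t2):
    """Per-variant accumulation rate between two samples via Equation 3.

    counts_t1, counts_t2: dict variant -> read count in each sample.
    Frequencies are clamped to [1/N, 1 - 1/N]. The lower bound is the 1/N
    pseudocount for undetected variants; the upper bound is an added
    assumption for variants with no detected competitors.
    """
    n1, n2 = sum(counts_t1.values()), sum(counts_t2.values())
    if n1 < 2 or n2 < 2 or t1 == t2:
        raise ValueError("need at least 2 reads per sample and distinct days")
    rates = {}
    for v in set(counts_t1) | set(counts_t2):
        f1 = min(max(counts_t1.get(v, 0) / n1, 1 / n1), 1 - 1 / n1)
        f2 = min(max(counts_t2.get(v, 0) / n2, 1 / n2), 1 - 1 / n2)
        z1, z2 = f1 / (1 - f1), f2 / (1 - f2)
        # Solve z(t2) = z(t1) * exp(eps * (t2 - t1)) for eps.
        rates[v] = math.log(z2 / z1) / (t2 - t1)
    return rates


# Hypothetical read counts at two sampling days (not from this study).
day0 = {"WT": 990, "E1": 10}
day10 = {"WT": 500, "E1": 400, "E2": 100}
for variant, rate in sorted(variant_rates(day0, day10, 0, 10).items(), key=lambda kv: kv[1]):
    print(f"{variant}: {rate:+.3f} per day")
```

In this toy example the founder-like variant gets a negative rate while newly detected variants get positive ones. That mirrors the spread of signs described in the caption, and the pseudocount keeps variants first seen at the later time point from producing infinite rates.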