Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection
(A) PCR amplification strategy using four ∼3.2 kb amplicons spanning gag through nef of the HIV-1 genome. Amplicons were then pooled, sheared, barcoded by patient or time point, and batched for library construction and single-molecule 454 pyrosequencing. (B) AssembleViral454 v1.0 outperforms other algorithms in its ability to assemble de novo continuous consensus contigs that span the complete target region. Results are shown for 67 acute, chronic, or controller patient samples that had successful amplification of all four amplicons and at least 10-fold sequence coverage (sequencing reads per site) across >70% of the target genome. Black lines denote the mean score for each assembler, red lines the median, red box ends the 25th and 75th percentiles, and red whiskers the upper quartile plus and the lower quartile minus 1.5 times the interquartile range, respectively. (C) ReadClean454 v1.0 corrects read alignment errors arising from various sequence error modes and significantly reduces the process error rate. Results shown are for virus from two infectious clones, NL43 (WT) and NL43 (RKLM), the latter containing two point mutations in Gag, sequenced independently to 417- and 189-fold average coverage, respectively. Errors are defined as base calls or InDels that differ from the assembled consensus at a given position, and the read error rate is the total number of errors divided by the total number of NQS-passing bases interrogated. The percentage of reads on which a correction was made at each step is shown in parentheses. A final average process error rate of 0.5×10⁻⁴ was achieved based on both infectious clones. (D) V-Phaser v1.0 uses phasing information to identify a variant pair found in 1.0% of the reads covering both loci when there are 200 such reads; without phasing, a three-fold increase in coverage is required to achieve the same 1.0% detection threshold. A variant at a frequency of 0.1% can be detected when phased coverage is 2999-fold.
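The two quantitative definitions in this legend, the sample inclusion criterion in (B) and the read error rate in (C), can be illustrated with a minimal Python sketch. It is not taken from AssembleViral454 or ReadClean454; the function names `passes_coverage_filter` and `read_error_rate`, and the representation of coverage as a per-site list of depths, are assumptions made here for illustration only.

```python
from typing import Sequence


def passes_coverage_filter(
    depths: Sequence[int],
    min_depth: int = 10,         # "at least 10-fold sequence coverage"
    min_fraction: float = 0.70,  # "across >70% of the target genome"
) -> bool:
    """Return True if more than min_fraction of sites have depth >= min_depth.

    `depths` is a hypothetical per-site list of read counts over the
    target region; the legend does not specify a data format.
    """
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) > min_fraction


def read_error_rate(n_errors: int, n_bases: int) -> float:
    """Errors per NQS-passing base interrogated, as defined in panel (C).

    An error is a base call or InDel that differs from the assembled
    consensus; counting them is assumed to happen upstream.
    """
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_errors / n_bases
```

For example, 5 errors in 100,000 NQS-passing bases gives a rate of 5 / 100,000, the same order of magnitude as the final process error rate reported in (C).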