Deep Sequencing Analysis of HCV NS3 Resistance-Associated Variants and Mutation Linkage in Liver Transplant Recipients

Viral variants with decreased susceptibility to HCV protease inhibitors (PIs) occur naturally and preexist at low levels within HCV populations. In patients failing PI monotherapy, single and double mutants conferring intermediate to high-level resistance to PIs have been selected in vivo. The abundance, temporal dynamics and linkage of naturally occurring resistance-associated variants (RAVs), however, have not been characterized in detail. Here, using high-density pyrosequencing, we analyzed HCV NS3 gene segments from 20 subjects with chronic HCV infection, including 12 subjects before and after liver transplantation. Bioinformatics analysis revealed that Q80 substitution was a dominant variant in 40% of the subjects, whereas other RAVs circulate at low levels within quasispecies populations. Low frequency mutation linkage was detectable by Illumina paired-end sequencing in as low as 0.5% of the mock populations constructed from in vitro RNA transcripts but were uncommon in vivo. We show that naturally occurring RAVs are common and can persist long term following liver transplant at low levels not readily detectable by conventional sequencing. Our results indicate that mutation linkage at low levels could be identified using the Illumina paired-end approach. The methods described here should facilitate the analysis of low frequency HCV drug resistance, mutation linkage and evolution, which may inform future therapeutic strategies in patients undergoing direct acting antiviral therapies.

30 or 35 cycles of denaturation at 94°C for 30 s, annealing at 48°C for 30 s, and then extension at 72°C for 1 min; and final extension at 72°C for 10 min. Samples with a viral load of greater than 5x 10 7 copies/mL were amplified for 30 cycles, whereas 35 cycles were used for samples with lower viral loads.
To estimate the technical error rates for our procedure, control RNA of known sequence was generated by in vitro transcription of linearized plasmid containing a T7 promoter and full-length H77C genotype 1a sequence using MEGAscript T7 kit (Ambion, Austin, TX). Prior to in vitro transcription, the plasmid was linearized using EcoRI (Promega, Madison, WI) that cuts downstream of HCV NS3. Control RNA transcripts were subjected to identical experimental procedures including RT-PCR and pyrosequencing as patient-derived viral RNA.

Construction of NS3 in vitro RNA transcripts and mock communities:
Site-directed mutagenesis at amino acid positions 54 and 155 of the NS3 gene was performed using the H77C genome as the template and the Quickchange site directed mutagenesis kit, following manufacturer"s instructions (Agilent, Santa Clara, CA). Plasmids containing the mutations were linearized and transcribed in vitro as described above. Mock RNA communities were constructed using different concentrations of wild-type, single (T54A) and double (T54A/R155K) mutant transcripts, and were subjected to identical amplification and deep sequencing procedures as the extracted RNA from clinical specimens.
Population and clonal sequencing: Direct sequencing of the NS3 gene was performed using primers (2943F and 4231R, Supplementary Table S1) upstream and downstream from the Roche/454 and Illumina primer binding sites. This allowed determination of both the NS3 consensus, which specifies the most predominant nucleotide at each position, and the sequences at the primer binding sites. For clonal sequencing, RT-PCR products amplified using 454 fusion primers were cloned into pCR2-TOPO vectors and transformed into chemically competent TOP10 cells (Invitrogen, Carlsbad, CA).
Up to 32 clones per sample were sequenced. Phylogenetic analysis based on the population sequencing data was performed using Geneious and the Phangorn phylogenetic analysis package in R.
Roche/454 pyrosequencing: RT-PCR products amplified using 454 fusion primers were separated by agarose gel electrophoresis, and the fragments of expected size excised and extracted using the Qiagen gel extraction kit (Qiagen, Valencia, CA). Gel-purified PCR products were quantified using Qubit (Invitrogen, Carlsbad, CA), pooled by equimolar concentrations, and subjected to bidirectional pyrosequencing using the titanium chemistry.

Illumina paired-end sequencing:
The Illumina platform uses dye-terminated primer extension to sequence DNA. The algorithm for base calling relies on fluorescent intensities from the first several nucleotides incorporated to normalize the fluorescent signals for subsequent nucleotide extension. To reduce the likelihood that adjacent clusters on the Illumina solid support would be scored as one amplicon during sequencing, we engineered barcodes that varied between 4 and 8 nucleotides in length. In addition, we chose barcode sequences to ensure that at least three different nucleotides are represented. Amplified NS3 gene segments (from 5 clinical samples and 2 control in vitro transcripts, Table 2) were purified from electrophoresis gel slices, quantified using Qubit (Qiagen, Valencia, CA) and pooled at equimolar amounts.
Purified amplicons were then tailed with flow cell adaptors (12 cycles of PCR amplification, Figure 1). The prepared library was quantified using Kapa Library Quantification kit (Kapa Biosystems, Woburn, MA) and sequenced on Illumina Genome Analyzer IIx at the University of Florida ICBR sequencing core.
Bioinformatic analysis: Pyrosequence reads were processed using the following quality control criteria: (i) an exact match to the barcode and the primer sequences, (ii) > 360 bases for forward reads; > 290 bases for reverse reads in length before trimming the barcode and primer sequences, and (iii) no ambiguous bases (Ns). Reads were grouped based on barcodes, then barcodes and primer sequences were trimmed. Each read was further trimmed to a final length of ~337 nucleotides for forward reads and ~264 nucleotides for reverse reads. The trimmed reads were aligned to the H77C reference sequence using global multiple sequence alignment (POA -"Partial Order Alignment" multiple sequence aligner) (1), then the codon positions associated with resistance to protease inhibitors were identified (custom R scripts were created) (2). Of the reads that contain codon-changing nucleotide substitutions, pairwise sequence alignments (ClustalW2) (3)were performed followed by manual inspection of the aligned sequences. This second quality control step ensured that the observed nucleotide substitutions did not arise from sequence alignment errors. Clustering and OTU formation were carried out using the USEARCH/UCLUST suite (4). Phylogenetic analysis was performed using UPGMA based on the population sequencing data and the Phangorn phylogenetic analysis package in R (2).
To estimate the technical error rates, PCR products amplified from in vitro transcripts of known NS3 sequences (as described above) were subjected to pyrosequencing. The overall mean error rate including insertions, deletions and substitutions was ~0.5%. To distinguish authentic variants at drug resistance sites from technical artifacts due to nucleotide misincorporation during PCR amplification and pyrosequencing, position-specific background error rate was calculated to define authentic drug resistance mutations using a chi-square test.
Illumina paired-end reads were processed using the following criteria: (i) exact match to barcode and primer sequences; (ii) no ambiguous bases; (iii) minimum length of 75 bases after trimming so it would cover amino acid positions V36 through V55 in forward amplicon (PE1), and R155 through I170 in reversed amplicon (PE2) and (iv) both reverse and forward sequences passed all previous quality steps (no reads with unknown "B" quality scores and no reads that failed Illumina quality check "0"). The filtered, trimmed reads were then aligned to H77C reference sequence, and the codon positions associated with PI resistance were identified and the frequency of resistance mutations calculated (custom R scripts were created).

Supplementary Table S1
: Primers used for PCR amplification of HCV followed by 454 pyrosequencing or Illumina paired-end sequencing. H77C genome coordinates for each forward and reverse primer is shown in the numerical portion of the primers name. "xxx" denotes the location of unique barcode sequences (4 or 8 bp).