Use of Four Next-Generation Sequencing Platforms to Determine HIV-1 Coreceptor Tropism

HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.


Introduction
The discovery that human immunodeficiency virus type 1 (HIV-1) requires a co-receptor to enter target cells, mainly the chemokine receptors CCR5 or CXCR4 [1,2], was not only crucial to better understand HIV-1 transmission and pathogenesis but opened the door for the designing of novel antiretroviral drugs targeting host cell entry. Multiple strategies to block the replication of CCR5-or CXCR4-tropic (R5 or X4, respectively) viruses have been studied [3], leading to the approval for clinical use of the first CCR5-receptor antagonist (maraviroc, Selzentry/Celsentri, Pfizer, NY) in 2007 [4]. Like other co-receptor antagonists in development, maraviroc's activity is very specific, showing no direct activity against viruses able to use CXCR4 to enter the target cell [5,6]. Thus, an HIV-1 tropism test should be performed prior to initiation of maraviroc-containing regimens to rule out the presence of detectable non-CCR5-tropic (non-R5) virus [7,8,9]. Several phenotypic and genotypic assays have been developed to assess HIV-1 co-receptor usage or tropism [7]. Most phenotypic assays involve the generation of patient-derived env-recombinant viruses to determine their ability to infect reporter cell lines expressing HIV-1 receptors and co-receptors [10,11,12,13]. The new version of Trofile TM (Monogram Biosciences, South San Francisco, CA) [11], i.e., the Enhanced Sensitivity Trofile Assay (ESTA) [14] is currently the most widely used HIV-1 coreceptor tropism assay. Nonetheless, phenotypic assays share a few practical limitations such as high cost and long turnaround time, which restrict their use and consequentially hinder access to future CCR5 antagonists/agonists. Genotypic tests are a faster, less expensive alternative to inferring HIV-1 coreceptor tropism from env sequences [7,8]. Considerable effort has been made to develop genotypic assays able to predict HIV-1 co-receptor usage based on just the V3 region of the env gene [7,9,15], which seems to be the principal determinant of HIV-1 tropism [16,17]. However, genotypic tests based on bulk capillary electrophoresis (Sanger) sequencing of a population of V3 sequences lack the sensitivity to detect minority variants present below 20% of the viral population [18,19,20,21]. For that reason, several studies have evaluated the use of nextgeneration (NGS) or deep sequencing to detect minority non-R5 HIV-1 variants [22,23,24,25,26,27,28] or low frequency drugresistant variants that could lead to treatment failure [29,30,31,32,33]. Prediction of HIV-1 coreceptor usage by deep sequencing is highly concordant with phenotypic assays (82% to 87%) [25,28,34], has improved sensitivity for detecting non-R5 variants over population sequencing [22,25,27,35,36], and predicts the success of maraviroc-based antiretroviral regimens [25,28].

Clinical samples
Twelve RNA specimens, derived from plasma samples collected from HIV-infected individuals prior to enrollment in the (i) maraviroc expanded access program (EAP) at multiple centers in Europe or (ii) ALLEGRO trial, a multicenter study to assess the Figure 1. Schema summarizing the strategy followed in this study to compare the use of the four next-generation sequencing platforms (454 TM , IlluminaH, PacBioH, and Ion Torrent TM ) to determine HIV-1 coreceptor tropism (see text for full details). doi:10.1371/journal.pone.0049602.g001 NextGen Sequencing and HIV-1 Tropism PLOS ONE | www.plosone.org prevalence of R5 HIV-1 variants in Spain [39], were obtained from the Hospital Carlos III (Madrid, Spain) [21]. Phenotypic HIV-1 coreceptor tropism was determined at baseline using the original version of the Trofile TM assay (Monogram Biosciences), which had a reported non-R5 variant detection limit of 5 to 10% [11]. Written informed consent was obtained from the patients before participation in the study as previously described [21,39].

RT-PCR amplification
Viral RNA was reverse-transcribed using AccuScript High Fidelity Reverse Transcriptase (Stratagene Agilent; Santa Clara, CA) and the corresponding antisense external primer in 20-ml reaction mixture containing 1 mM dNTPs, 10 mM DTT and 10 units of RNase inhibitor. Viral cDNA was then PCR amplified using a series of external and nested primers with defined cycling conditions. Using the same external (first-round) PCR reactions that covered the entire HIV-1 envelope (env) gene (2,830 nt), two different PCR fragments were amplified due to intrinsic requirements of each NGS platform (Fig. 1). These twelve samples are part of a larger cohort analyzed in a separate study (Weber and Quinones-Mateu, submitted for publication). In that study 105 patient-derived 337 bp amplicons corresponding to a short region around the HIV-1 V3 loop were analyzed with the 454 TM system. Here, we were able to sequence the same amplicons using the PacBioH RS platform; however, the IlluminaH and Ion Torrent TM systems used in this study are better suited to sequence longer DNA regions, shearing them into small fragments (150 to 200 bp). Processing and sequencing the small 337 bp PCR products may have been difficult with these two platforms. Thus, 337-nucleotide (nt) fragments encompassing the V3 region were generated in single nested (second-round) PCR reactions. These small amplicons were sequenced using 454 TM and PacBioH RS sequencing systems. A larger fragment corresponding to the env gene was amplified as a 2,302 nt fragment, that is, all the surface glycoprotein (gp120) and most of the transmembrane glycoprotein (gp41), missing only 321 nt of the gp41 cytoplasmic domain.
These amplicons were sequenced using IlluminaH and Ion Torrent TM platforms. External PCR reactions were carried out in a 50-ml mixture containing 0.2 mM dNTPs, 3 mM MgCl 2 and 2.5 units of Pfu Turbo DNA Polymerase (Stratagene). Nested PCR reactions for population sequencing analysis were carried out in 50-ml mixture containing 0.2 mM dNTPs, 0.3 units of Pfu Turbo DNA Polymerase and 1.9 units of Taq Polymerase (Denville Scientific; Metuchen, NJ), then purified with the QIAquick PCR Purification kit (Qiagen) and quantified with Quant-iT PicoGreen dsDNA kit (Invitrogen). Nested PCR reactions for deep sequencing analysis were customized for each NGS system as described below.
Population (standard Sanger) sequence analysis PCR products corresponding to the gp120/gp41-coding regions of HIV-1 were purified with the QIAquick PCR Purification kit (Qiagen) and the V3 region sequenced (population or global sequence) using AP Biotech DYEnamic ET Terminator cycle with Thermosequenase II (Davis Sequencing LCC, Davis, CA) ( Fig. 1). Nucleotide sequences were analyzed using DNASTAR Lasergene Software Suite v.7.1.0 (Madison, WI). EAP, Expanded Access Program for the use of maraviroc at multiple centers; Allegro, multicenter study to assess the prevalence of R5 HIV-1 variants in Spain [39]. b Virologic response at week 12 following a maraviroc-based antiretroviral regimen, defined as a decrease in plasma viral load below 400 copies/ml. End of study, HIVinfected individuals did not enter the study following the detection of non-R5 viruses at baseline (see below); n.a., not available. C HIV-1 coreceptor tropism was determined at baseline using a phenotypic test, i.e., the original Trofile TM assay (Monogram Biosciences) [11]. R5, CCR5-tropic virus; D/M, dual mixed, that is a combination of CCR5-and CXCR4-tropic viruses; n.d., not determined. doi:10.1371/journal.pone.0049602.t001 products were clonally amplified on capture beads in water-oilemulsion micro-reactors. A total of 500,000 HIV-1 env enriched-DNA beads were deposited in the wells of a 454 GS FLX instrument (454 Life Sciences/Roche) and pyrosequenced in forward direction using 200 cycles in a ten-hour sequencing run. -Illumina: A 2,302 nt fragment of the env gene was amplified from an external PCR product as described above for the population sequencing, then purified (QIAquick PCR Purification, Qiagen) and quantified (Quant-iT PicoGreen dsDNA, Invitrogen). Paired-end DNA libraries were prepared using Nextera DNA sample prep kit (Epicentre Biotechnologies, Madison, WI) according to manufacturer's protocol. Briefly, 50 ng of PCR product were subjected to tagmentation with high-molecular-weight buffer and Nextera enzyme mix at 55uC for 5 minutes followed by limited nine-cycle PCR using half of the purified tagmentation reaction with Nextera PCR mix. Ion Torrent. A 2,302 nt fragment of the env gene was amplified from an external PCR product, then purified (QIAquick PCR Purification, Qiagen) and quantified (Quant-iT PicoGreen dsDNA, Invitrogen) as described above for Illumina. The Ion Xpress TM Fragment Library Kit (Life Technologies, Carlsbad CA) was used to construct a library for shotgun sequencing on the Ion Personal Genome Machine (PGM, Ion Torrent/Life Technologies). Briefly, amplicon DNA was randomly fragmented using the Ion Shear TM Plus Reagent (Life Technologies). The P1 adapter  Finally, it is important to note that the 454 TM sequencing was performed in the laboratory of Dr. Hendrik Poinar (McMaster University, Hamilton, Canada), while for IlluminaH, PacBioH, and Ion Torrent TM the PCR products were sent for sequencing at the respective company. V3 nucleotide sequences obtained by deep sequencing using any of the NGS platforms (as described below) have been submitted to the Los Alamos National Laboratory HIV-DB Next Generation Sequence Archive (http://www.hiv.lanl. gov/content/sequence/HIV/NextGenArchive/Archer2012).

Read mapping, variant sensitivity and phylogenetic inference
To minimize the amount of data loss due to high sequence variability and to allow for interpatient indel variation across the V3 region, sample-specific reference sequences were constructed as previously described [24]. First, sequences corresponding to the HIV genomic region spanning positions 6,900 to 7,400 on the HXB2 reference strain (accession no: K03455) were extracted, followed by replacement with the V3 population sequence derived from each sample. For each sample, reads derived from the four NGS platforms were then independently mapped to the respective sample-specific reference sequence (Fig. 1) and all indel and substitution information in relation to the reference sequenced stored as described [40]. For each dataset, reads spanning the V3 region (coordinates 210 to 315 within the reference templates) were extracted, truncated and translated for genotyping. Within each dataset only one representative of any identical variant was maintained, but the overall frequency stored. Finally, all variants with a frequency .5 within the population, calculated by each platform, were combined and clustered using a neighbor-joining algorithm implemented within Segminator II [40].

HIV-1 coreceptor tropism determination
HIV-1 co-receptor tropism was predicted from population and extracted V3 read data as described above using several bioinformatics tools (Fig. 1). In the case of global sequences, nucleotide mixtures were considered when the second highest peak in the electropherogram was above 25%, and then these nucleotide mixtures were translated into all possible permutations. The algorithms used to infer HIV-1 tropism from V3 amino acid sequences were: (i) Geno2Pheno [41], with false positive rates (FPR, i.e., predicted frequency of classifying an R5 sequence as non-R5 virus) based on optimized cutoffs for determining HIV-1 coreceptor usage (3.5%) as previously described [25,28,42] or the recommendation from the European Consensus Group on clinical management of HIV-1 tropism testing (10%) as described in the Geno2Pheno website (http://coreceptor.bioinf.mpi-inf.mpg.de/ index.php), (ii) Web PSSM using the subtype B x4r5 matrix [43], and (iii) the 11/24/25 charge rule [44,45] implemented within our analysis pipeline. Finally, plasma samples were classified as containing non-R5 viruses if at least 2% of the individual sequences, as determined by deep sequencing, were predicted to be non-R5 [25,28].

Statistical analyses
Descriptive results are expressed as median values, interquartile ranges, and standard deviations. Pearson correlation coefficient was used to determine the strength of association between categorical variables. All differences with a P value of ,0.05 were considered statistically significant. The kappa coefficient, which assesses a chance-adjusted measure of the agreement between any number of categories, was calculated using ComKappa2 v.2.0.4 [46]

Clinical samples
Twelve representative specimens were selected from 167 patients analyzed in a study comparing phenotypic and genotypic HIV-1 coreceptor tropism assays [13] based on the level of concordance among the different tests. Eleven samples corresponded to patients participating in the maraviroc EAP, with three failing to enter the study following the detection of non-R5 (D/M) variants at baseline and five out of the remaining eight patients responding to the maraviroc-based regimen at week 12 (Table 1). Population sequencing was performed not only to infer HIV-1 tropism but also to verify sample identity, showing a 99.9% homology with the V3 nucleotide sequences published for these specimens [21] (data not shown).

Data processing and NGS platform error rates
For each platform, the number of reads containing successfully mapped and complete V3 sequences, as well as those that were subsequently translated, were calculated. All insertions, deletions, and substitutions relative to the sample-specific reference sequences were tabulated during read mapping, and prior to any filtering based on correctly translating entire V3 sequences ( Table 2). In general, all platforms showed low substitution (1.8, 1.7, 1.3, and 1.7 mean #/read), deletion (0.2, 0.01, 0.4, and 0.5 mean #/read), and insertion (2.1, 1.7, 1.9, and 2.4 mean #/read) rates across all samples for 454 TM , IlluminaH, PacBioH, and Ion Torrent TM , respectively ( Fig. 2 and Table 2). As expected, there was interpatient variability; for example, sample 10-172 showed slightly higher substitution (5.1, 5.1, 2.5, and 5.1 mean #/read) and insertion (4.5, 4.0, 2.9, and 5.5 mean #/read) rates across all platforms compared with the other samples. Nevertheless, the overall low rates of indels and substitutions resulted in a high number of successfully translated V3 sequences from the original V3 spanning reads, suggesting that each one of the NGS platforms could be used for genotyping of complex HIV-1 populations.  Figure 4. Comparison of the clustering of variants across platforms. The ten most common nucleotide V3 sequences from samples 10-65, 10-69, and 10-73 -obtained with each of the four NGS platforms (454 TM , IlluminaH, PacBioH, and Ion Torrent TM )-were aligned against the respective population (sanger) sequence from the respective patient using Clustal X 2.0 [76]. For each patient, every unique variant is identified by the NGS platform used and the number of sequences (frequency) obtained, e.g., 454#1290. For each position only those nucleotides that differ from the HIV-1 population diversity All major virus variants were detected by each platform with similar frequency, with the exception of sample 10-137 where Ion Torrent detected a different dominant variant than the other three platforms or in sample 10-172 where an insertion of three nucleotides was observed in most variants sequenced with 454, Illumina, and PacBio but not with Ion Torrent (Figs. 3, 4, and 5). All the amplicons, either the 337-nt fragments encompassing the V3 region or the 2,302 nt fragments covering most of the env gene, were obtained from the same external PCR products; however, the amplicons sequenced by Ion Torrent were generated seven months later than the products analyzed by the other NGS platforms. Thus, it is possible that in some cases (such as with sample 10-137) a different majority variant within the quasispecies population may have been selected during PCR amplification. Moreover, in some cases and at lower frequencies, unique variants were platform-dependent, probably related to platform-dependent error rates and/or stochastic PCR errors (Fig. 3). Nevertheless, and in general, all platforms were able to identify the same major variants within the population and similar proportion of low frequency variants (i.e., at a frequency ,0.5%) (Fig. 3, inserts). For example, mixtures of at least two predominant populations were accurately identified by all four NGS platforms in samples 10-80, 10-133, and 10-180 (Figs. 3, 4, 5, and 6). Phylogenetic trees constructed by combining V3 sequences obtained with each platform confirmed these findings (Figs. 4, 5, 6, and 7).

HIV-1 tropism determination by deep sequencing relative to Trofile TM and population-based sequencing
The main goal of this study was to evaluate the ability of the four NGS platforms to determine HIV-1 coreceptor tropism. For that, the ten samples with known virologic response at week 12 (Table 1) were selected to compare the results from phenotypic and genotypic (population and deep sequencing) HIV-1 tropism assays, the latest using four different algorithms to predict HIV-1 coreceptor usage. Minority non-R5 variants were detected at comparable levels with a few exceptions mainly linked to the algorithm used to infer HIV-1 tropism rather than the NGS platform. For example, PacBioH (samples 10-91, 10-176, and 10-180; 11/24/125 rule) and Ion Torrent TM (samples 10-180 and 10-69; 11/24/25 rule and Geno2Pheno, respectively) detected a higher frequency of non-R5 variants than the other NGS systems (Fig. 8).

Discussion
The use of CCR5 antagonists to block HIV-1 replication has accelerated the development of HIV-1 coreceptor tropism assays [7,8,9] and stressed the need for novel, sensitive, and more affordable tests to increase treatment with this drug class. Although Trofile is the most commonly used phenotypic assay for HIV-1 coreceptor tropism, less sensitive genotypic tests based on HIV-1 population (Sanger) sequencing are frequently used in Europe, leading to the rapid adoption of deep sequencing technologies in genotypic HIV-1 coreceptor tropism protocols [22,23,24,25,27,28,34,35,36,47]. Based on the need for these NGS-based genotypic assays, we have compared the ability of four NGS platforms (454 TM , IlluminaH, PacBioH, and Ion Torrent TM ) to detect minority variants, and to infer the presence of non-R5 viruses within the HIV-1 population.
Next generation sequencing has been used in a multitude of biological fields, from the sequencing of whole genomes of animals, plants, and microbes, to targeted studies on polymorphisms related to various genetic disorders and cancer, most of them based on IlluminaH [48,49,50,51] and 454 TM [52,53,54] platforms and more recently using PacBioH [55,56,57] and Ion Torrent TM [58,59,60] systems. To date all published HIV-related studies have used the 454 TM platform, due in part to being one of the first NGS systems to provide longer read lengths [53,61]. As expected each deep sequencing platform differs in terms of the chemistry, read length, yield, error rate, turn-around time, and overall cost [62,63]. Here, we sequenced the HIV-1 V3 region from the same RNA aliquots obtained from 12 patients and showed that all four NGS platforms had similar substitution and insertion rates (ranging from 1.3 to 1.8 and 1.7 to 2.4 mean #/ read, respectively), while IlluminaH had the fewest deletions per read (0.01 versus a range of 0.2-0.5 mean #/read for the other three platforms). This is consistent with the reduced number of indels reported for IlluminaH when compared with 454 TM during the genome sequencing of Gallus gallus [64] or influenza virus [40] and the sequencing of a strain of Escherichia coli using 454 TM and Ion Torrent TM [65].
We observed differences in the number of V3 sequences and mean read length among the NGS platforms, which were due to both the size of the PCR product selected for sequencing and intrinsic characteristics of the sequencing method. Short amplicons (337 nt) containing the V3 region were sequenced using 454 TM and PacBioH; however, library preparation and shotgun sequencing was performed on the entire PCR-amplified env gene (2,302 nt) using IlluminaH and Ion Torrent TM . In addition, the 454 TM sequencing was performed in-house using barcoded sequencing primers while for IlluminaH, PacBioH, and Ion Torrent TM the amplicons were sent for sequencing at the respective company. Despite these differences, all NGS platforms were able to detect the same higher frequency variants but showed slightly variations population sequence are depicted. Dashes indicated the same nucleotide as the population sequence while gaps introduced to maintain the alignment are indicated by dots. Relative clustering of the data from the NGS platforms was inferred by neighbor-joining, phylogenetic analyses determined using MEGA 5.05 [77] and displayed in a circle with topology only to facilitate their interpretation. Bootstrap resampling (1,000 data sets) of the multiple alignments tested the statistical robustness of the trees, with percentage values above 60% indicated by an asterisk. The size of the circles in the phylogenetic trees correlates with the frequency of the unique sequence determined by each NGS platform (color) in the logarithmic scale. The black box denotes the population (sanger) sequence for each sample. doi:10.1371/journal.pone.0049602.g004   in the detection of low frequency variants (,0.5%), which had limited implications for HIV-1 tropism. It is important to note that based on HIV-1 clonal sequencing the error rate for in-house 454 TM sequencing assays has been calculated to be between 0.1% and 0.5% [23,34,66,67]; therefore, we only used variants present at $1% of the viral population for diversity and HIV-1 tropism analyses.
Multiple studies have compared the efficacy of phenotypic and genotypic HIV-1 tropism assays to detect non-R5 variants [68,69,70,71,72]. In general, population-based sequencing tests are less sensitive and less specific than phenotypic assays [7,73], although a few studies have shown significant concordance and predictive values [28,72,74]. More sensitive deep sequencing methods for HIV-1 coreceptor tropism assays resulted in the detection of minor variants, which correlated well with both phenotypic assays [13,25,28,34] and virological response to maraviroc [25,28]. Here we have shown that all four NGS platforms provide equal and sensitive detection of minority non-R5 viruses in 12 patients, with minor differences depending on the bioinformatic algorithm used to infer HIV-1 tropism. However, it is important to stress that maraviroc was combined with at least two other antiretroviral drugs and increased viral loads could be due to several factors including poor or selective drug adherence and resistance to other drugs while maintaining partial maraviroc suppression. Nevertheless, all four NGS platforms showed similar concordance with virologic response at week 12 (ranging from 75% to 80% depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). Despite the limited number of samples, these results are comparable to previous studies where deep sequencing had a good concordance with phenotypic HIV1-tropism tests (82% to 87%) [25,28,34] and matched Trofile TM in predicting the success of maraviroc-based antiretroviral regimens [28].
In conclusion, this is the first study comparing the ability of the four current leaders in deep sequencing (454 TM , IlluminaH, PacBioH, and Ion Torrent TM ) to detect minority variants, and to infer the presence of non-R5 viruses, within the HIV-1 population. Despite minor differences in error rates and profiles (types of errors), all four NGS platforms successfully detected the same unique viral variants present at high frequencies, which are the sequences relevant for the clinical determination of HIV-1 coreceptor tropism. Further studies with larger number of patients and the latest chemistry and software for each NGS system will be needed to corroborate our findings; however, despite intrinsic parameters to each NGS platform (e.g., read length, error rates, cost per run, and turn-around time) we suspect that any of the current NGS platforms will be effective in a genotypic test to predict HIV-1 coreceptor usage.  [11]; R5, CCR5-tropic virus; D/M, dual mixed. (B) Virologic response at week 12 of a maraviroc-based antiretroviral regimen. Y or N corresponds to plasma viral load below or not 400 copies/ml at week 12, respectively. E.S., end of study (patient did no enter the study following the detection of non-R5 variants at baseline using Trofile TM ). (C) Quantification of non-R5 variants detected by deep sequencing as predicted using four HIV-1 tropism algorithms, i.e., 11/24/25 rule [24], Geno2Pheno 3.5% FPR [25,28,42], Geno2Pheno 10% FPR, and Web PSSM using the subtype B x4r5 matrix [43]. Dotted line represents the $2% suggested cutoff for the minimal amount of non-R5 sequences to be present in the viral population in order to classify a given virus as non-R5 [25,28]. doi:10.1371/journal.pone.0049602.g008