Utility of Metagenomic Next-Generation Sequencing for Characterization of HIV and Human Pegivirus Diversity

Given the dynamic changes in HIV-1 complexity and diversity, next-generation sequencing (NGS) has the potential to revolutionize strategies for effective HIV global surveillance. In this study, we explore the utility of metagenomic NGS to characterize divergent strains of HIV-1 and to simultaneously screen for other co-infecting viruses. Thirty-five HIV-1-infected Cameroonian blood donor specimens with viral loads of >4.4 log10 copies/ml were selected to include a diverse representation of group M strains. Random-primed NGS libraries, prepared from plasma specimens, resulted in greater than 90% genome coverage for 88% of specimens. Correct subtype designations based on NGS were concordant with sub-region PCR data in 31 of 35 (89%) cases. Complete genomes were assembled for 25 strains, including circulating recombinant forms with relatively limited data available (7 CRF11_cpx, 2 CRF13_cpx, 1 CRF18_cpx, and 1 CRF37_cpx), as well as 9 unique recombinant forms. HPgV (formerly designated GBV-C) co-infection was detected in 9 of 35 (25%) specimens, of which eight specimens yielded complete genomes. The recovered HPgV genomes formed a diverse cluster with genotype 1 sequences previously reported from Ghana, Uganda, and Japan. The extensive genome coverage obtained by NGS improved accuracy and confidence in phylogenetic classification of the HIV-1 strains present in the study population relative to conventional sub-region PCR. In addition, these data demonstrate the potential for metagenomic analysis to be used for routine characterization of HIV-1 and identification of other viral co-infections.


Introduction
Molecular characterization of human immunodeficiency virus type 1 (HIV-1) has revealed an exceptional level of sequence diversity [1].Several factors contribute to the overall genetic complexity, including high replication rates in an infected individual, error-prone replication by viral reverse transcriptase, and frequent inter-subtype recombination in high prevalence populations where more than two HIV strains are co-circulating.Phylogenetic analysis of full-length genomic sequences has classified HIV-1 into four distinct and highly divergent groups: M (major), O (outlier), N (non-M, non-O), and P, with group M strains, representing the pandemic branch, subdivided into nine subtypes (A-D, F-H, J and K).In addition to pure subtypes, more than 70 circulating recombinant forms (CRFs) of HIV-1 have also been described [2].
While a limited number of subtypes and CRFs predominate in any given geographical region, global diversification of HIV is continually being driven by the movement of people around the world and societal changes.Over the past 20 years, Europe and the United States, where HIV-1 subtype B infections are predominant, have seen a gradual increase in non-subtype B infections, primarily due to immigration.In France, non-B strains now account for approximately 50% of newly diagnosed HIV infections [3].In the U.S., non-B infections have slowly increased from less than 1% before 1996 to approximately 4% by 2011 [4][5][6].Social upheaval following the collapse of the Soviet Union also caused a rapid increase in HIV infections in the former Soviet Union (FSU) countries.In Russia, the number of infections increased from approximately 1000 in 1995 to greater than 255,000 by 2003 [7].This HIV outbreak, driven by injection drug use, resulted in the emergence of CRF03_AB, a recombinant between the predominant subtype A strain in the FSU countries and subtype B [8]. CRF03_AB spread rapidly and by 2003 accounted for 4% of HIV infections in Russia, comprising 97% of the HIV infections in the Kaliningrad province [7].These examples illustrate the dynamic nature of the HIV epidemic and the need for continuing viral surveillance.
The inherent capacity to generate sequence variation confers HIV with an ability to adapt to selective pressures applied by the host immune system and has immediate ramifications for pathogenesis including transmission, response to antiretroviral therapy, drug resistance, escape mutants and disease progression [1,9].The rapid evolution of HIV-1 also has significant implications from the perspective of screening, diagnostic testing, patient monitoring (e.g.viral load assays), and vaccine development.Surveillance is essential to monitor global diversification of HIV and to identify newly emerging strains.Typically surveillance has been conducted based on subtype-specific peptide immunoassays [10], heteroduplex mobility assays [11] or by Sanger sequencing of viral sub-genomic region(s) from many specimens [12] or the complete genome for a select few [13].Next-generation sequencing (NGS), with unprecedented depth and coverage at a fraction of the cost and time of traditional sequencing, allows several specimens to be sequenced in parallel and thus has the potential to be leveraged to conduct viral surveillance [14].Viral NGS can be applied for surveillance using target-specific [15] or unbiased metagenomic approaches [16].By generating cDNA libraries using random priming, one can recover the sequence of entire genomes directly from primary clinical specimens.Thus, metagenomic NGS can accurately identify any particular HIV strain in a specimen, as well as the presence of additional co-infecting pathogens such as human pegivirus (HPgV, formerly GBV-C) [17], a common non-pathogenic virus that has been controversially linked to delayed progression to AIDS in the setting of co-infection with HIV-1 [18][19][20].
Considering the origins of the HIV epidemic, the population's proximity to non-human primates, and the high level of strain diversity, Cameroon represents a logical site to conduct HIV surveillance [21,22].In Cameroon, the prevalence of HIV infection is estimated to be 4-5% in the adult population [23][24][25].The Cameroonian HIV epidemic is notable for its high level of strain diversity [12,13].HIV-1 group M accounts for approximately 98% of infections, with CRF02 _AG being the majority (58%) strain present in the population [12].All HIV-1 subtypes, a wide range of complex circulating recombinant forms (CRFs) and unique recombinant forms (URFs) of HIV-1 circulate in the population, with secondary recombination adding to the sizeable level of strain diversity [26].Cameroon holds the distinction of being endemic for group O (accounts for 1-2% of infections; [12]), the rare group N [27,28], the recently described group P [29,30], and non-human primates that harbor the simian immunodeficiency viruses (SIV) most closely related to the HIV-1 groups [31][32][33][34].In the current study, we apply NGS for full-genome viral sequencing and characterization of HIV-1 recombinant strains and HPgV from blood donors in Cameroon.

Pre-extraction filtering and DNase treatment
Plasma was thawed and spun at 4,000 x g for 10 minutes to pellet cell debris.Supernatants were passed through a 0.22 μm filter (Millipore, Billerica, MA) by spinning at 5000 x g for 5 minutes.For a 400 μl total volume, 331.2 μL of clarified plasma was combined with 20 μL Turbo DNase (Life Technologies, Carlsbad, CA, USA), 40 μL 10X Turbo Buffer, and 8.6 μL Baseline ZERO DNase (Epicentre, Madison, WI).This was incubated at 37°C or room temperature for 30 minutes with mixing by gentle vortexing after 15 minutes.

RNA extraction
RNA was extracted from pre-treated plasma using the EZ1 Virus Mini Kit v2.0 protocol for serum on a Qiagen robot per manufacturer instructions, with the exception that 10 μl of 5 mg/ ml linear acrylamide (Ambion/Life Technologies, Carlsbad, CA) was substituted for carrier RNA.Nucleic acid was eluted in 60 μl of Qiagen buffer AVE and either used immediately or stored at -70°C until use.

cDNA NGS library preparation
Libraries were constructed from amplified cDNA using a modified TruSeq (Illumina, San Diego, CA) protocol as previously described [35][36][37].Briefly, in Round A, RNA was reverse transcribed with MMLV SuperScript III Reverse Transcriptase (Invitrogen/Life Technologies, Carlsbad, CA) using Sol-PrimerA (5'-GTTTCCCACTGGAGGATA-N 9 -3') that has a random In Round B, Sol-PrimerB was used to amplify the randomly primed library.PCR products were purified, digested with BpmI to remove Sol-PrimerB, re-purified, and ligated to TruSeq adapters, followed by amplification with Illumina index-containing primers (Illumina).Round B reaction conditions were as follows: 10 μL of Round A -labelled cDNA was added to 40 μL of KlenTaq master mix per sample (5 μL 10X KlenTaq PCR buffer,1 μL 12.5 mM dNTP, 1 μL Sol-PrimerB (100 pmol/μl), 1 μL KlenTaq LA (Sigma-Aldrich, St Louis, MO), 32 μL ddH 2 O) and incubated as follows: 94°C for 2 minutes; 25 cycles of 94°C for 30 sec, 50°C for 45 sec, 72°C for 60 sec; 72°C for 5 minutes; hold at 10°C.Following amplification, 50 μl (1X) of Agencourt AMPure XP beads (Beckman Coulter, Brea CA) was added to Round B reaction mixture and incubated at room temperature for greater than 5 minutes.The beads were captured with a magnet and the supernatant was removed.Beads were then washed twice with 200 μL of 75% EtOH and air dried for 5 minutes.Libraries were eluted off the beads in 40 μl of water or elution buffer (Qiagen).For removal of Sol-PrimerB, Sol-PrimerB was cleaved with BpmI restriction enzyme (New England BioLabs, Ipswich, MA).Five μl NEB buffer 3, 5 μl 10xBSA, and 2 μl BpmI were added to 200 ng of Round B cDNA diluted in 38 μl of water and incubated at 37 °C for 2 hr, followed by inactivation at 65 °C for 20 minutes.Bead-based purification was repeated as described above and libraries eluted in 32 μl water or Qiagen EB buffer.

Phylogenetic Analysis
To classify the HIV -1 strains, final genomic consensus sequences were aligned with HIV-1 group M reference sequences [2] using the CLUSTALW method (MegAlign, Lasergene version v11 DNASTAR Inc., Madison, WI).Alignments were converted into PHYLIP format using ForCon (version 1.0 for Windows; J. Raes, University of Ghent, Belgium) and gap-stripped using BioEdit Sequence Alignment Editor (version 5.0.9,Tom Hall, North Carolina State University, Raleigh, North Carolina).Phylogenetic analysis was performed with the PHYLIP software package (version 3.5c; J. Felsenstein, University of Washington, Seattle, WA).Evolutionary distances were estimated with DNADIST (Kimura two-parameter method) and phylogenetic relationships were determined by NEIGHBOR (neighbor-joining method).Branch reproducibility of trees was evaluated using SEQBOOT (100 replicates) and CONSENSE.
For each CRF, bootstrap analysis was performed and phylogenetic trees were constructed for each sub-fragment (data not shown) to verify the predicted recombinant structure.Recombination analysis was performed using SIMPLOT (version 3.5.1;S. Ray, Johns Hopkins University, Baltimore, MD; [39]).Viral sequences were individually evaluated in SIMPLOT for evidence of recombination relative to subtypes and CRFs.If SIMPLOT indicated evidence of recombination, Bootscan and Findsite were performed; recombination was confirmed by constructing phylogenetic trees for each sub-fragment.The 8 Cameroonian HPgV strains in the present study (either the complete genomic sequences or the 5'UTR sequences) were aligned with 47 HPgV reference sequences in the database and analyzed in the same manner as described above for HIV.

Next Generation Sequencing for Viral Characterization
Plasma specimens were obtained from asymptomatic HIV-1 infected blood donors from 2002-2011 in the Cameroonian cities of Yaoundé and Douala.The initial classification of HIV strains was based on RT-PCR amplification of three genome sub-regions (gag p24; pol integrase, and env gp41), followed by traditional Sanger sequencing and phylogenetic analysis.A panel of 35 specimens harboring a variety of strains, including CRFs for which few reference sequences exist and potential URFs not previously reported, as well as different subtypes present in the population, was selected for complete genomic characterization by NGS (Table 1).To maximize the likelihood of obtaining full-length genomes, specimens were chosen with viral loads greater than 4.4 log 10 copies/ml (Table 1).
To reduce background from human host genomic sequences, plasma specimens were pretreated with a cocktail of nucleases and filtered prior to NGS library preparation and sequencing [40].An average of 11.8 ± 4.6 million high-quality reads per specimen were obtained by 2x250 base pair (bp) paired-end sequencing on a MiSeq instrument.Despite the uniformity in input virus titer, the number of reads aligning to a reference HIV genome varied greatly from one library to the next.The number of HIV reads ranged from a low of 140 to a high of 377,128, with a median of 11,168 reads per specimen (Table 1).Even after pre-treatment with nucleases to reduce human nucleic acid background, HIV reads still represented a small proportion of the total sequences obtained from each specimen (0.002-4.51% of reads with a median of 0.14%).However, greater than 90% genome coverage was achieved for 31 (88%) of the 35 specimens, with average sequence depth ranging from 37-to 8,274-fold, permitting fulllength assembly of the viral genome for 25 (71%) specimens.For libraries yielding full-length genomes, consensus sequences were reproducible from run to run ( 99% identity) and agreed with population (Sanger) sequence results.Coverage depth was relatively uniform with no particular bias towards or against any region, with the exception of the 5' and 3' ends (Fig 1).The same is true for libraries with gaps in genome coverage (S1-S7 Figs); regions with few to no reads varied randomly from one library to the next.Despite residual gaps in coverage, phylogenetic trees derived from gap-stripped alignments of these 7 incomplete genomes (80% coverage) showed that each branched with 100% bootstrap values with the subtype expected from sub-genomic sequences (S1-S7 Figs).
Phylogenetic analysis of an alignment of the 25 full-length HIV-1 genomes obtained by assembly of the NGS reads and 92 references representing group M subtypes A-D, F-H, J-K and selected CRFs was performed using group O strain ANT70 as the outgroup (Fig 2, Table 1).Separate phylogenetic trees were also constructed for 7 partially sequenced genomes (82-98% genome coverage; S1-S7 Figs) while 3 samples (419-33, 38-38, and CHU2801) having 78% coverage were not analyzed further.Comparison of strain classification based on a previous algorithm derived from the sequences of the gag, pol, and env sub-regions versus NGS and genome assembly showed that they were concordant in 31 of 35 (89%) cases (Table 1) [21].Recombination analysis revealed that two strains, 789-10 and 920-49, originally classified as URFs, were in fact pure subtype G (Figs 2, 3A and 3B).The env sequence for strain 789-10 grouped strongly with CRF06, which is comprised of subtypes G and J in this region.However, subtype J sequences were not found to be present in the full 789-10 genome sequence (Fig 3A).Similarly, strain 920-49 grouped strongly with CRF43 in gag and env, designated as G in these regions, but recombination analysis confirmed the subtype G classification across the whole length of the genome, indicating that strain 920-49 was not a recombinant (Fig 3B).In addition, for two strains, 876-14 and 1252-11, NGS allowed identification of recombinants that were not revealed by sub-genomic sequences; these two strains showed basal branches within their respective phylogenetic clusters (Figs 2, 3C and 3D).Strain 876-14, originally designated as subtype D, was found to contain subtype G sequences in the vif/vpr   [42,43].As in the CRF13_cpx reference strains, HIV strains 228-10 and 833-62 contained 10 and 7 amino acid insertions, respectively in gag p6 [42,43].
Additional low frequency CRFs were analyzed in the same manner (Fig 4C and 4D).B4043-15 is a CRF18_cpx, a highly complex recombinant consisting of 15 fragments derived from several subtypes (A, F, G, H, K and unclassified).The mosaic composition of B4043-15 strongly resembles CRF18 reference strains from both Cuba and Cameroon (Fig 4C).Strain 1130-39 The next three URFs are the result of recombination between pure subtypes and a CRF.Strain 280-10 is predominantly CRF22_01A1, interrupted by unclassified sequence (nt 2424-2903) from protease to the beginning of RT, and subtype K sequence (nt 2904-4235) from the 3' portion of RT to the beginning of integrase.1252-11 is a CRF11-cpx that contains two short segments of F2 sequence in the 5' portion of RT (nt 2584-2863) and the 3' end of integrase through vif (nt 4872-5401).Strain CHU3903 is all CRF22_01A1 sequence except for 1 kb of subtype F2 (nt 1982-2958) which spans the 3' half of gag through the 5' portion of pol RT.
The third set of three URFs contain sequences from two different CRFs.469-66 consists of 5 fragments that alternate between CRF36_cpx and CRF11_cpx sequences.The next two strains (567-16 and 663-13) consist of alternating stretches of CRF02_AG and CRF22_01A1 sequences but do not have the same recombination breakpoints.Notably, while the majority of these specimens were correctly classified as URFs based on sequences from three sub-regions, the benefit of full genome coverage allowed the recombination breakpoints to be precisely identified.

Detection of HPgV/GBV-C co-infection
The use of random priming for library generation allowed us to interrogate the specimen NGS data for the presence of additional viral agents.Of particular interest is the human pegivirus (HPgV, family Flaviviridae), also known as GB virus-C (GBV-C).NGS data were aligned to HPgV reference genome NC_001710; HPgV reads were detected in 9 of 35 (26%) specimens.Seven specimens yielded complete genome sequences and one with 99% of the genome.For most of these specimens, the percentage of HPgV reads (0.06-5.07%) far exceeded that obtained for HIV (Table 1).No correlation was found between HIV subtype and co-infection with HPgV.
The 8 Cameroonian HPgV sequences were aligned with a total of 46 HPgV reference sequences including genotypes 1-5, and a chimpanzee isolate (GBV-Ctro) as the outgroup.Phylogenetic trees were constructed based on the full genome alignment (Fig 6A).The Cameroonian genomes were found to cluster within the genotype 1 branch consistent with their African origin.Since the most recent common ancestor occupied a basal position on the genotype 1 branch and the sequences were separated from each other by relatively long branch lengths, the genotype 1 classification could be confirmed by phylogenetic analysis of the 5'UTR sequence alignment (Fig 6B) [46].No evidence of recombination was observed (data not shown).Sequence identity between these new isolates varied from 89-94%, confirming that each strain was derived from a unique specimen.

Discussion
The continuing diversification and global dynamics of HIV groups, subtypes and recombinants, as well as the emergence of new strains make it imperative that surveillance of viral diversity be conducted to monitor the dynamic HIV epidemic.While traditional methods (i.e.peptide-based serotyping, heteroduplex mobility assays, and Sanger sequencing of sub-genomic regions) have provided useful data, the metagenomic NGS approach used here enabled full-genome sequencing of HIV-1 with unequivocal strain classification directly from primary clinical samples, with clear applications for routine viral surveillance.Even for specimens with incomplete coverage (e.g.80-95%), enough sequence information was obtained to accurately classify strains.The depth of genome coverage achieved by NGS instills confidence in the consensus sequence generated, while analysis of individual reads affords the potential to identify dual infections, low frequency variants or quasispecies, and the evolution of intrahost populations [47][48][49][50][51].Our analysis did not reveal infection with more than one HIV strain in any given patient, and only one individual possessed a common drug resistance mutation (data not shown).The ability to multiplex specimens also increases the throughput of NGS and lowers the per-patient cost of sequencing.
Given the diversity of HIV, we chose to generate libraries using random primers.This approach resulted in the assembly of complete (71%) or nearly complete (90% coverage) genomes for highly diverse HIV-1 strains, including subtypes A, F2, and G, CRFs CRF01_AE, CRF11_cpx, CRF13_cpx, CRF18_cpx, CRF22_01A1 and CRF37_cpx, and URFs.The depths of genome coverage obtained here are higher than two other recent reports using NGS for viral screening in blood [52,53], and may be potentially be attributed to higher input viral titers.Despite nuclease pre-treatment, removal of host background was far from complete.Coupled to potential biases introduced during cDNA library amplification, host background likely contributed to the variability in % HIV reads and coverage depth observed for specimens with equivalent viral loads (Table 1).For example, 876-14 and 1225-26 had comparable viral loads of 5.38 and 5.36 log 10 copies/ml, respectively, yet % HIV reads and genome coverage were 0.21% and 1.61%, and 100% and 92%, respectively.Similarly, CHU2810 and A1575 had viral loads of 4.47 and 4.45 log 10 copies/mL, yet % HIV reads of 0.002% and 0.04%, and genome coverage of 53% and 100%, respectively.The intent of this study was to establish the feasibility of metagenomic NGS.For this reason, samples with high viral loads (>4.45 logs) were selected.We did not seek to address the sensitivity of the method at lower input viral titers.Additional optimization of the protocol using host depletion or probe enrichment strategies may be warranted to further increase the percentage of viral reads, thereby enhancing sensitivity and limits of detection [40].
With Cameroon at the epicenter of the HIV epidemic, our data contribute a number of fullgenome recombinant HIV-1 sequences to the GenBank database.Reliance on partial genome PCR characterization for surveillance is likely to underestimate the true extent of HIV diversity.Indeed, in the current study, two unique recombinants were identified that would otherwise have been categorized as a pure subtype D (876-14) or CRF11 (1252-11), had it not been for the complete genome sequence.Longer contiguous sequences also facilitate more accurate phylogenetic classification; a segment of strain 789-10 was misclassified as CRF06 based on 677 nucleotide region of env, and strain 920-49 was misclassified as CRF43 based on short segments of gag and env.The availability of whole-genome sequencing for viral surveillance enables rapid determination of whether new strains may be circulating in Cameroon.It bears mentioning that all subtypes and CRFs found in these URFs are endemic to the region.Indeed, many of the observed sites of recombination (Figs 3 and 5) were found in known genomic hotspots in the vif/vpr/vpu/tat and pol (RT) regions [54,55].
NGS with random-primed libraries has been used successfully to characterize the virome of clinical specimens [16,36,56], diagnose infections [57], and discover novel viruses [58,59].To assess the ability of NGS to identify co-infections with other viruses, we searched for HPgV sequences in the NGS data.HPgV sequences were detected in 9 of 35 (26%) specimen libraries, with 8 yielding at least 99% of the viral genome.HPgV has a worldwide distribution and is transmitted sexually, parenterally and by mother-to-child transmission, and as a consequence HIV patients are often co-infected [60].The high rate (25%) of HPgV nucleic acid detection in our specimens is notable, given the documented lower prevalence of ~16% RNA positives in HIV-infected patients (4% in healthy individuals) ( [19] and references therein).
HPgV genotypes exhibit consistent geographical clustering, with US and European strains exclusively genotype 2, and genotypes 3 (Japan and China) and 4 (Vietnam, Japan, and China) found primarily in Asia.Recently, complete genomes from Uganda in east Africa were assigned to either genotype 1 or to genotype 5, previously considered exclusive to South Africa [61].In the present study, the 8 Cameroonian HPgV genomes clustered within genotype 1 along with 2 isolates from Ghana and 1 from Uganda, and more distant from 2 Japanese strains putatively assigned to genotype 6 [61].The 5'UTR phylogenetic analysis supports the whole genomebased classification of the Cameroonian GBV-C strains as genotype 1.
NGS has ushered in a new era of discovery, as well as new challenges [62,63].Applications of this technique are accelerating pathogen discovery [14,64,65] and promise to transform clinical diagnostics [57] as NGS-based assays begin to move into the clinical laboratory.We propose here that NGS has the potential to be applied as a routine yet powerful approach for surveillance of viral diversity, as well as a tool to identify co-infections such as from HPgV.It is anticipated that these technological advances will lead to a more thorough understanding of the HIV epidemic and underlying sequence diversity.

Fig 1 .
Fig 1. HIV genome coverage is uniform and complete but varies in sequence depth.Three representative specimens sequenced by NGS with a wide range in percentage of HIV reads were selected and aligned to the A1-AF004885 reference genome to demonstrate the uniformity of genome coverage regardless of read depth.Coverage is expressed as number of reads at each nucleotide position along the length of the HIV genome.Strain/mean read number: 833-62/1085, green; 886-24/223, orange; B4043-15/7939, blue.doi:10.1371/journal.pone.0141723.g001

Fig 2 .Fig 3 .
Fig 2. Phylogeny of full length genomes obtained by NGS illustrates HIV diversity in Cameroon.A phylogenetic tree of 92 HIV-1 complete genome reference sequences and 25 Cameroonian sequences was constructed from a 7387 bp gap-stripped alignment with bootstrap values indicated at each branch.Group O strain ANT70 was used as the outgroup and the genetic distance scale is indicated.doi:10.1371/journal.pone.0141723.g002

Fig 4 .
Fig 4. Breakpoint analysis for rare, complex recombinants endemic to Cameroon.Bootscan plots are shown for (A) CRF11_cpx, (B) CRF13_cpx, (C) CRF18_cpx and (D) CRF37_cpx isolates.In each panel the profile for a reference strain is shown on top and a representative new strain is on the bottom.Vertical dashed lines indicate recombination breakpoints determined by Find Site; genome structure is diagramed below each plot.Bootscan analysis was performed using a window of 400 base pairs and 20 base pair step.doi:10.1371/journal.pone.0141723.g004

Fig 5 .
Fig 5. Genetic organization of unique recombinant forms obtained by NGS.Nine unique recombinants are shown with the legend at the bottom indicating the classification of each sub-genomic fragment.Genomic coordinates for each recombination breakpoint are described in detail in the Supplemental information.doi:10.1371/journal.pone.0141723.g005

Table 1 .
Phylogenetic classification and next-generation sequencing results for HIV-infected Cameroonian blood donors.
a Sample IDs are preceded by code denoting year of collection and country of origin (i.e., 02CM for 2002 in Cameroon).bViral loads in log 10 cps/ml.c HIV genome coverage.d NGS classifications in bold differ from RT-PCR classifications.e HPgV genome coverage.doi:10.1371/journal.pone.0141723.t001