
Advanced molecular surveillance approaches for characterization of blood borne hepatitis viruses

  • Michael G. Berg,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Infectious Diseases Research, Abbott Diagnostics, Abbott Park, Illinois, United States of America

  • Ana Olivo,

    Roles Data curation, Formal analysis, Methodology

    Affiliation Infectious Diseases Research, Abbott Diagnostics, Abbott Park, Illinois, United States of America

  • Kenn Forberg,

    Roles Data curation, Methodology

    Affiliation Infectious Diseases Research, Abbott Diagnostics, Abbott Park, Illinois, United States of America

  • Barbara J. Harris,

    Roles Data curation, Formal analysis

    Affiliation Infectious Diseases Research, Abbott Diagnostics, Abbott Park, Illinois, United States of America

  • Julie Yamaguchi,

    Roles Conceptualization, Methodology

    Affiliation Infectious Diseases Research, Abbott Diagnostics, Abbott Park, Illinois, United States of America

  • Rachel Shirazi,

    Roles Data curation, Methodology

    Affiliation Central Virology Laboratory, National HIV and Viral Hepatitis Reference Center, Public Health Services, Ministry of Health, Tel-Hashomer, Ramat-Gan, Israel

  • Yael Gozlan,

    Roles Formal analysis, Methodology

    Affiliation Central Virology Laboratory, National HIV and Viral Hepatitis Reference Center, Public Health Services, Ministry of Health, Tel-Hashomer, Ramat-Gan, Israel

  • Silvia Sauleda,

    Roles Resources

    Affiliations Transfusion Safety Laboratory, Banc de Sang i Teixits, Servei Català de la Salut, Barcelona, Spain, Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Madrid, Spain

  • Lazare Kaptue,

    Roles Resources

    Affiliation Université des Montagnes, Bangangté, Cameroon

  • Mary A. Rodgers,

    Roles Formal analysis, Supervision, Writing – review & editing

    Affiliation Infectious Diseases Research, Abbott Diagnostics, Abbott Park, Illinois, United States of America

  • Orna Mor,

    Contributed equally to this work with: Orna Mor, Gavin A. Cloherty

    Roles Project administration, Supervision, Writing – review & editing

    Affiliations Central Virology Laboratory, National HIV and Viral Hepatitis Reference Center, Public Health Services, Ministry of Health, Tel-Hashomer, Ramat-Gan, Israel, Sackler Faculty of Medicine, Tel Aviv University, Tel-Hashomer, Israel

  • Gavin A. Cloherty

    Contributed equally to this work with: Orna Mor, Gavin A. Cloherty

    Roles Project administration, Resources, Supervision, Writing – review & editing

    Affiliation Infectious Diseases Research, Abbott Diagnostics, Abbott Park, Illinois, United States of America

Abstract

Defining the genetic diversity of viral infections directly from patient specimens is the ultimate goal of surveillance. Simple tools that can provide full-length sequence information on the blood borne hepatitis viruses, hepatitis C, hepatitis B and hepatitis D viruses (HCV, HBV and HDV), remain elusive. Here, an unbiased metagenomic next generation sequencing (mNGS) approach was used for molecular characterization of HCV infections (n = 99) from Israel, which yielded full-length HCV sequences in 89% of samples, with 7 partial sequences sufficient for classification. HCV genotypes were primarily 1b (68%) and 1a (19%), with minor representation of genotypes 2c (1%) and 3a (8%). HBV/HDV coinfections were characterized by suppressed HBV viral loads, resulting in sparse mNGS coverage. A probe-based enrichment approach (xGen) designed to increase HBV and HDV coverage was validated on a panel spanning diverse genotypes, geographies and titers. The method extended HBV genome coverage by a median of 61% (range 8–84%) and increased reads and sequence depth for both viruses by orders of magnitude. When HBV-xGen was applied to Israeli samples, coverage improved by 28–73% in 4 samples, identifying HBV genotype A1, A2 and D1 specimens and a dual B/D infection. Abundant HDV reads in mNGS libraries yielded 18/26 (69%) full genomes and 8 partial sequences; HDV-xGen provided only minimal extension (3–11%) of these genomes, all of which were genotype 1. Advanced molecular approaches coupled to virus-specific capture probes promise to enhance surveillance of viral infections and aid in monitoring the spread of local subtypes.

Introduction

Viral hepatitis represents a significant global health burden, particularly as many cases progress to cirrhosis and liver cancer, which can be fatal. Viral surveillance is essential to understand prevalence and determine appropriate public health measures. In particular, hepatitis B virus (HBV), hepatitis C virus (HCV), and hepatitis delta virus (HDV) are major burdens on human health worldwide that must be monitored. Each of these viruses comprises numerous genotypes and sub-genotypes, and the resulting genetic diversity carries an inherent potential to evade detection by diagnostic tests [1–3]. While generation of partial sequences for a given viral genome by Sanger sequencing has provided the classical means of surveillance, this approach has several drawbacks. First, sub-genomic sequences can underestimate true diversity in a population that may contain recombinant strains. Second, focused sequencing of one region may not adequately inform diagnostic assay development that targets un-sequenced regions of the virus. Third, this method requires the design of amplification primers that may not work for all genotypes. These issues can be avoided by pursuing complete genome sequencing.

The application of next-generation sequencing (NGS) to obtain full genomes is an invaluable epidemiological tool for tracking where strains have traveled, identifying transmission networks, spotting an outbreak, and monitoring for mismatches in diagnostics [4–8]. Unbiased metagenomics using random priming permits any pathogen to be detected in patient specimens, including viruses, bacteria, parasites, and fungi [9, 10]. However, abundant host background reads can obscure the presence of many of these agents, and these methods are challenged by small, diverse, low copy, and/or highly structured hepatitis viruses, such as HBV and HDV. Target enrichment offers an opportunity to significantly boost sensitivity, resulting in improved coverage and higher confidence data [11]. Single stranded DNA probes can be hybridized to reads within metagenomic NGS (mNGS) libraries to selectively capture and amplify viral sequences, which is particularly useful for samples with low viral loads [12, 13]. A post-library capture step (e.g. xGen) has been successfully deployed for blood borne RNA viruses with genomes of roughly 10 kb, like HIV and HCV, although it has not yet been evaluated for small viruses like HBV (3.2 kb) and HDV (1.6 kb) [13, 14].

Molecular surveillance of hepatitis viruses is particularly important in Israel, where immigration and travel rates are high. The countrywide prevalence of chronic HBV (HBsAg+) is estimated at 1.75%, with HDV co-infection at 6.5–7.1% [15, 16]. In general, the eastern Mediterranean region has the highest levels of HCV at nearly 2.5%, but large studies specific to Israel put country prevalence at 0.5–0.9%, largely due to eastern European and Russian immigrants from the former Soviet Union [16–19]. While the incidence of newly diagnosed cases has been declining and most infections are genotype 1b, Israel is home to numerous immigrant populations with the capacity to import new strains of HCV [19, 20]. To date, circulating strains for all three viruses have largely been determined by sub-genomic sequencing [15, 21]. We therefore applied metagenomic sequencing and two target enrichment NGS techniques (xGen and Pan-viral probes) to study epidemiological trends among HCV-infected and HBV/HDV co-infected individuals in Israel.

Materials and methods


Patient plasma was collected from individuals seeking treatment at the Israeli National HIV and Viral Hepatitis Reference Center (NHRL). Plasma samples were remnants from patients referred to the laboratory for HDV viral load measurements (HBV-HDV co-infections) or for HCV resistance-associated substitution (RAS) analysis. HDV positivity was defined by real-time PCR [15]. All specimens were de-identified and IRB approval was granted for mNGS. Patients were exempted from signing a consent form by the local IRB (approval numbers for HCV and HDV are 9329-12-SMC and 2890-15-SMC, respectively). Specimens in the HBV genotype panel were purchased from Boca Biolistics (Pompano Beach, FL) or collected from volunteer blood donors in Spain and Cameroon (HBV/HDV co-infected) to demonstrate probe efficacy on diverse strains. Spanish samples were selected from the Biobank of the Catalonia Blood Bank and de-identified. Participants were recruited in Barcelona and provided written, informed consent. IRB approval was obtained from the Vall d’Hebron Hospital Ethics Committee. Cameroonian samples were from two HBV surveillance studies conducted from 2010–2016 in which participants were recruited from blood bank donors, hospitals, and chest clinics in the urban centers of Douala and Yaoundé. Written informed consent was provided and plasma was collected anonymously. Studies were approved by the Ministry of Health of Cameroon, the Cameroon National Ethical Review Board, and the Faculty of Medicine and Biomedical Science IRB. Israeli HDV samples were collected from October 2014 to July 2017 and HCV samples from June 2015 to July 2017. Only direct-acting antiviral (DAA)-naïve, HCV RNA-positive patients and HBV-positive patients with detectable HDV RNA were recruited into the study. Samples were randomly collected from these patients and can be considered representative of a larger HCV-positive or HBV/HDV dually positive population. Relevant demographic details are included in S1 Table.

Negative controls were normal human plasma (NHP), and positive controls consisted of purified stocks or infected plasma in which viruses or bacteria were diluted into NHP to log 4.0 copies/ml. Parvovirus B19, HHV-5, VZV, Influenza A, Adenovirus 7, and Chlamydia stocks were from Exact Diagnostics (Dallas, TX); HIV, HBV, and HCV originated from samples sourced in Cameroon and Spain.

Viral loads

HDV and HBV viral loads (22 samples) were determined by quantitative PCR in Israel [15]. HCV and HBV viral loads were approximated at Abbott using a semi-quantitative multiplex PCR. This research-use assay simultaneously detects HBV, HCV, HIV-1, and HIV-2. Quantitation was extrapolated from relative Ct values of diluted standards.
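
Extrapolating quantitation from the Ct values of diluted standards is conventionally done by fitting a linear standard curve (Ct against log10 input copies) and inverting it for each unknown sample. The Python sketch below assumes that model; the function names and dilution values are hypothetical, and the actual calibration procedure of the research-use multiplex assay is not described here.

```python
# Hypothetical helper names; an illustrative standard-curve sketch only.

def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies/ml) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_log10_copies(ct, slope, intercept):
    """Invert the standard curve to estimate log10 copies/ml for a sample Ct."""
    return (ct - intercept) / slope


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling (slope near -3.32)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical 10-fold dilution series of a standard (log10 copies/ml and observed Ct).
standards_log10 = [2.0, 3.0, 4.0, 5.0, 6.0]
standards_ct = [33.4, 30.1, 26.8, 23.4, 20.1]
slope, intercept = fit_standard_curve(standards_log10, standards_ct)
sample_log10 = estimate_log10_copies(28.0, slope, intercept)
```

Because the estimate is an extrapolation from a handful of standards, values outside the calibrated range are only approximate, which is consistent with describing such an assay as semi-quantitative.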

Specimen pretreatment and extraction

Plasma specimens were pre-treated with Ultra-pure benzonase (Sigma, St. Louis MO) for 3 h at 37°C and extracted on an m2000sp (Abbott Laboratories, Des Plaines IL) using the RNA/DNA protocol (500 μl input/50 μl eluate).

mNGS library production

Metagenomic libraries (mNGS) were prepared and quantified essentially as described [13]. Briefly, total nucleic acid was concentrated to 10 μl with RNA Clean and Concentrator-5 spin columns (Zymo Research, CA), and RNA was reverse transcribed with random primers using Superscript III (SSRTIII) 1st Strand reagents (Life Technologies), followed by 2nd strand synthesis with Sequenase V2.0 T7 DNA pol (Affymetrix). Double stranded DNA/cDNA was recovered with DNA Clean and Concentrator-5 spin columns (Zymo Research) and barcoded with Nextera XT indices lacking 5’ biotin tags using 24 cycles of amplification (IDT, Coralville IA; Illumina, Carlsbad CA). Nextera libraries were purified with Agencourt AMPure XP beads (Beckman Coulter) and quantified on a 2200 TapeStation (Agilent) and a Qubit fluorometer (Life Technologies).

Design of HBV and HDV xGen probe sets

Probe sets were designed essentially as described previously for HIV, with each probe 120 nt in length [13]. Briefly, 60 complete HBV genomes spanning genotypes A-I were aligned in BioEdit. A single consensus (3223 nt) was extracted, with degenerate bases replaced by specific nucleotides, and an initial 53 probes at 2X coverage (i.e. 60 nt overlap) were designed from this sequence. The alignment was then surveyed in 120 nt windows to identify regions with <80% identity, and genotype-specific fragments 120, 239, or 257 nt in length were included for these regions. An additional 25 probes with 1X coverage (i.e. 1 nt overlap) were designed, for a total of 78 HBV probes. For HDV, sequence diversity is much greater, requiring probes designed from separate consensus genomes of genotypes 1–8. Genomes ranged from 1293 to 1693 nt in length, resulting in 109 probes at approximately 1X coverage. An additional 28 HDV probes in diverse regions were included, for a total of 137. HBV and HDV probe sets were combined into one reagent (215 probes) for hybridization.
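
The tiling arithmetic can be sketched in Python. The sketch assumes a simple scheme in which probes start every 60 nt (2X) or every 120 nt (1X) along the consensus, with one extra probe anchored at the 3’ end; applied to a 3223 nt consensus, the 2X scheme yields 53 probes, matching the count above, although IDT’s actual design procedure may differ. The window scan flags 120-column windows in which any aligned genome falls below 80% identity to the consensus. All function names are hypothetical.

```python
# Hypothetical helper names; an illustrative probe-tiling sketch only.

def tile_probes(consensus, probe_len=120, step=60):
    """Return (start, sequence) probe tiles across a consensus.

    step=60 gives ~2X tiling (adjacent probes overlap by 60 nt);
    step=120 gives ~1X tiling. A final probe is anchored at the 3' end
    so the last bases are always covered.
    """
    if len(consensus) <= probe_len:
        return [(0, consensus)]
    last_start = len(consensus) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, consensus[s:s + probe_len]) for s in starts]


def window_identity(seq, ref, start, width=120):
    """Fraction of identical aligned columns in [start, start + width), skipping gap-gap columns."""
    matches = compared = 0
    for a, b in zip(seq[start:start + width], ref[start:start + width]):
        if a == "-" and b == "-":
            continue
        compared += 1
        matches += a == b
    return matches / compared if compared else 1.0


def low_identity_windows(alignment, consensus, width=120, threshold=0.80):
    """List (window_start, genome_index) pairs where a genome is < threshold identical to consensus."""
    flagged = []
    for start in range(0, len(consensus) - width + 1, width):
        for i, seq in enumerate(alignment):
            if window_identity(seq, consensus, start, width) < threshold:
                flagged.append((start, i))
    return flagged
```

Windows returned by `low_identity_windows` mark where a single consensus-derived probe may hybridize poorly, which is where additional genotype-specific probes would be placed.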

xGen reagent synthesis and protocol

120 nt probe stocks (3 pmol/probe) modified with a 5’ biotin tag, Nextera barcoding primers lacking a biotin label, and blocking oligos complementary to Nextera Set A i5 and i7 index primers (1 μl/rxn) were all synthesized at IDT. Hybridizations, capture by streptavidin beads, washes and library amplification were essentially as described [13]. Here, after the initial 12 cycles of amplification and elution off streptavidin beads, a repeat KAPA amplification of 10 cycles was performed and libraries were visualized on a 2200 TapeStation and quantified with a Qubit fluorometer using the dsDNA high-sensitivity kit.

Pan-viral enrichment

SSRTIII-Nextera (mNGS) libraries from the HBV diversity panel were pooled for enrichment with the commercially available Pan-viral probe set (>600,000 probes). Hybridization and amplification were performed according to the manufacturer’s instructions (Twist Biosciences, San Francisco, CA). A complete description of the procedure is included in the S1 Appendix. Note that after library amplification on streptavidin beads and DNA purification, a second 15-cycle PCR ‘off the beads’ was performed.

Next generation sequencing and analysis

HCV dual-barcoded libraries were multiplexed according to viral load and either sequenced on a single HiSeq run (n = 72) or batched into 4 MiSeq runs of 7 (n = 28) to achieve sufficient read depth. HBV and HDV mNGS libraries (n = 26) were divided over 4 MiSeq runs. xGen libraries were pooled together for a single, separate run, since they share the same barcodes as the mNGS libraries. NGS data analysis was performed as described with CLC Genomics Workbench 9.0 software (CLC bio/Qiagen, Aarhus, Denmark) and SURPI [22, 23]. Raw data were initially mapped to multiple reference sequences to determine the genotype with the greatest identity. An iterative approach was used to derive the final sequence, in which the initial consensus served as the reference for remapping to refine the consensus. To detect possible contaminating reads from barcode hopping, raw data from each sample were individually mapped to the consensus sequences of other samples sequenced on the same run, and any reads with ≥99% identity were removed. Unmapped reads (i.e. those unique to the sample of interest) were collected and realigned to generate the final consensus.
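
The cross-sample filter can be illustrated with a deliberately simplified Python sketch: each read is compared, on both strands, against every other sample’s consensus using ungapped placements only, and reads reaching ≥99% identity are set aside. The workflow above used read mapping in CLC Genomics Workbench, which handles gaps, so this is an assumption-laden illustration rather than the pipeline itself; all names are hypothetical.

```python
# Hypothetical helper names; simplified (ungapped) barcode-hopping filter sketch.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_ungapped_identity(read, ref):
    """Highest identity of read over any full-length, ungapped placement on ref (0.0 if none fits)."""
    best = 0.0
    for offset in range(len(ref) - len(read) + 1):
        window = ref[offset:offset + len(read)]
        matches = sum(1 for a, b in zip(read, window) if a == b)
        best = max(best, matches / len(read))
    return best


def remove_cross_sample_reads(reads, other_consensuses, threshold=0.99):
    """Split reads into (kept, flagged); flagged reads match another sample's consensus >= threshold."""
    kept, flagged = [], []
    for read in reads:
        candidates = (read, reverse_complement(read))
        hit = any(
            best_ungapped_identity(r, cons) >= threshold
            for r in candidates
            for cons in other_consensuses
        )
        (flagged if hit else kept).append(read)
    return kept, flagged
```

The reads returned in `kept` correspond to those unique to the sample of interest, which would then be realigned to build the final consensus.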

Phylogenetic analysis

Multiple sequence alignments were performed in MegAlign Pro (Lasergene, DNASTAR Inc., Madison WI) using the MUSCLE algorithm and manually edited in BioEdit Sequence Alignment Editor (v 7.2.5) [24, 25]. Neighbor-Joining phylogenetic inference was performed using PHYLIP (version 3.5c; J. Felsenstein, University of Washington, Seattle, WA). Evolutionary distances were estimated with Dnadist (Kimura two-parameter method) and phylogenetic relationships were determined by Neighbor (neighbor-joining method). Branch reproducibility of trees was evaluated using Seqboot (100 replicates) and Consense. Trees were visualized using FigTree (version 1.4.2; A. Rambaut, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh). Sequences basal to a genotype branch were examined for recombination breakpoints with SimPlot (version 3.5.1; S. Ray, Johns Hopkins University, Baltimore, MD).
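
For reference, the Kimura two-parameter distance separates the proportion of transitions (P) from transversions (Q) and corrects for multiple substitutions as d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The Python sketch below computes it for a pair of aligned sequences and draws bootstrap replicates by resampling alignment columns with replacement, the procedure Seqboot applies. It illustrates the methods named above and is not PHYLIP itself; skipping gapped or ambiguous columns is an assumption that may differ from Dnadist’s handling, and the function names are hypothetical.

```python
# Hypothetical helper names; illustrative K2P distance and bootstrap sketch.
import math
import random

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Columns with a gap or ambiguous base in either sequence are skipped.
    P = proportion of transitions (A<->G, C<->T), Q = proportion of transversions.
    """
    valid = PURINES | PYRIMIDINES
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def bootstrap_replicate(alignment, rng):
    """One bootstrap pseudo-alignment: sample columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


# Usage: 100 replicates of a toy alignment, matching the replicate count used above.
rng = random.Random(42)
toy_alignment = ["ACGTACGTAC", "ACGTGCGTAC", "ACATACGTTC"]
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(100)]
```

Each replicate would be run through the distance and tree-building steps, and the fraction of replicate trees containing a given clade gives its bootstrap support.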

Nucleotide sequence accession numbers

Full-length sequences were submitted to GenBank under the following accessions: HCV (MT632105-MT632194), HBV (MT622522-MT622525), and HDV (MT583788-MT583813).

Results

Full genome coverage of HCV without enrichment

A total of 99 HCV RNA-positive specimens were collected at the NHRL. Semi-quantitative PCR of libraries indicated an average viral load of log 5.4 copies/ml, consistent with the previously observed trend of most untreated HCV patients having viral loads >4 log IU/ml [26]. A median of 8.3 million reads was obtained per library (S2 Table). Full-length HCV sequences (≥97% coverage) were obtained for 82 individuals, and ≥90% of the genome was determined for 89/99 samples (Table 1). Of the remaining 10 samples, five had 62–86% coverage despite viral loads ≥ log 5, whereas the other 5, with <50% coverage, had either a viral load < log 5 or produced very few total mNGS reads. The number of reads and the coverage depth provide an indication of confidence in the consensus sequences. HCV reads were normalized to total reads and expressed as reads per million (rpm). The majority (93%) of mNGS libraries had ≥100 HCV rpm, with most (63%) having ≥1000 HCV rpm (Table 1). A median of 11,466 HCV reads was obtained per library. Given the high coverage obtained for HCV specimens, no further enrichment was necessary.
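
The two library metrics used here are simple to compute: reads per million (rpm) divides target reads by total reads and scales by 10^6, and breadth of coverage is the fraction of genome positions covered at or above a chosen depth. The Python sketch below assumes reads have already been mapped and are represented as half-open (start, end) intervals on the reference; the names and counts are illustrative only.

```python
# Hypothetical helper names; illustrative rpm and coverage sketch.

def reads_per_million(target_reads, total_reads):
    """Normalize target-read counts to library size (reads per million, rpm)."""
    return target_reads / total_reads * 1_000_000


def depth_profile(genome_length, intervals):
    """Per-position depth from half-open (start, end) read intervals, using a difference array."""
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        start, end = max(start, 0), min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def breadth_of_coverage(depth, min_depth=1):
    """Fraction of positions with depth >= min_depth."""
    return sum(1 for d in depth if d >= min_depth) / len(depth)


# Hypothetical library: 12,000 target reads out of 8,000,000 total reads.
rpm = reads_per_million(12_000, 8_000_000)
```

Raising `min_depth` trades breadth for confidence, which is why coverage and depth are reported together as indicators of consensus quality.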

Full-length consensus sequences were added to a complete alignment of HCV references and analyzed in neighbor-joining phylogenetic trees (Fig 1). HCV genotypes were primarily 1b (68%) and 1a (19%), with minor representation of genotypes 2c (1%) and 3a (8%) (Table 1). Branch lengths indicated that all Israeli strains (red) were unique; nevertheless, those branching together with high bootstrap values were aligned pairwise to rule out cross-contamination. These sequences shared only 91–94% identity and none shared a barcode. The basal branching pattern of a genotype 1b strain (sample 25–2000618) was evaluated in SimPlot but did not show evidence of recombination. Genotype 3a sequences had the highest identity (≤95%) with strains originating in Western Europe. These strains generally came from male injection drug users with a median age of 49 ± 7 years who were not from any specific country. For the 7 partial genomes (20–90% coverage), separate trees were generated from gap-stripped alignments, and sufficient sequence remained to allow classification.
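
SimPlot-style recombination screening compares a query to candidate references in a sliding window and looks for positions where the best-matching reference changes. The Python sketch below uses plain percent identity on a pre-computed alignment (SimPlot itself also offers model-corrected distances), with hypothetical window and step sizes and hypothetical function names; it illustrates the idea rather than reproducing SimPlot.

```python
# Hypothetical helper names; illustrative sliding-window similarity sketch.

def sliding_similarity(query, references, window=200, step=20):
    """Return [(window_midpoint, {ref_name: identity})] for an aligned query.

    All sequences must be aligned to the same length; columns with a gap in
    either the query or the reference are excluded from that comparison.
    """
    profile = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        scores = {}
        for name, ref in references.items():
            pairs = [(a, b) for a, b in zip(q, ref[start:start + window]) if a != "-" and b != "-"]
            scores[name] = sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0
        profile.append((start + window // 2, scores))
    return profile


def candidate_breakpoints(profile):
    """Window midpoints at which the best-matching reference changes."""
    switches, previous = [], None
    for midpoint, scores in profile:
        best = max(scores, key=scores.get)
        if previous is not None and best != previous:
            switches.append(midpoint)
        previous = best
    return switches
```

A switch marks only a candidate breakpoint; ties and noisy windows can also produce switches, so candidates still need confirmation, for example by building separate trees for the segments on either side.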

Fig 1. The majority of Israeli strains are Genotype 1b.

Neighbor-joining phylogenetic tree of full-length HCV sequences. New Israeli sequence branches are in red and reference strains are in black labeled with accession numbers. Bootstrap values of nodes >70 are shown.

mNGS output for all 99 original samples was analyzed with the SURPI metagenomics pipeline to probe for additional viruses, such as HPgV-2, a recently recognized co-infection of HCV [23, 27]. A variety of viruses (e.g. HIV, HBV, VZV, Influenza, adenovirus) spiked into normal human plasma, each at log 4.0 copies/ml, served as a positive control, and all were detected. Coverage and reads per million values were similar for the three positive controls included in the separate sequencing runs of HCV libraries (S1 Fig). Human pegivirus-1 (GBV-C) reads were enriched in 5 individuals. Apart from HCV and GBV-C, no additional blood borne agents or viruses were found in any of the 99 mNGS patient libraries.

Target enrichment of HBV and HDV

HDV antibody positive HBV/HDV specimens with detectable HDV RNA by quantitative PCR were prepared in the same manner as HCV for mNGS. Preliminary viral loads measured independently in Israel and at Abbott indicated that levels of co-infecting HBV were very low, with only a minority of the 26 samples registering <33 Ct (1–3 log copies/ml) and most without detectable levels of DNA [15]. Consequently, HBV coverage by mNGS was sparse, if not completely absent (Table 2). HDV by contrast had higher viral loads (4–7 log copies/ml) and was readily detected in 100% of samples, with coverages ranging from 20–100%.

To improve target identification, 5’ biotin-tagged xGen probes each 120 nt in length were designed to include the entire spectrum of HBV (genotypes A-I) and HDV (genotypes 1–8) genetic diversity. Our present samples only required enrichment for HBV, but given that probes to other strains or viruses do not interfere with each other, we combined these into one probe set (n = 215) for future use (Fig 2A). The HBV-HDV xGen probe methodology was first validated on 12 unrelated mono- and co-infected samples from a variety of countries that are known to include a diverse array of genotypes. Since xGen libraries derive from mNGS libraries and thus have the same barcodes, they were sequenced on different runs to be able to discriminate the source of reads. Coverage plots of an HBV genotype F1 (1007-HBV-0036) from Peru with a viral load of 3.6 log copies/ml demonstrate the dramatic improvement with probe enrichment (Fig 2B, left). Only 6% coverage at 1X depth was obtained by mNGS, which increased to 90% at 27X depth with xGen. A modest improvement was observed for an HDV genotype 1 with a 4.59 log cp/ml viral load (U160953A), increasing from 90% to 94% coverage, but primarily with an increase in depth (Fig 2B, right). Note that in this sample, HBV (genotype B2) present at 3.24 log IU/ml increased from 8% coverage with mNGS to 84% with xGen (Fig 2C, Table 3).

Fig 2. HBV/HDV-xGen greatly enhances sensitivity.

(A) HBV/HDV xGen probes tile HBV genotypes A-I with 2X coverage and HDV genotypes 1–8 with 1X coverage. (B) Representative genome coverage plots of NGS data for HBV (left) and HDV (right) strains from Cameroon. For HBV, mNGS reads are shown in orange and xGen reads in blue. For HDV, mNGS reads are shown in green and xGen reads in purple. (C) Histograms of coverage (top) and reads/million (bottom) on co-infections or HBV-only infections. Country, genotype, and viral load are listed beneath each plot with the same color scheme as in Fig 2B.

The boost in sensitivity over a range of viral loads and genetic diversity is summarized for each virus by comparing genome coverage (top) and reads per million (bottom) with and without xGen enrichment (Fig 2C, Table 3). While samples with very low viral loads did see some increase in both metrics, those with titers ≥3.5 log saw the most improvement. The median increase in HBV genome coverage was 61% (range 8–84%). The same Nextera mNGS libraries obtained from mono- and co-infected samples from a variety of countries were similarly captured and amplified with a “Pan-viral” probe set, which contains >600,000 probes against ~1000 human viruses but only includes probes tiling a single HBV (NC_003977.2: genotype D) and a single HDV (NC_001653.2: genotype 1) reference sequence. Genome coverage was comparable when titers were high; at lower titers, however, sequence diversity impaired HBV detection with the Pan-viral method, stressing the importance of including probes to multiple genotypes (Table 3). For example, both methods obtained 100% coverage of an HBV B2 strain with a titer of log 5.92 copies/ml, whereas for the B2 strain (log 3.24 copies/ml) whose coverage increased from 8% to 84% with xGen, no coverage (0%) was obtained with the Pan-viral method.

HBV and HDV cohort of patients from Israel

The HBV/HDV mNGS libraries from Israel were revisited with xGen selection. Given the particularly low HBV titers, most samples (n = 15) only obtained 2–20% genome coverage (60–600 nt of 3.2 kb), but this was still an increase from zero coverage with mNGS and indicated that HBV was indeed present (Table 2). Where HBV sequences had originally been obtained by mNGS, xGen enrichment improved overall coverage by 28–73%. For example, sample 2001149 originally had 8% HBV coverage, which was extended to 69% with enrichment (Fig 3A). Interestingly, this individual was dually infected with genotypes B2 and D1. Incomplete coverage and reliance on a consensus sequence created the appearance of a recombinant; however, overlapping sequences from both strains were detected, whereas contiguous reads spanning putative ‘recombination breakpoints’ were not (Fig 3A). Phylogenetic analysis of the other patients with >50% of the HBV genome sequenced revealed infections with genotypes A1, A2, and D (Table 2).
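
The distinction drawn above, between a true recombinant and a mixed infection, can be expressed as a read-level test: in a dual infection, individual reads match one genotype or the other, whereas a recombinant produces reads whose segments on either side of the breakpoint match different genotypes. The Python sketch below assumes reads have already been placed on a shared alignment coordinate and that aligned reference sequences for the two genotypes are available; the function names, the 0.02 margin and the toy sequences are hypothetical.

```python
# Hypothetical helper names; illustrative chimeric-read check, not the study's pipeline.

def identity(a, b):
    """Fraction of identical columns between two aligned strings, ignoring gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def classify_segment(segment, start, refs, margin=0.02):
    """Name of the best-matching reference for a read segment, or 'ambiguous' if too close to call."""
    scores = {name: identity(segment, ref[start:start + len(segment)]) for name, ref in refs.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return "ambiguous"
    return ranked[0][0]


def is_chimeric(read, start, refs, breakpoint):
    """True if a read spanning breakpoint has segments matching different references."""
    cut = breakpoint - start
    if cut <= 0 or cut >= len(read):
        return False  # the read does not span the putative breakpoint
    left = classify_segment(read[:cut], start, refs)
    right = classify_segment(read[cut:], breakpoint, refs)
    return "ambiguous" not in (left, right) and left != right
```

Under this framing, a dual infection predicts that reads spanning each putative breakpoint match a single genotype end to end, so `is_chimeric` returns False for all of them.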

Fig 3. Dual B/D infection and clustering of HDV genotype 1 strains in Israel.

(A) Coverage plot of HBV specimen, 2001149. mNGS reads are shown in orange and xGen reads in dark blue (upper panel). Genotype D1 reads are in gold and genotype B2 reads are in brown (lower panel). (B) Coverage plot of HDV specimen, 2001234. (C) Neighbor-joining phylogenetic tree of near full-length HDV sequences. New Israeli sequence branches are in red and reference strains are in black labeled with accession numbers. Bootstrap values of nodes >70 are shown.

HDV titers were much higher, and the abundance of viral reads in mNGS libraries was often sufficient to provide near full-length sequences without enrichment (Table 2) [15]. Sample 2001234 was representative of most cases, with HDV-xGen providing substantial boosts in sequence depth but extending genome coverage only minimally (3–11%) (Fig 3B). We obtained 12 full (>95%) and 14 partial (33–92%) genomes. HDV consensus genomes generated with and without enrichment were compared to one another, and all agreed with >96% identity unless the mNGS initially yielded partial sequences with low coverage (S3 Table). This indicates that the additional rounds of amplification required for xGen did not bias the final sequence [13, 28]. Twenty-one strains with >84% genome coverage were included in the phylogenetic analysis. Consistent with previous Sanger sequencing classifications based on the HDAg coding region, all strains were HDV genotype 1 (Fig 3C) [15]. Sample 2000742 from Ethiopia branched with HDV strains from Somalia (U81988.1) and Ethiopia (U81989.1). Notably, the HBV sequence from this same individual is genotype A1, indicating that both infections likely originated in Africa. Sample 2001222, from the only patient from Romania, branches with genotype 1 strains from a variety of geographies, but the rest of the Israeli samples cluster together on a genotype 1 branch lacking reference strains from other countries. The 2001149 and 2001063 strains with short branch lengths are from the same patient (blue) and share 99.28% identity.
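
The consensus comparison can be sketched in Python under simple assumptions: a majority-rule base call per position, an ambiguity code (N) where fewer than a minimum number of reads support a call, and identity computed only over positions called in both consensuses. The minimum depth of 3 and the function names are hypothetical, and the consensus rules applied in CLC Genomics Workbench may differ.

```python
# Hypothetical helper names; illustrative consensus calling and agreement sketch.
from collections import Counter


def majority_consensus(pileup, min_depth=3):
    """Call one base per position from pileup columns (lists of read bases).

    Positions with fewer than min_depth A/C/G/T observations are called 'N'.
    """
    calls = []
    for column in pileup:
        bases = [b for b in column if b in ("A", "C", "G", "T")]
        if len(bases) < min_depth:
            calls.append("N")
        else:
            calls.append(Counter(bases).most_common(1)[0][0])
    return "".join(calls)


def percent_identity(seq1, seq2):
    """Percent identity over aligned positions called (A/C/G/T) in both sequences."""
    called = {"A", "C", "G", "T"}
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in called and b in called]
    if not pairs:
        return None
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)
```

Masking poorly supported positions as N keeps them out of the comparison, so disagreement between an mNGS and an xGen consensus then reflects positions that were actually called in both.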

Discussion

Metagenomics is an extremely powerful approach that allows one to query for the presence of any pathogen in patient samples without prior knowledge of its sequence. Yet, due to its unbiased nature, mNGS often only scratches the surface for virus detection, especially in low viral load specimens. Here, the high viral loads typical of HCV, and also of HDV, allowed us to readily obtain full genomes using standard methodology, whereas the low titers of HBV required enrichment [26]. This cohort included 53 males and 43 females with an overall median age of 58. Most were likely exposed to HCV through injection drug use (IDU) or blood transfusion, while others had no identifiable risk factor. Detection of numerous viruses in our positive controls suggested that we would have detected other blood borne pathogens if present (S1 Fig), but no co-infections besides GBV-C were found among the HCV-positive individuals.

In Israel, the age-adjusted prevalence of HCV infection was recently estimated at 5 per thousand, with immigrants from Eastern Europe making up most patients [19]. Indeed, many immigrants to Israel are from countries with high HCV prevalence, such as Georgia, Turkmenistan, Moldova, Uzbekistan, Ukraine, Morocco, Romania, and Kazakhstan. The disparity in HCV infection rates between native-born Israelis (0.1%) and immigrants (5.7%) is significant and is further borne out in studies focusing on IDU populations [18, 29]. While the prevailing reported trend is that HCV genotype 1b predominates in Israel, other genotypes besides type 1 (70%) have been noted, including type 2 (8%), type 3 (20%) and type 4 (2%) [16, 19–21, 30]. The majority of HCV strains characterized here were genotype 1b and were found in individuals who immigrated from the former Soviet Union and neighboring Eastern European countries. Nearly half (10/22) of the native-born Israelis in the cohort had genotype 1a, with the remaining 9 genotype 1a strains coming from Western Europe and the Middle East/North Africa. Genotype 3a strains were primarily from middle-aged (49 years old) male IDUs from any country.

Sequence-independent (nuclease treatment, filtration, ultracentrifugation) and sequence-dependent (rRNA depletion, CRISPR-Cas9-mediated DASH) methods represent enrichment strategies intended to concentrate viral nucleic acid or lower host background [31, 32]. Target enrichment represents an alternative approach based on positive selection, which greatly enhances sensitivity for low viral load infections in clinical samples and consequently reduces the overall depth of sequencing required [12, 28, 33]. Comprehensive approaches illustrate this strategy: ViroCap tiles >185,000 sequences totaling ~200 Mb and includes probes against RefSeqs, near neighbors, and other viral databases, while VirCapSeq covers 207 viral taxa with nearly 2 million probes, even after clustering highly identical regions [34, 35]. These methods provide substantial boosts in coverage and depth, but the probe sets are prohibitively expensive to synthesize and use on a routine basis. A commercial, somewhat leaner probe set (~600,000 probes) covering ~1000 human viruses from Twist Biosciences now puts this technology within reach of more labs. The trade-off we observed is that while sensitivity for many viruses is substantially improved, performance is compromised for highly diverse viruses whose probes are designed against only one or a few representative strains (Table 3). When public health measures require full genome sequencing to halt outbreaks, observe transmission clusters, or track diagnostically relevant mismatches, mere identification of specific viruses like HIV and HCV is insufficient. We and others have shown that probes to all subtypes and groups (HIV) and genotypes (HCV) are required to reliably obtain full genomes [13, 14, 36].

We therefore tailored a specific probe set to ensure capture of all HBV (A-I) and HDV (1–8) genotypes. Boosts in coverage and sequence depth on a variety of HBV specimens spanning a range of viral loads validated this approach (Fig 2C). xGen did extend HBV coverage in some co-infected Israeli samples; however, at such low titers, only limited improvement could be expected. The suppression of HBV titers by HDV observed here is consistent with numerous reports [37, 38]. Where HBV sequences were obtained, these were genotypes A1, A2, D, and a B/D dual infection. HDV viral loads, on the other hand, were considerably higher than those of HBV, and enrichment was often not necessary. Our full genome sequencing confirmed previous sub-genomic sequencing of the HDAg coding region, in which genotype 1 was the only genotype detected [15]. Levels of co-infection in most countries are simply not known, and as we have seen in Cameroon, they can be far higher than expected [39]. The substantial prevalence of HDV (6.5%) among HBV carriers in Israel mandates HDV RNA testing for all coinfected patients.

mNGS has yielded a wealth of information and, at times, actionable results for patients [40–42]. However, owing to cost, turn-around time, validation, and reimbursement, among other issues, the arrival of mNGS in the clinic as a test to replace all other infectious disease diagnostics does not appear imminent [43]. Nevertheless, as we demonstrate here, it can play an important role as a research tool and a source of insight into epidemiologic trends. While hepatitis rates are declining, Israel is home to many immigrants and surveillance is still needed. The numerous applications for these and other viruses suggest that mNGS will play an even greater role in shaping public health policy in Israel and elsewhere.

Supporting information

S1 Fig. Viral detection in mNGS libraries.

A positive control consisting of 8 viruses and Chlamydia trachomatis, each spiked into normal human plasma at 4.0 log copies/mL, was included with samples in three separate extractions, library preps, and sequencing runs of HCV-positive samples. Reads were taxonomically assigned by the SURPI pipeline. The top histogram shows genome coverage and the bottom histograms show reads per million for each pathogen.
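
Reads per million (RPM) normalizes a pathogen's read count by total library size so that runs sequenced to different depths can be compared. The sketch below assumes per-taxon read counts have already been exported into a plain dictionary; that layout and the function names are hypothetical and do not reflect SURPI's actual output format. It also converts a log10 concentration such as the 4.0 log copies/mL spike-in level to linear units.

```python
# Reads-per-million (RPM) normalization for per-taxon read counts.
# Assumption: counts come from a taxonomic classifier, reshaped into a
# plain {taxon: read_count} dict; this layout is hypothetical.


def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Normalize a taxon's read count to reads per million total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if taxon_reads < 0:
        raise ValueError("taxon_reads must be non-negative")
    # Multiply first to avoid rounding a tiny intermediate fraction.
    return taxon_reads * 1_000_000 / total_reads


def rpm_table(counts: dict[str, int], total_reads: int) -> dict[str, float]:
    """Apply RPM normalization to every taxon in one library."""
    return {taxon: reads_per_million(n, total_reads) for taxon, n in counts.items()}


def log_copies_to_linear(log10_copies: float) -> float:
    """Convert a log10 concentration (e.g., 4.0 log copies/mL) to copies/mL."""
    return 10 ** log10_copies
```

For example, 250 reads assigned to one pathogen in a library of 5,000,000 total reads correspond to 50 RPM, and a spike-in of 4.0 log copies/mL is 10,000 copies/mL of each organism.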


S1 Table. Patient demographic data.

Demographic information for HCV- and HBV/HDV-positive patients enrolled in the study at the Israeli National HIV and Viral Hepatitis Reference Center (NHRL).


S2 Table. HCV mNGS library metrics.

Results for HCV-positive libraries, including total reads, HCV reads, percent genome coverage, genotype classification, and HCV reads per million.
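
Percent genome coverage is commonly computed as the share of reference positions covered by at least a minimum number of reads. The sketch below assumes alignments are available as 0-based, half-open reference intervals and uses a hypothetical default threshold of one read; the threshold actually applied for this table is not stated here, and the function names are illustrative.

```python
# Percent genome coverage from read alignment intervals (illustrative sketch).
# Assumptions: alignments are 0-based, half-open (start, end) reference
# intervals, and a position counts as covered at >= min_depth reads.


def depth_from_alignments(genome_len: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-position read depth via a difference array (O(genome + reads))."""
    diff = [0] * (genome_len + 1)
    for start, end in alignments:
        start, end = max(0, start), min(genome_len, end)  # clip to the reference
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depths, running = [], 0
    for i in range(genome_len):
        running += diff[i]  # prefix sum turns interval edges into depths
        depths.append(running)
    return depths


def percent_coverage(depths: list[int], min_depth: int = 1) -> float:
    """Percentage of reference positions with depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)
```

The difference array records only where each read starts and ends, so depth for the whole genome is recovered with a single prefix sum instead of incrementing every base of every read.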


S3 Table. HDV sequence agreement.

Pairwise nucleotide identity values (%) comparing HDV consensus sequences obtained by mNGS versus xGen enrichment.
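
Pairwise nucleotide identity is the percentage of matching bases between two aligned sequences, but tools differ in how they treat gaps and ambiguous bases. The sketch below assumes the two consensus sequences have already been aligned to equal length (for example with MUSCLE) and skips any column with a gap or an N in either sequence; this convention and the function name are assumptions, so values may differ from those produced by other software.

```python
# Pairwise nucleotide identity between two pre-aligned consensus sequences.
# Assumptions: sequences are already aligned to equal length; columns with
# a gap or an ambiguous N in either sequence are skipped, not penalized.


def pairwise_identity(aln_a: str, aln_b: str, skip: str = "-N") -> float:
    """Percent identity over columns where neither sequence has a skipped symbol."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    skip = skip.upper()
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a in skip or b in skip:
            continue  # gap or ambiguous base: not counted either way
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

For example, the aligned pair ACGT-A and ACGTTC has five comparable columns (the gap column is skipped), four of which match, giving 80% identity. Counting gaps as mismatches instead would lower that value, which is why the convention should be stated alongside reported identities.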



We thank Guixia Yu, Scot Federman, and Dr. Charles Chiu at University of California San Francisco for sequencing and data processing of HCV samples. We thank Dr. Matthew Frankel for the HBV genotype panel specimens. We thank Dave Campbell and Dr. Nicholas Downey at IDT for assistance with xGen probe design, and John Robichaud and Mark Consugar at Twist Biosciences for assistance with the Pan-viral protocol.


  1. Alavian SM, Carman WF, Jazayeri SM. HBsAg variants: diagnostic-escape and diagnostic dilemma. J Clin Virol. 2013;57(3):201–8. pmid:22789139
  2. Chevaliez S, Bouvier-Alias M, Castera L, Pawlotsky JM. The Cobas AmpliPrep-Cobas TaqMan real-time polymerase chain reaction assay fails to detect hepatitis C virus RNA in highly viremic genotype 4 clinical samples. Hepatology. 2009;49(4):1397–8. pmid:19330876
  3. Watanabe T, Inoue T, Tanoue Y, Maekawa H, Hamada-Tsutsumi S, Yoshiba S, et al. Hepatitis C virus genotype 2 may not be detected by the Cobas AmpliPrep/Cobas TaqMan HCV Test, Version 1.0. J Clin Microbiol. 2013;51(12):4275–6. pmid:24068011
  4. Brennan CA, Bodelle P, Coffey R, Harris B, Holzmayer V, Luk KC, et al. HIV global surveillance: foundation for retroviral discovery and assay development. J Med Virol. 2006;78 Suppl 1:S24–9.
  5. Casto AM, Adler AL, Makhsous N, Crawford K, Qin X, Kuypers JM, et al. Prospective, Real-time Metagenomic Sequencing During Norovirus Outbreak Reveals Discrete Transmission Clusters. Clin Infect Dis. 2019;69(6):941–8. pmid:30576430
  6. Greninger AL, Naccache SN, Federman S, Yu G, Mbala P, Bres V, et al. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome Med. 2015;7:99. pmid:26416663
  7. Greninger AL, Naccache SN, Messacar K, Clayton A, Yu G, Somasekar S, et al. A novel outbreak enterovirus D68 strain associated with acute flaccid myelitis cases in the USA (2012–14): a retrospective cohort study. Lancet Infect Dis. 2015;15(6):671–82. pmid:25837569
  8. Theze J, Li T, du Plessis L, Bouquet J, Kraemer MUG, Somasekar S, et al. Genomic Epidemiology Reconstructs the Introduction and Spread of Zika Virus in Central America and Mexico. Cell Host Microbe. 2018;23(6):855–64 e7. pmid:29805095
  9. Chiu CY. Viral pathogen discovery. Curr Opin Microbiol. 2013;16(4):468–78. pmid:23725672
  10. Firth C, Lipkin WI. The genomics of emerging pathogens. Annu Rev Genomics Hum Genet. 2013;14:281–300. pmid:24003855
  11. Kumar A, Murthy S, Kapoor A. Evolution of selective-sequencing approaches for virus discovery and virome analysis. Virus Res. 2017;239:172–9. pmid:28583442
  12. Depledge DP, Palser AL, Watson SJ, Lai IY, Gray ER, Grant P, et al. Specific capture and whole-genome sequencing of viruses from clinical samples. PLoS One. 2011;6(11):e27805. pmid:22125625
  13. Yamaguchi J, Olivo A, Laeyendecker O, Forberg K, Ndembi N, Mbanya D, et al. Universal Target Capture of HIV Sequences From NGS Libraries. Front Microbiol. 2018;9:2150. pmid:30271393
  14. Bonsall D, Ansari MA, Ip C, Trebes A, Brown A, Klenerman P, et al. ve-SEQ: Robust, unbiased enrichment for streamlined detection and whole-genome sequencing of HCV and other highly diverse pathogens. F1000Res. 2015;4:1062. pmid:27092241
  15. Shirazi R, Ram D, Rakovsky A, Bucris E, Gozlan Y, Lustig Y, et al. Characterization of hepatitis B and delta coinfection in Israel. BMC Infect Dis. 2018;18(1):97. pmid:29486716
  16. Zuckerman E. RHS, Rennert G. HBV and HCV Epidemiology In Israel. Carmel Medical Center, Technion; 2013.
  17. Bar-Shany S, Green MS, Slepon R, Shinar E. Ethnic differences in the prevalence of anti-hepatitis C antibodies and hepatitis B surface antigen in Israeli blood donors by age, sex, country of birth and origin. J Viral Hepat. 1995;2(3):139–44. pmid:7493308
  18. Sermoneta-Gertel S, Donchin M, Adler R, Baras M, Perlstein T, Manny N, et al. Hepatitis C virus infection in employees of a large university hospital in Israel. Infect Control Hosp Epidemiol. 2001;22(12):754–61. pmid:11876453
  19. Weil C, Nwankwo C, Friedman M, Kenet G, Chodick G, Shalev V. Epidemiology of hepatitis C virus infection in a large Israeli health maintenance organization. J Med Virol. 2016;88(6):1044–50. pmid:26538137
  20. Kartashev V, Doring M, Nieto L, Coletta E, Kaiser R, Sierra S, et al. New findings in HCV genotype distribution in selected West European, Russian and Israeli regions. J Clin Virol. 2016;81:82–9. pmid:27367545
  21. Gozlan Y, Bucris E, Shirazi R, Rakovsky A, Ben-Ari Z, Davidov Y, et al. High frequency of multiclass HCV resistance-associated mutations in patients failing direct-acting antivirals: real-life data. Antivir Ther. 2019;24(3):221–8. pmid:30880684
  22. Berg MG, Yamaguchi J, Alessandri-Gradt E, Tell RW, Plantier JC, Brennan CA. A Pan-HIV Strategy for Complete Genome Sequencing. J Clin Microbiol. 2016;54(4):868–82. pmid:26699702
  23. Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, et al. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 2014;24(7):1180–92. pmid:24899342
  24. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7. pmid:15034147
  25. Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser. 1999;41:95–8.
  26. Freiman JM, Wang J, Easterbrook PJ, Horsburgh CR, Marinucci F, White LF, et al. Deriving the optimal limit of detection for an HCV point-of-care test for viraemic infection: Analysis of a global dataset. J Hepatol. 2019;71(1):62–70. pmid:30797050
  27. Berg MG, Lee D, Coller K, Frankel M, Aronsohn A, Cheng K, Forberg K, et al. Discovery of a Novel Human Pegivirus in Blood Associated with Hepatitis C Virus Co-Infection. PLoS Pathog. 2015;11(12):e1005325. pmid:26658760
  28. Forberg K, Rodgers MA, Dawson GJ, Sauleda S, Olivo A, Vallari A, et al. Human pegivirus 2 exhibits minimal geographic and temporal genetic diversity. Virology. 2019;539:69–79. pmid:31689572
  29. Loebstein R, Mahagna R, Maor Y, Kurnik D, Elbaz E, Halkin H, et al. Hepatitis C, B, and human immunodeficiency virus infections in illicit drug users in Israel: prevalence and risk factors. Isr Med Assoc J. 2008;10(11):775–8. pmid:19070285
  30. Gozlan Y, Ben-Ari Z, Moscona R, Shirazi R, Rakovsky A, Kabat A, et al. HCV genotype-1 subtypes and resistance-associated substitutions in drug-naive and in direct-acting antiviral treatment failure patients. Antivir Ther. 2017;22(5):431–41. pmid:28067632
  31. Conceicao-Neto N, Zeller M, Lefrere H, De Bruyn P, Beller L, Deboutte W, et al. Modular approach to customise sample preparation procedures for viral metagenomics: a reproducible protocol for virome analysis. Sci Rep. 2015;5:16532. pmid:26559140
  32. Gu W, Crawford ED, O'Donovan BD, Wilson MR, Chow ED, Retallack H, et al. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications. Genome Biol. 2016;17:41. pmid:26944702
  33. O'Flaherty BM, Li Y, Tao Y, Paden CR, Queen K, Zhang J, et al. Comprehensive viral enrichment enables sensitive respiratory virus genomic identification and analysis by next generation sequencing. Genome Res. 2018;28(6):869–77. pmid:29703817
  34. Briese T, Kapoor A, Mishra N, Jain K, Kumar A, Jabado OJ, et al. Virome Capture Sequencing Enables Sensitive Viral Diagnosis and Comprehensive Virome Analysis. mBio. 2015;6(5):e01491–15. pmid:26396248
  35. Wylie TN, Wylie KM, Herter BN, Storch GA. Enhanced virome sequencing using targeted sequence capture. Genome Res. 2015;25(12):1910–20. pmid:26395152
  36. Yamaguchi J, McArthur C, Vallari A, Sthreshley L, Cloherty GA, Berg MG, et al. Complete genome sequence of CG-0018a-01 establishes HIV-1 subtype L. J Acquir Immune Defic Syndr. 2019.
  37. Pollicino T, Raffa G, Santantonio T, Gaeta GB, Iannello G, Alibrandi A, et al. Replicative and transcriptional activities of hepatitis B virus in patients coinfected with hepatitis B and hepatitis delta viruses. J Virol. 2011;85(1):432–9. pmid:20962099
  38. Shirvani-Dastgerdi E, Tacke F. Molecular interactions between hepatitis B virus and delta virus. World J Virol. 2015;4(2):36–41. pmid:25964870
  39. Butler EK, Rodgers MA, Coller KE, Barnaby D, Krilich E, Olivo A, et al. High prevalence of hepatitis delta virus in Cameroon. Sci Rep. 2018;8(1):11617. pmid:30072752
  40. Houldcroft CJ, Beale MA, Breuer J. Clinical and biological insights from viral genome sequencing. Nat Rev Microbiol. 2017;15(3):183–92. pmid:28090077
  41. Mongkolrattanothai K, Naccache SN, Bender JM, Samayoa E, Pham E, Yu G, et al. Neurobrucellosis: Unexpected Answer From Metagenomic Next-Generation Sequencing. J Pediatric Infect Dis Soc. 2017;6(4):393–8. pmid:28062553
  42. Wilson MR, Naccache SN, Samayoa E, Biagtan M, Bashir H, Yu G, et al. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N Engl J Med. 2014;370(25):2408–17. pmid:24896819
  43. Greninger AL. The challenge of diagnostic metagenomics. Expert Rev Mol Diagn. 2018;18(7):605–15. pmid:29898605