Conceived and designed the experiments: NLY PS-C MDS AB EH JLD. Performed the experiments: NLY MDS. Analyzed the data: NLY PS-C MDS. Contributed reagents/materials/analysis tools: AB PS-C EH JLD. Wrote the paper: NLY PS-C MDS EH JLD. Designed software used in analysis: PS-C.
The authors have declared that no competing interests exist.
Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the
Dengue virus infection is a global health concern, affecting as many as 100 million people annually worldwide. A critical first step to proper treatment and control of any virus infection is a correct diagnosis. Traditional diagnostic tests for viruses depend on amplification of conserved portions of the viral genome, detection of the binding of antibodies to viral proteins, or replication of the virus in cell cultures. These methods have a major shortcoming: they are unable to detect divergent or novel viruses for which
Viral infections pose a significant global health burden, especially in the developing world, where most infectious disease deaths occur in children and are commonly due to preventable or treatable agents. Effective diagnostic and surveillance tools are crucial for reducing disability-adjusted life years (DALYs) due to infectious agents and for bolstering elimination and treatment programs
Dengue virus (DENV) infection is the most common arthropod-borne viral disease of humans, with an estimated 50–100 million clinical infections occurring annually worldwide
Traditional viral detection methods, such as serology, virus isolation, and PCR, are optimized for the detection of known agents
Metagenomic analysis enables more systematic detection of both known and novel viral pathogens
This study describes the use of the Virochip microarray and deep sequencing for the direct viral diagnosis of serum from cases of acute pediatric febrile illness in a tropical urban setting. Patient clinical data and serum samples were collected between 2005 and 2009 as part of an ongoing pediatric dengue study in Managua, Nicaragua
Acute serum samples were collected from suspected dengue cases at the Hospital Infantil Manuel de Jesús Rivera (HIMJR), the National Pediatric Reference Hospital in Managua, Nicaragua, after completion of the informed consent procedure. Patients were enrolled in the study if they presented with fever or history of fever of less than 7 days and one or more of the following signs and symptoms: headache, arthralgia, myalgia, retro-orbital pain, positive tourniquet test, petechiae, or signs of bleeding. Patients with a defined diagnosis other than dengue,
Approximately one half of the suspected dengue cases testing negative by all four dengue diagnostic assays were included in the metagenomics analysis described here. Thirty-four cases (pools 1–4, see below) corresponded to the subset of patients who presented within 4 days of symptom onset and who reported both fever or history of fever and rash. Eighty-nine of the samples (pool 5) were selected randomly from among the remaining samples. As positive controls, seven samples (pool 5) that had been clinically diagnosed as virus positive were included. The study protocol was reviewed and approved by the Institutional Review Boards (IRB) of the University of California, Berkeley, and of the Nicaraguan Ministry of Health.
Total nucleic acid from 140 µl of serum was extracted using the QIAamp Viral RNA Isolation Kit (Qiagen), which co-purifies RNA and DNA. End-tagged dsDNA libraries were created essentially as previously described
For microarray hybridization, a fraction of each library was amplified by PCR as above but with a modified dNTP mixture including 5-(3-aminoallyl)-dUTP (Ambion) in lieu of 75% of the dTTP normally in the mixture. The resulting amino-allyl-containing DNA was purified using a DNA Clean and Concentrator-5 column (Zymo Research). The eluate was heat denatured at 95°C for 2 min, cooled briefly on ice, then fluorescently labeled in reactions containing 100 mM sodium bicarbonate pH 9, 10% DMSO, and 667 µM Cy3 mono NHS ester (GE Healthcare) for 1 hour at 25°C. Labeled DNA was purified using DNA-CC-5 columns and added to hybridization reactions containing 3×SSC, 25 mM HEPES pH 7.4, and 0.25% SDS. Hybridization mixtures were heated at 95°C for 2 minutes, applied to microarrays, and hybridized overnight at 65°C. Following hybridization, arrays were washed twice in 0.57× SSC and 0.028% SDS and twice in 0.057× SSC, then scanned on an Axon GenePix 4000B microarray scanner. Three analysis tools were used to analyze Virochip data: E-predict
For deep sequencing, the Illumina paired-end adapter sequences were appended to library molecules using PCR, essentially as previously described
In some cases, PCR and Sanger sequencing were used to confirm Virochip and deep sequencing calls and to recover additional sequence. Primer sequences are listed in
Full-length poliovirus genomic RNA was transcribed from MluI-linearized plasmid prib(+)XpA using T7 RNA polymerase as previously described
Predicted circovirus-like replicase sequences were searched against the NCBI non-redundant protein database (BLASTx, E value 10⁻²). Aligning sequences were retrieved and consolidated using CD-HIT into a set of representative sequences
The initial FASTQ data from each pool's lane were binned by barcode. The barcode-split reads were trimmed of non-template deriving and potentially error-prone sequence: a randomly incorporated nucleotide (N), the barcode bases, and the sequence corresponding to the random hexamer, leaving 55 (pools 1, 2, and 4), 54 (pool 3), or 90 (pool 5) bases per read. The lowest complexity fraction was identified by sequences with LZW ratios (compressed size/uncompressed size) less than 0.45
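The low-complexity filter described above can be illustrated with a minimal Python sketch. It assumes that the LZW ratio is taken as the number of LZW codes emitted divided by the read length. The text defines the ratio only as compressed size/uncompressed size, so the 0.45 cutoff used in this study may not transfer directly to this definition. The function names are hypothetical and are not taken from the software used in the analysis.

```python
def lzw_code_count(seq: str) -> int:
    """Return the number of codes that LZW compression emits for seq."""
    if not seq:
        return 0
    # Seed the dictionary with every single character present in the read.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(seq)))}
    current = ""
    codes = 0
    for ch in seq:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the phrase while it is already known.
            current = candidate
        else:
            # Emit the code for the known phrase and learn the new one.
            codes += 1
            dictionary[candidate] = len(dictionary)
            current = ch
    # Emit the code for the final phrase.
    return codes + 1


def lzw_ratio(seq: str) -> float:
    """Assumed definition: LZW codes emitted / number of input characters."""
    if not seq:
        raise ValueError("empty sequence")
    return lzw_code_count(seq) / len(seq)


def is_low_complexity(seq: str, cutoff: float = 0.45) -> bool:
    """Flag reads whose ratio falls below the cutoff (0.45 in the text)."""
    return lzw_ratio(seq) < cutoff
```

Repetitive reads such as homopolymers compress into few codes and therefore give low ratios, whereas reads with diverse sequence content give higher ratios.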
In order to make specific virus-positive calls, we implemented a set of rules to minimize false positives while maintaining sensitivity. In order to reduce the number of false positive sequences that may share identity equally with both viral and non-viral genomes, we restricted our analysis to those queries whose best alignments were only to animal viral sequences. In a number of datasets, we detected human klassevirus 1, a virus identified and studied in our lab
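The calling rule described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the input structure, the function name `call_viruses`, and the `min_reads` parameter are hypothetical. The default of 10 reads is an assumption inferred from the table note stating that 9 reads fell just below the positive identification threshold.

```python
from collections import Counter


def call_viruses(best_hits_by_read, min_reads=10):
    """Return {taxon: read count} for taxa supported by enough reads.

    best_hits_by_read maps a read ID to the list of its top-scoring
    alignments, each given as (taxon_name, is_animal_virus). Ties for
    the best score appear as several entries in the list.
    min_reads is an assumed threshold (see the lead-in).
    """
    counts = Counter()
    for hits in best_hits_by_read.values():
        # Keep a read only if every one of its best alignments is to an
        # animal virus; reads that match viral and non-viral sequence
        # equally well are discarded.
        if hits and all(is_virus for _, is_virus in hits):
            for taxon in {name for name, _ in hits}:
                counts[taxon] += 1
    return {taxon: n for taxon, n in counts.items() if n >= min_reads}
```

Under this sketch, a read that aligns equally well to a viral genome and to human sequence contributes to no taxon, which mirrors the restriction to queries whose best alignments were only to animal viral sequences.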
We initially screened the serum samples with the Virochip pan viral detection microarray. This was done as a complement to the deep sequencing analysis and in order to compare the sensitivity of the two approaches. We included 7 blinded positive control samples that had been previously diagnosed in the clinic as being positive for DENV-2 (n = 4), DENV-1 (n = 1), or hepatitis A virus (HAV; n = 2). The Virochip successfully identified the correct virus in all of these positive controls, and in the case of the dengue virus positive samples, the correct serotype as well (
Patient code | Clinic virus ID | Virochip virus ID | Sequencing virus ID | Virus TaxID | # virus reads | # initial reads | Fraction virus reads |
187 | DENV-2 | DENV-2 | Dengue virus 2 | 11060 | 4280 | 1.1E+06 | 3.9E−03 |
275 | DENV-2 | DENV-2 | Dengue virus 2 | 11060 | 1511 | 1.6E+06 | 9.7E−04 |
282 | DENV-2 | DENV-2 | Dengue virus 2 | 11060 | 699 | 1.6E+06 | 4.2E−04 |
266 | DENV-2 | DENV-2 | Dengue virus 2 | 11060 | 135749 | 4.8E+06 | 2.8E−02 |
274 | DENV-1 | DENV-1 | Dengue virus 1 | 11053 | 27 | 1.2E+06 | 2.3E−05 |
401 | HAV | HAV | Hepatitis A virus | 12092 | 2164 | 1.8E+05 | 1.2E−02 |
401 | HAV | HAV | Hepatitis A virus | 12092 | 4562 | 1.3E+06 | 3.5E−03 |
235 | - | - | Human herpesvirus 6 | 10368 | 116 | 5.5E+06 | 2.1E−05 |
451 | - | - | Human herpesvirus 6 | 10368 | 88 | 2.7E+06 | 3.2E−05 |
207 | - | - | Human herpesvirus 6 | 10368 | 390 | 9.6E+06 | 4.1E−05 |
432 | - | - | Human herpesvirus 6 | 10368 | 411 | 3.5E+06 | 1.2E−04 |
574 | - | - | Human herpesvirus 6 | 10368 | 138 | 3.2E+06 | 4.4E−05 |
370 | - | - | Human herpesvirus 6 | 10368 | 90 | 3.2E+06 | 2.9E−05 |
78 | - | - | Human herpesvirus 6 | 10368 | 113 | 1.2E+06 | 9.8E−05 |
131 | - | - | Human herpesvirus 6 | 10368 | 24 | 1.2E+06 | 2.0E−05 |
183 | - | - | Human herpesvirus 6 | 10368 | 66 | 3.0E+06 | 2.2E−05 |
270 | - | - | Human herpesvirus 6 | 10368 | 28 | 1.2E+06 | 2.4E−05 |
344 | - | - | Human herpesvirus 6 | 10368 | 303 | 1.3E+06 | 2.2E−04 |
350 | - | - | Human herpesvirus 6 | 10368 | 48 | 3.0E+06 | 1.6E−05 |
438 | - | - | Human herpesvirus 6 | 10368 | 72 | 4.4E+06 | 1.6E−05 |
315 | - | - | African swine fever virus | 10497 | 42 | 1.9E+06 | 2.2E−05 |
382 | - | - | Human herpesvirus 4 | 10376 | 44 | 9.6E+05 | 4.6E−05 |
387 | - | - | GB virus C | 54290 | 171 | 9.0E+06 | 1.9E−05 |
180 | - | - | GB virus C | 54290 | 42 | 8.0E+05 | 5.2E−05 |
161 | - | - | Human parvovirus B19 | 10798 | 14 | 3.0E+06 | 4.7E−06 |
118 | - | - | Circovirus-like genome RW-E | 642255 | 177 | 7.4E+06 | 2.4E−05 |
323 | - | - | Circovirus-like genome RW-E | 642255 | 12 | 5.0E+06 | 2.4E−06 |
363 | - | - | Circovirus-like genome RW-E | 642255 | 17 | 1.9E+06 | 8.9E−06 |
371 | - | - | Circovirus-like genome RW-E | 642255 | 21 | 1.6E+06 | 1.3E−05 |
387 | - | - | Circovirus-like genome RW-E | 642255 | 92 | 9.0E+06 | 1.0E−05 |
355 | - | - | Beak and feather disease virus | 77856 | 12 | 2.1E+06 | 5.7E−06 |
345 | - | - | Beak and feather disease virus | 77856 | 62 | 2.2E+06 | 2.9E−05 |
315 | - | - | Swan circovirus | 459957 | 26 | 1.9E+06 | 1.4E−05 |
329 | - | - | Gull circovirus | 400121 | 14 | 2.2E+06 | 6.3E−06 |
321 | - | - | Porcine circovirus 1 | 133704 | 30 | 4.6E+06 | 6.5E−06 |
375 | - | - | Porcine circovirus 1 | 133704 | 53 | 3.8E+06 | 1.4E−05 |
377 | - | - | Cyclovirus PK5034 | 742916 | 81 | 6.6E+06 | 1.2E−05 |
322 | - | - | Cyclovirus PK5222 | 742917 | 206 | 3.8E+06 | 5.5E−05 |
235 | - | - | Torque teno virus | 68887 | 23 | 5.5E+06 | 4.2E−06 |
73 | - | TTV | Torque teno midi virus 1 | 687379 | 137 | 6.9E+06 | 2.0E−05 |
505 | - | - | Torque teno virus | 68887 | 37 | 6.9E+06 | 5.3E−06 |
505 | - | - | Small anellovirus | 393049 | 25 | 6.9E+06 | 3.6E−06 |
457 | - | - | Torque teno virus | 68887 | 29 | 1.5E+07 | 1.9E−06 |
171 | - | - | Torque teno mini virus 2 | 687370 | 18 | 1.2E+06 | 1.6E−05 |
159 | - | TTV | Torque teno mini virus 5 | 687373 | 143 | 2.6E+06 | 5.6E−05 |
179 | - | - | Torque teno mini virus 1 | 687369 | 17 | 1.8E+06 | 9.3E−06 |
193 | - | - | Torque teno mini virus 2 | 687370 | 56 | 1.6E+06 | 3.6E−05 |
183 | - | TTV | Torque teno mini virus 3 | 687371 | 139 | 3.0E+06 | 4.6E−05 |
156 | - | TTV | Torque teno midi virus 1 | 687379 | 213 | 2.3E+06 | 9.1E−05 |
186 | - | - | Torque teno virus 15 | 687354 | 1701 | 2.0E+06 | 8.3E−04 |
282 | - | TTV | Torque teno midi virus 1 | 687379 | 61 | 1.6E+06 | 3.7E−05 |
335 | - | - | Torque teno virus | 68887 | 47 | 1.7E+06 | 2.8E−05 |
330 | - | - | TTV-like mini virus | 93678 | 77 | 1.8E+06 | 4.2E−05 |
270 | - | - | Torque teno virus 8 | 687347 | 82 | 1.2E+06 | 7.1E−05 |
331 | - | - | Torque teno midi virus 2 | 687380 | 113 | 1.4E+06 | 8.2E−05 |
349 | - | TTV | Torque teno midi virus | 432261 | 47 | 1.6E+06 | 2.9E−05 |
350 | - | TTV | Torque teno mini virus 4 | 687372 | 51 | 3.0E+06 | 1.7E−05 |
566 | - | TTV | Torque teno mini virus 4 | 687372 | 206 | 1.9E+06 | 1.1E−04 |
377 | - | - | Torque teno mini virus 4 | 687372 | 153 | 6.6E+06 | 2.3E−05 |
168 | - | TTV | | | | 1.9E+05 | |
263 | - | TTV | | | | 1.5E+06 | |
The NCBI TaxID and name of the virus species with the highest number of hits among those viruses with BLAST hits are given.
These two samples were prepared from aliquots of the same serum sample.
In its deep sequencing dataset, Sample 168 had 9 reads matching TTV, just below our positive identification threshold.
We applied
A total of 130 serum samples were deep sequenced, including 7 positive controls and 123 previously undiagnosed samples. We performed deep sequencing on 34 of the serum samples using the Illumina GAII platform, generating a total of 184.6 million 65-nucleotide-long paired-end reads (one flow cell lane each for four sample pools; 12.0 billion bases total; median of 3.7 million reads per sample). We sequenced the remaining 96 serum samples (pool 5) on a HiSeq 2000 instrument, which provides more and longer sequences per run. The HiSeq run generated 196.4 million 97-nt sequences (one flow cell lane; 19 billion bases total; median of 1.7 million reads per sample).
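The base totals quoted above follow directly from the read counts and read lengths. A short Python check, using only the figures given in the text (the function name is hypothetical):

```python
def total_bases(n_reads: int, read_length_nt: int) -> int:
    """Total sequenced bases = number of reads x read length."""
    return n_reads * read_length_nt


# GAII: 184.6 million reads of 65 nt across four lanes (pools 1-4).
gaii_bases = total_bases(184_600_000, 65)
# HiSeq 2000: 196.4 million reads of 97 nt in one lane (pool 5).
hiseq_bases = total_bases(196_400_000, 97)

print(f"GAII:  {gaii_bases / 1e9:.1f} billion bases")
print(f"HiSeq: {hiseq_bases / 1e9:.1f} billion bases")
```

The two products come to approximately 12.0 and 19.1 billion bases, in agreement with the totals reported for the two platforms.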
The raw reads were first separated by barcode and analyzed as individual data sets as described in the
Average percent remaining reads after each of the filtering steps. Low-quality and low-complexity reads are removed first, followed by iterative BLAT and BLAST comparisons to human sequence. Averages were calculated for all samples (n = 130). Inset: secondary pipeline depicting post-filtering viral searches. The dashed bubble includes future methods to improve the sensitivity of viral sequence detection.
The reads remaining after filtering were then compared to sequences in the NCBI non-redundant nucleotide and protein databases using BLASTn and BLASTx respectively. Virus-derived sequences were detected in all 7 positive control samples and in 45/123 (37%) of previously negative serum samples (
We recovered virus sequences matching the expected viral genomes in all of the positive control samples. The fraction of viral sequences in the controls spanned 4 orders of magnitude, from 0.002% to 2.8% of total reads. The two HAV positive control samples (#401) were aliquots of the same serum sample and were processed and analyzed independently. The fraction of viral reads in the duplicates was within 4-fold (0.4% and 1.2%). This demonstrates that our library preparation, sequencing, and bioinformatics pipeline is capable of reproducibly detecting evidence of clinically relevant infections.
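The control statistics above can be recomputed from the results table. The following Python sketch uses the virus read counts and the rounded initial read counts listed for the seven positive controls. Because the table reports initial reads to two significant figures, the recomputed fractions are approximate. Variable and function names are hypothetical.

```python
# (patient code, virus reads, initial reads) for the positive controls,
# copied from the results table; initial reads are rounded table values.
controls = [
    ("187", 4280, 1.1e6),
    ("275", 1511, 1.6e6),
    ("282", 699, 1.6e6),
    ("266", 135749, 4.8e6),
    ("274", 27, 1.2e6),
    ("401", 2164, 1.8e5),  # HAV, aliquot 1
    ("401", 4562, 1.3e6),  # HAV, aliquot 2
]


def viral_fraction(virus_reads: float, initial_reads: float) -> float:
    """Fraction of a sample's initial reads assigned to the virus."""
    return virus_reads / initial_reads


fractions = [viral_fraction(v, n) for _, v, n in controls]
lowest, highest = min(fractions), max(fractions)

# Fold difference between the two independently processed HAV aliquots.
hav = [viral_fraction(v, n) for code, v, n in controls if code == "401"]
hav_fold = max(hav) / min(hav)

print(f"range: {lowest * 100:.3f}% to {highest * 100:.1f}%")
print(f"HAV duplicate fold difference: {hav_fold:.1f}")
```

With these inputs the lowest and highest control fractions are about 0.002% and 2.8%, and the two HAV aliquots differ by less than 4-fold.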
In addition to the controls, two non-control samples contained evidence of RNA virus sequence. Both samples had reads deriving from GB Virus C (GBV-C, also known as Hepatitis G Virus) and were essentially identical to GBV-C database sequences. We detected no sequences that best aligned to dsRNA viruses or to retroviruses (except for human endogenous retrovirus and contaminating MLV RT-derived sequences, see
Human Herpesvirus 6 (HHV-6) sequence was detected in 13/123 previously negative samples (10.6%). The HHV-6 positive samples had an average normalized read count of 145 HHV-6 reads per sample (range: 24–411), representing 0.002% to 0.02% of the datasets (
Histograms of HHV-6B genome coverage generated by aligning reads with minimum 90% identity over the total read length to the genome. The depth of sequence coverage was calculated as the total Kb of aligned sequence per 1 Kb bin over the HHV-6B reference genome. Genome track representation adapted from Dominguez
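The binning behind such a coverage histogram can be sketched in Python as follows. This is an illustration only: the alignment tuple layout, the function name `coverage_per_bin`, and the use of 0-based half-open coordinates are assumptions. The caption states only the 90% identity filter and the 1 Kb bin size, and the sketch does not check that an alignment spans the total read length.

```python
def coverage_per_bin(alignments, genome_length, bin_size=1000, min_identity=0.90):
    """Return Kb of aligned sequence in each bin along the reference.

    alignments: iterable of (start, end, identity), with 0-based,
    half-open reference coordinates and identity as a fraction.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    aligned_bases = [0] * n_bins
    for start, end, identity in alignments:
        if identity < min_identity:
            continue  # discard alignments below the identity cutoff
        pos = max(start, 0)
        end = min(end, genome_length)
        while pos < end:
            b = pos // bin_size
            # Split alignments that cross a bin boundary.
            segment_end = min((b + 1) * bin_size, end)
            aligned_bases[b] += segment_end - pos
            pos = segment_end
    # Convert aligned bases to Kb of aligned sequence per bin.
    return [bases / 1000 for bases in aligned_bases]
```

Splitting an alignment at bin boundaries ensures that each aligned base is counted once and in the bin where it falls.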
In addition to HHV-6, we detected Human Herpesvirus 4 (HHV-4, also known as Epstein Barr Virus) sequences in one sample. As with HHV-6, the HHV-4 sequences were virtually identical to previously reported sequences. One sample also contained reads similar to another dsDNA virus, African Swine Fever Virus (ASFV), which has been previously detected in human serum
We also identified sequences derived from single-stranded DNA viruses in some samples. In one sample we detected Parvovirus B19-derived reads with high identity to database sequences. Sequences related to various members of the
Sequences similar to members of the
We termed the extended replicase-like sequences Circovirus-like NI/2007 1–3 (Cvl-NI 1–3), and compared them to a representative set of other replicase sequences (
Phylogenetic neighbor-joining tree of amino acid sequences showing the relationship between Circovirus-like NI rep sequences (red) and 19 representative replicase sequences. Abbreviations: CV, circovirus; Ba, Barbel; Bat, Bat ZS/Yunnan-China/2009; BFDV, beak and feather disease virus; Ca, Canary; Circo-like, Circovirus-like genome; Cyclo, cyclovirus; PKbeef, PKbeef23/PAK/2009; Du, Muscovy duck; Ed,
A subset of the positive samples (
In this study, we examined the virus diversity in serum samples from Nicaraguan children with unknown acute febrile illness. We performed Virochip microarray and deep sequencing analyses on 7 positive control and 123 undiagnosed samples. Both of these methods succeeded in detecting the expected virus in the positive control samples. Virochip analysis produced putative viral hits in 10/123 (8%) of the previously negative samples, whereas deep sequencing revealed virus or virus-like sequences in 45/123 (37%). This study demonstrates the utility of these metagenomic strategies to detect virus sequence in multiple human serum samples and is the first to utilize second-generation sequencing to simultaneously investigate many cases of acute unknown tropical illness.
Monitoring the emergence and spread of novel human pathogens in tropical regions is a central public health concern. Metagenomic analysis enables more systematic detection of both known and novel viral pathogens
We detected virus sequence at concentrations as low as ∼2 in 10⁶ reads. Virus sequence detected in a clinical sample at vanishingly low copy numbers may reflect several possible host-microbe scenarios. The sequence detected may be that of a pathogenic virus capable of causing illness at low copy number or through indirect effects, a ubiquitous non-disease causing microbe, a virus outside of its primary replication site, low-level contamination, an artifact of sample collection timing/processing, or remains of incomplete immune clearance. Additional evidence must be considered in each case to define the host-microbe relationship.
In this study, we compared the performance of the Virochip and deep sequencing for detecting virus sequence in human serum. The limit of detection of the Virochip was approximately one part in 10⁵ for the poliovirus controls, for which there are microarray probes with perfect sequence complementarity (
We were unable to detect a virus in two thirds of the 123 dengue-like illness samples. These results could reflect true negative status, which would result from a non-viral infection, illness due to a non-infectious agent, or complete immunologic clearance. Alternatively, the negative results could reflect failures in our diagnostic approaches due to imperfect sensitivity, unsatisfactory sample preparation, improper sample type, or failure to recognize highly divergent viral sequences. The presence of sequences that lack even remote similarities to known species also highlights the need for further development of
Determining the etiology of human diseases with symptoms that overlap with dengue-like illness is important for understanding the full spectrum of emerging or previously uncharacterized pathogens in tropical populations. In this study, 10% of acute serum samples negative for dengue virus from cases of pediatric dengue-like illness were positive for HHV-6. Primary HHV-6 infection causes undifferentiated febrile illness and
After acute infection, HHV-6 can persist latently in the host, either with no production of infectious virions or with low levels of viral replication. Latency is believed to endure in several cell types, including monocytes and bone marrow progenitor cells
Primary HHV-6 infection is a major cause (∼20%) of infant hospitalizations in the United States
Similarly, the one sample positive for Parvovirus B19 sequence may be a case of acute infection with a commonly acquired childhood virus. Parvovirus B19 can manifest as
Epstein Barr Virus (HHV-4) sequences were found in the serum of one patient who presented with relatively severe symptoms, and died during hospitalization (
In addition to the viruses for which a plausible disease association exists, many samples contained sequences from viruses with no well-established link to human disease. These included the two samples positive for GBV-C and those containing ASFV-like, TTV-like, and circovirus-like sequences.
The
Metagenomic approaches provide an effective high-throughput method to detect uncharacterized virus diversity in a tropical setting from many samples simultaneously. The findings presented in this study further our knowledge of well-characterized and previously unknown viruses present in serum collected from pediatric dengue-like illness patients and advance our understanding of the application of metagenomic approaches to human pathogen detection. Deep sequencing analysis of clinical samples holds tremendous promise as a diagnostic tool by permitting the detection of many different viruses simultaneously, including those present at low-copy numbers and of divergent origin. Major remaining barriers to high-throughput sequencing strategies becoming standard diagnostic practice include prohibitive cost, lengthy sample preparation time, and computationally intensive data analysis requirements. These challenges are magnified in resource-limited settings, such as Nicaragua, but are gradually being addressed. Industry hardware and technical advancements have steadily decreased the per-base cost of deep sequencing, and the results presented here strengthen our expectations of multiplexed sample preparation and bioinformatic data filtering within the framework of current second-generation sequencing platforms. Long-term bi-directional partnerships with developing country collaborators facilitate easier access to techniques not currently available on-site, such as deep sequencing, and are also important in providing training opportunities for local scientists and developing relevant pathogen tests and diagnostic policies.
This study expands our understanding of the virus diversity in pediatric dengue-like illness in Nicaragua and the application of genomic detection techniques in a tropical setting, findings that are particularly valuable given the pressing need for improved global emerging pathogen surveillance.
We thank the children and families participating in the Dengue Clinical Study and the study staff and physicians at the Hospital Infantil Manuel de Jesús Rivera, in particular Crisanta Rocha, Sheyla Silva, Maria Angeles Pérez, Federico Narvaez, Gamaliel Gutierrez, Cintia Saborío, and Julia Medina. We are also grateful to the study personnel at the Nicaraguan National Virology Laboratory at the Nicaraguan Ministry of Health and the Sustainable Sciences Institute in Nicaragua.