Deep Sequencing to Identify the Causes of Viral Encephalitis

  • Benjamin K. Chan,

    Affiliation Division of Infectious Diseases, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, Utah, United States of America

  • Theodore Wilson,

    Affiliation Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah, United States of America

  • Kael F. Fischer,

    Affiliation Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah, United States of America

  • John D. Kriesel

    Affiliation Division of Infectious Diseases, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, Utah, United States of America



Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue.


Current diagnostic methods used in cases of infectious encephalitis successfully identify a specific microbiologic cause of the disease in ∼40% of cases.[1], [2] Recent work suggests that a larger number of cases actually have an infectious etiology but are misdiagnosed.[3] PCR of CSF can be very helpful for identifying DNA viruses (e.g. herpes simplex virus type 1, HSV1), though it is less effective for the detection of RNA viruses (e.g. West Nile Virus).[4] Further limiting the efficacy of all PCR, culture, and antibody-dependent diagnostic methods are the requirements of specialized reagents and a priori knowledge of pathogens to be tested. An incomplete panel of microbial candidates for specific testing can lead to false-negative test results with missed opportunities for effective therapy.[5] Finally, validated PCR primers and protocols sometimes fail to identify known pathogens due to mutations in the primer-binding region, an issue previously addressed by our group in the detection of GB Virus C (GBV-C) in demyelinated human brain.[6]

Deep sequencing offers the prospect of relatively unbiased testing for all previously catalogued and sequenced microbial pathogens in a single test. Where specific PCR, serology and culture focus on a defined set of candidate pathogens, deep sequencing presents a relatively unbiased survey of RNA or DNA sequences present in a sample. Furthermore, this approach does not rely on microbial recovery and isolation, an important attribute given that the microbiome is diverse and, for the most part, cannot be readily cultured.[7] Limitations of the deep sequencing approach for diagnosing infections include: the possible introduction of contaminating sequences into the preparation, difficulties with identifying sequences not included in reference databases (e.g. GenBank) and understanding the significance of rare sequences found within the sample. These problems must be addressed by the use of appropriate controls and, where possible, metagenomic techniques.

In the current study, seven encephalitis cases and fourteen normal brain controls were sequenced and evaluated for the presence of viral sequences. Building upon our recent detection of a novel variant of GBV-C in the brain of an individual who died with primary progressive multiple sclerosis (PPMS), updated bioinformatics methods were used in the current study (Figure 1).[6] Identification of a pathogen was possible in each of the five samples that had a known or strongly suspected infectious etiology, and no pathogen was identified in the two samples without a suspected infectious etiology.


Ethics Statement

This research was submitted to the University of Utah Health Sciences IRB and, since it was performed on de-identified pathologic material, was found to be exempt from review and oversight.


Fourteen frozen normal control and 5 frozen encephalitis brain specimens were obtained from the Rocky Mountain and UCLA Brain Banks. Two additional frozen encephalitis specimens were obtained from Dr. Don Gilden at the University of Colorado. All the specimens were collected post-mortem within 20 hours of death, either fresh frozen or snap frozen in liquid nitrogen and were associated with a neuropathological diagnosis. All 7 diseased specimens were from subjects with encephalitis verified by neuropathology. The samples were assigned to one of two groups: controls (n = 14) and encephalitis (n = 7).

RNA Extraction and RNA-seq

RNA was extracted from frozen brain (volume ∼10 mm³) using a Qiagen (Valencia, CA) RNeasy Blood and Tissue kit. RNA was extracted because all viruses utilize RNA at some point during their lifecycle. The extracted RNA was DNase treated per kit instructions and submitted for sequencing at the University of Utah Next Generation Sequencing Shared Resource Facility. Prior to sequencing, RNA was analyzed on an Agilent Bioanalyzer Nanochip (Agilent Technologies, USA) and evaluated for RNA size, abundance and integrity as previously described.[6] Samples were reverse transcribed and prepared with the Illumina TruSeq kit. To ensure the inclusion of possible RNA genomes, oligo dT selection was not performed. To avoid bias, rRNA selection was also not performed. Deep sequencing was performed using two barcoded samples per lane from a single end (50 bases) on an Illumina HiSeq 2000. The sequences have been deposited in the NCBI dbGaP database at URL (This link will be activated for controlled access on or about May 1, 2014.)

Screening of Reads

Metagenomic analysis of the specimens was performed blind, that is without the benefit of pathology reports or other diagnostic information. The sequence data sets were then screened for quality: FASTQ reads containing five or more positions with an Illumina quality score less than 19 were removed and excluded from analysis, providing High Quality (HQ) reads. The number of identical HQ reads for each obtained sequence was noted in a compressed FASTA format to reduce file size and computing run-times in subsequent analysis steps. Using the Bowtie computer program, each HQ read set was aligned to the human genome (NCBI build GRCh37.68) and human transcriptome.[8], [9] Reads that aligned (using Bowtie) to the human genome, human transcriptome, human and mouse ribosomes, or Φ-X174 (an internal sequencing control) were excluded from further analysis. Sequences that did not align to any of those databases were carried forward in the analysis as “screened reads”.
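The quality screen and read-collapsing step described above can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the Phred+33 quality encoding, and the FASTA header layout (`read<N>_x<count>`) are assumptions made for illustration only.

```python
from collections import Counter

PHRED_OFFSET = 33        # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
MIN_QUALITY = 19         # positions scoring below this count as low quality
MAX_LOW_POSITIONS = 4    # five or more low-quality positions rejects the read


def is_high_quality(quality_string, min_quality=MIN_QUALITY,
                    max_low=MAX_LOW_POSITIONS, offset=PHRED_OFFSET):
    """Return True if the read has fewer than five positions below min_quality."""
    low = sum(1 for ch in quality_string if ord(ch) - offset < min_quality)
    return low <= max_low


def collapse_hq_reads(fastq_lines):
    """Count identical high-quality reads from 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    counts = Counter()
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1], lines[i + 3]
        if is_high_quality(qual):
            counts[seq] += 1
    return counts


def to_compressed_fasta(counts):
    """Write one FASTA record per unique read, storing its copy number in the header."""
    records = []
    for n, (seq, count) in enumerate(counts.most_common(), start=1):
        records.append(f">read{n}_x{count}\n{seq}")
    return "\n".join(records)
```

Storing each unique sequence once with its copy number means that later alignment steps touch every distinct read a single time, while the copy number can still be recovered when hits are counted.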

Non-redundant Viral Database

The sets of screened reads were then aligned to sequences in a non-redundant viral database (NRVDB) using MegaBLAST.[10] The NRVDB was derived from 1,296,974 viral sequences in the GenBank database. It includes 579,282 unique viral sequence records of 31 to 1.2 million bp in length, representing 2480 different viral taxa. The use of this database reduces redundant hits resulting from overrepresented taxa such as HIV, Hepatitis C Virus and Influenza A Virus, which collectively comprise >50% of the total viral records within GenBank.

Determination of Hit-Rates and P-Values

Using MegaBLAST with a word size of 28, individual reads that aligned to NRVDB were considered hits. The normalized-hit-rate (NHR) of every sample (encephalitis and controls) to each viral taxon was calculated by dividing the number of sequences that aligned to the taxon by the number of screened reads obtained for the sample. To judge relative enrichment of virus-like sequences, the NHR of each individual encephalitis experimental sample was compared to the NHR distribution for the control samples, for every viral taxon, as previously described.[6] Using custom software written in the Python programming language with SciPy tools, the Z-Test was used to quantify the statistical significance of any viral-taxon overrepresentations in the encephalitis brain samples compared with controls.[11], [12], [13]
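The NHR calculation and the comparison against the control distribution can be sketched as follows. This is an illustrative sketch only, written with the Python standard library rather than SciPy. The function names are hypothetical, and two details are assumptions because the text does not specify them: the test is taken to be one-sided (enrichment only) and the control spread is estimated with the sample standard deviation.

```python
import math
from statistics import mean, stdev


def normalized_hit_rate(hits, screened_reads):
    """NHR: reads aligning to a taxon divided by the sample's screened reads."""
    return hits / screened_reads


def z_test_enrichment(sample_nhr, control_nhrs):
    """Compare one sample's NHR with the control NHR distribution.

    Returns (z, p) where p is the one-sided upper-tail probability under a
    normal model (assumption: only enrichment is of interest).
    """
    mu = mean(control_nhrs)
    sigma = stdev(control_nhrs)  # assumption: sample standard deviation
    if sigma == 0:
        # Degenerate case: all controls are identical (e.g. zero hits).
        if sample_nhr > mu:
            return math.inf, 0.0
        return 0.0, 0.5
    z = (sample_nhr - mu) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p
```

In use, the function would be called once for every sample and viral taxon pair, with `control_nhrs` holding the NHR of that taxon in each of the control samples.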

Taxonomy-Based Bioinformatics Follow-Up

Taxa with Bonferroni corrected p-values ≤ 0.01 were analyzed further. MegaBLAST was used to align the screened reads to comprehensive sequence databases of the taxon of interest. Following alignment, contiguous sequences (contigs) were assembled using SSAKE from the reads that aligned to sequences in each taxon of interest.[14] All alignments and contigs were then manually examined to determine whether they represented human sequence within the taxon-specific database. This determination was based on alignments to the NCBI NR database (MegaBLAST) and examination of the annotations of the GenBank records of the aligning taxon-specific sequences.

Virus-Specific Amplification

All primer sequences used in this study are given in Document S1. RNA and DNA were re-extracted from the encephalitis and control brain samples (Qiagen, Valencia, CA RNeasy Lipid Tissue and DNeasy Blood and Tissue kits) in preparation for VZV- and HSV-specific PCR and measles-specific RT-PCR. HSV and VZV PCR reactions were performed as previously described.[15], [16] An additional set of HSV1 primers was designed directly from the RNA sequencing data. Reaction conditions were the same as previously described.[16] HSV1 strain 17 syn+ (originally obtained from Dr. James Hill, Louisiana State University) diluted to 10³ plaque-forming units/ml was used as the positive HSV control. The VZV positive control material was from a VZV+ MeWo cell culture (kindly provided by Dr. Don Gilden, University of Colorado). The VZV control material was used undiluted. Negative control reactions substituted water for the nucleic acid extracts. Ethidium bromide stained 1.5% agarose gels were used to visualize the resulting PCR products.

Measles virus positive control material was RNA derived from the live-attenuated measles vaccine (MMR-II, Merck & Co, Whitehall Station, New Jersey). Measles virus RNA was extracted from one full dose of the vaccine (0.5 ml, ∼1000 pfu; Qiagen RNeasy, Valencia, CA). A random double stranded cDNA amplicon library was generated using a modified (Document S1) Round A/B protocol.[17] Four μl of extracted measles RNA was used as the round A input and the resulting Round B library was used, undiluted, as the measles positive control. The negative control reaction substituted water for the nucleic acid extracts. The RT-PCR method of Rota et al. was used to detect measles virus in the experimental and control specimens.[18] Ethidium bromide stained 1.5% agarose gels were used to visualize the resulting RT-PCR products. The PCR and RT-PCR products were purified and Sanger sequenced to confirm the identity of the amplicons (HSV1 or measles).


Deep Sequencing

Fifty to 90 million HQ 50 bp reads were obtained from each of the 7 encephalitis samples and 14 normal brain samples, representing RNA present in these brain specimens at the time of collection. Removing reads that aligned to the human genome, transcriptome (NCBI build GRCh37.68), or ribosomes resulted in 199,666 to 907,362 (mean ± SD = 749,443±248,337) screened reads in the encephalitis samples and 216,651 to 2,342,726 (mean ± SD = 944,680±679,117) in the control samples (detailed in Table 1).[19]

Bioinformatic Analysis

For each of the seven encephalitis samples, 2480 taxa were evaluated, providing 17,360 comparisons. A heat map with color intensity representing the negative log transformed and Bonferroni corrected p-values was prepared with Java Treeview (represented in Figure 1).[20] Z-test comparisons of the encephalitis and control brain samples revealed significant viral taxon enrichment in 170 taxon-sample pairs. Each significant pair was systematically evaluated by alignment to both a taxon-specific database and the human genome. A total of 134 taxon-sample pairs were found to be the consequence of reads that aligned to the human genome, including human endogenous retroviruses, using the lower stringency alignment protocol. These taxon-sample pairs were excluded from further analysis. The remaining 36 taxon-sample pairs were distributed among the 7 encephalitis brain samples (Table 2). These 36 taxon-sample pairs all had p-values <10^−8.5 after adjusting for multiple comparisons, indicating enrichment. These significantly enriched taxon-sample pairs represented several different viral families; the number of significant pairs is shown in parentheses: Herpesviridae (17), Paramyxoviridae (10), Poxviridae (6), Hepeviridae (2) and Flaviviridae (1).
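The multiple-comparison adjustment and the negative log transform used for the heat map can be illustrated with a short sketch. The function names and the floor value that guards against taking the logarithm of zero are assumptions for illustration; only the counts (7 samples, 2480 taxa) and the 0.01 threshold come from the text.

```python
import math

N_SAMPLES = 7
N_TAXA = 2480
N_COMPARISONS = N_SAMPLES * N_TAXA  # 17,360 sample-taxon comparisons
THRESHOLD = 0.01                    # Bonferroni corrected cut-off from the Methods


def bonferroni(p_value, n_comparisons=N_COMPARISONS):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_comparisons)


def neg_log10(p_value, floor=1e-300):
    """Negative log10 transform; the floor (an assumption) avoids log(0)."""
    return -math.log10(max(p_value, floor))


def is_significant(p_value, n_comparisons=N_COMPARISONS, threshold=THRESHOLD):
    """True if the corrected p-value meets the follow-up threshold."""
    return bonferroni(p_value, n_comparisons) <= threshold
```

With 17,360 comparisons, an uncorrected p-value must be at or below roughly 5.8 × 10^−7 for the corrected value to reach the 0.01 threshold.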

Taxon-Specific Follow-Up

Assembly of reads that aligned to the taxon-specific follow up databases resulted in apparently viral contigs ranging from 66 to 4019 bp long in 5 of the 7 encephalitis samples (Table 2). These contigs were re-aligned to the taxon-specific database as well as the human genome with MegaBLAST.

Groups of closely related taxa were identified as significant using this method. For instance, sequence records from several paramyxovirus family members were significantly associated with samples Co-A and Co-B. Considering sequence homologies and human origin of the samples, multiple viral taxa (canine distemper virus, rinderpest virus, cetacean morbillivirus) were excluded from specific PCR follow-up, and the most closely related human virus (also significantly overrepresented) was used for PCR validation. Furthermore, each alignment was examined manually to determine if the aligning GI contained annotated human sequence, or if it was aligned to a human sequence not found in the canonical human genome and human transcriptome. These results and the resulting viral candidates chosen for specific PCR follow-up are shown in Table 2.

PCR Confirmation of Viral Sequence

HSV1 sequences were confirmed by Sanger sequencing of PCR products obtained from brain samples 710 and 4403, both from subjects with herpes encephalitis indicated in neuropathology reports (Figure 2, Table 3). Samples Co-A and Co-B were from subjects with subacute sclerosing panencephalitis (SSPE), according to their associated pathology reports. The deep sequencing analysis indicated the presence of the measles virus (MV). This was confirmed by specific PCR (Figure 2). Furthermore, the depth of sequencing coverage in Co-A and Co-B was sufficient to identify mutations in the genomes of each MV strain known to be present in SSPE-causing isolates.[21] Deep sequencing did not identify any viral candidates in samples 1418 and 4471. Specific PCR and RT-PCR for HSV1, VZV, and MV produced no amplicons in these samples (Table 3).

Figure 2. Sequence alignment and experimental confirmation of viral identity.

Panel A: 50 bp reads mapping to the HSV1 (strain F, gi: 290766003) genome and measles virus (Edmonston strain, gi: 331784) genome are shown. Panel B: Virus-specific PCR amplicons are shown with the positive control (viral DNA: HSV 17 syn+, clinical strain VZV, MV extracted from vaccine) and negative controls (ddH2O). Custom primers were required to amplify HSV1 from specimen 924, confirmed by amplicon sequencing (data not shown).

Table 3. Comparison of deep sequencing, molecular and pathology results.

In four of five cases, virus calls from the bioinformatic analysis of the deep sequencing data and the subsequent pathogen specific amplification were concordant with the prior clinical diagnoses. However, our metagenomic analysis did not reveal the presence of VZV sequence in sample 924, although that sample's neuropathology report, dating from 1985, identified VZV as the likely cause of encephalitis. PCR with validated diagnostic primers also failed to confirm the presence of VZV in the sample (data not shown). The deep sequencing of sample 924 did indicate the presence of HSV1, although the validated diagnostic HSV1 primers used successfully on samples 710 and 4403 [16] failed to yield a PCR product with sample 924. Based on the deep sequencing contig alignments, a novel set of primers was designed. A product of the expected size (∼800 bp, data not shown) was obtained, using methods previously described.[16] The sequence of the product was found to be 100% identical to the major capsid protein gene (UL19) of HSV1 McKrae. Thus, the HSV1 present in the brain of sample 924 may represent a strain with mutations present in the region of the DNA polymerase gene amplified by the published diagnostic PCR primers.

DNA extracted from each of the 14 control brain specimens was also interrogated with the set of specific HSV1 and VZV primers. No product was obtained for any of the 14 controls with these primer sets (data not shown). Likewise, RT-PCR interrogation of the 14 control specimens for MV yielded no amplicons.


This study demonstrates the utility of using deep sequencing to identify viral etiologies in encephalitis. As observed previously, validated PCR primers sometimes fail to amplify agents against which they have been validated.[5], [22] Sample 924 was found to contain HSV1, yet diagnostic HSV primers failed to yield a product. PCR primers derived directly from the metagenomic sequence obtained via deep sequencing were eventually found to be effective. The point of these PCR experiments was simply to validate the deep sequencing results; they are not intended as an advance in diagnostic PCR reagents for these viruses per se.

In four out of five cases, there was agreement between neuropathology reports and the deep sequencing results. Given the fact that no specific reagents were used in these neuropathological examinations, this may be considered remarkable agreement. However, the lack of specificity inherent in these examinations was illustrated by the discordant result obtained for sample 924, where VZV was suspected and HSV1 was detected by deep sequencing and validated by PCR.

In samples with high titers of virus, the coverage obtained from deep sequencing experiments is sometimes sufficient to elucidate many genomic features of the infectious agent. For example, in samples Co-A and Co-B, the signature genomic mutations associated with defective, SSPE-causing virus were easily observed.

Deep sequencing viral detection methods rely upon alignment of the experimental sequences to known viral sequences. This work used the viral nucleotide sequences in GenBank. Using this approach, identification of a previously unknown virus in a sample is possible only if one or more related viruses are present in the database. The nonredundant viral database used here is comprehensive by current (2013) standards, but will need to be updated as new records are added to GenBank.

Deep sequencing technologies are changing rapidly and the analysis of the deep sequencing data is becoming more efficient. As sequencing technologies improve and become more cost effective, their application in clinical diagnostics may become commonplace, especially for the identification of pathogens.[23], [24], [25] This study demonstrates the feasibility of using deep sequencing to identify viral causes of encephalitis.

Supporting Information

Document S1.

Specifies the PCR primer sets used in this study.



The authors would like to acknowledge: Dr. Rashed M. Nagra, Director, Human Brain and Spinal Fluid Resource Center (UCLA Brain Bank), Los Angeles, CA for the provision of specimens; Dr. John Corboy, Director, Rocky Mountain MS Center, Denver, CO for the provision of specimens; and Brian Dalley from Huntsman Cancer Institute Microarray Core Facility who supervised the preparation of cDNA libraries and the Illumina sequencing.

Author Contributions

Conceived and designed the experiments: BKC TAW KFF JK. Performed the experiments: BKC TAW KFF. Analyzed the data: BKC TAW KFF JK. Contributed reagents/materials/analysis tools: KF JK. Wrote the paper: BKC TAW KFF JK.


  1. Glaser CA, Gilliam S, Schnurr D, Forghani B, Honarmand S, et al. (2003) In search of encephalitis etiologies: diagnostic challenges in the California Encephalitis Project, 1998–2000. Clin Infect Dis 36: 731–742.
  2. Glaser CA, Honarmand S, Anderson LJ, Schnurr DP, Forghani B, et al. (2006) Beyond viruses: clinical profiles and etiologies associated with encephalitis. Clin Infect Dis 43: 1565–1577.
  3. Mailles A, Stahl J-P, on behalf of the Steering Committee and the Investigators Group (2009) Infectious Encephalitis in France in 2007: A National Prospective Study. Clin Infect Dis 49: 1838–1847.
  4. Gea-Banacloche J, Johnson RT, Bagic A, Butman JA, Murray PR, et al. (2004) West Nile virus: pathogenesis and therapeutic options. Ann Intern Med 140: 545–553.
  5. Chiu CY, Rouskin S, Koshy A, Urisman A, Fischer K, et al. (2006) Microarray detection of human parainfluenzavirus 4 infection associated with respiratory failure in an immunocompetent adult. Clin Infect Dis 43: e71–76.
  6. Kriesel JD, Hobbs MR, Jones BB, Milash B, Nagra RM, et al. (2012) Deep Sequencing for the Detection of Virus-Like Sequences in the Brains of Patients with Multiple Sclerosis. PLoS One 7: e3188.
  7. Relman DA (2012) Microbiology: Learning about who we are. Nature 486: 194–195.
  8. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.
  9. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, et al. (2002) The human genome browser at UCSC. Genome Res 12: 996–1006.
  10. Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7: 203–214.
  11. Van Rossum G, De Boer J (1991) Interactively Testing Remote Servers Using the Python Programming Language. CWI Quarterly 4: 283–303.
  12. Steel CD, Kim WK, Sanford LD, Wellman LL, Burnett S, et al. (2010) Distinct macrophage subpopulations regulate viral encephalitis but not viral clearance in the CNS. J Neuroimmunol 226: 81–92.
  13. Jones E, Oliphant T, Peterson P (2001) SciPy: Open Source Scientific Tools for Python.
  14. Warren RL, Sutton GG, Jones SJ, Holt RA (2007) Assembling millions of short DNA sequences using SSAKE. Bioinformatics 23: 500–501.
  15. Puchhammer-Stockl E, Popow-Kraupp T, Heinz FX, Mandl CW, Kunz C (1991) Detection of varicella-zoster virus DNA by polymerase chain reaction in the cerebrospinal fluid of patients suffering from neurological complications associated with chicken pox or herpes zoster. J Clin Microbiol 29: 1513–1516.
  16. Stevenson J, Hymas W, Hillyard D (2005) Effect of sequence polymorphisms on performance of two real-time PCR assays for detection of herpes simplex virus. J Clin Microbiol 43: 2391–2398.
  17. Wang D, Urisman A, Liu YT, Springer M, Ksiazek TG, et al. (2003) Viral discovery and sequence recovery using DNA microarrays. PLoS Biol 1: E2.
  18. Rota PA, Khan AS, Durigon E, Yuran T, Villamarzo YS, et al. (1995) Detection of measles virus RNA in urine specimens from vaccine recipients. J Clin Microbiol 33: 2485–2488.
  19. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.
  20. Saldanha AJ (2004) Java Treeview–extensible visualization of microarray data. Bioinformatics 20: 3246–3248.
  21. Patterson JB, Cornu TI, Redwine J, Dales S, Lewicki H, et al. (2001) Evidence that the hypermutated M protein of a subacute sclerosing panencephalitis measles virus actively contributes to the chronic progressive CNS disease. Virology 291: 215–225.
  22. Chiu CY, Alizadeh AA, Rouskin S, Merker JD, Yeh E, et al. (2007) Diagnosis of a critical respiratory illness caused by human metapneumovirus by use of a pan-virus microarray. J Clin Microbiol 45: 2340–2343.
  23. Greninger AL, Runckel C, Chiu CY, Haggerty T, Parsonnet J, et al. (2009) The complete genome of klassevirus - a novel picornavirus in pediatric stool. Virol J 6: 82.
  24. Lim ES, Reyes A, Antonio M, Saha D, Ikumapayi UN, et al. (2013) Discovery of STL polyomavirus, a polyomavirus of ancestral recombinant origin that encodes a unique T antigen by alternative splicing. Virology 436: 295–303.
  25. Zaki AM, van Boheemen S, Bestebroer TM, Osterhaus AD, Fouchier RA (2012) Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia. N Engl J Med 367: 1814–1820.