Deep Sequencing to Identify the Causes of Viral Encephalitis

Deep sequencing allows rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type 1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue.


Introduction
Current diagnostic methods used in cases of infectious encephalitis identify a specific microbiologic cause of the disease in <40% of cases. [1,2] Recent work suggests that a larger number of cases actually have an infectious etiology but are misdiagnosed. [3] PCR of cerebrospinal fluid (CSF) can be very helpful for identifying DNA viruses (e.g. herpes simplex virus type 1, HSV1), though it is less effective for the detection of RNA viruses (e.g. West Nile virus). [4] The efficacy of all PCR, culture, and antibody-dependent diagnostic methods is further limited by their requirement for specialized reagents and a priori knowledge of the pathogens to be tested. An incomplete panel of microbial candidates for specific testing can lead to false-negative results and missed opportunities for effective therapy. [5] Finally, validated PCR primers and protocols sometimes fail to identify known pathogens because of mutations in the primer-binding region, an issue previously addressed by our group in the detection of GB Virus C (GBV-C) in demyelinated human brain. [6] Deep sequencing offers the prospect of relatively unbiased testing for all previously catalogued and sequenced microbial pathogens in a single test. Whereas specific PCR, serology and culture focus on a defined set of candidate pathogens, deep sequencing provides a relatively unbiased survey of the RNA or DNA sequences present in a sample. Furthermore, this approach does not rely on microbial recovery and isolation, an important attribute given that the microbiome is diverse and, for the most part, cannot be readily cultured. [7] Limitations of the deep sequencing approach for diagnosing infections include the possible introduction of contaminating sequences into the preparation, difficulty identifying sequences not included in reference databases (e.g. GenBank), and uncertainty about the significance of rare sequences found within the sample.
These problems must be addressed by the use of appropriate controls and, where possible, metagenomic techniques.
In the current study, seven encephalitis cases and fourteen normal brain controls were sequenced and evaluated for the presence of viral sequences. Building upon our recent detection of a novel variant of GBV-C in the brain of an individual who died with primary progressive multiple sclerosis (PPMS), updated bioinformatics methods were used in the current study (Figure 1). [6] A pathogen was identified in each of the five samples that had a known or strongly suspected infectious etiology, and no pathogen was identified in the two samples without a suspected infectious etiology.

Ethics Statement
This research was submitted to the University of Utah Health Sciences IRB and, since it was performed on de-identified pathologic material, was found to be exempt from review and oversight.

Samples
Fourteen frozen normal control and 5 frozen encephalitis brain specimens were obtained from the Rocky Mountain and UCLA Brain Banks. Two additional frozen encephalitis specimens were obtained from Dr. Don Gilden at the University of Colorado. All the specimens were collected post-mortem within 20 hours of death, either fresh frozen or snap frozen in liquid nitrogen and were associated with a neuropathological diagnosis. All 7 diseased specimens were from subjects with encephalitis verified by neuropathology. The samples were assigned to one of two groups: controls (n = 14) and encephalitis (n = 7).

RNA Extraction and RNA-seq
RNA was extracted from frozen brain (volume ~10 mm³) using a Qiagen (Valencia, CA) RNeasy Blood and Tissue kit. RNA was extracted because all viruses use RNA at some point during their life cycle. The extracted RNA was DNase treated per the kit instructions and submitted for sequencing at the University of Utah Next Generation Sequencing Shared Resource Facility. Prior to sequencing, RNA was analyzed on an Agilent Bioanalyzer Nanochip (Agilent Technologies, USA) and evaluated for RNA size, abundance and integrity as previously described. [

Screening of Reads
Metagenomic analysis of the specimens was performed blind, that is, without the benefit of pathology reports or other diagnostic information. The sequence data sets were then screened for quality: FASTQ reads containing five or more positions with an Illumina quality score below 19 were removed, leaving High Quality (HQ) reads. The number of identical HQ reads for each distinct sequence was recorded in a compressed FASTA format to reduce file size and computing runtimes in subsequent analysis steps. Using the Bowtie program, each HQ read set was aligned to the human genome (NCBI build GRCh37.68) and human transcriptome. [8,9] Reads that aligned (using Bowtie) to the human genome, human transcriptome, human and mouse ribosomes, or ΦX174 (an internal sequencing control) were excluded from further analysis. Sequences that did not align to any of these databases were carried forward as "screened reads".
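The quality screen and read collapsing described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes Phred+33 quality encoding (Illumina 1.8+), and the collapsed-FASTA header layout (`>seqN_xCOUNT`) and all function names are hypothetical. The Bowtie subtraction step is not reproduced.

```python
from collections import Counter


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def is_high_quality(quality_line, min_q=19, max_low=4, offset=33):
    """True if at most max_low positions score below min_q.

    With the defaults, reads with five or more positions below Q19 fail.
    """
    low = sum(1 for q in phred_scores(quality_line, offset) if q < min_q)
    return low <= max_low


def read_fastq(lines):
    """Yield (sequence, quality) pairs from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield seq, qual


def collapse_hq_reads(fastq_lines):
    """Keep HQ reads only, then count how many times each sequence occurs."""
    return Counter(seq for seq, qual in read_fastq(fastq_lines)
                   if is_high_quality(qual))


def to_collapsed_fasta(counts):
    """One FASTA record per distinct HQ sequence; the header carries its count."""
    lines = []
    for i, (seq, n) in enumerate(counts.most_common(), start=1):
        lines.append(f">seq{i}_x{n}")  # hypothetical header layout
        lines.append(seq)
    return "\n".join(lines)


example = ["@r1", "ACGT", "+", "IIII",
           "@r2", "ACGT", "+", "IIII",
           "@r3", "TTTTTT", "+", "!!!!!I",  # five positions at Q0: removed
           "@r4", "GGGG", "+", "IIII"]
print(to_collapsed_fasta(collapse_hq_reads(example)))
```

Storing each distinct sequence once with its count is what makes the later alignment steps cheaper: a sequence seen thousands of times is aligned only once.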

Non-redundant Viral Database
The sets of screened reads were then aligned to sequences in a non-redundant viral database (NRVDB, http://fischer-lab.path.utah.edu/data/GBV-C/NR_ViroBank) using MegaBLAST. [10] The NRVDB was derived from 1,296,974 viral sequences in the GenBank database. It includes 579,282 unique viral sequence records, 31 bp to 1.2 million bp in length, representing 2480 different viral taxa. Using this database reduces redundant hits from overrepresented taxa such as HIV, Hepatitis C Virus and Influenza A Virus, which collectively comprise >50% of the total viral records within GenBank.
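The criterion used to make the NRVDB non-redundant is not described here. As an illustration only, the sketch below assumes that "non-redundant" means collapsing records with identical (case-insensitive) sequences to a single representative accession; the function name and the hashing step are hypothetical choices, and the real database may have used a different rule.

```python
import hashlib


def build_nonredundant(records):
    """Collapse records whose sequences are identical (case-insensitive).

    records: iterable of (accession, sequence) pairs, in priority order.
    Returns {representative_accession: (sequence, [merged accessions])};
    the first accession seen for a sequence becomes its representative.
    """
    representative = {}  # sequence digest -> representative accession
    nr = {}
    for accession, seq in records:
        norm = seq.upper()
        # A fixed-size digest keeps the lookup table small for long genomes.
        digest = hashlib.sha256(norm.encode("ascii")).hexdigest()
        if digest in representative:
            nr[representative[digest]][1].append(accession)
        else:
            representative[digest] = accession
            nr[accession] = (norm, [accession])
    return nr


records = [("A1", "acgtacgt"), ("A2", "ACGTACGT"), ("B1", "GGCCGGCC")]
nr = build_nonredundant(records)
print(f"{len(records)} records -> {len(nr)} non-redundant records")
```

Under this assumption, a heavily resequenced taxon contributes one record per distinct sequence rather than one per submission, which is the effect on hit counts that the paragraph above describes.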

Determination of Hit-Rates and P-Values
Using MegaBLAST with a word size of 28, individual reads that aligned to the NRVDB were considered hits. The normalized hit rate (NHR) of every sample (encephalitis and controls) for each viral taxon was calculated by dividing the number of sequences that aligned to the taxon by the number of screened reads obtained for the sample. To judge the relative enrichment of virus-like sequences, the NHR of each encephalitis sample was compared with the NHR distribution of the control samples, for every viral taxon, as previously described. [6] Using custom software written in Python with SciPy tools, a Z-test was used to quantify the statistical significance of any viral-taxon overrepresentation in the encephalitis brain samples compared with controls. [11,12,13]

Taxonomy-Based Bioinformatics Follow-Up
Taxa with Bonferroni-corrected p-values ≤ 0.01 were analyzed further. MegaBLAST was used to align the screened reads to comprehensive sequence databases of each taxon of interest. Following alignment, contiguous sequences (contigs) were assembled with SSAKE from the reads that aligned to sequences in each taxon of interest. [14] All alignments and contigs were then manually examined to determine whether they represented human sequence within the taxon-specific database. This determination was based on alignments to the NCBI NR database (MegaBLAST) and on examination of the annotations of the GenBank records of the aligning taxon-specific sequences.
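The normalized-hit-rate comparison described under Determination of Hit-Rates and P-Values can be sketched with the Python standard library. This is an assumed reconstruction rather than the authors' software: it treats the test as a one-sided (upper-tail) Z-test of a single encephalitis NHR against the mean and sample standard deviation of the control NHRs, uses `math.erfc` in place of SciPy's normal survival function, and applies a Bonferroni correction capped at 1. Function names are hypothetical.

```python
import math
from statistics import mean, stdev


def normalized_hit_rate(taxon_hits, screened_reads):
    """NHR: reads aligning to one taxon divided by the sample's screened reads."""
    return taxon_hits / screened_reads


def z_test_upper(sample_nhr, control_nhrs):
    """One-sided Z-test of one sample's NHR against the control NHRs.

    Returns (z, p), where p = P(Z >= z) under a standard normal.
    """
    mu = mean(control_nhrs)
    sd = stdev(control_nhrs)  # sample standard deviation of the controls
    if sd == 0:
        raise ValueError("control NHRs have zero variance; z is undefined")
    z = (sample_nhr - mu) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # standard normal survival function
    return z, p


def bonferroni(p, n_comparisons):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


N_COMPARISONS = 7 * 2480  # encephalitis samples x viral taxa = 17,360
```

For example, `bonferroni(z_test_upper(nhr, controls)[1], N_COMPARISONS)` would give a corrected p-value on the same 17,360-comparison basis reported under Bioinformatic Analysis, to be compared against the 0.01 follow-up threshold. How taxa with no control hits (zero variance) were handled is not stated in the source; the sketch simply raises an error in that case.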

Virus-Specific Amplification
All primer sequences used in this study are given in Document S1. RNA and DNA were re-extracted from the encephalitis and control brain samples (RNeasy Lipid Tissue and DNeasy Blood and Tissue kits; Qiagen, Valencia, CA) in preparation for VZV- and HSV-specific PCR and measles-specific RT-PCR. HSV and VZV PCR reactions were performed as previously described. [15,16] An additional set of HSV1 primers was designed directly from the RNA sequencing data; reaction conditions were the same as previously described. [16] HSV1 strain 17 syn+ (originally obtained from Dr. James Hill, Louisiana State University), diluted to 10^3 plaque-forming units/ml, was used as the positive HSV control. The VZV positive control material, used undiluted, was from a VZV+ MeWo cell culture (kindly provided by Dr. Don Gilden, University of Colorado). Negative control reactions substituted water for the nucleic acid extracts. Ethidium bromide-stained 1.5% agarose gels were used to visualize the resulting PCR products.
Measles virus positive control material was RNA derived from the live-attenuated measles vaccine (MMR-II, Merck & Co, Whitehouse Station, New Jersey). Measles virus RNA was extracted from one full dose of the vaccine (0.5 ml, ~1000 pfu; Qiagen RNeasy, Valencia, CA). A random double-stranded cDNA amplicon library was generated using a modified Round A/B protocol (Document S1). [17] Four µl of extracted measles RNA was used as the Round A input, and the resulting Round B library was used, undiluted, as the measles positive control. The negative control reaction substituted water for the nucleic acid extracts. The RT-PCR method of Rota et al. was used to detect measles virus in the experimental and control specimens. [18] Ethidium bromide-stained 1.5% agarose gels were used to visualize the resulting RT-PCR products. The PCR and RT-PCR products were purified and Sanger sequenced to confirm the identity of the amplicons (HSV1 or measles).

Deep Sequencing
Fifty to 90 million HQ 50 bp reads were obtained from each of the 7 encephalitis samples and 14 normal brain samples, representing RNA present in these brain specimens at the time of collection. Removing reads that aligned to the human genome, transcriptome (NCBI build GRCh37.68), or ribosomes resulted in 199,666 to 907,362 (mean ± SD = 749,443 ± 248,337) screened reads in the encephalitis samples and 216,651 to 2,342,726 (mean ± SD = 944,680 ± 679,117) in the control samples (detailed in Table 1). [19]

Bioinformatic Analysis
For each of the seven encephalitis samples, 2480 taxa were evaluated, giving 17,360 comparisons. A heat map with color intensity representing the negative log-transformed, Bonferroni-corrected p-values was prepared with Java TreeView (Figure 1). [20] Z-test comparisons of the encephalitis and control brain samples revealed significant viral taxon enrichment in 170 taxon-sample pairs. Each significant pair was systematically evaluated by alignment to both a taxon-specific database and the human genome. A total of 134 taxon-sample pairs were found to result from reads that aligned to the human genome, including human endogenous retroviruses, using the lower-stringency alignment protocol. These pairs were excluded from further analysis. The remaining 36 taxon-sample pairs were distributed among the 7 encephalitis brain samples (Table 2). These 36 pairs all had p-values < 10^-8.5 after adjustment for multiple comparisons, indicating enrichment. They represented several viral families (number of significant pairs in parentheses): Herpesviridae (17), Paramyxoviridae (10), Poxviridae (6), Hepeviridae (2) and Flaviviridae (1).
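The heat-map input described under Bioinformatic Analysis (negative log-transformed, Bonferroni-corrected p-values for each taxon-sample pair) can be sketched as follows. This is a hypothetical helper, not the file actually loaded into Java TreeView, whose own input formats are not reproduced here; the `floor` parameter is an assumption added to avoid taking the logarithm of a p-value that underflows to zero.

```python
import math

N_COMPARISONS = 7 * 2480  # 7 encephalitis samples x 2480 viral taxa


def neg_log10_matrix(corrected_p, floor=1e-300):
    """Build a taxa x samples matrix of -log10(corrected p) for a heat map.

    corrected_p maps (sample, taxon) -> Bonferroni-corrected p-value; missing
    pairs are treated as p = 1 (cell value 0). Values >= 2 mean p <= 0.01.
    """
    samples = sorted({s for s, _ in corrected_p})
    taxa = sorted({t for _, t in corrected_p})
    matrix = [
        [-math.log10(max(corrected_p.get((s, t), 1.0), floor)) for s in samples]
        for t in taxa
    ]
    return samples, taxa, matrix


# Hypothetical p-values for illustration only.
example = {("S1", "HSV1"): 1e-10, ("S1", "MV"): 1.0, ("S2", "HSV1"): 0.5}
samples, taxa, matrix = neg_log10_matrix(example)
```

On this scale the 10^-8.5 bound reported above corresponds to cell values above 8.5, so the 36 retained pairs would all appear as strongly colored cells.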

Taxon-Specific Follow-Up
Assembly of reads that aligned to the taxon-specific follow-up databases resulted in apparently viral contigs ranging from 66 to 4019 bp long in 5 of the 7 encephalitis samples (Table 2). These contigs were re-aligned to the taxon-specific database as well as to the human genome with MegaBLAST.
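SSAKE itself is a progressive k-mer search and 3' read-extension assembler; the sketch below is a much simpler stand-in that illustrates the general idea of greedy 3' extension by overlapping reads. It is not SSAKE's algorithm: it extends only toward the 3' end, uses exact suffix-prefix matches of at least `min_overlap` bases, ignores reverse complements and sequencing errors, and all names are hypothetical.

```python
def best_overlap(contig, read, min_overlap):
    """Longest suffix of contig equal to a prefix of read; 0 if < min_overlap."""
    for ov in range(min(len(contig), len(read)) - 1, min_overlap - 1, -1):
        if ov > 0 and contig.endswith(read[:ov]):
            return ov
    return 0


def greedy_assemble(reads, min_overlap=16):
    """Greedy 3' extension assembly (a simplified illustration, not SSAKE)."""
    pool = sorted(set(reads))  # unique reads in a deterministic order
    contigs = []
    while pool:
        contig = pool.pop(0)  # seed a new contig with the next unused read
        while True:
            # Reads already contained in the contig add nothing; drop them.
            pool = [r for r in pool if r not in contig]
            scored = [(best_overlap(contig, r, min_overlap), r) for r in pool]
            scored = [(ov, r) for ov, r in scored if ov > 0]
            if not scored:
                break
            ov, read = max(scored)  # longest overlap; ties -> larger read string
            contig += read[ov:]
            pool.remove(read)
        contigs.append(contig)
    return contigs


reads = ["CCGGGTTTAC", "AAACCCGGGT", "GTTTACGCAT", "AAACCCGGGT"]
print(greedy_assemble(reads, min_overlap=4))
```

Because reads that fail to overlap simply seed new contigs, a sparse or patchy set of viral reads yields several short contigs rather than one, consistent with the wide range of contig lengths reported above.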
Groups of closely related taxa were identified as significant using this method. For instance, sequence records from several paramyxovirus family members were significantly associated with samples Co-A and Co-B. Considering sequence homologies and the human origin of the samples, multiple viral taxa (canine distemper virus, rinderpest virus, cetacean morbillivirus) were excluded from specific PCR follow-up, and the most closely related human virus (also significantly overrepresented) was used for PCR validation. Furthermore, each alignment was examined manually to determine whether the aligning GenBank record (GI) contained annotated human sequence, or whether it aligned to a human sequence not found in the canonical human genome and human transcriptome. These results, and the viral candidates chosen for specific PCR follow-up, are shown in Table 2.

PCR Confirmation of Viral Sequence
HSV1 sequences were confirmed by Sanger sequencing of PCR products obtained from brain samples 710 and 4403, both from subjects with herpes encephalitis according to their neuropathology reports (Figure 2, Table 3). Samples Co-A and Co-B were from subjects with subacute sclerosing panencephalitis (SSPE), according to their associated pathology reports. The deep sequencing analysis indicated the presence of measles virus (MV), which was confirmed by specific RT-PCR (Figure 2). Furthermore, the depth of sequencing coverage in Co-A and Co-B was sufficient to identify, in the genome of each MV strain, mutations known to be present in SSPE-causing isolates. [21] Deep sequencing did not identify any viral candidates in samples 1418 and 4471, and specific PCR and RT-PCR for HSV1, VZV, and MV produced no amplicons in these samples (Table 3).
In four of five cases, virus calls from the bioinformatic analysis of the deep sequencing data and the subsequent pathogen-specific amplification were concordant with the prior clinical diagnoses. However, our metagenomic analysis did not reveal the presence of VZV in sample 924, for which VZV had been suspected on neuropathological examination; HSV1 sequences were detected instead. The HSV primers used successfully on samples 710 and 4403 [16] failed to yield a PCR product with sample 924. Based on the deep sequencing contig alignments, a novel set of primers was designed, and a product of the expected size (~800 bp, data not shown) was obtained using methods previously described. [16] The sequence of the product was 100% identical to the major capsid protein gene (UL19) of HSV1 McKrae. Thus, the HSV1 present in the brain of sample 924 may represent a strain with mutations in the region of the DNA polymerase gene amplified by the published diagnostic PCR primers. DNA extracted from each of the 14 control brain specimens was also interrogated with the specific HSV1 and VZV primer sets, and no product was obtained for any of the controls (data not shown). Likewise, RT-PCR interrogation of the 14 control specimens for MV yielded no amplicons.

Viral taxon (species) hit rates from the NRVDB were compared between encephalitis brain samples (N = 7) and control brain samples (N = 14) using the Z-test. Displayed are viral taxa for which the corrected P < 10^-6 for at least one of the encephalitis samples. Reads aligning to comprehensive taxon-specific databases were examined to determine whether they were of viral or human origin (see Methods). Hits to each taxon were assembled into contiguous sequences (contigs) using SSAKE. [14] The GenBank records with the greatest number of alignments were selected, and % GI covered is the percentage of the GI covered at ≥1X. Taxa for which contigs could be formed (i.e. HSV1 and measles) were subjected to further analysis by pathogen-specific amplification.
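The "% GI covered" value described above is a breadth-of-coverage measure: the percentage of positions in a GenBank record covered by at least one aligned read or contig (≥1X). A minimal sketch, assuming alignment spans are given as 0-based, end-exclusive reference coordinates; the function name is hypothetical:

```python
def percent_covered(intervals, ref_length):
    """Percent of reference positions covered at >= 1X.

    intervals: (start, end) alignment spans, 0-based and end-exclusive.
    Overlapping spans are merged so no position is counted twice.
    """
    covered = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        start, end = max(0, start), min(ref_length, end)  # clip to reference
        if start >= end:
            continue
        if run_end is not None and start <= run_end:
            run_end = max(run_end, end)  # extend the current merged run
        else:
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
    if run_end is not None:
        covered += run_end - run_start
    return 100.0 * covered / ref_length
```

For instance, spans (0, 50), (25, 75) and (100, 150) on a 200 bp record merge to 125 covered positions, or 62.5%. Depth (how many times each position is covered) is deliberately ignored, matching the ≥1X definition.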

Discussion
This study demonstrates the utility of deep sequencing for identifying viral etiologies in encephalitis. As observed previously, validated PCR primers sometimes fail to amplify the agents against which they have been validated. [5,22] Sample 924 was found to contain HSV1, yet diagnostic HSV primers failed to yield a product. PCR primers derived directly from the metagenomic sequence obtained via deep sequencing were eventually found to be effective. These PCR experiments were intended only to validate the deep sequencing results; they are not presented as an advance in diagnostic PCR reagents for these viruses.
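One way to screen for the kind of primer failure seen with sample 924 is an in silico PCR check of a primer pair against assembled contigs. The sketch below is illustrative only: the primer and template sequences are hypothetical (the study's primers are listed in Document S1 and are not reproduced), and the model is deliberately crude, counting ungapped mismatches without weighting the 3' end, melting temperature or the second template strand.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count positions that differ between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


def binding_sites(template, primer, max_mm):
    """Start positions where primer aligns ungapped with <= max_mm mismatches."""
    k = len(primer)
    return [i for i in range(len(template) - k + 1)
            if mismatches(template[i:i + k], primer) <= max_mm]


def predicted_amplicons(template, fwd, rev, max_mm=0):
    """Predicted product sizes (bp) for a primer pair on a template strand.

    fwd is matched as written; rev is given 5'->3' on the opposite strand,
    so its reverse complement is searched on the template.
    """
    rev_site = revcomp(rev)
    fwd_sites = binding_sites(template, fwd, max_mm)
    rev_sites = binding_sites(template, rev_site, max_mm)
    return sorted(r + len(rev_site) - f
                  for f in fwd_sites for r in rev_sites if r >= f)


# Hypothetical primers and contig, not the study's sequences.
FWD, REV = "ACGTACGTAA", "AATTCCATGG"
contig = "GGGG" + FWD + "T" * 20 + revcomp(REV) + "GGGG"
mutant = contig.replace("ACGTACGTAA", "ACGAACGTAA")  # one substitution in the fwd site
```

Comparing `max_mm=0` with `max_mm=1` on `mutant` shows how a single substitution in a primer site can eliminate a predicted product under strict matching, while the same template still carries the target region.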
In four of five cases, neuropathology reports agreed with the deep sequencing results. Given that no specific reagents were used in these neuropathological examinations, this may be considered remarkable agreement. However, the limited specificity of these examinations was illustrated by the discordant result for sample 924, where VZV was suspected but HSV1 was detected by deep sequencing and validated by PCR.
In samples with high titers of virus, the coverage obtained from deep sequencing experiments is sometimes sufficient to elucidate many genomic features of the infectious agent. For example, in samples Co-A and Co-B, the signature genomic mutations associated with defective, SSPE-causing virus were easily observed.
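Detecting signature mutations from deep sequencing coverage amounts to comparing the majority base at each position with a reference sequence. The sketch below assumes a simple pileup (a mapping from 0-based reference position to the read bases observed there); the depth and allele-fraction thresholds are arbitrary illustrative defaults, not values from this study, and the function name is hypothetical. Real variant callers also model base quality and strand bias.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def consensus_substitutions(reference, pileup, min_depth=5, min_fraction=0.8):
    """Report positions where the majority read base differs from the reference.

    pileup maps a 0-based reference position to the list of read bases observed
    there. Returns (position, ref_base, alt_base, depth, fraction) tuples.
    """
    calls = []
    for pos in sorted(pileup):
        bases = [b.upper() for b in pileup[pos] if b.upper() in VALID_BASES]
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to call a consensus
        alt, count = Counter(bases).most_common(1)[0]
        fraction = count / depth
        ref_base = reference[pos].upper()
        if alt != ref_base and fraction >= min_fraction:
            calls.append((pos, ref_base, alt, depth, fraction))
    return calls


# Hypothetical reference and pileup for illustration.
reference = "ACGTACGT"
pileup = {1: ["T"] * 9 + ["C"], 2: ["G"] * 10, 5: ["A"] * 3}
print(consensus_substitutions(reference, pileup))
```

The depth filter is why high viral titers matter here: positions covered by only a few reads are skipped rather than called.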
Deep sequencing viral detection methods rely upon alignment of the experimental sequences to known viral sequences. This work used the viral nucleotide sequences in GenBank. Using this approach, identification of a previously unknown virus in a sample is possible only if one or more related viruses are present in the database. The non-redundant viral database used here is comprehensive by current (2013) standards, but will need to be updated as new records are added to GenBank.
Deep sequencing technologies are changing rapidly and the analysis of the deep sequencing data is becoming more efficient. As sequencing technologies improve and become more cost effective, their application in clinical diagnostics may become commonplace, especially for the identification of pathogens. [23,24,25] This study demonstrates the feasibility of using deep sequencing to identify viral causes of encephalitis.

Supporting Information
Document S1 Specifies the PCR primer sets used in this study. (DOCX)