Sensitive Detection and Simultaneous Discrimination of Influenza A and B Viruses in Nasopharyngeal Swabs in a Single Assay Using Next-Generation Sequencing-Based Diagnostics

Reassortment of 2009 (H1N1) pandemic influenza virus (pdH1N1) with other strains may produce more virulent and pathogenic forms, detection and their rapid characterization is critical. In this study, we reported a “one-size-fits-all” approach using a next-generation sequencing (NGS) detection platform to extensively identify influenza viral genomes for diagnosis and determination of novel virulence and drug resistance markers. A de novo module and other bioinformatics tools were used to generate contiguous sequence and identify influenza types/subtypes. Of 162 archived influenza-positive patient specimens, 161(99.4%) were positive for either influenza A or B viruses determined using the NGS assay. Among these, 135(83.3%) were A(H3N2), 14(8.6%) were A(pdH1N1), 2(1.2%) were A(H3N2) and A(pdH1N1) virus co-infections and 10(6.2%) were influenza B viruses. Of the influenza A viruses, 66.7% of A(H3N2) viruses tested had a E627K mutation in the PB2 protein, and 87.8% of the influenza A viruses contained the S31N mutation in the M2 protein. Further studies demonstrated that the NGS assay could achieve a high level of sensitivity and reveal adequate genetic information for final laboratory confirmation. The current diagnostic platform allows for simultaneous identification of a broad range of influenza viruses, monitoring emerging influenza strains with pandemic potential that facilitating diagnostics and antiviral treatment in the clinical setting and protection of the public health.


Introduction
Influenza viruses belonging to the family Orthomyxoviridae, are classified into influenzavirus A, B, and C. While influenza A and C viruses infect humans and many other species, influenza B viruses only infect humans and seals [1,2,3]. The Influenza virus genome consists of eight negative single-stranded RNA segments encoding eleven proteins: hemagglutinin (HA), neuraminidase (NA), nucleoprotein (NP), polymerase proteins (PB1, PB2, and PA), matrix proteins (M1 and M2), and nonstructural proteins (NS1 and NS2) [4]. Based on reactivity with surface glycoproteins HA and NA, influenza A viruses are classified into eighteen HA subtypes (H1-H18) and eleven NA subtypes (N1-N11) with a total of 144 subtypes possible, while influenza B virus has no subtypes [4,5,6,7,8]. New influenza viral variants emerge either due to mutations in the virus surface glycoproteins (antigenic drift) or reassortment between viral gene segments from different strains (antigenic shift). While, seasonal influenza outbreaks are thought to arise primarily due to antigenic drift, pandemics are thought to result from complex reassortment events among swine, human, and avian reservoirs. Some of them have been reported to directly infect humans, or transmit avian influenza A viruses from domestic poultry to human [4,9,10]. In addition, influenza viruses normally circulate in pigs are called "variant" viruses when found in people. There was a large outbreak of influenza A(H3N2) variant virus (H3N2v) during the 2010-13 seasons in U.S. and a highly pathogenic avian influenza (HPAI) H5N1 virus was identified in early 2014 in North America [10]. According to a USDA report in January 21 2015, HPAI H5N1, H5N2, and H5N8 viruses, which can be potentially transmitted to domestic poultry population leading to an outbreak [11], have been detected in U.S. wild birds in the winter of 2015 [12]. Identification of H7N9, H10N8 and other viruses in humans outside the U.S. has also been reported [9,13]. Reassortment of a 2009 H1N1 pandemic influenza A virus (pdH1N1) and influenza A(H3N2) variant virus (H3N2v) with other circulating seasonal strains in the U.S. can produce virus variants with transmissibility and altered virulence for humans [14,15]. Multiple influenza types/subtypes usually circulate during each season, for example, pdH1N1 were the predominant viruses circulating in December 2013 with fewer influenza B viruses, but influenza B virus became the predominant circulating virus by late March 2014, while seasonal A(H3N2), novel H3N2v, and mixed A/B viruses infections were also identified during the season in the U.S. [16].
It was reported that rapid influenza diagnostic tests showed low and variable sensitivity [17,18] depending on the influenza A subtype [19], while some FDA-cleared diagnostic tests did not detect the novel H3N2v viruses [20]. Diagnosis of influenza infection from an unknown risk respiratory specimen involves multiple assays [17,21]. Identification of the first case of A(H5N1) virus infection in North America involved multiple laboratories [10]. Current methods that evaluate antiviral resistance require two independent tests for the S31N substitutions conferring amantadine-resistance [22,23] and the H274Y substitution conferring oseltamivir-resistance [22,24]. Conventional methods used by diagnostic or reference laboratories for characterization of novel influenza viruses are ineffective since the tests are based on a small conserved region of each HA, NA, and M genes for detection and subtyping, requires the use of many sets of primers, steps and multiple reactions. In addition, this approach may be ineffective for newly emerging influenza subtypes due to mismatch of PCR primers and culture-based genome sequencing which requires high biosafety levels, preventing its application in clinics. Next-generation sequencing (NGS) analysis has been previously reported to greatly improve manipulation procedure for identification of influenza virus strains, these studies focused on examining influenza A [25,26,27] [25], and multiple [28] primers. None of these studies reported the use of a paired set of degenerate universal primers to simultaneously identify whole-genome sequences of influenza A and B viruses from a large set of clinical specimens using an NGS assay. We previously reported development of a genomic nanomicroarray and NGS combination approach for influenza virus screening and laboratory confirmation [29]. Here, we extended the NGS approach to potential application for influenza virus surveillance, whole-genome characterization of putative antiviral resistance markers and virulence factors that may confer high virulence in human hosts. We developed a novel universal RT-PCR-NGS platform by testing 162 clinical influenza-positive specimens and demonstrated a "one-size-fits-all" approach to simultaneously identify unknown influenza infections and co-infections in a single tube reaction per sample. The analytical sensitivity of the NGS platform was also evaluated in our studies.

Viruses, Clinical Specimens and Tests
Pre-titrated stocks of reference strains of influenza viruses were procured from Dr. Stephen Lindstrom (Centers for Disease Control and Prevention, CDC, Atlanta, GA) or cultured in embryonated chicken eggs at US Food and Drug Administration (FDA). Clinical nasopharyngeal swab specimens were collected from patients with symptoms of influenza-like illness, submitted during the 2010-11 and 2012-13 influenza seasons to the Clinical Virology Laboratory (CVL), Yale New Haven Hospital, New Haven, CT. These specimens were tested as requested by the patients' physicians using a direct fluorescent antibody (DFA) immunoassay (Simul-Fluor Respiratory Virus Screen Reagent, Millipore, Billerica, MA) and/or by quantitative RT-PCR (RT-qPCR), as previously described [21]. A total of 162 nasopharyngeal swab specimens that tested as influenza A-and/or B-positive were de-identified and sent to the Laboratory of Molecular Virology (LMV) at FDA (Institutional Review Board (IRB) approval# Research Involving Human Subjects Committee (RIHSC):13-048B) for sequencing (Fig 1). All participants, or their parents or guardians, provided written informed consent when their specimens collected in Yale New Haven Hospital.

RNA Extraction and Universal RT-PCR
The RNA extraction, some primers, and Reverse Transcription-PCR (RT-PCR) procedures were described previously [29] and in S1 File. According to previous reports and our testing algorithm using multiple alternative options [29,30,31,32], a new set of degenerate universal primers for RT (uniflu, 5'-IAGCARAAGC -3') and fusion primers for PCR (unifluF, 5'-ACGACGGGCGACAIAGCARAAGC-3'; unifluR, 5'-ACGACGGGCGACAAGTAGWAACA-3') were designed and tested. The 13bp flanking sequences italicized were added at the 5' end of each PCR primer to enhance the annealing temperature and achieve high fidelity and yield in PCR amplification. These primers target highly conserved regions at the 5'-and 3'-terminus of each segment of influenza virus to amplify the whole-genome concomitantly, and have coverage of sequence variants for influenza A, B, and C viruses to achieve improvements in analytical sensitivity and inclusivity.

MiSeq Sequencing
Sample preparation and bioinformatics data analysis was performed as described [29]. Briefly, the concentration of mega-amplicons were measured by using the Qubit dsDNA BR Assay System (Covaris, Woburn, MA, USA), and 1 ng of DNA product was processed for NGS sample preparation by using a Illumina Nextera XT DNA Sample Preparation Kit. Mega-amplicons for each specimen were internally marked with dual-index primers (S2 Fig). Up to 96 specimens were barcoded and run in one lane of the sequencer. NGS was performed using a MiSeq v2 kit (500 cycles) to produce 2x250 paired-end reads (Illumina, San Diego, CA). After automated cluster generation in MiSeq, the sequencing was processed and genomic sequence reads obtained.

De novo Assembly Strategy
Sequencing reads of approximately 250bp in length were trimmed and the sequence data verified using FastQC software prior to de novo assembly. Contigs for individual gene segments  from influenza viruses were generated using the de novo assembly module on the CLC bio software (v6.0.6, CLC bio, Cambridge, MA). The parameter for mapping reads back to contiguous was set on as similarity fraction 0.9; length fraction 0.5; mismatch cost 2; insertion cost 3 and deletion cost 3. For characterization of influenza whole-genomes with high confidence coverage, minimum contig length was set at 800bp to assemble the consensus sequences and minimum coverage was 1000 reads [33]. For diagnosis of influenza virus infections, minimum contig length used was 200bp and higher lengths for 2x reads per base in de novo assembly (low confidence coverage, Fig 1 and S3 Fig) [34].

Bioinformatics Data Analysis
A master file containing all unique contigous sequences of each mega-amplicon was generated and used to perform an all-by-all blast search in Influenza Research Database (IRD, www. fludb.org), the Global Initiative on Sharing All Influenza Data (GISAID, platform.gisaid.org) and NCBI database. The match with the highest score was retained to identify the specific influenza genome. Assembled sequences were aligned by the CLUSTAL W program, and phylogenetic analysis was performed using MEGA 5 and the neighbor-joining method [35]. The amino acid sequence for each segment identified was generated and aligned using the Vector NTI Advance package (vÁ11Á5Á2, Life Technologies, Grand Island, NY). The serotype and genotype according to the genetic lineage of influenza A viruses were determined using Flu-Genome [36].

Coverage of Depth and Breadth
The depth of sequence coverage (DOC) was calculated using formulae LN/G, where L is the read length, N is the number of reads and G is the theoretical genome length. The breadth of sequence coverage (BOC) was calculated as percentage of actual testing contig length (!1000 reads or !30x

Sequence Accession Numbers
The newly reported influenza genome sequences are available in GenBank under the following accession numbers: KM654599-KM654931, and KM654933-KM654980.

NGS Detects Influenza Whole-Genome Segments
To determine the sensitivity of NGS detection, ten-fold serial dilutions of H5N1 viral RNA (1.8x10 9 TCID 50 /mL) were first produced followed by RT-PCR exponential amplification for whole-genome segments. As shown in Fig 2A, 10 −7 dilution of A(H5N1) viral RNA (1.8x10 3 TCID 50 /mL) was detected in the RT-PCR assay shown as multiple faint visible bands. A total of ten PCR mega-amplicons from the dilutions were next tested using MiSeq, multiple contigs were finally automatically generated for each mega-amplicons, and all contig sequences were correctly identified as genome of A/Viet Nam/1203/2004(H5N1) virus. The overall average of DOC was 4,715. The average of BOC for each identified contig was 96% (!30x depth) indicating that the near full-length genome sequence was covered ( Table 1). Analysis of reads at each dilution level showed that the whole-genome DOC decreases from 4,073 in 10 −1 dilution to 1,167 in 10 −9 dilution (Fig 2B), and the whole-genome BOC decreases from 98% to 11% at the 10 −9 dilution. We observed that the NP gene (segment 5) represented 99% of BOC at all detected titration points with the lowest detectable level at 10 −9 dilution (1.8x10 1 TCID 50 /mL) with high confidence coverage ( Fig 2C).

Molecular Confirmation of Simultaneous Detection of Influenza A and B Viruses in a Single Specimen/Test
Since some influenza related hospitalized cases were found to be associated with mixed A and B virus co-infections [1,16], analytical inclusivity studies were performed next by using a new set of degenerate universal primers to achieve improved coverage for varieties of both influenza A and B viruses. We tested A/Fujian Gulou/1896/09(H1N1), A/Perth/16/2009(H3N2), and B/ Wisconsin/01/2010 reference viruses. Total RNA was extracted from spiked samples containing either individual or a mixture of two strains 1:1, and RT-PCR was performed. After the NGS test and following de novo assembly, eight contigs were generated for A(H1N1), A (H3N2), and B viruses, respectively. Fifteen and sixteen contigs were generated from the mixed A(H1N1), and A(H3N2) with B viruses supported by high confidence coverage, respectively (S1 Table). A validation assessment for these contigs from mixed viruses confirmed that the genome sequences are identical to the gene segments of the A(H1N1), A(H3N2), or B viruses, respectively. This is the first report of analytical validation of degenerate universal primers that can simultaneously amplify influenza A and B viruses for NGS identification and discrimination of influenza viruses in a single sample/test.

Detection of Influenza Virus in Nasopharyngeal Swab Specimens
By using the aforementioned strategy, we next tested 162 clinical nasopharyngeal swab specimens (Fig 1) Table 2 to represent the types/subtypes for influenza viruses. Results of NGS data analysis representing full-length sequences were listed in S2 Table. The average whole-genome DOC for these viruses was between 0.1x10 3 and 13.7x10 3 , and average wholegenome BOC was between 13% and 100% supporting high confidence coverage.

NGS Assay Shows High Detection Sensitivity
When we evaluated analytical sensitivity of the NGS method and compared it with other clinical tests (S3 Table), the criteria of minimum coverage length of 200bp and more than 2 reads was used to assemble contigs mapping to the influenza genome. Both M (278bp/810 reads) and NS (716bp/16 reads) genes were identified at the 10 −10 dilution level in the reference H5N1 strain indicating highly sensitive detection. Likewise, we further re-analyzed ten universal RT-PCR negative specimens and 19 universal RT-PCR positive but NGS negative specimens when using the criteria of minimum coverage length of 800bp and 1000 reads (Fig 1). The NGS data showed that multiple partial influenza genomic sequences were identified in each of 28 specimens representing different influenza genomic profiles (S3 Table) and confirmed the presence of 24 A(H3N2) and two A(pdH1N1) infections. There were two cases identified as A (H3N2) and A(pdH1N1) viruses co-infections (Flu087 and Flu185). The specimen Flu175 contained full-length segments for M (1033bp/3033 reads) and NS genes (920bp/1203 reads), while Flu177 contained full-length gene sequence for the M gene (1031bp/3645 reads).

Overall Reporting of Influenza A and B Virus Infections
A total of 27.4 million sequence reads representing a total length of 1.73 million base-pairs were obtained from 162 clinical specimens that were processed (S4 Table). Of these, approximately 26 million reads (95%) matched the influenza virus genome representing 78% of total read length with an average of 30,000 reads per contig from a total of 867 contigs. 99.4% (161/ 162) of specimens were identified to contain influenza genomes and their types/subtypes were determined. Among the influenza positive specimens, 83.3% (135/162) were seasonal A (H3N2), 8.6% (14/162) were A(pdH1N1), and 6.2% (10/162) were influenza B viruses. Two cases (1.2%) were identified as A(H3N2) and A(pdH1N1) virus co-infections, and influenza genome was not found in specimen Flu199. Among 123 influenza A viruses with wholegenome sequence data supported by high confidence reads (S5 Table), we obtained 83.6% (823/8 x123) contigs in total. The highest DOC and BOCs were observed for the HA, NP, NA, M, and NS genes with each demonstrating 99-100% coverage. The minimum average DOC observed for the PB1 gene was 1,467. The frequency of being detected by the RT-PCR-NGS assay was highest for gene segment M, followed by NS, NP, PA, NA, HA, PB1, and PB2. Genotype results and representative phylogenetic trees for HA, NA, and M genes are described in S1 Fig. The details of key signature amino-acid mutations from 47 selected influenza A viruses are summarized in Table 3, The molecular biomarkers for pandemic risk, drug-resistant-, transmission-associated signature mutations, and virulence factors were assessed and further described in the discussion section. We concluded that we have simultaneously and correctly identified genetically distinct lineages of influenza A(H3N2 and pdH1N1) and B viruses in clinical specimens.

Discussion
We performed NGS as part of a new detection platform for diagnosis of influenza virus infections using clinical specimens. A method for identification and discrimination of functional significance of diverse influenza viruses has been established. We report (i) multiplexed  Table 3. Comparison of the amino-acid substitutions in the proteins of 47 influenza A (H3N2, and pdH1N1) viruses. Subtype  PB2  PB1  PA  NP  NA  M2  NS1   199  475  627  327  100  356  409  100  119  152  11  20  31  92 Flu042 ) detection and simultaneous discrimination of influenza virus infections in nasopharyngeal swabs and identification of genetically distinct lineages of influenza A(H3N2 and pdH1N1) and B viruses in a single sequencing run, (ii) identification of influenza virus co-infections in a single specimen/test, (iii) development of new degenerate universal primer pairs for identification of influenza viruses and demonstration of highly sensitive detection of influenza virus infections using the NGS assay. We proposed a "one-size-fits-all" approach using an NGS diagnostic platform for extensive identification of a broad range of influenza virus infections including co-infections, by revealing their comprehensive genetic context and providing matrix reporting information for final sequence-based confirmation. A total of 162 influenza-positive de-identified nasopharyngeal swab specimens collected from patients in the Connecticut were tested in a blinded fashion using the RT-PCR-NGS platform. No reassortants were found but multiple mutations were detected in the specimens tested. Sequence analysis of 123 influenza A viruses revealed that 66.7%(82/123) of A(H3N2) viruses had a single signature mutation of E627K in the PB2 protein, and 88%(108/123) of influenza A(H3N2 and pdH1N1) viruses contained the S31N mutation in the M2 protein. A mutation of PB2 E627K has been reported to confer high virulence to the virus by enhancing replication efficiency, and increasing polymerase activity and disease severity of avian influenza viruses in mammals [38]. The S31N mutation in the transmembrane region of the M2 protein confers resistance to amantadine [9,39]. The emergence of E627K(PB2) and S31N(M2) mutations raise concerns of increased human disease severity. Influenza viruses contain multi-basic amino-acid motif at the proteolytic cleavage site of HAs, which is associated with broad tissue tropism and organ dissemination and determines viral pathogenicity [40]. Investigation of HAs in tested specimens showed a single arginine that appeared at the site (PSIQSR#G) in A (pdH1N1) and (PEKQTR#G) in A(H3N2) viruses, suggesting that these viruses belonged to a low pathogenic strain that poses less risk to humans.

Strain
We aligned all protein sequences of 47 influenza A viruses and compared their sequences with those in humans that have been reported as amino-acid signatures in past influenza outbreaks ( Table 3). The emergence of zanamivir-and oseltamivir-resistant viruses is facilitated by mutations in the NA protein, which provides a major target for developing anti-influenza drugs [41,42]. Sequence analysis revealed that the E119V signature mutations in these specimens may be susceptible to oseltamivir in A(H3N2) but not in A(pdH1N1) viruses. Three signature residues in the PA protein, PA-100A, PA-356A, and PA-409N, were reported in most H1N1, H2N2, H3N2 or H5N1 strains and could cause pandemics [43]. A mammalian-adapting V100A substitution was identified in most A(H3N2) but not in A(pdH1N1) viruses, and S409N was identified in all A(pdH1N1) and some A(H3N2) viruses, indicating their pandemic potential. NP protein plays an important role in viral RNA replication and cross-species  PB2  PB1  PA  NP  NA  M2  NS1   199  475  627  327  100  356  409  100  119  152  11  20  31  92 Flu189 The reference sequences are based on the inspection of amino-acids found in the majority of human pdH1N1, H3N2, and H5N1 viruses. The humanspecific substitution residues were highlighted in bold and italic. A dashed line indicates that a space was inserted in the sequence to preserve the alignment.  [25,26,27,28]. However, the sensitivity and simultaneous detection of various influenza viruses from a large set of clinical specimens in a single test has not been reported. Our analytical sensitivity study demonstrated that NGS can identify the full-length NP gene of influenza at a 10 −9 dilution level (1.8x10 1 TCID 50 /mL) in the characterization study, and detect partial sequences of M and NS genes at 10 −10 dilution level in the analytical study (S3 Table). When we performed universal RT-PCR-NGS assays in the analytical study using specimens with low virus concentrations [21], we further confirmed that multiple influenza genomic sequences were identified in each of the 28 human specimens demonstrating that NGS based diagnostics can achieve highly sensitive detection equivalent to the clinical RT-qPCR test at low virus concentrations. To determine whether the current NGS platform could detect influenza virus in different vertebrate species in addition to humans, we tested chicken specimens obtained from Egypt and identified A(H5N1) [48] and A(H9N2) (unpublished data) subtypes. Influenza C virus infection is not routinely tested in clinical settings, therefore, specimens and reference materials were not evaluated in current study.
Since the degenerate universal primers reported here target conserved sequences for influenza A, B and C viruses, it is anticipated that the current NGS based detection platform would enable detection and accurate characterization of influenza infection including novel, emerging strains and reassortants arising during outbreaks. This platform allows multiplex identification and simultaneous discrimination of functional significance in a single test and provides the whole spectrum of the influenza genome. The method described here will have significant implications from the perspective of screening, monitoring, drug resistance, and vaccine development. These features of the assay will facilitate diagnostics and antiviral treatment in the clinical setting and enhance protection of public health. With the aid of bioinformatics, mathematics and epidemiologic studies, a typical genetic matrix composition from a known, and a novel emerging influenza infection can be determined. Development of an automated assembly and analysis pipeline will help uncover new levels of innovation and efficiency of transferring raw reads to specific genomic identification which will facilitate future use of an NGS diagnostics platform for public health surveillance and in clinical microbiology laboratory [49].  Table 1 and S3 Table (provided in response to reviewers queries). (TIFF) S1 File. Supporting Information. (PDF) S1 Table. NGS discrimination of influenza A and B viruses in a single specimen using a set of degenerate universal primers in RT-PCR amplification. (DOC) S2