Whole Genome Characterization, Phylogenetic and Genome Signature Analysis of Human Pandemic H1N1 Virus in Thailand, 2009–2012

Background Three waves of human pandemic influenza occurred in Thailand in 2009–2012. The genome signature features and evolution of pH1N1 need to be characterized to elucidate the aspects responsible for the multiple waves of pandemic. Methodology/Findings Forty whole genome sequences and 584 partial sequences of pH1N1 circulating in Thailand, divided into 1st, 2nd and 3rd wave and post-pandemic were characterized and 77 genome signatures were analyzed. Phylogenetic trees of concatenated whole genome and HA gene sequences were constructed calculating substitution rate and dN/dS of each gene. Phylogenetic analysis showed a distinct pattern of pH1N1 circulation in Thailand, with the first two isolates from May, 2009 belonging to clade 5 while clades 5, 6 and 7 co-circulated during the first wave of pH1N1 pandemic in Thailand. Clade 8 predominated during the second wave and different proportions of the pH1N1 viruses circulating during the third wave and post pandemic period belonged to clades 8, 11.1 and 11.2. The mutation analysis of pH1N1 revealed many adaptive mutations which have become the signature of each clade and may be responsible for the multiple pandemic waves in Thailand, especially with regard to clades 11.1 and 11.2 as evidenced with V731I, G154D of PB1 gene, PA I330V, HA A214T S160G and S202T. The substitution rate of pH1N1 in Thailand ranged from 2.53×10−3±0.02 (M2 genes) to 5.27×10−3±0.03 per site per year (NA gene). Conclusions All results suggested that this virus is still adaptive, maybe to evade the host's immune response and tends to remain in the human host although the dN/dS were under purifying selection in all 8 genes. Due to the gradual evolution of pH1N1 in Thailand, continuous monitoring is essential for evaluation and surveillance to be prepared for and able to control future influenza activities.


Introduction
Since the United States Center of Disease Control and Prevention (US-CDC) have launched the report on a confirmed case of human pandemic influenza virus (pH1N1) infection in a child in southern California on April 21, 2009, [1] this virus has spread on a global scale. The Center for Disease Control and Prevention provided an updated (September 11, 2009) situation report on the international cases stating that there were over 227,607 laboratory-confirmed cases of human pandemic influenza virus H1N1 infection with at least 3,205 deaths worldwide [2]. In Thailand, the first two cases of human pandemic influenza H1N1 infection were reported by the Bureau of Emerging Infectious Diseases, Department of Disease Control, Thai Ministry of Public Health on May 12, 2009 [3].This virus can be transmitted from person to person via droplet transmission [4]. Therefore, the virus can be spread easily in overcrowded areas. The symptoms associated with H1N1 infection are usually less severe than those of seasonal influenza A virus infection. Most patients experience only mild to moderate symptoms, such as high body temperature in conjunction with some respiratory symptoms such as cough, sore throat and headache. Some people may have severe symptoms like pneumonia and acute respiratory distress syndrome [5]. In addition, individuals with underlying complications such as heart disease, chronic lung disease, or diabetes potentially display more severe symptoms [6].
Many previous studies have provided genetic analysis of pH1N1 indicating that this virus is composed of NA (neuraminidase) and M (matrix) genes originating from the Eurasian swine lineage while the other six gene segments have been traced back to a triple-reassorted North American swine lineage. Specifically, three polymerase genes, PB2 (basic polymerase 2), PB1 (basic polymerase 1) and PA (acid polymerase) are combinations of the North American swine, avian and human H3N2 lineages while the HA (hemagglutinatin), NP (nucleoprotein) and NS (non-structural) genes belong to the classical swine lineage [7][8][9][10]. Moreover, several reports have identified the genome signatures and point mutations significant for improved fitness and transmission  [14]. However, monitoring adaptive mutations of pandemic influenza virus should be performed continuously to elucidate the evolutionary trend for vaccine design. As in our previous study, the trend of confirmed cases of pH1N1 infection demonstrated that there were 3 dominant waves of outbreaks of pH1N1 virus in Thailand from the first outbreak in 2009 until 2012, with the first peak from June to October 2009, the second wave from December 2009 to March 2010, and the third wave from July to November 2010. In this study, we performed whole genome characterization to describe the evolutionary trend of human pandemic influenza virus which has been circulating in Thailand during three waves and evaluated the potential positions of mutated residues and their contribution to virulence and pathogenesis of pH1N1 to provide vital information required for preparing control strategies, such as defining vaccine composition and efficiently predicting the potentially emerging strains in a future wave of outbreak.
Forty samples positive for pH1N1 virus were randomly selected for whole genome characterization in chronological order from May 2009 to April 2012 and 584 partial sequences were randomly selected for analysis of the 7 major mutations of 6 genes (92, 92, 92, 93, 93, and 122 sequences for PB2, PB1, HA, NA, M and NS respectively). Each isolate represented a different geographical location and disease severity. All the whole genome sequences and partially selected for 7 major mutations sequences were submitted to GenBank database. All accession numbers of whole genome sequences and partial genome sequences were listed in Table S2 and S3 respectively. The details of each patient selected for whole genome characterization were listed in Table 1.

Phylogenetic Analysis
Whole genome sequence analysis of 40 pH1N1 isolates in Thailand revealed different maximum Kimura distances between nucleotides ranging from 1.03% (M) to 1.85% (NS1) with percentages of amino acid divergence in each gene segment ranging from 1.26% (M) to 3.41 (HA) ( Table 2). Phylogenetic analysis of the 40 concatenated whole genome sequences was performed along with 250 concatenated genomes of global isolates from previous studies [15,16,17]. The bootstrap supported phylogenetic tree showed that pH1N1 isolates from three waves in Thailand clustered into 6 distinct clades, namely, clades 5, 6, 7, 8, 11.1 and 11.2 (Fig. 1). The phylogenetic tree of the full-length HA gene was constructed using all isolates from Thailand and 11 representative isolates selected from each HA clade viruses as references (Fig. 2). The results showed that the first two isolates

Substitution rates and selection pressure
The 40 whole genome sequences of pH1N1 virus circulating in Thailand during 2009-2012 were compared with the common reference sequence A/California/07/2009 from the GenBank database to calculate nucleotide substitution rates for each gene ( Table 2). The mean substitution rates per gene segment ranged from 2.53610 23 60.02 (for the M2 genes) to 5.  Table 2).

Discussion
Due to the lack of genetic reassortment and similar tree topologies of each gene segments, concatenated phylogenetic tree was justified to be used for evolutionary analysis of pH1N1 influenza virus to exaggerate phylogenetic resolution of this virus, as previous studies [15,16]. Phylogenetic analysis of the concatenated whole genome of pH1N1 showed absence of any distinct geographical distribution, but a pattern of chronological distribution. During 2009-2012, there were 3 waves of human pandemic influenza virus in Thailand, which has a climate of tropical savannah, with summer from mid February to mid May, rainy season from mid May to October and winter from November until mid February [20]. Based on the similar evolutionary trend of the whole genome tree and HA tree, pH1N1 isolates circulating in the rainy season of 2009 belonged to clades 5, 6 (which emerged from a common node) and clade 7 while those circulating from winter to summer 2010 were for the most part in clade 8. However, the results indicated that the pH1N1 virus has distinctively adapted and shared many substitutions. This phenomenon may indicate the founder effect during the summer and rainy season of 2010. Many factors could be responsible for this profound genetic distinction, such as the beginning of the school semester, the high season of tourism in Thailand, or quantity of antiviral drug use in Thai population. Although there were different sublineages of pH1N1 virus, the low maximum distance indicated that new variants were still closely related to the prior sublineages and the pH1N1 vaccine could be effective in this region.

Substitution rate and selection pressure
In Thailand, all pH1N1 genes were under purifying selection with 0.089 (NP gene)#v#0.634(M2 gene), indicating a likelihood of 9-63% non-synonymous mutation compared to synonymous mutation [21]. However, these evolutionary ratios were relatively high compared to those in human seasonal H1N1 and H3N2 [22], archived as evolutionary stasis, based on the very low rates of amino acid change (v) in each gene. [23]. The higher selective pressure and substitution rates in some genes, such as HA, NA M2 and NS could be described by the change of amino acids in these genes of pH1N1 virus has been adapted to a new host species or avoid a host's immune response [24]. Therefore, it seems plausible to suggest that this pH1N1 virus will tend to stay and adapt for a longer period in the human host before it will become seasonal influenza and thus, more static. The evolutionary rates should be continuously monitored and compared to the epidemiological study. The other possible aftermaths of year-to-year divergence in host species which should be considered are niche differentiation and character displacement. This divergence described an accumulation of antigenic drift of influenza virus overtime where virus changes it pathogenic epitopes to escape host immune system due to host immunological pressure. This phenomenon resulted in viral persistence in host population or the alteration to expand viral niche which can create the resource partitioning, such as changing tissue tropism and interspecies transmission.

Genome signature analysis and evolutionary trend
The pH1N1 influenza viruses circulating in Thailand still have typical hot spots [13]. Most were susceptible to oseltamivir (NA 275H) and zanamivir (NA 136S) but resistant to adamantane antiviral drugs. All isolates were able to synthesize a truncated PB1-F2 protein and seemed to have a normal rate of viral replication and transmission to mammalian host cells (PB2 627E). There was no significant difference of the mutation pattern of pH1N1 isolated from mild and severe cases. However, many amino acid signatures were found in different clades of virus. Clade 7, 8 and 11 pH1N1 isolates showed the mutation S220T of the HA gene. The mutation V731I, G154D of the PB1 gene and I330V mutation in the PA gene were predominantly found in pH1N1 isolates from clade 11.2. The A214T mutation in the HA gene was mainly found in isolates from clade 11, both 11.1 and 11.2. Almost all isolates circulating in clade 11 harbored the mutations S160G and S202T of the HA gene. A previous study has identified the genome signature of clade 8.1 as E391K of the HA gene, but in this study, this mutation was also present in pH1N1 isolates from clades 11.1 and 11.2. In this study, many evolutionary mutation trends were identified and have to be taken into consideration. The most conserved gene in genome signature analysis was M2 (Table 4 and Fig. 3). Considering the point mutations at S202T and S220T, highlighted as positions at one of the antigenic sites Ca [17,25], it is unclear whether these amino acid signatures will effect a change in antigenic domain because these positions are not exposed to the surface and there was no other mutation of the amino acid signature at the Ca site. Likewise, analysis of HA diversity showed that the Thailand strain of pH1N1 still resembled the vaccine strain (CA/07/2009). Hence, the current vaccine is still effective to control pH1N1 infection in Thailand.

Conclusion
Based on whole genome characterization, phylogenetic and genome signature analysis, it can be concluded that the circulation of pH1N1 influenza virus in Thailand during 2009-2012 has facilitated the virus establishing itself as a new seasonal influenza virus displaying gradual mutation and low pathogenicity. However, considering the health risk for patients with underlying conditions and pH1N1's capability of re-assortment, studies employing phylogenetic analysis and epidemiological surveillance have to be conducted to continuously monitor the evolutionary trend which will assist in preparations for and control of any future outbreak.

Ethical Consideration
The study was conducted on positive pH1N1 specimens collected from routine service and stored as anonymous. Permission had been granted by the Director of King Chulalongkorn Memorial hospital. Patient identifiers including personal information (name, address) and hospitalization number were removed from these samples to protect patient confidentiality and neither did they appear in any part of document in this study. The research protocol was approved by the Institutional Review Board (IRB number 284/54), Faculty of Medicine, Chulalongkorn University. IRB waived the need for consent because the samples were anonymous.

Sample Collection
From May 4, 2009 until April 29, 2012, 11,753 nasal and throat swab samples were collected from 3 provinces, Bangkok, representing the central part of Thailand, Suraj Thani, representing the south and Khon Kaen, the north-east of Thailand. Samples were collected in viral transport media (VTM) from patients diagnosed with influenza-like illness (onset of hightemperature, fever, sore throat, or cough, watery eyes, body ache/headache and difficulty to breathe). RNA would be extracted from samples followed by real-time RT-PCR specific to influenza A, B and H1 (pandemic strain), H3 and H5 subtyping. [26,27] The 40 pH1N1-positive specimens were selected randomly and chronologically (on average, 1-2 samples per month) from samples with a Ct value by real-time RT-PCR below 25 for primers specific for the M gene of influenza A and HA of pH1N1 influenza virus. In addition, approximate 200 pH1N1 positive specimens obtained in the course of the 3 waves of outbreak were also selected for significant specific mutation analyses. All specimens were kept at 270uC until RNA extraction.

Viral RNA Extraction
RNA was extracted by using a commercially available Viral Nucleic Acid Extraction Kit (RBC Bioscience Co, Taiwan) according to the manufacturer's protocol. All experiments were performed in a Bio-safety Level 2 plus environment.

Detection and Subtyping of Influenza Virus by Real-Time RT PCR
One-step multiplex real-time RT PCR assays were performed to detect and subtype influenza virus based on Taqman probes as previously described [26][27][28]. Each sample was subjected to 7 reactions testing for the GAPDH gene (internal control), M (matrix) gene of influenza A and B to determine type of influenza virus, HA (Hemagglutinin gene) for seasonal H1 and H3, H5 avian influenza virus and H1 of human pandemic influenza virus.

Conventional PCR and Sequencing
Upon real-time RT-PCR, 40 samples positive for human pandemic influenza virus collected from three waves of outbreak in Thailand were chosen for whole genome characterization. Another 200 pH1N1 positive samples were selected to examine 6 genes for 7 point mutations defining the molecular characteristics of pH1N1 as previously described [13]. To begin with, viral cDNA was synthesized by using the M-MLV reverse transcription system (Promega, Madison, WI) and universal primers as described by Hoffman et al. [28] at 37uC for 3 hours. In brief, 1 ml of cDNA was added to the reaction mixture containing 15 ml of PerfectTaq Plus Mastermix (5Prime, Hamburg, Germany) with 0.5 ml of each forward and reverse primers, MgCl 2 and nuclease free water to a final volume of 25 ml. PCR reactions were performed in a thermal cycler (Eppendorf, Germany) under the following conditions: Initial denaturation at 94uC for 3 minutes, followed by 40 amplification cycles comprising denaturation for 30 seconds, primer annealing at 50uC for 30 seconds (for PB1, PA, NP genes) and 55uC for 30 seconds (for PB2, HA, NA, M and NS genes) followed by extension at 72uC for 90 seconds, to be concluded by a final extension step at 72uC for 10 minutes. Upon electrophoresis in a 2% agarose gel, the amplified products were visualized on a UV trans-illuminator after staining with ethidium bromide solution. PCR products were purified using the HiYield Gel DNA Fragment Extraction kit (RBC Bioscience Co, Taiwan). DNA sequencing was performed by FirstBASE Laboratories Sdn Bhd (Selangor Darul Ehsan, Malaysia). The cDNA sequencing was performed by FirstBase Co.Ltd. (Kuala Lumpur, Malaysia). Primers used for whole genome characterization and hot spot analysis are shown in Table S1.

Sequence analysis and Phylogenetic Tree Construction
DNAStar version 6 and Clustalw 1.83 were used for multiple sequence alignment. Bioedit7.0 was used to analyze the sequence alignment. The trimmed whole genome sequences were manually concatenated in sequence from segment 1-8 (PB2-PB1-PA-HA-NP-NA-M-NS) prior to phylogenetic tree construction. Reference strains for phylogenetic tree construction were retrieved from the NCBI influenza virus database based on a previous study [15][16][17]. MEGA 5.0 was used for the phylogenetic tree construction by applying the neighbor-joining method with Kimura's two-parameter distance model and 1000 bootstrap replicates [29]. BioEdit program version 7.0.9 was used for genomic signature analysis. The graphical presentation of relative residue abundance of genomic signatures within each gene was depicted by using WebLogo [30].

Estimation of Nucleotide Substitution Rate and Measurement of Selection Pressure
The rates of evolutionary change in each gene (nucleotide substitution per site per year) were provided by using BEAST program version 1.4.6 [31]. The strict clock model was employed to provide the rate of nucleotide substitution. To evaluate the selection pressure on each gene segment of pH1N1 virus in Thailand, the mean values of non-synonymous substitution (dN) and synonymous substitution (dS) per site were calculated by using the single likelihood ancestor counting method (SLAC) based on Neighbor-Joining trees under the substitution model selected by the Datamonkey website (http://www.data-monkey.org).