Molecular Evolution of Human H1N1 and H3N2 Influenza A Virus in Thailand, 2006–2009

Background Annual seasonal influenza outbreaks are associated with high morbidity and mortality. Objective To index and document evolutionary changes among influenza A H1N1 and H3N2 viruses isolated from Thailand during 2006–2009, using complete genome sequences. Methods Nasopharyngeal aspirates were collected from patients diagnosed with respiratory illness in Thailand during 2006–2009. All samples were screened for influenza A virus. A total of 13 H1N1 and 21 H3N2 isolates were confirmed and their whole genomes sequenced for evolutionary analysis using standard phylogenetic approaches. Results Phylogenetic analysis of HA revealed a clear diversification of seasonal from vaccine strain lineages. H3N2 seasonal clusters were closely related to the WHO recommended vaccine strains in each season. Most H1N1 isolates could be differentiated into 3 lineages. The A/Brisbane/59/2007 lineage, a vaccine strain for H1N1 since 2008, is closely related to the H1N1 subtypes circulating in 2009. HA sequences were conserved at the receptor-binding site. Amino acid variations in the antigenic site resulted in a possible N-linked glycosylation motif. Recent H3N2 isolates had higher genetic variation than H1N1 isolates. Most substitutions in the NP protein were clustered in the T-cell recognition domains. Conclusion In this study we performed evolutionary genetic analysis of influenza A viruses in Thailand between 2006 and 2009. Although the current vaccine strain is efficient for controlling the circulating outbreak subtypes, surveillance is necessary to provide unambiguous information on emergent viruses. In summary, the findings of this study contribute to the understanding of the evolution of influenza A viruses in humans and are useful for routine surveillance and vaccine strain selection.


Introduction
Influenza A viruses are negative-strand RNA viruses of the family Orthomyxoviridae which can be divided into subtypes based on the antigenic properties of the surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA) [1]. Every year influenza A virus causes infection with varying severity depending on host acquired immunity against that particular virus strain. Influenza epidemics are associated with above average annual mortality levels, causing 10,000 to 15,000 deaths. Occasionally global pandemics of influenza occur, infecting 20% to 40% of the population in a single year and dramatically raising death rates. Three such major global pandemics caused by novel antigenic variants of influenza viruses have affected the human population: in 1918 (H1N1 subtype), in 1957 (H2N2), and in 1968 (H3N2) resulting in millions of deaths [2,3]. The recent circulation of highly pathogenic avian H5N1 viruses in Asia since 2003 has caused human fatalities [4]. The novel H1N1 influenza virus that emerged in humans in Mexico in early 2009 and spread globally in the human population has been declared a pandemic strain. Two forms of the 16 possible HA subtypes (H1 and H3), and two of the nine possible NA subtypes (N1 and N2) are circulating in man. H3N2 and H1N1 influenza A viruses have co-circulated in the human population since the re-emergence of H1N1 in 1977 increasing the possibility of genetic reassortment. The prevalence of different subtype combinations may vary from season to season. Subtype H3N2 has constituted the predominant influenza A strain during the last 20 years, with the exception of the 1988-1989 and 2000-2001 seasons when H1N1 infections were more prevalent [5]. The surface glycoprotein HA is under selective pressure to undergo mutation in order to evade the host's immune system [6]. Antibodies against the HA protein inhibit receptor binding and are very effective at preventing re-infection with the same strain. 
However, HA can change to evade previously acquired immunity either by antigenic drift, whereby mutations of the currently circulating HA gene disrupt antibody binding, or by antigenic shift, with the virus acquiring an HA of a new subtype by reassortment of one or more gene segments. The WHO has published recommendations on the composition of influenza vaccine for the northern and southern hemispheres. For the northern hemisphere, the WHO has issued the recommendation for strains to be included in the trivalent vaccine for the next season based on epidemiological data and genetic analyses of circulating strains.
The ability to predict the emergence of circulating influenza strains for subsequent annual vaccine development has become vital. Comparisons between antigenic differences and phylogenetic analysis are essential to further the understanding of the emergence of multiple lineages of influenza virus variants. Therefore, the aim of this study is to elucidate the evolution of influenza A H1N1 and H3N2 isolates from Thailand over a four-year period from early 2006 to 2009. Understanding of viral genome content and its evolution is expected to aid in predicting the emergence of new variant strains during the following season and in defining the vaccine composition.

Results
Specimens were collected from patients diagnosed with respiratory illness in Bangkok, Thailand from 2006 to 2009. Clinical specimens were screened for influenza A virus by multiplex real-time PCR and then subtyped as H1 or H3 with specific primers. Of the influenza A virus positive samples, 21 H3N2 and 13 H1N1 samples were selected to represent each time-point. The HA and NA sequences of some strains isolated in 2006 to 2007 have been previously reported [13]. The details are shown in Table 1 (accession numbers are provided in supporting information Table S1).
Five H1N1 isolates from the 2006 season and eight isolates from 2009 were compared with the vaccine strains for phylogenetic associations. H1 phylogeny showed that the isolates clustered in 3 distinct lineages (Fig. 1). Twenty-one HA sequences of H3N2 isolates were compared with vaccine strains (Fig. 2). Amino acid sequences of circulating strains were compared to the vaccine strains in order to highlight functional variation that might potentially impact vaccine efficacy. HA constitutes the receptor-binding and membrane fusion glycoprotein of influenza virus. Alignment of the terminal sialic acid (SA) residues of glycoproteins and glycolipids representing the cellular receptors for influenza virus, which are the targets for neutralizing antibodies, and the conserved N-linked glycosylation sites are shown in Fig. 3 and Fig. 4. The conserved amino acid residues in both H1 and H3 influenza A virus, Tyr(Y)-98, Ser(S)-136, Trp(W)-153, His(H)-183 and Tyr(Y)-195 (numbering according to the H3 structure) at the HA receptor-binding site have been described by Skehel and Wiley [14]. These residues were found at the HA receptor-binding site of both H1N1 and H3N2. The residues mainly responsible for NeuAcα2,6Gal linkage of H3 are at positions 226 and 228. The amino acids at these SA-binding positions in all H3 isolates were Ile(I)-226 and Ser(S)-228, similar to those previously reported by Lindstrom et al. [15]. N-linked glycosylation was commonly found in HA. In HA1 of the H1 subtype, seven potential N-glycosylation sites were predicted with a threshold value of >0.5 (at positions 15, 27, 58, 91, 129, 164 and 290). These sites were found to be conserved among all isolates in this study. Nine potential N-glycosylation sites were predicted in HA1 of the H3 subtype. The HA1 domain of HA, the major antigenic protein of influenza A viruses, contains all the antigenic sites of HA and is continually under selection pressure driven by the host's immune response.
Patterns of antigenic site variation in the HA gene were observed by amino acid alignment. H1N1 antibodies are directed to each of the two strain-specific (Sa and Sb) and common antigenic sites (Ca and Cb) of the virus hemagglutinin [16] (Fig. 3). As for H3, amino acid differences were detected at antigenic sites A, B, D and E. Some internal proteins harbor amino acids essential for HLA epitopes. Details concerning alteration of the HLA epitopes are shown in supporting information Table S2. Most substitutions in the T-cell epitopes were found in the NP protein of both H1N1 and H3N2 viruses. The substitutions varied by season or year of collection. Some amino acids of the isolates varied between two successive seasons, demonstrating progressive evolution in each protein segment. Except for the NS1 protein, HA and NA displayed more variations per season than other proteins (Fig. 5; Fig. 6). Most amino acid differences between lineages were identified in the surface proteins, HA and NA. As for the internal proteins, NS1 also demonstrated remarkable amino acid variation between lineages. In contrast, NP had undergone the smallest number of alterations of all 8 proteins compared.
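The comparison of circulating strains with vaccine strains described above comes down to listing the positions at which two aligned protein sequences differ. The following is a minimal sketch of that step, not the authors' pipeline: the function name `list_substitutions` and the toy sequences are hypothetical, positions are alignment columns rather than H3 numbering, and gap or unknown residues are simply skipped.

```python
def list_substitutions(reference: str, isolate: str) -> list[str]:
    """List substitutions (e.g. 'K2R') between two aligned protein sequences.

    Positions are 1-based alignment columns, NOT H3 numbering (assumption).
    Columns containing a gap ('-') or unknown residue ('X') are skipped.
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    for pos, (ref_aa, iso_aa) in enumerate(zip(reference.upper(), isolate.upper()), start=1):
        if ref_aa in "-X" or iso_aa in "-X":
            continue  # not a comparable column
        if ref_aa != iso_aa:
            substitutions.append(f"{ref_aa}{pos}{iso_aa}")
    return substitutions


# Toy, made-up sequences purely for illustration (not real HA data).
vaccine_strain = "MKTIIALSY"
field_isolate = "MRTIVAL-Y"
print(list_substitutions(vaccine_strain, field_isolate))
```

Counting the length of the returned list per protein and per season would give the kind of per-segment variation summary reported in the figures, under the same assumptions.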

Discussion
Each year, WHO recommends the most suitable composition of influenza vaccine strains for the Northern and Southern Hemispheres, based on phylogenetic analyses of HA and antigenic characteristics of circulating viruses. Accordingly, genetic comparison of the HA sequences determined in this study with those of the vaccine strains showed that seasonal clusters are closely related to the vaccine strains recommended. [17]. Given the lower rate of change in H1N1 viruses, these vaccines would be expected to confer protection against H1N1 viruses that were recovered from clinical cases in 2009. Although the rate of genetic change of seasonal H1N1 viruses is not as rapid as that identified in H3N2, further surveillance studies are required. Sequence analysis of HA showed high variation in HA1, which might be due to its receptor-binding properties and to its being targeted by neutralizing antibodies, since it represents the membrane fusion glycoprotein of influenza virus. The residues within the receptor-binding site are relatively conserved, but the residue mainly responsible for NeuAcα2,6Gal linkage specific for the H3 subtype was Ile226 in place of Leu226 [18]. In H3, amino acid substitutions were detected at three antigenic sites, A, B and E. The antigenic site position preferred for mutation was located at site A. Positions in the H1N1 isolates differentiating them from the A/Solomon Islands/3/2006-like lineage and the A/Brisbane/59/2007-like lineage were part of the antigenic Sa site. Oligosaccharides in the HA surface proteins might more readily facilitate viral escape from the immune system than single amino acid changes at the antigenic sites. Oligosaccharides may trigger conformational changes in the molecule and mask antigenic sites, which in turn would prevent binding of host antibodies. The predicted N-linked glycosylation at position 144 of the HA antigenic site A has not been observed since the 2006-2007 season. This position may not play any major role in escape from the immune system.
T-cell epitopes in internal proteins of influenza A virus are more conserved than antibody epitopes. The reason for this higher degree of conservation is that 80% of the antibody epitopes are located in the variable glycoproteins HA and NA, while only 40% of the T-cell epitopes are found in these proteins [19]. Most substitutions in regions involved in protective T-cell response were detected in the NP protein, as most T-cell epitopes are defined for the NP protein and this protein constitutes the main target for the host's cytotoxic immune response [20,21].
During the 2008-2009 season, prevalence of oseltamivir resistance was very high among isolates from over 30 countries [22]. In this study, oseltamivir resistance was detected in H1N1 viruses from 2009. Amantadine-resistant H3N2 viruses increased during the 2007-2009 seasons in Thailand, while most H1N1 viruses were sensitive to amantadine.
Complete genome analysis of human influenza A viruses was necessary to obtain a comprehensive picture of virus evolution. The genetic make-up of influenza A viruses changes every year. Hence, continuous antigen and genome sequence surveillance of influenza A viruses is still a requirement. In this study, our

Clinical samples
All influenza A virus positive samples obtained during the patients' routine examination or treatment were chosen and stored at -70°C for further analysis. All patient identifiers were removed from these samples to protect patient confidentiality. Patient consent was practically impossible to obtain. However, permission was granted by the director of the hospital for inclusion of these samples in the study. An ethics committee at the Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand approved all study protocols. IRB approval was obtained for this study under certification reference number COA No. 287/2008 and IRB No. 391/51. In addition, all specimens were used exclusively for academic research and the patients were not remunerated. Nasopharyngeal aspirates were randomly selected from the seasonal influenza A positive specimens from the routine clinical service during 2006-2009. All samples were tested for influenza A virus subtypes H1 and H3 using a one-step multiplex real-time RT-PCR as described [7]. Specimens positive for pandemic influenza A H1N1 2009 were excluded from this study.

RNA extraction and whole genome sequencing
Viral RNA was extracted from 200-µl nasopharyngeal aspirate samples using the Real Genomics Viral Nucleic Acid Kit (RBC Bioscience, Taiwan). cDNA was synthesized at 37°C for 3 hours using the M-MLV reverse-transcription system (Promega, Madison, WI) and 1 µM universal primer as described by Hoffmann et al. [8]. The whole genome sequences were amplified using the primer sets for human H1N1 and H3N2 influenza A virus (primer sequences are available on request). Briefly, 1 µl of cDNA template was added to the reaction mixture containing 10 µl of 2.5X Eppendorf mastermix (Eppendorf, Hamburg, Germany), 0.5 µM forward and reverse primers and nuclease-free water to a final volume of 25 µl.
Amplification was performed in a thermal cycler (Eppendorf, Germany) under the following conditions: denaturation at 94°C for 3 min, followed by 40 amplification cycles consisting of denaturation at 94°C for 30 sec, primer annealing at 50°C for 30 sec (for PB2, PB1, PA genes) or 55°C for 30 sec (for HA, NP, NA, MP, NS genes) followed by extension at 72°C for 90 sec, and concluded by a final extension at 72°C for 7 min. After amplification, electrophoresis was carried out in ethidium bromide containing 2% agarose gels and visualized on a UV trans-illuminator. PCR products were gel-purified using the Perfectprep Gel Cleanup kit (Eppendorf, GmbH, Germany). DNA sequencing (primer sequences on request) was performed by First BASE Laboratories Sdn Bhd (Selangor Darul Ehsan, Malaysia).
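As a quick check on the cycling programme described above, the nominal hold time can be added up from the stated step durations. This is only an arithmetic sketch: the function name and parameter names are hypothetical, and ramping between temperatures, which depends on the instrument, is ignored.

```python
def cycling_time_seconds(
    initial_denaturation: int = 3 * 60,  # 94°C for 3 min
    cycles: int = 40,
    denaturation: int = 30,              # 94°C for 30 sec
    annealing: int = 30,                 # 50°C or 55°C for 30 sec
    extension: int = 90,                 # 72°C for 90 sec
    final_extension: int = 7 * 60,       # 72°C for 7 min
) -> int:
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = denaturation + annealing + extension
    return initial_denaturation + cycles * per_cycle + final_extension


total = cycling_time_seconds()
print(f"{total} s = {total / 60:.0f} min")  # 6600 s = 110 min
```

The annealing temperature differs between the polymerase genes and the other segments, but the hold time is the same, so both programmes have the same nominal duration.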

Phylogenetic analysis
Nucleotide sequences were aligned with ClustalX v1.83 [9]. Dendrograms were constructed using the neighbor-joining (NJ) approach implemented in MEGA 4 [10]. Bootstrap support for tree topologies was assessed using the NJ method implemented in MEGA with 1,000 replicates. Genetic distances based on NJ phylogenetic trees were calculated applying Kimura's two-parameter method using MEGA 4.
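For readers unfamiliar with Kimura's two-parameter model, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The sketch below illustrates that formula only; it is not MEGA's implementation, the function name is hypothetical, and skipping gapped or ambiguous sites pairwise is an assumption about gap handling that the study does not specify.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with gaps or ambiguous bases are skipped pairwise.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine change.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (the argument becomes non-positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy sequences: one transition (A->G) in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm would cluster into a tree.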

Prediction of N-Glycosylation
Potential N-glycosylation sites (amino acids Asn-X-Ser/Thr, where X is not Asp or Pro) were predicted using nine artificial neural networks with the NetNGlyc server 1.0 [11]. A threshold value of >0.5 for the average potential score was set to predict glycosylated sites. The N-GlycoSite prediction tool at Los Alamos [12] was used to visualize the fraction of isolates possessing certain glycosylation sites along the aligned sequences.
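The sequon definition above can be illustrated with a simple motif scan. This sketch only locates candidate Asn-X-Ser/Thr motifs; it does not reproduce the NetNGlyc neural-network scoring or its 0.5 threshold. The function name `find_sequons` and the toy sequence are hypothetical. The default excluded residues follow the definition given in the text (Asp or Pro); the more commonly used definition excludes only Pro, which can be selected with `excluded_x="P"`.

```python
import re


def find_sequons(protein: str, excluded_x: str = "DP") -> list[int]:
    """Return 1-based positions of Asn in Asn-X-Ser/Thr motifs.

    excluded_x lists the residues not allowed at the X position.
    Positions refer to the input string, not to H3 numbering.
    """
    # A lookahead is used so that overlapping motifs (e.g. 'NNSS') are all found.
    pattern = re.compile(rf"(?=N[^{re.escape(excluded_x)}][ST])")
    return [match.start() + 1 for match in pattern.finditer(protein.upper())]


# Toy, made-up peptide purely for illustration (not real HA1 sequence).
peptide = "MNGTANPSNDTNKS"
print(find_sequons(peptide))                  # [2, 12]
print(find_sequons(peptide, excluded_x="P"))  # [2, 9, 12]
```

Running such a scan on every isolate in an alignment and tallying hits per column would give the fraction of isolates carrying each candidate site, which is the kind of summary the N-GlycoSite tool visualizes.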