Molecular Epidemiology and Complete Genome Characterization of H1N1pdm Virus from India

Background Influenza A virus is one of the world’s major uncontrolled pathogens, causing seasonal epidemics as well as global pandemics. This was evidenced by the recent emergence and continued prevalence of the 2009 swine-origin pandemic H1N1 influenza A virus, which provoked the first true pandemic in the past 40 years. In the course of its evolution, the virus acquired many mutations, and multiple unidentified molecular determinants are likely responsible for the ability of the 2009 H1N1 virus to cause increased disease severity in humans. The limited availability of complete genome data hampers continuous monitoring of such events. Outbreaks with considerable morbidity and mortality have been reported from all parts of the country. Methods/Results Considering the large number of clinical cases of infection, complete genome sequence characterization of Indian H1N1pdm viruses and their phylogenetic analysis with respect to circulating global viruses was undertaken to reveal the phylodynamic pattern of H1N1pdm virus in India from 2009–2011. Clade VII was observed as the major circulating clade in the phylogenetic analysis. Selection pressure analysis revealed 18 positively selected sites in major surface proteins of H1N1pdm virus. Conclusions This study identified clade VII as the recently circulating clade in India as well as globally. A few well-identified clade VII-specific markers have undergone positive selection during virus evolution. Continuous monitoring of the H1N1pdm virus is warranted to track virus evolution and further transmission. This study will serve as baseline data for future surveillance and for the development of suitable therapeutics.


Introduction
Influenza A virus causes an acute respiratory disease and has a history of causing severe pandemics, including the recent one caused by the novel swine-origin influenza A virus (S-OIV). The ability of a virus subtype to mutate into a variety of strains with differing pathogenic profiles allows it to achieve higher fitness in a brief period. Influenza A virus is a member of the family Orthomyxoviridae. Based on antigenicity, the virus may be classified into 16 hemagglutinin (H1-H16) and 9 neuraminidase (N1-N9) subtypes. The influenza A virus genome is composed of eight segments of single-stranded, negative-sense RNA, each of which encodes one or two proteins. The HA protein is critical for binding to cellular receptors and for fusion of the viral and endosomal membranes. Replication and transcription of viral RNAs (vRNAs) are carried out by the three polymerase subunits PB2, PB1 and PA, and the nucleoprotein (NP). Newly synthesized viral ribonucleoprotein complexes are exported from the nucleus to the cytoplasm by the nuclear export protein (NEP, formerly called NS2) and the matrix protein M1, and are assembled into virions at the plasma membrane. The NA protein cleaves sialic acid residues on the host cell glycoproteins and glycolipids to which the HA proteins of newly assembled virions bind, and therefore plays an important role in the release of newly formed virions from the host cell membrane [1].
Several reports have described both the emergence and the pandemic potential of the virus in the perspective of the earlier pandemic influenza viruses of 1918 (H1N1), 1957 (H2N2) and 1968 (H3N2) through comparison of the available genetic sequence data [2]. Genetic analysis of the novel H1N1 virus isolated from a patient in California revealed that it was a recent reassortment of gene segments from both North American and Eurasian swine lineages. Since April 2009, the novel swine-origin influenza A (H1N1pdm) virus has spread rapidly across the globe. The World Health Organization declared the outbreak a global pandemic in June 2009. The WHO global influenza surveillance network has greatly contributed to the knowledge about circulating influenza viruses, including the emergence of novel strains [3][4][5]. This newly emerged virus represents a quadruple reassortment of two swine strains, one human strain and one avian strain of influenza virus [6]. The largest proportion of genes comes from swine influenza virus strains (30.6% from North American swine influenza strains, 17.5% from Eurasian swine influenza strains), followed by North American avian influenza strains (34.4%) and human influenza strains (17.5%). Historically, pigs have played an important role in interspecies transmission of influenza virus. Susceptible pig cells possess receptors for both avian (alpha 2-3-linked sialic acids) and human influenza strains (alpha 2-6-linked sialic acids). The presence of both receptors allows for the reassortment of influenza virus genes from different species when a pig cell is infected with more than one strain [7]. Influenza A (H1N1pdm) has caused a considerable number of deaths within a short duration since its emergence [8].
The disease is characterized by the sudden onset of high fever, chills, coughing, sore throat, muscle pain, severe headache, malaise, and inflammation of the upper respiratory tract and trachea, with general discomfort, but it rarely induces severe inflammatory lung diseases, including pneumonic involvement, owing to host innate and acquired immunity. Swine-origin pandemic human influenza A virus (H1N1pdm) has spread rapidly around the world since its initial documentation in April 2009. According to the last WHO update of the pandemic period (29 Jan 2010, update 85), H1N1pdm had spread to 209 countries and overseas territories, with 14711 deaths since the first reports of the virus in humans in April 2009. In India, the H1N1pdm virus has circulated continuously since its emergence, and cases are being reported from different parts of the country in the post-pandemic phase [9][10][11][12]. Certain specific molecular markers predictive of adaptation to humans were found to be absent in the pandemic influenza A 2009 (H1N1pdm) viruses, suggesting that previously unrecognized molecular determinants could be responsible for transmission among humans. Several reports comparing the HA gene sequence with those of the earlier influenza pandemics have shown that human-specific markers supporting efficient transmissibility of these viruses in humans are present in the H1N1pdm virus [1,13]. Further, continuous monitoring of the evolution of this virus is advocated to track mutations that may increase pathogenicity and/or transmissibility.
Understanding the evolution of the virus within India in relation to its global diversification is also essential. So far, few data are available on complete genome characterization of Indian H1N1pdm virus. The circumstances surrounding the emergence of this pathogen, and the factors that facilitated the initial cross-species transmission, are still not fully understood. It became apparent in the early days of the outbreak that the virus can be directly transmitted between humans. Among the various efforts made to evaluate, diagnose and implement measures against the spread of the virus is the timely release of genomic sequences from different viral isolates [14]. With this in mind, attempts were made to obtain adequate genome information to understand the true picture of the novel H1N1pdm virus circulating in India. The present study aimed to elucidate the complete genome sequences of four recently circulating H1N1pdm viruses isolated from different parts of India during 2010-2011. The phylodynamic pattern of global and Indian H1N1pdm isolates from 2009-2012 was analyzed, and the implications of the mutations resulting from selection pressure are discussed in detail.

Results
Clinical Presentation of Suspected H1N1pdm Samples
A total of 35 patients (WHO category C cases) were confirmed positive by CDC real-time RT-PCR, giving a positivity of 29.16%. The youngest case was a 6-month-old female child. The monthly sample analysis profile revealed that 92.5% of the samples pertained to the period September-December 2010-2011, and the remaining 7.5% of cases were reported outside this period. 47.5% of cases were seen in the age group of 20 to 39 years, while 15.83% of cases were seen in the age group of 5-19 years. The median age of the patients investigated was 30 years (range 6 months-76 years). 6.66% of the patients were under age 5 and 10.83% were more than 54 years old. The female/male ratio for H1N1pdm in the different age groups was significantly greater than 1. No patient had previously been vaccinated; however, oseltamivir was started after 5 days in 30% of the cases. The overall case fatality rate was 8.33%, with 10 deaths. The maximum number of deaths was seen in the younger age group (7-25 years), with an increased case fatality rate of 15% in 2011. Death in complicated cases occurred within 24-48 hours of reporting to hospital. The clinical history revealed that all the patients had suffered from fever (>38.0°C). Other prominent clinical symptoms included fever (axilla, oral) (80%), cough (42%), sore throat (38%), nasal catarrh (75%) and shortness of breath (66%). The monthly and age-wise distribution of suspected patients is summarized in Figure 1.

Laboratory Diagnosis of H1N1pdm Samples
Out of 120 suspected samples, 35 (29.16%) were positive for pandemic influenza A H1N1 and 7 (5.83%) were positive for influenza A (seasonal virus). The cases of H1N1pdm started rising from September 2010, with the maximum number of cases (n = 44). All the samples were diagnosed by the WHO-approved CDC real-time RT-PCR using 4 sets of primers and probes. Samples found positive for all four probes, viz. Influenza A, Swine Influenza A, Swine H1 and RNase P (Inf A, swA, swH1, RNP), were declared positive for H1N1pdm virus. Each lot of samples was tested with RNA from a confirmed H1N1pdm-positive cell culture as positive control and RNA from a healthy throat swab sample as negative control. Detailed features, including clinical presentations of H1N1pdm-positive samples, are summarized in Table 1.

Isolation and Identification of H1N1pdm Virus
Isolation of H1N1pdm virus was attempted from three selected positive samples in MDCK cells through three blind passages. Initially, H1N1pdm virus infection in MDCK cells was analysed microscopically for the appearance of prominent cytopathic effects (granulation, clustering and finally total detachment from the adherent surface) until 48-72 hpi (Figure 2A). Infected cell culture supernatant was harvested at this stage and used for further identification and complete genome characterization. The hemagglutination (HA) titre, i.e. the highest dilution at which hemagglutination occurred, was determined in infected culture supernatant with guinea pig RBC. The HA titre was found to be 16-32 for the four different isolates used in this study (Figure 2C). An immunofluorescence test was performed to observe localization of the intracellular H1N1pdm virus using anti-pdmH1N1 HA polyclonal antibody (GenScript, USA). Bright apple-green fluorescence was observed in H1N1pdm virus-infected cells, whereas no fluorescence was observed in mock-infected MDCK cells (Figure 2B). Virus isolation was also confirmed at the genomic level at different passage levels with the WHO-approved CDC real-time RT-PCR (Figure 2D).
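The titre read-out described above can be illustrated with a minimal sketch. It assumes a two-fold serial dilution starting at 1:2 and reads the titre as the reciprocal of the last consecutive positive well; the starting dilution, the function name and the example wells are illustrative assumptions, not details taken from the study protocol.

```python
def ha_titre(wells, start_dilution=2):
    """Return the reciprocal of the highest dilution showing hemagglutination.

    wells: list of booleans along a two-fold dilution series; the first
    well is assumed to be 1:start_dilution (an assumption of this sketch).
    Reading stops at the first negative well. Returns 0 if no well is positive.
    """
    titre = 0
    dilution = start_dilution
    for agglutinated in wells:
        if not agglutinated:
            break
        titre = dilution
        dilution *= 2
    return titre


# Hypothetical plate row: wells 1:2 to 1:16 positive, 1:32 onwards negative.
example_row = [True, True, True, True, False, False]
example_titre = ha_titre(example_row)
```

Under these assumptions, a row that is positive through the fourth well corresponds to a titre of 16, and one positive through the fifth well to a titre of 32.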

Analysis of the Concatenated Complete Genome of the Indian H1N1pdm Virus
The genome sequences of representative Influenza A (H1N1pdm) viruses of diverse geographical origins were retrieved from NCBI GenBank database from the period of 2009-2012 (

Phylogenetic Analysis
Extensive phylogenetic analysis based on concatenated whole genome sequences (13158 nt; n = 65) and the full HA gene (1701 nt; n = 45) of representative H1N1pdm viruses sampled between 2009-2012 from different geographical regions, along with the Indian isolates, revealed seven distinct clades (Figure 3 and Figure 4). Both phylogenetic analyses revealed the same topology. All four Indian isolates sequenced in this study formed a close branch and grouped into clade VII. Clade VII was represented by the maximum number of isolates from geographically diverse areas.

Analysis of Individual Gene Segments
Comparison of the individual gene segments at the protein level with respect to A/California/04/2009 (H1N1pdm prototype strain) and A/India/Pune/NIV6447/2009 (a previously sequenced Indian strain) revealed a total of 73 substitutions scattered throughout the eight gene segments in the four Indian viruses sequenced in this study.
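A comparison of this kind can be sketched as follows. The snippet lists amino acid substitutions between two pre-aligned protein sequences, numbering positions along the reference. It is a minimal illustration under stated assumptions: the function name and the toy sequences are hypothetical, and the numbering scheme used in the study (for example, whether the signal peptide is counted) is not reproduced here.

```python
def list_substitutions(reference, query, gap="-"):
    """List substitutions (e.g. 'S3T') between two aligned protein sequences.

    Positions are numbered along the reference, skipping alignment gaps in
    the reference. Gaps in the query are ignored rather than reported.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    position = 0
    for ref_aa, qry_aa in zip(reference, query):
        if ref_aa == gap:
            continue  # insertion relative to the reference: no reference position
        position += 1
        if qry_aa != gap and qry_aa != ref_aa:
            substitutions.append(f"{ref_aa}{position}{qry_aa}")
    return substitutions


# Toy sequences only; these are not real influenza protein fragments.
toy_reference = "MKSAP"
toy_isolate = "MKTAP"
toy_result = list_substitutions(toy_reference, toy_isolate)
```

Applied per protein to each isolate against the prototype strain, the per-protein lists could then be tallied to give a total substitution count.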

Selection Pressure Analysis
Selection pressure analysis of the HA, NA and MP genes of 72 global H1N1pdm virus strains revealed 18 positively selected sites. Integrated analysis was performed for differential selection pressure acting on the HA (566 codons), NA (469 codons), M1 (252 codons) and M2 (97 codons) proteins. Positive selection on the HA gene was stronger than on the NA, M1 and M2 protein genes. In total, 11 HA, 3 NA, 2 M1 and 2 M2 sites were found to be under positive selection by at least two methods (Table 4). Of the 11 HA sites, 2 were located in the signal peptide, 4 in HA1 and 5 in HA2. Positions 151, 222 and 239 were situated within a known B-cell antigenic region. Three sites (30, 248 and 386) in the NA gene were found to be positively selected. Analysis of the matrix protein gene revealed 2 sites each in M1 (28, 181) and M2 (10, 26) to be under positive selection. A specific selection pressure analysis of the HA and NA genes of Indian isolates (n = 17) revealed 3 sites in HA and 2 sites in NA under positive selection (Table S3). Of these, S220T (HA) and N248D (NA) were earlier reported as clade VII-specific substitutions [19,21].

Discussion
Transmission of pandemic influenza virus persists in many continents, but current activity levels are low in Asia. Recent peaks in activity were noted during early 2010 in northern India, Nepal and Sri Lanka. Influenza activity remained stable but elevated in western India, continued to decline substantially in northern India, and remained low overall in southern and eastern India [15]. This virus was generated by multiple reassortment events, and each of its precursor gene segments has circulated in swine for more than 10 years. Infection of swine with H1N1/2009 virus has been observed in multiple countries. However, because of a paucity of systematic surveillance of swine influenza worldwide, the question remains whether H1N1/2009 will become established in swine and become a reservoir of reassortment that may produce novel viruses of potential threat to public health [16]. The H1N1/2009 virus has remained antigenically and genetically stable and of relatively low virulence in humans since its detection in April 2009. Most genetic changes in H1N1pdm to date have not been clearly linked to changes in antigenicity, disease severity, antiviral drug resistance or transmission efficiency. However, the rapid evolution rate characteristic of influenza viruses suggests that changes in antigenicity are inevitable in future [17]. With the number of reported pandemic H1N1 cases in many parts of the world and continued viral persistence in India and nearby countries (Nepal, Sri Lanka, Bangladesh), the elevated activity has created an urgent need to track the global dispersion of this virus in humans.
In this study, the main focus was complete genome characterization of the circulating isolates of northern India (Gwalior region), to decipher conservative and non-conservative substitutions and to compare them with other Indian and global circulating H1N1pdm isolates. The continued circulation of the virus in a particular region from 2009 to date is also a serious concern and requires in-depth investigation. With the objective of molecular investigation of circulating H1N1pdm virus, clinical samples from suspected influenza-like illness (ILI) cases from Gwalior, India during 2010-2011 were investigated. The clinical picture of the patients revealed the same pattern as was reported in 2009 [18], but there was an increase in the number of H1N1pdm cases in 2010. The study revealed that the virus affected all age groups, with the highest number of cases in the young age group. More females than males were affected during the period under observation. The fatality ratio (5.83%) was found to be prominently high in young persons. Young age groups have the least experience of influenza A (H1N1pdm) virus and are recognized as a potential source in the transmission of influenza. It is also possible that the propensity to consult a doctor is greatest in younger age groups. However, in 2011 the number of positive cases was higher in the young age group of 18-28 years. A possible reason for the higher number of cases in 2011 may be an increase in viral virulence and better adaptation of the virus in the region, which may become more severe in the coming years.
In this study, four Indian isolates confirmed by virus-specific CPE, HA, IFT and CDC real-time RT-PCR were selected for complete genome characterization. The nucleotide sequence analysis revealed no significant difference among viruses recovered from two different places and different years in India. Diversity of the Indian isolates at the amino acid level, both with respect to the prototype strain and within the Indian isolates, was found to be maximum in the HA and NP genes. The substitution S220T (HA), specific to clade VII isolates, was adequate to delineate the isolates in the HA-based phylogeny. Most of the amino acid changes were conservative, involving interchanges of amino acids with the same physicochemical properties. However, a few major non-conservative changes between Indian isolates were also observed. Compared to the prototype strain, glutamic acid was replaced by the strongly basic amino acid lysine at position 391 (HA) in all four Indian H1N1pdm viruses and at position 71 (NS1) in one Indian H1N1pdm virus sequenced in this study. Two important non-conservative substitutions, involving acidic aspartic acid to basic histidine at position 441 (PB2) in two Indian H1N1pdm viruses and cyclic proline to acyclic serine at position 100 (HA) in all four Indian H1N1pdm viruses, were also recorded. Similar non-conservative substitutions involving a shift in amino acid class were also recorded in other gene segments. However, the significance of these substitutions needs to be addressed.
To identify the genetic lineage of H1N1pdm virus, phylogenetic analysis was conducted on concatenated whole genome sequences from 2009-2012 retrieved from GenBank, including all the H1N1pdm whole genomes from India sequenced to date. Whole genome and full HA-based phylogenetic analysis revealed seven discrete clades of H1N1pdm virus circulating globally. Both trees comprised all representative H1N1pdm clades of diverse geographical origin and included the maximum number of representative H1N1pdm viruses from all the affected areas. Both trees yielded similar topologies, with a characteristic distribution of H1N1pdm isolates into seven distinct clades. The maximum number of isolates grouped into clade VII. Clade I included the prototype California/04 and California/07 viruses isolated first during the H1N1pdm pandemic [19]. All Indian isolates (2009-2011) grouped into clade VII except Hyd/NIV51/2009, and Pune/NIV6196/2009 and Pune/NIV10604/2009 (HA gene phylogeny), which were isolated during the initial pandemic phase and grouped into clades V and VI, respectively [20]. Clade VII is identified as the predominant circulating clade in India and Asia as well as globally [19]. Phylogenetic analysis of all Indian H1N1pdm complete genomes sequenced so far demonstrated that the earliest isolate, from Hyderabad (A/India/Hyd/NIV51/2009) during the initial pandemic phase, was a clade V isolate. Two other isolates from Pune during the later pandemic phase (A/India/pune/NIV6196/2009, A/India/pune/NIV10604/2009) belonged to clade VI. Neither case was directly associated with any foreign travel history, so it is not clear whether these clades evolved within the country or were imported into it. All other Indian isolates from the last pandemic phase to the post-pandemic phase belonged to clade VII.
Two initial Indian isolates belonging to clade VII had a foreign travel history, which may indicate that clade VII was introduced from an external source [21]. It is therefore possible that clade VII has been favourably selected as the dominant H1N1pdm lineage in India.
Influenza viruses have a segmented genome and are prone to genetic reassortment during mixed infections. Hence the circulating H1N1pdm strains also evolve, and strains with higher fitness may be favourably selected at a particular time point. It is most likely that the H1N1pdm strains have undergone a similar evolutionary process and that viruses of higher fitness were favourably selected over time. The selection pressure analysis revealed 18 positively selected sites in the major surface proteins of influenza A (H1N1pdm) virus, i.e. the HA, NA and matrix proteins. Since these proteins play a crucial role in the attachment, assembly and release of the virus, these substitutions might have played an important role in making these isolates more transmissible. Differential selection analysis also supported the pandemic 2009 strains being subject to distinctive selection compared to their progenitors [21]. The results indicated that the HA gene may experience stronger positive selection than the NA and matrix genes in the process of adaptation to the human population globally. Of the 18 positively selected sites, S220T (HA; found in Indian isolates) and I30V (NA; found in global isolates) were also reported in previous studies as clade VII-specific markers [19]. Positions A151T/V and R222K are situated within the A and D epitopic regions of HA and are also associated with receptor binding [22]. Since HA plays a crucial role in virus attachment, these substitutions might have played an important role in virus transmission.
The present study is the first systematic study carried out to characterize the genetic nature of recently circulating Indian H1N1pdm virus in the post-pandemic phase. This study clearly indicates that the cosmopolitan clade VII is predominant in India. The few reported clade VII markers found in this study indicate that the clade has undergone positive selection during virus evolution over the last 3 years, and a shift to clade VII from other circulating clades was observed in Indian isolates during 2009-2012. The complete genome information of recent Indian H1N1pdm virus isolates, elucidated for the first time in this study, will serve future epidemiological surveillance in the Indian subcontinent and abroad.

Clinical Samples and Virus
A total of 120 acute-phase throat/nasopharyngeal swab samples from patients suspected of H1N1pdm virus infection, with influenza A-like illness between 3-7 days after onset of fever (with a case definition of sudden onset of fever >38°C, cough or sore throat), were referred from sentinel hospitals in Gwalior, India for laboratory investigation of the H1N1pdm outbreak during 2010 and 2011. Throat/nasopharyngeal swab samples were received in viral transport medium (Himedia) at appropriate cold temperature (4°C) in a triple packaging system. All the samples were processed in the High Containment Facility (a biosafety level 23 laboratory) at DRDE, Gwalior. A total of four Indian isolates (3 from Gwalior and 1 from Bangalore) were selected for complete genome sequencing and phylogenetic analysis in this study.

Nucleic Acid Extraction
Viral RNA was extracted from 140 µl of clinical sample and cell culture supernatant (isolates) using the QIAamp viral RNA mini kit (Qiagen, Germany) in accordance with the manufacturer's instructions. Finally, RNA was eluted in 50 µl of elution buffer and stored at −80°C until use.

Real-time RT-PCR
The CDC real-time RT-PCR assay was used for novel swine flu virus identification on an MX 3000P quantitative PCR system (Stratagene, USA). The assay is based on TaqMan chemistry and includes a panel of oligonucleotide primers and dual-labeled hydrolysis probe sets [universal Influenza A (Inf A), swine influenza A (swInf A), swine H1 (swH1) and RNase P (RP)], employing the Invitrogen SuperScript™ III Platinum® one-step quantitative kit. The amplification was carried out in a 25 µl reaction volume according to the CDC instructions and standard thermal profile for sample screening [23]. Briefly, the reagents included 2× buffer (Invitrogen One-step RT-PCR kit, USA) 12.5 µl, enzyme mix 0.5 µl, forward and reverse primers 0.5 µl each (40 µM), probe 0.5 µl (10 µM) and DEPC-treated water added up to a total volume of 25 µl. Finally, 5 µl of viral RNA eluate extracted from the different samples was added for the real-time RT-PCR assay. [24]. Tissue culture fluid was harvested after observing MDCK cell lines for cytopathic effect. Morphological changes of MDCK cells were photographed with an inverted microscope (Olympus IX 71) at 0 to 72 hr. The presence of pandemic H1N1 virus in infected culture fluid was demonstrated by hemagglutination, immunofluorescence using virus-specific antibodies and CDC real-time RT-PCR.
The hemagglutination (HA) test was performed using guinea pig RBC following a standard protocol [25]. Briefly, the infected culture supernatant was allowed to react with 0.5% RBC for 1 h at room temperature. After incubation, a positive reaction was indicated by mat formation in a U-bottom plate (Nunc, USA) and a negative reaction by RBCs settled in the form of a button. For the immunofluorescence test (IFT), virus was allowed to infect the cells for the required time points, and the cells were washed 3 times with PBS followed by fixation with chilled methanol for 1 h. The fixed cells were then permeabilized with 0.1% Triton-X 100 at room temperature for 20 min and incubated with rabbit anti-pdmH1N1 HA pAb (1:2000) (GenScript, USA) followed by anti-rabbit IgG-FITC conjugate (Sigma) (1:160). Cells were washed and visualized under a Carl-Zeiss Aximot 2 (Germany) microscope equipped for incident illumination with a narrow-band filter combination selective for FITC. Virus at different passage levels was also confirmed by CDC real-time RT-PCR as described above.

Complete Genome Amplification
One-step RT-PCR was carried out to amplify all eight segments using the recommended WHO-CDC whole genome primers [23]. Each gene segment was amplified in three to eight fragments of 324 to 833 bp (minimum to maximum product size) with a 100 bp overlapping sequence in order to get at least four-fold sequence coverage. A total of 46 overlapping amplicons spanning the complete genomic region were amplified using 92 primers. To amplify each segment, 5 µl of RNA was added to a 25 µl master mix containing 2.5 µl 10X PCR buffer, 1.5 µl MgCl2 (3 mM), 0.5 µl dNTP (200 µM each), 0.5 µl reverse transcriptase (0.4 units/µl), 0.5 µl RNase inhibitor (0.4 units/µl), 0.5 µl Taq DNA polymerase (0.05 units/µl), 0.25 µl each of the respective forward and reverse primers and 13 µl of molecular biology grade water. The one-step RT-PCR was carried out using the Enhanced Avian HS RT-PCR kit (Sigma, USA). The PCR amplification was carried out in a final volume of 25 µl in a thermal cycler (Bio-Rad, USA). The thermal profile comprised reverse transcription at 48°C for 45 min, initial denaturation at 95°C for 2 min, followed by 35 cycles of 95°C for 1 min, annealing at 56-65°C for 1 min and extension at 72°C for 2 min, and a final extension at 72°C for 10 min. The PCR products were gel purified from a 1% agarose gel using the QIAquick gel extraction kit (Qiagen, Germany) and used as template in sequencing reactions.
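As a bookkeeping aid for the reaction set-up above, the following minimal sketch tallies the listed per-reaction volumes, scales them for a batch, and adds up the programmed hold times of the thermal profile. The component volumes and hold times are taken from the text; the dictionary keys, function names and the 10% pipetting overage are assumptions made for illustration. With the volumes as listed, the components plus template add up to 24.5 µl, slightly below the stated 25 µl final volume.

```python
# Per-reaction volumes in microlitres, as listed in the text.
MASTER_MIX_UL = {
    "10X PCR buffer": 2.5,
    "MgCl2": 1.5,
    "dNTP mix": 0.5,
    "reverse transcriptase": 0.5,
    "RNase inhibitor": 0.5,
    "Taq DNA polymerase": 0.5,
    "forward primer": 0.25,
    "reverse primer": 0.25,
    "water": 13.0,
}
TEMPLATE_RNA_UL = 5.0


def listed_reaction_volume():
    """Sum of the listed master mix components plus the RNA template."""
    return sum(MASTER_MIX_UL.values()) + TEMPLATE_RNA_UL


def scale_master_mix(n_reactions, overage=0.10):
    """Volumes for n reactions with a pipetting overage (assumed, not from the text)."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in MASTER_MIX_UL.items()}


def programmed_hold_minutes(cycles=35):
    """Total hold time: RT 45 min, denaturation 2 min, cycles of 1+1+2 min, final 10 min."""
    return 45 + 2 + cycles * (1 + 1 + 2) + 10


batch = scale_master_mix(46)  # one reaction per amplicon, as an example
```

The hold-time total excludes ramping between temperatures, so actual run time on a thermal cycler would be somewhat longer.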

Sequencing Reaction
Double-pass sequencing was carried out employing the BigDye terminator cycle sequencing ready reaction kit (Perkin-Elmer, Applied Biosystems, USA) on an ABI 310 sequencer. Briefly, each sequencing reaction was carried out in a final volume of 10 µl by mixing the BigDye terminator mix, containing the thermostable AmpliTaq DNA polymerase, dNTPs and four dye-labelled dideoxynucleotide terminators (ddNTPs), with 25 ng of purified PCR product and 3.2 pmol of either sense or antisense primer. Cycle sequencing parameters were as follows: 25 cycles of 96°C for 5 sec, 50°C for 15 sec and 60°C for 4 min. The reaction mixture was column purified and the DNA was dried under vacuum. The DNA pellet was resuspended in 15 µl of Hi-Di formamide and heated at 95°C for 5 min before being loaded on the ABI 310 automated DNA sequencer (Applied Biosystems, USA).

Sequence Analysis
The nucleotide sequences were retrieved, edited and analysed using SeqScape (Applied Biosystems, USA) and the EditSeq and MegAlign modules of the Lasergene 5 software package (DNASTAR Inc, USA). Multiple sequence alignment was carried out employing MUSCLE [26]. The deduced amino acid sequence was determined from the nucleotide sequence using the EditSeq module of the Lasergene 5 software package (DNASTAR Inc, USA). The percent nucleotide identity and percent amino acid identity values were calculated as pairwise p-distances. Extensive phylogenetic analysis based on the full HA gene (1701 nt) and the complete genome (13158 nt; eight concatenated segments) was carried out by including 45 and 65 globally diverse H1N1pdm sequences (Table S1), respectively, using MrBayes version 3.1.2 [27]. The Bayesian tree was inferred by running a Markov chain Monte Carlo algorithm for 1 million generations, sampling every 100th generation, with a burn-in setting of 10% of generations. The GTR+G+I model (general time-reversible model with gamma-distributed rates of variation among sites and a proportion of invariable sites) was found to be the best-fit model for our dataset. Convergence was assessed using the mean standard deviation of partition frequency values with a threshold of 0.01.
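The pairwise p-distance mentioned above is the proportion of compared sites at which two aligned sequences differ, and percent identity is its complement. The following is a minimal sketch of that calculation; the function names are illustrative, and the choice to skip any column containing a gap (pairwise deletion) is an assumption of this sketch, not necessarily the setting used by the software in the study.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion,
    an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def percent_identity(seq_a, seq_b):
    """Percent identity expressed as 100 * (1 - p-distance)."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


# Toy alignment: one mismatch in eight compared sites.
toy_distance = p_distance("ATGCATGC", "ATGCATGA")
```

The same functions apply to aligned amino acid sequences, since the calculation only compares characters column by column.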

Selection Pressure Analysis
Analysis of the selection pressure acting on the codons of the surface proteins, i.e. hemagglutinin (HA), neuraminidase (NA) and matrix protein (MP), of H1N1pdm virus was carried out using the HyPhy open-source software package available through the Datamonkey web server (http://www.datamonkey.org/) [28]. Analysis was performed using reference sequences [n = 80 (HA); n = 73 (NA); n = 71 (MP)], including Indian H1N1pdm viruses, for all three gene segments (Table S2). A separate analysis of the HA and NA genes was also carried out by including 17 Indian H1N1pdm viruses (Table S2). The ratio of non-synonymous (dN) to synonymous (dS) substitutions per site (dN/dS or ω) was estimated using five different approaches: single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), random effects likelihood (REL), mixed effects model of evolution (MEME) and fast unconstrained Bayesian approximation (FUBAR). The best nucleotide substitution model for each data set, as determined through the tool available on the Datamonkey server, was adopted in the analysis.
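Sites were reported as positively selected when supported by at least two of these methods. That consensus step can be sketched as below. The function name is hypothetical, and the per-method site lists in the example are made-up placeholders for illustration only; they are not the calls reported in Table 4.

```python
from collections import Counter


def sites_supported(calls_by_method, min_methods=2):
    """Return sorted codon sites called by at least `min_methods` methods.

    calls_by_method maps a method name to an iterable of codon positions
    that the method flagged as positively selected.
    """
    counts = Counter()
    for sites in calls_by_method.values():
        counts.update(set(sites))  # count each site once per method
    return sorted(site for site, n in counts.items() if n >= min_methods)


# Placeholder calls for illustration only (not data from this study).
example_calls = {
    "SLAC": [220],
    "FEL": [220, 391],
    "REL": [151, 220, 391],
    "MEME": [151],
    "FUBAR": [100],
}
example_consensus = sites_supported(example_calls)
```

Requiring agreement between methods is a conservative filter: a site flagged by a single method is dropped, which trades some sensitivity for fewer false positives.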

Supporting Information
Table S1 GenBank accession numbers used in phylogenetic analysis.