Molecular features of influenza A (H1N1)pdm09 prevalent in Mexico during winter seasons 2012-2014

Since the emergence of the pandemic H1N1pdm09 virus in Mexico and California, biannual increases in the number of cases have been detected in Mexico. As observed in previous seasons, pandemic A/H1N1 09 virus was detected in severe cases during the 2011–2012 winter season and finally, during the 2013–2014 winter season it became the most prevalent influenza virus. Molecular and phylogenetic analyses of the whole viral genome are necessary to determine the antigenic and pathogenic characteristics of influenza viruses that cause severe outcomes of the disease. In this paper, we analyzed the evolution, antigenic and genetic drift of Mexican isolates from 2009, at the beginning of the pandemic, to 2014. We found a clear variation of the virus in Mexico from the 2011–2014 season due to different markers and in accordance with previous reports. In this study, we identified 13 novel substitutions with important biological effects, including virulence, T cell epitope presented by MHC and host specificity shift and some others substitutions might have more than one biological function. The systematic monitoring of mutations on whole genome of influenza A pH1N1 (2009) virus circulating at INER in Mexico City might provide valuable information to predict the emergence of new pathogenic influenza virus


Introduction
Since the emergence of the pandemic H1N1pdm09 virus in Mexico and California, biannual increases in the number of cases have been detected in Mexico [1,2] The first outbreak began from late March to July 2009, followed by a second wave from late August 2009 to March 2010 [1]. The 2010-2011 winter season was characterized by low cases of pandemic virus and a predominance of Influenza A H3N2 and Influenza B [3]. As  Viral genome sequencing 50 clinical samples were positive for AH1N1pdm09 and we obtained the whole viral genome sequence in 23 of these samples. 13 samples were sequenced using high-density oligonuclotides microarrays and 10 samples were sequenced using massive parallel sequencing with the Illumina platform. 4 samples were sequenced using both technologies. Additionally, there were 4 samples where we could not achieve whole genome sequences and we only obtained sequences from specific segments (INMEGEN-INER 1

Re-sequencing array
The whole genome sequences from clinical samples were obtained using high-density oligonuclotides microarrays for re-sequencing, (GIS H1N1 Flu BioSurveillance Array Nimblechip 132k) as previously described [10] Retro transcription (RT) of the influenza AH1N1pdm09 viral genome was carried out from RNA extracted from nasal swabs of clinical samples positive to this viral type. We used the Invitrogen Super Script III high fidelity enzymes kit (SuperScript III First strand synthesis system for RT-PCR, Cat No. 18080-051) and primers from the Biosurveillance Resequencing Oligo Kit (AITbiotech, AITBH1N1M-50), for RT-PCR. Later we used 2 ul of cDNA from the RT for the 8 segments amplification by PCR with Platinum Taq DNA polymerase (Platinum Taq DNA polymerase, Thermo Scientific, Foster City, CA, USA) and cocktail of primers included in the resequencing oligo kit. PCR conditions were 94˚C, 2min; 40 cycles (94˚C, 30 s; 66˚C, 3min); 72˚C, 5min.
PCR fragments from each sample were pooled and labeled with CY3 (Nimbelgen Onecolor DNA Labeling Kit, REF 05 223 555 001) independently and 800 ng of the labeled reaction were used for array hybridization, following the manufacturer´s instructions (NimbleGen Hybridization Kit, REF 05 583 683 001).
Microarrays were scanned using a GenePix 4000B scanner and processed with the Nimble-Scan and EvolStar softwares to translate the fluorescence intensity into nucleotides and obtain a sequence FASTA file.
In total, the array contains 8,236 control probes and 121,928 H1N1(2009) probes, which provides 2X coverage of the entire H1N1(2009) genome, and up to 8X coverage of the regions comprising the 36 mutation hotspots and 10 drug-binding sites [11].
Another set of 10 clinical samples were sequenced using massive paralleled sequencing (MPS) on an Illumina MiSeq (Illumina, CA, USA), as described below. Four samples were sequenced in parallel by microarrays and (MPS).
Massive parallel sequencing. The 8 viral segments were amplified simultaneously and directly from clinical samples, using MBTuni12 and MBTuni13 primers, as described elsewhere [12]. Amplification products for 10 samples were gel-purified (QIAquick Gel Extraction Kit; QIAGEN, Valencia, CA). Barcoded libraries for NGS were produced using Nextera XT DNA Library Preparation kit and paired-end sequencing (2x250 cycles) was performed using the MiSeq platform (Illumina). The reads were mapped to A/California/07/2009(H1N1)/2009 (H1N1), data was obtained from the NIAID Influenza Research Database (IRD) [13] (access number of this sequence is presented in S1 Table) using SMALT V.0.7.6. Assembly was performed using Velvet package (Velvet V1.2.10) and visualized with TABLET (V 1.15.09.01). A 100% coverage was achieved for each virus, with at least 90 depth for HA, NS, NA, M and NP segments (mean coverage: 10). The PA, PB1 and PB2 segments had 80% coverage at >10x. The sequences from this study were deposited at the NIAID Influenza Research Database (IRD, http://www.fludb.org) [13] and are available in GenBank (Accesion numbers can be found in S1 Table) Mutation analysis The sequences were analyzed using A/California/07/2009(H1N1)/2009 amino acid sequence as reference and compared to global sequences at FluSurver [14] from 2009 to 2016, this for each of the eight viral segments (Access number in S1 Table), in order to identify novel substitutions and determine how many times reported substitutions have been previously reported. Sequences were aligned using the CLUSTAL W method in Molecular Evolutionary Genetics Analysis (MEGA 6.0) software [15].  [16] and edited by MEGA 6.0. A maximum likelihood tree was constructed for each influenza segment using MEGA 6.0. The Tajima-Nei model was selected with 5-parameter gamma distributed rates and 1,000 bootstrap replicates. Editing of trees was made using Tree (http://tree.bio.ed. ac.uk/software/figtree/).

Results
Clinical and demographic characteristics of the studied individuals are summarized in Table 1. The mean age of the patients was 45.4 ± 20.8 years and 76.4% of these individuals were male. Fatal outcome (%) 5.8 Data are means ± standard deviation (SD), or number and percentage. *2 patients had asthma and 1 COPD.
The mean BMI (kg/m 2 ) was 29.2, and 23.5% required mechanical ventilation for 15.2 (mean) days. Patients had 22 days of hospitalization stay, 17.6% of them had co-morbidities such as asthma and chronic obstructive pulmonary disease (COPD), and 5.8% of these patients died.
Patients who required mechanical ventilation displayed a Kirby index (PaO 2 /FiO 2 ) (mean: 104.7). The main signs and symptoms at the start of the illness included fever, myalgia, cough, and headaches, while chest pain, dyspnea, and cyanosis occurred after the third day. Critically ill patients received oseltamivir (150 mg/day) during the period while they were under mechanical ventilation while non-critically ill patients received 150mg/day of oseltamivir for 5 days.
Substitutions detected by whole genome sequencing of the AH1N1 09pdm influenza virus 13 of the substitutions we identified were novel, and their potential biological role has not been defined. Substitutions that have been previously reported are involved in host specificity shift, viral oligomerization interfaces (VOI), binding small ligands (BSL), Ab's recognition, binding host proteins (BHP), binding nucleic acids (BNA) and antigenic drift. VOI is calculated automatically from known structures and captures sites both in viral oligomerization sites as well as crystal contacts. BHP includes mostly interaction with human immune response proteins. BSL is derived from small ligands seen in known crystal structures and, in case of influenza surface proteins, often signifies interaction with small glycans which can be in or near a glycosylation site.  Fig 1). These Mexican sequences clustered next to HA sequences of strains from New York (CY189313, CY189361, bootstrap value 66). These sequences share two substitutions in Signal Peptide of the immature HA at amino acid 7 and 15 (-11 V/I and -3 A/T). Group 2 (KR271559 and KR1571583) sequences clustered separately and shared a novel substitution in HA H138Q. Group 3 (KR271599 and KR271567) clustered with strains isolated in New York, Washington and Helsinki. One isolate (KR271567) shared mutation T474M with sequences from Helsinki and Finland principally. We mapped the amino acid substitutions occurring at the nodes of H1N1/2009 HA phylogeny, revealing amino acid changes at K163Q and A256T for 2013-2014 sequences as reported previously. Additionally, a cluster of 2 Mexican sequences was marked by substitution H138Q. Regarding NA, we observed 3 substitutions that marked sequences from 2013-2014; I34V, I321V and K432E. To determine whether individual sites were under positive selection, we used the mixed effects model of evolution (MEME) method. Only I34V was statistically significant (P = 0.05). Additionally we found substitutions that distinguished the three groups mentioned above with HA analyses. We observed substitution Q308L in group 1, while T452I was found in group 2 and the substitutions N449K, N386K and S82P were present on sequence KR271569 in group 3. For NS1 we detected the substitution L36I in group 1, the N127S in group 2 and finally the substitution K131E in group 3. Regarding PA we found the substitution I13V as a signature of the group 1 (statistically significant with P = 0.05), while in group 3 we detected two substitutions F35L and I459T. In PB1 we associated the substitution A374T with group 2 and M646I with group 3. In PB2 T676I was found in group 2 [17]. Other substitutions Red dots at nodes show branches with 50% bootstrap support leading to the 2014 sequences described in this work. Trees for the rest of the viral genome segments can be found in S1-S7 Figs (S1 Fig  NA, S2 Fig PB2, S3 Fig PB2, S4 Fig PA, S5 Fig M, S6 Fig NP, S7 Fig NS). were found in M1, M2, NS2 and NP but were similar to those reported in other countries ( Table 2).

Discussion
Influenza A viruses cause acute respiratory tract infections and represent a significant public health threat [18]. The outbreak strain of swine-origin influenza A/H1N1 virus infection in 2009 is still circulating during the winter season in many countries, and may cause severe pulmonary illness in susceptible individuals [4,19]. Therefore continuous analysis of the entire genome of these viruses will provide comprehensive information of its molecular evolution in order to maintain effective prevention measures for public health. It has been extensively demonstrated that pandemic A/H1N1 influenza virus contains a mixture of segments, including HA, NP, PB1, PB2, PA and NS from a triple reassortant virus isolated in north America and segments NA and M from the Eurasian swine influenza virus [20,21]. Due to the differential mutation rates of each viral segment, molecular epidemiology surveillance is important to detect antigenic variations among circulating influenza strains, which can modify the pathogenicity and antiviral resistance patterns of these viral strains.
In this study we sequenced the entire genome of pandemic A/H1N1 strains isolated from patients in a reference Hospital in Mexico City (INER) in different years and we compared these sequences with consensus sequences in order to detect mutations that might be associated with viral evolution or might influence the antigenicity of the virus. It is important to mention that the sequences were obtained directly from clinical samples, in order to avoid in vitro artificial selection.
We found a clear variation of the virus in Mexico from the 2011-2014 season due to different markers and in accordance with previous reports [22,23]. Mutations V344M and I354L of PB2 and N321K of PA, I397M, I435T of PB1, S498N of NP, N44S, V241L, N369K of NA V80I of M1, L90I of NS1 and S185T, S203T, E374K and S451N of HA appeared together during the evolution of influenza virus in Mexico. However we found other unique substitutions in PB2 G644R and T676I, PB1 A374T, HA H138Q, NA Q308L, and L36I in NS1 of some Mexican strains that could indicate geographical divergence. These findings should be confirmed with more sequences obtained in other regions of the country. Recently a very extensive analysis of sequences of influenza virus in a post-pandemic era showed that different geographical regions generated local epidemic peaks and might act as a potential seed of local virus. Further studies would be required in order to determine which of them are fixed in the next generation of influenza virus in Mexico. It is known that influenza viruses undergo positive selection during the process of cross-species transmission and during the initial stages of a human outbreak, followed by purifying (negative) selection, when viruses adapted to the new human host in late epidemic [24][25][26]. In this context we still observed positive selection of substitutions of NA I34V and PA I13V in winter season 2013-2014.
Our data indicate that the polymerase complex appears to have different evolutionary dynamics; probably due to differences in the evolutionary rate among the viral segments. This might also be a reflection of differences in selective pressure once the virus is in the infected host; regarding the rest of the viral genome segments, all the Mexican isolates from season 2013-2014 clustered with sequences from New York and Helsinki.
Mexican sequences from 2013-2014 belonged to clade 6B characterized by D97N, K163Q, S185T, K283E and A256T substitutions (Fig 1). However a different branch was observed with 2 sequences presenting specific substitutions in all segments, including the substitution H138Q which lied in antigenic site in HA, These results suggest that strains could appear temporarily, with different molecular characteristics and potentially with antigenic or pathogenic implications.
A recently significant change occurred in virus from 2013 to 2014, the substitution K163Q were predominant from these years until now and has been associated with binding prevention of antibodies (Abs) elicited in a larger number of middle-aged humans all over the world [27].
In this study we identified 13 novel substitutions with important biological effects, including virulence, T cell epitope presented by MHC and potential roles in host specificity shifts and it should be noted that although these mutations have not been particularly characterized, they are found to be close to or belong to regions of biological importance previously ( Table 2).
Substitution PB2-R591P is on a position where R has been implicated in a more efficient replication of pandemic H1N1 viruses in mammals [28] while the effect of proline is not known. PB1-S704T is part of the H-2K b epitope and might affect the efficiency of antigen processing during cytotoxic T lymphocyte (CTL) response [29,30]. E16A is in the N-terminal ectodomain of M2 at the outside of the viral membrane near to the disulfide bridges connecting the M2 tetramer 18235503] but its specific effect has not been particularly characterized yet [31].
In HA we observe changes that could affect immunogenicity of the influenza virus; sequences from 2015 to 2016 had additional mutations (S162N) in antigenic site (Sa) and together with the substitution I276T are defining a new clade 6B.1 [21,22]. Importantly S162N may confer an additional glycosylation site in HA that could affect even more the immunogenicity of influenza virus in the future.
Our results indicate that the pandemic influenza virus changes through the seasons and show how the use of whole genome sequences are able to provide a deeper understanding of virus evolution through time.
Supporting information S1 Table. Accession numbers of sequence used as reference and accession numbers of sequences from this study.