Molecular Evolutionary Analysis of pH1N1 2009 Influenza Virus in Reunion Island, South West Indian Ocean Region: A Cohort Study

Background/Objectives Molecular epidemiology is a powerful tool to decipher the dynamics of viral transmission, quasispecies temporal evolution and origins. Little is known about the pH1N1 molecular dynamics in general population. A prospective study (CoPanFlu-RUN) was carried out in Reunion Island to characterize pH1N1 genetic variability and molecular evolution occurring in population during the pH1N1 Influenza pandemic in 2009. Methodology We directly amplified pH1N1 genomes from 28 different nasal swabs (26 individuals from 21 households). Fifteen strains were fully sequenced and 13 partially. This includes pairs of sequences from different members of 5 separate households; and two pairs from individuals, collected at different times. We assessed the molecular evolution of pH1N1 by genetic variability and phylogenetic analyses. Principal Findings We found that i) Reunion pH1N1 sequences stemmed from global “clade 7” but shaped two phylogenetic sub-clades; ii) D239E mutation was identified in the hemagglutinin protein of all Reunion sequences, a mutation which has been associated elsewhere with mild-, upper-respiratory tract pH1N1 infecting strains; iii) Date estimates from molecular phylogenies predicted clade emergence some time before the first detection of pH1N1 by the epidemiological surveillance system; iv) Phylogenetic relatedness was observed between Reunion pH1N1 viruses and those from other countries in South-western Indian Ocean area; v) Quasispecies populations were observed within households and individuals of the cohort-study. Conclusions Surveillance and/or prevention systems presently based on Influenza virus sequence variation should take into account that the majority of studies of pH1N1 Influenza generate genetic data for the HA/NA viral segments obtained from hospitalized-patients, which is potentially non-representative of the overall viral diversity within whole populations. Our observations highlight the importance of collecting unbiased data at the community level and conducting whole genome analysis to accurately understand viral dynamics.


Introduction
The first Influenza pandemic of the 21 st century was caused by the 2009 A/H1N1 Influenza virus (pH1N1), first reported in early spring 2009 in Mexico and the United States [1]. Initial phylogenetic studies showed that this virus was a reassortant of genomic segments from an Eurasian lineage swine H1N1 Influenza virus and a North American triple-reassortant swine H1N2 or H1N1 virus [2,3]. From July 19 the new virus spread across the world, reaching more than 140 countries [4]. The early viral diversification into seven discrete genetic clades [5] was further confirmed by several subsequent studies [6][7][8][9]. Clade 7 rapidly became the most prevalent worldwide, but other clade-variants continued to circulate, as most countries were affected by pH1N1 through multiple introductions of different clade members [6,8,[10][11][12][13][14]. These multiple introductions are likely explained by the air-borne transmission of flu [15] and by intense international air traffic and exchanges [12]. International aircraft travel on which passengers typically are confined for a period of hours present opportunities for air borne transmission. Airborne viral diseases as in the case of Influenza are more prone during the preclinical incubation period to silent transmission and large diffusion among travellers of the same flight and hence are more likely associated with multiple undetected introductions in a given country. Once introduced, new viral strains are likely to spread rapidly across geographic regions [16].
The epidemic wave of the Influenza pandemic caused by pH1N1 reached Reunion Island during the austral winter 2009 (July-December). According to the department of epidemiological surveillance, the epidemic activity started on week 30 (July 20), peaked on week 35 (August 24) and was completely over by week 40 (September 28) [17]. The first case imported from Australia was detected on July 5 and the first autochthonous transmission was recorded on July 21. First estimates suggested that 67,000 individuals who had consulted a physician were infected by the pH1N1 virus [17]. However, these studies were largely skewed towards symptomatic patients seeking medical support and their conclusions could hardly be extrapolated to ILIs occurring in the community.
The CoPanFlu programme is an international project dedicated to the study of the pandemic, based on the follow-up of household cohorts in metropolitan France, Reunion Island [18] and other African [19], South-American and Asian countries. In Reunion Island, the CoPanFlu-RUN programme provided the first pH1N1 sero-epidemiological analyses and revealed that the number of individuals infected by pH1N1 was at least 3 times higher than the estimates based on epidemiologic surveillance data, mainly as a consequence of the mild or unapparent disease that escaped medical attention [18].
Here we report on the genetic variability and molecular evolution of pH1N1 viruses characterized during this prospective follow-up programme and attempt to understand their evolutionary implications in the geographic context of the South West Indian Ocean (SWIO) region.

Clinical Samples
The CoPanFlu-RUN prospective study was conducted between July 21 (week 30) and October 31 2009 (week 44) The CoPanFlu-RUN cohort was selected to be representative of the whole Reunion Island population. We took special attention to select households representing a wide range of geographic locations in order to minimize the repartition bias. For more details about the cohort design, see [18]. A total of 772 households (2,164 individuals) were included in the study. An active telephonic inquiry was conducted to record Influenza-Like Illness (ILI) symptoms occurring in households. Reports of ILI, defined as documented fever ($37.8uC) with at least one symptom of Upper Respiratory Tract Infection (URTI: sore throat, cough, running and or stopped nose or systemic symptom like aching), in at least one member of a household, led to 3 consecutive visits by a nurse (at days 0, 3 and 8 post-report), during which nasal swabs were collected from all family members regardless of their clinical presentation. ILI alerts were managed for the study period and led to the collection of 1,196 nasal swabs belonging to 443 individuals living in 125 households.

Sample Nomenclature
Reunion Island sequences were labelled according to the origin of the sample, whilst retaining anonymity of the participants : For example, ''2133-5-M1E'' corresponds to the fifth member of the family designated as number ''2133'', whereas ''2133-6-M1E'' refers to the sixth household member. ''M1E'', ''M2E'' and ''M3E'' refer to the first, second or third sampling performed at days 0, 3 or 8 post ILI report, respectively. For households that were sampled twice because of the recurrence of ILI alerts, pH1N1 viruses identified in the three successive sampling events were designated as ''M4E'' ''M5E'' or ''M6E''.

Detection of pH1N1
All nasal swab samples (Sigma-VirocultH, Medical Wire [MWE]) were spiked before nucleic acid extraction with an internal RNA phage control [20]. RNA was extracted from 140 mL of swab supernatant using the QIAamp Viral RNA Mini Kit (Qiagen), according to the manufacturer's protocol.
Samples were subsequently screened for the presence of Influenza A virus RNA by qRT-PCR using a pan-Influenza A SYBR Green qRT-PCR assay targeting the M gene [21] (Quantitect SYBR Green qRT-PCR, Qiagen). pH1N1 detection was assessed using a pH1N1-specific TaqMan probe qRT-PCR assay targeting the HA gene (SuperScript III Platinum one-step qRT-PCR system, Invitrogen), according to the recommendations of the Pasteur Institute (Van der Werf, S. & Enouf, V., SOP/ FluA/130509).

Genome Sequencing and Alignment
Out of the 101 nasal swabs detected positive for pH1N1 (corresponding to 62 individuals) we decided to sequence 28 strains (corresponding to 26 individuals belonging to 21 households) distributed so as to reflect the epidemiologic and temporal dynamic of the epidemics in the cohort [18]: epidemic (W33-37, n = 26) and post-epidemic (W38-40, n = 2) periods. No nasal swabs were detected positive for pH1N1 during the pre-epidemic period (W30-32).
Reverse transcription was performed on freshly extracted RNA using the Superscript III Reverse Transcriptase (Invitrogen, USA), according to the manufacturer's instructions, then cDNA was aliquoted and stored at 280uC. For genetic characterization of viruses, genome amplification (and subsequent sequencing) was performed from RNAs directly extracted from nasal swabs without resorting to cell-culture viral isolation, to avoid any significant alteration in the representation of the different viral populations present in each swab [22]. All PCR amplifications were performed with the Hot Start Taq DNA Polymerase (Promega, France) and a specific set of primers designed for this study (primers available on request), based on a consensus sequence of all the complete sequences of pH1N1 available at the day of design (2009). To sequence the complete open reading frames (ORFs) of each segment, primers were designed to amplify 3-6 overlapping fragments, providing a minimum of 3-fold genetic coverage. Briefly, amplifications were performed with 200 nM of each primer, 2.5 mM of MgCl 2 , 0.2 mM of dNTP, 1.25 U of Hot Start Taq DNA polymerase and 5 mL of cDNA. Cycling conditions were: 95uC -2 min followed by 35 cycles of 94uC -5 sec, Tm -30 sec, 72uC -1 min 30 and a final elongation step at 72uC -7 min. When needed, nested-PCR reactions were performed to increase the amount of cDNA. In this particular cases, PCR reactions were performed as described above, except that only 20 cycles were ran for the first PCR round, following by 35 cycles for the second PCR round, in order to minimize mutations introduced by PCR reactions. Sequences alignments were trimmed and assembled in Geneious Pro 5.3.4 software package [23] using MUSCLE alignment method [24].
More phylogenetic analyses were conducted in detail for SWOI region, based on available sequences recovered from neighboring countries in the EPIFLU TM GISAID database and the previously selected 101 GenBank sequences. As sequences from EPIFLU TM GISAID database present only partial HA (1,174/1,710 nt) and NA (complete ORF) we limited our analysis to these portion of concatenation for all the used sequences.
The DNA substitution model that best fitted our data for both separate segments and concatemers was performed by the software jModelTest 0.1.1 [25] and was considered for phylogenetic and evolutionary analyses. We selected different models of nucleotide substitution using the corrected Akaike information criterion.
Phylogenetic trees were constructed by Maximum Likelihood (ML) within PHYML v.2.1b1 [26], according to the selected nucleotide substitution model. Nodal support was evaluated by 1000 bootstrap replicates. Bayesian phylogenetic Inference (BI) was carried out using MrBayes 3.1.2 [27]. BI for each data set based on the best-fitting model, was conducted with two independent runs of four incrementally heated, Metropolis Coupled Markov Chain Monte Carlo (MCMC) starting from a random tree. MCMC were run for 10,000,000 generations with trees and associated model parameters being sampled every 400 generations. The initial 2000 trees in each run were discarded as burning samples and the harmonic mean of the likelihood was calculated by combining the two independent runs.

Estimation of Evolutionary Distance
Evolutionary distances between sequences of distinct phylogenetic groups in these analyses were calculated using MEGA 5 [28], taking into account the best model of sequences evolution allowing correction of the estimates of evolutionary distance. Since MEGA does not contain the HKY model proposed, we used the Tajima-Nei correction [29], which is the nearest model to those proposed by jModelTest.

Molecular Clock Phylogenies and Date Estimation
Molecular clock phylogenies were estimated using the MCMC method and were calculated by using a maximum of available sequences from Reunion Island (Concat-6). GTR + C was used in the analysis as proposed by jModelTest. Only sequences for which full date information was available were retained for this estimation. The Bayesian MCMC analyses were performed using BEAST v.1.6.1 [30] under a strict molecular clock setting. An exponential-growth coalescent model was chosen as a prior on the tree. We used an UPGMA starting tree and ran a chain length of 50 million by sampling trees every 10,000 generations. Convergence and burning were assessed using Tracer v1.4.1b (http:// tree.bio.ed.ac.uk/software/tracer/). The maximum clade credi-bility tree for analyzing the MCMC data set was annotated by TreeAnotator in the BEAST package. The tree was visualized using FigTree v1.2.2. (http://tree.bio.ed.ac.uk/software/figtree/).

Mutation Analysis
Viral pH1N1 sequences isolated in 2009 from human hosts and for which full-genome information was available were downloaded from the EPIFLU TM GISAID database. Sequences containing residues specific to global clade 7 were identified (,1,600 fullgenome sequences) and concatenated. In order to identify mutations that were specific to the sequences from Reunion Island, by-eye comparison with global sequences was facilitated using Geneious Pro 5.3.4 [23].

Ethics Statements
The protocol was conducted in accordance to the principles expressed in the Declaration of Helsinki and French law for biomedical research. This protocol was approved by the Ethical/ Institution Review Board CPP (Comité de Protection des Personnes of Bordeaux 2 University) and as required by the French low regulation at AFSSAPS under the Nu: ID RCB AFSSAPS: 2009-A00689-48. Every eligible person for participation was asked for giving their written informed consent. All participants gave their written informed consent. We also obtained informed written consent from the next of kin. All samples were de-identified and analyzed anonymously for the study.

Results
The CoPanFlu-RUN study allowed collection of 1,196 nasal swabs belonging to 443 individuals and 125 households that were analysed for the presence of pH1N1. A total of 101 nasal swabs (8.4%) corresponding to 62 individuals (14.0%) tested pH1N1 positive. Amplified material from 28 different swabs (26 individuals from 21 households) was successfully characterised: 15 viruses were fully sequenced (8 segments, complete ORFs) and 13 were partially sequenced (6 segments: PA, HA, NP, NA, M and NS, complete ORFs). This includes 5 household-derived pairs of sequences (i.e. sequences obtained from two distinct individuals living in the same household), and 2 individual-derived pairs of sequences (i.e. sequences obtained from the same individual from two consecutive swabs [ Table 1]). None of the individuals in our cohort population, including the 26 individuals from which the 28 sequences were obtained, developed severe clinical symptoms.

Phylogenetic Analysis
In agreement with previous studies [31,32], which have reported different mutation rates of the different segments of the Influenza viruses, jModelTest analysis demonstrated that different nucleotide substitution models were most relevant for the different genomic segments. ML analyses were carried out using the identified substitution model (GTR + C) for both Concat-6 and Concat-8. However, for the BI, we were able to apply the appropriate model to the analysis of each segment partition (HKY + C for HA, PA and PB1, GTR for NA, HKY for NS and M, HKY + I for NP and PB2).
Global context. The global phylogenetic analyses conducted on concatemers of 8 ( Figure 1) and 6 ( Figure S1) segments from Reunion Island and GenBank sequences support the presence of 7 distinct global clades (major nodes have PP = 1.0 and BP ML .71). Genetic distances between the 7 major clades varied from 0.14% to 0.22% (Table 2).
Sequences from Reunion Island were found to cluster together and with a single sequence from Japan (A/Sapporo/1/  Table 2). Among the analysed set of sequences, no other strong geography-based clustering could be observed between the other members of clade 7, in agreement with previous studies [6,33,34]. Clade RUN sequences were then compared with 1597 pH1N1 clade 7 complete genome sequences retrieved from the EPI-FLU TM GISAID database. When compared to the generated clade 7 consensus sequence, a total of 64 silent mutations, and 48 non-silent mutations were identified in all sequences obtained from the CoPanFlu-RUN cohort. Nucleotide changes that were common to more than one individual and that were uncommon or absent in all other included sequences are listed in Table S1.
Sequences belonging to clade 7 can be characterized by fixed amino acid changes in HA (S220T), NA (V106I) and NS1 (I123V) [5] (Table 3a). All sequences obtained from the CoPanFlu-RUN cohort contained each of these mutations. In addition, mutations NP (V100I) and NA (N248D), which are not systematically detected within clade 7 isolates were seen to be fixed in all sequences from clade RUN. These data confirm that clade RUN originated from clade 7. All clade RUN sequences also exhibited a fixed amino acid mutation in HA (D239E) and a silent mutation in NA (g873a). None of the Reunion sequences contained the specific mutations that are associated with oseltamivir resistance [35,36].
Local and community level context. In order to investigate genetic diversity within clade RUN, the maximum number of viral sequences from Reunion Island (n = 28) were considered in the phylogenetic analysis by considering the Concat-6 ( Figure S1). Reunion Island sequences could be separated into two major subclades: sub-clade ''RUN-A'' clustering the majority of sequences (n = 25, PP = 1 and BP ML = 94), while ''RUN-B'' forms a smaller sub-clade (n = 3, PP = 1 and BP ML = 100) containing the sequence originating from Sapporo (Japan). RUN-A sequences were characterized by mutations in NS1 (N133D), NP (a366g), HA (c42a and t333c) and PB2 (g120a and g1665a), whereas RUN-B sequences lacked all these mutations (Table 3b). Fixed mutations were also observed for RUN-B sequences in PB2 (a807g and V414I), PA (c1794t), NA (a48t) and M (t823c). No correlation was found between the sub-clades and the temporal and geographical distributions of the individuals in Reunion Island, nor the clinical status of the individuals (data not shown).
Among the 28 viral sequences from Reunion Island, there were 5 household pairs of sequences (i.e. sequences generated from two different members of 5 households). Sequences derived from within households showed a high level of genetic similarity, with H2, H3 and H5 forming distinct evolutionary lineages with strong nodal support (Figures 1b and S1), suggestive of direct intrahousehold viral transmission. Similarly, the two individual pairs of sequences obtained from the same individual at d0 and d3 (I1 and I2) were also seen to cluster into distinct evolutionary lineages with strong nodal support, though genetic divergence was still observed between these sequences (0.01%-0.02%).
The mismatches between these related sequences (household members and individuals), are indicative of viral mutations occurring over a short period of time or relating to individual transmission events, and were identified for all segments ( Table 4). The largest number of mismatches was observed in segments PB1 (5/14), PA (3/14) and M (3/14), whereas segments HA (1/14) and NA (0/14) showed little variation. Interestingly, each of the three mutations that were observed three days apart within the same individual was a reversion from a Reunion-specific sequence to that of the clade 7-consensus sequence, meaning that the sequences obtained at day 3 were likely more similar to the ancestral sequence than those obtained on the sample collected three days earlier. As the regeneration of a lost sequence by random mutation is extremely unlikely, this observation suggests the simultaneous presence of both reference and mutant viral quasispecies within these individuals with a shift in quasispecies dominance occurring over the course of three days. The presence of distinct quasispecies in these individuals was verified by analysis of the original sequence chromatograms, where regions with high coverage had differing but unambiguous chromatogram peaks at the corresponding positions (data not shown).
Context at the regional level. In order to trace back early viral circulation of pH1N1 within the SWIO region, further phylogenetic analysis was performed using available sequences from viruses characterized in neighbouring countries. The analysis included segments for which sequence data was most commonly available (HA and NA). Viral strains from both Tanzania and Mauritius clustered with the sequences from the CoPanFlu-RUNcohort suggesting that there had been an active circulation of this variant across the SWIO area. However, sequences from some SWIO islands (the Seychelles archipelago and Madagascar) did not cluster with those of clade RUN, suggesting circulation of multiple viral strains within the region at this time ( Figure 2). Mutations that had been identified as characteristic of clade RUN could be detected within some of the Mauritian and Tanzanian viral segments (Table 5). Available sequences of pH1N1 viruses isolated in Mauritius in August 2009 (at the peak of the epidemic in Reunion Island) could be ascribed to sub-clade RUN-B suggesting transmission of this Influenza virus between the two islands. Interestingly, although the sequences from the Tanzanian and Mauritian viruses identified in July, as well as those from the Japanese isolate from Sapporo in June, clustered with sequences from Reunion Island, these early regional sequences possessed none of the mutations which were associated with either of sub-clades RUN-A or RUN-B.

Evolutionary Characteristics
The mean estimated dates of emergence for each clade are provided in figure 3. All of the obtained dates were in agreement with previously published studies [5,6,11], except for small differences in the estimated dates for clade 1 and 4 which can be explained by differences in sampling. Mean estimated dates of emergence were similar between Concat-8 and Concat-6 (data not shown).
The Most Recent Common Ancestor (TMRCA) of Clade RUN was estimated as May 19 and the estimated dates of emergence of sub-clades RUN-A and RUN-B were June 26 and July 8, respectively. TMRCA of Clade RUN suggests that this viral lineage was in circulation approximately 8-9 weeks before its initial detection in Reunion Island on July 5 and 11-12 weeks before the first autochthonous transmission (July 21) in Reunion Island detected by the epidemiological surveillance network. In addition, our previous observations of specific mutations in sequences obtained from Mauritius (strains MP-t and MP-n) and Tanzania (strain 88) show that regional circulation of pH1N1 preceded viral emergence in Reunion Island (Table 5).

Discussion
Phylogenetic studies have demonstrated that pH1N1 virus has rapidly diversified into 7 discrete global clades. Even though clade 7 soon became largely prevailing while the other clades were gradually fading out in various geographic regions, studies in India [6], Japan [11], Argentina [8], Canada [14] and other countries have reported co-circulation of multiple pH1N1 strains belonging to different clades as a result of several viral introductions from different origins. In Reunion Island all amplified and sequenced pH1N1 viruses stem from clade 7, and were clustered in a local specific clade (clade RUN [Figures 1 & S1]). Evolutionarily, clade RUN was as distant from clade 7 as the other 6 global clades are from each other ( Table 2). Viral sequences from Reunion Island have fixed all the mutations that had previously been identified as being specific to clade 7, including the [V100I]-NP and [N248D]-NA mutations which were not fixed among all clade 7 isolates [5]. In addition, clade RUN was characterized by 2 mutations that were systematically found in all local HA and NA sequences, respectively, the D239E mutation and the silent g873a mutation (Table 3).
There have been many descriptions of the HA D239E mutation [37,38] including predictive structural studies [39]. It has been suggested that this residue likely plays an important structural role for the recognition and attachment of Influenza virus to its host receptor; changes at this position may therefore modulate the host immune response [38]. Recent studies in hospitalized patients suffering pH1N1 infection [40,41] have shown that the D239G, D239N and D239E mutations were associated with severe (i.e. fatal cases reported), less severe, and mild illness, respectively [42]. Moreover, the D239E variant was found to preferentially colonize the upper respiratory tract, while D239G/N mutants also colonize  Table 2. Estimates of Evolutionary Divergence of pH1N1 among different clades. the lower respiratory tract, hence causing severe acute respiratory syndromes, as is often observed in H5N1 Influenza [40]. Of note, none of the participants to the CoPanFlu-RUN cohort, infected by pH1N1, suffered serious medical complications. We have previously shown, based on serological data, that two thirds of individuals in Reunion Island that were infected by pH1N1 escaped medical detection by health services, likely because they developed only mild or inconspicuous disease [18]. This observation is in keeping with reports showing that infections with [D239E]-HA variants have generally speaking, been less severe than the one provoked by other variants. Another salient observation from our population study is the clear division of clade RUN into two sub-clades, RUN-A and RUN-B. The divergence between RUN-A and RUN-B and their Table 3. Specific clade mutations in discrete phylogenetic clades.  Table 4. Genetic sequence mismatches between related sequences. For each segment, the total number of DNA mismatches between individuals within a household (H) and sequences obtained from the same individual on separate days (I) is indicated (cf Table 1 respective TMRCA estimation, suggest plausible multiple viral entries to Reunion Island even though they were closely related members of clade 7. Although the first report of pH1N1 infection detected in Reunion Island concerned a traveler coming from Australia [43], none of the selected Australian sequences appeared to share a common history with Reunion Island viruses. However, common history could be observed with available isolates from Tanzania and Mauritius (Figures 2, 3b and Table 5), with evidence of active circulation of multiple viral strains in the region but no solid conclusion could be made as to the origin of the genetic lineages present in Reunion Island. Several reasons limit the extent of the conclusions that could be inferred from phylogenetic analyses of pH1N1: i) Despite the large number of pH1N1 genome sequences deposited in databases, the information is available only for a small subset of viruses that are circulating at the global level due to the rapid rate of Influenza mutation and spread; ii) Globalization means that autochthonous transmission results in complex patterns of viral emergence that are not limited by geographical boundaries. iii) A large number of sequenced Influenza isolates originate from patients suffering from severe Influenza illness; variants such as those from clade RUN that concerned individuals with mild or inconspicuous disease screened in the course of a prospective study conducted at the community level, represent a minority of sequences available in databanks. This point stresses the importance of prospective, community studies as an unbiased source of genetic material and a representation of the real natural history of disease in population.
Our estimates suggest that the common ancestral strain of Clade RUN emerged 8-9 weeks prior to the first identified case in Reunion Island and 11-12 weeks prior to the first case indicating autochthonous transmission (Figure 3). A similar conclusion was also drawn from Australian sequences [44]. A study reported from Taiwan [45] showed evidence of pre-epidemic subclinical community transmission as proved by seroconversion occurring several weeks before report of the first documented case in the island.
Our prospective study based on the follow-up of a community based cohort investigated viral transmission at the regional, community, intra-household levels as well as the temporal dynamics of Influenza viruses within an individual. Only few studies have developed so far a similar approach and they have focused only on the HA and HA/NA segments [46,47]. These studies, as well as ours, highlight the fact that one can associate genetic variations to intra-household transmission of Influenza virus. However one should be aware that the variability observed in the incubation period, responsible for the ''data stretch'', could lead to misinterpretations in transmission studies. Let us consider a household where two subjects 1 and 2 were concomitantly infected. Subject 1 has a shorter incubation period than subject 2, then subject 1 will be considered as an index case whereas subject 2 will be considered as a transmission case. Similarly, it could lead to a confusion of secondary and tertiary cases [48]. In such situations, mutation data are not sequential but rather linked to the original contaminating viral mixture.
Our experimental protocol did not include viral extraction from cell cultures in order to limit the risk of cell culture driven viral mutations. It has been reported by Zhirnov and collaborators that after isolation in cell culture (MDCK cell line) Influenza viruses often differ from those present in the clinical specimens, since adaptive changes occur during virus transmission from the human host to cells of heterologous origin [22]. Omitting the cell culture stage also offers the advantage of retaining the viral populations balance within the quasispecies mixture that was present in the first sample. As a consequence, we were able to observe in two individuals a rapid shift in the dominance of one viral population within the initial quasispecies. In each case, the characteristic mutation of the amplified material from the initial swab reverted, within 3 days, to the consensus sequence. Similar changes in quasispecies dominance were also observed between household  The presence of mutations with segments HA and NA that had been identified as specific to each genetic grouping (cf.   Figures 1 and S1). BEAST analysis is based on concatemers of 6 segments (complete ORF) of the 28 Reunion Island sequences and 101 globally-representative GenBank sequences. b. TMRCA of sub-clades RUN-A and RUN-B. Red points indicate strains originating from the South West Indian Ocean region that were identified as belonging to the specified sub-clades via identification of specific mutations in segments HA and NA (see Table 5). Throughout, points indicate date positions. Solid lines represent observed dates of an effective circulation for available isolates included in the analysis. Dashed lines represent estimated dates of circulation, dating back to the mean estimated TMRCA (95% confidence intervals). Lines are colored by clade as indicated. Stars nu1 and nu2 indicate the dates of the first imported case of pH1N1 (July 5) and the first autochthonous case (July 21) of pH1N1 in Reunion Island, respectively, as estimated by the regional epidemiological surveillance network. doi:10.1371/journal.pone.0043742.g003 members, and account for intra-household viral variability, reflecting the selection of viral population generated by the transmission bottleneck. The populations best fitted to the infected host, (hence, most represented within the quasispecies), are likely the ones that ultimately will emerge. Interestingly, studies have suggested that the genetic variation at the HA and NA levels, which are under the pressure of the host immune response, are mostly selected on the long-term [49]. Indeed, in our study the majority of shifts occurring on the short term, were not observed in segments HA and NA, but rather in PB1, PA and M, which are internal elements likely submitted to a different selective pressure. The quasispecies mixture that occurs within the individuals, as shown by deep sequencing studies [39,41,42,50,51], allows for diverse mutations to coexist over short periods of time during the pre-immune period. This phenomenon shows that the mechanisms to achieve best fit over the whole viral population at the individual host level, is at work following the inter-individual transmissions [52].
Most studies reported in the literature were conducted on single viral segments (generally HA or NA) of isolates from individuals with severe disease and hence have an inherently biased nature that can lead to important features of viral evolution being overlooked. Our observations highlight the importance of collecting data at the community level and conducting whole genome analysis to accurately understand viral dynamic. Table S1 Characteristic mutations of pH1N1/2009 influenza viral sequences from the CoPanFlu-RUN cohort. Mutations were identified by comparison with reference sequences from closely-related viral strains from outside of Reunion Island. Only mutations that were present in more than one individual are included, and mutation prevalence within sequences obtained from the cohort is indicated. Characteristic mutations found in all sequences are highlighted in bold. (DOC)