Molecular Epidemiology of A/H3N2 and A/H1N1 Influenza Virus during a Single Epidemic Season in the United States

To determine the spatial and temporal dynamics of influenza A virus during a single epidemic, we examined whole-genome sequences of 284 A/H1N1 and 69 A/H3N2 viruses collected across the continental United States during the 2006–2007 influenza season, representing the largest study of its kind undertaken to date. A phylogenetic analysis revealed that multiple clades of both A/H1N1 and A/H3N2 entered and co-circulated in the United States during this season, even in localities that are distant from major metropolitan areas, and with no clear pattern of spatial spread. In addition, co-circulating clades of the same subtype exchanged genome segments through reassortment, producing both a minor clade of A/H3N2 viruses that appears to have re-acquired sensitivity to the adamantane class of antiviral drugs, as well as a likely antigenically distinct A/H1N1 clade that became globally dominant following this season. Overall, the co-circulation of multiple viral clades during the 2006–2007 epidemic season revealed patterns of spatial spread that are far more complex than observed previously, and suggests a major role for both migration and reassortment in shaping the epidemiological dynamics of human influenza A virus.


Introduction
Intensive study of the molecular evolution of influenza A virus has provided important insights into its seasonal genesis and spread in human populations [1][2][3][4]. The rapidity with which both epidemics and pandemics of influenza A virus arise and spread globally has also generated great interest in understanding the spatial-temporal dynamics of this important human pathogen [5][6][7][8][9]. Phylogenetic trees of the epitope-rich HA1 domain of subtype H3N2 influenza A viruses sampled since its emergence in 1968 exhibit a distinctive 'cactus-like' pattern, in which most lineages go extinct within a few years of their genesis, so that usually only a single lineage persists between seasonal epidemics [10,11]. This is most likely the result of strong host-mediated selection pressure, resulting in continual evolution at key antigenic sites, a process termed 'antigenic drift' [11,12]. This antigenic evolution is also episodic, with major changes in antigenicity occurring with a periodicity of approximately 3 years [13]. A variety of epidemiological and evolutionary models have been developed to explain this phylogenetic pattern [14,15], and how the evolution of the HA1 domain relates to that in the rest of the viral genome [16].
Although antigenic drift is clearly a key determinant of influenza A virus evolution, this process has rarely been observed in a single locality over a single epidemic season [17,18]. Rather, multiple viral introductions appear to drive evolution at the scale of local epidemics, allowing for the co-circulation of multiple clades of the same subtype [16,18]. At a global scale, viral migration from regions characterized by more persistent influenza transmission, notably East and South-East Asia, appears to be important in determining large-scale epidemiological patterns [19,20,21]. In addition, reassortment events between viruses of the same subtype occur frequently, and are sometimes associated with major antigenic changes in both the A/H3N2 [22] and A/H1N1 subtypes [23]. However, a complete understanding of the evolutionary and epidemiologic dynamics of influenza A virus at all spatial and temporal scales remains an important goal [24].
Every winter, epidemics of human influenza recur in the United States, and are associated with an annual average of 226,000 hospitalizations and 36,000 deaths, mainly caused by secondary bacterial pneumonia in the elderly and young children [25,26]. Epidemiological models have found a strong correlation between the regional spread of influenza virus infection in the United States and the movement of people to and from their workplace [9]. In addition, US influenza epidemics tend to originate in California, which may reflect this region's interconnectivity to Asia and Australia [9]. Although of great importance, most spatial models have used pneumonia and influenza (P&I) mortality data and hence do not consider the evolutionary history of the viruses involved. Indeed, it is striking that detailed phylogenetic analyses of influenza A viruses from a single season at a national level have not been undertaken, even though the rapid rate of influenza A virus evolution [14,27,28,29] means that viral genome sequences may contain important information on country-wide spatial dynamics.
Our goal here is to determine the spatial-temporal dynamics of influenza A virus during a single epidemic season (2006-2007) in the United States through the phylogenetic analysis of whole-genome sequence data. Since the 1968 pandemic, A/H3N2 viruses have typically dominated most influenza seasons, including 16 of the past 20 US epidemics (see, for example, [30]), and are associated with higher levels of morbidity and mortality [31], higher rates of evolutionary change [14], and greater synchrony in the timing of local epidemics across the United States than A/H1N1 viruses [9]. However, during the 2006-2007 US influenza epidemic, more viruses reported by the CDC were of the A/H1N1 (62.3%) than the A/H3N2 subtype (37.7%) [30]. The evolutionary dynamics of this epidemic were particularly complex, including a late-season switch in dominance from the A/H1N1 to the A/H3N2 subtype, the co-circulation of multiple antigenically distinct lineages within both A/H1N1 and A/H3N2, an A/H3N2 vaccine mismatch, and the co-circulation of adamantane-resistant and adamantane-sensitive viral lineages in both subtypes [30,32]. Our analysis of 353 whole-genome influenza A virus sequences of both the A/H1N1 (n = 284) and A/H3N2 (n = 69) subtypes from this 2006-2007 US season represents the first attempt to investigate the spatial-temporal spread of a nationwide influenza virus epidemic within the context of genomic-scale evolutionary dynamics.

Multiple introductions of A/H1N1 influenza virus during the 2006-2007 US season generate complex spatial patterns
Our phylogenetic analysis of 284 whole-genome A/H1N1 influenza viruses sampled between December 2006 and March 2007 in 17 US states revealed substantial genetic diversity in all eight segments of the viral genome. In particular, eight phylogenetically distinct clades (denoted A-H), defined by both high bootstrap values and long branch lengths, are evident on the trees of each genome segment, as exemplified by the HA phylogeny (Figure 1). The phylogenies of the seven other genome segments contain clades identical to those on the HA phylogeny (Figures S1, S2, S3, S4, S5, and S6, with the PB1 phylogeny presented in Figure 2). Previous studies [20,22,23] suggest that each clade is likely to represent a separate introduction of the virus into the United States, although the small sample of available sequences means that individual clades may sometimes represent multiple introduction events. One clade, herein denoted clade A, was clearly dominant, comprising the majority of isolates (175/284 isolates, 61.6%; Table 1). Minor clades B, C, D, E, F, G, and H contained only 47, 12, 35, 6, 6, 1, and 2 isolates, respectively (Table 1).
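As a quick consistency check on the counts above, the clade sizes can be tallied and converted to percentages. This is a minimal sketch; the dictionary simply restates the numbers given in the text (Table 1), and the function name is illustrative.

```python
# Clade sizes for the 284 A/H1N1 genomes, restated from the text (Table 1).
CLADE_SIZES = {"A": 175, "B": 47, "C": 12, "D": 35, "E": 6, "F": 6, "G": 1, "H": 2}


def clade_percentages(sizes: dict[str, int]) -> dict[str, float]:
    """Return each clade's share of the total sample as a percentage."""
    total = sum(sizes.values())
    return {clade: 100.0 * n / total for clade, n in sizes.items()}


if __name__ == "__main__":
    total = sum(CLADE_SIZES.values())
    minor = total - CLADE_SIZES["A"]
    shares = clade_percentages(CLADE_SIZES)
    print(f"total isolates: {total}")           # 284
    print(f"minor-clade isolates: {minor}")      # 109
    print(f"clade A share: {shares['A']:.1f}%")  # 61.6%
```

The minor clades together account for 109 isolates, so clade A's 175 isolates are indeed a majority.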
Clade A was the most geographically and temporally pervasive of the eight clades, circulating in 24/30 localities and 14/15 weeks studied, although all clades were sampled over wide temporal and geographic scales (Table 1; Figures 3 and 4). Notably, there was no association between the phylogenetic positions of isolates and their week of collection (Figure 3) or geographic region (Figure 4). Rather, clades co-circulated in both time and space, and the detection of small clades (E, G, and H) in only a single region is likely an artifact of limited sampling. The largest clades, A and B, were highly geographically dispersed, containing isolates collected from both relatively isolated areas and major US cities spanning all six US regions, and including 24 and 18 of the 30 localities sampled, respectively (Table 1, Figure 4). However, in contrast to a simplified spatial model in which a single lineage spreads in a unidirectional manner, we observed no strong signal for viral migration among the co-circulating clades, even when individual clades were studied in isolation (Table 2). Indeed, a parsimony-based analysis in which the US state of origin of each isolate was coded as an extra character and mapped onto each ML tree revealed strong clustering by US state (p<0.001), but only weak evidence for movement among states (data not shown; available from the authors on request).
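The state-mapping analysis described above can be illustrated with Fitch parsimony, which counts the minimum number of changes in a discrete character (here, US state of origin) on a fixed tree. The text does not say how the p-value was obtained; this sketch assumes a common approach, comparing the observed number of changes with those obtained after randomly reshuffling state labels among the tips. The tree and isolate names are hypothetical, and the code is not MacClade's implementation.

```python
import random
from typing import Union

# A rooted binary tree: a leaf is an isolate name, an internal node is a (left, right) pair.
Tree = Union[str, tuple]


def fitch(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch parsimony: return (candidate states at this node, minimum number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one change below this node.
    return left_set | right_set, left_cost + right_cost + 1


def leaves(tree: Tree) -> list[str]:
    """List leaf names from left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def clustering_p_value(tree: Tree, states: dict[str, str],
                       n_perm: int = 1000, seed: int = 1) -> tuple[int, float]:
    """Compare the observed number of changes with changes after shuffling tip states.

    A small p-value means fewer changes than expected by chance, i.e. clustering by state.
    """
    observed = fitch(tree, states)[1]
    names = leaves(tree)
    labels = [states[name] for name in names]
    rng = random.Random(seed)
    at_most_observed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if fitch(tree, dict(zip(names, labels)))[1] <= observed:
            at_most_observed += 1
    return observed, (at_most_observed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical five-isolate tree; these are not isolates from the study.
    toy_tree = ((("iso1", "iso2"), "iso3"), ("iso4", "iso5"))
    toy_states = {"iso1": "TX", "iso2": "TX", "iso3": "TX", "iso4": "NY", "iso5": "NY"}
    print(clustering_p_value(toy_tree, toy_states))
```

Strong clustering with weak migration corresponds to few state changes on the tree relative to the shuffled expectation.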
The number of isolates collected from different US localities varied widely (from 1 isolate from Detroit, Michigan, to 42 isolates from Houston, Texas; Table 1), and such geographical biases in our data had a profound effect on spatial patterning. Accordingly, the number of clades identified in a locality was strongly associated with the number of isolates sampled from that locality (Spearman rho = 0.77, P<0.0001), whereas the population size of each locality was not associated with the number of viruses or clades identified (P>0.69). Similarly, the first virus isolated in our A/H1N1 sample was from Cincinnati, Ohio (Table 2), which is likely an artifact of the relatively large sample collected from this city (30 isolates, Table 1).
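The rank correlation reported above is a Spearman coefficient: the Pearson correlation of the ranks, with tied values given their average rank. A minimal pure-Python sketch follows; it computes only rho (the text does not specify how P was obtained), and the per-locality counts in the usage lines are hypothetical, not the study's data.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-locality counts: isolates sampled and clades detected.
    isolates = [1, 3, 6, 9, 14, 30, 42]
    clades = [1, 2, 3, 3, 3, 5, 6]
    print(round(spearman_rho(isolates, clades), 3))
```

Because it uses ranks, rho measures a monotone association, which suits count data as skewed as isolates per locality.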
The peak in A/H1N1 genetic diversity occurred in early February (week 10, Table 2), when six of the eight clades co-circulated. Geographically uneven sampling also meant that the most clades were detected in the most heavily sampled localities. For example, six clades (A, B, C, D, E, and F) were detected in Houston, Texas, the most intensively sampled locality (Table 1). Extensive genetic diversity was also detected within a single week: at least four clades (A, B, D, and F), representing two major antigenically distinct lineages circulating globally (see below), were all present in Houston, Texas, during week 10 (Table 2). Abundant viral diversity was also detected in localities that contributed relatively few (6-14) isolates, including both urban and remote areas. Three different clades circulated in each of the following localities: Los Angeles, California (clades A, C, F); Denver, Colorado (clades A, C, G); New York […] Table 2). In fact, more than one clade was observed in every locality from which >1 viral sample was obtained (Table 2).

Author Summary
This study is the first of its kind to reconstruct the spread of an epidemic of influenza A virus across a single country, in this case the United States. Rather than a single viral lineage spreading across the country, a phylogenetic analysis of the whole-genome sequences of more than 300 influenza A viruses of the A/H1N1 and A/H3N2 subtypes sampled during the 2006-2007 epidemic season reveals that multiple phenotypically and antigenically distinct viral lineages entered and co-circulated in the US during this time. Furthermore, the widespread co-circulation of multiple lineages, even in geographically remote localities, allowed for frequent reassortment between influenza A viruses of the same subtype. Through reassortment, a minor lineage of A/H3N2 viruses surprisingly re-acquired sensitivity to the adamantane class of antiviral drugs, and a new A/H1N1 antigenic variant emerged that later became globally dominant. In sum, these results highlight the complexity of the spread of influenza A virus in time and space, and underscore the need for intensified global surveillance involving whole-genome sequence data.

At least three antigenically distinct clades of A/H1N1 virus co-circulated
To view the phylogenetic relationships among A/H1N1 clades from the 2006-2007 epidemic in a wider geographical context, we included 48 background A/H1N1 influenza viruses sampled from the northern and southern hemispheres between 2001 and 2006, years that were dominated by viruses antigenically similar to A/New Caledonia/20/1999 ('New Caledonia-like') [30]. These sequences were available for the HA and NA segments and included three antigenically distinct influenza vaccine reference strains selected […] Figure 1). Because of their extensive phylogenetic divergence, we define clades F, G, and H as 'set 2' clades, in contrast to the 'set 1' clades A, B, C, D, and E.
We inferred the antigenic characteristics of these eight clades from their phylogenetic relationships and from the number of amino acid differences at antigenic sites in the HA relative to vaccine reference strains of known antigenicity. Accordingly, set 1 clades A, B, C, D, and E are likely to be New Caledonia-like in antigenicity, given that (a) 90% of A/H1N1 viruses from this US epidemic were New Caledonia-like (as characterized by CDC surveillance [30]) and set 1 clades were the most prevalent, […] 10-14 amino acids, 3-5 in antigenic sites for set 2 clades (Figure 5, Table 3). It is possible that clades C and D represent additional antigenic variants of New Caledonia-like viruses, given the higher number of amino acid changes in antigenic sites (2) also observed in these viruses (Figure 5). However, given the uncertainties involved in inferring antigenic properties from genetic data alone, our antigenic assignments should not be considered definitive. In contrast, set 2 clades F, G, and H appear to be related to two emerging antigenic variants. Clade H may be antigenically similar to the A/Solomon Islands/3/2006 vaccine strain selected for 2007-2008, based on their close phylogenetic relationship (Figure 1) and the low number of amino acid differences in antigenic sites (1 site, Table 3). Clade F is more closely related phylogenetically to the A/Brisbane/59/2007 vaccine strain selected for 2008-2009, and there are no differences at antigenic sites between these viruses. Clades F and G differ by nine amino acids in the HA, but only one difference occurs at an antigenic site, suggesting that, although phylogenetically distinct, clade G may also be A/Brisbane/59/2007-like in antigenicity.
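The genetic side of this inference reduces to counting amino acid differences between aligned HA sequences, overall and at antigenic positions. A minimal sketch, assuming the sequences are already aligned to a common numbering; the sequences and site set below are toy placeholders, not the Sa, Sb, Ca1, Ca2, and Cb positions used in the study. As the text cautions, such counts are only a rough proxy for antigenic distance.

```python
def count_differences(seq_a: str, seq_b: str, antigenic_sites: set[int]) -> tuple[int, int]:
    """Count amino acid differences between two aligned proteins.

    Returns (all differences, differences at antigenic sites). Positions are 1-based;
    columns where either sequence has a gap ('-') or an unknown residue ('X') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"-", "X"}
    diff_positions = [
        pos
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b and a not in ignore and b not in ignore
    ]
    at_sites = [pos for pos in diff_positions if pos in antigenic_sites]
    return len(diff_positions), len(at_sites)


if __name__ == "__main__":
    # Toy sequences and a hypothetical site set; not real HA data.
    query = "MKAILVVLLY"
    reference = "MKTILVALLY"
    print(count_differences(query, reference, {3, 4}))  # (2, 1)
```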
Also of note, of the 284 A/H1N1 influenza viruses sequenced in this study, only one isolate, A/Colorado/UR06-0053/2007 (the sole member of clade G), contained the S31N amino acid replacement in the M2 protein that is associated with resistance to the adamantane class of antivirals (Table 4) [35].
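Screening for this marker amounts to reading residue 31 of the M2 protein. A minimal sketch, assuming an ungapped M2 sequence numbered from its initiating methionine; the test sequences are synthetic placeholders, not real M2 proteins. Other M2 substitutions (for example L26F, V27A, A30T, and G34E) are also associated with adamantane resistance, but this sketch checks only position 31, as discussed in the text.

```python
def m2_position_31(m2_protein: str) -> str:
    """Report the residue at M2 position 31 (1-based numbering).

    'N' corresponds to the S31N replacement; 'S' is the adamantane-sensitive residue.
    Only position 31 is examined.
    """
    if len(m2_protein) < 31:
        raise ValueError("sequence is too short to contain position 31")
    residue = m2_protein[30].upper()
    if residue == "N":
        return "S31N (resistance marker)"
    if residue == "S":
        return "S31 (no S31N marker)"
    return f"{residue}31 (not classified by this check)"


if __name__ == "__main__":
    # Synthetic 97-residue placeholder with N at position 31.
    synthetic = "M" + "A" * 29 + "N" + "A" * 66
    print(m2_position_31(synthetic))  # S31N (resistance marker)
```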

Clade F was generated by intra-subtype reassortment between antigenic variants
Although clade F was classified as a member of clade set 2 because of the phylogenetic relatedness of its HA segment to clades G and H, this clade in fact appears to be a set 1-set 2 reassortant. Specifically, on the trees inferred for the PB2, PA, HA, and NA segments, clade F isolates are related to the Solomon Islands-like set 2 clades G and H (as exemplified by the HA phylogeny, Figure 1). However, clade F is more closely related (with high bootstrap support) to the New Caledonia-like set 1 clades A, B, C, and D in the PB1, NP, M, and NS segments (as exemplified by the PB1 phylogeny, Figure 2; see Figures S1, S2, S3, S4, S5, and S6 for the phylogenies of the other segments). As half of the genome (PB1, NP, M, and NS) of these reassortant viruses was acquired from set 1-like viruses that began circulating in 2005, this reassortment event most likely occurred in 2005-2006.
[…] Figure 6) [30]. This A/Wisconsin/67/2005-like clade first emerged in 2005 and represented a class of viruses that were adamantane-resistant due to the S31N mutation in M2; it was termed the 'N-lineage' in previous work [36]. This N-lineage is closely related to some 2003 isolates (previously termed 'clade B' [36]) in 4 of the 8 segment phylogenies (PB1, PA, NP, and M) (Figures 7, 8, and 9, and Figure S8), confirming that a 4+4 reassortment event was responsible for the genesis of the N-lineage [36]. As with the N-lineage, all isolates in clade a contained the S31N mutation in M2 that confers adamantane resistance (Figure 6).
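The reassortment argument rests on segment-by-segment incongruence: the same clade clusters with different lineages on different segment trees. Once each segment's clustering has been read off well-supported nodes, the bookkeeping is simple, as in this sketch, which encodes only what the text states for clade F. It does not infer trees or assess bootstrap support, both of which the real analysis requires; the function names are illustrative.

```python
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]

# Group that clade F clusters with on each segment tree, as described in the text.
CLADE_F_AFFINITY = {
    "PB2": "set 2", "PA": "set 2", "HA": "set 2", "NA": "set 2",
    "PB1": "set 1", "NP": "set 1", "M": "set 1", "NS": "set 1",
}


def partition_segments(affinity: dict[str, str]) -> dict[str, list[str]]:
    """Group genome segments by the lineage they cluster with, in genome order."""
    groups: dict[str, list[str]] = {}
    for segment in SEGMENTS:
        groups.setdefault(affinity[segment], []).append(segment)
    return groups


def looks_reassortant(affinity: dict[str, str]) -> bool:
    """True when different segments cluster with different lineages."""
    return len(partition_segments(affinity)) > 1


if __name__ == "__main__":
    groups = partition_segments(CLADE_F_AFFINITY)
    print(groups)
    print("pattern:", "+".join(str(len(segs)) for segs in groups.values()))  # pattern: 4+4
```

The same bookkeeping describes the 4+4 origin of the A/H3N2 N-lineage, with PB1, PA, NP, and M grouping with the 2003 isolates.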
In addition to the major clade a, a minor clade of five A/H3N2 viruses, denoted clade b, also circulated during the 2006-2007 season (Figure 6). Although clades a and b both descend from the adamantane-resistant N-lineage, every isolate in clade b contains the adamantane-sensitive serine (S) at position 31 of M2, indicating that a reversion has occurred. In addition, clades a and b may differ antigenically, as they differ at numerous amino acid positions in the HA, five of which occur in antigenic sites A, B, and C (amino acid sites 50, 140, 142, 157, and 173), and one of which (site 142) lies in the HA1 domain at a site previously identified as undergoing positive selection [37]. Four singleton A/H3N2 viruses (labeled s1, s2, s3, and s4) also circulated during this season (Figure 6). Isolates s1, s2, and s3 are members of the older N-lineage and possess the associated adamantane-resistance S31N mutation. In contrast, isolate s4 is adamantane-sensitive and clusters with other adamantane-sensitive isolates, including clade b. The HA of isolate s4 differs from that of the major clade a by 12 amino acids, 8 of which occur at antigenic sites and 2 at previously identified positively selected sites (amino acid sites 193 and 275) [37] (Table 5). Similarly, the HA of s4 differs from clade b by 12 amino acids, 7 of which occur at antigenic sites and 2 at positively selected sites (142 and 193). In contrast, the HAs of isolates s1, s2, and s3 differ from clade a by only 6, 2, and 4 amino acids in 3, 1, and 2 antigenic sites, respectively. In sum, as many as four antigenic variants of A/H3N2 influenza virus may have co-circulated this season (although this remains to be confirmed experimentally), each of which is likely to represent a separate introduction event: A/Wisconsin/67/2005-like (isolates s1, s2, and s3), A/Brisbane/10/2007-like (major clade a), clade b, and isolate s4 (Table 4).

Multiple reassortment events involving A/H3N2 influenza viruses from the 2006-2007 US epidemic
Major topological differences between the eight phylogenies of the A/H3N2 virus genome strongly suggest that several reassortment events took place involving multiple clades from the 2006-2007 US epidemic. Whereas the adamantane-resistant clade a and the sensitive clade b both appear to derive from the N-lineage on the trees for the PB2, PA, HA, NA, and NS segments, clade b instead derives from the adamantane-sensitive clades from 2004-2005 on the M and PB1 trees (Figures 7 and 8). This major phylogenetic incongruity strongly suggests that clade b viruses reacquired sensitivity to adamantane by acquiring an older adamantane-sensitive M segment (encoding a serine at position 31 of the M2 protein) through reassortment. The NP segment has also undergone a major reassortment event, as on the NP phylogeny both clades a and b descend from adamantane-sensitive clades rather than from the N-lineage (Figure 9). The varying phylogenetic positions of the s4 isolate across the genome also suggest that this singleton virus resulted from multi-segment reassortment (Figures 6-9; Figures S7, S8, S9, and S10). The s4 isolate is closely related to clade b on the phylogenies of the PB2, PB1, NP, M, and NS segments (having reassorted along with clade b in the […]

Spatial dynamics of A/H3N2 influenza viruses
No clear signal of the geographical spread of A/H3N2 influenza viruses could be detected, likely because of our small sample size. All clades were geographically widespread, and a secondary parsimony character-mapping analysis again revealed strong population subdivision and weak migration (results not shown; available from the authors upon request). The major clade a was present in all thirteen localities from which A/H3N2 viruses were collected, and the five isolates in minor clade b were geographically dispersed across both urban and remote areas spanning four of five US regions: Los Angeles, California; Chicago, Illinois; Hopkinsville, Kentucky; Madison, Alabama; New York City, New York; and Houston, Texas (Table 2). New York City exhibited the most A/H3N2 diversity, as major clade a, minor clade b, and singleton viruses s3 and s4 were all detected there, which is remarkable given that only six A/H3N2 isolates in total were collected from this locality (Table 2). In some cases, multiple clades of both A/H3N2 and A/H1N1 viruses co-circulated over restricted spatial-temporal scales.
For example, at least two A/H1N1 clades and two A/H3N2 clades circulated during week 12 in Chicago, Illinois, week 13 in Houston, Texas, and week 14 in Los Angeles, California (Table 2). Considering A/H1N1 and A/H3N2 isolates together, large amounts of genetic diversity circulated in both urban and remote areas of the US: a total of 8 clades of influenza A virus circulated in Houston, Texas, during the epidemic, 7 in New York City, New York, 6 in Los Angeles, California, and 5 each in Denver, Colorado, Cincinnati, Ohio, and Hopkinsville, Kentucky (Table 2).
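Epidemic weeks in Table 2 are counted from the start of sampling; the supporting information defines week 1 as the week of December 2, 2006. A minimal sketch for converting collection dates to study weeks, assuming weeks are simple consecutive 7-day blocks starting on that date (the exact convention, for example whether weeks were aligned to CDC reporting weeks, is an assumption here). Under this assumption, week 10 begins on February 3, 2007, consistent with the early-February diversity peak, and the last A/H3N2 collection date, March 14, 2007, falls in week 15.

```python
from datetime import date, timedelta

# Assumption: study weeks are consecutive 7-day blocks, with week 1 starting
# December 2, 2006 (the "week of December 2, 2006" given for week 1).
WEEK1_START = date(2006, 12, 2)


def study_week(collected: date) -> int:
    """Map a collection date to its 1-based study week."""
    days = (collected - WEEK1_START).days
    if days < 0:
        raise ValueError("date precedes week 1")
    return days // 7 + 1


def week_start(week: int) -> date:
    """First day of a given study week under the same assumption."""
    return WEEK1_START + timedelta(weeks=week - 1)


if __name__ == "__main__":
    for w in (10, 12, 13, 14):
        print(w, week_start(w).isoformat())
    print(study_week(date(2007, 3, 14)))  # 15
```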

Discussion
This study utilized whole-genome sequence data from a surveillance initiative of unprecedented scope and scale that sampled both A/H1N1 and A/H3N2 influenza viruses across the US over the course of a single season through the Influenza Genomics Sequencing Project [38]. Rather than a single viral lineage spreading across the US, multiple lineages of both A/H3N2 and A/H1N1 influenza virus were separately introduced and co-circulated, allowing for reassortment within subtypes and greatly complicating patterns of spatial-temporal spread. Given the extent of genetic diversity observed during this season, obtaining a strong signal for the spatial-temporal spread of multiple different lineages would clearly require a large increase in sampling.
Table 2. Clades of A/H1N1 (A-H, Figure 1) and A/H3N2 (a, b, s1, s2, s3, s4) influenza virus that co-circulated among multiple localities, by epidemic week. Localities are listed by longitude (°W) from west to east; the far-right column gives the total number of clades that co-circulated in each locality over the entire period.
[…] [39]. Given that the antigenic evolution of A/H1N1 influenza virus is thought to be slower than that of A/H3N2 virus, as reflected by eight consecutive years of dominance by A/New Caledonia/20/1999-like viruses, the rapid emergence of two new antigenic variants of A/H1N1 virus in a single year was particularly notable (see, for example, [30]).
The extensive genetic diversity present in both A/H1N1 and A/H3N2 viruses suggests that multiple introductions of virus took place during the 2006-2007 season, particularly as our sampling clearly underrepresented areas that are major ports of international travel. For example, further sampling in the Los Angeles and New York City regions, where our study detected significant diversity even at very low sampling levels, would likely increase the total number of viral lineages detected, including those imported from South-East Asia. By sampling in both metropolitan and relatively isolated areas, our study yielded important information on the geographic distribution of viral genetic variation: extensive viral diversity, including multiple antigenically distinguishable lineages, disseminated widely across the entire United States during the epidemic, even into relatively remote areas, rather than remaining confined to the major cities where the virus is thought to enter. In particular, even low-density areas or those distant from major metropolitan areas, such as Hopkinsville, Kentucky (population approximately 30,000), harbored significant amounts of both genetic and antigenic diversity, suggesting that influenza viruses of multiple antigenic (and other phenotypic) types extensively infiltrate the United States over the course of a single season. However, our analyses cannot exclude the possibility that a single co-infected individual introduced multiple clades of influenza virus into the United States, as the frequency of co-infection among patients in this study is unknown and represents a key area for further research. It is also possible that the 2006-2007 US epidemic was particularly difficult to reconstruct because of the unusual complexity of its evolutionary dynamics, which likely relates to the incomplete dominance of either the A/H1N1 or the A/H3N2 subtype.
The dynamics of influenza virus epidemics vary greatly on an annual basis, and influenza epidemics that are dominated by the A/H3N2 virus have been associated with higher disease transmission and more rapid spread than milder A/H1N1-dominated seasons, as well as stronger synchrony in timing across the United States [9]. Hence, epidemics that are dominated by a single A/H3N2 clade (such as the 2004-2005 season [18]) may exhibit stronger signals of spatial spread, and repeating this sampling effort during an A/H3N2-dominated influenza season could potentially yield a stronger spatial pattern. A sampling scheme that minimizes geographical biases and maximizes the number of samples collected early in the epidemic could also increase the likelihood of obtaining a stronger spatial signal.
Additional sequencing of influenza viruses in areas outside the United States is also essential to understand the global context of the diversity that enters the US during a given epidemic. From this and previous studies [18,21], it is clear that influenza A virus is introduced into the United States multiple times during an epidemic. However, the global sequence data currently available, particularly at the genomic scale, are too limited to draw any conclusions about the geographic origins of each viral introduction. It has been suggested that US epidemics originate more frequently in California than in other states, due to high interconnectivity with Asia and Australia [9], but further whole-genome sequencing of viruses from Asia is clearly needed to test this hypothesis. Although the tendency of US epidemics to originate in the relatively warm state of California suggests that human movements are more important than climatic factors in the seasonal onset of influenza virus epidemics, further documentation of the complex spatial-temporal dissemination of the virus over an epidemic is required to elucidate the seasonality of influenza. Additionally, the extent of viral and antigenic diversity and the frequent circulation of minor clades that are detected by intensified surveillance efforts, such as the present study, suggest that much more diversity circulates at a global scale than is identified by routine surveillance. In particular, early detection of minor clades, especially in the source populations of East and South-East Asia [21], could improve recognition of emerging lineages and prediction of future dominant strains for vaccine design. Indeed, the antigenically variant A/Brisbane/59/2007(H1N1)-like reassortant clade F detected in this study may not have been picked up by routine global surveillance until later, as no other publicly available global isolates from 2006 were found within this clade.
Our findings also suggest that the genetic diversity of the A/H3N2 virus is substantial even when A/H3N2 is not the dominant subtype, as was the case for most of the 2006-2007 epidemic. A major clade (a), a minor clade (b), a reassortant singleton (s4), and three singletons (s1, s2, s3) that appear to be descendants of the N-lineage [36] all co-circulated during this epidemic. All these clades differed at numerous amino acid positions in the HA, including antigenic and positively selected sites [37]. Clades a and b, as well as the s4 singleton, were involved in at least three separate reassortment events: (a) clade b and singleton s4 (PB1 and M segments), (b) clades a and b (NP segment), and (c) singleton s4 only (PA, HA, and NA). As a caveat, because our study did not use plaque-purified viruses, it is theoretically possible that the amplification of segments from different viruses co-infecting a single patient could produce a false signal of reassortment, particularly for putative reassortment events that involve a single virus (for example, the s4 singleton). However, even allowing for this potential source of bias, the frequency of definitive reassortment events among A/H3N2 clades is striking, especially compared with the single reassortment event observed among the A/H1N1 viruses that dominated this season. This most likely reflects the usually lower prevalence of A/H1N1, which in turn means a reduced likelihood of mixed infection and hence of reassortment. In addition, given the importance of other geographical regions, particularly South-East Asia, in the evolution of influenza A virus [21], as well as the fact that A/H3N2 was the dominant subtype in Canada and Europe during this season [33], the A/H3N2 virus likely circulated at higher levels outside the US, providing greater opportunity for reassortment.
Of further interest is why inter-subtype reassortment between A/H1N1 and A/H3N2 viruses is not observed more commonly, despite the apparent co-circulation of both subtypes in both time and space (Table 2). It is possible that viruses produced by inter-subtype reassortment have lower fitness, because the greater genetic distance between the A/H1N1 and A/H3N2 subtypes means that reassortment is more likely to disrupt essential functional interactions among segments. Finally, the existence of the adamantane-sensitive clade b (A/H3N2) during this epidemic was surprising, given that adamantane resistance among influenza A/H3N2 viruses has increased dramatically worldwide in recent years, with more than 95% of A/H3N2 influenza viruses classified as resistant in the previous 2005-2006 season in the US [32]. Even more striking was that most of the genome of clade b isolates was more closely related to the adamantane-resistant clade a than to older adamantane-sensitive clades, indicating that this clade did not evolve directly from adamantane-sensitive viruses, as might have been presumed. Rather, clade b viruses re-acquired sensitivity to adamantane by acquiring two segments (PB1 and M) from older adamantane-sensitive viruses through reassortment. This finding supports prior conclusions that sensitivity and resistance to adamantane can be acquired through genomic reassortment, rather than by direct selection on the M2 gene for drug-resistance mutations [36].

Phylogenetic analysis of influenza A viruses from the 2006-2007 US epidemic
All viruses were collected as part of a larger 2006-2007 US surveillance effort conducted by Surveillance Data Inc., in which a total of 610 influenza virus specimens of both type A and type B were obtained from nasal and nasopharyngeal swabs from patients seen with influenza-like illness. At the time of this study, 353 type A influenza virus genomes had been sequenced and were available for study. Fifty-six participating physicians, primarily located at family practices, were recruited from 21 states that were geographically […] among-site rate variation with four rate categories (Γ4) estimated from the empirical data (parameter values available upon request). In all cases, tree bisection-reconnection (TBR) branch-swapping was used to search for the optimal tree. To assess the robustness of each node on the phylogenetic tree, a bootstrap resampling analysis (1,000 replications) was performed using the neighbor-joining (NJ) method, incorporating the ML substitution model. Clades of related isolates were identified by high bootstrap values (>70%) and exceptionally long branch lengths.
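The bootstrap step resamples alignment columns with replacement, rebuilds a tree from each pseudo-replicate, and reports how often a clade recurs. The sketch below shows the resampling and the support calculation only; the tree-building step (neighbor-joining under the ML substitution model in the study) is omitted, and the toy alignment and replicate clade sets in the usage lines are made up for illustration.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees (each given as its set of clades) that contain `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


def is_well_supported(support: float, threshold: float = 70.0) -> bool:
    """Apply the >70% bootstrap cutoff used to define clades."""
    return support > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    toy_alignment = {"iso1": "ACGTAC", "iso2": "ACGTTC", "iso3": "TCGAAC"}
    print(bootstrap_replicate(toy_alignment, rng))
    # Tree building is not shown; these replicate clade sets are made up.
    ab = frozenset({"iso1", "iso2"})
    reps = [{ab}, {ab}, {ab}, {frozenset({"iso1", "iso3"})}]
    support = bootstrap_support(reps, ab)
    print(support, is_well_supported(support))  # 75.0 True
```

Resampling whole columns keeps each site's pattern across isolates intact, which is what lets the replicates mimic alternative draws of sites from the same evolutionary process.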

Amino acid comparisons between clades of A/H1N1 and A/H3N2 influenza viruses
The parsimony-based MacClade program [44] was used to determine the amino acid changes in both the HA and NA gene segments (Table S9) that occurred between each of the eight clades of A/H1N1 virus from the US, global background viruses from 2001-2005, and A/H1N1 vaccine strains. Changes were also identified in potential glycosylation sites, antigenic regions (Sa, Sb, Cb, Ca1, Ca2) [45], and the receptor-binding site [46]. MacClade was also used to identify amino acid changes between clades of A/H3N2 virus and influenza vaccine strains, including those in antigenic sites and at eighteen sites previously identified as undergoing positive selection [37].
Table 5. Amino acids at variable sites of the HA gene segment of A/H3N2 influenza viruses from the 'N-lineage', clade a, clade b, and singleton isolates s1, s2, s3, and s4 (Figure 6), with differing amino acids in bold.
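The potential glycosylation sites examined in the MacClade comparisons are conventionally N-linked sequons: N-X-S/T, where X is any amino acid except proline. A minimal sketch of scanning ungapped protein sequences for them with a regular expression, and of listing sites gained or lost between two sequences that share a numbering; this is not MacClade's procedure, and the sequences are toy examples. A sequon is only a prediction; actual glycosylation requires experimental confirmation.

```python
import re

# N-X-S/T with X any residue except proline; the lookahead lets motifs overlap.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosylation_sites(protein: str) -> list[int]:
    """Return 1-based positions of the asparagine in each N-X-S/T sequon (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


def compare_sites(protein_a: str, protein_b: str) -> tuple[list[int], list[int]]:
    """Sites present only in protein_a and only in protein_b (shared numbering assumed)."""
    a, b = set(glycosylation_sites(protein_a)), set(glycosylation_sites(protein_b))
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    print(glycosylation_sites("ANSTNPSANGT"))  # [2, 9]
    print(compare_sites("ANASA", "ANPSA"))     # ([2], [])
```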

Supporting Information
Antigenic sites are specified A-E (listed in parentheses), and sites that have been identified as undergoing positive selection are denoted by an asterisk (*) [37].
[…] Figures 1-5, and S1, S2, S3, S4, S5, S6. GenBank accession number, isolate name, subset membership, clade membership, date of collection, week of collection, age and sex of patient from whom isolate was collected, and county in which isolate was assembled for 284 A/H1N1 influenza A viruses collected from December 6, 2006-March 13, 2007 from 19 U.S. states. Subset 1 refers to those isolates included in the 100-isolate subset sampled from all clades; subset 2 refers to those isolates included in the 100-isolate subset sampled from the major clade. Week 1 denotes the first week that isolates in this study were sampled (week of December 2, 2006). GenBank accession numbers from the Influenza Virus Resource refer to the PB2 gene segment (http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html). Clade membership corresponds to the HA phylogeny (Figure 1).
Found at: doi:10.1371/journal.ppat.1000133.s012 (0.49 MB DOC)
Figure S4. GenBank accession numbers and collection dates for the NA gene of 17 A/H1N1 influenza viruses sampled globally from 2006, including the A/H1N1 components of the influenza vaccines for 2006-2007 (A/New Caledonia/20/1999) […] (A/Solomon Islands/3/2006) […]
[…] Figures S7, S8, S9, S10. GenBank accession number, isolate name, subset membership, clade membership, date of collection, week of collection, age and sex of patient from whom isolate was collected, and county in which isolate was assembled for 69 A/H3N2 influenza A viruses collected from December 6, 2006-March 14, 2007 […] Week 1 denotes the first week that isolates in this study were sampled (week of December 2, 2006).