Static and Evolving Norovirus Genotypes: Implications for Epidemiology and Immunity

Noroviruses are major pathogens associated with acute gastroenteritis worldwide. Their RNA genomes are diverse, with two major genogroups (GI and GII) comprised of at least 28 genotypes associated with human disease. To elucidate mechanisms underlying norovirus diversity and evolution, we used a large-scale genomics approach to analyze human norovirus sequences. Comparison of over 2000 nearly full-length ORF2 sequences representing most of the known GI and GII genotypes infecting humans showed a limited number (≤5) of distinct intra-genotypic variants within each genotype, with the exception of GII.4. The non-GII.4 genotypes were comprised of one or more intra-genotypic variants, with each variant containing strains that differed by only a few residues over several decades (remaining “static”) and that have co-circulated with no clear epidemiologic pattern. In contrast, the GII.4 genotype presented the largest number of variants (>10) that have evolved over time with a clear pattern of periodic variant replacement. To expand our understanding of these two patterns of diversification (“static” versus “evolving”), we analyzed using NGS the nearly full-length norovirus genome in healthy individuals infected with GII.4, GII.6 or GII.17 viruses in different outbreak settings. The GII.4 viruses accumulated mutations rapidly within and between hosts, while the GII.6 and GII.17 viruses remained relatively stable, consistent with their diversification patterns. Further analysis of genetic relationships and natural history patterns identified groupings of certain genotypes into larger related clusters designated here as “immunotypes”. We propose that “immunotypes” and their evolutionary patterns influence the prevalence of a particular norovirus genotype in the human population.

Introduction
strain emerged (New_Orleans_2009) that co-circulated with the 2006 variants for almost three years until the current predominant variant emerged in 2012 (Sydney_2012) [25,31,33]. Interestingly, this epidemiological pattern had not been reported for any other norovirus genotype until recently. A novel GII.17 variant has emerged, potentially displacing an older GII.17 variant and causing large outbreaks in different countries in Asia [22,34-36]. Although GII.4 is overall the most prevalent genotype in the human population, multiple norovirus genotypes co-circulate in children with low to high incidence. Genotypes GII.3, GII.6 and GII.2 (in addition to GII.4) have consistently been linked to infection in children under 5 years of age [17,21,37,38].
Initial challenge studies in human volunteers suggested a lack of protective responses between strains from the two major genogroups (GI and GII), as cross-challenge between Norwalk virus (the GI.1 prototype strain) and Hawaii virus (the GII.1 prototype strain) did not induce protection. In addition, duration of immunity might be short (less than 6 months), as individuals re-challenged with the same virus became ill during the second exposure [39]. It has been noted that the high titer of challenge virus administered in the early volunteer studies might not reflect that during natural exposure [40,41] and recent studies have focused on the natural history of noroviruses. Based on epidemiological data, Simmons et al. modeled that norovirus genotype-specific immunity could last up to 9 years [42], which would enhance the duration of vaccine-induced immunity. The diversity of genotypes has also been addressed. Children can be re-infected multiple times during the first 5 years of life [17,18,38], with the majority of re-infections occurring with different norovirus genotypes. These data suggest that genotypes may represent distinct serotypes, which would complicate vaccine design.
In this study, we integrated large-scale genomics analysis with natural history data to investigate mechanisms involved in the diversification and evolution of norovirus genotypes and their variants. Most norovirus intra-genotypic variants displayed a striking genetic stability over long periods of time, with GII.4 as the notable exception. We detected patterns of reinfection and susceptibility consistent with genetic and antigenic clustering of certain genotypes and propose that these relationships may be relevant in the design of norovirus vaccines.

Intra-genotypic diversity
To investigate the diversity and evolutionary differences of the distinct norovirus genotypes, more than 2000 sequences of the gene (ORF2) encoding the VP1 were retrieved from GenBank, with 101 and 1909 genes from GI strains and GII strains, respectively (Table 1). Individual sequences were vetted with an online norovirus typing tool that follows a widely-used universal classification and nomenclature system for norovirus genotypes [25], and an effort was made to verify the date of occurrence with the supporting documentation. Genotypes with 10 or more complete (or nearly complete) ORF2 sequences were selected for further phylogenetic analysis at both the nt and amino acid (aa) level, and included 16 out of the 31 current GI and GII genotypes (Table 1). We defined an intra-genotypic variant as a group of strains (≥2) that clustered together in the phylogenetic tree and that showed <5% difference in their nt or aa sequences, but ≥5% difference compared to other strains. Most genotypes segregated into 1 to 5 phylogenetic variants when nt sequences were analyzed (Table 1), with the exception of genotype GII.4 that displayed at least 10 different variants (Table 1 and Fig 1A). The number of variants was lower in seven genotypes (GI.1, GI.4, GI.6, GII.2, GII.12, GII.13 and GII.14) when aa sequences were used for tree reconstruction and distance analyses (Table 1). To link the different intra-genotypic variants to phenotypic characteristics in the VP1, we focused on the analysis of aa sequences. Interestingly, non-GII.4 genotypes presented variants with strains that have been detected many years apart (mean: 24.9 years, standard deviation: 12.97 years) while having only a few differences in their aa sequence (Table 1). An example is the GII.6 genotype, with each of the three variants (A-C) containing strains that differed by only a few aa residues (≤1.2%) but that were detected up to 41 years apart (Fig 2A).
In contrast, the GII.4 genotype was comprised of variants that were present in the human population from 3 to 8 years (mean: 5.3 years, Fig 1A). We developed an algorithm to visualize the relationship between amino acid diversity and time among strains from a given genotype. The algorithm generated a heat map in which each cell shows the number of pairwise strain comparisons with a given number of aa differences in their VP1, plotted against the time span between their detection. Analysis of 16 genotypes with sufficient data in GenBank revealed two distinct patterns of variant evolution: one in which the number of aa differences accumulated continually over time (Fig 1B), and one in which the number of aa differences remained relatively constant over time, regardless of the timespan between strains (Fig 2B). The first pattern (Fig 1B) was related to a constantly changing "evolving genotype" represented by the GII.4, while the second pattern (Fig 2B) was a highly conserved or "static genotype" represented by the 15 other genotypes with sufficient data for analysis (Table 1). The static genotypes resolved readily into distinct intra-genotypic variants, with one exception. The diversity plot of the GII.12 genotype strains displayed a subtle accumulation of differences over a time period of approximately 20 years; however, those differences were not associated with different intra-genotypic variants (S1 Fig). A larger number of sequences over a longer period will be helpful for defining variant diversity in GII.12 and other static genotypes.
During 2014-2015, a sharp increase in the number of gastroenteritis outbreaks was reported in Asia [22,34,43] that were associated with the emergence of a new variant of the genotype GII.17, i.e. variant C or Kawasaki_2014 [35,43]. Our phylogenetic analysis of this genotype revealed four distinct variants, with one of the variants having strains that spanned over 37 years with a low level of sequence diversity consistent with a static pattern (S2 Fig). In contrast, the emerging strains showed multiple substitutions at the nt and aa sequence level [22,35,44], which led to diversification into two separate phylogenetic variants (C and D). These differences appeared to accumulate over time, with predominant strains that circulated during the 2014-2015 season (variant D) differing by 5.4±1.1% from those in the 2013-2014 season (variant C) in their aa sequence.
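The distance part of the variant definition used in this section (<5% difference within a variant, ≥5% to other strains) can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which relied on clustering in phylogenetic trees. It assumes pre-aligned sequences and substitutes a simple single-linkage grouping on pairwise percent difference. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_difference(seq_a, seq_b):
    """Percent of aligned positions that differ; positions with a gap ('-') are skipped."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def group_variants(sequences, cutoff=5.0):
    """Single-linkage grouping: strains differing by less than `cutoff` percent share a group.

    This is a simplification of tree-based clustering, used here only for illustration.
    """
    parent = {name: name for name in sequences}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sequences, 2):
        if percent_difference(sequences[a], sequences[b]) < cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in sequences:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy alignment (40 positions), not real norovirus data.
toy = {
    "s1": "A" * 40,
    "s2": "A" * 39 + "G",
    "s3": "G" * 4 + "A" * 36,
    "s4": "G" * 4 + "A" * 35 + "C",
}
print(group_variants(toy))
```

Under the definition in the text, a group would count as a variant only if it contains two or more strains. Single linkage does not guarantee that every pair within a group stays below the cut-off, which is one reason a tree-based assignment is preferable in practice.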
To gain insight into the evolutionary differences noted at the aa level, the nt rate of evolution and the ratio of nonsynonymous substitutions (dN) to synonymous substitutions (dS) were calculated for each of the genotypes included in this analysis. The nt rates of evolution were similar among all the norovirus genotypes (range: 5.40 × 10⁻³ to 2.23 × 10⁻⁴ nt substitutions/site/year). However, differences were seen in the dN/dS ratios, with genotypes GII.4 and GII.17 presenting slightly higher values than any other norovirus genotype (Table 1). Note that in five different genotypes (GI.3, GI.4, GII.4, GII.13, and GII.17), the dN/dS ratio in the P2 encoding region was at least two times higher than the dN/dS ratio of the complete ORF2 (encoding VP1). The dN/dS ratio has been used to infer the evolutionary pressures on a gene: a dN/dS > 1 (higher number of non-synonymous mutations) indicates positive selection, as the phenotype is changing due to pressures of the environment (e.g. immune responses), while a dN/dS < 1 indicates purifying selection (also known as negative selection), where the new phenotype is mostly deleterious and eliminated from the population [45-47]. Despite the small differences in the dN/dS ratios, purifying selection (dN/dS < 1) is strongly acting at the VP1 protein level in all norovirus genotypes, including GII.4.
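The interpretation of the dN/dS ratio described above can be written as a small Python sketch. The function name and the input values are hypothetical; the inputs were chosen only so that the ratio lands near the average of 0.06 reported in the Discussion. Labeling a ratio of exactly 1 as neutral follows the standard convention and is not stated in the text.

```python
def selection_regime(dn, ds):
    """Return (dN/dS, label) using the interpretation given in the text."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    ratio = dn / ds
    if ratio > 1:
        label = "positive selection"
    elif ratio < 1:
        label = "purifying selection"
    else:
        label = "neutral"  # standard convention for dN/dS == 1 (assumption, not from the text)
    return ratio, label


# Hypothetical mean rates for a whole-ORF2 analysis.
ratio, label = selection_regime(dn=0.006, ds=0.1)
print(f"dN/dS = {ratio:.2f}: {label}")
```

A gene-wide ratio well below 1 does not exclude positive selection at individual codons, which is why per-region values (such as the P2 encoding region) are reported separately.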

Intra-and inter-host evolution of norovirus genotypes
We next examined whether the different patterns of diversification observed in ORF2 extended to other regions of the genome. We analyzed by NGS full-length norovirus genomes from immunocompetent individuals infected in different settings. The first set of samples was from a child who was consecutively infected with three different genotypes (GII.4, GII.6 and GII.17) over a 3-year period [18,35]. Although in each episode the child resolved the symptoms within ~72 hours, viral RNA was detected in stools for weeks after onset of symptoms. The full-length genome sequences were compared within the first 3 weeks for GII.4 and GII.6 viruses and the first 2 weeks for GII.17 viruses. A total of up to 78,000 reads/site (mean: 14688, standard deviation: 5675) were obtained for each sample by NGS (S1 Table). All consensus sequences were identical to the reference (day 1 [d1]); however, mutations ranging from 5 to 50% of the total reads (S1 Table and Fig 3) were found at later time points for each virus. In GII.4 viruses, 12 nt mutations arose in the subpopulations of the sample collected at d14, with nine of them being non-synonymous mutations. Three aa mutations were located in the P domain, the capsid surface-exposed region of the VP1 protein, with two present in antigenic sites A (E368A) and E (S412N) [48] Table).
Figure caption (fragment): ... that a higher number of strains accumulate at ~5% aa difference (25/539) and 5 years of detection difference, which correlates with the timespan where a novel variant displaces the old one. Heat map represents the number of pairwise comparisons, red being the highest and green the lowest number of pairwise comparisons.
Although a large amount of norovirus is shed in stool, the infectious dose in natural transmission is likely low [41]. Thus, during inter-host transmission events noroviruses may undergo an initial reduction in the number of replicating viruses, creating a bottleneck effect. To compare inter-host evolution in individuals involved in outbreaks, we analyzed samples from outbreaks that occurred in the state of Maryland where the causative agents were identified as GII.6 (a hospital outbreak in 1971) [49] or GII.4 noroviruses (nursing home outbreaks in the 1987-1988 winter season) [32]. The samples and their dates of collection are indicated in Fig 4. Comparison of the NGS sequences with the outbreak consensus sequence revealed only a few substitutions, 5 nt and 2 aa, among samples from the same outbreak (Fig 4A and Fig 4B). However, when the consensus sequences from the GII.4 outbreaks were compared, a progressive accumulation of mutations (up to 86 nt and 16 aa) was detected over a period of three months, with no aa substitutions detected in the ORF2 (Fig 4B). To confirm these observations, we compared 151 genomes from GII.4 viruses (variant Den Haag) detected during three epidemic seasons in Japan [50,51], and showed the accumulation of nt and aa substitutions over time (S4 Fig). In both sets of samples, the ORF2 (encoding VP1) acquired fewer amino acid substitutions as compared with ORF1 and ORF3, and thus maintained the VP1 phenotype for the GII.4 variant circulating in that given season. The data from full-length genome analyses were consistent with those from analyses of ORF2 in the GenBank database: different patterns of evolution exist among the norovirus genotypes in an acute outbreak setting.

Norovirus re-infection and genotype clustering
Fig 5 (caption fragment): ... clusters containing two or more genotypes and 5 comprised of a single genotype, which define the 12 larger order clusters designated as immunotypes (A-L) for this study. The phylogenetic tree was constructed using three representative strains from each variant described for each genotype except for GII.4, where only one strain from each of the variants was used.
To reconcile our observations on the different mechanisms of diversification and data on reinfection and epidemiology of noroviruses, we investigated whether additional relationships might exist among the genotypes from the two major genogroups. Our phylogenetic tree constructed with representative strains from each genotype (strains from each lineage described here were included) showed clustering among certain genotypes (e.g. GI.3, GI.7, GI.8 and GI.9), while others appeared as single genotypes (e.g. GI.1, GII.3, GII.6; Fig 5A, S3 Table). The genotypic clustering was reproducible with a second phylogenetic methodology (S5 Fig). We designated each of the separate branches as groups A-L (Fig 5A), and the deduced aa sequences showed an approximate cut-off value of ≥20% aa differences between groups (Fig 5B). In a review of data from research groups that have documented norovirus re-infections and determined the genotype for each infection [17,18,38,52-54], we observed that the pattern of reinfection might be consistent with the new grouping system as a predictor of antigenically distinct strains. To test the hypothesis that these groups, provisionally designated here as "immunotypes," might play a role in norovirus immunity, we developed a matrix that recorded the data from each of the consecutive re-infection cases with documented norovirus genotyping [17,18,22,38,52-54].
For example, a child consecutively infected with a GII.4 ("immunotype" G), GII.6 ("immunotype" H), and GII.17 ("immunotype" J) norovirus would count as one individual for the cell of the matrix that compares immunotype G and H, and as one individual for the cell that compares immunotype H and J. A matrix was constructed using re-infection data available from 116 children and 2 adults (Fig 5C, S6 Fig). Overall, the majority of re-infections occurred with viruses from different immunotypes, with re-infection rare from strains within an immunotype. A notable exception was immunotype G, which is comprised of genotypes GII.4 and GII.20. Re-infection of eight children (as shown by the 8 individuals in the black cell) was documented to have occurred with different variants of GII.4 viruses [17,38].
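The counting rule in the example above can be sketched in Python. This is an illustration under stated assumptions, not the procedure used to build Fig 5C: the genotype-to-immunotype map contains only the assignments named in the text (GII.4 and GII.20 in G, GII.6 in H, GII.17 in J), cells are treated as unordered pairs, and each individual contributes at most once per cell. The function name and the input histories are hypothetical.

```python
from collections import Counter

# Partial map: only the immunotype assignments stated in the text.
IMMUNOTYPE = {"GII.4": "G", "GII.20": "G", "GII.6": "H", "GII.17": "J"}


def reinfection_matrix(histories):
    """Count individuals per immunotype pair over consecutive infections.

    histories: one list of genotypes per individual, in order of infection.
    Returns a Counter keyed by an alphabetically sorted (immunotype, immunotype) tuple.
    A genotype missing from IMMUNOTYPE raises KeyError.
    """
    counts = Counter()
    for genotypes in histories:
        immunotypes = [IMMUNOTYPE[g] for g in genotypes]
        # Consecutive infections only; a set keeps one count per individual per cell.
        pairs = {tuple(sorted(pair)) for pair in zip(immunotypes, immunotypes[1:])}
        counts.update(pairs)
    return counts


# Hypothetical histories: the child from the example, plus one GII.4 to GII.4 re-infection.
histories = [["GII.4", "GII.6", "GII.17"], ["GII.4", "GII.4"]]
print(reinfection_matrix(histories))
```

In this toy input the first individual adds one count to the G/H cell and one to the H/J cell, while the second adds one count to the within-immunotype G/G cell, the kind of cell shown in black in Fig 5C.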

Discussion
Viruses are genetically and structurally diverse. Depending on their genome and/or replication strategies, viruses can present different rates of evolution (range: 10⁻² to 10⁻⁹ nt substitutions/site/year) [3,47,55]. As with many other RNA viruses, noroviruses have been regarded as rapidly evolving viruses [22,48,56]. The overall rate of evolution for the norovirus genotypes included in this study ranged from 5.40 × 10⁻³ to 2.23 × 10⁻⁴ nt substitutions/site/year for the VP1 encoding region, which was similar to the rates described previously for norovirus GII.4, GII.3, GII.6, and GI.1-GI.6 [21,57-61], and within the range for positive-strand RNA viruses [3]. Despite this high nt mutation rate, the number of non-synonymous substitutions was on average ~18 times lower than the number of synonymous substitutions (dN/dS average: 0.06), suggesting that purifying selection (dN/dS < 1) acts strongly on the VP1 protein. Similar observations have been made for other RNA viruses, where the rate of evolution reached up to 10⁻² nt substitutions/site/year (depending on the region of the genome used for analyses) but was mostly dominated by high synonymous substitution rates [46,55,62]. In noroviruses, positive selection has been reported for certain codons of the VP1 for GII.4, GII.3, GII.6 and GII.17 viruses [21,22,57,58,63], and codon changes in the antigenic sites of GII.4 viruses (which are located in loops of the P2 domain) have correlated with the emergence of new variants [24,27,48]. Taken together, our findings suggest that the capsid protein of all noroviruses evolves under strong structural constraints, with only a limited number of codons that can evolve and perhaps confer adaptive advantages to infect human hosts.
Epidemiological studies coupled with sequence data from field isolates have indicated that the most predominant norovirus genotype, GII.4, is evolving similarly to influenza H3N2 viruses; i.e. with a temporal replacement of predominant variants that is driven by the immune response of the host [27,48,64]. By exploring the intra-genotypic diversity from representative human norovirus genotypes we verified that GII.4 noroviruses produce the largest number of intra-genotypic variants, and that these variants last (on average) ~5 years in the human population. In contrast, non-GII.4 noroviruses sustain a low number of intra-genotypic variants with a limited number of aa differences among strains within that given variant, even if decades apart in occurrence. Interestingly, different variants from a given genotype can often be co-circulating within the same year and geographical location causing gastroenteritis [37,44,65,66].
Fig 5 (caption fragment, continued): The evolutionary distances of the amino acid sequences were computed using the Poisson correction method and the tree inferred using the Neighbor-Joining method and bootstrap test (1000 replicates). (b) Amino acid (aa) differences within (intra-immunotype) and between (inter-immunotype) the proposed immunotypes. The bars indicate the average of the aa differences and standard error. The red dotted line indicates the cut-off value (≥20%) of aa differences between groups. (c) Matrix showing the frequency of re-infection and the genotypes detected. Data were obtained from studies that followed the natural history of norovirus infection [17,18,22,38,52-54]. Each genotype is grouped within its respective immunotype. Re-infections with strains from the same immunotype are indicated by black cells. Note that the eight cases of re-infection within the immunotype G represent re-infections with different GII.4 variants. doi:10.1371/journal.ppat.1006136.g005
The GII.4 viruses, and to a lesser extent one variant of the GII.17 viruses, acquired aa substitutions over time that created phenotypically different variants. In contrast, all other genotypes retained similar sequences within variants that might have arisen early in the origin of that genotype and that persisted over time. This led us to discriminate two different patterns of evolution in norovirus: evolving and static. Evolving viruses continually accumulate mutations in their genome over time, and static viruses do not.
The concept of evolving versus static norovirus genotypes may be helpful in understanding the spread of pandemic strains. The recent emergence of GII.17 viruses resulted in the rapid replacement of one variant (variant C) with another (variant D) [22,44]. This pattern of very rapid replacement, occurring within two consecutive seasons, in the emerging GII.17 viruses is notably different from that of GII.4 viruses, in which each emerging GII.4 variant is replaced every 3 to 8 years. Thus, since the GII.17 genotype presents other variants shown to be "static," the recent global spread of the GII.17 genotype might be the moment when a new genotypic variant (variant C) emerged and is quickly adapting to reach maximum fitness in the human host (variant D) to become static. Since the emergence of this GII.17 strain has only recently occurred and most of the available GII.17 sequences (136/143) correspond to these two variants, more information on pre-2013 strains and the future epidemiological behavior of the GII.17 strains will be helpful in establishing the evolutionary pattern of this genotype. Because recombination has been suggested to play an important role on the emergence of many GII.4 variants [67], and the emerging GII.17 strains presented a novel polymerase (encoded by ORF1) [22,34,35,44], further studies should be conducted on the role of recombination in norovirus VP1 diversification into variants.
To determine the role of intra-host evolution at the genomic level, we developed a method to generate and analyze full-length norovirus genomes with NGS technologies and bioinformatics. The strategy of amplification was similar to that published by Eden et al. [67] for GII.4 viruses, and our method was robust for a number of GII noroviruses (GII.1, GII.2, GII.3, GII.4, GII.6, GII.12, and GII.17), and from samples stored for over 40 years [35]. Several groups have explored the intra-host diversity of noroviruses by NGS using partial regions of the genome [23,68]; however, our approach extended these findings by allowing high-resolution analysis at every nt position in the coding sequence of the genome. We first examined the intra-host evolution of GII.4, GII.6 and GII.17 noroviruses within a single patient, and observed that only the GII.4 viruses presented a gradual increase in the number of mutations, which in some cases resulted in aa substitutions in areas regarded as important antigenic sites. The limited intra-host diversity found during the shedding phase of an infection in immunocompetent individuals contrasts with the vast diversity of viruses found in immunocompromised patients [68]. Due to the diversity found in immunocompromised patients and prolonged shedding, it was suggested that they might be a source of new GII.4 variants to the human population [69]. Noroviruses are highly transmissible; however, there is little evidence that norovirus can be efficiently transmitted during the chronic phase of the infection [19]. A more likely source for new GII.4 variants might be immunocompetent individuals, where we show that mutations can arise during inter-host transmission events, and accumulate during the intra-variant period. Although noroviruses belonging to the "static" genotypes can also accumulate mutations during inter-host transmission events, those mutations would likely be eliminated from the viral population by purifying selection. 
Viruses that better tolerate the introduction of mutations are regarded as genetically robust, and this robustness has been shown to be beneficial for virus survival and prevalence [70]. Overall our data suggest that GII.4 noroviruses are genetically robust. In contrast, noroviruses with "static" genotypes may be genetically fragile, which limits their antigenic diversity and prevalence.
How do "static" genotypes prevail in the human population, in the face of limited antigenic diversity within the genotype? To address this question, genotypes were grouped together based on phylogenetic clustering and aa differences in their capsid proteins. These groups, or "immunotypes," were applied to the interpretation of epidemiological observations. When examining data from a birth cohort study, or reports where children and adults were followed for years to study norovirus re-infections, genotypes belonging to the same immunotype generally did not re-infect these individuals. Thus, most of these individuals were re-infected with a varying series of genotypes (predominantly containing combinations of GII.4, GII.6, GII.3, GII.17 or GII.2), but all of them belonging to different immunotypes as defined in Fig 5C. The exception to this was the GII.4 strains in immunotype G, in which a few re-infections were observed, albeit with different GII.4 variants. Based on these data, we propose a model for norovirus re-infection in which naïve children are constantly exposed and infected with strains from each of the different immunotypes until a broad immunity develops. In contrast, older individuals (i.e. older children and adults) are more likely to become ill from evolving genotypes, as they have already acquired immunity against a number of static genotypes (Fig 6). This model not only explains the differences in the genotype distribution often seen when comparing children and adult populations [17,37,38], but also suggests that immunity against norovirus may be longer than initially suggested [39,42]. Immunotypes represented by static genotypes can only re-infect individuals naïve to that particular immunotype, while the GII.4 evolving genotype can reinfect individuals by periodically replacing its variants. 
Fig 6 (caption fragment): This model predicts that children are constantly exposed and infected with strains from each of the different immunotypes (until a broad immunity develops), while older individuals are more likely to become ill from evolving genotypes. This model would explain the epidemiological differences reported in the distribution of norovirus genotypes in children and adults [17,21,37,38]. doi:10.1371/journal.ppat.1006136.g006

For decades understanding of norovirus immunity was based on human volunteer challenge studies and animal models or in vitro surrogates of neutralization tests [27,39,71,72]. Initial cross-challenge studies, conducted in the 1970s using the prototype GI.1 Norwalk virus and GII.1 Hawaii virus, showed a lack of protection between these two genogroups [39]. Further epidemiological data and in vitro assays, such as antibody blockage of carbohydrate binding to VLPs, suggested a role for immunity against the different intra-genotypic variants of GII.4 [27,33,58]. Norovirus vaccines are currently based on the premise to include at least two major antigens for noroviruses representing GI and GII [29,30,71,73,74]. However, recent data indicating that certain genotype-specific immune responses were unable to confer natural protection against disease raised concerns that a prohibitive number of components (almost 30) might be needed in a norovirus vaccine [17,18,35,38]. Although additional studies will be needed to confirm the existence of shared antigenic groups among the norovirus genotypes, preferably by neutralization assays or animal models, our analysis provides a new perspective on the genetic and antigenic diversity of noroviruses that could lead to the identification of cross-protective strains and inform vaccine design.

Ethics statement
Stool specimens from the child were obtained with the written informed consent of the parent, and enrollment in National Institutes of Health (NIH) clinical study NCT01306084. Archival stool samples stored in the Laboratory of Infectious Diseases Calicivirus Repository were waived as exempt from IRB review by the NIH Office of Human Subjects Research and Protection (OHSRP 11833). Epidemiological information relating to the sample collection has been published elsewhere [18,32,35,49].

Data mining and genomics
The full-length (or nearly full-length) ORF2 sequences (encoding VP1) from each of the 31 genotypes described for GI and GII were retrieved from GenBank (accessed in March 2015) for analyses. Alignments were performed with Clustal W as implemented in MEGA v6 [75]. Sequences from each genotype were aligned separately to minimize the presence of insertions or deletions (indels), which can arise when different genotypes are compared. Phylogenetic trees were constructed using the Kimura 2-parameter model of nt substitution and the Neighbor-Joining algorithm of reconstruction as implemented in MEGA v6 with default settings. Phylogenetic trees that used aa sequences were reconstructed using a Poisson method of aa substitution. Bootstrap analyses were used to support the clustering of the variants. Information on the strains used for the phylogenetic analyses is provided in S4 Table. Evolutionary rates (nt substitutions/site/year) for each genotype were estimated using the ORF2 sequences and the Bayesian Markov Chain Monte Carlo (MCMC) approach as implemented in the BEAST package [76]. For each set of data the General Time Reversible (GTR) model with gamma rate distribution and invariable sites parameter was used and the MCMC was run for a sufficient number of generations to reach convergence of all parameters. All evolutionary rates were calculated using a strict clock model and a coalescent constant size tree prior, except for genotypes GI.4, GI.6, GII.14 and GII.16, which reached convergence using Bayesian Skyline and random local clocks. Selection pressures acting on the VP1 sequences were investigated by estimating the mean rate of nonsynonymous substitutions (dN) and synonymous substitutions (dS) and the dN/dS ratio as implemented in MEGA v6. The nearly full-length genome sequences from 151 GII.4 viruses detected in Japan during 2006-2009 [50,51] were downloaded from GenBank and analyzed using MEGA v6 and Prism software (GraphPad Prism version 7).

Heat map plots of genotypic diversity
To visualize the aa substitutions within each genotype, a Python script (available upon request) was developed to calculate the number of aa differences and the isolation year differences between two individual strains. Isolation years were extracted from strain descriptions. The difference values were added into a matrix where the y-axis represents the isolation year differences and the x-axis the amino acid differences. Note that some cells will present more than one comparison, since strain pairs presenting the same number of aa differences and the same year difference, despite the years detected, will be included in the same cell. Heat map plots were calculated for each genotype using GraphPad Prism version 7 (GraphPad Software, La Jolla California USA), with the values representing the number of strains compared.
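The tabulation described above can be sketched as follows. This is not the authors' script, which is available only on request. It is a minimal illustration that assumes each strain is given as an isolation year plus a pre-aligned aa sequence of equal length, and that a gap character counts as a difference. The function names and toy strains are hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_counts(strains):
    """Tabulate pairwise comparisons by (aa differences, isolation-year difference).

    strains: list of (isolation_year, aligned_aa_sequence) tuples.
    """
    cells = Counter()
    for (year_a, seq_a), (year_b, seq_b) in combinations(strains, 2):
        if len(seq_a) != len(seq_b):
            raise ValueError("sequences must be aligned to the same length")
        aa_diff = sum(a != b for a, b in zip(seq_a, seq_b))
        year_diff = abs(year_a - year_b)
        # Pairs with the same aa and year differences share a cell,
        # regardless of the actual years of detection.
        cells[(aa_diff, year_diff)] += 1
    return cells


def to_matrix(cells):
    """Dense matrix with year difference on rows (y-axis) and aa difference on columns (x-axis)."""
    max_aa = max(aa for aa, _ in cells)
    max_year = max(year for _, year in cells)
    matrix = [[0] * (max_aa + 1) for _ in range(max_year + 1)]
    for (aa, year), n in cells.items():
        matrix[year][aa] = n
    return matrix


# Hypothetical toy strains, not real VP1 sequences.
strains = [(1971, "MKMAS"), (1990, "MKMAS"), (2012, "MKLAS")]
cells = diversity_counts(strains)
matrix = to_matrix(cells)
```

The resulting matrix can be exported to any plotting tool to draw the heat map; the study used GraphPad Prism for that step.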

Full-length genome amplification
A platform was developed to analyze the plasticity of norovirus genotypes at the full genome level. Briefly, viral RNA was extracted from 10% (w/v) stool suspensions using the MagMax Viral RNA Isolation Kit (Ambion, California, USA) following manufacturer's recommendations. Complementary DNA was synthesized from the viral RNA using the Tx30SXN primer (GACTAGTTCTAGATCGCGAGCGGCCGCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT [77]) at 5 μM final concentration, and the Maxima H Minus First Strand cDNA Synthesis Kit (Thermo Fisher Scientific, California, USA) following manufacturer's recommendations except that only 0.1 μL of Enzyme Mix was used per reaction. Amplification of the full-length genome was performed using 5 μL of the RT reaction, a set of primers that target the conserved regions of the 5'- and 3'-ends of GII noroviruses (GII1-35: GTGAATGAAGATGGCGTCTAACGACGCTTCCGCTG, and Tx30SXN), and the SequalPrep Long PCR Kit (Invitrogen, California, USA) following manufacturer's recommendations. Amplicons were excised from an agarose gel and purified with the QIAquick Gel Extraction Kit (Qiagen, California, USA).

Next-generation sequencing
Ion Torrent libraries were prepared using 300-500 ng of full-length genome PCR amplicons following the standard Ion Torrent library preparation protocol. DNA was fragmented, followed by the ligation of barcode adapters. Adapter-ligated libraries were amplified using 13 PCR cycles, and size selected from agarose gels. Final libraries were quantified by Qubit (Invitrogen, California, USA), Bioanalyzer (Agilent), and qPCR. Libraries were normalized to 1 nM, pooled at an equal molar ratio, and loaded onto a 318 v2 Chip in an Ion OneTouch2 machine. The sample from Ion OneTouch2 was transferred to an Ion OneTouch ES and then to an Ion PGM for sequencing with a 400 bp kit (Life Technologies, California, USA). Ion Torrent sequence reads were de-multiplexed, and each individual set of reads was aligned to reference sequences using Bowtie2 and SAMtools [78,79]. Aligned reads were visualized in the Integrative Genomics Viewer (IGV) [80] for identification of single nt polymorphisms (SNPs). Consensus sequence for each full-length genome was calculated using IGV. Read coverage (reads/nt position) was calculated using the genomecov command from BEDTools [81]. Sequence analyses were performed using MEGA v6 and Sequencher v5.4 (Gene Codes Corporation, Michigan, USA). The consensus sequence was calculated using default settings in Sequencher v5.4, and genomic sequences determined in this study were deposited into GenBank under Accession numbers KY424328 through KY424350. All other relevant data are within the paper and its Supporting Information files.
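The sub-consensus mutations reported in the Results (5 to 50% of total reads at a site) were identified by inspecting aligned reads in IGV. As a rough illustration of the underlying arithmetic only, the following Python sketch flags minority bases from per-position read counts. The input format, the function name, and the 5% threshold used as a cut-off are assumptions made for this example and are not documented settings of the study.

```python
def minor_variants(base_counts, min_fraction=0.05):
    """List minority bases at or above `min_fraction` of the read depth.

    base_counts: dict mapping nt position -> dict mapping base -> read count
                 (hypothetical input, e.g. tabulated from a read pileup).
    Returns (position, majority_base, minority_base, fraction) tuples.
    """
    hits = []
    for pos in sorted(base_counts):
        counts = base_counts[pos]
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage at this position
        majority = max(counts, key=counts.get)
        for base, n in sorted(counts.items()):
            if base != majority and n / depth >= min_fraction:
                hits.append((pos, majority, base, round(n / depth, 3)))
    return hits


# Hypothetical read counts at three positions.
example = {
    100: {"A": 900, "G": 100},  # minority G at 10% of reads
    101: {"C": 990, "T": 10},   # minority T at 1%, below the threshold
    102: {"G": 0},              # no coverage
}
print(minor_variants(example))
```

A real analysis would also need to account for sequencing error, strand bias and minimum depth before calling a minority variant; none of that is modeled here.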

Supporting Information
S1 Fig. Diversity in GII.12 viruses. (a) Phylogenetic trees showing the relationship of the different strains from the GII.12 genotype. Three variants (Clusters A-C) can be discriminated when nucleotides were used for tree reconstruction. No discrete variant was detected when amino acids were used. Trees were constructed using sequences encoding the VP1 and Neighbor-Joining method as implemented in MEGA v6. (b) Diversity plot showing the accumulation of amino acid mutations over time in the VP1 from GII.12 viruses. Despite accumulation of mutations, the genotype did not diversify into different variants (dashed line representing the cut-off value for variant designation). Heat map represents the number of pairwise comparisons, red being the highest and green the lowest number of pairwise comparisons.
Data were obtained from studies that followed the natural history of norovirus infection [17,18,22,38,52-54]. Every possible combination was recorded from the re-infection cases. Re-infections with strains from the same immunotype are indicated by black cells. For immunotype designation of each norovirus genotype refer to Fig 5. (TIF)
S1 Table. Average reads per nucleotide position obtained for the twenty-four nearly full-length norovirus genomes analyzed by next-generation sequencing in this study.