
Weak selection and stochastic processes limit the emergence of antigenic variants during household transmission of influenza A viruses

  • Hunter J. Ries,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Pathobiological Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America

  • Joseph Lalli,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Writing – review & editing

    Affiliation Department of Pathobiological Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America

  • Kelsey R. Florek,

    Roles Data curation, Investigation, Project administration, Resources, Writing – review & editing

    Affiliation Wisconsin State Laboratory of Hygiene, Madison, Wisconsin, United States of America

  • Shari Barlow,

    Roles Data curation, Investigation, Project administration, Resources, Writing – review & editing

    Affiliation Department of Family Medicine and Community Health, University of Wisconsin–Madison, Madison, Wisconsin, United States of America

  • Maureen Goss,

    Roles Data curation, Investigation, Project administration, Resources, Writing – review & editing

    Affiliation Department of Family Medicine and Community Health, University of Wisconsin–Madison, Madison, Wisconsin, United States of America

  • Richard Griesser,

    Roles Data curation, Investigation, Resources, Writing – review & editing

    Affiliation Wisconsin State Laboratory of Hygiene, Madison, Wisconsin, United States of America

  • Tonya Danz,

    Roles Data curation, Investigation, Resources, Writing – review & editing

    Affiliation Wisconsin State Laboratory of Hygiene, Madison, Wisconsin, United States of America

  • Amra Uzicanin,

    Roles Conceptualization, Methodology, Project administration, Supervision, Writing – review & editing

    Affiliation National Center for Emerging and Zoonotic Infectious Diseases, US Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

  • Jonathan Temte,

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – review & editing

    Affiliation Department of Family Medicine and Community Health, University of Wisconsin–Madison, Madison, Wisconsin, United States of America

  • Thomas C. Friedrich

    Roles Conceptualization, Data curation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    tfriedri@wisc.edu

    Affiliations Department of Pathobiological Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America, Wisconsin National Primate Research Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America

Abstract

Influenza viruses undergo antigenic drift, the gradual accumulation of mutations that cause antigenic changes in the viral surface proteins hemagglutinin (HA) and neuraminidase (NA). Although selection for antigenic variants is detectable on the global scale, the processes by which antigenic variants are generated and selected in individual hosts remain unclear. It has been hypothesized that selection for antigenic variants may occur during the establishment of a new infection, rather than over time in a single host. Here, we leveraged a large household cohort study to assess whether selection was detectable between acutely infected hosts. We investigated influenza A virus evolution using specimens from 384 children and household contacts with RT-PCR-confirmed influenza A infection, representing infections with A(H1N1)pdm09 and A(H3N2) viruses from 2017–19. In agreement with prior studies, we found that acute infections involved weak purifying selection across the viral genome. In addition, we identified 40 transmission events occurring in 31 households. During transmission, evolution between hosts was characterized by tight transmission bottlenecks and weak purifying selection. We found variability in the strength and direction of selection on antigenic regions of HA, but no clear evidence for selection of antigenic variants during transmission. Together, our results indicate that stochastic processes and weak natural selection dominate most acute influenza A virus infections and transmission events, and that selection of antigenic variants during transmission between acutely infected hosts is likely to be exceedingly rare.

Author summary

Influenza viruses clearly evolve under selective pressure from immune responses in human populations, but recent work suggests that within individual infections random effects are stronger than selection. New viral variants that spread globally must nonetheless emerge in one person and be transmitted onwards—how does this happen? We characterized viral genomes collected over two influenza seasons from 384 children and their household contacts. We detected 40 transmissions among 31 of the households, allowing us to examine how selection acts during infection and transmission. We found that influenza virus genetic diversity is low in infected individuals, and mutations arising in one person are rarely transmitted to their household contacts, consistent with prior reports that influenza virus evolution is tightly constrained within hosts. We further examined all transmission events for evidence of selection between hosts, finding only one mutation that could plausibly affect antibody recognition. However, we found no evidence that this mutation was favored by natural selection. Our results suggest that chance events, together with weak selection, are the main forces affecting influenza virus evolution within and between hosts during typical acute infections. Selection for new variants may be more likely to occur over longer transmission chains and/or during prolonged infections.

Introduction

Like other rapidly evolving RNA viruses, influenza A virus causes significant global morbidity and mortality [1]. Influenza vaccine effectiveness is often reduced by antigenic drift, the emergence and preferential propagation of antibody-escape mutations in the antigenic regions of the hemagglutinin (HA) and neuraminidase (NA) proteins. Influenza vaccine antigens must be frequently updated to reflect these changes in the HA protein of circulating influenza strains. The National Institute for Allergy and Infectious Diseases has thus identified “accurately predicting how influenza viruses will evolve” as critical to developing an effective universal influenza vaccine [2]. Although influenza A virus evolution is detectable on the global scale, it remains unclear how new antigenic variants arise in one host and are propagated through transmission chains and among populations. An improved understanding of these processes will enhance our ability to forecast trends in viral evolution.

Variant viruses encoding one or more amino acid substitutions quickly arise within hosts after infection [3–7]. These intra-host single-nucleotide variants (iSNVs) cause nonsynonymous (amino-acid changing) or synonymous (“silent”) mutations, which may impact pathogenesis [8]. Purifying selection purges deleterious mutations from viral populations, while diversifying selection favors changes away from the within-host consensus sequence. Globally, seasonal influenza viruses show an excess of synonymous variants throughout most of the genome, suggesting that most nonsynonymous mutations are deleterious and are quickly purged from circulation [8]. Classically defined antigenic regions of HA are the exception and are characterized by an excess of nonsynonymous substitutions. This excess suggests that HA sequences are subject to positive selection at a global scale, which is reflected in the “ladder-like” HA phylogeny and the continual emergence of new HA clades [8]. However, the action of natural selection appears to be relatively weak within individual hosts, even in HA antigenic sites [8]. Studies by our group [3,4] and others [7–14] have thus far failed to detect evidence of selection for antigenic variants during acute infections, where even potential immune-escape mutations remain at low frequency. Interestingly, influenza viruses appear capable of accumulating antigenic changes during chronic infections in immunocompromised individuals [8,15,16], suggesting that acute infection may not allow enough time for antigenic variants to emerge [14].

Every positively selected variant must originate from a mutation in a single host [17,18]. What evolutionary processes allow some mutations to “escape” from the forces of randomness and weak selection in individual hosts to become dominant in the global population? Recently, Morris et al. [19] suggested that immune selection is strongest at the time a new infection is being established at a respiratory site in the presence of pre-existing virus-specific antibodies. In this model, incoming viruses are subject to “sieving” by pre-existing antibodies in the recipient, such that viruses capable of evading antibody detection are more likely to successfully found a new infection. Antigenic variants that are present at low frequency in a donor may therefore become fixed in a new host during a selective transmission bottleneck. In contrast, because antibodies specific for the incoming virus become detectable only near the end of a typical acute infection, there is a temporal disconnect between the peak of virus replication and the appearance of either recall or de-novo antibody responses capable of selecting for new antigenic variants. Transmission of these variants is further impacted by the transmission bottleneck, which involves a sharp reduction in viral diversity—according to most estimates, only 1–2 unique viral genomes from the donor establish infection in the recipient [7,20]. In their model, Morris et al. suggest that minor iSNVs under positive selection could reach 100% within-host frequency within the first 24 hours of infection [19]. In support of this, we have previously shown in a ferret model system that influenza A virus variants present at less than 10% frequency in the donor can survive the transmission bottleneck and reach fixation in a new host [21].

Detecting potential selection on viral genomes during the establishment of a new infection requires characterizing virus populations in donor-recipient pairs and cannot be done through studies of individual infections alone. Our objective was to determine whether immune selection for antigenic variants is detectable during acute, person-to-person transmission of influenza A virus. We hypothesized that if antibody-mediated sieving acts at transmission, it might be possible to detect events in which low-frequency antigenic variants in donors would become enriched or fixed in recipients. Here, we sequenced seasonal influenza A viruses from 384 influenza-positive individuals enrolled in a school-associated, community-based influenza surveillance study, the Oregon Child Absenteeism Due to Respiratory Disease Study (ORCHARDS; Oregon, Wisconsin, USA) [22,23]. We then quantified within-host diversity and selection, linked donors and recipients using epidemiologic and genetic criteria, and tested whether variants—particularly in HA antigenic sites—were preferentially transmitted to recipients.

Results

Study population and viral genome sequencing

For this study, we used specimens collected from participants in the Oregon Child Absenteeism Due to Respiratory Disease Study (ORCHARDS). Full methodology was reported previously [22,23], and the protocol was approved by the University of Wisconsin–Madison Health Sciences Institutional Review Board. Participants aged 7 and older provided written consent; participants aged 4–6 provided verbal assent with written consent from parents or guardians. Briefly, the study enrolled school-aged children from the Oregon School District (Oregon, Wisconsin, USA) who reported at least two acute respiratory symptoms (e.g., cough, sore throat, nasal congestion, runny nose, sneezing, or fever). Study staff performed home visits, collected specimens from the children, and offered to enroll other household members in a sub-study assessing within-household influenza transmission. Study staff then collected nasal swabs from any other participating household members. Respiratory specimens that tested RT-PCR-positive for influenza A virus were used for the work described here. In total, we sequenced 384 samples from the 2017–18 and 2018–19 influenza seasons, generating 283 complete viral genomes that passed our quality-control thresholds (see Methods; Table 1). The average Ct value for all samples included in our study was 27.2, with a standard deviation of 3.9 and a range of 18.52–35.37.

We first used deep sequencing to assess viral diversity within all samples. We mapped sequence reads using each season’s A(H1N1)pdm09 or A(H3N2) vaccine strain as a reference and identified intra-host single-nucleotide variants (iSNVs) in each sample using conservative frequency thresholds (3%) and quality control criteria (see Methods). Then, we aligned all high-quality participant consensus sequences for each season and subtype and used the majority nucleotide at each position in this alignment to define a “within-season” reference sequence. Raw sequencing reads were then re-processed and aligned against the within-season reference sequence to contextualize the variants relative to the strains in local circulation at the time of sampling. We included the major genes in the 6 largest viral gene segments in our alignment and variant-calling pipeline. We excluded M1/M2 and NS1/NEP genes because their overlapping reading frames can confound site-based diversity metrics [8]. Some samples contained very high numbers of iSNVs, which could reflect truly high levels of within-host diversity, but may also indicate co-infection with distinct strains or possibly method errors. To remove such outliers, as in [8], we quantified the number of iSNVs (>0.5% allele frequency) for each sample, ranked them in descending order within their season-subtype group, and then excluded samples that were in the top 10%.
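The outlier filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record keys (`id`, `group`, `n_isnvs`) are hypothetical, and rounding the 10% cutoff down to a whole number of samples is an assumption.

```python
import math
from collections import defaultdict


def drop_top_isnv_outliers(samples, top_fraction=0.10):
    """Remove the highest-iSNV samples within each season-subtype group.

    `samples` is a list of dicts with hypothetical keys:
    'id', 'group' (season-subtype label), and 'n_isnvs'.
    """
    by_group = defaultdict(list)
    for sample in samples:
        by_group[sample["group"]].append(sample)

    kept = []
    for group_samples in by_group.values():
        # Rank samples in descending order of iSNV count.
        ranked = sorted(group_samples, key=lambda s: s["n_isnvs"], reverse=True)
        # Number of samples to exclude; rounding down is an assumption.
        n_drop = math.floor(len(ranked) * top_fraction)
        kept.extend(ranked[n_drop:])
    return kept
```

Ranking within each group, rather than across the whole cohort, keeps a season or subtype with generally higher diversity from contributing all of the excluded samples.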

Low-frequency variation and purifying selection dominate within-host viral evolution

To assess within-host diversity in our cohort, we plotted minority iSNV frequencies (3–50%) for each gene and annotated them by mutation type: nonsynonymous (amino-acid changing), synonymous (“silent”), or stop gained (Fig 1A). We detected nonsynonymous and synonymous iSNVs across the genome, with a large proportion (77.3%) of iSNVs detected below 10% allele frequency in all samples. The pattern of mostly low-frequency iSNVs found throughout the viral genome was consistent across influenza seasons and viral subtypes (S1 Fig). We then quantified the total number of iSNVs per sample: 68 samples contained no detectable iSNVs, and the median count was 4 (S1 Fig). We found no statistically significant difference in iSNV counts between individuals reporting recent influenza vaccination and those reporting no recent influenza vaccination (S1B Fig). Notably, 2 of 283 samples were clear outliers, each with more than 20 iSNVs. One sample (Ct 28.67) from the 2017–18 A(H3N2) season contained 28 iSNVs (3–12.5%) across HA, NA, NP, PA, PB1, and PB2 (S1 Table). Notable among these was an iSNV in HA causing an M168I change in antigenic site A at 6.73% (H3 numbering; see Methods). A sample from the 2018–19 A(H3N2) season (Ct 29.66) contained 62 iSNVs (3–5.6%) across HA, NA, PA, PB1, and PB2 (S1 Table), of which 54 were present in HA (26 non-antigenic, 28 antigenic). Fifteen of the 28 antigenic iSNVs in HA were nonsynonymous, including antigenic site B mutations S159Y and K160T. Of the 1226 iSNVs detected in our 283 samples, we identified just 64 iSNVs in classical HA antigenic sites (40 nonsynonymous, 24 synonymous) in 31 samples. Taken together, our data support the current consensus that most within-host variants detected during acute influenza virus infections are present at low frequencies, and most samples contain only a few iSNVs—although occasional outliers exhibit substantially greater counts.

Fig 1. Low-frequency within-host variants predominate across influenza A/H3N2 genes and seasons.

(A) Intra-host single-nucleotide variants (iSNVs) in at least 3% of sequencing reads (horizontal grey line) in a given sample were plotted for 2017–19 A(H3N2) according to their nucleotide position and frequency. Each dot represents an iSNV colored by mutation type: nonsynonymous (orange), synonymous (blue), or stop gained (green). Genes shown: PB2 (polymerase basic 2), PB1 (polymerase basic 1), PA (polymerase acidic), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase). Not included: MP (matrix; M1/M2) and NS (nonstructural; NS1/NEP) due to overlapping reading frames. (B and C) iSNV frequency spectra showing the mean proportion of each mutation type for each bin for 2017–18 A(H3N2) and 2018–19 A(H3N2) samples. Error bars denote the standard deviation for each bin’s mutation type. The grey line indicates the proportion of iSNVs in each frequency bin expected under genetic drift [24,25]. Per-sample proportions are shown as dots for each frequency bin.

https://doi.org/10.1371/journal.ppat.1013689.g001

We next asked which evolutionary force(s) were most likely to be shaping viral evolution in these acutely infected individuals, as we have done previously [24,25]. Most minority iSNVs are detected below 10% frequency, as shown for A(H3N2) in the 2017–18 (Fig 1B) and 2018–19 (Fig 1C) seasons. Indeed, iSNVs detected below 10% frequency accounted for a greater proportion of all iSNVs than would be expected under neutral evolution (genetic drift; see [24,25]), while, as expected, iSNVs that created stop codons were detected at very low frequencies (15 iSNVs at 3.0–7.5% and one iSNV in PB1 at 13.3%). Such stop-gained iSNVs were detected in PB2, PB1, NP, PA, and NA, but not HA. We next counted the cumulative number of iSNVs for all samples from 3–100% to assess the distribution of iSNVs across various allele frequency bins. Cumulative iSNV counts similarly show that most mutations are present at either very low or very high frequencies within hosts (S1 Fig). Together, these results suggest that influenza virus diversity is limited within acutely infected hosts and characterized by weak purifying selection.
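As a rough illustration of a drift expectation for a frequency spectrum, one common formulation assumes that the number of variants at frequency f scales as 1/f, so the expected share of variants in a bin is the integral of 1/f over that bin divided by the integral over the whole range. Whether this matches the exact calculation used in [24,25] is an assumption, and the bin edges below are hypothetical.

```python
import math


def neutral_bin_proportions(bin_edges):
    """Expected share of iSNVs per frequency bin if counts scale as 1/f.

    Integrating 1/f over [a, b] gives ln(b / a); dividing by the integral
    over the full range turns the result into a proportion.
    """
    low, high = bin_edges[0], bin_edges[-1]
    total = math.log(high / low)
    return [math.log(b / a) / total for a, b in zip(bin_edges, bin_edges[1:])]


# Hypothetical bin edges spanning the 3-50% minority-variant range.
edges = [0.03, 0.10, 0.20, 0.30, 0.40, 0.50]
expected = neutral_bin_proportions(edges)
```

Under this assumption the lowest bin already holds the largest expected share, so an excess of low-frequency variants is judged relative to that baseline rather than to a flat distribution.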

Most iSNVs within hosts are synonymous

To quantify the synonymous and nonsynonymous nucleotide diversity in viral populations within hosts, we calculated π, the mean pairwise differences per nucleotide site for a set of sequences [26]. We use π here because its value is less sensitive to sequencing read depth than other common diversity metrics [27]. Our analysis revealed that synonymous diversity (πS) was significantly greater than nonsynonymous diversity (πN) in every viral gene, both when we considered all seasons and subtypes together (Fig 2A; p < 0.001, Wilcoxon signed-rank tests) and when we analyzed each influenza season and subtype independently (S2A–D Fig; p < 0.015). We next sought to quantify the magnitude and directionality of selection occurring within hosts by calculating the difference between πN and πS for each gene. The distribution of πN − πS values was slightly, but significantly, less than zero in all viral genes, both when we considered all data together across season and subtype (Fig 2B; p < 0.001, one-sample Wilcoxon signed-rank tests) and also when we analyzed each season and subtype individually (S2E–H Fig; p < 0.015). Together, these findings suggest that influenza A viruses are generally subjected to purifying selection within acutely infected hosts across influenza seasons, subtypes, and host vaccination status. This is also consistent with the finding that the preponderance of iSNVs are present below 10% frequency (Fig 1).
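To make the definition of π concrete, the sketch below computes pairwise diversity at a single site from read counts and averages it over a region. It is a simplified illustration under stated assumptions: the function names are hypothetical, and the codon-aware partitioning of sites and differences into nonsynonymous and synonymous classes (needed for πN and πS) is not shown.

```python
def site_pairwise_diversity(allele_counts):
    """Fraction of read pairs that carry different alleles at one site.

    `allele_counts` maps each nucleotide to its read count at the site.
    """
    depth = sum(allele_counts.values())
    if depth < 2:
        return 0.0
    total_pairs = depth * (depth - 1) / 2
    # Pairs of reads that carry the same allele contribute no difference.
    same_pairs = sum(c * (c - 1) / 2 for c in allele_counts.values())
    return (total_pairs - same_pairs) / total_pairs


def mean_pi(per_site_counts, n_sites):
    """Mean pairwise differences per site across a region of `n_sites`."""
    return sum(site_pairwise_diversity(c) for c in per_site_counts) / n_sites
```

For a site covered by 90 reads of one allele and 10 of another, 900 of the 4950 read pairs differ, so the site contributes about 0.18; monomorphic sites contribute zero, which is why regions with few low-frequency iSNVs have very small π.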

Fig 2. Synonymous diversity exceeds nonsynonymous diversity with evidence of purifying selection.

(A) The average number of pairwise nonsynonymous differences per nonsynonymous site (πN, orange) and synonymous differences per synonymous site (πS, blue) across all analyzed viral genes, subtypes, and seasons. Boxplots are overlaid to indicate the median, interquartile range, and values within 1.5 × the interquartile range. (B) The difference between πN and πS across all viral genes, subtypes, and seasons. Boxplots are overlaid as described in A. The horizontal grey line represents πN = πS, with values above zero suggesting diversifying selection and values below zero suggesting purifying selection. (C) Mean πN (orange) and πS (blue) values for 30-codon sliding windows in the hemagglutinin gene for 2017–18 A(H3N2) viruses. (D) Mean πN − πS values for 30-codon sliding windows in the hemagglutinin gene for 2017–18 A(H3N2) viruses, colored by antigenic region. Shaded areas denote the standard error of the mean for each codon position. Genes shown: PB2 (polymerase basic 2), PB1 (polymerase basic 1), PA (polymerase acidic), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase). Not included: MP (matrix; M1/M2) and NS (nonstructural; NS1/NEP) due to overlapping reading frames. Paired comparisons of πN and πS were tested in A using Wilcoxon signed-rank tests. Median differences between πN and πS were tested in B against zero using one-sample Wilcoxon signed-rank tests. Asterisks indicate significance thresholds: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***); ns = not significant.

https://doi.org/10.1371/journal.ppat.1013689.g002

Calculating nucleotide diversity for an entire gene could mask selection directed at a small number of specific sites. We focused on HA to address the hypothesis that pre-existing antibody responses may exert selective pressure on viruses that could be detected as elevated πN − πS values in certain antigenic regions. We calculated πN and πS (Fig 2C), as well as the difference between πN and πS (Fig 2D), using 30-codon sliding windows in HA for each sample. Our analysis of antigenic and non-antigenic regions (defined in H3 as antigenic regions A-E [28,29] and in H1 as sites Ca, Cb, Sa, Sb [30]) revealed that, while values for both πN and πS fluctuate along the length of the HA gene, πS exhibits larger fluctuations. Local spikes in πS lead to larger (i.e., more negative) πN − πS differences in locations that include both classical antigenic sites and non-antigenic regions. This trend is present, with somewhat different patterns, in each influenza season and subtype (S3 Fig). Together, these findings show that, although HA genes are under overall purifying selection, the magnitude of purifying selection may vary over the length of HA as well as with season and viral subtype.
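A sliding-window summary of this kind can be sketched as below. This is one plausible way to smooth per-codon values, not necessarily the calculation used here: the one-codon step size and the use of a simple mean of per-codon values within each window are assumptions, and the function name is hypothetical.

```python
def sliding_window_means(per_codon_values, window=30, step=1):
    """Mean of per-codon values in each full window along a gene.

    Returns (start_index, mean) tuples; windows that would run past the
    end of the gene are skipped.
    """
    means = []
    for start in range(0, len(per_codon_values) - window + 1, step):
        chunk = per_codon_values[start:start + window]
        means.append((start, sum(chunk) / window))
    return means
```

Applying the same function to per-codon πN, πS, and their difference gives three tracks that can be compared position by position, which is how a local spike in πS shows up as a more negative difference in the same window.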

Tight bottlenecks and weak purifying selection shape between-host influenza virus evolution

Our study includes 36 households comprising 46 putative transmission events, which enables us to compare differences in virus population diversity between donors and recipients and test for evidence of selection during transmission. The first individual to report symptoms was identified as the donor case for each household in our dataset. Every other household member who had RT-PCR-confirmed influenza infection within ten days of the donor case was identified as a potential recipient case. Prior work has noted that household members might coincidentally acquire independent infections from outside the household [31–33], potentially confounding analyses of transmission events. To address this, we implemented a genetic criterion for household pairing, as described previously [24]. Briefly, we calculated the proportion of all iSNVs (3–100% allele frequency) shared between each donor-recipient pair and excluded pairs with shared proportions below the 95th percentile of those observed in random community pairs (Fig 3A). The random community pairing followed the expected distribution: 95% of random community pairs shared fewer than 11.2% of variants (vertical dashed line). By this criterion, 40 of the 46 potential household pairs shared a higher proportion of variants than 95% of randomly paired community samples. As a complementary approach to determine the relatedness of viruses infecting members of the same household, we generated maximum likelihood phylogenies for the HA consensus sequences of each influenza season and subtype (S4 Fig). Viruses from members of the same household clustered closely together on all trees, with identical or near-identical consensus sequences. Together, our epidemiological and genetic criteria enable us to rigorously identify cases of within-household transmission.
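The genetic pairing criterion can be sketched as follows. This is an illustration under stated assumptions: the shared proportion is taken as the number of shared iSNVs divided by the number of distinct iSNVs across both samples, the percentile uses a nearest-rank definition, and all function names are hypothetical.

```python
import math


def shared_proportion(isnvs_a, isnvs_b):
    """Shared iSNVs divided by all distinct iSNVs in the two samples."""
    a, b = set(isnvs_a), set(isnvs_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def retain_pairs(household_props, random_props, pct=95):
    """Keep household pairs that share more than the random-pair cutoff."""
    threshold = percentile_nearest_rank(random_props, pct)
    return [pair for pair, prop in household_props.items() if prop > threshold]
```

Each iSNV would need a key that identifies it unambiguously, for example a (gene, position, alternate allele) tuple, so that the same nucleotide change is matched between samples.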

Fig 3. Donor–recipient variant frequencies and nucleotide diversity across genes.

(A) The proportion of variants shared between putative household transmission pairs (red) or randomly paired virus sequences from a given season and subtype (grey) for all samples included in this study. The vertical line denotes the threshold below which 95% of random community pairs fell. (B) iSNV frequency in all donors and recipients as defined in A. Frequencies are shown as 0% in the paired individual if the iSNV was absent or below the 3% quality-control threshold. Each point represents the frequency of an alternate (non-reference) allele for a given donor-recipient pair. Boxplots are overlaid to indicate the median, interquartile range, and values within 1.5 × the interquartile range for (C) πN, (D) πS, and (E) πN − πS values for all donor (black) and recipient (red) samples across all viral genes analyzed. Genes shown: PB2 (polymerase basic 2), PB1 (polymerase basic 1), PA (polymerase acidic), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase). Not included: MP (matrix; M1/M2) and NS (nonstructural; NS1/NEP) due to overlapping reading frames. The horizontal grey line in E represents zero, where πN = πS. Paired comparisons of nucleotide diversity between donors and recipients were tested in C and D using Wilcoxon signed-rank tests. Median differences between πN and πS were tested in E against zero using one-sample Wilcoxon signed-rank tests. Asterisks indicate significance thresholds: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***); ns = not significant.

https://doi.org/10.1371/journal.ppat.1013689.g003

We next determined how transmission affected iSNV frequencies in the donor-recipient pairs. If transmission bottlenecks are loose, we expect iSNV frequencies in the recipient to resemble those in the donor. In total, we detected 1330 iSNVs in either donors or recipients. We categorized these iSNVs into three groups: present in both donor and recipient, present only in the donor, or present only in the recipient. The majority (72%) of iSNVs were detected in both the donor and recipient; of these, 99% exceeded 50% frequency (consensus level) and 95% were present above 90% frequency in both the donor and recipient. In contrast, only eight iSNVs (five synonymous, three nonsynonymous) below 90% were observed in both the donors and recipients at similar frequencies. These results indicate that highly abundant donor iSNVs compose most of the diversity shared between donors and recipients.
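The three-way grouping of iSNVs in a donor-recipient pair can be sketched as below. The variant keys and the function name are hypothetical; the 3% detection threshold follows the variant-calling cutoff stated earlier, and treating anything below it as absent is an assumption consistent with how Fig 3B reports such variants as 0%.

```python
def classify_pair_isnvs(donor_freqs, recipient_freqs, min_freq=0.03):
    """Group iSNVs by whether they are detected in donor, recipient, or both.

    Each argument maps a variant key (for example 'PB1:N346N') to its
    within-host frequency; variants below `min_freq` count as undetected.
    """
    categories = {"both": [], "donor_only": [], "recipient_only": []}
    for variant in sorted(set(donor_freqs) | set(recipient_freqs)):
        in_donor = donor_freqs.get(variant, 0.0) >= min_freq
        in_recipient = recipient_freqs.get(variant, 0.0) >= min_freq
        if in_donor and in_recipient:
            categories["both"].append(variant)
        elif in_donor:
            categories["donor_only"].append(variant)
        elif in_recipient:
            categories["recipient_only"].append(variant)
    return categories
```

A variant that falls below the threshold in both samples is dropped from all three groups, which mirrors the fact that such variants would not have been called in either host.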

In one household pair, a synonymous substitution in PB1 codon 346 (PB1: N346N) was observed at 57% frequency in the donor and 76% in the recipient (Fig 3B, S2 Table). In a separate household pair, we detected a synonymous PB1 substitution (PB1: P627P) at 10% frequency in the donor and 29% in the recipient (S2 Table). In a third household cluster, a nonsynonymous PB2 substitution (PB2: A270S) was present at 60% frequency in the donor and 0%, 6%, and 15% in the three recipients (S2 Table). In a fourth household cluster, three intermediate-frequency iSNVs were potentially transmitted from the donor to three recipients (S2 Table). The donor sample harbored two NP iSNVs: T22A at 24% and G187G at 10%. Both mutations were lost in Recipient 1, while only T22A persisted at 30% in Recipient 2, and only G187G was maintained at 15% in Recipient 3. Similarly, in the same household, a synonymous HA mutation (HA: C471C, H3 numbering) was found at 32% in the donor, but was present at 100%, 71%, and 14% in the three recipients. Conversely, a nearby synonymous HA mutation (HA: G463G, H3 numbering) was present at 14% in the donor but was undetected in all recipients. The fact that intermediate-frequency donor iSNVs are lost, maintained, and/or fixed in different patterns in different recipients of the same donor is consistent with the idea that transmission of seasonal influenza A viruses from acutely infected humans typically involves a very tight, stochastic bottleneck in agreement with previous reports [7,20].

Twelve percent of iSNVs were only present in the donor, with 91% of these detected below 50% frequency. Meanwhile, 16% of iSNVs were detected only in the recipient, and 89% of these recipient-only iSNVs were minority variants (3–50%). We cannot determine whether such iSNVs were present in the donor below our detection limits or whether they emerged de novo in the recipient. Notably, 7% of the recipient-only iSNVs (representing 1% of all iSNVs in donors and recipients) were fixed in the recipient despite being undetected in the donor. Of these fixed recipient-only iSNVs, 4 were synonymous and 7 were nonsynonymous. Together, our results suggest that transmission of seasonal influenza A viruses generally involves very tight bottlenecks in which the likelihood that an iSNV is transmitted depends largely on its frequency in the donor.

While substantial within-host diversity is lost during transmission, the extent to which selection shapes this reduction remains unclear. We therefore next compared πN and πS between donors and recipients to assess whether transmission bottlenecks preferentially restrict nonsynonymous mutations. Overall, πN and πS values did not change significantly during transmission in any viral genes (Fig 3C–D), except in PB2, which showed slightly but significantly higher πN values in recipients relative to donors (p = 0.025). This significant increase was driven by higher πN values specifically in the 2017–18 A(H3N2) group (p = 0.003; S5A Fig). In the 2017–18 A(H3N2) group, we also observed several additional genes in which recipients had slightly but significantly higher πN values compared to donors (PA and HA: p < 0.05), along with a modest increase in πS values for PB2 and NA (p < 0.05). These patterns were not observed in other groups (S5 Fig, S6 Fig). Across all genes when aggregating all seasons and subtypes in this study, the distribution of πN − πS values was significantly below zero (p < 0.05), further suggesting that purifying selection was the dominant selective force in donors and recipients (Fig 3E). Statistical comparisons between donors and recipients in Fig 3E yielded no significant differences (S3 Table). Taken together, our results indicate that, on balance, purifying selection appears to be equally strong in both donors and recipients.

Antigenic sites on hemagglutinin are under distinct selective pressures during transmission

The model of influenza virus antigenic evolution proposed by Morris et al. [19] posits that selection operates mainly at the time when a new infection is established. Because this is expected to be a rare event, detecting it would require access to many well-defined transmission pairs. To ascertain whether this phenomenon was detected in our donor-recipient pairs (Fig 3B), we investigated whether specific nonsynonymous mutations in antigenic regions of HA were acquired or increased in frequency during transmission. We detected the transmission of a nonsynonymous antigenic site variant in just one transmission pair, in which both the donor and recipient were vaccinated. A nonsynonymous mutation encoding HA I185T (H3 numbering, antigenic site Sb of 2018–19 A(H1N1)pdm09) became fixed across transmission (from 38% in the donor to 100% in the recipient; S2 Table). This event is consistent with selection during transmission; however, it could also have become fixed because of the tight transmission bottleneck and not because it conferred a fitness benefit. Four other transmission pairs maintained HA I185T at 100% frequency, and this mutation was present in 19 other samples from our cohort at 100% frequency, suggesting that it was circulating in the community during the study period. The four donor samples contained an additional 23 mutations in HA, which were fixed relative to the within-season reference, including 2 synonymous mutations in antigenic region Ca1 (HA: E235E and HA: G237G). All of these mutations were present in the recipients, indicating that viruses infecting these donor-recipient pairs were divergent relative to most viruses sampled from the community during this time period (S4D Fig, taxon boxed in red). In all other donor-recipient pairs, we saw no stark increase in the frequency of iSNVs in classical antigenic sites, consistent with the expectation that this phenomenon is infrequent.

Although there was only one instance of an iSNV with potential antigenic impact increasing in frequency during transmission, we reasoned that patterns of nucleotide diversity could indicate whether selection acted specifically on antigenic sites of HA. To test this, we quantified the difference between πN and πS values for each codon in antigenic and non-antigenic regions (defined in H3 as antigenic regions A-E [28,29] and in H1 as sites Ca, Cb, Sa, Sb [30,34]) in each donor and recipient sample, then we normalized per-sample differences by the number of sites in each region. Across donors, recipients, and the pooled “all” group (i.e., all samples in the dataset considered together), antigenic and non-antigenic regions displayed significantly negative per-site πN − πS values (p < 0.0231), consistent with purifying selection (Fig 4). These values remained stable during transmission, with no significant differences between donors and recipients for either region, a pattern that held across all seasons and subtypes analyzed in this study (S7 Fig). We detected a significant difference between antigenic and non-antigenic regions only in the “all” group (p = 3.69e-11), where non-antigenic regions showed stronger purifying selection. In this group, the median per-site πN − πS value for antigenic regions was zero, with 32% of values exactly zero and 40% falling below zero. These results indicate that purifying selection is generally weaker in antigenic regions of HA than in non-antigenic regions. Still, it remains unclear whether this reflects reduced constraint, increased selective pressure, or both. If distinct selection pressures do act on antigenic sites during transmission, they appear too weak or rare to detect with this approach.

Fig 4. Nucleotide diversity across antigenic and non-antigenic regions of HA.

Per-site πN − πS values in antigenic and non-antigenic regions of HA for all samples in this study. πN − πS values were normalized to account for differences in the number of antigenic and non-antigenic sites, denoted πN − πS per site. Boxplots are overlaid to indicate the median, interquartile range, and values within 1.5 × the interquartile range for each group: Donor (black), Recipient (red), or All (blue). The “All” group includes values for all viral genomes sampled (donors, recipients, and also those without a linked household contact). The horizontal grey line represents zero, where πN = πS. Median differences between πN and πS were tested against the null hypothesis (πN − πS = zero) using one-sample Wilcoxon signed-rank tests. Asterisks indicate significance thresholds: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***); ns = not significant.

https://doi.org/10.1371/journal.ppat.1013689.g004

Discussion

On the scale of the human population, influenza viruses clearly evolve under immune selection pressure. However, we are confronted by a paradox of scale: new antigenic variants must arise within individual infected hosts, but viral evolution within and between infected hosts appears to be dominated by randomness and purifying selection, making it difficult for new variants to emerge and transmit. Defining the evolutionary mechanisms connecting the individual to the global scale will enhance our ability to understand and forecast trends in influenza virus antigenic evolution [8]. Morris et al. predict that selection occurs not over time within hosts, but during the establishment of a new infection [19]. Here, we used 283 high-quality influenza A-positive samples from a community cohort study. Of these, 71 samples formed 40 validated donor–recipient pairs across 31 households for transmission analyses, and the remaining 212 single infections informed within-host analyses. We used these pairs to assess whether there was evidence for selection during transmission. Our study supports previous findings suggesting that the short duration of acute influenza virus infection, coupled with purifying selection and anatomical compartmentalization, dampens the generation of substantial diversity within hosts [6–8,17,27,35–37]. We also find that narrow transmission bottlenecks between hosts result in a substantial loss in low-frequency diversity and dampen the ability of selection to act efficiently during transmission.

Mutations arise during acute influenza virus infection and are culled or amplified by purifying or diversifying selective pressures, respectively. Consistent with previous studies, we observe low-frequency iSNVs across the viral genome (Fig 1) and signals of purifying selection on every influenza virus gene (Fig 2B) [7]. To become dominant in human populations, variants must survive transmission bottlenecks and propagate in additional hosts. If transmission bottlenecks were extremely loose (allowing many viruses to be transferred from donor to recipient) and under neutral selection, we would expect iSNVs to be present in the donor and recipient at similar frequencies [24,25]. Conversely, under a tight bottleneck, low-frequency iSNVs are rarely transmitted, but if they are, the limited number of unique founding genomes often causes these variants to be present at high frequency in the recipient. In the 31 households and 40 transmission events we analyzed, low-frequency donor iSNVs were generally lost, and the vast majority of transmitted iSNVs were ≥90% frequency in the recipient. This pattern is consistent with a narrow, stochastic bottleneck, as we and others have reported previously for acute transmission of influenza A and SARS-CoV-2 [7,24,25,37,38].
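The contrast between loose and tight bottlenecks described above can be illustrated with a small simulation. The following Python sketch is not the analysis used in this study. It assumes neutral transmission modelled as random sampling of a hypothetical number of founding genomes (`n_founders`) from the donor population and ignores replication, drift, and selection in the recipient. All function and parameter names are illustrative.

```python
import random


def recipient_frequency(donor_freq, n_founders, rng):
    """Frequency of a donor iSNV among the founding genomes of one recipient.

    Assumption: each founder is drawn independently and carries the variant
    with probability equal to its frequency in the donor (neutral sampling).
    """
    carriers = sum(1 for _ in range(n_founders) if rng.random() < donor_freq)
    return carriers / n_founders


def simulate(donor_freq, n_founders, n_events=10_000, seed=1):
    """Return the fractions of simulated transmission events in which the
    variant is lost (0%) or fixed (100%) among the founders."""
    rng = random.Random(seed)
    freqs = [recipient_frequency(donor_freq, n_founders, rng) for _ in range(n_events)]
    lost = sum(f == 0.0 for f in freqs) / n_events
    fixed = sum(f == 1.0 for f in freqs) / n_events
    return lost, fixed


if __name__ == "__main__":
    # A 5% donor variant under tight (1-2 founders) and loose (100 founders) bottlenecks.
    for n in (1, 2, 100):
        lost, fixed = simulate(donor_freq=0.05, n_founders=n)
        print(f"founders={n}: lost in {lost:.2%} of events, fixed in {fixed:.2%}")
```

With a single founder, every simulated event ends in either loss or fixation, which mirrors the all-or-nothing pattern expected under a tight bottleneck. With many founders, the variant is rarely lost and essentially never fixed.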

Importantly, selection for specific variants during transmission may also result in reduced viral genetic diversity in recipients in a manner that may be difficult to distinguish from a stochastic bottleneck [39]. We have observed such selective bottlenecks ourselves in experimental systems of influenza A virus transmission in ferrets [4,21]. However, in the present work, our data are mostly consistent with the stochastic model: we see dramatic shifts in iSNV frequency, in which iSNVs are more likely to transmit if they are at high frequency in the donor. There is no single iSNV that is consistently transmitted. In several households, we capture transmission from a single donor to multiple recipients, and in these cases, we see that each recipient acquired a different combination of iSNVs. This suggests that the iSNVs are present in different viruses (i.e., not genetically or physically linked) and that each recipient received a random sample of viruses present in their donor.

We expect selective bottlenecks to be rare in acute human influenza transmission, but it is interesting that we did observe one nonsynonymous mutation in an HA antigenic region, HA: I185T, which is in antigenic site Sb. This mutation was a minority variant in the donor and became fixed in the recipient. Fixation of a potential antigenic variant during transmission could be consistent with the Morris et al. hypothesis [19], but it is also important to consider the broader context of contemporaneous viral diversity: A(H1N1)pdm09 viruses circulating both globally and in the sampled community were polymorphic at HA site 185 during the study period. Threonine (T185) had been dominant at this position, but isoleucine (I185) was becoming more frequent in the 2018–19 season, present in 51% of our samples and 66% of global sequences (S8 Fig) [40,41]. The donor in our transmission pair carried 185T at 38% frequency and 185I at 62%, yet the recipient became productively infected with 185T alone, which was the minority variant both in the donor and globally at the time. Thus, given the cocirculation of both T185 and I185 in our community and our lack of information about immune responses to influenza A viruses in our cohort, we cannot determine whether transmission of this mutation was favored by natural selection in this donor-recipient pair.

Our findings are consistent with previous studies suggesting that tight transmission bottlenecks reduce the ability of selection to fix beneficial iSNVs during transmission [7,20,21,37,38]. When selection is weak or absent, the probability that an iSNV is transmitted will be a function of its frequency rather than fitness. To test whether selection might still act preferentially on certain regions, we compared levels of purifying selection across antigenic and non-antigenic sites of HA. As a whole, both antigenic and non-antigenic regions of HA were under purifying selection, and our analysis detected no significant difference in purifying selection on antigenic or non-antigenic regions between donors and recipients (Fig 4). Non-antigenic regions showed slightly stronger purifying selection than antigenic regions, but this difference was not significant across most comparisons. Overall, within hosts, antigenic regions of HA appear to be under very weak purifying selection, while non-antigenic regions experience slightly stronger, yet still weak, purifying selection. This pattern was not clearly observed during transmission. While we did not observe any striking differences between the antigenic evolution of A(H1N1)pdm09 and A(H3N2), we are limited by the relatively small number of annotated antigenic sites for these viruses (50 and 130, respectively). Our results further support the idea that stochastic processes dominate influenza A virus evolutionary dynamics during transmission and that selective transmission leading to the global emergence of HA antigenic variants is an extremely rare, high-consequence event that will be difficult to observe in nature.

Our study has several limitations. We were unable to perform deep sequencing in technical replicates for this work, which could reduce the accuracy of iSNV frequencies, particularly near the 3% frequency threshold. To account for this deficit, we implemented stringent quality control criteria to exclude samples of insufficient quality to detect low-frequency variants. Further, we only included samples with high whole-genome coverage and depth to facilitate comparison between genes. Our findings are also partially limited in that we have excluded M1/M2 and NS1/NEP genes from diversity analyses due to their overlapping reading frames and small size, which can confound site-based diversity metrics [8]. Our study design has additional limitations in both demographic and sampling structure. By nature of the study design, school-aged children reporting respiratory symptoms are the first household members enrolled in each household, and as a result, are likely to be recognized as donors in our transmission study. Indeed, all donors in our cohort’s transmission networks were children. This raises a potential bias, as children may have had more recent or limited exposure to influenza viruses compared to adult household members, potentially affecting their within-host diversity [31,42,43]. The study design also means that our cohort may be biased toward symptomatic infections in the donors. Importantly, we must caution that we do not know precisely when transmission occurred between putative donors and recipients. As such, donor samples may be derived later in the donor’s acute infection than recipient samples. Additionally, misclassification of donor-recipient direction remains possible, particularly when household members have shared community exposure or separate introductions of genetically similar viruses. The key limitation of our study is the relatively small number of validated transmission events. For future studies to rigorously evaluate the possibility that selection for antigenic variants occurs during transmission, larger cohorts will be required that focus on enrolling households or people in other settings where transmissions are likely to be captured. Future studies should employ similar prospective cohort study designs and analyze deep-sequencing efforts alongside immunological and epidemiological data to expand upon our findings.

Materials and methods

Ethics statement

This study uses samples originally collected as part of the Oregon Child Absenteeism Due to Respiratory Disease Study (ORCHARDS; Oregon, Wisconsin, USA); no new samples were collected and no new participants were enrolled for the analyses reported here. The ORCHARDS study was approved by the Human Subjects Committees of the Education and Social/Behavioral Sciences institutional review board (IRB; initial approval on September 4, 2013) and the University of Wisconsin Health Sciences IRB (initial approval on December 5, 2013, with additional approvals as the protocol was expanded and modified). The ORCHARDS study complies with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) and the Family Educational Rights and Privacy Act (FERPA). The US Office of Management and Budget has approved all forms used in this study.

ORCHARDS participant recruitment and sampling

The full ORCHARDS study methodology can be found in Temte et al. 2022 [23]. Briefly, ORCHARDS was a community cohort study designed to evaluate the utility of cause-specific student absenteeism as a method for early detection of increased influenza activity in schools and the surrounding community; ORCHARDS also served as a platform to study the characteristics and impact of different respiratory pathogens affecting school-aged children. A subset of participants in this cohort was recruited into a sub-study to assess within-household transmission of respiratory pathogens. Household members self-collected specimens, without staff observation, on Day 0 (within 24 hours of the home visit by study coordinators) and on Day 7 (seven days after the initial collection). Two participants deviated from this protocol: one collected their first sample on Day 1 and another collected their second sample on Day 8. Both samples were retained for analysis (S9 Fig). In total, the present study uses specimens from 384 children and their household contacts that tested positive for influenza A virus by RT‐PCR (IVD CDC Human Influenza Virus RT‐PCR Diagnostic Panel [44]) using the Applied Biosystems 7500 Real-Time PCR instrument (Thermo Fisher) at the Wisconsin State Laboratory of Hygiene (WSLH).

Illumina library preparation and sequencing

Following the manufacturer’s protocol, influenza virus RNA was extracted from nasal swabs using the QIAcube HT (Qiagen) with the QIAamp 96 Virus QIAcube HT Kit (Qiagen). RT-PCR was then performed on extracted viral RNA using the SuperScript III One-Step RT-PCR System with Platinum Taq High Fidelity DNA Polymerase (Thermo Fisher) and the primers described by Zhou et al. [45], which amplify all eight influenza A virus gene segments in a single RT-PCR reaction, with the following thermal cycling conditions: 42°C for 50 min, 50°C for 10 min, 94°C for 2 min, four cycles of the following: [94°C for 30 sec, 43°C for 30 sec, 68°C for 3 min 50 sec], 30 cycles of the following: [94°C for 30 sec, 57°C for 30 sec, 68°C for 3 min 30 sec increasing by 10 sec every cycle], then finally 68°C for 10 min. The resulting PCR amplicons were then treated with Exonuclease I (Thermo Fisher) and incubated at 37°C for 15 min and 80°C for 15 min. The DNA was then quantified using Victor X2 (Perkin Elmer) and the Quant-iT High-Sensitivity dsDNA kit (Thermo Fisher) and normalized to 0.2 ng/μL. The DNA library was prepared for sequencing using the Nextera XT DNA Library Prep Kit (Illumina) following the manufacturer’s protocol and sequenced using the MiSeq platform (Illumina).

iSNV calling pipeline

Intra-host single-nucleotide variants (iSNVs) were called using bespoke scripts through the Center for High Throughput Computing (CHTC) at the University of Wisconsin–Madison. All read processing scripts were executed through the rieshunter/tcflab:v2.04 Docker image (https://hub.docker.com/r/rieshunter/tcflab/tags) on a CHTC server node. This Docker image consists of an Ubuntu Linux operating system (18.04.6 LTS “Bionic Beaver”) with all necessary commands and program files (Docker version 20.10.13).

Raw, paired FASTQ files from samples positive for the subtype of interest were adapter-trimmed and quality-trimmed to an average 5-base sliding-window quality of Q20 using Trimmomatic (version 0.39). Additionally, all reads less than 100 bases in length were discarded. Trimmed, paired reads were then merged using BBMerge.sh (BBMap version 38.96), outputting merged pairs and unmerged files. Merged and unmerged reads were normalized to 2000 reads for all 31-mers above 200 depth. Normalized, merged, and unmerged reads were aligned to the season-appropriate vaccine influenza strain, determined by sample collection time and PCR subtyping, using bwa mem (BWA version 0.7.17). Each sample was aligned to its respective season- and subtype-specific reference vaccine strain: A/Hong Kong/4801/2014 (2017–18 A(H3N2)), A/Singapore/INFIMH-16–0019/2016 (2018–19 A(H3N2)), or A/Michigan/45/2015 (2017–19 A(H1N1)pdm09) [44,46]. Alignments were concatenated with samtools merge to produce a single sample alignment file from the merged file and the R1 and R2 unmerged files (Samtools version 1.15). iSNVs were called using callvariants.sh (BBMap version 38.96) to a minimum Phred quality score of Q30 and minimum position coverage of 100. A minimum frequency of 3% was applied after post-processing analysis in RStudio.
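The thresholds named above (Phred quality of Q30, position coverage of 100, and iSNV frequency of 3%) can be expressed as a simple filter. The Python sketch below is illustrative only: in the study, the quality and coverage thresholds were applied by callvariants.sh and the frequency threshold in RStudio, whereas here they are combined in one step. The `Variant` record and its field names are hypothetical, and treating each minimum as inclusive is an assumption.

```python
from dataclasses import dataclass

MIN_QUALITY = 30       # minimum Phred quality score (Q30)
MIN_COVERAGE = 100     # minimum read depth at the variant position
MIN_FREQUENCY = 0.03   # minimum iSNV frequency (3%)


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one called variant (field names are illustrative)."""
    segment: str
    position: int
    ref: str
    alt: str
    quality: float
    depth: int
    frequency: float


def passes_filters(variant):
    """True if the variant meets all three minimum thresholds (assumed inclusive)."""
    return (
        variant.quality >= MIN_QUALITY
        and variant.depth >= MIN_COVERAGE
        and variant.frequency >= MIN_FREQUENCY
    )


def filter_isnvs(variants):
    """Keep only the variants that pass every threshold, preserving input order."""
    return [v for v in variants if passes_filters(v)]
```

For example, a variant at 2% frequency would be dropped even at Q40 and 500x depth, because all three conditions must hold.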

To generate consensus sequences for each sample, variants above 50% were applied to each sample’s respective season- and subtype-specific reference vaccine strain sequence using BCFtools (version 1.15). Samples that failed to map to one or more gene segments, had < 100x coverage at one or more segments, or <99% genome coverage were discarded. High-quality sample consensus genomes were split by segment and aligned to one another using ClustalO (version 1.2.4), and these alignments were visualized with JALview (version 2.11.2.6) to generate a single within-season consensus sequence for each season-subtype combination.

We then realigned all normalized reads to their respective within-season reference sequence to contextualize iSNVs relative to circulating strains in the community. Alignment and variant-calling were performed using the same quality metrics above to produce “realigned” alignment files and “recalled” iSNV files. Samples failing whole-genome coverage thresholds (described above) after realignment were excluded from downstream analyses. To detect and remove potential outliers with excess within-host diversity, we quantified the number of iSNVs > 0.5% allele frequency for each sample, ranked them in descending order within their season-subtype group, and excluded samples that were in the top 10%, as in Xue et al. [8].
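The outlier-exclusion step above can be sketched as a rank-and-trim operation within each season-subtype group. This Python example is a minimal illustration, not the authors' script. The dictionary inputs are hypothetical, the number of samples dropped is assumed to be 10% of the group size rounded down, and ties are broken by sample identifier, a detail the text does not specify.

```python
from collections import defaultdict


def exclude_top_percent(isnv_counts, groups, percent=10):
    """Return the set of sample IDs retained after dropping the highest-iSNV samples.

    isnv_counts: dict mapping sample ID -> number of iSNVs (hypothetical input)
    groups:      dict mapping sample ID -> season-subtype label (hypothetical input)
    percent:     share of each group to exclude, rounded down (assumption)
    """
    by_group = defaultdict(list)
    for sample_id, count in isnv_counts.items():
        by_group[groups[sample_id]].append((count, sample_id))

    kept = set()
    for members in by_group.values():
        # Rank in descending order of iSNV count; ties are broken by sample ID.
        members.sort(key=lambda item: (-item[0], item[1]))
        n_drop = len(members) * percent // 100
        kept.update(sample_id for _, sample_id in members[n_drop:])
    return kept
```

Under this rounding assumption, a group with fewer than ten samples loses none, so small season-subtype groups are left intact.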

Diversity statistics

Variants were annotated using snpEff (version 5.1) relative to the within-season reference sequence to categorize mutation type (synonymous, nonsynonymous, stop-gained, etc.). Antigenic regions were defined in H3 as antigenic regions A-E [28,29] and in H1 as sites Ca, Cb, Sa, Sb [30,34]. Amino acids composing antigenic sites H3 and H1 HAs are listed in S4 Table and S5 Table, respectively. The per-segment, per-gene, and per-codon π diversity were calculated with SNPGenie (version 2019.10.31). Sample π diversity was computed as a per-nucleotide weighted average of each segment’s π diversity.
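The per-nucleotide weighted average described above weights each segment's π by that segment's length. A minimal Python sketch follows. The function name and the segment lengths in the usage note are illustrative assumptions, not values taken from the study.

```python
def sample_pi(segment_pi, segment_lengths):
    """Length-weighted mean of per-segment nucleotide diversity.

    segment_pi:      dict mapping segment name -> pi for that segment
    segment_lengths: dict mapping segment name -> number of nucleotides analyzed
    """
    total_length = sum(segment_lengths[name] for name in segment_pi)
    if total_length == 0:
        raise ValueError("total segment length must be positive")
    weighted_sum = sum(segment_pi[name] * segment_lengths[name] for name in segment_pi)
    return weighted_sum / total_length
```

For instance, with illustrative lengths of 1,700 and 1,400 nucleotides and π values of 0.002 and 0.001, the result is (0.002 × 1700 + 0.001 × 1400) / 3100, so the longer segment contributes more to the sample-level value.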

Diversity statistics, data processing, and figure generation were performed with GNU bash (version 3.2.57(1)-release) and R (version 4.4.1 “Race for Your Life”) through macOS Sequoia 15.5. Multiple packages were used in RStudio (version 2023.12.1 + 402 “Eye Holes”): tidyverse 2.0.0 (which loads tibble 3.2.1, dplyr 1.1.4, purrr 1.0.2, readr 2.1.5, tidyr 1.3.1, stringr 1.5.1, forcats 1.0.0), vcfR 1.15.0 for parsing VCF files, reshape2 1.4.4 for reshaping data, lubridate 1.9.3 for date-time manipulation, ggplot2 3.5.1 for graphics, plus cowplot 1.1.3, gridExtra 2.3, ggpubr 0.6.0, scales 1.3.0, and ggbeeswarm 0.7.2 for figure layout and annotation. All scripts are publicly available via GitHub (https://github.com/RiesHunter/ORCHARDS).

Statistical comparisons

Statistical comparisons were performed in R using the base-R wilcox.test function. We first assessed normality with the Shapiro–Wilk test (shapiro.test). For non-normally distributed data, we employed non-parametric statistical tests. Paired Wilcoxon signed-rank tests (wilcox.test(x, y, paired = TRUE)) were used for matched (dependent) samples. The null hypothesis is that the median of the paired differences equals zero. One-sample Wilcoxon signed-rank tests (wilcox.test(x, mu = 0)) were used to test whether the sample median differs from zero. Unpaired two-sample Wilcoxon rank-sum tests (wilcox.test(x, y, paired = FALSE)), also known as Mann–Whitney U tests, were used for unmatched (independent) samples. The null hypothesis is that the two groups have the same distribution. All tests were two‐sided, and p < 0.05 was considered significant. Statistical test p-values for each figure can be found in S3 Table.

Donor-recipient pairing

Every ORCHARDS sample that passed quality control was considered for pairing based on household status and timing between infections. Individuals must have been within the same household to be considered pairs. Further, “donors” must have tested positive on the same day as or before the “recipient’s” sample was collected. To improve confidence in household pairing status, all samples within the same season and influenza subtype were randomly paired, and a proportion of shared variants was calculated for each random pair, as we have done previously [24]. This proportion was also calculated for potential household pairs. The proportion of shared variants was calculated as twice the number of variants shared between the two samples divided by the total number of variants in the two samples. Household pairs sharing a higher proportion of variants than 95% of random pairings within the season were considered valid donor-recipient pairs. Validated transmission pairs are shown in S9 Fig.
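The pairing criterion above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: variants are represented as hypothetical (segment, position, allele) tuples, the "total number of variants" is interpreted as the sum of each sample's variant count, and a household pair is accepted when its proportion is strictly higher than at least 95% of the randomly paired values.

```python
import random


def shared_proportion(variants_a, variants_b):
    """Twice the number of shared variants divided by the summed variant counts."""
    total = len(variants_a) + len(variants_b)
    if total == 0:
        return 0.0
    return 2 * len(variants_a & variants_b) / total


def random_pair_proportions(samples, n_pairs, seed=0):
    """Shared proportions for randomly drawn pairs of distinct samples.

    samples: dict mapping sample ID -> set of variants, all from one
             season-subtype group (hypothetical input structure)
    """
    rng = random.Random(seed)
    ids = sorted(samples)
    values = []
    for _ in range(n_pairs):
        first, second = rng.sample(ids, 2)
        values.append(shared_proportion(samples[first], samples[second]))
    return values


def is_valid_pair(donor, recipient, null_values, cutoff=0.95):
    """True if the pair shares a higher proportion than `cutoff` of random pairs."""
    observed = shared_proportion(donor, recipient)
    exceeded = sum(value < observed for value in null_values) / len(null_values)
    return exceeded >= cutoff
```

Two samples with two and one variants that share one variant give 2 × 1 / 3 ≈ 0.67. Whether that validates the pair depends entirely on how often unrelated samples in the same season-subtype group share variants.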

Maximum likelihood tree generation

All hemagglutinin consensus sequences were stratified by season and subtype and aligned with ClustalW (version 2.1) [47]. Aligned sequences were the input for RAxML-NG (version 1.2.2) [48] using the GTR+Gamma substitution model. RAxML-NG bestTree files were analyzed in RStudio and R using APE 5.8 [49], phangorn 2.12.1 [50], cowplot 1.1.3, RColorBrewer 1.1-3, ggplot2 3.5.1 [51], and ggtree 3.12.0 [52–54]. Briefly, trees were imported with read.tree, roots were resolved against the respective vaccine strain reference sequence, and tips were colored by household.

Supporting information

S1 Fig. Within-host iSNV counts and cumulative frequency by influenza season and subtype.

(A) Histogram depicting the number of samples containing the indicated numbers of intra-host single-nucleotide variants (iSNVs) across all genes included in this study. (B) Number of iSNVs detected per sample (dots), stratified by reported recent influenza vaccination status: unvaccinated (red), vaccinated (purple). Vaccination refers to current-season influenza vaccination at specimen collection, confirmed in the Wisconsin Immunization Registry [23]. Either (grey) includes all samples regardless of vaccination status. Boxplots are overlaid to indicate the median, interquartile range, and values within 1.5 × the interquartile range for iSNV counts. Histograms of the cumulative number of iSNVs by frequency for (C) 2017–18 A(H3N2), (D) 2018–19 A(H3N2), (E) 2017–18 A(H1N1)pdm09, and (F) 2018–19 A(H1N1)pdm09. Each bin spans a 10% frequency interval of the form [a, b), where the lower bound is included and the upper bound is excluded, except for the final bin, which includes values equal to 100%. Variants below 3% frequency are excluded due to quality filtering. Comparisons of iSNV counts between vaccinated and unvaccinated groups within each season-subtype were tested in B using Wilcoxon rank-sum tests. Asterisks indicate significance thresholds: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***); ns = not significant.

https://doi.org/10.1371/journal.ppat.1013689.s001

(TIF)

S2 Fig. Gene-wise nucleotide diversity across influenza seasons and subtypes.

(A–D) The average number of pairwise nonsynonymous differences per nonsynonymous site (πN, orange) and the average number of pairwise synonymous differences per synonymous site (πS, blue) across all genes included in this study for (A) 2017–18 A(H3N2), (B) 2018–19 A(H3N2), (C) 2017–18 A(H1N1)pdm09, and (D) 2018–19 A(H1N1)pdm09. Boxplots are overlaid to indicate the median, interquartile range, and values within 1.5 × the interquartile range. (E–H) The difference between πN and πS for (E) 2017–18 A(H3N2), (F) 2018–19 A(H3N2), (G) 2017–18 A(H1N1)pdm09, and (H) 2018–19 A(H1N1)pdm09. Genes shown: PB2 (polymerase basic 2), PB1 (polymerase basic 1), PA (polymerase acidic), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase). Not included: MP (matrix; M1/M2) and NS (nonstructural; NS1/NEP) due to overlapping reading frames. Boxplots are overlaid as described above. The horizontal grey line represents πN = πS. Paired comparisons of πN and πS were tested using Wilcoxon signed-rank tests. Median differences between πN and πS were tested against the null hypothesis (i.e., πN − πS = zero) using one-sample Wilcoxon signed-rank tests. Asterisks indicate significance thresholds: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***); ns = not significant.

https://doi.org/10.1371/journal.ppat.1013689.s002

(TIF)

S3 Fig. Sliding-window hemagglutinin nucleotide diversity across influenza seasons and subtypes.

(A, B, E, F) Mean πN (orange) and πS (blue) values for 30-codon sliding windows in the hemagglutinin gene for (A) 2017–18 A(H3N2), (B) 2018–19 A(H3N2), (E) 2017–18 A(H1N1)pdm09, and (F) 2018–19 A(H1N1)pdm09. (C, D, G, H) Mean πN − πS values (black) for 30-codon sliding windows in the hemagglutinin gene for (C) 2017–18 A(H3N2), (D) 2018–19 A(H3N2), (G) 2017–18 A(H1N1)pdm09, and (H) 2018–19 A(H1N1)pdm09, colored by respective antigenic region. Shaded areas denote the standard error for each codon position.

https://doi.org/10.1371/journal.ppat.1013689.s003

(TIF)

S4 Fig. Maximum-likelihood hemagglutinin phylogenies by influenza season and subtype.

(A) 2017–18 A(H3N2), (B) 2018–19 A(H3N2), (C) 2017–18 A(H1N1)pdm09, and (D) 2018–19 A(H1N1)pdm09 (divergent taxon boxed in red). Tips for validated donor–recipient pairs are colored by household; all other tips are black. The season-appropriate vaccine strain reference used as the outgroup is shown in white. Trees were inferred with RAxML-NG under GTR+Gamma and rooted to the vaccine reference; scale bars indicate substitutions per site.

https://doi.org/10.1371/journal.ppat.1013689.s004

(TIF)

S5 Fig. Gene-wise πN, πS, and πN − πS values in A(H3N2) donors and recipients.

(A and B) πN, (C and D) πS, and (E and F) πN − πS values for all donor (black) and recipient (red) samples across all genes included in this study, for 2017–18 A(H3N2) and 2018–19 A(H3N2), respectively. Genes shown: PB2 (polymerase basic 2), PB1 (polymerase basic 1), PA (polymerase acidic), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase). Not included: MP (matrix; M1/M2) and NS (nonstructural; NS1/NEP) due to overlapping reading frames. The horizontal grey line in E and F represents zero, where πN = πS. Paired comparisons of πN and πS between donors and recipients were performed using Wilcoxon signed-rank tests. Median differences between πN and πS were tested against zero using one-sample Wilcoxon signed-rank tests. Asterisks indicate significance thresholds: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***); ns = not significant.

https://doi.org/10.1371/journal.ppat.1013689.s005

(TIF)

S6 Fig. Gene-wise πN, πS, and πN − πS values in A(H1N1)pdm09 donors and recipients.

(A and B) πN, (C and D) πS, and (E and F) πN − πS values for all donor (black) and recipient (red) samples across all genes included in this study, for 2017–18 A(H1N1)pdm09 and 2018–19 A(H1N1)pdm09, respectively. Genes shown: PB2 (polymerase basic 2), PB1 (polymerase basic 1), PA (polymerase acidic), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase). Not included: MP (matrix; M1/M2) and NS (nonstructural; NS1/NEP) due to overlapping reading frames. The horizontal grey line in E and F represents zero, where πN = πS. Paired comparisons of πN and πS between donors and recipients were performed using Wilcoxon signed-rank tests. Median differences between πN and πS were tested against zero using one-sample Wilcoxon signed-rank tests. Asterisks indicate significance thresholds: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***); ns = not significant.

https://doi.org/10.1371/journal.ppat.1013689.s006

(TIF)

S7 Fig. Per-site πN − πS values in antigenic and non-antigenic regions of HA.

(A) 2017–18 A(H3N2), (B) 2018–19 A(H3N2), (C) 2017–18 A(H1N1)pdm09, and (D) 2018–19 A(H1N1)pdm09. πN − πS values were normalized to account for differences in the number of antigenic and non-antigenic sites, denoted πN − πS per site. Boxplots are overlaid to indicate the median, interquartile range, and values within 1.5 × the interquartile range for each group: Donor (black), Recipient (red), or All (blue). The “All” group includes values for all viral genomes sampled (donors, recipients, and also those without a linked household contact). The horizontal grey line represents zero, where πN = πS. Median differences between πN and πS were tested against zero using one-sample Wilcoxon signed-rank tests. Asterisks indicate significance thresholds: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***); ns = not significant.

https://doi.org/10.1371/journal.ppat.1013689.s007

(TIF)

S8 Fig. Global dynamics of HA1 site 185 amino acid identities.

(A) Time-resolved phylogeny of H1 consensus sequences (n = 1,461), constructed using Nextstrain and GISAID data. Branches and tips are colored by amino acid identity at HA1 site 185: threonine (T, yellow), proline (P, blue), isoleucine (I, dark orange), asparagine (N, green), and valine (V, light orange). (B) Temporal dynamics of HA1 185 genotypes among global sequences, showing the proportion of each amino acid over time. The dashed vertical line marks the approximate timing of I185T fixation in a specific donor-recipient pair, during a period when 185I was present in 66% of global sequences. Both panels were adapted from Nextstrain [41] and modified to align x-axes and improve label clarity.

https://doi.org/10.1371/journal.ppat.1013689.s008

(TIF)

S9 Fig. Household sample collection timelines relative to donor sample collection date.

Sample collection timelines for the 31 households meeting both epidemiologic and genetic criteria for influenza transmission. Each row represents a single household, with the donor sample shown as a black circle and recipient sample(s) shown as red circles. Dotted gray lines connect each donor to their corresponding recipient(s). When multiple samples within a household were collected on the same day, recipient points are offset by +0.125 days along the x-axis to avoid overlap.

https://doi.org/10.1371/journal.ppat.1013689.s009

(TIF)

S1 Table. Within-host variants from two outlier samples.

Intra-host single-nucleotide variants (iSNVs) from two samples with greater than twenty iSNVs. The table was generated from a Variant Call Format file from each sample. Individual sample iSNVs were annotated with snpEff, and antigenicity was annotated with RStudio (see Methods).

https://doi.org/10.1371/journal.ppat.1013689.s010

(XLSX)

S2 Table. Notable iSNVs present in both donor and recipient samples.

Merge_ID denotes the year and subtype of the sample. Genes shown: PB2 (polymerase basic 2), PB1 (polymerase basic 1), PA (polymerase acidic), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase). Not included: MP (matrix; M1/M2) and NS (nonstructural; NS1/NEP) due to overlapping reading frames.

https://doi.org/10.1371/journal.ppat.1013689.s011

(XLSX)

S3 Table. Statistical tests and results from the study.

Genes shown: PB2 (polymerase basic 2), PB1 (polymerase basic 1), PA (polymerase acidic), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase). Not included: MP (matrix; M1/M2) and NS (nonstructural; NS1/NEP) due to overlapping reading frames.

https://doi.org/10.1371/journal.ppat.1013689.s012

(XLSX)

S4 Table. Genomic positions for H3 antigenic sites.

Genes shown: HA (hemagglutinin) only.

https://doi.org/10.1371/journal.ppat.1013689.s013

(XLSX)

S5 Table. Genomic positions for H1 antigenic sites.

Genes shown: HA (hemagglutinin) only.

https://doi.org/10.1371/journal.ppat.1013689.s014

(XLSX)

Acknowledgments

We thank the study participants, as well as the Oregon School District staff, the ORCHARDS team, and our colleagues at the Wisconsin State Laboratory of Hygiene, for their contributions to this study.

Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the US Centers for Disease Control and Prevention (CDC).

References

  1. Krammer F, Smith GJD, Fouchier RAM, Peiris M, Kedzierska K, Doherty PC. Influenza. Nat Rev Dis Primers. 2018;4:3.
  2. Paules CI, Sullivan SG, Subbarao K, Fauci AS. Chasing Seasonal Influenza - The Need for a Universal Influenza Vaccine. N Engl J Med. 2018;378(1):7–9. pmid:29185857
  3. Dinis JM, Florek KR, Fatola OO, Moncla LH, Mutschler JP, Charlier OK, et al. Deep Sequencing Reveals Potential Antigenic Variants at Low Frequencies in Influenza A Virus-Infected Humans. J Virol. 2016;90(7):3355–65. pmid:26739054
  4. Moncla LH, Zhong G, Nelson CW, Dinis JM, Mutschler J, Hughes AL, et al. Selective Bottlenecks Shape Evolutionary Pathways Taken during Mammalian Adaptation of a 1918-like Avian Influenza Virus. Cell Host Microbe. 2016;19(2):169–80. pmid:26867176
  5. Debbink K, McCrone JT, Petrie JG, Truscon R, Johnson E, Mantlo EK, et al. Vaccination has minimal impact on the intrahost diversity of H3N2 influenza viruses. PLoS Pathog. 2017;13(1):e1006194. pmid:28141862
  6. Valesano AL, Fitzsimmons WJ, McCrone JT, Petrie JG, Monto AS, Martin ET, et al. Influenza B Viruses Exhibit Lower Within-Host Diversity than Influenza A Viruses in Human Hosts. J Virol. 2020;94(5):e01710-19. pmid:31801858
  7. McCrone JT, Woods RJ, Martin ET, Malosh RE, Monto AS, Lauring AS. Stochastic processes constrain the within and between host evolution of influenza virus. eLife. 2018;7:e35962. pmid:29683424
  8. Xue KS, Bloom JD. Linking influenza virus evolution within and between human hosts. Virus Evol. 2020;6(1):veaa010. pmid:32082616
  9. Zhao L, Abbasi AB, Illingworth CJR. Mutational load causes stochastic evolutionary outcomes in acute RNA viral infection. Virus Evol. 2019;5(1):vez008. pmid:31024738
  10. Nelson MI, Simonsen L, Viboud C, Miller MA, Taylor J, George KS, et al. Stochastic processes are key determinants of short-term evolution in influenza A virus. PLoS Pathog. 2006;2(12):e125. pmid:17140286
  11. Martin MA, Berg N, Koelle K. Influenza A genomic diversity during human infections underscores the strength of genetic drift and the existence of tight transmission bottlenecks. Virus Evol. 2024;10(1):veae042. pmid:38883977
  12. Han AX, Maurer-Stroh S, Russell CA. Individual immune selection pressure has limited impact on seasonal influenza virus evolution. Nat Ecol Evol. 2019;3(2):302–11. pmid:30510176
  13. Nelson MI, Holmes EC. The evolution of epidemic influenza. Nat Rev Genet. 2007;8(3):196–205. pmid:17262054
  14. Petrova VN, Russell CA. The evolution of seasonal influenza viruses. Nat Rev Microbiol. 2018;16(1):47–60. pmid:29081496
  15. McMinn P, Carrello A, Cole C, Baker D, Hampson A. Antigenic drift of influenza A (H3N2) virus in a persistently infected immunocompromised host is similar to that occurring in the community. Clin Infect Dis. 1999;29(2):456–8. pmid:10476769
  16. Xue KS, Stevens-Ayers T, Campbell AP, Englund JA, Pergam SA, Boeckh M, et al. Parallel evolution of influenza across multiple spatiotemporal scales. eLife. 2017;6:e26875. pmid:28653624
  17. Xue KS, Moncla LH, Bedford T, Bloom JD. Within-Host Evolution of Human Influenza Virus. Trends Microbiol. 2018;26(9):781–93. pmid:29534854
  18. Pauly MD, Procario MC, Lauring AS. A novel twelve class fluctuation test reveals higher than expected mutation rates for influenza A viruses. eLife. 2017;6:e26437.
  19. Morris DH, Petrova VN, Rossine FW, Parker E, Grenfell BT, Neher RA, et al. Asynchrony between virus diversity and antibody selection limits influenza virus evolution. eLife. 2020;9:e62105. pmid:33174838
  20. Xue KS, Bloom JD. Reconciling disparate estimates of viral genetic diversity during human influenza infections. Nat Genet. 2019;51(9):1298–301. pmid:30804564
  21. Wilker PR, Dinis JM, Starrett G, Imai M, Hatta M, Nelson CW, et al. Selection on haemagglutinin imposes a bottleneck during mammalian transmission of reassortant H5N1 influenza viruses. Nat Commun. 2013;4:2636. pmid:24149915
  22. Temte J, Barlow S, Landsverk M, Temte E, Schemmel A, Maerz B. Cause-Specific School Absenteeism Monitoring Identifies Community Influenza Outbreaks. Online J Public Health Inform. 2018;10:e103.
  23. Temte JL, Barlow S, Goss M, Temte E, Bell C, He C, et al. The Oregon Child Absenteeism Due to Respiratory Disease Study (ORCHARDS): Rationale, objectives, and design. Influenza Other Respir Viruses. 2022;16(2):340–50. pmid:34623760
  24. Braun KM, Moreno GK, Wagner C, Accola MA, Rehrauer WM, Baker DA, et al. Acute SARS-CoV-2 infections harbor limited within-host diversity and transmit via tight transmission bottlenecks. PLoS Pathog. 2021;17(8):e1009849. pmid:34424945
  25. Braun KM, Haddock LA 3rd, Crooks CM, Barry GL, Lalli J, Neumann G, et al. Avian H7N9 influenza viruses are evolutionarily constrained by stochastic processes during replication and transmission in mammals. Virus Evol. 2023;9(1):vead004. pmid:36814938
  26. Nelson CW, Moncla LH, Hughes AL. SNPGenie: estimating evolutionary parameters to detect natural selection using pooled next-generation sequencing data. Bioinformatics. 2015;31(22):3709–11. pmid:26227143
  27. Zhao L, Illingworth CJR. Measurements of intrahost viral diversity require an unbiased diversity metric. Virus Evol. 2019;5(1):vey041. pmid:30723551
  28. Wolf YI, Viboud C, Holmes EC, Koonin EV, Lipman DJ. Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus. Biol Direct. 2006;1:34. pmid:17067369
  29. Alymova IV, York IA, Air GM, Cipollo JF, Gulati S, Baranovich T, et al. Glycosylation changes in the globular head of H3N2 influenza hemagglutinin modulate receptor binding without affecting virus virulence. Sci Rep. 2016;6:36216. pmid:27796371
  30. Matsuzaki Y, Sugawara K, Nakauchi M, Takahashi Y, Onodera T, Tsunetsugu-Yokota Y, et al. Epitope mapping of the hemagglutinin molecule of A/(H1N1)pdm09 influenza virus by using monoclonal antibody escape mutants. J Virol. 2014;88(21):12364–73. pmid:25122788
  31. Tsang TK, Lau LLH, Cauchemez S, Cowling BJ. Household Transmission of Influenza Virus. Trends Microbiol. 2016;24(2):123–33. pmid:26612500
  32. Gubareva LV, Novikov DV, Hayden FG. Assessment of hemagglutinin sequence heterogeneity during influenza virus transmission in families. J Infect Dis. 2002;186(11):1575–81. pmid:12447732
  33. Tsang TK, Cowling BJ, Fang VJ, Chan K-H, Ip DKM, Leung GM, et al. Influenza A Virus Shedding and Infectivity in Households. J Infect Dis. 2015;212(9):1420–8. pmid:25883385
  34. Xu R, Ekiert DC, Krause JC, Hai R, Crowe JE Jr, Wilson IA. Structural basis of preexisting immunity to the 2009 H1N1 pandemic influenza virus. Science. 2010;328(5976):357–60. pmid:20339031
  35. Sobel Leonard A, McClain MT, Smith GJD, Wentworth DE, Halpin RA, Lin X, et al. Deep Sequencing of Influenza A Virus from a Human Challenge Study Reveals a Selective Bottleneck and Only Limited Intrahost Genetic Diversification. J Virol. 2016;90(24):11247–58. pmid:27707932
  36. Moncla LH, Bedford T, Dussart P, Horm SV, Rith S, Buchy P, et al. Quantifying within-host diversity of H5N1 influenza viruses in humans and poultry in Cambodia. PLoS Pathog. 2020;16(1):e1008191. pmid:31951644
  37. Amato KA, Haddock LA 3rd, Braun KM, Meliopoulos V, Livingston B, Honce R, et al. Influenza A virus undergoes compartmentalized replication in vivo dominated by stochastic bottlenecks. Nat Commun. 2022;13(1):3416. pmid:35701424
  38. Varble A, Albrecht RA, Backes S, Crumiller M, Bouvier NM, Sachs D, et al. Influenza A virus transmission bottlenecks are defined by infection route and recipient host. Cell Host Microbe. 2014;16(5):691–700. pmid:25456074
  39. Lumby CK, Nene NR, Illingworth CJR. A novel framework for inferring parameters of transmission from viral sequence data. PLoS Genet. 2018;14(10):e1007718. pmid:30325921
  40. Ketklao S, Boonarkart C, Phakaratsakul S, Auewarakul P, Suptawiwat O. Responses to the Sb epitope contributed to antigenic drift of the influenza A 2009 H1N1 virus. Arch Virol. 2020;165(11):2503–12. pmid:32783078
  41. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121–3. pmid:29790939
  42. Bodewes R, de Mutsert G, van der Klis FRM, Ventresca M, Wilks S, Smith DJ, et al. Prevalence of antibodies against seasonal influenza A and B viruses in children in Netherlands. Clin Vaccine Immunol. 2011;18(3):469–76. pmid:21209157
  43. Longini IM Jr, Koopman JS, Monto AS, Fox JP. Estimating household and community transmission parameters for influenza. Am J Epidemiol. 1982;115(5):736–51. pmid:7081204
  44. Garten R, Blanton L, Elal AIA, Alabi N, Barnes J, Biggerstaff M, et al. Update: Influenza Activity in the United States During the 2017-18 Season and Composition of the 2018-19 Influenza Vaccine. MMWR Morb Mortal Wkly Rep. 2018;67(22):634–42. pmid:29879098
  45. Zhou B, Deng Y-M, Barnes JR, Sessions OM, Chou T-W, Wilson M, et al. Multiplex Reverse Transcription-PCR for Simultaneous Surveillance of Influenza A and B Viruses. J Clin Microbiol. 2017;55(12):3492–501. pmid:28978683
  46. Xu X, Blanton L, Elal AIA, Alabi N, Barnes J, Biggerstaff M, et al. Update: Influenza Activity in the United States During the 2018-19 Season and Composition of the 2019-20 Influenza Vaccine. MMWR Morb Mortal Wkly Rep. 2019;68(24):544–51. pmid:31220057
  47. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22(22):4673–80. pmid:7984417
  48. Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. 2019;35(21):4453–5. pmid:31070718
  49. Paradis E, Claude J, Strimmer K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics. 2004;20(2):289–90. pmid:14734327
  50. Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27(4):592–3. pmid:21169378
  51. Wickham H. ggplot2: Elegant Graphics for Data Analysis. In: Create Elegant Data Visualisations Using the Grammar of Graphics [Internet]. 2016 [cited 5 May 2023]. Available: https://ggplot2.tidyverse.org/
  52. Yu G, Smith DK, Zhu H, Guan Y, Lam TT. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2016;8(1):28–36.
  53. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5(10):R80. pmid:15461798
  54. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015;12(2):115–21. pmid:25633503