
The impact of viral and host factors on the influenza A virus transmission bottleneck

  • Kathryn C. Krupinsky,

    Roles Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Microbiology & Immunology, University of Michigan, Ann Arbor, Michigan, United States of America

  • Emily E. Bendall,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Resources, Visualization, Writing – review & editing

    Affiliation Department of Microbiology & Immunology, University of Michigan, Ann Arbor, Michigan, United States of America

  • Yuwei Zhu,

    Roles Data curation, Investigation

    Affiliation Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

  • Melissa S. Stockwell,

    Roles Investigation

    Affiliation Department of Pediatrics and Department of Population and Family Health, Columbia University Irving Medical Center, New York, United States of America

  • Huong Q. Nguyen,

    Roles Investigation

    Affiliation Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America

  • Jennifer K. Meece,

    Roles Investigation

    Affiliation Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America

  • Yvonne Maldonado,

    Roles Investigation

    Affiliation Department of Pediatrics, Stanford University, Palo Alto, California, United States of America

  • Katherine D. Ellingson,

    Roles Investigation

    Affiliation Department of Epidemiology & Biostatistics, University of Arizona, Tucson, Arizona, United States of America

  • Karen Lutrick,

    Roles Investigation

    Affiliation Department of Family and Community Medicine, University of Arizona, Tucson, Arizona, United States of America

  • Edwin J. Asturias,

    Roles Investigation

    Affiliation Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America

  • Suchitra Rao,

    Roles Investigation

    Affiliation Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America

  • Natalie M. Bowman,

    Roles Investigation

    Affiliation Division of Infectious Diseases, University of North Carolina, Chapel Hill, North Carolina, United States of America

  • Melissa Rolfes,

    Roles Formal analysis

    Affiliation Influenza Division, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

  • Jessica E. Biddle,

    Roles Formal analysis

    Affiliation Influenza Division, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

  • Alexandra Mellis,

    Roles Formal analysis

    Affiliation Influenza Division, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

  • Jonathan E. Schmitz,

    Roles Investigation

    Affiliation Department of Pathology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

  • James D. Chappell,

    Roles Investigation

    Affiliation Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

  • Natasha B. Halasa,

    Roles Investigation

    Affiliation Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

  • William J. Fitzsimmons,

    Roles Data curation, Methodology

    Affiliation Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America

  • Emily T. Martin,

    Roles Funding acquisition, Investigation, Project administration

    Affiliation Department of Epidemiology, University of Michigan, Ann Arbor, Michigan, United States of America

  • Carlos G. Grijalva,

    Roles Funding acquisition, Investigation

    Affiliation Department of Health Policy, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

  • H. Keipp Talbot,

    Roles Funding acquisition, Investigation

    Affiliations Department of Health Policy, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

  • Adam S. Lauring

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    alauring@med.umich.edu

    Affiliations Department of Microbiology & Immunology, University of Michigan, Ann Arbor, Michigan, United States of America, Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America


Abstract

Transmission bottlenecks are defined by the number of unique virions or genotypes that establish an infection, and they restrict viral diversity that passes from one infected host to another. Previous work identified a tight transmission bottleneck for seasonal influenza A virus (IAV) based on analysis of 43 household pairs, largely from a single A(H3N2) predominant season. While many viral and host factors are known to influence IAV transmission in households, their impact on the transmission bottleneck is not clear. Nasal swabs were collected daily from IAV-infected individuals enrolled in two case-ascertained U.S. household transmission studies, FluTES (2017/2018–2019/2020 seasons) and RVTN (2021/2022 season). Viruses were sequenced in duplicate, and intrahost single nucleotide variants (iSNV) were identified at a 0.5% frequency threshold using a benchmarked pipeline with >99.99% specificity for mutations present in both replicates. Transmission pairs were defined based on co-residence, test date, and genetic distance. For each possible transmission pair, the bottleneck was estimated using a beta-binomial and a clonal mutation model. We sequenced 567 samples from 319 individuals and 102 households in duplicate. Based on epidemiologic linkage and a sequence-based cut-off, we defined 59 transmission pairs for the beta-binomial model and 56 transmission pairs for the clonal mutation model. Across all pairs, we identified a transmission bottleneck of 1 using both the beta-binomial model (CI: 1, 1) and the clonal mutation model (CI: 1.00, 1.22). In our cohort, influenza season, subtype, and host factors (influenza vaccination status, sex, and age) did not alter the transmission bottleneck. IAV is subject to a tight genetic bottleneck during transmission, which limits onward propagation of newly arising nucleotide variants. Tight bottlenecks appear to be intrinsic to the transmission process, as host and viral factors within households do not affect their size.

Author summary

Previous studies suggest a tight transmission bottleneck for IAV. Host factors such as sex at birth, age, and immune status are well-documented modulators of IAV shedding, suggesting they may also play a role in IAV transmission bottleneck size. Year-to-year variation in circulating strains also suggests that there may be viral factors that are important. However, the impact of viral and host factors on the bottleneck size is presently unclear. In this study, we define and characterize 60 new IAV transmission pairs and find a tight transmission bottleneck regardless of season, subtype, and host factors (sex at birth, age, and immune status). To our knowledge, this is the largest set of IAV transmission pairs analyzed to date and provides additional estimates of the IAV transmission bottleneck. We also use two validated and orthogonal models for inferring the size of the transmission bottleneck and find consistent estimates regardless of method used. Together with prior work, this study demonstrates that the IAV transmission bottleneck is intrinsically tight and not impacted by host or viral factors. Given well-established patterns of IAV evolution, these findings suggest that there is a complex interplay between within-host and population-scale processes that warrants further investigation.

Introduction

Many respiratory viruses, and influenza virus in particular, undergo antigenic drift – the selection of newly arising strains that escape host immunity. This complicates long-term within-host immune memory as well as disease management and control [1,2]. Rapid evolution can also potentially lead to new strains that differ in their transmissibility and pathogenicity [3–5]. While viral nucleotide variants are typically identified through population-level surveillance, each nucleotide variant ultimately arises from mutations that occur during a host’s infection. Whether or not a given nucleotide variant reaches a secondary host and subsequently propagates within the population is a highly complex process. One important factor is the size of the transmission bottleneck, the number of viral particles transmitted from a donor to a recipient host that successfully establish genetic lineages within the recipient [6]. When a transmission bottleneck is tight, few nucleotide variants transfer from one host to another, and the virus is largely identical across a transmission chain. Conversely, when a transmission bottleneck is loose, a larger number of nucleotide variants transfer from one host to another, and a combination of rare and common nucleotide variants establish infection in the next host. Because bottlenecks determine which mutations propagate along a transmission chain, they are important determinants of evolutionary rate [7,8].

Given technical limitations in capturing the precise moment of infection establishment, it is difficult to observe transmission bottlenecks. In experimental systems, infections of animals with barcoded viruses and subsequent barcode quantification have been used to directly determine bottleneck size [9]. However, in observational human studies, we rely on inference methods to determine bottleneck size. During an acute infection, mutations are constantly being introduced into the viral genome. Depending on the timing of a mutation and its persistence, the relative frequency of each intrahost single nucleotide variant (iSNV) can vary. Much as barcodes are used in experimental systems, inference methods can use these iSNV frequencies to estimate the transmission bottleneck. Multiple methods exist to estimate bottleneck size. McCrone et al. used an early method that relies on the presence or absence of iSNVs between two members of a transmission pair [10]. While conceptually simple, this method excludes donor iSNVs not found in the recipient host at the time of sampling and fails to account for stochastic nucleotide variant dynamics in early infection following transmission. Sobel Leonard et al. subsequently developed the beta-binomial model that utilizes relative frequencies of shared iSNVs to estimate bottleneck size [11]. The beta-binomial model was enhanced by Ghafari et al. to explicitly account for non-independent assortment of iSNVs during transmission [12]. All these methods use frequencies of iSNVs (i.e., sites at which two alleles, or nucleotide variants, are present). One limitation, then, is that iSNV inclusion is strongly influenced by the nucleotide variant calling threshold, the frequency cutoff above which a variant is treated as true genetic variation. With this in mind, the clonal mutation model developed by Shi et al. instead relies on consensus sequence differences between members of a transmission pair [6]. This approach circumvents the limitations of the nucleotide variant calling threshold and better accounts for early infection stochastic effects.

Nearly all published data on the influenza A virus bottleneck originate from a single household study with 43 transmission pairs [13]. While limited in the number of pairs, the estimate for this dataset is well validated with multiple studies finding that, regardless of inference model used, the transmission bottleneck is tight (1–3 virions) [6,11,14]. However, given that only a single dataset has been used, these studies have not been able to assess the impact of host and viral factors on bottleneck size. This is a significant knowledge gap, as previous work has shown that influenza vaccination, influenza season, and viral subtype influence shedding and the secondary attack rates [15]. Further, host factors such as sex at birth, age, and immune status are well-documented modulators of viral shedding and transmission [16–18]. Given their strong influence on transmission dynamics, we hypothesize that these same factors may also play a role in the transmission bottleneck size. Nonetheless, the impacts of viral and host factors on the transmission bottleneck for influenza, or many other viruses, are largely unknown.

Here we use data and specimens from two case-ascertained household transmission studies to determine how the influenza virus transmission bottleneck is modulated by viral and host factors. Specifically, we identify within-household transmission pairs and employ both the beta-binomial and clonal mutation models to estimate transmission bottleneck size. Each model utilizes distinct aspects of the viral sequence data, and, to our knowledge, this study compares these models on the largest dataset to date. Lastly, we examine how viral (season and subtype) and host factors (sex, age, and vaccination status) impact the transmission bottleneck size by examining subsets of our overall population stratified by these factors.

Results

We obtained samples from individuals enrolled in FluTES and RVTN-S, two case-ascertained household transmission studies conducted in the United States. In full, these studies enrolled 2683 individuals from 806 households [19]. In total, we sequenced 567 samples from 319 individuals and 102 households in duplicate. Of these, 399 were successfully sequenced (see Methods section for details).

We focused our analysis on samples and sequences surrounding transmission events. Households that contained viruses that differed by more than 5 mutations were excluded because this distance likely indicated multiple independent introductions rather than within-household transmission. A phylogenetic analysis confirmed the genetic relatedness of viruses within the remaining households (S1A-D Fig). Lastly, given the infectious period of IAV [20], we restricted our analysis to households in which symptom onset for all individuals within a household occurred within a continuous 14-day period. Following this filtering, we had a final analysis set of 100 participants from 46 households. Key descriptive features of included participants are shown in Table 1.
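The two household-level filters described above can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the record keys (`consensus`, `onset`), the Hamming-distance helper, and the reading of "within a continuous 14-day period" as an onset span of fewer than 14 days are all assumptions made for the example.

```python
from datetime import date
from itertools import combinations

MAX_MUTATIONS = 5         # households with any pair differing by more are excluded
MAX_ONSET_SPAN_DAYS = 14  # assumed reading: first-to-last onset spans < 14 days


def hamming(a: str, b: str) -> int:
    """Count differing positions in two aligned consensus sequences, skipping N."""
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def keep_household(members: list[dict]) -> bool:
    """members: dicts with hypothetical keys 'consensus' (str) and 'onset' (date)."""
    if len(members) < 2:
        return False  # no possible transmission pair
    too_distant = any(
        hamming(m1["consensus"], m2["consensus"]) > MAX_MUTATIONS
        for m1, m2 in combinations(members, 2)
    )
    onsets = [m["onset"] for m in members]
    span_days = (max(onsets) - min(onsets)).days
    return not too_distant and span_days < MAX_ONSET_SPAN_DAYS
```

A household passes only if every pair of consensus sequences is within the distance cutoff and all symptom onsets fall inside the window.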

Table 1. Influenza samples over four seasons within two household transmission studies.

https://doi.org/10.1371/journal.ppat.1014079.t001

Consistent with seasonal IAV trends in the United States, the A(H1N1)pdm09 subtype predominated during the 2019/2020 season while the A(H3N2) subtype predominated during the 2018/2019 and 2021/2022 seasons [21–23]. The A(H3N2) subtype predominated in the 2017/2018 season [24]; however, the single pair in this study had A(H1N1)pdm09. Regardless of season, most participants were unvaccinated or had an unknown vaccination status and a majority were under the age of 18 years. Aside from the 2019/2020 season, participants were primarily female. Most households had only two IAV-positive individuals with a handful of larger (3–4 person) households identified in each season.

Next, we examined genetic features of viruses included in our final analysis set. For all sequenced specimens, the frequencies (Fig 1A) and number (Fig 1B) of iSNV per sample were generally low, consistent with previous studies of IAV [25,26]. We further found that iSNV frequency was strongly correlated between sequencing replicates (Fig 1C) indicative of high reproducibility in our variant calling and frequency estimates. There was no major difference in the iSNV frequency distributions of synonymous and nonsynonymous mutations, consistent with previous work (Fig 1D) [13,25].
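The replicate-concordance requirement (an iSNV must be present in both sequencing replicates at or above the 0.5% threshold) can be illustrated with a small sketch. The variant key format and the choice to report the mean of the two replicate frequencies are assumptions for illustration; the actual benchmarked pipeline is not reproduced here.

```python
FREQ_THRESHOLD = 0.005  # 0.5% minimum frequency, as stated in the abstract


def call_isnvs(rep1: dict, rep2: dict, threshold: float = FREQ_THRESHOLD) -> dict:
    """Keep variants seen at or above threshold in BOTH replicates.

    rep1 and rep2 map a hypothetical variant key, e.g. ("HA", 435, "G"),
    to its frequency in that replicate. The returned frequency is the mean
    of the two replicates (an assumption made for this sketch).
    """
    shared = rep1.keys() & rep2.keys()
    return {
        v: (rep1[v] + rep2[v]) / 2
        for v in shared
        if rep1[v] >= threshold and rep2[v] >= threshold
    }
```

Requiring both replicates trades some sensitivity for specificity: a variant that appears in only one replicate, or falls below the threshold in either, is discarded.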

Fig 1. Genetic diversity of specimens.

(A) histogram of iSNV frequency. (B) histogram of number of iSNVs per specimen. (C) scatter plot of iSNV frequency between sample replicates. (D) density plot of iSNV frequency of synonymous and non-synonymous mutations.

https://doi.org/10.1371/journal.ppat.1014079.g001

Transmission pair characteristics

We defined transmission pairs from members within the same household and season based on symptom onset date. Most households had symptom onset of all individuals within one week (S2 Fig). We assigned individuals with the earliest symptom onset date as donors for a household regardless of household size. In most cases, the donor was the study-identified index case; in some cases, samples from the study-identified index were not available or were not successfully sequenced, and the earliest symptom onset date was used to assign donor status. When two individuals shared the earliest symptom onset date in a household, we treated both as potential household donors and also as a donor-recipient pair with one another. In households with three or more individuals with unique symptom onset dates, we did not allow individuals with intermediate onset dates to serve as donors. Our pairing method meant that some pairs were included twice to account for both directions of plausible transmission. For the clonal mutation analyses, in which directionality is not important, we excluded pairs that had identical individuals with directionality reversed (i.e., removed double-counted pairs, see below). In these cases we still assigned donor and recipient roles, but the assignments did not affect the analysis. S3 Fig shows a visual representation of this pairing schema.
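The pairing rules in this paragraph can be sketched in a few lines. This is a hedged illustration of the stated rules, not the authors' code: participant IDs and the `onsets` mapping are hypothetical, and the sketch assumes every household member passed in was IAV-positive and successfully sequenced.

```python
from datetime import date
from itertools import permutations


def directed_pairs(onsets: dict[str, date]) -> list[tuple[str, str]]:
    """Return (donor, recipient) pairs for one household.

    onsets maps a hypothetical participant ID to symptom onset date.
    Only individuals with the earliest onset may act as donors.
    """
    earliest = min(onsets.values())
    donors = sorted(p for p, d in onsets.items() if d == earliest)
    later = sorted(p for p, d in onsets.items() if d > earliest)
    pairs = [(d, r) for d in donors for r in later]
    # Tied earliest onsets: co-donors are also paired with each other, both ways.
    pairs += list(permutations(donors, 2))
    return pairs


def undirected_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse reversed duplicates for analyses where direction is irrelevant."""
    return sorted({tuple(sorted(p)) for p in pairs})
```

For a household with members A and B sharing the earliest onset and a later case C, `directed_pairs` yields A to C, B to C, A to B, and B to A, while `undirected_pairs` reduces these to three unique pairs, mirroring the removal of double-counted pairs for the clonal mutation analysis.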

Because the beta-binomial and clonal mutation models rely on different underlying sequence characteristics, the number of eligible pairs in our dataset differed. The beta-binomial model utilizes shared and non-shared iSNVs between members of a transmission pair and, thus, requires that samples contain at least one iSNV. This model is also sensitive to transmission direction. In contrast, the clonal mutation model utilizes the number of clonal differences between members of a transmission pair and, accordingly, does not impose any iSNV requirements for use of a sample. These differences in pairing requirement meant that the beta-binomial model analyzed more pairs, but fewer samples compared to the clonal mutation model.

Table 2 describes characteristics of transmission pairs constructed for both methods of transmission bottleneck estimation.

We evaluated transmission pairs that spanned our host and viral co-factors of interest. Aside from the 2017/2018 season where only two samples were included in our final analysis set, we had a similar number of pairs in each influenza season. We had a greater number of A(H3N2) transmission pairs as compared to A(H1N1)pdm09. This was attributable to a greater number of pairs within the A(H3N2)-predominated seasons (2018/2019 and 2021/2022) as compared to the A(H1N1)pdm09-predominated seasons (2017/2018 and 2019/2020). Our dataset contained more pairs with concordant vaccination status as compared to discordant vaccination status. We also found that female participants were represented more often in the recipient position as compared to the donor position. However, this was not relevant to transmission bottleneck estimation using the clonal mutation model as directionality does not contribute. Lastly, we found that transmission where at least one member of the pair is a child (under 18) was common in our population, reflective of our sample population containing more children than adults.

We next examined genetic features shared between members of a transmission pair. We found that the majority of iSNVs were not shared between members of a pair (Fig 2A). The number of clonal differences was also low for most transmission pairs (Fig 2B). These data are suggestive of a tight bottleneck.

Fig 2. Shared iSNV and clonal distribution.

(A) Shared genetic diversity between transmission pairs. Each point is an iSNV within a transmission pair. iSNVs are based on within host consensus sequence (≥ 50% frequency) and numerical corrections account for differences in between-pair consensus base at a particular locus. Details of numerical corrections applied can be found in Methods. (B) Distribution of number of clonal differences between transmission pair members for the overall study population.

https://doi.org/10.1371/journal.ppat.1014079.g002

Transmission bottleneck size

To determine transmission bottleneck size, we employed two distinct approaches to utilize the two types of data available. The first method [11], as opposed to a simple average, accounts for the differing number of iSNVs used in pair-scale transmission bottleneck estimates, with pairs that have more iSNVs contributing more heavily to the cohort or subgroup transmission bottleneck estimate. The second method [6] uses clonal mutation distributions across groups of pairs to determine maximum-likelihood bottleneck sizes. Using the iSNV method (beta-binomial model), pair-scale transmission bottleneck estimates were heterogeneous in size (S4 Fig). Generally, transmission bottleneck estimates for A(H1N1)pdm09 pairs were larger than for A(H3N2) pairs. When combined using a weighted average method (see [11] for details), we found an overall transmission bottleneck size of 1. This is consistent with previous estimates using similarly structured datasets [27,9]. The clonal mutation method does not permit pair-level estimates of transmission bottleneck size; nonetheless, it also estimated an overall transmission bottleneck size of 1.

We next sought to determine if viral or host co-factors of the donor and/or recipient alter transmission bottleneck size. For viral co-factors, we evaluated season and viral subtype. For host co-factors, we evaluated age (child <18, adult ≥ 18), sex, and vaccination status. We found that stratification by host and viral factors did not strongly impact the estimated transmission bottleneck size regardless of the model used (Fig 3). In all cases when the beta-binomial model was utilized, we found that the transmission bottleneck of maximum likelihood was 1 using a weighted average method (see [11] for details). Direct, unweighted comparisons between pair-level estimates across co-factor groups were not performed because variation in the number of iSNVs per pair implies different uncertainty levels for each pair-level estimate. Similarly, in all cases evaluated with the clonal mutation model, we found that the transmission bottleneck of maximum likelihood was 1. The 2017/2018 season was not analyzed using the clonal mutation model because it included only a single pair. While our sample sizes for some groups are small, a power analysis with simulated bottleneck sizes demonstrated that we would be able to detect consequential differences (S5 Fig). Together, these data indicate that transmission bottleneck size is not strongly impacted by host or viral co-factors.

Fig 3. Bottleneck size with sample size overall and by metadata factors.

Bottleneck estimates for overall population and by host/viral factors (year, subtype, age, sex, and vaccination status). Overall population and subgroup analysis bottleneck estimates were calculated with independent distributions (clonal method) or a weighted averaging method (iSNV method). Additional details on bottleneck estimation method can be found in Methods. Purple bars represent estimates found using clonal model and blue bars represent estimates found using beta-binomial model (in all cases, 1). Host factors are listed in format of X/Y with X representing donor factor and Y representing recipient factor. “A” stands for adult. “C” stands for child. “M” stands for male. “F” stands for female. “V” stands for vaccinated. “U” stands for unvaccinated or unknown vaccination status. 95% confidence intervals are shown by black error bars.

https://doi.org/10.1371/journal.ppat.1014079.g003

Discussion

In this study, we utilized two case-ascertained household transmission studies spanning 4 influenza seasons to estimate the transmission bottleneck across more than 50 possible household transmission pairs. We used two different estimation methods, each leveraging distinct features of sequencing data. We found that, regardless of method used, transmission bottleneck size was approximately 1 virus particle, indicating a clonal viral population establishes infection. While factors such as influenza season, subtype, and influenza vaccination status, sex, and age (child vs. adult) are known to influence influenza viral shedding and transmission, they did not alter the estimated transmission bottleneck size in our population.

Influenza virus evolution has been studied extensively at both the host and population scales. Transmission is critical because it links these two scales – a nucleotide variant must be transmitted from one host to another to spread across a population. Bottleneck size is important as it determines how much diversity is lost during a transmission event. This study and others have found that the transmission bottleneck for influenza virus is very tight. Further, while some data are consistent with a contribution of positive selection to transmission [28], a majority of evidence points to stochastic effects determining the virions that survive the bottleneck [29]; the frequency of a given nucleotide variant in the population of incoming infectious particles at the moment of exposure determines its likelihood of being successfully transmitted and, when the bottleneck is tight, typically the majority variant survives. Given that influenza viral shedding typically starts 2 days post infection [30], nucleotide variant frequency is largely set by the frequencies in the founding viral population. As newly arising nucleotide variants rarely accumulate in that short a time to a frequency where they can plausibly be transmitted, within-host processes are likely not a significant contributor to population level processes for IAV. In fact, for SARS-CoV-2 infection, studies show that evolution within a single host can outpace evolution along a transmission chain [8,31].
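The dependence of transmission probability on donor variant frequency and bottleneck size can be made concrete with a simple calculation. The sketch below assumes the founding virions are drawn independently in proportion to donor frequencies (plain binomial sampling); this is a deliberate simplification for intuition, not the beta-binomial or clonal mutation model used in the study, and the function names are hypothetical.

```python
def p_variant_transmitted(freq: float, nb: int) -> float:
    """Probability that at least one of nb founders carries a variant
    present at frequency freq in the donor."""
    return 1.0 - (1.0 - freq) ** nb


def p_site_stays_polymorphic(freq: float, nb: int) -> float:
    """Probability that both alleles at a biallelic site are represented
    among the nb founders."""
    return 1.0 - freq ** nb - (1.0 - freq) ** nb
```

With a bottleneck of one founder, a variant is transmitted with probability equal to its donor frequency and no site can remain polymorphic in the founding population, which is why low-frequency iSNVs are rarely shared across a pair and the majority variant usually survives.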

We also find that factors known to impact viral shedding and transmission rates do not impact transmission bottleneck size. For instance, Rolfes et al. performed an analysis on a subset of the same case-ascertained household cohorts we utilized and found a significantly higher secondary attack rate in the 2021/2022 IAV season compared to prior seasons [19]. When comparing the same seasons, we did not find a corresponding difference in bottleneck size (Fig 3). This is not limited to influenza virus. Bendall et al. similarly found that the SARS-CoV-2 transmission bottleneck is not related to infection severity or transmissibility [32]. Together, these findings suggest that a tight transmission bottleneck size may be intrinsic to the respiratory virus transmission route. Further, while barcoded systems have been used in animal models to show that transmission routes can alter the bottleneck size, tight bottlenecks have been consistently found in both respiratory and non-respiratory viruses [9,33–35]. This suggests a tight bottleneck may be an essential feature of virus transmission. One hypothesis as to why this is the case is that there is simply a very small probability of successful infection establishment by multiple nucleotide variants. Respiratory virus transmission is a multistep process: a virion must evade the intrinsic and innate immune factors of the respiratory tract within its initial host, transit through the host respiratory tract, survive outside the body (as a droplet in the air or on a living (e.g., fingers) or inanimate (i.e., fomite) surface), enter a recipient's respiratory tract, find recipient cells supportive of replication, and, finally, evade within-cell defenses of the recipient to successfully replicate. The chance that all these events will occur successfully for a single virion is very small. Further, principles of droplet generation and survival suggest that infection primarily occurs with very few virions seeding a recipient host, driving a small bottleneck [33]. Precisely where virions get lost along this pathway can be deciphered using barcoded systems, an approach that has been successfully used within animal models [28,36].

Our study has several strengths. To our knowledge, this is the largest number of high-quality transmission pairs used to estimate transmission bottleneck size from a single dataset. Further, primary sequence data underwent rigorous screening procedures. Our study used first positive samples obtained over the course of daily monitoring. This means that there is a high likelihood that we accurately capture viral heterogeneity very close to time of transmission. Given that transmission bottleneck estimation involves using inference models to reconstruct the genetic heterogeneity at the precise moment of transmission, having samples with granular information on timing of infection pairs is ideal. We also employed two methods that have both been previously validated on similarly structured datasets and, as described in [11] and [6], are orthogonal approaches that utilize distinct aspects of sequence data to estimate transmission bottleneck size.

Our study also has some limitations. As is the case with all human datasets, our study may not be completely generalizable. Our study population represents a relatively healthy population and does not cover settings where persistent viremia and transmission potential are more prevalent (e.g., those in residential/long-term care). Further, factors not controlled for such as socioeconomic status, race, air quality, or climate may mean that our findings are not valid outside of the study population.

Due to our approach to identifying transmission pairs, we include some pairs twice to account for uncertain transmission directionality. While this slightly dilutes our sample set due to the inclusion of mutually exclusive transmission situations, we believe that it allows for increased certainty, as we capture the true directionality of transmission. We could statistically account for this using unequal weighting; however, we do not believe it would substantially impact our results, and we acknowledge the omission of this feature as a limitation of our analysis. We have also potentially included transmission pairs that are not valid. Given typical social patterns of households, there is potential for two household members to have joint exposure to an external source as opposed to our assumption that one household member always transmits to another household member. We mitigated this by requiring minimal genetic distance between pairs prior to analysis. We dichotomize age into two categories, adult and child, as opposed to further dividing age along known differences in immune function (e.g., delineating older vs. younger adults) [37]. While this limits the scope of questions addressed, further divisions of age categories would decrease our sample size for each subgroup. Lastly, some of our population subsets and co-factor analysis groups are small, limiting the statistical power of our study. However, we show through a simulated data analysis that we could detect consequential differences in transmission bottleneck size if they were to exist (S5 Fig).

We find that the transmission bottleneck size of influenza A virus is small regardless of viral and host cofactors known to modulate intra- and inter-host infection processes. Given a strong role for antigenic drift in IAV evolution, our finding that the transmission bottleneck is tight raises interesting questions regarding how within-host and population-scale IAV evolution are linked mechanistically, culminating in well-established paradigms of influenza ecology and epidemiology. Ultimately, our findings suggest that there is a deeper genetic or mechanistic basis for transmission bottleneck size – the investigation into which will be the subject of future work.

Methods

Ethics statement

These studies were approved by the Vanderbilt University Medical Center Institutional Review Board (FluTES IRB #171420, RVTN-S IRB #211495), reviewed by the Centers for Disease Control and Prevention (CDC), and conducted consistent with applicable federal law and CDC policy (45 CFR 46.102(l)(2)). All adults provided written consent; for children, parents provided written permission and children ≥ 7 years old provided assent.

Cohort description

We used data and specimens collected in two case-ascertained household transmission studies (FluTES and RVTN-S) based in the United States (US) that together spanned four influenza seasons. FluTES enrolled individuals from the 2017/2018, 2018/2019, and 2019/2020 seasons, and RVTN-S enrolled individuals in the 2021/2022 season. Both studies followed a similar design, and complete study protocols are published elsewhere [19,38]. Briefly, enrollees were individuals who received a positive test result for influenza A virus (IAV) and were identified and recruited from ambulatory clinics, emergency departments, or other settings that performed influenza testing. Cases with acute illness of less than 5 days’ duration who lived with at least 1 other person who was not currently ill were eligible to participate. The enrollee and their household contacts were enrolled within 7 days of the enrollee’s illness onset and followed for up to 7 days (2017–2020) or 10 days (2021–2022). Nasal swabs were self-/parent-collected or staff-collected daily during follow-up and tested for influenza using RT-qPCR. Enrollment questionnaires were administered by study staff or self-administered. Questionnaires included information on participant age, self-reported sex and race, self-reported ethnicity, household characteristics, self-reported medical history, self-reported symptoms in the week prior to enrollment, and self-reported influenza vaccination for the current season. Self-reported vaccination was included only if both date and location of vaccination were provided. Participants who reported vaccination less than 14 days prior to enrollment or who reported unknown vaccination were considered unvaccinated. Self-administered daily quantitative symptom questionnaires were completed during follow-up.

Sequencing of samples

We sequenced samples from households with two or more IAV-positive individuals. We sequenced the first positive specimen with a cycle threshold (Ct) value of ≤30 from each individual to capture samples closest to the likely time of transmission [32]. IAV-positive samples with an RT-qPCR Ct ≤ 30 were sequenced in duplicate after the RNA extraction step. RNA was extracted using Invitrogen PureLink Pro 96 Viral RNA/DNA Purification Kit on an EpiMotion liquid handler or a MagMAX viral/pathogen nucleic acid purification kit (ThermoFisher) on a KingFisher instrument. SuperScript IV one-step RT-PCR kits and universal IAV primers were used for RT-PCR [39]. Sequencing libraries were prepared using Illumina DNA Prep Kits, and libraries were sequenced on a NovaSeq (2 × 150 PE reads) by the Advanced Genomics Core at the University of Michigan or on a NextSeq (2 × 150 PE reads) in the Lauring laboratory. Reads from each sample were aligned to the corresponding influenza vaccine strain for each subtype and season: A/Michigan/45/2015 A(H1N1)pdm09, A/Hong Kong/4801/2014 A(H3N2), and A/Singapore/INFIMH-16-0019/2016 A(H3N2). For 2019/2020 A(H1N1)pdm09, we used A/New Jersey/13/2018, as the A/Brisbane/02/2018 sequence was not available. For the 2021/2022 influenza season we used A/Darwin/9/2021 A(H3N2). The vaccine strain for that season was A/Cambodia/e0826360/2020, but there was a mismatch between the vaccine and circulating strains. For alignment, we used Bowtie2 [40] with the “very sensitive” setting, and duplicate reads were discarded using Picard tools (Picard Toolkit 2019). Reads from both replicates of a given specimen were combined and used to make a within-host consensus sequence using a script from [41]. The replicates were then separately aligned to this consensus, and duplicates were removed.

Nucleotide variant calling

For all genomes with an average genome-wide coverage of at least 1000× in both replicates (post de-duplication), we used iVar [42] to generate consensus sequences and perform variant calling. We used an iSNV frequency threshold of ≥0.005 (0.5%). Intrahost single nucleotide variants (iSNVs) had to be on reads with a mapQ of ≥20, on bases with a Phred score of ≥30, at sites with a per-site sequencing depth of ≥400, and have an iVar p-value of ≤1 × 10⁻⁵. iSNVs were retained only if they were called in both sequencing replicates. In the case of overlapping open reading frames (ORFs), an iSNV was classified as nonsynonymous if it was nonsynonymous in any ORF. Stop codons were classified as nonsynonymous. For all other analyses, we used the average iSNV frequency in the two replicates as the iSNV frequency. We excluded samples with greater than 50 iSNVs due to sequencing quality concerns. Further, as described in [11], iSNVs found only in recipient samples were excluded from analysis owing to the low probability of the genetic difference emerging after the transmission event.
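The call-level filters above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the `Call` record and its field names are hypothetical, and the mapQ and Phred thresholds are assumed to have been applied upstream by the variant caller. It keeps iSNVs that pass the frequency, depth, and p-value thresholds in both replicates, averages their frequencies, and drops samples with more than 50 iSNVs.

```python
from collections import namedtuple

# Thresholds quoted from the text; the record layout is a hypothetical one.
MIN_FREQ = 0.005
MIN_DEPTH = 400
MAX_PVAL = 1e-5
MAX_ISNVS_PER_SAMPLE = 50

Call = namedtuple("Call", "segment position alt freq depth pval")


def passes(call):
    """Apply the per-call frequency, depth, and p-value thresholds."""
    return (call.freq >= MIN_FREQ
            and call.depth >= MIN_DEPTH
            and call.pval <= MAX_PVAL)


def merge_replicates(rep1, rep2):
    """Keep iSNVs passing in both replicates and report their mean frequency.

    Returns a dict keyed by (segment, position, alt), or None when the
    sample exceeds the 50-iSNV exclusion limit.
    """
    def index(calls):
        return {(c.segment, c.position, c.alt): c for c in calls if passes(c)}

    a, b = index(rep1), index(rep2)
    merged = {key: (a[key].freq + b[key].freq) / 2
              for key in a.keys() & b.keys()}
    return None if len(merged) > MAX_ISNVS_PER_SAMPLE else merged
```

Keying on segment, position, and alternative base means a variant counts as shared only when both replicates report the same substitution at the same site.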

Clonal mutation determination

For the clonal mutation analysis, we counted the number of clonal mutations on a per-pair basis. Clonal mutations within a transmission pair are sites that (a) do not contain an iSNV and (b) have different nucleotides in each member of the transmission pair. To account for sequence degradation and artefacts typical to ends of IAV segments, we excluded any sites outside of the first start codon and last stop codon per segment.
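The definition above translates directly into a counting routine. The following Python sketch assumes the two consensus sequences for a segment are already aligned to the same coordinates; the function name, the 0-based half-open coding span, and the choice to skip ambiguous bases and gaps are assumptions made for illustration.

```python
def count_clonal_mutations(donor, recipient, isnv_sites, cds_start, cds_end):
    """Count consensus differences at sites without an iSNV.

    donor, recipient: aligned consensus sequences of one segment.
    isnv_sites: set of 0-based positions with an iSNV in either pair member.
    cds_start, cds_end: 0-based half-open span running from the first start
        codon to the last stop codon; sites outside it are ignored.
    """
    if len(donor) != len(recipient):
        raise ValueError("consensus sequences must have equal length")
    count = 0
    for pos in range(cds_start, cds_end):
        if pos in isnv_sites:
            continue  # condition (a): the site must not contain an iSNV
        d, r = donor[pos].upper(), recipient[pos].upper()
        # Assumption: ambiguous bases and gaps are not counted as mutations.
        if d in "ACGT" and r in "ACGT" and d != r:
            count += 1  # condition (b): different nucleotides in each member
    return count
```

Summing this count over all segments would give the per-pair number of clonal mutations.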

Delineation of transmission pairs

To generate phylogenetic trees, consensus sequences were aligned using MAFFT (v7) [43]. Maximum likelihood phylogenetic trees were generated using IQ-TREE (v2.4.0) with a GTR model [44]. Trees were visualized and annotated using ggtree [45].

Bottleneck size estimation

We utilized two methods to calculate transmission bottleneck size. The first method [11] uses iSNV data to determine pair-scale transmission bottleneck sizes. To obtain bottleneck estimates for the entire cohort or a subgroup, we calculated an average of the pair-scale estimates weighted by the number of iSNVs per pair [11]. The second method [6] uses clonal mutation distributions to determine transmission bottleneck sizes across groups of pairs. This method does not allow for pair-scale transmission bottlenecks to be calculated; instead, a distribution is fit to clonal mutation frequency data to estimate the bottleneck. For overall and subgroup analyses, separate distributions were generated and fit to obtain bottleneck estimates.
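For the first method, the group-level summary is a weighted mean of the pair-scale estimates, with each pair weighted by its number of iSNVs. A minimal Python sketch follows; the function name and input layout are hypothetical, and the pair-scale estimates themselves are assumed to come from the beta-binomial method of [11].

```python
def weighted_mean_bottleneck(pairs):
    """Average pair-scale bottleneck estimates, weighted by iSNV count.

    pairs: iterable of (bottleneck_estimate, n_isnvs) tuples, one per pair.
    """
    pairs = list(pairs)
    total_weight = sum(n for _, n in pairs)
    if total_weight == 0:
        raise ValueError("at least one pair must contribute an iSNV")
    return sum(estimate * n for estimate, n in pairs) / total_weight
```

For example, three pairs with estimates 1, 3, and 2 supported by 2, 6, and 2 iSNVs give (1·2 + 3·6 + 2·2) / 10 = 2.4, so pairs supported by more variants pull the summary toward their estimate.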

When iSNVs were utilized for bottleneck analysis, it was necessary to assess whether the consensus base at a particular locus was the same between both members of the transmission pair. In cases where both members of a pair contained iSNVs at a particular locus and, at that locus, they had the same consensus and alternative bases, iSNV frequencies were not altered. In most cases, only one member of a transmission pair contained an iSNV at a particular locus. If the consensus base was concordant (i.e., the same between both members of the transmission pair), the non-iSNV-containing member was assigned an iSNV frequency of 0. If the consensus base was discordant and the alternative base was the same as the consensus base in the opposite pair member, the non-iSNV-containing member was assigned an iSNV frequency of 1.
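These rules amount to a small decision function. The Python sketch below is one possible reading of them; the function and argument names are hypothetical, and returning `None` for combinations the text does not describe is an assumption rather than the authors' handling.

```python
def partner_frequency(isnv_cons, isnv_alt, partner_cons,
                      partner_alt=None, partner_freq=None):
    """Frequency of the alternative base to use for the other pair member.

    isnv_cons, isnv_alt: consensus and alternative base in the member that
        carries the iSNV at this locus.
    partner_cons: consensus base of the other member at the same locus.
    partner_alt, partner_freq: set only if the other member also has an iSNV.
    """
    if partner_freq is not None:
        # Both members carry an iSNV: keep the observed frequency when the
        # consensus and alternative bases agree.
        if partner_cons == isnv_cons and partner_alt == isnv_alt:
            return partner_freq
        return None  # not covered by the stated rules
    if partner_cons == isnv_cons:
        return 0.0   # concordant consensus: alternative base absent
    if partner_cons == isnv_alt:
        return 1.0   # discordant consensus equal to the alternative base
    return None      # not covered by the stated rules
```

A donor iSNV with consensus A and alternative G, paired with a recipient whose consensus is G, would therefore be recorded in the recipient at frequency 1.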

Definition of cofactors

For secondary subgroup analysis, we used participant-reported metadata to define host cofactors. We defined vaccination as receipt of a season-specific influenza vaccine at least 14 days prior to enrollment. We defined sex based on subject self-report. We treated age as a dichotomous variable: individuals < 18 years as children and individuals ≥ 18 as adults. We subset the overall transmission pair population based on these defined cofactors within each role for pair members as described in Results.
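The age and vaccination definitions can be written as two small helpers. This Python sketch uses hypothetical function names and assumes that a missing vaccination date stands for unknown or unreported vaccination, which the cohort description treats as unvaccinated.

```python
from datetime import date


def age_group(age_years):
    """Dichotomize age: under 18 is a child, 18 or older is an adult."""
    return "child" if age_years < 18 else "adult"


def is_vaccinated(vaccination_date, enrollment_date):
    """True only for vaccination at least 14 days before enrollment."""
    if vaccination_date is None:
        return False  # assumption: missing date means unknown vaccination
    return (enrollment_date - vaccination_date).days >= 14
```

Under these definitions, a participant vaccinated 13 days before enrollment is classified as unvaccinated.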

Simulated power analysis

To determine our ability to distinguish a bottleneck of 1 from other values given small sample sizes, we simulated subgroups of different empirical average bottleneck sizes with 5 pairs per group (the size of our smallest analyzed subgroup). We assumed that bottleneck size follows a zero-truncated Poisson distribution with lambda equal to the average bottleneck size. While this assumption may not best represent the empirical distribution of bottleneck sizes, current sampling limitations prevent full exploration of the distribution space. Based on this distribution, we generated 100 simulated subgroups (5 pairs per group) with empirical bottlenecks of 1, 3, 5, and 7 (S5 Fig). We then performed all pairwise comparisons of groups with an average bottleneck of 1 against those with an average bottleneck of 3, 5, and 7. We assessed statistical significance using a Mann-Whitney U test (significance threshold p = 0.05).
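The simulation can be sketched as follows. This is a standard-library Python illustration, not the authors' R code. It assumes 100 groups per bottleneck size, draws from the zero-truncated Poisson by rejecting zeros, and computes the two-sided Mann-Whitney p-value by exact enumeration of the rank-sum permutation distribution, which may differ from the approximation a statistics package applies when ties are present. All function names are hypothetical.

```python
import itertools
import math
import random


def zero_truncated_poisson(lam, rng):
    """Draw from Poisson(lam) conditioned on a value of at least 1."""
    threshold = math.exp(-lam)
    while True:
        k, p = 0, rng.random()
        while p > threshold:  # Knuth's multiplication method
            k += 1
            p *= rng.random()
        if k > 0:
            return k


def midranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided exact permutation p-value for the rank-sum statistic."""
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    centre = n1 * (n + 1) / 2  # expected rank sum of x under the null
    observed = abs(sum(ranks[:n1]) - centre)
    sums = [abs(sum(ranks[i] for i in combo) - centre)
            for combo in itertools.combinations(range(n), n1)]
    return sum(s >= observed - 1e-9 for s in sums) / len(sums)


def power(lam_a, lam_b, n_groups=100, n_pairs=5, alpha=0.05, seed=1):
    """Fraction of all group-vs-group comparisons with p < alpha."""
    rng = random.Random(seed)

    def draw(lam):
        return [[zero_truncated_poisson(lam, rng) for _ in range(n_pairs)]
                for _ in range(n_groups)]

    groups_a, groups_b = draw(lam_a), draw(lam_b)
    hits = sum(mann_whitney_p(a, b) < alpha
               for a in groups_a for b in groups_b)
    return hits / (n_groups * n_groups)
```

Calling `power(1, 3)`, `power(1, 5)`, and `power(1, 7)` gives the fraction of significant comparisons for each contrast. Note that the mean of a zero-truncated Poisson distribution is λ / (1 − e^−λ), so the simulated average is somewhat larger than λ, most noticeably at λ = 1.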

Data and code availability

Bottleneck estimates and all figures were generated using R (version 4.4.2). Complete code to produce all paper figures can be found at https://github.com/lauringlab/IAV_bottleneck_Flutes_RVTN. Raw sequence reads are available on NCBI SRA, Bioproject PRJNA1085292 (FluTES) and PRJNA1303715 (RVTN-S).

Supporting information

S1 Fig. Sample phylogenetic trees.

(A-D) Phylogenetic trees of all samples. Tips are colored based on household membership with colors in separate trees representing non-related households. Individual panels represent unique reference strains as follows: (A) A/Michigan/2017 A(H1N1)pdm09, (B) A/Singapore/2018 A(H3N2), (C) A/Brisbane/2019 A(H1N1)pdm09, (D) A/Darwin/2021 A(H3N2).

https://doi.org/10.1371/journal.ppat.1014079.s001

(TIF)

S2 Fig. Timing of each pair relative to the index case.

Dots represent all individuals included in our final analysis set. Lines connect members of a transmission pair. Individuals with the earliest symptom onset date are defined as the index case. In cases where two individuals in a single household have an earliest-for-household onset date, both cases are assigned as index cases.

https://doi.org/10.1371/journal.ppat.1014079.s002

(TIF)

S3 Fig. Transmission pairing schema.

Individual samples are assigned into pairs based on the date of symptom onset. Each arrow indicates a transmission pairing with the arrowhead pointing towards a recipient. Created in BioRender. Krupinsky, K. (2026) https://BioRender.com/q34qakq.

https://doi.org/10.1371/journal.ppat.1014079.s003

(TIF)

S4 Fig. Individual pair bottlenecks.

Bottleneck estimates for individual transmission pairs using the beta-binomial model (top) and the number of iSNVs used in each estimate (bottom). Filled (teal) bars represent pairs with A(H1N1)pdm09; open (white) bars represent pairs with A(H3N2). iSNVs were used if they were found in the donor pair-member or in both pair-members (i.e., iSNVs were not used if found exclusively in the recipient pair-member). Stars on bars represent maximum likelihood estimates greater than 15.

https://doi.org/10.1371/journal.ppat.1014079.s004

(TIF)

S5 Fig. Simulated subgroups to detect bottleneck size.

(A, C, E) Distributions of bottleneck size for virtual subgroups of 5 simulated transmission pairs with empirical bottleneck size of 1 (blue), 3 (red in A), 5 (pink in C), and 7 (orange in E). (B, D, F) Percentage of all-pairwise comparisons significantly different by the Mann-Whitney U test (p = 0.05) for each row.

https://doi.org/10.1371/journal.ppat.1014079.s005

(TIF)

Acknowledgments

We thank all individuals who participated in this study. KK was supported by the Molecular Mechanisms in Microbial Pathogenesis Training Program, NIH T32 AI007528. The FluTES study was supported by CDC U01IP001083 (to HKT, CGG) and the RVTN-S study was supported by CDC 75D30121C11656 (to HKT, CGG). Sequencing and analysis in the Lauring lab was supported by NIH R01 AI148371 (to ASL, ETM) and Penn-CEIRR, NIH 75N93021C00015 (to ASL, ETM).

References

  1. Kim H, Webster RG, Webby RJ. Influenza Virus: Dealing with a Drifting and Shifting Pathogen. Viral Immunol. 2018;31(2):174–83. pmid:29373086
  2. Harrington WN, Kackos CM, Webby RJ. The evolution and future of influenza pandemic preparedness. Exp Mol Med. 2021;53(5):737–49. pmid:33953324
  3. Lauring AS, Tenforde MW, Chappell JD, Gaglani M, Ginde AA, McNeal T, et al. Clinical severity of, and effectiveness of mRNA vaccines against, covid-19 from omicron, delta, and alpha SARS-CoV-2 variants in the United States: prospective observational study. BMJ. 2022;e069761. pmid:35264324
  4. Markov PV, Ghafari M, Beer M, Lythgoe K, Simmonds P, Stilianakis NI, et al. The evolution of SARS-CoV-2. Nat Rev Microbiol. 2023;21(6):361–79. pmid:37020110
  5. Petrova VN, Russell CA. The evolution of seasonal influenza viruses. Nat Rev Microbiol. 2018;16(1):47–60. pmid:29081496
  6. Shi YT, Harris JD, Martin MA, Koelle K. Transmission bottleneck size estimation from de novo viral genetic variation. Mol Biol Evol. 2023;41(1):msad286. pmid:38158742
  7. Lauring AS. Within-host viral diversity: a window into viral evolution. Annu Rev Virol. 2020;7(1):63–81. pmid:32511081
  8. Sigal A, Neher RA, Lessells RJ. The consequences of SARS-CoV-2 within-host persistence. Nat Rev Microbiol. 2025;23(5):288–302. pmid:39587352
  9. Varble A, Albrecht RA, Backes S, Crumiller M, Bouvier NM, Sachs D, et al. Influenza A virus transmission bottlenecks are defined by infection route and recipient host. Cell Host Microbe. 2014;16(5):691–700. pmid:25456074
  10. Emmett KJ, Lee A, Khiabanian H, Rabadan R. High-resolution Genomic Surveillance of 2014 Ebolavirus Using Shared Subclonal Variants. PLoS Curr. 2015;7. pmid:25737802
  11. Sobel LA, Weissman DB, Greenbaum B, Ghedin E, Koelle K. Transmission bottleneck size estimation from pathogen deep-sequencing data, with an application to human influenza A virus. J Virol. 2017;91(14):e00171–17. pmid:28468874
  12. Ghafari M, Lumby CK, Weissman DB, Illingworth CJR. Inferring transmission bottleneck size from viral sequence data using a novel haplotype reconstruction method. J Virol. 2020;94(13).
  13. McCrone JT, Woods RJ, Martin ET, Malosh RE, Monto AS, Lauring AS. eLife. 2018;7:e35962.
  14. McCrone JT, Woods RJ, Martin ET, Malosh RE, Monto AS, Lauring AS. Stochastic processes constrain the within and between host evolution of influenza virus. eLife. 2018.
  15. Perofsky AC, Huddleston J, Hansen C, Barnes JR, Rowe T, Xu X. Antigenic drift and subtype interference shape A(H3N2) epidemic dynamics in the United States. eLife. 2024.
  16. Vom Steeg LG, Klein SL. Sex and sex steroids impact influenza pathogenesis across the life course. Semin Immunopathol. 2019;41(2):189–94. pmid:30298431
  17. Gounder AP, Boon ACM. Influenza pathogenesis: The effect of host factors on severity of disease. J Immunol. 2019;202(2):341–50.
  18. Ferdinands JM, Thompson MG, Blanton L, Spencer S, Grant L, Fry AM. Does influenza vaccination attenuate the severity of breakthrough infections? A narrative review and recommendations for further research. Vaccine. 2021;39(28):3678–95. pmid:34090700
  19. Rolfes MA, Talbot HK, McLean HQ, Stockwell MS, Ellingson KD, Lutrick K, et al. Household transmission of influenza A viruses in 2021-2022. JAMA. 2023;329(6):482–9.
  20. Carrat F, Vergu E, Ferguson NM, Lemaitre M, Cauchemez S, Leach S, et al. Time lines of infection and disease in human influenza: a review of volunteer challenge studies. Am J Epidemiol. 2008;167(7):775–85. pmid:18230677
  21. Chung JR. Interim estimates of 2021–22 seasonal influenza vaccine effectiveness — United States, February 2022. MMWR Morb Mortal Wkly Rep. 2022;71.
  22. Dawood FS. Interim estimates of 2019–20 seasonal influenza vaccine effectiveness — United States, February 2020. MMWR Morb Mortal Wkly Rep. 2020;69.
  23. Xu X, Blanton L, Elal AIA, Alabi N, Barnes J, Biggerstaff M, et al. Update: Influenza Activity in the United States During the 2018-19 Season and Composition of the 2019-20 Influenza Vaccine. MMWR Morb Mortal Wkly Rep. 2019;68(24):544–51. pmid:31220057
  24. Flannery B. Interim estimates of 2017–18 seasonal influenza vaccine effectiveness — United States, February 2018. MMWR Morb Mortal Wkly Rep. 2018;67.
  25. Bendall EE, Zhu Y, Fitzsimmons WJ, Rolfes M, Mellis A, Halasa N, et al. Influenza A virus within-host evolution and positive selection in a densely sampled household cohort over three seasons. Virus Evol. 2024;10(1):veae084. pmid:39444487
  26. VanInsberghe D, McBride DS, DaSilva J, Stark TJ, Lau MSY, Shepard SS, et al. Genetic drift and purifying selection shape within-host influenza A virus populations during natural swine infections. PLoS Pathog. 2024;20(4):e1012131. pmid:38626244
  27. Zwart MP, Elena SF. Matters of Size: Genetic Bottlenecks in Virus Infection and Their Potential Impact on Evolution. Annu Rev Virol. 2015;2(1):161–79. pmid:26958911
  28. Holmes KE, VanInsberghe D, Ferreri LM, Elie B, Ganti K, Lee CY, et al. Viral expansion after transfer is a primary driver of influenza A virus transmission bottlenecks. bioRxiv. https://www.biorxiv.org/content/10.1101/2023.11.19.567585v3. 2025. Accessed 2025 July 21.
  29. Morris DH, Petrova VN, Rossine FW, Parker E, Grenfell BT, Neher RA, et al. Asynchrony between virus diversity and antibody selection limits influenza virus evolution. eLife. 2020;9:e62105. pmid:33174838
  30. Cori A, Valleron AJ, Carrat F, Scalia Tomba G, Thomas G, Boëlle PY. Estimating influenza latency and infectious period durations using viral excretion data. Epidemics. 2012;4(3):132–8. pmid:22939310
  31. Choudhary MC, Crain CR, Qiu X, Hanage W, Li JZ. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequence characteristics of coronavirus disease 2019 (COVID-19) persistence and reinfection. Clin Infect Dis. 2022;74(2):237–45.
  32. Bendall EE, Callear AP, Getz A, Goforth K, Edwards D, Monto AS, et al. Rapid transmission and tight bottlenecks constrain the evolution of highly transmissible SARS-CoV-2 variants. Nat Commun. 2023;14(1):272. pmid:36650162
  33. Sinclair P, Zhao L, Beggs CB, Illingworth CJR. The airborne transmission of viruses causes tight transmission bottlenecks. Nat Commun. 2024;15(1):3540. pmid:38670957
  34. Valesano AL, Taniuchi M, Fitzsimmons WJ, Islam MO, Ahmed T, Zaman K, et al. The Early Evolution of Oral Poliovirus Vaccine Is Shaped by Strong Positive Selection and Tight Transmission Bottlenecks. Cell Host Microbe. 2021;29(1):32–43.e4. pmid:33212020
  35. Gallet R, Fabre F, Thébaud G, Sofonea MT, Sicard A, Blanc S. Small bottleneck size in a highly multipartite virus during a complete infection cycle. J Virol. 2018;92(14):e00139–18. pmid:29720515
  36. Trende R, Darling TL, Gan T, Wang D, Boon ACM. Barcoded SARS-CoV-2 viruses define the impact of duration and route of exposure on the transmission bottleneck in a hamster model. Sci Adv. 2025;11(3):eads2927. pmid:39813353
  37. Sasaki S, Sullivan M, Narvaez CF, Holmes TH, Furman D, Zheng N-Y, et al. Limited efficacy of inactivated influenza vaccine in elderly individuals is associated with decreased production of vaccine-specific antibodies. J Clin Invest. 2011;121(8):3109–19. pmid:21785218
  38. Thomas CM. Early and increased influenza activity among children — Tennessee, 2022–23 influenza season. MMWR Morb Mortal Wkly Rep. 2023.
  39. Hoffmann E, Stech J, Guan Y, Webster RG, Perez DR. Universal primer set for the full-length amplification of all influenza A viruses. Arch Virol. 2001;146(12):2275–89. pmid:11811679
  40. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9.
  41. Xue KS, Bloom JD. Reconciling disparate estimates of viral genetic diversity during human influenza infections. Nat Genet. 2019;51(9):1298–301. pmid:30804564
  42. Grubaugh ND, Gangavarapu K, Quick J, Matteson NL, De Jesus JG, Main BJ, et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019;20(1):8. pmid:30621750
  43. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.
  44. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74. pmid:25371430
  45. Yu G, Smith DK, Zhu H, Guan Y, Lam TT. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2016;8(1):28–36.