Library-based analysis reveals segment and length dependent characteristics of defective influenza genomes

Found in a diverse set of viral populations, defective interfering particles are parasitic variants that are unable to replicate on their own yet rise to relatively high frequencies. Their presence is associated with a loss of population fitness, both through the depletion of key cellular resources and the stimulation of innate immunity. For influenza A virus, these particles contain large internal deletions in the genomic segments which encode components of the heterotrimeric polymerase. Using a library-based approach, we comprehensively profile the growth and replication of defective influenza species, demonstrating that they possess an advantage during genome replication, and that exclusion during population expansion reshapes population composition in a manner consistent with their final, observed, distribution in natural populations. We find that an innate immune response is not linked to the size of a deletion; however, replication of defective segments can enhance their immunostimulatory properties. Overall, our results address several key questions in defective influenza A virus biology, and the methods we have developed to answer those questions may be broadly applied to other defective viruses.


Introduction
It has long been known that the total number of particles in a viral population far exceeds the number of infectious particles. [1] Early work on this phenomenon revealed that, in addition to fully-infectious particles, there are semi-infectious, interferon-suppressive, and defective-interfering particles (DIPs). [2][3][4][5][6] The latter is marked by their capacity to inhibit, rather than enhance, infection. Interference by DIPs can arise from direct competition for resources within an infected cell, exclusion of replication-competent viruses during packaging, or increasing the production of interferons, key signaling components of cell-intrinsic innate immunity. [7][8][9][10] While the genetic basis of DIPs is idiosyncratic, differing from virus to virus, a defining characteristic is their capacity to, despite their defects, rise to high frequencies in viral populations when complemented by wild-type virus in coinfections.
Due to their association with coinfections, DIPs were once largely thought to be artifacts of laboratory-grown viruses. [11] However, clinical sequencing has revealed that they are a common component of viral infections and can be associated with disease outcome. [12][13][14] Their potential ability to modulate the course of disease has now sparked a renewed interest in their therapeutic potential. [15][16][17][18] For influenza A virus, a (-) sense, segmented, RNA virus, defective particles, interfering or otherwise, largely bear genomes with large internal deletions in the three polymerase-encoding segments. [19][20][21][22] These deletions generally reduce the size of each segment from *2.3kb to *400nt. [13,23] It is unknown what drives this size and segment preference. Understanding deletions, from formation through selection, is challenging. Studying de novo deletion formation, a stochastic event, requires single-cell resolution, as was performed by Kupke et al. (2020) [24], or a highly sensitive sequencing approach in order to disambiguate rare production of deletions from artifacts such as template-switching. [25] Even with those controls, observed deletions remain subject to selective pressures from genome replication. Rather than focus on their formation, and the challenges that presents, we instead chose to study selection on deletions after they form in order to understand if these pressures may nevetheless provide an understanding of the ecology of these particles.
Using a combination of PCR and Gibson assembly we seeded an initial population with a variety of deletions and duplications that generated a broad range of lengths, and measured changes in this distribution as a function of progression through the viral life cycle. While validated full-length, amplicon-based, approaches exist for sequencing influenza A virus, they exhibit length-dependent biases, which can artifactually generate apparent differences in deletion abundance due to differences in saturation kinetics. [26] To overcome this limitation, we instead used an approach wherein individual barcoded adapters were associated with a particular deletion, or duplication. Our libraries were similar to those generated using RanDeL-Seq, a recent effort wherein deletions were generated in HIV and Zika genomes using random transposon insertions and exonuclease digestion. [27] Using our artificial libraries, we demonstrate that deletions in influenza segments lead to significant advantages during genome replication, explaining their abundance. Certain deletions are excluded as populations expand, both in terms of size and segment identity, explaining, at least in part, the observed distribution in final viral populations. Lastly, we find that deletions can generate aberrant sequences that can drive interferon induction, but their stimulatory behavior does not appear to be correlated with their length, and is enhanced by genome replication.

PCR-based generation of length polymorphic deletion/duplication libraries
The observed frequencies of variants within an initially clonal population are not only reflective of the fitness benefits, or drawbacks, of any given variant, but also the timing with which they are formed. [28] In other words, the distribution of defective segments is highly confounded by whether a deletion forms early in population expansion or late. To better understand how defective influenza species may compete amongst one-another, we designed a library-based approach wherein we introduce, contemporaneously, a distribution of more than 100 possible lengths representing over 1,000 possible junction combinations into an influenza genomic segment. We thereafter track the distribution of variants within each library as a function of progression through the viral life cycle. Our library approach is described in Fig 1A. In brief, we computationally designed primers in both the forward and reverse orientations across target influenza segments staggered every 25nt for the first third of the segment, 50nt for the second, and 100nt for the last. The first and last 75nt of each segment were excluded as they contain the viral promoter and critical packaging sequences. [9,[29][30][31][32] Each individual primer with the same orientation was paired with a specific, shared, primer, and individual PCRs were performed to generate forward and reverse flanks of staggered sizes. An additional round of PCR was performed on reverse flanks to add a 57bp adapter containing a 12bp random barcode, flanked by cut sites for the type IIS restriction enzyme BtgZI. Gibson assembly was then used to randomly pair forward and reverse flanks, reconstructing a library consisting of both deletions and duplications. [33] Taking advantage of the capacity of type IIS restriction enzymes to cleave outside of their restriction sites, we released barcoded adapters with 14/10nt of flanking sequence with a BtgZI digest. By sequencing these adapters, we may link specific deletions, or duplications, to their corresponding barcode. Thereafter, to infer junction abundance we can sequence across our variable barcodes from constant regions in our adapter. Our barcoding schema avoids the length-bias of target amplicon approaches to sequencing influenza, which overestimate the abundance of shorter species relative to longer. [26] Influenza A virus defective particles largely consist of those bearing large deletions in the three polymerase-encoding segments, as deletions in the remaining five segments appear only rarely in viral populations. [19][20][21][22] There is currently no single explanation for this observation. Possibilities include differences between segments in the rate of deletion formation, replication, or even packaging. [10,34] To explore whether features other than biases in formation may contribute to the observed differences in abundance, we chose to generate libraries in a single polymerase segment, PB1, encoding the core viral polymerase, and a single non-polymerase segment, HA, encoding the viral receptor-binding and fusion protein hemagglutinin. In order to focus solely on length and sequence composition, and not on whether a given segment produces an essential protein, we used variants of PB1 and HA with mutated start codons. While these variants will still generate products from alternative start codons (such as PB1-F2), as well as "UFO" proteins from upstream start codons generated from cap-snatching, the absence of the essential PB1 or HA proteins means these viruses are not viable, and thus we should not see selection on viability. [35,36] Using the schema described above, we expect 1,936 possible unique junctions for PB1 and 1,225 possible junctions for HA. After library As templates for each library, we used vectors encoding each segment under control of a polymerase I-dependent system for generating authentic viral RNAs. For each library, a set of staggered primers were used in individual PCR reactions with an external primer. Further PCR was used to append a 57nt adapter with a random, 12nt, barcode. Products were then randomly combined in a pooled Gibson assembly reaction. To link barcodes to duplication or deletion junctions, barcodes and flanking regions were freed from the plasmid pool with a BtgZI digest and sequenced. Primers used to generate the library are in S1 File. (B) Theoretical and actual length distributions of libraries. Lengths do not include the 57nt adapter. Bins in this and all other panels are 100nt in length, numbers represent the right edge of each bin. Dotted lines show the length of wild-type PB1 and HA. (C) Libraries contain most of the theoretical diversity. Each point represents a single library.

Shorter segments are preferentially amplified during genome replication
The first question we wished to address with these libraries was whether length, independent of other features of the viral life cycle, influences genome replication. There are conflicting reports concerning whether shorter segments replicate faster, and thus outcompete their longer counterparts. [10,37,38] To make these measurements, we co-transfected libraries with plasmids encoding the minimal influenza genome replication machinery-PB2, PB1, PA, and NP-and measured barcode abundance in viral genomes after 24 hours. To determine enrichment, or depletion, of a given barcode, we generated an orthogonal dataset consisting of the same plasmid libraries transfected with PB1, PA, and NP, but not PB2 (S1 Fig). By generating a dataset free from PB2, we corrected for length-dependent transcription and degradation biases from the polymerase I-dependent viral RNA transcription system; without PB2 these RNA species are neither replicated nor stabilized by the remaining components of the viral replication machinery. [39] Comparing the composition of libraries between replication-incompetent, and -competent, conditions, there is a noticeable shift towards smaller variants (Fig 2A). Plotting enrichment as a function of length, these two features are negatively correlated, with Spearman correlation coefficients of -0.86 and -0.92 for PB1 and HA respectively ( Fig 2B). Smaller segments that arise due to deletions would therefore be expected to possess a considerable replication advantage over their longer counterparts. This observation is in line with prior work from Widjaja et al. (2012) [40] showing shorter, Gaussia, luciferase was preferentially replicated by viral replication machinery over longer, firefly, luciferase. To confirm this conclusion, we generated PB1 and HA segments of defined length (200, 400, 800, and 1600nt) consisting of equal numbers of nucleotides from the 5' and 3' termini of each segment. As in our libraries, we introduced an additional 57nt adapter with a barcode; however, rather than a random barcode we instead used a defined barcode to delineate each size. A control qPCR demonstrates our ability to disambiguate each of these barcodes from one-another, with greater than 1000-fold discrimination for target versus off-target amplification (S2 Fig). As expected, there remains a clear inverse relationship between enrichment in a genome replication assay and length for both segments at a Spearman correlation coefficient of -0.95 for both PB1 and HA ( Fig 2C).
However, replication using the minimal viral polymerase machinery does not truly recapitulate the full suite of feedback mechanisms and inter-segment competition that occurs during bona fide viral infection. To confirm if length plays a role in genome replication during actual viral infection we generated two viral populations containing our marked 400nt and 1600nt variants of either HA or PB1-rescuing such populations by coinfection with a wild-type virus to provide missing essential viral genes. We then measured how the ratio of these two species changed during genome replication in an infection of the human lung epithelial carcinoma cell line, A549, at an MOI of 25. This high MOI increases the probability that an infected cell possesses both marked species competing with one-another, as well as our complementing, wild-type, virus. This high MOI also excludes the effects of second rounds of infection due to superinfection exclusion (S3 Fig). [41,42] At 14 hours we measured the shift in ratio of these two species in these cells and found, as in our transfection assays, a considerable advantage to shorter, over longer, species. Curiously, this effect was slightly stronger for PB1 than for HA (Fig 2D). Therefore a length-dependent replication advantage explains how these parasitic,

PLOS PATHOGENS
Segment and length dependent characteristics of defective influenza genomes deleterious, variants rise to frequencies exceeding their longer, replication-competent, progenitors.

Length-dependent selection reshapes PB1 and HA distributions as viral populations expand
Genome replication is only one step in the viral life cycle at which defective viral genomes may be favored or disfavored. To expand our library study to aspects of the viral life cycle uncaptured in our genome replication assay, we transfected cells with our libraries and the viral replication machinery and waited 24 hours. Thereafter, we infected these transfected cells with Points were only shown if represented in all three libraries under both conditions R is the Spearman correlation coefficient. n = 3. (C) qPCR supports the conclusion that genome replication is inversely correlated with length. Vectors encoding variants of four seperate lengths (200, 400, 800, and 1600nt) were transfected into 293t cells with and without the full viral genome replication machinery and analyzed by qPCR 24 hours post transfection. The fraction of each variant was calculated relative to the longest variant (1600nt), and this value compared between replication-competent and -incompetent datasets to determine the relative enrichment attributable to genome replication. R is the Spearman correlation coefficient. n = 3. Validation of qPCR in S2 Fig. (D) Shorter defective viral genomes are enriched during genome replication in viral infection. A549 cells were infected with mixed populations consisting of 400 and 1600nt variants of either HA or PB1 and a complementing wild-type strain at a combined MOI of 25, as measured in genomic equivalents by qPCR. Cells were then washed to remove any additional virus at 2 hours post infection, and, at 14 hours post infection supernatant was removed and cellular RNA harvested. The initial ratio of these two variants was measured in the viral inoculum by qPCR, and compared with the ratio in the cellular fraction. Both PB1 and HA 400nt were enriched over 1600nt by one-tailed t-test, Benjamini-Hochberg corrected FDR <0.05. The effect was stronger for PB1 than HA, two-tailed t-test, p<0.05. Each point is a single replicate, lines are the mean. n = 3. Validation that second rounds of infection are excluded is in S3 Fig. https://doi.org/10.1371/journal.ppat.1010125.g002 wild-type virus and allowed infection to proceed for 72 hours. During this time, variants within each population could be complemented by coinfection with wild-type virus and compete with one-another. We then harvested clarified supernatant and sequenced those variants that had successfully become incorporated into the viral population.
Comparing the viral supernatant to genome replication-only controls, there are divergent patterns of enrichment between HA and PB1 libraries (Fig 3A). For HA, there is a direct, positive, relationship between length and enrichment in the viral supernatant, with considerable advantage to longer, over shorter, segments (Spearman correlation coefficient of 0.94) ( Fig  3B). For PB1, small segments, <400nt of natural sequence, are excluded from the viral population but otherwise preferences are relatively flat, indicating little change in these populations after they were shaped during genome replication. As such there is a much more modest correlation between length and enrichment at a Spearman correlation coefficient of 0.55. To confirm that PB1 and HA exhibit divergent relationships between length and competition as viral populations expand, we used defined 400nt and 1600nt variants and repeated selection as performed for our more extensive libraries. As expected, qPCR analysis finds more robust enrichment of a 1600nt length segment over a 400nt counterpart in HA than in PB1 ( Fig 3C). Therefore, features other than genome replication appear to act to largely exclude, or at least blunt the advantages of, deletions in HA relative to those in PB1.
The enrichments in Fig 3B are the results of selection pressure during several steps in the viral life cycle, including additional rounds of genome replication, a requirement for complementation during coinfection with wild-type virus, competition for packaging into viral particles, and the timing and magnitude of viral release. If genome replication alone were the only selection pressure acting on our libraries, we would anticipate that our enrichment curves would be similar to those procured in our initial experiments. To explain our observation, there must exist a length-dependent negative selection that counteracts the advantage of smaller, over larger, variants during genome replication. While initially thought to be a step at which defective viral genomes are preferred, recent data indicate that packaging may instead exclude these species from the population. [34,37] To determine whether exclusion during packaging may benefit larger, over smaller, variants, we performed experiments as in Fig 2D, but now measured how variant composition differs between the supernatant and the cell fraction at 14 hours post infection. Similar to results from Alnaji et al. (2021) [37], we conclude that this step exerts a potent negative selection on our 400nt, relative to our 1600nt, variant ( Fig 3D). This effect was slightly stronger for PB1 than for HA, and thus cannot explain the advantages to PB1 deletions over HA that we observe over the course of viral expansion. Other pressures, as yet unexplored, must also contribute to the observed shifts in population length distribution.
Assaying the entire process, from polI transcription to successful encapsidation in viral particles in the supernatant, we see a highly stereotyped enrichment curve for both PB1 and HA, with maxima close to 400nt in length ( Fig 3E). Variants with a final length centering around this size become enriched in our library, whereas those that are smaller or larger become depleted. The magnitude of this effect, as measured from the peak enrichment to 1600nt, where preferences flatten but our measurements remain relatively robust, was consistently greater for PB1 than for HA, indicating an increased competitive fitness for smaller species in the former over the latter (Fig 3F). This process was highly replicable between independentlygenerated libraries with inter-replicate Pearson correlation coefficients of *0.75 when comparing across the enrichment or depletion of individual junctions (S5 Fig). These maxima correlate well with previously-identified "sweet-spot" lengths from continuous culture propagation of influenza defective particles. [23] Taken together, our data indicate that selection on defective segments, both positive and negative, is sufficient to recapitulate their Maximally-enriched size bin for each library replicate was compared to matched values at 1600nt. n = 3. For panels (C), (D), and (F), each point represents a single measurement, lines represent means. All measurements significantly differed from no enrichment of the measured species by one-tailed t-test, with Benjamini-Hochberg corrected FDR <0.05 within each panel. Significant differences between HA and PB1 noted with asterisks, p<0.05, two-sample t-test.

Deletions in the influenza genome are associated with interferonproducing cells
The work of Tapia et al. (2013) [8], since repeated by others, demonstrated that populations replete in defective influenza particles exhibit enhanced interferon induction. [43] To explore further, we wished to probe whether our libraries capture the variation responsible for this increase in the innate immune response. Unlike our previous selections, where viral replication and packaging served to shape variant distribution, understanding how variants may influence innate immune detection is less straightfoward. Potentially solving this problem, prior work indicates that observed increases in interferon induction in response to influenza variants are frequently due to more cells producing interferon rather than an increase in the amount of interferon each infected cell produces. [44][45][46] As such, if we sort infected cells based on interferon production, we would anticipate that we may be able to co-enrich stimulatory variants in our libraries.
Before applying this selection to our libraries, we first sought to understand whether it could recapitulate the earlier findings that deletions broadly associate with interferon induction. To this end, we first generated two wild-type influenza A virus populations that contained an intermediate burden of defective particles, independently, from reverse-genetics onwards (S6 Fig). By working with a population where defective particles are neither rare nor overwhelming, we allow the potential for enrichment of deletions in interferon-producing cells; otherwise, if all cells are infected by defective viruses we may be unable to achieve any additional enrichment, and if few to none are we may lack the sensitivity to detect their presence. We next infected an A549 cell line expressing the cell-surface marker LNGFR under control of the IFNB1 promoter and sorted populations into interferon-enriched, and -depleted, subsets using magnetic-activated cell sorting. [46] Prior to sequencing viral genomes from these populations, we first validated our sorting efficiency with total mRNA sequencing of both populations ( Fig 4A). As expected, there is significant enrichment of the gene encoding the most highly-transcribed type I interferon in A549 cells, interferon beta (IFNB1). We also find coenrichment of type III interferon transcripts (IFNL1, IFNL2, and IFNL3). Additional enriched genes include a number of interferon-stimulated genes, suggesting either direct transcription of a subset of ISG loci by IRF3, or a high level of autocrine signaling experienced by interferon-producing cells.
We next used a full-length PCR-based strategy to sequence influenza genomes. [47] While length-biased, we sought to reduce this bias by initiating reverse transcription with equivalent viral RNA, and limiting cycles of PCR. We identified reads containing deletion junctions by mapping continuous reads with STAR, identifying discontinuous mapping in unmapped reads with BLAST, collapsing all identified junctions into a single annotation file for each biological replicate, and then remapping discontinuous reads using this BLAST-annotated STAR index. [48,49]. We required that a deletion had three mapped bases to either side of a junction and that read pairmates were concordant. As we used paired-end reading, deletion junction counts were collapsed to a per fragment value rather than per read value. As expected from the highly stochastic nature of deletion formation, while junction counts between infection replicates using the same biological population are highly correlated, those from divergent biological replicates exhibit no detectable correlation to one-another (S7 and S8 Figs).
Deletion-spanning fragments are predominantly derived from the polymerase segments, regardless of whether a sample was enriched or depleted for interferon (S9 Fig). When we compare the fraction of deletion-containing fragments between interferon-enriched and depleted datasets, there is statistically significant enrichment across most segments ( Fig 4B). Mapping depth across the polymerase segments is consistent with this enrichment, showing a slight, but replicable, increase in depth at the 5' and 3' ends of the three polymerase segments relative to the central portion of each gene segment (S10 Fig). Other deletions, while enriched, were too rare to produce visible changes in sequencing depth. To confirm these results were not driven by PCR bias, we developed a qPCR sensitive to increased defective content (S11 Fig). Our design leverages the short extension time common in most qPCR protocols, and so, by designing primers flanking the shared limits of deletion breakpoints between biological replicates we generate a signal that is responsive to the presence of shortened products while producing little to no signal on full-length polymerase segments. We can then correct this gestalt "DI-specific" signal to "full-length" signal generated from qPCR primers that lie nested within most observed deletions. [50] Using this qPCR, we confirm our deep-sequencing results ( Fig  4C).
As deletions preferentially associate with interferon induction, we may ask whether all deletions exhibit an identical relationship to this process. That is, is the mere presence of a deletion sufficient to induce high levels of interferon, or do different deletions exhibit differential induction of interferon pathways? Using complementing cells, we grew a previously-cloned stimulatory PB1 deletion containing 177nt from the 5' end of the vRNA, and 385nt from the 3' end of the vRNA, PB1 177:385 , and a newly-identified deletion from our dataset, PB1 455:350 . [46] Comparing the two, we find that the former retains its capacity in infections to induce high levels of type I interferon transcripts, whereas the latter is no more stimulatory than a nominally wild-type population (Fig 4D). This difference appears to be linked to the identity of the deletion itself, and not due to differences in packaging rates of these two segments. Prior work on influenza A viruses suggests that even upon substantial manipulation, these viruses maintain a canonical 1+7 configuration in viral genomic packaging, and likely package only a single copy of each genomic segment. [51][52][53] Suggesting that this is largely true for these defective viruses, and demonstrating equivalent dose not just of biologically acive particles, but even of defective segments themselves, we find that the ratio of the HA genomic segment to the PB1 genomic segment in viral preparations is unchanged between wild-type and either PB1 177:385 or PB1 455:350 ( Fig 4E).

Segment length and interferon induction do not appear to be correlated
As different deletions can cause divergent levels of interferon stimulation, and as we may sort by interferon production to classify immunostimulatory variation, we may use our sortable cell lines and our libraries to generate comprehensive measurements of how length of a defective segment might impact interferon induction. The increase in interferon observed in infections with defective influenza species has been posited to arise from a preference for shorter molecular species by RIG-I, the major sensor of this virus. [54,55] To first establish whether, as expected, RIG-I is detecting the stimulatory defective species that we are studying, we introduced either of two siRNAs targeting RIG-I to A549 cells and infected with PB1 177:385 . As expected, either RIG-I-targeting siRNA reduced not only levels of RIG-I, but also the production of interferon transcripts in response to PB1 177:385 , consistent with its general role as the predominant sensor of this virus in non-immune cells (S13 Fig). To take things a step further, we wished to generally demonstrate whether, as expected, smaller defective species tend to evoke a stronger innate immune response. Therefore, we infected a type III interferon reporter A549 cell line with our libraries and sorted on interferon production.
We changed to a cell line expressing LNGFR and eGFP under control of a type III interferon promoter over a type I reporter as the sort described in Fig 4 only produced a yield of *5000 cells per sort from a starting population of 20 million cells. Notably, length of a defective segment, if it does influence interferon induction, would only be one variable within our experiment doing so. Timing of infection, autocrine signaling, paracrine signaling, and likely other features, including those intrinsic to the cells we are infecting, will influence the probability a cell produces interferon. Therefore we must sample a sufficient number of cells to average out such behavior over our whole population in order to measure how length of a segment may, or may not, influence innate immune detection. As the theoretical yield based on the fraction of interferon-positive cells ranges from *100,000-200,000 in our previous experiment, our interferon-beta reporter sorting resulted in a greater than 95% loss of material, and would produce far too low a yield to analyze a comprehensive set of variants.

PLOS PATHOGENS
Segment and length dependent characteristics of defective influenza genomes Based on our mRNAseq data, and the prior observation that a type III interferon reporter line produces a stronger signal that still correlates with type I interferon production, we opted to change strategies. [46] Consistent with an improved resolution, we were able to recover *100,000 cells in each of our interferon-enriched sorts, and, by qPCR, our sorting efficiency as measured by type I interferon expression was comparable to that achieved with a type I interferon reporter (Fig 5A and S14 Fig).
Comparing length distributions of our library in interferon-enriched versus interferondepleted populations, we notice no obvious shifts as we did in our other selections (Fig 5B). To explore in greater depth, we looked for evidence for enrichment with interferon as a function of length to look for more subtle shifts that may drive interferon expression (Fig 5C). We see no clear relationship between size and interferon state, with no strong enrichment at any size, at Spearman correlation coefficients of -0.23 and -0.04, for PB1 and HA, respectively. Overall, we fail to find significant evidence that length is a determinant of interferon production.
To probe whether we captured any variation in interferon induction within our libraries, we explored how individual junctions rather than binned sizes behaved in our interferon enrichment sequencing (Fig 5D). Using DESeq2 to analyze our data, there was little detectable variation between interferon-enriched and -depleted datasets in our PB1 library. [56] However, we find slightly more variation within our HA library, and identify a single variant with statistically significant enrichment in our interferon-producing cells. This variant consists of a junction lying 150nt from the 5' end of the HA vRNA, then our 57nt adapter, and lastly a downstream junction lying 475nt from the 3' end of the HA vRNA, for a total length of 682nt (HA 150:bc:475 ). Curiously, the mirror-image deletion, HA 475:bc:150 exhibits mild, not statistically significant, enrichment.
This finding provided us with an opportunity to demonstrate that length is not the sole contributor to whether any given deletion is stimulatory or not. We cloned both HA 150:bc:475 and HA 475:bc:150 , and grew each independently using complementing cells. Thereafter, we tested the capacity of viral populations bearing these mutations to stimulate interferon ( Fig 5E). As predicted by our more comprehensive analysis, HA 150:bc:475 induces approximately 2-fold higher transcription of IFNB1 than HA 475:bc:150 . As our methodology had the capacity to detect this two-fold difference, it suggests that length plays no more than a modest role, at most, in contributing to defective variation in interferon induction. While a surprise, this conclusion is highly consistent with recent experiments in 293t cells expressing viral replication machinery, which found that while length of an influenza segment could influence innate immune detection, the size at which this occurs, <125nt, is far smaller than defective influenza genomes, indicating that observed stimulation by defective viral particles likely arises from some other feature of deletions such as perturbations to RNA secondary structure. [57] Interferon induction by defective segments is sensitive to genome replication If our differentially-stimulatory variants exhibit different affinities for RIG-I or engage in aberrant behavior that indirectly impacts RIG-I signaling, then we might expect that these behaviors would be exacerbated by genome replication. While deletions in the polymerase segments produce defective species that, on their own, cannot engage in genome replication due to a requirement for de novo polymerase synthesis, deletions in non-polymerase segments would not be subject to such restrictions. [58] Therefore, we must ask, is the immune stimulation in our HA deletions contingent upon their ability to engage in genome replication? Interestingly, both HA 150:bc:475 and HA 475:bc:150 exhibit induction of IFNB1 above that of our less stimulatory PB1 variant, PB1 445:bc:350 , suggesting that both act as efficient triggers of innate immunity, even if their potency differs.
To determine whether genome replication may play a role in this process, we used a suppressor of this process, the nucleoside analog, ribavirin. [59,60] While prior studies have used the translation inhibitor cycloheximide, cycloheximide enhances interferon signaling to even pure innate immune agonists, thus complicating conclusions using this inhibitor. [61][62][63] To confirm that we are investigating the effects of genome replication alone, and not pleiotropic impacts on cellular pathways, we compare the impacts of ribavirin treatment concurrently on infections with our less stimulatory PB1 deletion, PB1 445:350 , which is unable to engage in genome replication, and infections with HA 475:bc:150 . Upon ribavirin treatment, HA 475:bc:150 exhibits a massive drop in viral transcription of the PA gene, consistent with reduced template

PLOS PATHOGENS
Segment and length dependent characteristics of defective influenza genomes vRNA molecules for mRNA synthesis (Fig 6A). PB1 445:350 exhibits no drop as any mRNA transcribed in this virus must be from the initiating viral genome. With matched transcription levels between our deletions in HA and PB1, we now see a decrease in interferon induction for HA 475:bc:150 but none for PB1 445:350 , demonstrating that genome replication is a key determinant of interferon induction even for defective influenza species.
Wishing to demonstrate that replication and identity are features more critical than size in interferon induction, and that this effect is likely not restricted to HA alone, we cloned an incredibly small defective species in the neuraminidase segment (NA), NA 189:76 , only 265nt in size. As complementation of neuraminidase is difficult to do in trans, we replaced the coding sequence for hemagglutinin with neuraminidase, and grew this variant on cells which complement for hemagglutinin, similar to a previous construction of an eGFP pseudovirus. [64] We then compared the capacity of this variant to stimulate interferon with, and without, ribavirin to our stimulatory PB1 variant, PB1 177:385 (S15 Fig). In doing so we find that NA 189:76 matches, but does not exceed, stimulation by PB1 177:385 in the absence of ribavirin. Moreover, when we add ribavirin, as with our HA variants, we see a suppression of interferon induction by This leads to an interesting question; would PB1 445:350 , noted to have approximately the same capacity to stimulate interferon as wild-type influenza, actually be potently stimulatory if it was capable of replicating its genome? To answer this question, we integrated a PB1 expression cassette into our type III interferon reporter line. This allowed us to measure the effects of complementation on interferon induction by flow cytometry, permitting a more refined measurement delineating whether complementation not only drives increased interferon induction, but also whether it fundamentally does so by increasing the number of cells that successfully detect influenza.
As expected from our model derived from ribavirin treatment, complementation of either PB1 deletion, PB1 445:350 or PB1 177:385 , with PB1 expression in trans greatly increases the fraction of cells participating in the production of interferon, although each variant retains its relative level of stimulation to one-another (Fig 6B). This supports our conclusions that relative interferon induction is not only contingent on the amount, but also the sequence, of the aberrant segment. Combined with our prior observations, we can see that even those deletions that drive modest interferon induction, such as PB1 445:350 , could be quite dangerous to the virus were they to occur in non-polymerase segments.
Following from these data, we wanted to conclusively demonstrate that enhanced replication drives a higher probability an infected cell detects viral infection, and that our complementation experiments weren't simply leading to additional rounds of infection. Using the highly stimulatory PB1 177:385 , we repeated the infection of complemented interferon reporter cells and concurrently stained for the viral protein HA. As transcription and replication are fundamentally linked in influenza viruses, increased protein production is highly correlated with genome replication. [65,66] Therefore, we may use HA staining as a rough proxy for the level of replication within an infected cell. In doing so, we see a positive relationship between the level of HA staining of infected cells and the probability that those cells produce interferon, with a Spearman coefficient of 0.89. This demonstrates that our results were not merely driven by an increased fraction of infected cells, but rather by an increased probability of response in cells with active viral replication (Fig 6C). Indeed, at higher levels of HA staining we find that PB1 177:385 , can induce interferon in as many as 35% of infected cells, rivaling prior measurements of interferon expression in viral infections lacking the critical interferon antagonist NS1. [45]

Coinfection increases interferon induction by defective particles
Seeing that complementation appears to impact innate immune detection of defective polymerase species, we wondered whether this observation could be extended to coinfections, wherein intact polymerase genes are delivered by other viral particles, rather than an integrated expression cassette. In a coinfection we would expect some level of competition between our defective segment and its full-length counterpart, potentially reducing levels of RIG-I ligand. To explore whether coinfection reproduces data consistent with our complementation experiments, we turned to a pre-existing pseudovirus system in which the coding sequence of influenza HA has been replaced with mCherry. By infecting reporter cells with this pseudovirus alongside either wild-type influenza, or PB1 177:385 , we can identify coinfections as those that stain positive for both HA and mCherry. [64,67] Curiously, our pseudovirus alone appears to induce much higher levels of interferon than wild-type influenza, potentially due to the presence of non-native sequence. Regardless, when present in a coinfection with PB1 177:385 a greater fraction of cells express interferon than when infected with either

PLOS PATHOGENS
Segment and length dependent characteristics of defective influenza genomes virus alone (Fig 7A). A similar effect is not seen for wild-type influenza, demonstrating the specificity of this phenomenon to defective influenza species. Thus complementation during coinfection can drive higher levels of interferon induction by deletions in polymerase segments.
Following the logic that complementation from co-infecting viruses can lead to increased detection of defective influenza particles, we might expect that deletions in the polymerase segments drive a proportionately greater innate immune response at higher infectious doses where coinfection is more common. To see if this assumption is true we infected our interferon reporter line at an MOI of either 0.1 or 2 of a low-defective population, and equivalent genome copies of a high-defective population, and measured the fraction of HA-positive cells that were expressing interferon lambda (Fig 7B). With a low-defective particle population, the average rate at which infected cells produce interferon remains largely unchanged between these two MOIs. However, with a high defective population, there is a massive increase in the fraction of interferon-producing cells at a higher MOI. This was not due to a difference in infection rate, as the fraction of HA-expressing cells in each condition between the low and the high defective content viral populations was similar.
This suggests that interferon induction by the vast majority of defective species is a nonlinear process. Their relative importance is dependent upon not only their frequency, but also the rate of complementation as driven by coinfection by viruses bearing full-length segments. Critically, these processes modulate interferon induction, but genome replication is not absolutely required for detection of defective species as demonstrated by stimulation from PB1 177:385 alone. This nicely explains why some groups have concluded that genome replication is wholly dispensible for interferon induction, others have concluded it is modulatory, and yet still more have concluded it is absoloutely essential. [43,54,[68][69][70] Each group was undoubtedly correct for their experimental setup, but the contribution of genome replication to the observed induction of interferons is highly sensitive to the rate of coinfection and the fraction of the population which is defective, and thus different results may be derived even from the same underlying process.

Discussion
We have used a library-based approach to comprehensively explore the relationship between length, replication, competition for participitation in the viral population, and interferon induction in influenza A virus for the polymerase segment, PB1, and the non-polymerase segment, HA. Additionally, we describe how genome replication can contribute to interferon induction by defective species, and demonstrate that both the population composition and the rate of coinfection influence the probability with which these viruses are detected by cell intrinsic innate immunity (Fig 8). Interestingly, during replication our artificial libraries converge on a near-identical length as natural defective populations. [23]. This is the result of positive selection for small segments, particularly in PB1, during genome replication, negative selection during packaging, and, likely, other features that remain to be explored. This suggests that the contrasting pressures during viral growth are sufficient to shape defective influenza populations to their final, observed, lengths, from potentially disparate initial distributions. After deletions form, they are subject to selection during genome replication, which favors smaller products. As viral populations expand, different selection pressures reshape viral populations. In part, selection against small variants during packaging, observed for both PB1 and HA, contributes to this reshaping. During viral growth and expansion, very small (<400nt) PB1 variants are excluded, whereas the HA segment exhibits length dependence throughout, favoring larger products. Combining pressures produces a curve with a deletion fitness maximum of *400 nt, but with a lower overall preference for HA deletions than PB1 deletions. Interferon induction by deletions appears to be largely lengthindependent. Genome replication modulates inteferon induction by deletions, either occuring in single infections for non-polymerase deletions, or during coinfection for polymerase deletions. https://doi.org/10.1371/journal.ppat.1010125.g008

Segment and length dependent characteristics of defective influenza genomes
The advantage to small segments during genome replication could be due to several mechanisms. One possibility is that during genome expansion, after carefully regulated steps of conversion of negative-sense vRNA to positive-sense cRNA and subsequent massive expansion of the viral vRNA pool, that re-initiation of replication upon a template must wait for completion of current rounds of replication. Therefore, smaller segments may simply be able to initiate subsequent rounds of replication more rapidly. Another possibility may be found in studies of nucleoprotein. Nucleoprotein (NP) is required both as a processivity factor in polymerase extension and to stabilize nascent cRNA antigenomes. Smaller RNA species do not require NP, and in its absence mvRNA <125nt may still be readily produced [57] While not quite as small, defective genome segments appear to require less NP, potentially allowing for more rapid progression through the viral life cycle. [71] The disadvantage to small segments during packaging is wholly consistent with recent RNA-RNA interaction experiments performed on intact influenza particles. [72,73] It has now been demonstrated that while the minimal packaging requirements are restricted to the 5' and 3' termini of influenza genomic segments, intersegmental contacts occur throughout and can dramatically impact packaging efficiency. [29,30]. Therefore the loss of many low-affinity contacts that contribute to packaging may serve to exclude defective species from the population.
The combination of genome replication, packaging, and other features we have yet to explore produces a highly replicable maximum fitness, even between PB1 and HA, at 400nt, not smaller, nor larger, with a preference for PB1 deletions over HA. Given recent interest in highly-fit defective particles as potential therapeutics, our results demonstrate that the consideration of multiple, contrasting, selection pressures is required to rationally design defective genomes that may compete and propagate throughout an otherwise replication-competent population.
In contrast to the other features we explored, we were surprised to find that interferon induction by influenza A virus defective particles is, at least largely, length-independent. This finding is consistent with similar work on Paramyxoviridae, which generate short, copy-back defective genomes. Work on these viruses found that it was not the length of these genomes, nor their double-strandedness, that leads to their detection, but rather changes in their RNA secondary structure. [74] It seems likely that deletions in influenza likewise perturb otherwise non-stimulatory sequence composition, and, by chance, these perturbations can generate potent RIG-I ligands. How can we explain our results in light of the finding that small, likely defective, genomes associate with RIG-I? [55] With the following logic we may show that this apparent contradiction may be explained: 1) perturbation in natural sequence may generate potent ligands for RIG-I, 2) most deviations from wild-type sequence, particularly those that would drive interferon induction, would be purged from the population, and 3) largely due to advantages during genome replication, defective particles, despite their defects, can nevertheless rise to high levels in the viral population. Therefore, sequence differences that generate potent RIG-I ligands would generally be expected to be found in defective genomes, simply as they represent a rare instance where a largely deleterious mutation (at least in terms of the population) still rises to high frequency. This does not preclude the role of mvRNAs <125nt in interferon induction, and these incredibly small RNA species likely contribute alongside defective genomes to the observed interferon response. [57,71] Lastly, we find that replication contributes to, but is not necessary for, the innate immune response to defective influenza particles. This is consistent with the idea that deletions, by chance, can generate more potent RIG-I ligands. RIG-I is known to exhibit a dose-responsive activation, and therefore for any given ligand, an increased concentration of that ligand would be expected to increase the probability of RIG-I engagement and interferon production. [75] This is in contrast to some reports which found that genome replication fails to modulate innate immune detection of these populations, however experimental choices such as the use of the inhibitor cycloheximide, or high MOIs wherein replication may not meaningfully contribute additional RIG-I ligands, may explain these discrepancies. [43,70] The finding that genome replication modulates interferon induction in turn leads us to reconsider the observed distribution of defective species across the viral genome, largely consisting of deletions in the polymerase segments, in a new light. Under conditions early in infection, where coinfection may be more limited, occasional packaging of deletions in the polymerase segment may be tolerated as the likelihood these deletions induce interferon remains low. However, should the virus package non-polymerase deletions, these viruses would exhibit an increased probability of interferon induction even under conditions where coinfection is rare. We therefore postulate that, rather than defective particles preferentially consisting of deletions in the polymerase segments due to some selection for their presence, we should also consider that there may be a strong selection against inclusion of deletions in non-polymerase segments.
Overall our study addresses several critical gaps in our understanding of influenza A virus defective particles. Our library-based approach was quite successful at generating models of how defective influenza populations are shaped after their initial formation, demonstrating that pressures other than generation are sufficient to explain their observed distribution. While there still remain features left to be explored to procure a full explanation of the kinetic features underlying their propagation, this is perhaps unsurprising. Prior experiments tracking defective content in long-term culture showed oscillatory dynamics, consistent with the behavior of parasitic elements in many social systems, implying frequency-dependent negative selection in addition to the features discussed here. [76] Therefore continued work to understand these particles will need to couple substantial modeling with tractable experimental systems, such as the libraries we generated in this report. Although we were only able to make a few inferences regarding the precise nature of deletions that drive interferon induction, namely that this phenomenon is length-independent and impacted by genome replication, our study was not designed to probe sequence-dependent phenomena as well as length-dependent owing to the introduction of barcodes and other artificial sequences which themselves likely impacted RNA structure. Future work with a modified library design will be required for further exploration of those features with an ultimate goal of being able to perform in silico prediction of stimulatory behavior. Finally, beyond this work on influenza A virus, we note that our library design is also broadly extensible to explore any length-dependent phenomena wherein the lengths are compatible with PCR amplification, and hope that providing this method will serve as a resource to the broader community.

Cell lines and viruses
The following cell lines were used in this study: HEK293T (ATCC CRL-3216), MDCK-SIAT1 (variant of the Madin Darby canine kidney cell line overexpressing SIAT1 (Sigma-Aldrich 05071502)) and A549 (human lung epithelial carcinoma cell line, ATCC CCL-185). Variants of MDCK-SIAT1 overexpressing influenza HA and PB1 were generated using a lentiviral vector as previously described. [77][78][79] A549 type I and type III interferon reporter lines were previously described. [46] A549 cells overexpressing PB1 were generated using lentiviral transduction using the same vector for MDCK-SIAT1. Cell lines were tested for mycoplasma using the LookOut Mycoplasma PCR Detection Kit (Sigma-Aldrich) using JumpStart Taq DNA Polymerase (Sigma-Aldrich). A549 cells used in all sequencing experiments had their

PLOS PATHOGENS
Segment and length dependent characteristics of defective influenza genomes identity confirmed via STR profiling by ATCC. All cell lines were maintained in D10 media (DMEM supplemented with 10% heat-inactivated fetal bovine serum and 2 mM L-Glutamine) in a 37˚C incubator with 5% CO 2 .
Wild-type A/WSN/1933 (H1N1) virus was created by reverse genetics using plasmids pHW181-PB2, pHW182-PB1, pHW183-PA, pHW184-HA, pHW185-NP, pHW186-NA, pHW187-M, pHW188-NS. [80] Genomic sequence of this virus provided in S4 File. HEK293T to MDCK-SIAT1 cells were seeded in an 8:1 coculture and transfected using BioT (Bioland Scientific, LLC) 24 hours later with equimolar reverse genetics plasmids. 24 hours post transfection, D10 media was changed to Influenza Growth Medium (IGM, Opti-MEM supplemented with 0.04% bovine serum albumin fraction V, 100 μg/ml of CaCl 2 , and 0.01% heatinactivated fetal bovine serum). 48 hours post-transfection, viral supernatant was collected, centrifuged at 300g for 4 minutes to remove cellular debris, and aliquoted into cryovials to be stored at -80˚C. Thawed aliquots were titered by TCID50 on MDCK-SIAT1 cells and calculated using the Reed and Muench formula or titered by qPCR against the viral segment HA. [81] To create wild-type viral stocks with a low defective content, MDCK-SIAT1 cells were infected at an MOI of 0.01 and harvested 30 hours post-infection. To generate intermediate defective populations, MDCK-SIAT1 cells were serially infected at an MOI of 0.01 twice, harvested at 72h each passage. To generate high-defective populations, these passages were instead performed at an MOI of 1.
Indicated mutations were introduced by inverse-PCR followed by single-molecule Gibson ligation and confirmed by Sanger sequencing. To generate viruses bearing large deletions in PB1 or HA, HEK293T cells were co-transfected with all reverse-genetics plasmids encoding A/ WSN/1933 except for either HA, or PB1, the indicated variant, and a construct expressing either HA or PB1 under control of a constitutive promoter in the pHAGE vector as used to transduce MDCK-SIAT1 cells. [79] To generate the viruses bearing a large deletions in NA, HEK293T cells were co-transfected with all reverse-genetics plasmids encoding A/WSN/1933 except for HA and NA, the indicated variant, a copy of the coding sequence of NA flanked by the packaging sequence of HA, and a construct expressing either HA under control of the a constitutive promoter in the pHAGE vector as used to transduce MDCK-SIAT1 cells. MDCK-SIAT1 cells in co-culture either constitutively expressed HA or PB1. Viruses were titered on complementing MDCK-SIAT1 cells, or by qPCR against the viral segment PA. For the barcoded HA deletions, HA 150:bc:475 and HA 475:bc:150 , 5' AGCTCCTATTAG 3', and 5' TATAGTCCCAAT 3', were used instead of random sequence.

Library cloning and initial sequencing
Up (amplifying the 5' end of the gene) and Down primer sets were used in individual PCR reactions alongside the common primers 5' GGTAACTGTCAGACCAAGTTTACTC 3' (Up) or 5' GAGTAAACTTGGTCTGACAGTTACC 3' (Down) using Q5 Hot Start High-Fidelity 2x Master Mix (New England Biolabs, M0494). Primers were designed as described in https:// github.com/a5russell/Defective_Library_Mendes_Russell, using annealing temperature calculations as described by Breslauer et al. (1986) [82]. Each reaction was performed using 2.5 ng of template DNA consisting of a pHH21 vector encoding either HA or PB1 with the ATG start codon mutated to GTA. [39] Reaction conditions consisted of a 55˚C annealing temperature, a 2 minute extension, and 11 total cycles of amplification. Up and Down amplicons were independently pooled and subjected to DpnI digest at 37˚C for 1 hour. Products were then purified using SPRI beads from the ProNex Size-Selective Purification System (Promega, NG2001) at a 1x bead volume following manufacturer's instructions.
Down amplicons were then subjected to a second round of PCR using the same common PCR primer and an additional primer consisting of an artificial adapter containing a 12nt random nucleotide sequence, 5' GCCATCGCCTACACGACGCTTCNNNNNNNNNNNN CGTTGACCACTGCTTGCGATGAT 3'. This primer was acquired from integrated DNA Technologies, with a hand-mixed randomer. Conditions for this amplification consisted of 25ng of template DNA subjected to 8 cycles of amplification at a 50˚C annealing temperature and 2 minute extension. This amplicon was then purified once more using a 1x bead volume.
25 ng of each amplicon pool, Up and Down with adapter, were combined in a Gibson reaction using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs, E2621) using manufacturer's protocols save for a 1 hour incubation step rather than 15 minutes. The resultant reaction was then cleaned using 1x bead volumes, and electroporated into electrocompetent Stbl3 cells. Plates were counted, scraped, and plasmids prepared for each step. This whole process, from PCR onwards, was performed 3 independent times for each library.
To sequence junction regions associated with each barcode, 1μg of each library was digested with BtgZI overnight and released products were purified by gel electrophoresis followed by gel extraction. Bands were then subjected to ligation with a DNA splint containing an NNNN region to facilitate ligation to the released overhang. This splint was generated by annealing 5' AAGCAGTGGTATCAACGCAGAGTACAT 3' to 5' NNNNATGTACTCTGCGTTGATAC CACTGCTT 3'. Ligation to each splint was performed overnight at 16˚C using T4 DNA ligase (New England Biolabs).
Ligation reactions were used directly as templates for PCR to infer Up junctions and Down junctions, separately, using the common primer 5' CGGGGTACGATGAGACACCATATT GGTCTCAAGCAGTGGTATCAACGCAG 3', and 5' GAAGCAGAAGACGGCATACGA GATAGGCCATCGCCTACACGACGCT 3' (Up) or 5' GAAGCAGAAGACGGCATACGA GATAGATCATCGCAAGCAGTGGTCA 3' (Down) with the following reaction conditions: 60˚C annealing temperature, 15s extension, for 12 cycles. A second round of PCR was then used to add partial Illumina adapters (5' TCGTCGGCAGCGTCAGATGTGTATAAGAGA CAGCGGGGTACGATGAGACACCA 3' / 5' GTCTCGTGGGCTCGGAGATGTGTATAA GAGACAGGAAGCAGAAGACGGCATACGAGAT 3', with a 55˚C annealing temperature, 15s extension, and 6 cycles. Samples were then barcoded using Illumina UD indices with a 62˚C annealing temperature, 20s extension, and 10 cycles. HA and PB1 libraries were pooled in indexing as they could be disambiguated during sequencing. Final samples were bead cleaned with a 3x bead volume and sizes confirmed by tapestation prior to sequencing.

Barcoded library selections
To analyze rates of genome replication, in an 8:1 coculture of HEK293T and MDCK-SIAT1 cells, each individual deletion/duplication library was cotransfected with plasmids that individually encode the mRNA for PB2, PB1, PA, and NP in an equimolar ratio using BioT according to the manufacturer's protocol. An orthogonal dataset was generated with transfections consisting of a library, PB1, PA, and NP but not PB2. Transfections were then allowed to proceed for 24 hours and RNA was purified using the Monarch Total RNA Miniprep Kit from New England Biolabs (T2010). To analyze other steps in the viral life cycle, after 24 hours media was changed from D10 to IGM and cells were infected with a low-defective wild-type WSN population at an MOI of 0.25, and infection was allowed to progress for 72h. Virus-containing supernatant was clarified by centrifugation at 300g for 4 minutes, and RNA was purified from 100 μl of supernatant using the Monarch Total RNA Miniprep Kit with 600 μl of lysis buffer.
To associate barcodes with the induction of interferon, 21 million interferon-lambda reporter cells were seeded across 3 15cm plates. 24 hours later, the media was changed to IGM and cells were infected with library supernatant at an MOI of 0.05. 14 hours post-infection, cells were separated using the MACSelect LNGFR System (Miltenyi Biotec). The supernatant of the samples were aspirated before being trypsinized for 10 minutes at 37˚C and quenched with D10 media. Samples were centrifuged for 4 minutes at 300g, had their supernatant removed, and were resuspended in 320 μL of 1x phosphate buffered saline (PBS). 80 μL of MACSelect LNGFR MicroBeads were mixed into the samples thoroughly and then placed in a 4˚C fridge for 15 minutes. Samples were then applied to LS columns (Miltenyi Biotec, 130-042-401) according to manufacturer's instructions. In addition to our enriched population, five million interferon-depleted cells from each sample were retained for analysis. RNA was thereafter purified from both sets of samples using the Monarch Total RNA Miniprep Kit from New England Biolabs.
To grow mixed 400nt and 1600nt variant populations, transfection was performed as for our more complex libraries, and thereafter populations were permitted to expand as described above over 72 hours.

Sequencing of naturally-occurring deletions in influenza A virus
Interferon-beta A549 reporter cell lines were infected at an MOI of 0.1 and sorted using magnetic activated cell sorting as described in the previous section as for interferon-lambda reporters, save for MS columns were used instead of LS columns. RNA was purified using either RNeasy plus mini kit (Qiagen, 74134), for interferon-depleted samples, or the RNease plus micro kit (Qiagen, 74034) for interferon-enriched samples. As above, RNA was converted to cDNA using the universal vRNA primers of Hoffmann et al. (2001) [47] and the SuperScript III First-Strand Synthesis SuperMix kit (Invitrogen, 18080400) according to the manufacturer's protocol. Samples were normalized by qPCR against HA. PCR was then performed as per Hoffmann et al. (2001) [47] for 16 cycles.
To generate cDNA from mRNA, normalized RNA samples were converted to cDNA using the SMART-Seq2 method and subjected to 14 cycles of amplification. [83] Both mRNA and vRNA samples were fragmented by tagmentation using the Nextera XT DNA library preparation kit from Illumina (FC-131), indexed, and sequenced.

Computational analysis of sequencing data
The entire analysis pipeline is found at https://github.com/a5russell/Defective_Library_ Mendes_Russell. To analyze barcodes, a custom Python script was used to parse sequencing files. Upstream and downstream junctions were assigned to barcodes separately and combined to generate a barcode-junction map. Likely due to PCR chimerism, highly abundant barcodes ended up assigning to multiple junctions, but these secondary assignments were rare. Therefore, an empirical threshold of 75% of reads must confidently assign to a single junction to assemble a barcode. A single mismatch was allowed within junction sequence for assembly. Thereafter, for each individual sample the number of times a barcode was identified within sequencing was analyzed without any further discrimination, no score values needed to be considered, and mismatched barcodes were discarded. This resulted in an assembly for the majority of observed barcodes, suggesting little benefit could be procured using a more complex analysis pipeline allowing for a greater hamming distance. As we used paired-end reads for all sequencing, barcodes or junctions were collapsed to a consensus sequence, if a mismatch was observed whichever read possessed the higher quality score at that position was presumed to be the correct nucleotide.
To analyze mRNA sequencing, reads were first trimmed using Trimmomatic providing the Nextera adapter sequences and the following variables: 2 seed mismatches, a palindrome clip threshold of 30, a simple clip threshold of 10, a minimum adapter length of 2, keep both reads, a lead of 20, a sliding window from 4 to 15, and a minimum retained length of 36. [84] Reads were then mapped against a concatenated GRCh38 human genome assembly and the A/WSN/ 1933 genome using STAR with default settings. [48] HTseq-count was used to count occurrences of a given transcript within our read mapping. [85] Output from HTseq-count was then used in DeSeq2 using a short R script, and results parsed using Python scripts.
To analyze vRNA sequencing, reads were also trimmed using Trimmomatic as above. Next read1 and read2 were mapped separately using STAR against the A/WSN/1933 genome with a requirement to map without any gaps by enforcing mapping to a custom gtf file consisting of the full length of each vRNA. Mismatches were excluded with the command -outFilterMis-matchNmax of 3, deletions were largely excluded by providing a deletion open penalty of 6, and deletion per base penalty of 6. Unmapped reads were retained, and were then used in a BLASTn search against a custom BLAST database consisting of just the A/WSN/1933 genome. [49]. Input for this search used a percent identity of 90, a word size of 10, a gap open penalty of 5, extend penalty of 2, and an evalue cutoff of 0.000001. BLAST output was then parsed and reads were identified that mapped to the same segment, discontinuously. Such reads must map completely, with no "missing" bases, and must map with the same polarity, with no inversion. Repetitive elements, such as 2nt repeated at the 5' and 3' ends of the junction, which could, in theory, be assigned to either, were assigned to the 5' end arbitrarily.
BLAST output was then used to initialize a new gtf file, using all data from all technical replicates for a given biological replicate. Therefore two gtf files were generated, one for each biological replicate. Unmapped reads were then mapped using this updated file using STAR with identical parameters as above. The resulting four files, two from the first mapping and two from the second, updated, mapping, were then combined and bam flags were fixed for appropriate, paired-end, analysis. Junctions were thereafter counted in the bamfile with a requirement that both reads mapped, neither is inconsistent with the deletion (ie if both map over the region a deletion is contained within, both show a deletion within the read), and that at least three bases are mapped on either side of the junction when considering the consensus produced by the paired-end reads.
Output was then parsed using custom Python scripts and Samtools. [86] qPCR Code generating graphs in manuscript can be found at https://github.com/a5russell/ Defective_Library_Mendes_Russell. Primers for all qPCR analyses listed in S5 File. For validation of natural diversity, purified RNA was used as described above. For all other qPCR analyses, the method of Shatzkes et al. (2014) [87] was used. In brief, adherent cells were incubated, after a PBS wash, with a gentle permeabilization buffer consisting of 10 mM Tris-HCl, 150 mM NaCl, 0.25% IGEPAL, and 1% DNaseI in DNase/RNase-free water and incubated at 37˚C for 5-10 minutes. DNase and infectious virus were then inactivated with a 5 minute 85˚C incubation. For supernatant measurements, 10 μl of supernatant was incubated with 90 μl buffer instead. cDNA from lysate was generated using the High Capacity First Strand Synthesis Kit (Applied Biosystems, 4368814) with random hexamers according to the manufacturer's protocol using a final lysate concentration of 10% of the reaction volume. qPCR was thereaf performed using Luna Universal qPCR Master Mix (New England Biolabs, M3003) with manufacturer's suggested reaction conditions. For pure plasmid controls, plasmids were used at the indicated concentrations instead of cDNA in qPCR.

Flow cytometry
Indicated cells were seeded 24 hours prior to infection, and, at the indicated time points, trypsinized and resuspended in PBS supplemented with 2% of heat-inactivated fetal bovine serum (FBS). For HA staining, cells were stained with 10 μg/ml of H17-L19, a mouse monoclonal antibody confirmed to bind to WSN HA in a prior study. [88] Cells were refrigerated with the primary antibody for one hour at 4˚C before being washed with PBS supplemented with 2% FBS and then stained with a goat anti-mouse IgG antibody conjugated to allophycocyanin and refrigerated for another hour at 4˚C. Cells were washed with PBS supplemented with 2% FBS and fixed with 1% formaldehyde (BD Cytofix) for 30 minutes at room temperature. Cells were washed with PBS and thereafter run on a flow cytometer. Data processing consisted of first generating a debris gate in FlowJo prior to export to a tsv or csv and analysis using custom Python scripts.

RIG-I siRNA experiments
Each transfection received 50 μL of Opti-Mem, 1.5 μL of Lipofectamine 3000 (Invitrogen, L3000001), and 0.25 μL of 10 μM siRNA or 10 μM non-targeting control. All siRNAs used were Ambion Silencer select siRNAs from Life Technologies Corporation, catalog number s223614, and s223615 for RIG-I, and 4390843 for the non-targeting control. Transfection mixes were combined with A549 cells diluted to 200,000 cells/mL in D10 media. 100,000 transfected cells were seeded in each well of a 24-well plate. 24 hours post-transfection, the media was replaced with fresh D10. 48 hours post-transfection, the media was changed to IGM and cells were infected with PB1 177:385 . 14 hours post-infection, cells were harvested using the method of Shatzkes et al. (2014) [87].  Fig 2C. (A) All four qPCR reactions exhibit linearity when tested against a plasmid control. R 2 from linear regression against Ct verus the log-transformed plasmid concentration. Linear regression line shown. (B) Each qPCR is highly-specific to its cognate target, and discriminatory against non-cognate targets. Target versus non-target specificity was calculated as the relative signal of each qPCR reaction on the indicated off-target control when compared to a target control at 0.01 ng per qPCR reaction. n = 3 for both panels.  Fig 4D, confirming levels of influenza transcript expression. qPCR measuring levels of HA transcript as normalized to the housekeeping control L32 for the indicated influenza mutants infecting A549 cells at an MOI of 0.5 at 8h postinfection. The significant difference between PB1 deletion variants is not due to incorrect dosage, as shown by similar levels of influenza transcripts. Neither PB1 mutant is replication-competent, as shown by the reduced influenza transcript levels compared to a wild-type control. Asterisks represent significantly different values in all pairwise comparisons, two-tailed t-test, using Benjamini-Hochberg multiple-hypothesis correction at an FDR of 0.05. n = 4. (left) qPCR against IFNB1 as corrected by the housekeeping gene L32 demonstrates that, even with genome replication NA 189:76 is no more stimulatory than PB1 177:385 , and when replication is suppressed, it is actually less stimulatory. (right) qPCR against PA, as corrected by the housekeeping gene L32, was used to validate the suppression of viral replication by the antiviral nucleoside ribavirin, and matched dose between our variants. A549 cells were pretreated with 200 μM ribavirin for 2h and then infected at an MOI of 0.5. RNA was harvested for analysis after 14 hours of infection. Astrisks indicate conditions wherein signal was significantly different between variants. Two-tailed t-test. n = 4. (TIF) S16 Fig. Gating for Fig 6B. Individual replicates shown. Dotted line represents positivity threshold Gating drawn on the uninfected control before application to infected samples, set at the 99.9 th percentile of uninfected. Data shown as kernel density estimates with Gaussian distributions. n = 3.