Evolution-guided design of super-restrictor antiviral proteins reveals a breadth-versus-specificity tradeoff

1Division of Basic Sciences, 2Division of Human Biology, and 3Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; 4Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA; 5Pacific Northwest Research Institute, Seattle, WA, USA; 6Institute of Virology, Medical Center-University of Freiburg; 7Faculty of Medicine, University of Freiburg, Freiburg, Germany


Introduction
The innate arm of mammalian immunity includes dozens of antiviral proteins that act cell-autonomously to block viral replication [1,2]. Over long evolutionary periods, innate immune proteins must adapt to combat rapidly evolving pathogenic viruses. This adaptation often occurs at the interfaces between viral and host immune proteins, to either evade or increase binding interactions [3]. This back-and-forth evolution between host and viral proteins can result in signatures of positive selection (higher rates of non-synonymous substitutions, dN, compared to synonymous substitutions, dS) that can be identified by a comparison of orthologous genes over a phylogenetic tree. Positive selection is often concentrated in just a few residues that maximally impact binding affinity and thereby dictate the outcomes of viral infection in host cells [3]. Indeed, experimental tests show that variation in positively selected sites can explain the species-specific differences in antiviral specificity between orthologous antiviral proteins [3][4][5].
Such retrospective analyses are a powerful means of identifying amino acids and protein-protein interaction interfaces that have been shaped by past episodes of selection. Nonetheless, they are limited in describing the available landscape of possible adaptation against viruses. For example, epistasis between different rapidly evolving sites could have shaped and constrained this landscape [6][7][8]. We sought a means to elicit higher-specificity versions of antiviral proteins and to understand the available paths to adaptation of a host innate immune protein to potential viral pathogens.
We focused our studies on MxA, an interferon-induced large dynamin-like GTPase, which inhibits a diverse range of viruses by interacting with multiple distinct viral proteins [9,10]. MxA is composed of an N-terminal globular GTPase-containing head (G domain) and a C-terminal stalk, which are connected by a hinge-like bundle-signaling element (BSE) [11,12]. Previously, we had identified the L4 loop (L4), which protrudes from the MxA C-terminal stalk, as a hotspot for recurrent positive selection in primates [4]. We showed that five residues (amino acids 540, [...]
We hypothesized that combinatorial mutagenesis of all five positively selected residues in L4 of human MxA might reveal the contributions of positions other than residue 561 to antiviral restriction and generate MxA variants with increased antiviral activity (here called 'super-restrictors') relative to human MxA. Our combinatorial analyses revealed strict amino acid requirements at some L4 positions (e.g., residue 561) but also significant contributions of other L4 positions (e.g., residues 540, 564) to gain and loss of restriction against Thogoto virus (THOV). Our analyses also revealed context-specific epistasis between L4 residues, thereby reiterating the merit of our combinatorial mutagenesis approach. Finally, consistent with our predictions, our analyses reveal 'super-restrictor' variants of human MxA that have 10-fold higher antiviral activity against THOV, in spite of the fact that human MxA is itself one of the most potent natural restrictors of THOV. We show that the basis of this super-restriction is increased binding to the THOV nucleoprotein (NP). Unexpectedly, we find an inverse correlation between the antiviral activity of MxA variants against two orthomyxoviruses, THOV and influenza A virus (IAV).
Our analyses thus reveal not only a powerful means to elicit super-restrictor versions of antiviral proteins, but also reveal an unexpected breadth-versus-specificity tradeoff that shapes the adaptive landscape of antiviral proteins in nature.

Results
bioRxiv preprint doi: https://doi.org/10.1101/557264. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.

Combinatorial mutagenesis reveals L4 sequence requirements for THOV restriction
Retrospective analyses can not only identify which residues have recurrently evolved due to relentless adaptation (positive selection) [3], but also identify residues that have not undergone any changes despite millions of years of protein divergence (purifying selection). We reasoned that a prospective approach focusing on positively selected sites, while preserving sites that have evolved under purifying selection, might reveal insights into the selective pressures that mold antiviral defense repertoires and uncover unexplored potential for enhanced antiviral activity. Furthermore, we reasoned that focusing on only a subset of sites revealed to be important for past adaptation by retrospective studies would allow us to use combinatorial mutagenesis to more comprehensively survey potential paths of antiviral adaptation.
We therefore generated an MxA gene variant library (Figure 1A, Figure S1) encoding MxA proteins containing random combinations of all amino acids in the five positively selected sites of L4 in wild-type human MxA (hereafter referred to as wtMxA). Five residues in the L4 of wtMxA protein have evolved under recurrent positive selection in simian primates: 540 (G), 561 (F), 564 (F), 566 (S) and 567 (S). These residues were combinatorially mutagenized to all 20 amino acids using primers with NNS codons (Methods). We randomly selected ~600 clones and individually evaluated their antiviral activity against THOV using a minireplicon assay (Methods).
In this minireplicon assay, the viral polymerase components (PB2, PB1, PA) as well as the genomic-RNA-binding nucleoprotein (NP) drive the expression of a firefly luciferase reporter.  Table S1).
Despite altering only the five most rapidly evolving L4 positions, we found that most MxA variants (95%) had worse THOV restriction than wtMxA, consistent with the evolutionary finding that these residues are crucial for MxA's antiviral function. We searched for patterns to explain the sequence requirements to maintain MxA antiviral activity against THOV. We found that all restrictive variants possessed a hydrophobic aromatic amino acid (Phe F, Tyr Y or Trp W) at residue 561 (Figure 1B, right). These results extend our previous findings that a hydrophobic aromatic amino acid at residue 561 could confer antiviral restriction against THOV in the context of wtMxA [4]. While many non-restrictive variants had non-aromatic residues at position 561, we obtained at least 21 combinatorial MxA variants that had weaker anti-THOV activity than wtMxA despite possessing F, Y, or W at residue 561 (Figure 1C). Therefore, we conclude that an aromatic residue at position 561 is necessary but not sufficient to confer anti-THOV restriction.
This implies that the other four rapidly evolving L4 residues also directly contribute to MxA antiviral activity against THOV.

'Super-restrictor' MxA variants and their molecular basis
In our initial screen, we also discovered four MxA variants (LFVKG, GYKDA, TFGNF and GYGQL in the randomized L4 positions) with 2- to 5-fold better restrictive ability against THOV than wtMxA (Figure 1B, 1C). We validated that these variants are indeed more potent than wtMxA throughout a dose-response curve. This increased potency cannot be explained by expression levels. These super-restrictor MxA variants validate our premise that combinatorial mutagenesis of positively selected residues could yield increased antiviral potency. However, because only 6% of the initial mutant pool possessed the required aromatic amino acid at position 561 (Figure 1B, 1C), our initial screen did not fully explore the super-restrictor sequence space. Therefore, we designed a second combinatorial library, in which we fixed site 561 as F, the amino acid present in wtMxA, while randomizing the other four amino acids under positive selection in L4.
For the four-site screen, we randomly selected 168 variants and assessed their antiviral activity in triplicate against THOV using the minireplicon assay (Figure 2A, Table S2). As predicted, a greater proportion (~65%) of MxA variants in the four-site screen have anti-THOV activity comparable to wtMxA (Figure 2A), as opposed to only ~6% in the five-site screen (Figure 1B). Moreover, we found that ~13% of the four-site variants are super-restrictors (Figure 2A), compared to only ~0.5% in the initial screen. To validate the top super-restrictors from both screens, we retested their activity in the THOV minireplicon assay (Figure 2B). Most of the variants identified as super-restrictors again showed enhanced restriction of THOV. In particular, the three most potent super-restrictors obtained with the four-site screen (i.e., QFAYS, VFRSV and TFAMC) are 7- to 10-fold more potent than wtMxA (Figure 2B). Although these super-restrictor variants are expressed at similar levels to wtMxA (Figure 2C, bottom), they have higher anti-THOV restriction at all levels of expression (Figure 2C). This significant improvement in THOV restriction is remarkable because human wtMxA is the most potent naturally occurring MxA ortholog for THOV restriction identified to date [4,13].
Since we previously found that changes at amino acid 561 correlated with gain or loss of binding to the target nucleoprotein (NP) [4,14], we next tested whether increased MxA-NP interaction could explain the super-restriction phenotype [15][16][17]. MxA forms higher-order oligomers that are required for antiviral activity [11,18,19]. Thus, increased affinity of MxA monomers to NP might result in higher avidity in the multimeric MxA complex that interacts with THOV NP [4,11,14,18,20]. We assayed the association of NP with FLAG-tagged MxA super-restrictors in THOV-infected cells. We found that the most potent super-restrictors QFAYS, VFRSV and TFAMC pull down a larger fraction of cellular NP compared to wtMxA (GFFSS) (Figure 2D), whereas non-restrictor MxA versions (variant PFFSS or an MxA protein with a deletion of L4) have greatly reduced ability to pull down THOV NP. This suggests that the mechanism for super-restriction involves a more avid interaction of MxA with the THOV NP target protein.
We next investigated whether there was a common sequence pattern that could explain the enhanced anti-THOV restriction activity of the super-restrictor variants. We noticed that MxA variants with glutamine (Q) at position 540 were enriched among the four-site super-restrictors (Figure 2B, Figure S4), including the most potent super-restrictor, QFAYS. Swapping the glutamine (Q) residue at position 540 in QFAYS, QFQSM and QFVVM to the glycine (G) present in wtMxA led to a significant drop in restriction activity, in one case to below wtMxA levels (Figure 3A). This finding suggested that position 540 could play an important role in determining super-restriction. To better explore the role of position 540 in super-restriction, we mutated the glycine at residue 540 (G540) in the wtMxA backbone to every other amino acid (Figure 3B). We found that changing position 540 to Q in wtMxA leads to only a 2-fold improvement in antiviral activity (Figures 3A, B), significantly less than the super-restriction of the QFAYS, QFQSM and QFVVM variants. Moreover, although individual substitutions (e.g., G540A or G540Q) could increase THOV restriction 2- to 3-fold, none of the single substitutions at residue 540 achieved the level of super-restriction seen in the most potent variants (compare Figure 3B with Figure 2B). These results imply that Q540 is necessary for super-restriction in these variants but not sufficient to explain it. Consistent with these findings, although variants QFQSM and QFVSM are both super-restrictors, variant QFLSM is a non-restrictor despite differing only at position 564 (Figure 3C). Our findings imply that multiple positively selected L4 residues, in addition to residue 540, can enhance or epistatically interfere with (e.g., QFLSM) the degree of MxA restriction.

Our single-residue swap studies revealed 540A to be the most potent restrictor (Figure 3B) in the context of the wtMxA backbone. However, we speculated that the enhancement of MxA restriction by 540A mutations might also be context-dependent. To test this possibility, we made 540A mutations in three super-restrictors (HFSGR, TFAMC and VFRSV) that each encode an 'unfavorable' amino acid at position 540 in the wtMxA context (Figure 3B). If the enhancement in MxA restriction by 540A were context-independent, we would expect to see further enhancement of anti-THOV activity in each of the HFSGR, TFAMC and VFRSV super-restrictors via the 540A mutation. Instead, we found that the HFSGR and AFSGR MxA variants were equivalent in their restriction, as were the TFAMC and AFAMC variants (Figure 3D). However, a V540A swap in VFRSV dramatically lowered restriction activity by 10-fold (Figure 3D), even though the same V540A swap in a wtMxA context (XFFSS) increased restriction by 5-fold (Figure 3B). Together, our results show that a simple model of context-independent contribution of all L4 positions cannot account for the broad repertoire of super-restrictors. If our analyses had relied on combining 'restriction-favorable' single residues like A540 while avoiding 'restriction-unfavorable' residues like V540, we would never have been able to discover super-restrictors like VFRSV. Instead, we find that epistasis shapes the restriction profile of many MxA super-restrictor variants such as VFRSV. This discovery of context-specificity demonstrates the power of an unbiased combinatorial mutation approach to provide a more thorough exploration of the adaptive landscape for discovering novel super-restrictors.
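The context dependence of the V540A swap can be made concrete with a small worked example. The sketch below is only an illustration, not the authors' analysis: it takes the two fold-changes quoted in the text (a 5-fold gain in the wtMxA context and a 10-fold loss in the VFRSV context), converts them to log2 effects, and labels the pair according to whether the signs agree. The function name and the classification labels are assumptions introduced here.

```python
from math import isclose, log2


def classify_epistasis(fold_change_a: float, fold_change_b: float) -> str:
    """Compare the effect of one substitution in two sequence backgrounds.

    Each argument is the fold-change in restriction caused by the same
    substitution (values > 1 mean a gain, values < 1 mean a loss).
    """
    effect_a = log2(fold_change_a)
    effect_b = log2(fold_change_b)
    if isclose(effect_a, effect_b, abs_tol=1e-9):
        return "no epistasis"  # same effect in both backgrounds
    if effect_a * effect_b < 0:
        return "sign epistasis"  # beneficial in one, deleterious in the other
    return "magnitude epistasis"  # same direction, different size


# Fold-changes for the V540A swap as quoted in the text.
v540a_in_wt_context = 5.0        # 5-fold increase in restriction (XFFSS)
v540a_in_vfrsv_context = 1 / 10  # 10-fold decrease in restriction (VFRSV)

if __name__ == "__main__":
    print(classify_epistasis(v540a_in_wt_context, v540a_in_vfrsv_context))
```

On a log scale the two effects have opposite signs, which is the pattern the text describes as context-specific epistasis.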

A breadth-versus-specificity tradeoff underlies MxA antiviral specificity
MxA restricts a number of different viruses, and its evolution is likely to have been shaped by a number of host-virus interactions. MxA from humans restricts both THOV and IAV more potently than other primate MxA variants [4]. Therefore, we investigated how restriction against one virus (i.e., THOV) affects the antiviral activity against the other (i.e., the H5N1 strain of IAV).
This comparison is especially appropriate because restriction of both viruses depends on the L4, specifically on residue F561 in MxA [4]. Thus, one might expect that gaining super-restriction against one virus via changes in the MxA L4 might also confer super-restriction against other viruses that are restricted by a similar interface. We first investigated several 'non-restrictor' variants that still preserved an aromatic residue at position 561 but had lost anti-THOV activity (Figure 4A). We tested these MxA variants for their anti-IAV restriction activity using the IAV minireplicon assay described previously (Methods) as a proxy for IAV replication. We found that these variants still retained anti-IAV activity (Figure 4B). Thus, loss of restriction against THOV is not necessarily linked to loss of IAV restriction. Next, we tested 'super-restrictor' MxA variants that had 10-fold higher activity against THOV (Figure 2). Surprisingly, we found that all anti-THOV super-restrictors tested have lower antiviral activity against IAV than wtMxA (Figure 4C). Thus, MxA optimization to restrict THOV also impairs its ability to restrict IAV. These results demonstrate that MxA has different optimal sequence states for different viruses. Our findings suggest that wtMxA provides a generalist solution to restrict both THOV and IAV and perhaps a large number of other viruses; specialization to restrict one virus renders the protein ineffective against other viruses.

Discussion
Taking advantage of an approach that combines combinatorial mutagenesis and evolution-guided insight into residues that have been subject to positive selection, we have been able to identify MxA variants that are 10-fold better anti-THOV restrictors than wtMxA, itself the most potent MxA ortholog known against THOV [4,13]. Our analyses reveal that these 'super-restrictors' require contributions from several positions in L4, including the requirement for an aromatic residue at position 561. Conversely, we find that 'unfavorable' residues can render MxA variants 'non-restrictors' against THOV in spite of an aromatic residue at position 561. Thus, multiple L4 residues that have evolved under recurrent positive selection over primate evolution shape the antiviral specificity of MxA.
Our analyses further reveal that context-specific epistasis helps determine the contributions of individual L4 residues in shaping MxA specificity. We note instances where the same amino acid mutation (A540V) can have opposite effects on restriction activity in MxA proteins that only differ at three other L4 residues. Our results suggest that the path to adaptation in MxA is indeed shaped by historical contingency (i.e., what MxA sequence was present when adaptation occurred) as well as by epistasis. These findings are reminiscent of previous work showing that the evolutionary path to adaptation is often shaped by contingency and epistasis [6-8].
Our strategy for the discovery of super-restriction factors preserves protein domains subject to purifying selection and samples mutations at residues already highlighted by positive selection analyses as recurrent targets of adaptation. This focus on only a smaller subset of residues, in this case L4, allows us to more fully explore outcomes of combinatorial mutagenesis [21] by relaxing the constraints imposed by epistasis and historical contingency. Using this method, we are thus able to obtain super-restrictor MxA versions that would not necessarily have been obtained by combining 'favorable' residues identified by a deep-mutational scan of all L4 positions.
It is not necessarily true that the positive selection of MxA L4 was shaped via interactions with NP proteins of orthomyxoviruses like THOV or IAV. However, we can view both of these extant viruses as proxies for other selective events, including some that occurred deep in simian primate evolution. Nonetheless, our analysis of MxA variants derived via combinatorial mutagenesis reveals an unexpected breadth-versus-specificity tradeoff; MxA 'super-restrictor' variants with 10-fold higher THOV restriction appear to lose all IAV restriction. Our findings suggest that, under threat by multiple viruses, antiviral genes such as MxA appear to be under evolutionary pressure to harbor more broadly active alleles even at the expense of more potent, specific antiviral activity. This result is reminiscent of a recent study, which reported that generalist (promiscuous) MHC class II alleles are selected for in human populations in response to high pathogen diversity [22]. Similar breadth-specificity tradeoffs have also been invoked in the case of cytochrome P450 detoxification genes that protect herbivorous insects from plant counterdefenses [23] and in plant disease resistance (R) genes encoded by the Leucine Rich Repeat (LRR) gene family [24]. Furthermore, although it was not the intention of our combinatorial mutagenesis strategy to do so, the breadth-specificity tradeoff we observe is nevertheless similar to the outcome of an artificial selection experiment that made the GroEL chaperone highly specific for one substrate but at the expense of its substrate-binding breadth [25]. It is remarkable that critical components of the innate immune defense apparatus may evolve under the same constraints as an intracellular chaperone, sacrificing high avidity of binding to a given viral substrate in order to maintain a breadth of target binding and thereby antiviral response.

[...] 29] like MxA L4 [18]. We speculate that unstructured loops like L4 represent an ensemble of many conformations [18] that provide MxA with structural and evolutionary flexibility to adapt to binding distinct viral targets, whereas super-restricting variants may be less structurally flexible, trading increased restriction of certain viral targets for decreased antiviral range. Thus, breadth-specificity tradeoffs constrain the adaptive landscape of antiviral proteins.

[...] variants with lower restriction than wtMxA are shown below the dashed green line (these contain very diverse residues in the randomized L4 positions). Approximately 6% of the variants have equivalent restriction to wtMxA (green restrictors or 'R', between the pink and green dashed lines) and these all possess an F, Y, or W residue at position 561. L4 sequences of four variants that restrict THOV better than wtMxA (pink 'super-restrictors' or 'SR') are shown. C. [...]

[...] how much it contributes to the overall difference in frequencies. For example, Q540 is enriched in the super-restrictor pool in our analysis of 168 four-site MxA variants.

NNS library construction
The L4 loop five-site variant library was constructed using oligonucleotide-directed mutagenesis of the rapidly evolving sites in MxA. To mutate the five rapidly evolving sites in the L4 loop (positions 540, 561, 564, 566 and 567), two mutagenic oligonucleotides (one sense, one antisense) were synthesized (IDT) that contain sequence complementarity to 70 bp in the region encoding the rapidly evolving residues. For the targeted positions, the oligonucleotides contain NNS codons (N = A, T, C or G, and S = G or C). This biased randomization results in 32 codons that still sample all 20 amino acids, which significantly decreases library complexity and the incidence of stop codons without loss of amino acid complexity. The maximum complexity of this library is thus 20^5, or 3.2 million variants. One round of PCR was carried out with either the sense or antisense oligonucleotide and a flanking antisense or sense oligonucleotide. A second round of PCR using a combination of first-round products and both flanking primers produced the full-length double-stranded product. The full-length PCR product was purified, digested and ligated into the pQXCIP retroviral expression vector. The ligation was purified and eluted in 10 µl dH2O, which was used to transform ElectroMAX™ DH10B™ cells (ThermoFisher). 836 colonies were randomly selected and grown overnight in LB liquid medium containing 100 µg/ml of ampicillin. All variants containing stop codons introduced by NNS mutagenesis were removed and not analyzed further.
The L4 loop four-site variant library was similarly constructed, except that only four L4 positions were randomized, whereas position 561 was fixed as F. This library has a maximum complexity of 20^4, or 160,000 variants.
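The codon arithmetic behind the NNS design can be checked directly. The following is a minimal sketch, not part of the original methods: it enumerates all NNS codons, translates them with the standard genetic code, and reports how many amino acids and stop codons they cover, along with the amino-acid-level library sizes quoted above. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest, third base fastest, each over T, C, A, G).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nns_codons() -> list[str]:
    """All codons matching NNS: N = A/C/G/T at positions 1-2, S = G/C at position 3."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GC")]


def summarize_nns() -> dict:
    """Count codons, encoded amino acids and stop codons for the NNS scheme."""
    codons = nns_codons()
    translated = {codon: CODON_TABLE[codon] for codon in codons}
    return {
        "n_codons": len(codons),
        "amino_acids": sorted({aa for aa in translated.values() if aa != "*"}),
        "stop_codons": sorted(c for c, aa in translated.items() if aa == "*"),
    }


def max_protein_complexity(n_sites: int) -> int:
    """Number of distinct amino acid combinations across the randomized sites."""
    return 20 ** n_sites


if __name__ == "__main__":
    summary = summarize_nns()
    print(summary["n_codons"])          # 32
    print(len(summary["amino_acids"]))  # 20
    print(summary["stop_codons"])       # ['TAG']
    print(max_protein_complexity(5))    # 3200000
    print(max_protein_complexity(4))    # 160000
```

Only one of the three stop codons (TAG) ends in G or C, which is why NNS randomization lowers the frequency of stop codons relative to fully random NNN codons while still encoding every amino acid.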

Sequencing
The library variants and all other MxA clones in this study were sequenced in full-length using [...]

MxA L4 loop variants were cloned into the NotI/EcoRI sites of the retroviral expression vector pQXCIP-3x flag. Point mutants were generated using the Q5® site-directed mutagenesis kit (New England Biolabs).

Minireplicon assays
The THOV minireplicon assay was performed in 293T cells in a 96-well format [...]

For the IAV (H5N1) restriction analyses, we performed similar experiments but with the H5N1 minireplicon system of A/Vietnam/1203/04 as described (13) in a 12-well format, including a reporter construct encoding firefly luciferase under the control of the viral promoter, and 300 ng of the MxA expression plasmid. As in the THOV minireplicon experiments, restriction was assayed relative to the empty vector control by measuring firefly luciferase expression relative to the Renilla luciferase transfection control.

[...] cell lysates were subjected to standard Western blot analysis using antibodies against the MxA protein and THOV NP as described previously (12).
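The two-step normalization described for the minireplicon assays (firefly signal relative to the Renilla transfection control, then relative to the empty vector control) can be sketched as follows. This is an illustrative reading of that description rather than the authors' analysis script; the function names are hypothetical and the luciferase readings in the usage example are made-up numbers.

```python
def normalized_signal(firefly: float, renilla: float) -> float:
    """Firefly luciferase reading divided by the Renilla transfection control."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def relative_polymerase_activity(sample: tuple[float, float],
                                 empty_vector: tuple[float, float]) -> float:
    """Normalized reporter signal of a sample as a fraction of the empty vector control.

    Each argument is a (firefly, renilla) pair of raw readings.
    """
    return normalized_signal(*sample) / normalized_signal(*empty_vector)


def fold_restriction(sample: tuple[float, float],
                     empty_vector: tuple[float, float]) -> float:
    """How many fold the sample reduces reporter activity relative to empty vector."""
    return 1.0 / relative_polymerase_activity(sample, empty_vector)


if __name__ == "__main__":
    # Hypothetical readings, for illustration only: (firefly, renilla).
    empty = (10000.0, 100.0)
    wt_mxa = (1000.0, 100.0)
    variant = (200.0, 100.0)
    print(fold_restriction(wt_mxa, empty))
    # Potency of the variant relative to wtMxA under these made-up readings.
    print(fold_restriction(variant, empty) / fold_restriction(wt_mxa, empty))
```

Dividing by the Renilla signal first corrects for well-to-well differences in transfection efficiency, so that the comparison to empty vector reflects restriction rather than plasmid uptake.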

Logo plots
Difference logo plots compare the super-restrictor, restrictor or non-restrictor classes to the full library. For each class of clones, we constructed an amino acid position frequency matrix, and generated difference logo plots using the DiffLogo R package [30]. Amino acids shown above the y=0 line are enriched in each class of clones compared to their background frequency in the whole library, and those below the y=0 line are depleted. The total height of each stack of letters represents how different the two classes of clones are from one another, and the height of each amino acid letter reflects how much it contributes to the overall difference in frequencies.
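The quantities behind a difference logo can be illustrated without the plotting layer. The sketch below builds a position frequency matrix for a class of clones and for a background set, then computes per-residue frequency differences (positive means enriched, negative means depleted) and a per-position Jensen-Shannon divergence as a measure of overall difference. Using Jensen-Shannon divergence for stack height is an assumption about DiffLogo's default behaviour, the function names are hypothetical, and the short sequences in the usage example are toy data rather than values from the screen.

```python
from collections import Counter
from math import log2

AMINO_ACID_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(seqs: list[str]) -> list[dict[str, float]]:
    """Per-position amino acid frequencies for equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        matrix.append({aa: counts.get(aa, 0) / len(seqs) for aa in AMINO_ACID_ALPHABET})
    return matrix


def frequency_differences(cls, background):
    """Class minus background frequency at each position (>0 enriched, <0 depleted)."""
    return [{aa: p[aa] - q[aa] for aa in AMINO_ACID_ALPHABET}
            for p, q in zip(cls, background)]


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (in bits) between two residue distributions."""
    mix = {aa: 0.5 * (p[aa] + q[aa]) for aa in AMINO_ACID_ALPHABET}

    def kl(a, b):
        return sum(a[x] * log2(a[x] / b[x]) for x in AMINO_ACID_ALPHABET if a[x] > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


if __name__ == "__main__":
    # Toy two-residue sequences, for illustration only.
    class_pfm = position_frequencies(["QF", "QF"])
    background_pfm = position_frequencies(["QF", "GF"])
    diffs = frequency_differences(class_pfm, background_pfm)
    print(diffs[0]["Q"], diffs[0]["G"])
    print([js_divergence(p, q) for p, q in zip(class_pfm, background_pfm)])
```

In this toy case Q is enriched and G depleted at the first position, while the second position is identical in both sets and therefore contributes no difference.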

Statistical Analysis
Data analyses were done using GraphPad Prism 7.0 software. All data are shown as mean ± SEM. Statistical analysis was performed using unpaired two-tailed t-tests with a 95% confidence level. p values less than 0.05 were considered statistically significant.
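For readers who want to reproduce the summary statistics outside GraphPad Prism, the sketch below computes mean ± SEM and the test statistic of an unpaired t-test using only the Python standard library. It assumes the equal-variance (Student) form of the test, which is an assumption made here and not something stated in the text, and it stops at the t statistic and degrees of freedom; obtaining the two-tailed p value additionally requires the t distribution, for example from a statistics package. The function names and replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def pooled_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's unpaired t statistic (equal variances assumed) and degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


if __name__ == "__main__":
    # Hypothetical triplicate measurements, for illustration only.
    group_a = [1.0, 2.0, 3.0]
    group_b = [4.0, 5.0, 6.0]
    print(mean_sem(group_a))
    print(pooled_t_statistic(group_a, group_b))
```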