Genomic Signatures for Avian H7N9 Viruses Adapting to Humans

An avian influenza A H7N9 virus emerged in March 2013 and caused a remarkable number of human fatalities. Genome variability in these viruses may provide insights into host adaptability. We scanned over 140 genomes of the H7N9 viruses isolated from humans and identified 104 positions that exhibited seven or more amino acid substitutions. Approximately half of these substitutions were identified in the influenza ribonucleoprotein (RNP) complex. Although PB2 627K of the avian virus promotes replication in humans, 45 of the 147 investigated PB2 sequences retained the E signature at this position, which is an avian characteristic. We discovered 10 PB2 substitutions that covaried with K627E. An RNP activity assay showed that Q591K, D701N, and M535L restored the polymerase activity in human cells when 627K transformed to an avian-like E. Genomic analysis of the human-isolated avian influenza virus is crucial in assessing genome variability, because relationships between position-specific variations can be observed and explored. In this study, we observed alternative positions that can potentially compensate for PB2 627K, a well-known marker for cross-species infection. An RNP assay suggested Q591K, D701N, and M535L as potential markers for an H7N9 virus capable of infecting humans.


Introduction
A novel influenza A H7N9 virus emerged in Eastern China in March 2013. Of the 676 laboratory-confirmed cases reported through July 2015, 241 resulted in death, a remarkable 36% case fatality rate. Most cases were from China, except four from the Taipei Centers for Disease Control (Taipei CDC), twelve from the Centre for Health Protection, Hong Kong SAR, one Chinese traveler reported by Malaysia, and two Canadian travelers returning from China. The first Taiwanese H7N9 infection was reported in April 2013 in a businessman returning from the Jiangsu province of China [1,2]. The second and third infections were reported in December 2013 and April 2014 in Chinese tourists [3,4]. The fourth H7N9 infection was reported on April 25, 2014, in a businessman with a history of travel to China [5]. The second patient died of complications of septic shock from bacterial pneumonia; the other three recovered after hospitalization and clinical treatment.
The novel influenza A H7N9 virus is an avian virus that can infect humans [18,19]. In mouse experiments, the virus was more pathogenic than an avian H7N9 virus (A/duck/Gunma/466/2011) and a representative H1N1 virus (A/California/4/2009) [20]. In ferrets, A/Shanghai/2/2013 replicated to high titers in the upper and lower respiratory tracts for 6 to 7 days [21]. The virus was transmitted by direct contact and less efficiently by the airborne route [21,22]. Nevertheless, one isolate from a patient in Anhui was highly transmissible between ferrets by respiratory droplets [23]. The H7N9 virus was able to infect pigs after intranasal inoculation but did not replicate well [20,21], nor was it transmitted further to other pigs [21]. A single Q226L mutation (H3 numbering) in the influenza A hemagglutinin (HA) was reported to give the H7N9 viruses a mixed α-2,3/α-2,6 receptor preference, which increased binding to mammalian-like receptors in the human upper airway [24]. Moreover, the replication-promoting PB2 E627K mutation dominated the H7N9 patient isolates [25,26]. Other PB2 mutations, including Q591K and D701N, were also investigated for their ability to enhance polymerase activity in human 293T cells [27].
In this study, we determine the genome of the fourth H7N9 isolate from Chang Gung Memorial Hospital (CGMH) and describe the genome diversity of these Taiwanese isolates. We subsequently assess the genetic diversity of H7N9 genomes from cases reported between March 2013 and April 2015. In our previous studies, we performed large-scale scanning of the influenza A virus genomes and summarized a list of species-associated signatures to distinguish human and avian viruses [18,28,29]. Using these signature positions, we investigated how most of the avian signatures were retained in the H7N9 viruses and which signature positions may become characteristic to humans. Moreover, we performed a genome-wide scan to summarize the genetic diversity among all H7N9 viral proteins and focused on PB2 mutations that potentially enable the virus to infect humans. Finally, a reporter assay was performed to assess the influence of mutations on RNP activities.

Ethics Statement
The present study aimed to characterize the genomic heterogeneity of human H7N9 viruses in the four Taiwanese patients, as well as in all H7N9 genomes reported in the public databases since March 2013. The three genomes belonging to the first three Taiwanese patients were among the database genomes we collected. These genomic sequence records are available to the general public through web services (the Influenza Virus Resource and the Global Initiative on Sharing Avian Influenza Data). No patient information was ever made available through these services. The two genomes sequenced in this study were derived from an H7N9 virus isolate deposited in a virus bank maintained in the Clinical Virology Laboratory of CGMH. Such isolates were cultured from clinical specimens collected by physicians for the purpose of medical diagnosis or public health investigation. Clinical information from these specimens was anonymized and de-identified prior to inclusion in the virus bank, such that the authors had no access to any identifying information at any time.

Virus Isolation
The virus was isolated from sputum samples of the aforementioned patient admitted to CGMH by using the Madin-Darby canine kidney (MDCK) cell line maintained in Dulbecco's modified Eagle's medium (DMEM, Gibco, Grand Island, NY, USA) containing 0.1 mg/mL trypsin. For virus propagation, 200 μL of the sputum samples was injected into the allantoic cavity of 10- to 11-day-old embryonated chicken eggs and incubated at 37°C for 3 days; subsequently, the allantoic fluid of the inoculated eggs was harvested. Labels 4-CGMH1 and 4-CGMH2 represent the viruses isolated from egg passage 1 and MDCK cell passage 2, respectively, and "4" represents the fourth case of H7N9 in Taiwan.

RT-PCR and Sequencing
Viral RNAs were extracted from the cell culture supernatant or the allantoic fluid of the inoculated chicken eggs according to the manufacturer's instructions using a QIAamp Viral RNA Mini Kit (Qiagen, Valencia, CA, USA). The RNA was reverse transcribed into a cDNA by using SuperScript III reverse transcriptase (RT) (Invitrogen, Carlsbad, CA, USA). The polymerase chain reaction (PCR) was performed using a proofreading DNA polymerase KOD-plus (Toyobo, Osaka, Japan) and the specific primers listed in S1 Table. The following PCR conditions were applied: 40 cycles of 94°C for 30 s, 50°C or 55°C for 30 s, and 68°C for 2 min (BioMetra Thermocycler, Biometra, Göttingen, Germany). The PCR products were isolated through electrophoresis on 1% agarose gel, and appropriate-size amplicons were excised from the gel and purified using a QIAquick gel extraction kit (Qiagen, Valencia, CA, USA). Nucleotide sequencing was performed according to the manufacturer's protocols using the BigDye terminator cycle sequencing kit (Version 3.1, Applied Biosystems, Carlsbad, CA, USA). The nucleotide sequences were assembled using the SeqMan program (DNASTAR, Madison, WI, USA).

Sequence Analysis
Protein translation and multiple sequence alignment were performed using BioEdit (Version 7.2.5) [32]. The position-specific amino acid compositions were plotted using WebLogo 3 [33]. A phylogenetic tree was inferred using the Neighbor-Joining method with 1,000 replicates as implemented in MEGA6 [34]. A mutual information (MI) analysis was performed on the Mutual Information Server To Infer Coevolution (http://mistic.leloir.org.ar/) to detect coupled mutations [35]. Electrostatic potentials were predicted using Chimera [38] by applying Coulomb's law, and the surfaces were colored from −4 kT/e (red) to 4 kT/e (blue). A/little yellow-shouldered bat/Guatemala/060/2010(H17N10) (PDB ID 4WSB) was used as the template to construct a plausible conformation for the entire PB2 protein.

Chloramphenicol Acetyl Transferase Enzyme-linked Immunosorbent Assay
RNP activity was measured using a chloramphenicol acetyl transferase enzyme-linked immunosorbent assay (CAT ELISA; Roche, Indianapolis, IN, USA). First, 293T cells were cotransfected with 1 μg of the H7N9 RNP components cloned into pcDNA3.1. Concurrently, a plasmid containing a reporter gene (CAT) flanked by the viral promoters (pPOLI-CAT-RT) was transfected into the cells. The total cell lysate was extracted 48 h after transfection using the 1× lysis buffer provided in the CAT ELISA kit. After quantifying the protein content, each protein sample was diluted to 5 μg/μL and serially diluted using the lysis buffer. Subsequently, the sample was assayed according to the manufacturer's instructions to determine the CAT levels.

Table 1 lists the amino acid variations among the five H7N9 virus genomes reported in Taiwan. TW1/2013, TW2/2013, and TW2/2014 are from the first, second, and third patients, reported in April and December 2013 and April 2014, respectively. CGMH1 and CGMH2 are two genomes from the same specimen of the fourth patient and represent viruses from egg passage 1 and MDCK passage 2, respectively.

Genetic Characteristics of Taiwanese H7N9 Genomes
With 10 amino acid substitutions, PB2 had the most substitutions (Table 1), followed by HA and NA, each with seven. In addition, PB1 and PA of these H7N9 genomes each had five positions exhibiting heterogeneity. PB1-F2, a 90-amino acid product, is alternatively translated from frame 2 of PB1 [39]. PA-X, a 252-amino acid product, is translated through a frame-shifting mechanism [40], in which the first 191 amino acids are the same as in PA (coding sequence 1..573) and the remaining 61 amino acids come from frame 2 of PA (coding sequence 575..760). Although only two amino acid substitutions were observed in PA-X (Table 1), an additional substitution at position 61 of PA-X was omitted from the table because PA-X and PA share the first 191-amino acid segment. No variation was observed in NS2 among these Taiwanese genomes.
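The PA-X coordinates given above can be checked with a short sketch. It assumes that the coordinates are 1-based and inclusive, that the single nucleotide at position 574 is skipped by the frameshift, and that the standard genetic code applies; the function names `translate` and `pa_x_protein` are illustrative and are not taken from the study.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    nt = nt.upper().replace("U", "T")
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def pa_x_protein(pa_cds):
    """Assemble PA-X from a PA coding sequence (assumed coordinates, see lead-in)."""
    n_term = translate(pa_cds[0:573])     # positions 1..573, shared with PA
    c_term = translate(pa_cds[574:760])   # positions 575..760, shifted frame
    return n_term + c_term.split("*")[0]  # drop the stop codon and anything after
```

With these coordinates, 573 nucleotides give 191 codons, and positions 575..760 span 186 nucleotides, that is, 61 codons plus a stop codon, which matches the 252-amino acid length stated in the text.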
Among the 46 positions listed in Table 1, 24 had amino acid substitutions between the first and second genomes, and 33 between the second and third genomes, indicating that substitutions were frequent. Thirteen of the 33 substitutions between the second and third genomes reverted to the original residues in TW1/2013. TW2/2014 and CGMH2 (MDCK passage 2) had identical genomes at the amino acid level. The two CGMH genomes, however, differed at two positions, NA 16 (I vs T) and PB1 586 (R vs K). A number of the amino acid residues listed in Table 1 are well known for their roles in virulence and have been annotated [9,13,14,17,41-43]. In particular, both PB2 K627E and D701N were observed only in TW2/2014 and the two CGMH genomes; 627E is characteristic of avian species. Amino acid residues for the H7N9 vaccine candidate A/Anhui/1/2013 were included in Table 1 for reference, as were the amino acid compositions at these positions for the avian H7N9 viruses that we downloaded from NCBI. Mutations relative to A/Anhui/1/2013 are shown in bold; most were seen in the two Taiwanese isolates from the third and fourth patients in 2014, suggesting that the H7N9 viruses have drifted away from the vaccine candidate. Many avian H7N9 viruses exhibited diverse genetic makeups at the amino acid positions listed in Table 1. In particular, PB2 191, 559, 570, and 627 and PA-X 194 (underlined in Table 1) each displayed a dominant residue different from the one in A/Anhui/1/2013. HA Q226L (Q235L in H7 numbering) was seen in the second through the fourth Taiwanese patients. Even in avian H7N9 viruses, this mutation affecting receptor binding was already present in 396 of the 460 HA sequences we examined. Two PB2 mutations, Q591K and D701N, have been reported to enhance the polymerase activity of avian viruses in human 293T cells [27]. PB2 591 remained Q in all Taiwanese isolates, A/Anhui/1/2013, and all avian H7N9 viruses that we analyzed (data not shown). While PB2 701D was seen in A/Anhui/1/2013, a mutation to N was observed in two of the four Taiwanese isolates. In contrast, only one of the avian H7N9 viruses displayed D701N.

Human H7N9 Residues on Human-Avian Signature Locations
Over 140 H7N9 genomes recorded between late March 2013 and late April 2015 were retrieved from the NCBI and GISAID databases to comprehensively explore the H7N9 amino acid transitions. In a previous study, we reported 47 species-associated signatures that potentially marked a genetic boundary at which an avian influenza A virus can efficiently transmit to or replicate in humans [29]. Table 2 lists these residues and the associated amino acid compositions for the human H7N9 viruses. The residues for CGMH2 and A/Shanghai/2/2013 were included for comparison. HA and NA positions are missing in Table 2 because they were excluded in our previous studies [28,29]. Avian-like residues were identified in most of these signature positions for the H7N9 virus, indicating that it is avian in origin. A few exceptions included PB2 627 and PA 100, 356, and 409 because they exhibited human-like residues K, A, R, and N, respectively. Conversely, CGMH2 included only two human-like residues PA 100A and 356R. In addition, human H7N9 exhibited a truncated NS1 that was a 217-amino acid segment, thus missing the signature at position 227.
Table 1 annotations: HA Q235L (H7 numbering, or Q226L in H3) is a receptor-binding site for human [42]. NA 68 (both H7 and H3 numbering) at neuraminidase stalk [58] to affect NA activity. PB2 E627K increases virus replication in mammalian cells [13,14,17]. PB2 D701N enhances transmission in guinea pigs [9].

Position-specific Amino Acid Variations for H7N9
Only four human-avian signature positions developed human-like residues that dominated the H7N9 population (Table 2). No human-like residues were observed at the other signature positions, suggesting that some nonsignature positions contribute to human infections. We scanned the entire set of 12 protein alignments and summarized the compositions for the positions exhibiting genetic diversity. Table 2). In other words, more amino acid positions were free to evolve as nonsignature positions compared with the signature positions associated with the human-avian boundary. PB2 exhibited the most variations (17), followed by HA (16), PB1-F2 (13), PB1 (11), PA (11), NS1 (11), and NA (10). Of these 104 logos, 62 belonged to the RNP genes.
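A position-by-position scan of this kind can be sketched as follows. The sketch assumes that a "substitution" at a position means any residue other than the most common one in that alignment column, and that a position is reported when at least seven sequences carry such a residue; the study's exact counting rule is not spelled out here, and the function name `variable_positions` is illustrative.

```python
from collections import Counter


def variable_positions(alignment, min_substitutions=7, gap="-"):
    """Return {1-based position: residue counts} for columns with enough
    non-majority residues. `alignment` is a list of equal-length strings."""
    if not alignment:
        return {}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = {}
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        if not counts:
            continue  # column contains only gaps
        majority = counts.most_common(1)[0][1]
        substitutions = sum(counts.values()) - majority
        if substitutions >= min_substitutions:
            result[col + 1] = dict(counts)
    return result
```

The returned residue counts per position are the same information a sequence logo displays, so the output could be fed into a plotting step for each protein alignment.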

Amino Acid Cosubstitution in H7N9 PB2 Proteins
We further explored the interlacing of the 17 PB2 substitutions (Fig 1). Table 3 summarizes the multiple sequence alignment for these substitutions. PB2 sequence of A/Shanghai/2/2013 was used as the baseline for displaying residue changes. Sequences with no or only one substitution at these 17 locations were excluded for brevity. The remaining 78 sequences were divided into three temporal groups based on influenza seasonality: season I from March to September 30, 2013 (20 viruses), season II from October 1, 2013 to September 30, 2014 (40 viruses), and season III after October 1, 2014 (18 viruses). Within each season, the order of appearance was arbitrarily selected to enhance the illustration of the substitution patterns. In particular, amino acid substitutions cooccurring with K627E were bolded, and their counts were summarized (Table 3).
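The seasonal grouping described above amounts to a simple date rule, sketched here. Two details are assumptions: season I is taken to start on March 1, 2013, and a virus collected exactly on October 1, 2014 is assigned to season III. The function name `h7n9_season` is illustrative.

```python
from datetime import date


def h7n9_season(collected):
    """Assign a collection date to season I, II, or III as defined in the text."""
    if collected < date(2013, 3, 1):
        raise ValueError("date precedes the start of season I")
    if collected <= date(2013, 9, 30):
        return "I"
    if collected <= date(2014, 9, 30):
        return "II"
    return "III"
```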
Along with PB2 K627E, four strains isolated early in season I exhibited Q591K. Such a cosubstitution reemerged only in a recent Xinjiang strain on January 8, 2015. Moreover, 14 viruses exhibited another covarying position D701N with K627E observed in seasons I and II but missing in season III. Although no cooccurrence of Q591K, K627E, and D701N was observed, other amino acid changes coemerged with K627E at various degrees. For example, V139I, K191E, V511I, M535L, N559T, M570I, I647V, and M676V each had cosubstitutions ranging from 8 to 21 instances with respect to K627E. Although S286G, R340K, M473V, K526R, T569A, and A588V were observed (Table 3), their association with K627E was seen in four or fewer instances.
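Cosubstitution counts of this kind can be tallied with a few lines of code. The sketch below assumes each strain is represented by the set of its substitutions relative to the baseline sequence; the strain names in the usage example are hypothetical placeholders, not isolates from Table 3.

```python
def cosubstitution_counts(strains, anchor="K627E"):
    """Count how often each substitution appears together with `anchor`.

    `strains` maps a strain name to the set of substitutions it carries
    relative to the baseline sequence.
    """
    counts = {}
    for substitutions in strains.values():
        if anchor not in substitutions:
            continue
        for sub in substitutions:
            if sub != anchor:
                counts[sub] = counts.get(sub, 0) + 1
    return counts


# Hypothetical input, for illustration only.
example = {
    "strain_1": {"K627E", "Q591K"},
    "strain_2": {"K627E", "D701N", "M535L"},
    "strain_3": {"M535L"},
}
print(cosubstitution_counts(example))
```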
Certain PB2 sites (Table 3) Table 3. The trio emerged in April 2013 in the early stage of the outbreak and persisted in the H7N9 population throughout almost all of seasons II and III, except that M570I seemed to fade out in half of the strains collected in season III. The alignment containing 79 PB2 sequences and 17 amino acid positions from Table 3 was used to infer coupled mutations using MI. S1 Fig shows the covariation network for these 17 PB2 positions, including the predicted MI scores depicting the degree of covariation between any of the paired mutations. The same alignment, as well as the 79 full-length PB2 sequences (759 aa long), was used to infer the two phylogenetic trees shown in S2 Fig, on which the 17 amino acid mutations outlined in Table 3 are labeled on the tree branches to trace their evolutionary pathways.
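The idea behind an MI score for a pair of alignment columns can be illustrated with the textbook definition, MI = Σ p(a,b) log2[p(a,b) / (p(a) p(b))]. This is a minimal sketch of uncorrected MI only; the server used in this study may apply further corrections and normalizations, so its reported scores should not be expected to match the values computed here. The function name `mutual_information` is illustrative.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Uncorrected mutual information (in bits) between two alignment columns,
    given as equal-length sequences of residues, one entry per strain."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi
```

Two columns that always change together (for example, K with D and E with N in equal proportions) give 1 bit, whereas two columns that vary independently give 0.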

CGMH2 PB2 Protein Stereography
As listed in Table 3, 42 of the listed H7N9 viruses exhibited the human-like signature K at position 627, and 36 retained the avian-like E signature. This suggests the existence of other mutations that could compensate for the K627E change, enabling this avian virus to infect humans. To resolve the spatial relationships among the amino acid positions that covaried with PB2 627, a CGMH2 PB2 structure was modeled on the full-length PB2 of A/little yellow-shouldered bat/Guatemala/060/2010(H17N10) (PDB ID 4WSB). This bat influenza H17N10 virus was chosen because it provides the most recent and, thus far, the only resolved full-length PB2 structure [44]; the other commonly used template, the avian influenza H5N1 PB2 C-terminal domain (CTD) structure (PDB ID 3KC6), is 204 aa long and covers only positions 538 to 741 of the full-length PB2 protein. Although the PB2 of this bat virus shares only 67.6% identity with the full-length PB2 of CGMH2, the CTD structure simulated using 4WSB was qualitatively comparable with the one simulated using 3KC6 (data not shown). Similar to the findings of a previous report [9], residues 591Q and 627E shared a surface (Fig 2). Residues 139V, 191E, 473M, 511I, 526K, 647V, and 701N, which were spatially dispersed and distant from 627E, are illustrated in Fig 2C. Fig 2D illustrates the other side of the simulated protein structure, on which residues 569T and 570I were right next to each other and within the 627 domain.

RNP Activity for Covarying PB2 Amino Acids in Human H7N9 Virus
A CAT reporter RNP activity assay was performed to evaluate the effects of the PB2 K627E covariations. Only 10 mutations (amino acid positions with asterisks in Table 3) displaying at least six cosubstitutions with the 36 strains having K627E were tested. As shown in Fig 3, the RNP activity of PB2 627E (a signature characteristic to avian species) markedly reduced to 7.9% in the human cells. The Q591K and D701N residues accompanying K627E considerably restored the RNP activity back to 65.9% and 70.2%, respectively (both with P < 0.001). M535L is another mutation exhibiting the compensatory effect with 627E (17.7%, P < 0.05). However, the other seven amino acid substitutions that covaried with K627E exhibited no such effect.
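Percentages such as these are relative activities. A minimal sketch of the normalization is given below, assuming that each construct's CAT reading is expressed as a percentage of the reading for the construct carrying PB2 627K; the readings in the usage example are hypothetical numbers chosen only to show the calculation and are not data from Fig 3.

```python
def relative_activity(readings, reference="627K"):
    """Express each CAT reading as a percentage of the reference construct."""
    ref = readings[reference]
    if ref <= 0:
        raise ValueError("reference reading must be positive")
    return {name: 100.0 * value / ref for name, value in readings.items()}


# Hypothetical CAT readings (arbitrary units), for illustration only.
example_readings = {"627K": 2.0, "627E": 0.5, "627E+Q591K": 1.5}
print(relative_activity(example_readings))
```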

Discussion
The Q226L mutation of the influenza HA protein was reported to increase binding to receptors in the human upper airway [24]. Of the 142 H7N9 HA sequences we examined in this study, 133 already showed L at this position. Only 4 strains displayed Q, including A/Shanghai/1/2013, A/  (Table 1). This raises the concern that avian H7N9 viruses will continue to infect humans.
As presented in Table 1, the two CGMH viruses had identical genome sequences except at NA 16 and PB1 586. The two genomes were sequenced from the same specimen cultured using different culture systems. This result is consistent with those of other studies, in which sequence discrepancies were observed among different viruses isolated from a single patient. Lin et al. [45] determined the genome sequences from three samples of the first Taiwanese patient with an H7N9 infection: one sputum sample and one throat swab taken on the same day, and a second throat swab taken on another day. At PB2 627, a virulence factor for efficient viral replication, the sputum specimen produced residue K, which promotes replication in mammalian cells, whereas the other two retained E, as observed in most avian species. Through a pyrosequencing assay, Mok et al. [46] revealed that both residues R and K were present as quasispecies at the oseltamivir-resistance marker NA 292 (equivalent to position 289 in H7N9 numbering); this was also observed in the sputum specimen of the first Taiwanese H7N9 patient. A change from R to K renders the virus resistant to oseltamivir. These examples demonstrate how the viral quasispecies exhibit intrahost genome variability among samples collected from different tissues, on different days of disease progression, or when using different laboratory host systems. RNA viruses, in particular the influenza A viruses, are known to mutate frequently. The low fidelity of genome replication could explain the sequence discrepancies observed in these works. Whether such mutations arose spontaneously and randomly or resulted from different selection pressures, such as tissues, culture systems, or host adaptation during disease progression, requires further investigation.
Although the two influenza virus surface genes HA and NA are subject to host selection and are therefore generally more variable than the internal genes, the H7N9 RNP genes seemed to exhibit higher genetic diversity than HA and NA did. This can be seen in Fig 1, where numerous RNP residues exhibited more divided logos, suggesting the importance of the polymerase genes in the evolution of the H7N9 virus for host adaptation. Another possible source of the observed genetic heterogeneity is the diverse avian genomes inherited by the H7N9 viruses. The novel H7N9 virus inherited its six internal genes from avian H9N2 viruses. Chen et al. [18] compared the H7N9 PB2 sequences with 287 avian H9N2 PB2 sequences and reported that their percent identities varied markedly, from as low as 81.4% to as high as 99.2%, indicating the intrinsic genetic diversity in H7N9 PB2 from various bird populations. Additional evidence for PB2 diversity was obtained from four of the earliest H7N9 human isolates, reported within two weeks in Shanghai (A/Shanghai/1/2013 to A/Shanghai/4/2013); in these isolates, 6 of the 759 amino acid positions had already exhibited substitutions (data not shown). These observations indicate that the novel H7N9 viruses inherited PB2 from multiple avian origins.
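Percent identity between two aligned sequences can be computed as the share of matching positions. The sketch below assumes pairwise-aligned sequences of equal length and skips any column in which either sequence has a gap; other conventions for handling gaps exist and would give slightly different values. The function name `percent_identity` is illustrative.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared
```

For a 759-aa PB2, each differing residue lowers the identity by roughly 0.13 percentage points, so an identity of 99.2% corresponds to about six differing positions.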
In addition to mutations at PB2 627, mutations at PB2 591 and 701 are strongly involved in mammalian host restriction [47]. In this study, we demonstrated that Q591K and D701N, each accompanied by K627E, restored the RNP activity (P < 0.001). These results affirm the finding by Mok et al. [27] that Q591K and D701N are crucial in compensating for the absence of 627K. In addition, M535L+K627E restored the RNP activity (P < 0.05). Although positions 591 and 627 were structurally close, with apparent involvement of the same host factors, residues 535 and 701 were spatially distant from 627, suggesting the involvement of a different mechanism. Of the 147 PB2 sequences we analyzed, only five had Q591K, and four of them occurred at an extremely early stage of the H7N9 outbreak, in March and April 2013. In all, 15 D701N substitutions were observed, and none occurred after January 2014. Conversely, 23 M535L substitutions were observed across all three seasons, signifying its regular presence and potential role in the increasing adaptation of H7N9 to humans.
Several other amino acids exhibited covarying patterns with K627E that were consistent with those exhibited by Q591K, M535L, and D701N. However, the RNP activity assays suggested no functional association between the amino acid changes at these positions and PB2 K627E. Some of these substitutions, however, formed their own coevolving patterns. For example, we found {V139I, S286G, T569A, M676V}, occurring only in season II (November 2013 to January 2014), in 12 strains from Shenzhen and Hong Kong, regardless of whether they were accompanied by 627E or K. Such amino acid coevolution suggests possible involvement in polymerase activities. We did not rule out other coevolving residues that emerged in other H7N9 genes or across different genes. Further investigation is warranted to determine whether these changes occurred spontaneously or were correlated with host adaptability or other functions.
RNP activity is measured by a reconstitution assay. The influenza viral genes PA, PB1, PB2, and NP are cotransfected into cells to express the RNP complex. Meanwhile, a reporter gene, such as luciferase, CAT, or GFP, flanked by viral promoters is also transfected into the cells to be driven by the RNP complex [48]. The RNP activity assay has been applied not only to H7N9 [26,27,49] but also to H5N1 avian influenza viruses [6,50,51], the 2009 pandemic H1N1 influenza virus [52-55], and the seasonal H3N2 influenza virus [51,54]. RNP activity does not entirely correlate with the replication rate [56]. As a result, conclusions about viral replication efficiency during cross-species transmission that rest on this assay alone should be considered tentative. Nevertheless, it is a reliable surrogate assay for examining the influence of any mutation in the RNP genes on RNP activity. Moreover, PB2 is known to enter the mitochondrion [57]. In addition to affecting RNP activity, the K627E mutation and other cosubstitutions may affect the mitochondrial localization of the PB2 protein. Such a mechanism may also be associated with the adaptation of this virus to humans.
Chen et al. proposed species-specific signatures that have been widely used as markers to assess the possibility of human infection from avian influenza viruses. These signatures were determined according to an analysis of 306 human and 95 avian virus genomes in 2006 [28]; these signatures were revalidated in 2009 on the basis of an analysis of over 3,000 genomes of each of the two viruses [29]. Although an avian influenza A virus can be reasonably assumed to have acquired more human signatures thereby increasing its potential to infect humans, validation of such an assumption using only sequence analysis is challenging. Although biological experiments may be useful in proving such an assumption, these experiments may confer the risk of generating a potential pandemic strain. Currently, evidence supporting a correlation between the number of species-specific signatures and the possibility of human infection is lacking. Nevertheless, analyzing human-isolated avian influenza virus genomes is crucial because they may reveal alternative positions that compensate for mutations at certain signature or nonsignature positions.

Conclusion
Numerous human-isolated avian influenza A viruses exhibit PB2 627K, which is a human-specific signature, and strong biological evidence indicates that an E627K mutation promotes avian viral replication in mammals. However, 45 of the 147 human H7N9 isolates investigated in this study exhibited the avian signature E at this position. Additional sequence analysis suggested that 10 PB2 substitutions could potentially increase the efficiency of an avian virus in infecting humans in the absence of PB2 627K. We used a reporter assay to test all of these substitutions and showed that the Q591K, M535L, or D701N mutation increases the viral RNP activity in human cells in the presence of PB2 627E. This suggests that E627K, Q591K, M535L, and D701N are crucial markers for assessing the potential of an avian virus to infect humans, and that they may contribute to further adaptation through which the virus could gain human-to-human transmission capability and lead to a future pandemic.

Supporting Information
S1 Fig. PB2 amino acid sites exhibiting coupled mutations using mutual information (MI). Based on the alignment in Table 3 (79 PB2 sequences for 17 amino acid sites) using Mutual Information Server To Infer Coevolution (http://mistic.leloir.org.ar/). MI scores are labeled on the arcs connecting these amino acids. Arcs are of variable thickness to approximate the MI scores. An MI threshold of 6.5 was used according to the server default setting. K627E was found coupled with D701N, N559T, K191E, K526R and Q591K. Two nearly remote clusters were also identified, including (M473V, V511I, M535L, I647V) and (V139I, S286G, T569A, M676V). (TIF)
S2 Fig. PB2 phylogenetic trees for H7N9 viruses. PB2 sequences of H7N9 viruses listed in Table 3 were used by MEGA 6.0 to produce the Neighbor-Joining tree with 1,000 pseudo replicates. Amino acid substitutions were labeled at the tree branches to follow the trend of these 17 mutations. (A&B) The alignment contains only 17 amino acid positions that we intend to follow. (C&D) The alignment contains the entire 759-aa PB2 sequence. (TIF)
S1