A Comparative Study of Short Linear Motif Compositions of the Influenza A Virus Ribonucleoproteins

Protein-protein interactions through short linear motifs (SLiMs) are an emerging concept that is different from interactions between globular domains. SLiMs encode a functional interaction interface in a short (three to ten residues), poorly conserved sequence. This characteristic makes them much more likely to arise or disappear spontaneously via mutations, and they may be more evolutionarily labile than globular domains. The diversity of SLiM composition may provide functional diversity for a viral protein from different viral strains. This study was designed to determine the different SLiM compositions of ribonucleoproteins (RNPs) from influenza A viruses (IAVs) from different hosts and with different levels of virulence. The 96 consensus sequences (regular expressions) of SLiMs from the ELM server were used to conduct a comprehensive analysis of the 52,505 IAV RNP sequences. The SLiM compositions of RNPs from IAVs from different hosts and with different levels of virulence were compared. The SLiM compositions of 845 RNPs from highly virulent/pandemic IAVs were also analyzed. In total, 292 highly conserved SLiMs were found in RNPs regardless of the IAV host range. These SLiMs may be basic motifs that are essential for the normal functions of RNPs. Moreover, several SLiMs that are rare in seasonal IAV RNPs but are present in RNPs from highly virulent/pandemic IAVs were identified. The SLiMs identified in this study provide a useful resource for experimental virologists to study the interactions between IAV RNPs and host intracellular proteins. Moreover, the SLiM compositions of IAV RNPs also provide insights into signal transduction pathways and protein interaction networks in which IAV RNPs might be involved. Information about SLiMs might be useful for the development of anti-IAV drugs.


Introduction
Protein-protein interactions can be categorized into the following four classes: domain-domain interactions, mutual fit interactions, induced fit interactions and linear motif-domain interactions [1]. The binding site for linear motif-domain interactions is a short peptide of only a few (three to ten) residues that is called a "short linear motif" (SLiM) [1]. Three characteristics differentiate SLiMs from globular domains. The first characteristic is the ability of SLiMs to encode a functional interaction interface in a short (three to ten residues) and often poorly conserved sequence. The short length of the motifs also makes them much more likely to arise or disappear spontaneously via mutations, which makes them more evolutionarily labile (i.e. likely to appear de novo in unrelated protein sequences) [1]. The second characteristic is that the richness of potential motif-domain interactions is higher than that of domain-domain interactions within a given length of sequence. The third characteristic is that, because only a small number of residues are involved, the interactions tend to be transient and have low binding affinities. Therefore, they are well suited for mediating functions that require a fast response to changing stimuli, such as interactions between SH2 domains (which bind a phosphorylated tyrosine) and phosphorylation sites on their binding partners. These three characteristics may provide a flexible molecular basis that gives the fast-evolving proteins of RNA viruses great versatility.
Several pioneering studies were significant for the characterization of SLiMs in viral proteins. Davey et al. collected 52 experimentally validated SLiMs present in viral proteins [2]. These examples of viral SLiMs are present in highly studied viral proteins that are responsible for relevant diseases, such as cancers (human papillomavirus, Epstein-Barr virus, human T-cell lymphotropic virus and adenovirus), immunodeficiency (HIV) or the flu (influenza). A comprehensive SLiM database, the Eukaryotic Linear Motif (ELM) Resource for Functional Sites in Proteins, has since been established [3]. Based on the motif patterns provided in the ELM database, computational analysis can be performed to identify high-potential SLiMs in target proteins, which can reduce the arduous and costly laboratory procedures that are otherwise required to identify them.
The ribonucleoprotein (RNP) complex of influenza A virus (IAV), which is composed of the PA, PB1, PB2 and NP proteins, is essential for virus replication in cells. The RNP complex replicates the segments of the RNA virus genome and transcribes its genes [4]. Moreover, the RNP complex affects the evolution of IAV through its error-prone RNA polymerase, which produces variants of the viral proteins, including the HA, NA and the RNP themselves. Therefore, virus strains that are better adapted to a new host species are created [5]. Additionally, the RNP complex represents a promising drug target because its activities are distinct from RNA polymerase found in the host cell [6]. However, despite its biomedical importance, the absence of detailed SLiM information of the RNPs has limited our mechanistic understanding of RNP functions and the ability to design better drugs.
The present study sought to gain a deeper understanding of IAV RNP-host interactions that affect RNP activity in human cells. Using a functional proteomics approach, 96 SLiM consensus sequences (regular expressions) from the ELM server [3] were used to perform a systematic and comprehensive analysis of IAV RNPs. A comparative study of the SLiM composition of RNPs from IAVs from different hosts and from highly virulent/pandemic (HP) IAV strains was performed. Several SLiMs that might affect RNP function were identified, including highly conserved SLiMs, IAV host specific SLiMs and/or HP IAV specific SLiMs. The results of this study not only provide information on the SLiM compositions of IAV RNPs but also provide insights into the signal transduction pathways and protein interaction networks in which IAV RNPs might be involved.

Data
In total, 63,237 sequences from IAV RNPs were retrieved from the NCBI Influenza Database. After checking for completeness by assessing the N-terminus and the length, 52,505 IAV RNP sequences were used in this study. This data set includes 18,952, 29,230 and 4,323 RNP sequences from IAVs from avian, human and mammalian hosts, respectively (Information S1). Hosts of the avian and mammalian IAVs are listed in Information S2. A set of 845 RNP sequences (Information S3) from highly virulent/pandemic (HP) IAVs was also analysed, including the 1918 H1N1 IAV from the "Spanish Flu", the H2N2 IAV from the 1957 outbreak, the H3N2 IAV from the 1968 outbreak, the H1N1 IAV from the 1977 Russia outbreak, the 2009 H1N1 IAV from the "swine flu", the H5N1 IAV from the 1997 Hong Kong outbreak and the 2004-2008 highly pathogenic H5N1 IAVs from Vietnam, Indonesia and Thailand.
Information regarding the SLiMs was retrieved from the ELM server (the Eukaryotic Linear Motif Resource for Functional Sites in Proteins) [3]. SLiMs were classified into four types: protease cleavage sites (prefix CLV), protein motif interacting/binding sites (prefix LIG), posttranslational modification sites (prefix MOD) and subcellular targeting signals (prefix TRG) [3]. In total, 96 SLiMs that are each supported by more than five real sequences were used in this study and are listed in Information S4.

Statistical Methods
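The tests for differences among k proportions were performed as follows [7]: $\chi^2 = \sum_{i=1}^{k} \frac{(X_i - n_i\bar{p})^2}{n_i\bar{p}\bar{q}}$, where $X_i$ is the number of sequences harboring the SLiM among the $n_i$ sequences of class $i$, $\bar{p} = \sum_{i=1}^{k} X_i / \sum_{i=1}^{k} n_i$ and $\bar{q} = 1 - \bar{p}$. The log-likelihood ratio tests for independence were performed as follows [7]: $G = 2\sum_{i}\sum_{j} f_{ij} \ln (f_{ij} / \hat{f}_{ij})$, where $\hat{f}_{ij}$ is the expected frequency of $f_{ij}$. The degrees of freedom are $\nu = (i-1)(j-1)$, where $i$ and $j$ here denote the numbers of rows and columns of the contingency table.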
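The two test statistics can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's Perl program: the function names and the example counts are hypothetical, the formulas are the standard textbook forms of the two tests, and p values (which need the chi-square distribution) are left out.

```python
import math

def chi_square_k_proportions(successes, totals):
    """Chi-square statistic for H0: all k classes share one proportion."""
    p_bar = sum(successes) / sum(totals)      # pooled proportion
    q_bar = 1.0 - p_bar
    stat = sum((x - n * p_bar) ** 2 / (n * p_bar * q_bar)
               for x, n in zip(successes, totals))
    return stat, len(totals) - 1              # statistic, degrees of freedom

def g_test_independence(table):
    """Log-likelihood ratio (G) statistic for an r x c contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    g = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if f > 0:                         # a zero cell contributes nothing
                g += f * math.log(f / expected)
    return 2.0 * g, (len(row_sums) - 1) * (len(col_sums) - 1)

# Hypothetical counts: sequences with a SLiM in two host classes of 100 each.
stat, df = chi_square_k_proportions([50, 10], [100, 100])
# Same data as a presence/absence table (rows: host class).
g, g_df = g_test_independence([[50, 50], [10, 90]])
```

Both statistics are compared against a chi-square distribution with the returned degrees of freedom.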
The Shannon entropy H was introduced by Shannon as a measurement of uncertainty [8]. This method has been applied to measure the diversity of amino acids to identify biologically important amino acids in viral proteins from Papillomavirus [9], West Nile virus [10], HCV [11] and IAV [12,13,14]. The Shannon diversity index of each SLiM was computed by the formula $H = -\sum p \log p$, where p is the proportion of each SLiM [7].
Identity Distributions of Pairwise Alignments
For a given SLiM, all sequences harboring the SLiM from each host class were used to perform pairwise alignments and to compute the identity of each pair. For example, the SLiM LIG_PTB_Apo_2_328 was identified in 4777, 4715 and 746 PA sequences from avian, human and mammalian IAVs, respectively. The 11,407,476 identities from (4777 × (4777 − 1)/2) pairwise alignments were computed using PA sequences from avian IAVs. Similarly, the 11,113,255 identities from (4715 × (4715 − 1)/2) pairwise alignments were computed using PA sequences from human IAVs. The 277,885 identities from (746 × (746 − 1)/2) pairwise alignments were computed using PA sequences from mammalian IAVs. Then, the distributions of the three sets of identities were plotted together.
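As a quick check of the quantities in this section, the following Python sketch computes a Shannon index and the number of sequence pairs. It is illustrative only: the function names are hypothetical, and the natural logarithm is an assumption because the text does not state the base.

```python
import math

def shannon_index(proportions):
    """H = -sum(p * ln p); categories with p = 0 are skipped."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

def n_pairs(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Pair counts for the avian, human and mammalian PA sequence sets above.
pair_counts = [n_pairs(n) for n in (4777, 4715, 746)]
```

The three pair counts reproduce the figures quoted in the text (11,407,476, 11,113,255 and 277,885).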

Perl Programming
The computer programs that were used in this study for data manipulation and pattern (regular expression) matching were written by the author using the Perl programming language. The programs used for this data analysis are available on request.
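The pattern-matching step can be illustrated with a short sketch. It is written in Python rather than Perl, and it is not the author's program: the function name, the motif name `DEMO_NxST`, its pattern and the test sequence are all hypothetical. The lookahead wrapper is one common way to report overlapping matches.

```python
import re

def find_slims(sequence, patterns):
    """Return (motif name, 1-based start, matched text) for every hit.

    patterns maps a motif name to a regular expression string.
    """
    hits = []
    for name, regex in patterns.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        for m in re.finditer(f"(?=({regex}))", sequence):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: (h[1], h[0]))

# Hypothetical motif: N, then any residue except P, then S or T.
demo_patterns = {"DEMO_NxST": "N[^P][ST]"}
demo_hits = find_slims("MANGSNPTNASK", demo_patterns)
# Identifiers in the style used in this study: motif name plus position.
demo_ids = [f"{name}_{start}" for name, start, _ in demo_hits]
```

The identifiers in this study appear to follow the same scheme, with the position appended to the ELM motif name (for example LIG_MAPK_1_584).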

An overview of the motif-based diversity of IAV RNP sequences
In total, 96 SLiM consensus sequences (regular expressions) were retrieved from the ELM server and were used to analyze the diversity of SLiM compositions for 52,505 IAV RNP sequences (Information S1). For each RNP, the occurrence of a SLiM at a given position is the number of RNP sequences with the SLiM at that position divided by the total number of RNP sequences. For example, 7,222 PA protein sequences from human IAVs were used in this study. A SLiM with an occurrence of 1% for the PA protein from human IAVs means that 72 of the 7,222 PA protein sequences from human IAVs have the SLiM at the same position. As shown in Figure 1A, 1C, 1E and 1G, the identified SLiMs can be divided into the following three categories: an occurrence of greater than 90%, an occurrence between 10% and 90%, and an occurrence of less than 10%. The group of SLiMs with an occurrence of over 90% (highly conserved) may be basic functional motifs for each RNP. A small fraction of the SLiMs, with an occurrence between 10% and 90%, forms the second group, which represents partially conserved motifs (conserved in a subset of the sequences of an RNP). SLiMs of this group have higher Shannon diversity indices than those from the other groups for all four RNPs (Figure 1B, 1D, 1F and 1H). In contrast, most SLiMs belong to the third group, which occurs in less than 10% of the sequences of an RNP. These results indicate that most SLiMs might be created sporadically by mutations and might be present in specific IAV strains. Together, the combination of occurrence and the Shannon diversity index can be used to distinguish different types of diversity of the SLiM composition. As shown in Figure 1, the first group of SLiMs has a low Shannon diversity index and a high occurrence (greater than 90%), which represents highly conserved motifs (common to all IAVs). The second group has a high Shannon diversity index and an occurrence between 10% and 90%, which represents partially conserved motifs. However, this group contains few SLiMs (Table 1). In contrast, the third group has both a low Shannon diversity index and a low occurrence (less than 10%), and it contains a large number of SLiMs (Figure 1 and Table 1). The average numbers of SLiMs per gene (numbers in the brackets beside the raw frequency in Table 1) indicate that the second and third SLiM groups represent different types of SLiM composition diversity.
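The occurrence measure and the three occurrence categories described above can be sketched as follows. The function names are hypothetical, and placing exactly 10% and exactly 90% in the middle group is an assumption based on the wording "greater than 90%" and "less than 10%".

```python
def occurrence(n_with_slim, n_total):
    """Percentage of sequences that carry the SLiM at a given position."""
    return 100.0 * n_with_slim / n_total

def classify(occ_percent):
    """Assign one of the three occurrence groups used in this study."""
    if occ_percent > 90:
        return "highly conserved"
    if occ_percent >= 10:
        return "partially conserved"
    return "low occurrence"

# Example from the text: 72 of the 7,222 human IAV PA sequences (about 1%).
example_occ = occurrence(72, 7222)
example_group = classify(example_occ)
```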

Comparison of PA protein SLiM compositions among IAVs from different hosts
To gain a deeper understanding of the SLiM composition of IAV RNPs, the SLiM compositions of IAV RNPs from different hosts were compared. Using the PA protein as an example, the comparison of the SLiM composition of PA proteins among IAVs from avian (A_PA), human (H_PA) and mammalian (M_PA) hosts reveals that the 791 identified SLiMs can be classified into three groups (Information S5). The first group is composed of 80 highly conserved SLiMs (with an occurrence of greater than 90% in all PA protein sequences) that are common in all PA proteins regardless of the IAV host range (Information S6). The 80 SLiMs may be basic motifs that are essential for normal PA protein functions. The second group includes 24 partially conserved SLiMs (with an occurrence between 10% and 90% for all PA protein sequences). The third group contains 687 low occurrence SLiMs (with an occurrence of less than 10% in all PA protein sequences). In total, 21 locations that contain two or more overlapping SLiMs from the first group were found (red rectangles in Information S6). Locations with highly conserved overlapping SLiMs may represent short protein domains that can respond to multiple host factors/pathways (see Discussion).
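Locations with two or more overlapping SLiMs can be found with a simple interval sweep. The sketch below is one possible way to do this, not the procedure used in the study: the function name, the motif names and the coordinates are hypothetical, and motifs are grouped whenever their residue ranges chain together.

```python
def overlapping_locations(slims):
    """Group SLiMs whose residue ranges overlap.

    slims: list of (name, start, end) with 1-based inclusive coordinates.
    Returns a list of groups (lists of names) that hold at least two SLiMs.
    """
    groups, current, current_end = [], [], 0
    for name, start, end in sorted(slims, key=lambda s: s[1]):
        if current and start <= current_end:
            current.append(name)              # overlaps the running group
            current_end = max(current_end, end)
        else:
            if len(current) >= 2:
                groups.append(current)
            current, current_end = [name], end
    if len(current) >= 2:
        groups.append(current)
    return groups

# Hypothetical motifs: A, B and D chain together; C stands alone.
demo_groups = overlapping_locations(
    [("A", 10, 15), ("B", 13, 18), ("C", 30, 35), ("D", 17, 20)]
)
```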
To uncover IAV host specific motifs in PA proteins in the second group, the test for difference among k proportions was performed. Because of the large sample size used in this study, a p value of $10^{-100}$ was used as the cut-off value. In total, 14 SLiMs that have a p value of less than $10^{-100}$ and have an occurrence of greater than 80% in the PA protein sequences from avian, human or mammalian IAVs were identified. Moreover, the log-likelihood ratio tests were performed to test the dependence between the existence of a SLiM and the host origin of the PA protein. All 14 SLiMs have a p value of less than 0.05, which indicates that the existence of each of the 14 SLiMs depends on the host origin of the PA proteins. As shown in Figure 2A, all 14 SLiMs have a lower occurrence in PA proteins from human IAVs than in PA proteins from avian and mammalian IAVs. Notably, three of the SLiMs (LIG_SPAK-OSR1_1_204, MOD_PIKK_1_274 and MOD_GSK3_1_402) occur rarely in PA proteins from human IAVs. The PA sequences are not completely independent because there are phylogenetic relationships between them. A SLiM may be derived either from sequences of the same lineage (founder effect) or from host adaptation (convergent evolution). To reveal the underlying phylogenetic relationship, all PA sequences from each host class were used to perform pairwise alignments and the identities of all sequence pairs were computed (Figure 2B). Moreover, all sequences harboring a given SLiM from each host class were used to perform pairwise alignments and the identities of all sequence pairs were computed. Two of the 14 SLiMs are shown in Figure 2C and 2D as examples. If two PA protein sequences with an identity greater than 95% are considered to be sequences from the same lineage, then a SLiM identified from PA protein sequences with an identity greater than 95% may represent a result of the founder effect. In contrast, a SLiM identified from PA protein sequences with an identity less than 95% may represent an event of host adaptation (convergent evolution). The results in Figure 2C and 2D suggest that both the founder effect and host adaptation occurred. Similar phenomena were found for other SLiMs (Information S7).
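The identity-based reasoning above can be sketched as follows. The sketch assumes that the sequences have already been aligned to equal length (the alignment method itself is not specified in the text), and all function names and toy sequences are hypothetical; only the 95% cut-off comes from the text.

```python
from itertools import combinations

def identity(a, b):
    """Percent identical residues between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(x == y for x, y in compared) / len(compared)

def pair_identities(seqs):
    """Identity of every unordered pair of sequences."""
    return [identity(a, b) for a, b in combinations(seqs, 2)]

def fraction_same_lineage(identities, threshold=95.0):
    """Share of pairs above the cut-off (read as 'same lineage')."""
    return sum(i > threshold for i in identities) / len(identities)

# Toy aligned sequences, not real PA proteins.
demo_ids = pair_identities(["AAAA", "AAAA", "AAAT"])
demo_fraction = fraction_same_lineage(demo_ids)
```

A distribution dominated by pairs above the cut-off would point towards a founder effect, while a substantial share of pairs below it would be consistent with host adaptation.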

Comparison of the SLiM compositions of PA protein from IAVs with different virulence
To uncover potential IAV virulence-associated motifs in PA proteins, a comparison of PA SLiM compositions from all IAVs and HP IAVs was conducted. The 152 SLiMs identified can be classified into three groups (Information S8). The first group is composed of 80 highly conserved SLiMs (with an occurrence of greater than 90% in all PA protein sequences) that are common in all PA proteins regardless of IAV virulence. The second group includes 24 partially conserved SLiMs (with an occurrence between 90-10% in all PA protein sequences). The third group has 48 low occurrence SLiMs (with an occurrence of less than 10% in all PA protein sequences). Therefore, the number of candidate motifs in the third group was reduced from 687 to 48. If a SLiM appears in PA proteins from HP IAVs but is very rare in PA proteins from human IAVs, it may be associated with the virulence of HP IAVs through its effect on the function of the PA protein.
Using two criteria, a very low occurrence (less than 10%) in human IAV PA proteins and presence in HP IAV PA proteins, 47 SLiMs from the second (24 motifs) and third (48 motifs) groups were identified. Moreover, a SLiM from the second group that has a high occurrence in PA proteins from both avian and mammalian IAVs but a low occurrence (19.5%) in PA proteins from human IAVs was found. The 48 SLiMs (Information S9) are candidate sites that might affect PA protein activity and might be associated with IAV transcription and/or replication efficiency.

Comparison of the SLiM compositions of PB1 proteins among IAVs from different hosts
A comparison of PB1 SLiM compositions among IAVs from avian (A_PB1), human (H_PB1) and mammalian (M_PB1) hosts reveals that the 783 identified SLiMs can be classified into three groups (Information S10). The first group is composed of 81 highly conserved SLiMs (with an occurrence of greater than 90% in all PB1 protein sequences) that are common in all PB1 proteins regardless of the IAV host range (Information S11). These 81 SLiMs may be basic motifs that are essential for normal PB1 protein functions. The second group includes 13 partially conserved SLiMs (with an occurrence between 10% and 90% in all PB1 protein sequences). The third group has 689 low occurrence SLiMs (with an occurrence of less than 10% in all PB1 protein sequences). In total, 17 locations that contain two or more overlapping SLiMs from the first group were identified (red rectangles in Information S11).
To uncover IAV host specific motifs in PB1 proteins in the second group, the test for difference among k proportions was performed. Using a p value of $10^{-100}$ as the cut-off value, 9 SLiMs were identified that have an occurrence of greater than 80% in the PB1 protein sequences from avian, human or mammalian IAVs. Moreover, the log-likelihood ratio tests were performed to test the dependence between the existence of a SLiM and the host origin of the PB1 protein. All 9 SLiMs have a p value of less than 0.05, which indicates that the existence of each of the 9 SLiMs depends on the host origin of the PB1 proteins. As shown in Figure 3A, 8 of the 9 SLiMs have a lower occurrence in PB1 proteins from human IAVs than in PB1 proteins from avian and mammalian IAVs. Notably, two of them (LIG_MAPK_1_584 and MOD_PAK_2_429) have a very low occurrence in PB1 proteins from human IAVs. In contrast, the SLiM MOD_PIKK_1_580 is specific to the PB1 proteins from human and mammalian IAVs. To reveal the underlying phylogenetic relationship, all PB1 sequences from each host class were used to perform pairwise alignments and the identities of all sequence pairs were computed (Figure 3B). Moreover, all sequences harboring a given SLiM from each host class were used to perform pairwise alignments and the identities of all sequence pairs were computed. Two of the 9 SLiMs are shown in Figure 3C and 3D as examples. If two PB1 protein sequences with an identity greater than 95% are considered to be sequences from the same lineage, then a SLiM identified from PB1 protein sequences with an identity greater than 95% may represent a result of the founder effect. In contrast, a SLiM identified from PB1 protein sequences with an identity less than 95% may represent an event of host adaptation (convergent evolution). The results in Figure 3C and 3D suggest that both the founder effect and host adaptation occurred. Similar phenomena were found for other SLiMs (Information S12).

Comparison of the SLiM composition of the PB1 protein from IAVs of different levels of virulence
To uncover potential IAV virulence-associated motifs in PB1 proteins, a comparison of PB1 SLiM compositions from all IAVs and HP IAVs was conducted. The 126 SLiMs identified can be classified into three groups (Information S13). The first group is composed of 81 highly conserved SLiMs (with an occurrence of greater than 90% in all PB1 protein sequences) that are common in all PB1 proteins regardless of IAV virulence. The second group includes 12 partially conserved SLiMs (with an occurrence between 90-10% in all PB1 protein sequences). The third group has 33 low occurrence SLiMs (with an occurrence of less than 10% of all PB1 protein sequences). Therefore, the number of candidate motifs in the third group was reduced from 689 to 33. If a SLiM appears in PB1 proteins from HP IAVs but is very rare in PB1 proteins from human IAVs, it may be associated with the virulence of HP IAVs through its effect on the function of the PB1 protein.
Using two criteria, a very low occurrence (less than 10%) in human IAV PB1 proteins and presence in HP IAV PB1 proteins, 33 SLiMs from the second and third groups were identified. Moreover, two SLiMs from the second group were found that have a high occurrence in PB1 proteins from avian and mammalian IAVs but a low occurrence (approximately 20%) in PB1 proteins from human IAVs. The 35 SLiMs (Information S14) are candidate sites that might affect PB1 protein activity and might be associated with IAV transcription and/or replication efficiency. Notably, 2 of the 35 SLiMs (MOD_PAK_2_429 and LIG_MAPK_1_584) are specific to both avian and mammalian IAVs (labelled "A & M" in Information S14).
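The two selection criteria can be expressed as a small filter. This is a sketch under stated assumptions: the function name, the dictionary layout and the motif names and percentages are hypothetical, and "presence" in HP IAVs is read here as an occurrence above zero.

```python
def virulence_candidates(human_occ, hp_occ, max_human=10.0):
    """Select SLiMs that are rare in human IAVs but present in HP IAVs.

    human_occ, hp_occ: dicts mapping a SLiM identifier to its occurrence (%).
    A SLiM missing from human_occ is treated as having 0% occurrence.
    """
    return sorted(
        name for name, occ in hp_occ.items()
        if occ > 0 and human_occ.get(name, 0.0) < max_human
    )

# Hypothetical occurrences, for illustration only.
demo_human = {"X_1": 0.5, "Y_2": 95.0, "Z_3": 3.0}
demo_hp = {"X_1": 40.0, "Y_2": 99.0, "Z_3": 0.0, "W_4": 12.0}
demo_candidates = virulence_candidates(demo_human, demo_hp)
```

Here `Y_2` is excluded because it is common in human IAVs, and `Z_3` because it is absent from the HP set.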

Comparison of the SLiM compositions of PB2 proteins among IAVs from different hosts
A comparison of PB2 SLiM compositions among IAVs from avian (A_PB2), human (H_PB2) and mammalian (M_PB2) hosts reveals that the 712 identified SLiMs can be classified into three groups (Information S15). The first class is composed of 94 highly conserved SLiMs (with an occurrence of greater than 90% of all PB2 protein sequences) that are common in all PB2 proteins regardless of the IAV host range (Information S16). The 94 SLiMs may be basic motifs that are essential for normal PB2 protein functions. The second class includes 25 partially conserved SLiMs (with an occurrence between 90-10% of all PB2 protein sequences). The third class has 593 low occurrence SLiMs (with an occurrence of less than 10% of all PB2 protein sequences). In total, 23 locations that contain two or more overlapping SLiMs from the first group were found (red rectangles in Information S16).
To uncover IAV host specific motifs in PB2 proteins in the second group, the test for difference among k proportions was performed. Using a p value of $10^{-100}$ as the cut-off value, 9 SLiMs that have an occurrence of greater than 80% in PB2 protein sequences from avian, human or mammalian IAVs were identified. Moreover, the log-likelihood ratio tests were performed to test the dependence between the existence of a SLiM and the host origin of the PB2 protein. All 9 SLiMs have a p value of less than 0.05, which indicates that the existence of each of the 9 SLiMs depends on the host origin of the PB2 proteins. As shown in Figure 4A, 7 of the 9 SLiMs have a lower occurrence in the PB2 proteins from human IAVs than in the PB2 proteins from avian and mammalian IAVs. Notably, 3 of the 9 SLiMs (LIG_14-3-3_2_555, MOD_PAK_1_268 and MOD_PAK_2_268) have a very low occurrence in PB2 proteins from human IAVs. In contrast, 2 SLiMs (MOD_CK2_1_681 and MOD_GSK3_1_681) are specific to PB2 proteins from human and mammalian IAVs. To reveal the underlying phylogenetic relationship, all PB2 sequences from each host class were used to perform pairwise alignments and the identities of all sequence pairs were computed (Figure 4B). Moreover, all sequences harboring a given SLiM from each host class were used to perform pairwise alignments and the identities of all sequence pairs were computed. Two of the 9 SLiMs are shown in Figure 4C and 4D as examples. If two PB2 protein sequences with an identity greater than 95% are considered to be sequences from the same lineage, then a SLiM identified from PB2 protein sequences with an identity greater than 95% may represent a result of the founder effect. In contrast, a SLiM identified from PB2 protein sequences with an identity less than 95% may represent an event of host adaptation (convergent evolution). The results in Figure 4C and 4D suggest that both the founder effect and host adaptation occurred. Similar phenomena were found for other SLiMs (Information S17).

Comparison of the SLiM composition of PB2 proteins from IAVs with different levels of virulence
To uncover potential IAV virulence-associated motifs in PB2 proteins, a comparison of PB2 SLiM compositions from all IAVs and HP IAVs was conducted. The 157 SLiMs identified can be classified into three groups (Information S18). The first group is composed of 94 highly conserved SLiMs (with an occurrence of greater than 90% in all PB2 protein sequences) that are common in all PB2 proteins regardless of IAV virulence. The second group includes 23 partially conserved SLiMs (with an occurrence between 10% and 90% in all PB2 protein sequences). The third group has 40 low occurrence SLiMs (with an occurrence of less than 10% in all PB2 protein sequences). Therefore, the number of candidate motifs in the third group was reduced from 593 to 40. If a SLiM appears in the PB2 proteins from HP IAVs but is very rare in PB2 proteins from human IAVs, it may be associated with the virulence of HP IAVs through its effect on the function of the PB2 protein.
Using two criteria, a very low occurrence (less than 10%) in human IAV PB2 proteins and presence in HP IAV PB2 proteins, 41 SLiMs from the second and third groups were identified. Moreover, a SLiM from the second group was found that has a high occurrence in PB2 proteins from avian and mammalian IAVs but a low occurrence (25.4%) in PB2 proteins from human IAVs. The 42 SLiMs (Information S19) are candidate sites that might affect PB2 protein activity and might be associated with IAV transcription and/or replication efficiency. Importantly, 14 of the 42 SLiMs are even more notable. Three of them (MOD_CK2_1_336, LIG_FHA_2_337 and LIG_TRAF2_1_339) are avian IAV specific (labelled "A" in Information S19). Another eight SLiMs (LIG_SH3_3_536, TRG_LysEnd_APsAcLL_1_441, MOD_PKA_2_659, LIG_APCC_KENbox_2_698, MOD_CK2_1_714, LIG_FHA_2_715, TRG_NLS_MonoCore_2_735 and TRG_NLS_MonoExtN_4_736) are mammalian IAV specific (labelled "M" in Information S19). Three

Comparison of the SLiM composition of NP proteins among IAVs from different hosts
A comparison of NP SLiM compositions among IAVs from avian (A_NP), human (H_NP) and mammalian (M_NP) hosts reveals that the 630 identified SLiMs can be classified into three groups (Information S20). The first class is composed of 37 highly conserved SLiMs (with an occurrence of greater than 90% in all NP protein sequences) that are common in all NP proteins regardless of IAV host range (Information S21). The 37 SLiMs may be basic motifs that are essential for normal NP protein functions. The second class includes 28 partially conserved SLiMs (with an occurrence between 90-10% in all NP protein sequences). The third class has 565 low occurrence SLiMs (with an occurrence of less than 10% in all NP protein sequences). 6 locations that contain two or more overlapping SLiMs from the first group were found (red rectangles in Information S21).
To uncover IAV host specific motifs in NP proteins in the second group, the test for differences among k proportions was performed. Using a p value of $10^{-100}$ as the cut-off value, 13 SLiMs that have an occurrence of greater than 80% in the NP protein sequences from avian, human or mammalian IAVs were identified. Moreover, the log-likelihood ratio tests were performed to test the dependence between the existence of a SLiM and the host origin of the NP protein. All 13 SLiMs have a p value of less than 0.05, which indicates that the existence of each of the 13 SLiMs depends on the host origin of the NP proteins. As shown in Figure 5A, 10 of the 13 SLiMs have a lower occurrence in the NP proteins from human IAVs than in the NP proteins from avian and mammalian IAVs. Notably, 2 of them (LIG_BRCT_BRCA1_1_309 and LIG_MAPK_1_98) have a very low occurrence in NP proteins from human IAVs. In contrast, 2 SLiMs (MOD_SUMO_451 and TRG_ENDOCYTIC_2_97) are specific to the NP proteins from human and mammalian IAVs. To reveal the underlying phylogenetic relationship, all NP sequences from each host class were used to perform pairwise alignments and the identities of all sequence pairs were computed (Figure 5B). Moreover, all sequences harboring a given SLiM from each host class were used to perform pairwise alignments and the identities of all sequence pairs were computed. Two of the 13 SLiMs are shown in Figure 5C and 5D as examples. If two NP protein sequences with an identity greater than 95% are considered to be sequences from the same lineage, then a SLiM identified from NP protein sequences with an identity greater than 95% may represent a result of the founder effect. In contrast, a SLiM identified from NP protein sequences with an identity less than 95% may represent an event of host adaptation (convergent evolution). The results in Figure 5C and 5D suggest that both the founder effect and host adaptation occurred. Similar phenomena were found for other SLiMs (Information S22).

Comparison of the SLiM composition of NP proteins from IAVs with different virulence
To uncover potential IAV virulence-associated motifs in NP proteins, a comparison of NP SLiM compositions from all IAVs and HP IAVs was conducted. The 83 SLiMs identified can be classified into three groups (Information S23). The first group is composed of 37 highly conserved SLiMs (with an occurrence of greater than 90% in all NP protein sequences) that are common in all NP proteins regardless of IAV virulence. The second group includes 25 partially conserved SLiMs (with an occurrence between 10% and 90% in all NP protein sequences). The third group has 21 low occurrence SLiMs (with an occurrence of less than 10% in all NP protein sequences). Therefore, the number of candidate motifs in the third group was reduced from 565 to 21. If a SLiM appears in NP proteins from HP IAVs but is very rare in NP proteins from human IAVs, it may be associated with the virulence of the HP IAVs through its effect on the function of the NP protein. Using two criteria, a very low occurrence (less than 10%)

SLiMs at the vicinity of amino acids that were associated with host adaptation of IAV PA proteins
Several amino acid sites (AASs) in IAV RNPs were reported to affect IAV RNP activity or were associated with IAV host adaptation [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31]. In total, 99 AASs (25 in the PA protein, 16 in the PB1 protein, 31 in the PB2 protein and 27 in the NP protein) from these reports were mapped to 185 SLiMs (42 in the PA protein, 35 in the PB1 protein, 67 in the PB2 protein and 41 in the NP protein) identified in this study (Tables 2-5). For instance, Gabriel et al. used the highly pathogenic avian IAV SC35 to demonstrate that 7 AASs in IAV RNPs are associated with host adaptation [22]. All 7 of the AASs have corresponding SLiMs identified in this study. The AAS 615 of the PA protein can be mapped to LIG_FHA_2_612, LIG_14-3-3_3_615 and LIG_14-3-3_1_615 of the PA protein (Table 2). The AAS 13 of the PB1 protein can be mapped to MOD_PIKK_1_11 and MOD_GSK3_1_13 of the PB1 protein (Table 3). The AAS 678 of the PB1 protein can be mapped to MOD_PIKK_1_675 of the PB1 protein (Table 3). The AAS 333 of the PB2 protein can be mapped to MOD_PKA_1_331, MOD_PKA_2_331, MOD_CK2_1_335, MOD_CK2_1_336 and LIG_FHA_2_336 of the PB2 protein (Table 4). The AAS 701 of the PB2 protein can be mapped to LIG_MAPK_1_702 of the PB2 protein (Table 4). The AAS 714 of the PB2 protein can be mapped to MOD_CK2_1_714, LIG_FHA_2_715 and MOD_SUMO_717 of the PB2 protein (Table 4). The AAS 319 of the NP protein can be mapped to LIG_APCC_KENbox_2_318 and MOD_GSK3_1_319 of the NP protein (Table 5). Altogether, the SLiMs identified in this study provide possible molecular mechanisms that may explain the changes in activity, interaction or localization of IAV RNPs caused by those AAS changes.
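The mapping of an AAS to nearby SLiMs can be sketched as a position comparison. This is an illustration only: the function name is hypothetical, the sketch assumes that the trailing number of each SLiM identifier is its position in the protein, and the window of three residues is an assumption chosen because it covers the examples listed above, not a rule stated in the text.

```python
def slims_near(aas_pos, slim_ids, window=3):
    """Return the SLiM identifiers whose position lies within `window`
    residues of the amino acid site `aas_pos`."""
    hits = []
    for slim_id in slim_ids:
        # Identifiers look like "LIG_FHA_2_612": motif name, then position.
        _, pos = slim_id.rsplit("_", 1)
        if abs(int(pos) - aas_pos) <= window:
            hits.append(slim_id)
    return hits

# PA identifiers quoted in this study, queried for AAS 615.
pa_ids = ["LIG_SPAK-OSR1_1_204", "MOD_GSK3_1_402", "LIG_FHA_2_612",
          "LIG_14-3-3_3_615", "LIG_14-3-3_1_615"]
near_615 = slims_near(615, pa_ids)
```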

Proposed cellular processes in which RNPs may be involved through the SLiMs identified
The compositions of SLiMs in RNPs provide information regarding the pathways in which the RNPs may be involved. As shown in Table 6, RNPs with SH2 and SH3 ligand motifs, LIG_MAPK_1, LIG_14-3-3, LIG_FHA_2 and protein kinase phosphorylation sites may be involved in the MAPK, Wnt and  [3]. Moreover, RNPs with TRG_LysEnd_APsAcLL_1, TRG_ENDOCYTIC_2, LIG_EH1_1 and LIG_Actin_WH2_2 may interact with actin and be involved in intracellular trafficking pathways [3]. All of these host cellular processes and pathways have been reported to be involved in post-entry steps of IAV replication [32,33]. The different compositions of SLiMs among RNPs reflect the functional diversity of RNPs. RNPs with different SLiM compositions vary in their ability to interact with cellular processes and signal transduction pathways, and may therefore differ in their impact on viral replication and host adaptation.

Discussion
In total, 292 highly conserved SLiMs were found in IAV RNPs regardless of IAV host range. These SLiMs may be basic motifs that are essential for the normal function of RNPs. Two of them have been experimentally identified in IAV RNP proteins. The first SLiM is the nuclear localization signal (NLS) located between amino acids 182 and 217 of the IAV PB1 proteins [34]. Several NLS-associated SLiMs were identified in this study, as shown in Information S11. The second SLiM is the NLS located at the C-terminus of the IAV PB2 proteins [35]. The NLS-associated SLiM was identified in this study, as shown in Information S16. These examples suggest that computational prediction of SLiMs is helpful for the identification of important functional motifs in viral proteins.
Table 3. SLiMs mapped to the vicinity of amino acids that were reported as genetic signatures or are associated with the adaptation of IAV PB1 proteins to the host.
In total, 67 locations with overlapping SLiMs were identified among the 292 highly conserved SLiMs in RNPs (red rectangles in Information S6, S11, S16, S21). These overlapping SLiMs may act together through three mechanisms. First, multiple SLiM interactions may be used cooperatively to increase the specificity and strength with which two proteins bind to each other. Second, multiple SLiMs may enable different cellular signals to interact sequentially. For example, the function of the first SLiM may lead to the action of the second SLiM. Third, multiple SLiMs may enable different cellular signals to interact competitively. A protein may contain different SLiMs that target the same amino acid residue for different post-translational modifications as inputs from different cellular signals. This could lead to competition (i.e. an interaction) between the two signals, with different enzymes competing to modify the same residue. The different post-translational modification states of the motif could bind to different interaction domains and result in different output signals from the interaction.
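One simple way to locate such overlapping SLiMs is to treat each predicted motif as an interval on the protein sequence and to merge intervals that share at least one residue. The sketch below is illustrative only: the function name and the toy hits are hypothetical, and the exact overlap definition used in this study is not specified, so shared residues are assumed here.

```python
def overlapping_locations(hits):
    """Group (name, start, end) hits into locations of overlapping motifs.

    Hits are chained: a hit joins the current location if it starts at or
    before the furthest end seen so far. Only locations holding two or
    more motifs are returned.
    """
    locations = []
    current = []
    current_end = None
    for hit in sorted(hits, key=lambda h: (h[1], h[2])):
        _, start, end = hit
        if current and start <= current_end:
            current.append(hit)
            current_end = max(current_end, end)
        else:
            if len(current) > 1:
                locations.append(current)
            current = [hit]
            current_end = end
    if len(current) > 1:
        locations.append(current)
    return locations


if __name__ == "__main__":
    # Invented motif names and coordinates (1-based, inclusive).
    toy_hits = [
        ("MOTIF_A", 3, 6), ("MOTIF_B", 4, 9), ("MOTIF_C", 20, 25),
        ("MOTIF_D", 9, 12), ("MOTIF_E", 30, 33), ("MOTIF_F", 33, 36),
    ]
    for location in overlapping_locations(toy_hits):
        print([name for name, _, _ in location])
```

Because hits are chained, two motifs in the same location need not overlap each other directly; a stricter pairwise definition would give smaller groups.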
SLiMs that have a very low occurrence in RNPs from human IAVs but are present in RNPs from HP IAVs could be candidates for novel virulence determinants worthy of further investigation. For example, 10 SLiMs (LIG_SPAK-OSR1_1_204, MOD_PIKK_1_274 and MOD_GSK3_1_402 in PA proteins; MOD_PAK_2_429 and LIG_MAPK_1_584 in PB1 proteins; MOD_PKA_1_268, MOD_PKA_2_268 and LIG_14-3-3_2_555 in PB2 proteins; and LIG_MAPK_1_98 and LIG_BRCT_BRCA1_1_309 in NP proteins) have a very low occurrence in RNPs from human IAVs but a high occurrence in RNPs from avian and mammalian IAVs. Moreover, all 10 of these SLiMs were found in RNPs from HP IAVs (Information S9, S14, S19, S24). Therefore, they may represent emerging SLiMs in RNPs of avian IAV origin that are in the early stage of adaptation to human hosts. Another type of SLiM, which has a low occurrence in RNPs from avian, human and mammalian IAVs but is present in HP IAVs, may also represent potential virulence determinants that arose by chance (Information S9, S14, S19, S24).
Many proteins are regulated by post-translational modifications (PTMs) that may mediate allosteric effects or create binding sites important for protein-protein interactions, where ligand domains can bind to phosphorylated, methylated or sumoylated sites. As described for the ELM server, SLiMs can be classified into four types of functional sites: ligand sites (LIG), PTM sites (MOD), proteolytic cleavage and processing sites (CLV), and sites for subcellular targeting (TRG) [3]. These functional assignments are useful in that they encompass the range of peptide motif activities. Furthermore, they can also help explain why many amino acid sites have been experimentally demonstrated to be functionally important for RNPs but do not have corresponding SLiMs in this study. For example, the glutamic acid at PB2 position 627 is generally found in avian viruses, whereas nearly all human isolates carry a lysine at this position [36].
Available data suggest that PB2 position 627 determines the temperature sensitivity of vRNA replication [37]. Viruses with PB2 627K can efficiently replicate in the mammalian upper respiratory tract, whereas those that possess PB2 627E cannot [38]. A PB2 E627K mutation enhances avian virus replication in mammalian cells at 33°C, but not at 37°C or 41°C, in vitro [37]. The lack of a corresponding SLiM suggests that the cold sensitivity of avian virus polymerases with PB2 627E may arise because the residue itself directly affects the global domain conformation of the PB2 protein, rather than through the gain or loss of a post-translational modification target site (SLiM).
Several experimental methods can be used to validate the putative SLiMs identified in this study. The first is reverse genetics, which is generally used to validate how different amino acid mutations affect IAV protein function or activity [39,40]. Reverse genetics can be coupled with different functional assays. For example, to validate the influence of a SLiM on virulence, virus particles produced by reverse genetics can be used to infect model animals (mouse, ferret, swine or primate), and the survival rate, pathological changes and cytokine levels in blood can be measured [41,42]. Interactions between IAV RNPs and known host factors through SLiMs identified in this study (e.g. LIG_SH2_STAT5 and LIG_TRAF2) can be validated by bimolecular fluorescence complementation (BiFC) [43,44] and the split luciferase complementation assay (SLCA) [45]. Localization of RNPs mediated by targeting-signal SLiMs, such as nuclear export signals and nuclear localization signals, can be validated by fluorescence recovery after photobleaching (FRAP) [46]. Specific modifications such as sumoylation can be validated by immunoblotting with a SUMO-specific antibody [47,48].
The use of protein-protein interactions as targets for antiviral chemotherapy was proposed over a decade ago [49]. Currently, this idea is being considered in the development of antiviral drugs for flaviviruses and HIV [50,51]. One of the most straightforward approaches to interfering with protein-protein interactions is to use peptides that mimic the interaction motifs [52]. Several reports have demonstrated that peptide-mediated interference with IAV polymerase complex assembly can attenuate IAV replication [53][54][55][56][57]. SLiMs such as the PDZ motif [58] and LIG_SH2_GRB2 [59] are being explored as drug targets. Since viruses have evolved to use motifs for essential functions by hijacking host proteins [60], identification of SLiMs that mediate interactions between viral proteins and host factors may provide valuable and specific information for the development of motif-mimetic drugs that perturb these interactions to treat virus infections [2].
Inhibition of interactions between viral proteins has the advantages of high specificity and few side effects. However, resistant strains may appear through the fast co-evolution of RNA virus proteins under selection pressure. The possibility of co-evolution of RNA virus proteins and mammalian host proteins, on the other hand, is expected to be extremely low. Another concern is that the inability of a synthetic peptide to penetrate cells precludes its therapeutic use. Nevertheless, the discovery of peptidomimetic compounds can be pursued based on the structure of the effective peptide.
Table 5. SLiMs mapped to the vicinity of amino acids that were reported as genetic signatures or are associated with the adaptation of IAV NP proteins to the host.
In this study, the compositions of SLiMs (target sites of post-translational modifications) of IAV RNPs were analyzed. Three groups of SLiMs with different occurrences were found for each RNP. The SLiMs identified in this study provide an invaluable resource for experimental virologists to study the interactions between IAV RNPs and host intracellular proteins. Moreover, the SLiM compositions of IAV RNPs also provide insights into the signal transduction pathways and protein interaction networks in which IAV RNPs might be involved or with which they might interfere. Information on SLiM-mediated virus-host protein interactions might be helpful for the development of anti-IAV drugs.
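The grouping by occurrence and the candidate criterion discussed above can be expressed compactly. The following Python sketch is a hedged illustration rather than the pipeline used in this study: the function names, motif labels and counts are hypothetical, and only the thresholds (greater than 90%, between 10% and 90%, less than 10%) are taken from the text. Boundary values of exactly 10% or 90% are assumed to fall in the partially conserved group.

```python
def occurrence_group(count, total):
    """Assign a SLiM to an occurrence group from its count in `total` sequences.

    Integer arithmetic avoids floating-point surprises at the boundaries.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if count * 100 > 90 * total:
        return "highly conserved"
    if count * 100 >= 10 * total:
        return "partially conserved"
    return "low occurrence"


def candidate_virulence_slims(human_counts, n_human, hp_slims):
    """Return SLiMs with low occurrence in human IAVs that are present in HP IAVs.

    `human_counts` maps a SLiM label to the number of human IAV sequences
    containing it; `hp_slims` is the set of labels seen in HP IAVs.
    """
    return sorted(
        name
        for name, count in human_counts.items()
        if occurrence_group(count, n_human) == "low occurrence" and name in hp_slims
    )


if __name__ == "__main__":
    # Invented labels and counts, for illustration only.
    toy_human_counts = {"TOY_SLIM_1": 5, "TOY_SLIM_2": 50, "TOY_SLIM_3": 1}
    toy_hp_slims = {"TOY_SLIM_1", "TOY_SLIM_2"}
    print(candidate_virulence_slims(toy_human_counts, 100, toy_hp_slims))
```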