Identification and Chronological Analysis of Genomic Signatures in Influenza A Viruses

An increase in the availability of data on the influenza A viruses (IAV) has enabled the identification of the potential determinants of IAV host specificity using computational approaches. In this study, we proposed an alternative approach, based on the adjusted Rand index (ARI), for the evaluation of genomic signatures of IAVs and their ability to distinguish hosts they infected. Our experiments showed that the host-specific signatures identified using the ARI were more characteristic of their hosts than those identified using previous measures. Our results provided updates on the host-specific genomic signatures in the internal proteins of the IAV based on the sequence data as of February 2013 in the National Center for Biotechnology Information (NCBI). Unlike other approaches for signature recognition, our approach considered not only the ability of signatures to distinguish hosts (according to the ARI), but also the chronological relationships among proteins. We identified novel signatures that could be mapped to known functional domains, and introduced a chronological analysis to investigate the changes in host-specific genomic signatures over time. Our chronological analytical approach provided results on the adaptive variability of signatures, which correlated with previous studies’ findings, and indicated prospective adaptation trends that warrant further investigation.


Introduction
Influenza A viruses (IAV) are members of the Orthomyxoviridae family, and are enveloped negative-stranded RNA viruses with a segmented genome [1]. The envelope of an IAV consists of 2 surface glycoproteins, HA and NA, and a small domain of the M2 protein, underlain by the matrix protein M1. The IAV have the capacity to evade host immune systems because of a wide variety of potential combinations of the 16 HA and 9 NA subtypes. Because of their vast genetic diversity and unique host range, IAV have caused recurrent annual epidemics and several major worldwide pandemics in human history.
The accumulation of point mutations during genome replication, and the reassortment of viral gene segments during mixed infections, promotes the evolution of influenza viruses [2]. Because the number of viral sequences is continuously increasing, investigators have developed computational methods to recognize and verify interspecies transmission candidate determinants at the sequence level, despite the absence of specific knowledge on the antigenic properties of the viruses being investigated. For example, a number of large-scale phylogenetic and sequence alignment analyses have suggested that the IAV ribonucleoprotein (RNP) genes have evolved into divergent host-associated lineages, and that selected amino acids at specific positions of each internal protein are characteristic of the species origin for the sequences [3][4][5][6].
The successful establishment of an influenza virus in a new host is rare because it is a multistep process that requires the efficient and effective transmission, replication, and adaptation of the virus. However, pandemics caused by widely circulating viruses with the potential to transmit to humans remain a threat [2]. The emergence and spread of novel IAV remain of major global concern; therefore, increased understanding of the host range is essential to maintain the efficacy of antiviral drugs and influenza vaccines. In addition to the analysis of the molecular mechanisms underlying host specificity, using in vitro systems and reverse genetics of influenza viruses, the analysis of a considerable amount of available viral sequence data provides a cost-effective approach for the identification of host-associated genomic signatures as host-range determinants.
In this study, we proposed an alternative measure for the evaluation of the host-specific characteristic sites in the IAV based on the adjusted Rand index (ARI) [7], and produced a novel catalogue of host-specific genomic signatures from the viral sequence data as of February 2013 in the National Center for Biotechnology Information (NCBI). In comparison with the sites identified from the same sequence data using measures such as entropy and mutual information (MI), the sites we identified have higher species specificity for the determination of the host range. Genomic signatures can change because of point mutations or interspecies reassortments; therefore, we also performed a chronological analysis of the genomic signatures. We divided the IAV data into chronological groups according to the time of their discovery, and identified the genomic signatures in each of the groups. These signatures were host-specific and time-specific. We analyzed the transitions of these signatures across various periods to evaluate the adaptation trends, and successfully identified several adaptation trends that correlated with the results by related studies. We also identified additional adaptation patterns that warrant further investigation.

Materials and Methods
Figure 1 shows the process flow of our approach for host-specific genomic signature identification and chronological analysis. All IAV sequences available in the NCBI in February 2013 were downloaded. The data were postprocessed by removing the redundant sequences and sequences missing quality annotations. The HA and NA proteins were excluded because of their genetic diversity, which impedes the production of satisfactory alignments. The influenza proteins were classified into 3 groups according to the type of host: avian, human, and swine (Table 1). The multiple sequence alignment tools ClustalW [8] and MUSCLE [9] were applied to the IAV protein sequences, and the alignments were then analyzed to identify the characteristic sites as potential signatures to distinguish different host-restricted IAV proteins.
Imbalance in the number of sequences can affect the identification of genomic signatures. To alleviate bias, two random sampling strategies were adopted to balance the data size in an alignment: undersampling and oversampling. For example, the number of avian PA records is 4052, which is substantially larger than 2698, the number of human PA records. In an alignment of all sequences, avian PA clearly dominates human PA, which could bias the calculation of the [...] [11].
A genome-wide association analysis by Miotto et al. applied MI to identify the characteristic sites in avian and human strains of the IAV [12]. Using methods based on phylogenetic models, Tamuri et al. identified amino acid sites with strong support of the selection constraints in avian and human viruses [13]. Entropy measures the degree of uncertainty of a variable, whereas MI examines the strength of an association between two variables. Our study proposed the use of the adjusted Rand index (ARI) [7], an extension of the Rand index [14], for the evaluation of the ability of characteristic sites to distinguish between different hosts.
A higher ARI value indicates greater agreement between the 2 partitions. If P is the partition of the IAV protein sequences according to the host (e.g., avian vs. human), and Q is the partition based on the amino acids in a particular column of the protein sequence alignment, a column with a high ARI value is a characteristic site of the host-specific protein sequences. Using 11 sequences for illustration, an alignment was partitioned into 2 subsets according to the host (Table 2). This partition was termed P = {(s1, s2, s3, s4, s5, s6), (s7, s8, s9, s10, s11)}. Based on the amino acids in Site 1, the sequences were partitioned into 4 subsets, denoted by Q1 = {(s1), (s2, s3, s4, s5, s6), (s7, s8, s9), (s10, s11)}. Similarly, based on the amino acids in Site 2, the sequences were partitioned into 2 subsets, denoted by Q2 = {(s1, s7, s8, s9), (s2, s3, s4, s5, s6, s10, s11)}. The first 2 subsets of Q1 are exclusively avian, and the other 2 subsets are exclusively human, whereas both subsets of Q2 mix avian and human sequences. Therefore, partition Q1 is more specific to the hosts than Q2. The ARI between P and Q1 was 0.58, and the ARI between P and Q2 was 0.13, which reflects the fact that Site 1 is preferable to Site 2. A study by Milligan and Cooper evaluated several different indices for the measurement of the agreement between 2 partitions, and recommended the ARI [15]. Therefore, in this study, the ARI was adopted for the evaluation of the host-specific characteristic sites in the IAV.
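The 11-sequence illustration can be checked with a short script. The following is a minimal sketch, assuming the standard Hubert-Arabie form of the ARI computed from pair counts; the function name `adjusted_rand_index` and the cluster labels are hypothetical and chosen only for illustration, and this is not presented as the authors' implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_p, labels_q):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_p) != len(labels_q):
        raise ValueError("both labelings must cover the same sequences")
    n = len(labels_p)
    if n < 2:
        return 0.0  # no pairs to compare (convention chosen for this sketch)
    # Pairs placed together in both partitions (cells of the contingency table).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_p, labels_q)).values())
    # Pairs placed together within each partition taken separately.
    together_p = sum(comb(c, 2) for c in Counter(labels_p).values())
    together_q = sum(comb(c, 2) for c in Counter(labels_q).values())
    expected = together_p * together_q / comb(n, 2)
    maximum = (together_p + together_q) / 2
    if maximum == expected:
        return 0.0  # both partitions trivial (convention chosen for this sketch)
    return (together_both - expected) / (maximum - expected)


# Partitions from the illustration: s1..s6 are avian, s7..s11 are human.
P = ["avian"] * 6 + ["human"] * 5
Q1 = ["a", "b", "b", "b", "b", "b", "c", "c", "c", "d", "d"]  # Site 1
Q2 = ["x", "y", "y", "y", "y", "y", "x", "x", "x", "y", "y"]  # Site 2

ari_site1 = adjusted_rand_index(P, Q1)
ari_site2 = adjusted_rand_index(P, Q2)
print(round(ari_site1, 2), round(ari_site2, 2))  # prints: 0.58 0.13
```

Only the grouping matters, not the label names, so any consistent relabeling of Q1 or Q2 gives the same values.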
To investigate the adaptation trends, the IAV sequences were partitioned into chronological groups according to the time of their discovery: 1902-1918, 1919-1957, 1958-1968, 1969-1977, 1978-2009, and 2010-2013. The IAV sequences identified during the different periods were aligned separately and balanced between different hosts using the described random sampling techniques. For each column (i.e., a site) in an alignment, the average ARI was calculated to measure its association with the host, and the characteristic sites were identified. The characteristic sites identified from different chronological groups were then compared to further investigate 2 types of adaptation trend: validity and identity.
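The assignment of sequences to the six periods can be sketched as a lookup on the closing year of each period. This is an illustrative sketch only: the `(sequence_id, year)` record layout and the function names are assumptions, not part of the original pipeline.

```python
from bisect import bisect_left

# Closing years of the six periods listed above (both bounds inclusive).
PERIOD_ENDS = [1918, 1957, 1968, 1977, 2009, 2013]
PERIOD_LABELS = [
    "1902-1918", "1919-1957", "1958-1968",
    "1969-1977", "1978-2009", "2010-2013",
]


def period_of(year):
    """Return the label of the chronological group that contains `year`."""
    if not 1902 <= year <= 2013:
        raise ValueError(f"year {year} is outside 1902-2013")
    # bisect_left finds the first period whose closing year is >= year.
    return PERIOD_LABELS[bisect_left(PERIOD_ENDS, year)]


def group_by_period(records):
    """Group hypothetical (sequence_id, year) records by chronological period."""
    groups = {label: [] for label in PERIOD_LABELS}
    for seq_id, year in records:
        groups[period_of(year)].append(seq_id)
    return groups
```

Each resulting group would then be aligned and balanced separately before the per-column ARI is computed.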
Validity refers to a genomic signature identified in one period becoming a nonsignature in subsequent periods, or vice versa. A site is considered "valid" within a period if it is a host-specific signature within that period. When a valid site is no longer a signature within another period, it becomes an "invalid" site. An invalid site in one period can become valid within a different period if it is a host-specific site during that period. The purpose of the validity analysis was to examine the sites for changes in validity, caused by amino acid substitutions, over time.
Identity refers to the changes, or absence of changes, in the amino acid residues of a site that remains constantly valid. For example, in the 2009 H1N1 pandemic strains, one amino acid at position NP-100 mutated from V during the preepidemic period to I during the late period [16]. The identity analysis enabled the monitoring of the amino acid residue transitions on the characteristic sites over time.
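The two kinds of transition can be expressed as simple scans over a site's per-period record. The sketch below is hypothetical: the input layout (ordered lists of period/value pairs) and the toy values are assumptions for illustration and are not taken from the supplementary tables.

```python
PERIODS = [
    "1902-1918", "1919-1957", "1958-1968",
    "1969-1977", "1978-2009", "2010-2013",
]


def validity_changes(valid_by_period):
    """Report where a site switches between signature and nonsignature.

    `valid_by_period` is an ordered list of (period, is_signature) pairs.
    """
    changes = []
    for (prev_label, prev_valid), (label, valid) in zip(valid_by_period, valid_by_period[1:]):
        if prev_valid != valid:
            change = "became valid" if valid else "became invalid"
            changes.append((prev_label, label, change))
    return changes


def identity_changes(residue_by_period):
    """Report where the dominant residue of one host changes at a site.

    `residue_by_period` is an ordered list of (period, dominant_residue) pairs.
    """
    changes = []
    for (prev_label, prev_res), (label, res) in zip(residue_by_period, residue_by_period[1:]):
        if prev_res != res:
            changes.append((prev_label, label, f"{prev_res}->{res}"))
    return changes


# Toy inputs: a site that stops being a signature in the last period,
# and a host whose dominant residue changes in the same period.
toy_validity = list(zip(PERIODS, [True, True, True, True, True, False]))
toy_residues = list(zip(PERIODS, ["K", "K", "K", "K", "K", "E"]))

print(validity_changes(toy_validity))
print(identity_changes(toy_residues))
```

Keeping the two scans separate mirrors the distinction in the text: validity concerns whether a site is a signature at all, whereas identity concerns which residues dominate at a site.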

Identification of the Host-specific Genomic Signatures
In this study, we aligned the human IAV protein sequences in the NCBI between 1902 and 2013 with the avian and swine IAV protein sequences using the MUSCLE software [9]. After balancing the size differences between the different host groups using random sampling (see ''Materials and Methods''), we calculated the average ARI for each site in the alignments.
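The balancing and averaging step can be sketched as follows. This is a minimal illustration under stated assumptions: the number of repetitions (`repeats`), the seed, the helper names, and the toy two-column alignment are all hypothetical, and the exact sampling scheme used in the study is not specified beyond undersampling and oversampling.

```python
import random
from collections import Counter
from math import comb


def ari(labels_p, labels_q):
    """Adjusted Rand index between two labelings (pair-counting form)."""
    n = len(labels_p)
    if n < 2:
        return 0.0
    both = sum(comb(c, 2) for c in Counter(zip(labels_p, labels_q)).values())
    in_p = sum(comb(c, 2) for c in Counter(labels_p).values())
    in_q = sum(comb(c, 2) for c in Counter(labels_q).values())
    expected = in_p * in_q / comb(n, 2)
    maximum = (in_p + in_q) / 2
    return 0.0 if maximum == expected else (both - expected) / (maximum - expected)


def balanced_sample(groups, strategy, rng):
    """Equalize host group sizes by undersampling or oversampling."""
    sizes = [len(seqs) for seqs in groups.values()]
    target = min(sizes) if strategy == "under" else max(sizes)
    sample = {}
    for host, seqs in groups.items():
        if len(seqs) >= target:
            sample[host] = rng.sample(seqs, target)  # draw without replacement
        else:
            # keep every sequence and add random duplicates up to the target
            sample[host] = seqs + rng.choices(seqs, k=target - len(seqs))
    return sample


def average_column_ari(groups, column, strategy="under", repeats=10, seed=0):
    """Average ARI between host labels and residues at one alignment column."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        hosts, residues = [], []
        for host, seqs in balanced_sample(groups, strategy, rng).items():
            hosts.extend([host] * len(seqs))
            residues.extend(seq[column] for seq in seqs)
        total += ari(hosts, residues)
    return total / repeats


# Toy aligned sequences: column 0 separates the hosts, column 1 does not.
toy = {"avian": ["AK", "AK", "AK", "AR"], "human": ["SK", "SK"]}
print(average_column_ari(toy, 0), average_column_ari(toy, 1))
```

Averaging over several random draws reduces the dependence of a site's score on any single balanced sample.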
In our analyses of genomic signatures, based on the ARI, we considered the top 20 sites, and selected only the higher-ranked sites that differed in their dominant amino acids as genomic signatures. We identified 129 avian-human and 77 swine-human genomic signatures in the internal proteins PB1, PB2, PA, NP, M1, M2, NS1, NS2, and PB1-F2, an alternative protein product of PB1. Table 3 shows the numbers of signatures in each protein. Tables 4 and 5 show the top-ranked characteristic sites with discriminating amino acid residues. We compared these signatures with those reported previously [10,12,13,16], as shown in Tables 6 and 7. Several previously reported characteristic sites were unable to distinguish the protein sequences of different species in the more recent data used in our study. However, we discovered some novel genomic signatures. For example, in the NP protein, we eliminated positions 375 and 423 as avian-human signatures, but identified novel signatures at positions 351 and 353. In the swine group, we identified NP-16, 283, and 313 as novel swine-human signatures. These sites have been reported to be associated with a barrier against the zoonotic introduction of IAV into the human population. Previous experimental results indicated that adaptive mutations at these sites in the NP of the 1918 and 2009 pandemic strains could have contributed to increased resistance in murine Mx1 and human MxA [17].
[Table 4 footnotes: positions are sorted in descending order of ARI; only the dominant amino acid residues with more than 20% conservation are shown. doi:10.1371/journal.pone.0084638.t004]
[...] [12], we identified that the NP (20 sites), PB2 (20 sites), and PA (15 sites) were among the proteins that had the highest numbers of signatures. These proteins, together with the PB1 polymerase, form the RNP complex that encloses the genomic segments in the virion. Our analyses consistently showed that most of the avian-human PB2 characteristic sites were located in the PB1 and NP binding areas. We observed similar results in the swine-human PB2 signatures (Figure 2). Unlike Miotto et al., who identified a single avian-human characteristic site (PB1-336) in PB1, we identified 14 sites, including PB1-336 (Figure 3). One notable signature we identified, which previous computational methods failed to recognize [5,10,12,13,16], was PB1-375. Its amino acid is an S in most human viruses and an N in most avian viruses. Previous studies of human pandemics showed a cross-species amino acid substitution at this site, and suggested an important role of PB1-375 in adaptation to mammals [6,18]. The number of characteristic sites mapping to a reported domain was smaller in the PA protein than in the PB2 or PB1. Previous studies have identified some of these sites as located in proximity to the epitopic regions [19,20], or in the proteolysis domain [21] and nuclear localization signal area [22], as shown in Figure 4. In the NP protein shown in Figure 5, most of the characteristic sites were involved in the PB2 interactions. A study by Mänz et al. verified that the mutations at 305, 351, 353, and 357 affect Mx1 resistance [17]. Our findings were in accordance with previous functional analyses implicating PB2 as a putative target of Mx1 [23]. Based on their positions in the sequence, the genomic signatures of M1 could be divided into 2 groups. As shown in Figure 6, the group in proximity to position 126 was within the membrane-binding region [24], and the other was located in the RNP-binding region [25]. The M2 protein contains 3 clusters of signatures and a single outlier M2-28 in the transmembrane region [26], as shown in Figure 7.
The cluster of signatures to the left of the M2-28 is within the M2 extracellular region (M2e) [26,27], the cluster of signatures to the right of the M2-28 represents part of an amphipathic helix [28], and the final cluster of signatures is located in the M2 protein tail, which reportedly interacts with the M1 protein [29]. Most of the signatures of NS1 were located in the RNA binding domain [30,31], the eIF4GI subunit-binding domain [32,33], or the epitopic regions [19,34], as shown in Figure 8. Some of the signatures could be mapped to multiple domains, such as the NS1-215 and 227. By comparing the signatures identified in NS2 with the reported domains, we identified that all of the signatures constituted part of the M1 binding domain or the epitope (Figure 9). The genomic signatures of PB1-F2 could be divided into two parts by the position 42, as shown in Figure 10. The group of signatures to the right of PB1-F2-42 was mapped to the mitochondria targeting domain [35]. Among them we also identified PB1-F2-62, 66, and 70, which were part of the H-2Db binding peptide [36]. The other group of signatures was mostly clustered at positions 23-31. Further investigation is required to elucidate their roles in polymerase activity. Table S1 in Materials S1 shows the reported domains in each internal protein.

Identification of the Chronological Host-specific Genomic Signature
Genomic signatures can change over time because of point mutations or interspecies reassortments. According to the years in which previous human pandemics occurred, we divided the time between 1902 and 2013 into 6 periods: 1902-1918, 1919-1957, 1958-1968, 1969-1977, 1978-2009, and 2010-2013. We assigned each IAV to one of the 6 chronological groups according to its year of discovery. From each chronological group, we identified the chronological genomic signatures that were characteristic of the hosts and specific to that period. Table 8 shows the number of signatures in each period for the internal proteins.
As shown in Table 8, the PB2, PA, and NP proteins had the largest average numbers of avian-human chronological genomic signatures. These findings were consistent with the results from genomic signature analyses during 1902-2013, as shown in Table 3. When examining the numbers of signatures across all periods, we observed that the numbers of chronological signatures in the PB2, PA, and NP proteins were relatively stable, except during 1902-1918. A stable number of signatures over time suggested that the PB2, PA, and NP share similar evolutionary pathways, and a large number of characteristic sites indicated that they undergo rigorous multigenic adaptation to a new host. These findings supported the hypothesis that the coevolution of RNP proteins is a crucial factor that restrains the genomic segments from forming interspecies reassortants, and limits the evolutionary divergence between host-specific lineages [2].
As shown in Table 8, we observed that the average numbers of swine-human chronological genomic signatures in the RNP proteins were among the largest, excluding NS1. Unlike the avian-human chronological signatures, the numbers decreased markedly during 1978-2009 and 2010-2013. These findings suggested that the sequence-level genetic differences in the RNP proteins between swine and human viruses might have reduced in recent years. We further observed that the number of PB1 chronological [...] Table S2 in Materials S1, and Figure S1 [...]
Our analyses further showed that the number of M1 avian-human signatures remained relatively constant, between 4 and 6, throughout all periods. Although the numbers of signatures remained relatively stable during each period, the signatures identified in each period varied in their positions and amino acid residues (see details of the avian-human chronological genomic signatures of M1 in Table S3 in Materials S1, and Figure S2).

Analysis of Host-specific Genomic Signature Transitions
To further investigate the chronological genomic signatures, we analyzed the transitions of the amino acid residues of the signatures in each internal protein across different periods. In the validity analyses, we examined the changing roles of the characteristic sites (signature or nonsignature) during different periods. These results provided information on the relationships between amino acid substitutions and host range phenotypes, and could increase our understanding of the effects of genetic diversity on the adaptation of the IAV. In the identity analyses, we examined the characteristic sites that remained valid throughout all periods for changes in their amino acid residues over time. The details of the transitions of the amino acid residues on the characteristic sites in each internal protein during the 6 periods are given in Tables S4 and S5 (in Materials S1). The various amino acid transitions in the characteristic sites might indicate the differences in the pathogenic and adaptive mechanisms of the IAV during different periods.
We identified a distinct amino acid transition pattern in PB2-590 and 591. Both signatures became valid after 2009, following a G590S and a Q591R mutation, respectively (Table S4 in Materials S1). The amino acid transitions in PB2-627 showed a contrasting pattern: PB2-627 maintained a valid signature from 1902 to early 2009, but after 2009 the dominant amino acid of PB2-627 in the human strain changed from K to E, as in the avian strain. Previous studies have conducted biochemical and modeling experiments to investigate the adaptive strategies of influenza viruses to evade restriction in hosts [40][41][42][43]. Our results on signature transitions supported the findings by Mehle et al. on the SR polymorphism that enables glutamic acid at position 627 to evade restriction in human cells [41]. These observations suggested that signature transition analysis can be applied as a preprocess to identify prospective functional sites in influenza viral proteins prior to further biochemical investigations. We also observed that the PB2-54, 65, 147, 184, 225, 315, 340, 559, and 645 showed similar transition patterns to those of the PB2-590 and 591 (Table S4 in Materials S1). Most of these sites are located in the NP binding domains (Table S1 in Materials S1, and Figure 2). In addition to the PB2-627, which switched from a signature to a nonsignature during 2010-2013, the PB2-199, 475, and 567 showed a similar signature-to-nonsignature tendency. These sites are involved in NP binding, nuclear localization signaling, and RNA cap binding. Further investigation is required to elucidate their roles in polymerase activity.
Most of the characteristic sites served as genomic signatures (i.e., valid sites) within specific periods. For example, M2-18 became a signature after 1958, and NS1-74 became a signature after 2010. Very few characteristic sites remained valid throughout all periods. The dominant amino acid residues on a characteristic site can vary across different periods or remain the same. For example, the dominant amino acids of NS1-67 in the avian and human IAV varied throughout different periods, whereas the dominant amino acids of M2-18 remained the same in the human virus and varied in the avian virus. In the PB2 protein, we identified one characteristic site, PB2-271, as a valid signature specific to avian and human IAV across all periods. Its amino acid transitions were a mutation from A to T in avian IAV and from T to A in human IAV during 1902-1918, as shown in Table S4 in Materials S1. In the signature analyses comparing swine and human IAV, the dominant amino acids of the PB2-271 in the swine virus switched from T to A during 1978-2009, and 2010-2013, as shown in Table S5 in Materials S1. A study by Kendra et al. showed that the mutation T271A in PB2 increases polymerase activity and virus growth in human cells. Results from in vitro reporter gene and sequence analyses indicated that the PB2-271A in S-OIV was likely to have contributed to its efficient transmission among humans during the 2009 H1N1 pandemic [44]. Our findings on the PB2-T271A in the swine virus further supported this hypothesis. Although a previous study using phylogenetic modeling [13] excluded the PB2-271 as a characteristic site, our analysis of the chronological genomic signature transitions successfully identified its relevance in host adaptation. In addition to the 271A mutation, the authors identified another PB2 mutation A588I that increased polymerase activity in mammalian cells [44]. 
Our results support the occurrence of an adaptive mutation of the conserved residues from A to I at site 588 in the human virus (Table S4 in Materials S1). Although PB2-588 had a conserved amino acid A in the avian virus throughout all periods, it showed a transition from A to T in the swine virus during 1978-2009 and 2010-2013. However, in the human virus, the PB2-588 showed an early transition from A to I, which later changed to T, as in the swine virus (Tables S4, S5 in Materials S1). Further investigation is required to establish if the change from I to T in the human virus reduced polymerase activity in human cells and contributed to the end of 2009 H1N1 pandemic. In addition to the PB2-271, the NP-100, 136, and 313 remained genomic signatures throughout all periods (Table S4 in Materials S1). Like PB2-271, their dominant amino acids showed chronological changes, but only in the human virus. Our identity analyses showed that only NP-33, 357, and M2-14 remained signatures throughout all periods and also maintained the same dominant amino acids (Table S4 in Materials S1). The stabilities of these sites indicate a crucial association with host range phenotypes and pathogenic mechanisms, which requires further verification.

Discussion
The availability of a considerable amount of data on the IAV has enabled computational approaches to identify amino acid residues as host-specific genomic signatures. Previous studies have performed large-scale complete-proteome analyses of the IAV sequences [10,12,16]. Our study used more recent sequence data from the NCBI database (as of February 2013) than those studies. Unlike earlier computational methods that relied on a threshold to discriminate signatures from nonsignatures, such as the MI threshold of 0.4 in a study by Miotto et al. [12], and the entropy threshold of 0.33 in a study by Chen and Shih [10], we used the ARI to evaluate and compare the ability of each site in the IAV sequences to distinguish hosts. Tamuri et al. speculated that the verification of characteristic sites in different viral proteins based on one single threshold is questionable because different viral proteins evolve according to different selective constraints [13]. The appropriate threshold might differ in different viral proteins because of their distinct characteristics and the changes they might undergo under various circumstances. In addition, a threshold requires adjustment after novel data become available. For example, Chen and Shih changed their entropy threshold from 0.4 to 0.33 after an increase in the number of the identified IAV protein sequences [10].
For comparison, we calculated the entropy and the MI for each site. Low entropy in a site indicates that its amino acid residues are well-conserved, and thus the site is likely to represent a candidate genomic signature [10]. However, an approach that identifies signatures based on low entropy can overlook potential characteristic sites. For example, in the sequences in the NCBI from 1902 to 2013, the amino acid residues at position PB2-588 are dominated by A (95%) in avian viral sequences. However, in human viral sequences, the same position is shared almost equally by I (57%) and T (42%). Although Chen and Shih considered PB2-588 in the sequences as of May 28, 2009 in the NCBI to be a genomic signature [10], the entropy-based method could mistakenly eliminate PB2-588 as a characteristic site from more recent sequence data because of the high entropy of PB2-588 in the human strain. We could have included PB2-588 as a signature by ignoring the entropy constraint, but this would have incurred an increase in the false-positive rate. In contrast, PB2-588 could be considered a signature because of its high MI and ARI values. Because MI and ARI are different measures, instead of comparing their values to determine which measure is more effective in signature evaluation, we compared the rankings of the sites, and their conserved amino acid residues, according to MI and ARI. Table S6 in Materials S1 shows the top 20 MI- and ARI-ranked sites in avian and human PB2 proteins. PB2-645 and 591, the 19th- and 20th-ranked sites according to MI, have the same dominant amino acid in avian and human proteins. Though PB2-81, the 12th-ranked site, has different dominant residues between avian and human (T vs. M), the conservation levels of T and M in human are almost the same (M = 42.98% vs. T = 42.77%). Therefore, none of these sites is an appropriate genomic signature. In contrast, all the top 20 ARI-ranked sites showed differences in the dominant amino acid residues.
We observed similar trends in other internal proteins. Overall, these findings suggested that the ARI provides a more appropriate measure for the ranking of characteristic sites when compared with MI.
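The PB2-588 example can be made concrete with a small calculation of per-host entropy and host-residue MI. The sketch below assumes Shannon entropy in bits and 100 sequences per host built from the percentages quoted above; the residue "X" is a hypothetical placeholder for the unspecified remainder, and the normalization used in the cited entropy and MI studies may differ, so the values are not directly comparable with the thresholds of 0.33 or 0.4.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(hosts, residues):
    """I(host; residue) = H(host) + H(residue) - H(host, residue), in bits."""
    return entropy(hosts) + entropy(residues) - entropy(list(zip(hosts, residues)))


# Toy column built from the PB2-588 percentages; "X" stands for other residues.
avian_column = ["A"] * 95 + ["X"] * 5
human_column = ["I"] * 57 + ["T"] * 42 + ["X"] * 1

hosts = ["avian"] * len(avian_column) + ["human"] * len(human_column)
residues = avian_column + human_column

print(round(entropy(avian_column), 2))   # low: one residue dominates
print(round(entropy(human_column), 2))   # high: two residues share the site
print(round(mutual_information(hosts, residues), 2))
```

Under these assumptions the human column has much higher entropy than the avian column, while the MI between host and residue stays close to the 1-bit maximum for two equally sized hosts, which is the situation in which an entropy filter alone would discard the site.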
Our experimental results showed that the ARI provides a more effective measure for detecting host-associated characteristic sites than entropy or MI. Using the IAV data in the NCBI, we identified novel signatures in 9 internal viral proteins that previous approaches failed to recognize. Several of the signatures could be mapped to known structural, functional, or antigenic domains of the proteins, which suggested their molecular functions and indicated the value of our approach.
Point mutations or interspecies reassortments can change the genomic signatures in viral sequences. Some previous studies analyzed adaptation trends in the previously identified genomic signatures without considering their phylogenetic relationships [10,12,16]. Other studies considered the phylogenetic structures, but applied theoretical modeling of site substitution rates [13]. In addition to providing updated data on the host-specific genomic signatures (Tables 4 and 5), our study analyzed the genomic signatures in their chronological order. We initially grouped the protein data chronologically, identified the signatures from each separate group, and then analyzed their variations. Therefore, unlike previous studies' methods, our approach included the identification and analysis of sequence signatures and their chronological relationships. For example, from the chronological analysis of avian-human signatures, we detected a difference in the transition pattern in PB1 when compared with the other internal proteins: a comparatively larger number of characteristic sites had the same (or similar) dominant amino acid residues in the avian and the human viruses during 1919-1957 and 1958-1968, and the dominant residues varied after 1968 mostly in the human virus, but rarely in avian (Table S4 in Materials S1). These observations supported the reassortment hypothesis that PB1 gene was introduced from avian to human prior to the 1957 pandemic, and was maintained in human until 1968 [18,45]. In addition, the chronological analysis of swine-human signature transitions during 1978-2009 further showed that there were relatively more characteristic sites with the same dominant residues in PB1 than in the other internal proteins (Table S5 in Materials S1), and the number of signatures dropped from 20 to 3, the minimum across all periods (Table 8). These observations indicated that the genetic variability in PB1 between swine and human was minimal during 1978-2009. 
Several studies into the lineages and evolutionary genomics of the 2009 S-OIV suggested that the PB1 of S-OIV emerged from a triple-reassortment virus circulating in North American swine, and the PB1 gene in the source triple-reassortant was derived from human at the time of the triple reassortment events in 1998 [45][46][47]. Our findings of the chronological signature transition patterns in PB1 were in accordance with the reassortment history of the S-OIV. Further investigation is required to identify other chronological transition patterns of the other internal proteins, and to verify their relations to the historical reassortment events.
Chronological genomic signatures provide the basis for novel types of investigation into the multiple genetic determinants of a host range. The numbers of chronological signatures during different periods and their variance can correlate with the rigor of the multigenic adaptation of influenza viruses to a new host. A larger number of signatures indicates greater difficulty in the transmission and adaptation of a viral protein to a new species, whereas larger variance in the number of signatures across different periods suggests a wide variety of amino acid residues switching from signature to nonsignature roles or vice versa. For example, the larger number of chronological genomic signatures in the PB2, PB1, PA, and NP proteins (Table 8) is consistent with the occasional, but rare, transmission or adaptation of avian influenza viruses to humans [2]. The fluctuations in the ARI value for a signature across different periods indicate its genetic variability, through mutations or reassortment events, with time. Signatures with similar patterns in their ARI values are likely to be involved in related molecular functions and activities. Our analyses suggested correspondence between the ARI patterns (NS1-23 and 98) and increased viral growth [39].
The changes in the conserved amino acid residues of chronological genomic signatures throughout different periods indicate a chronological relationship between signature transitions and adaptation trends. Based on the chronological transitions of a signature, we can evaluate its stability according to validity and identity. Our results showed a unique transition pattern in the dominant amino acid residues of PB2-590 and 591, which was consistent with previous biochemical modeling results on the SR polymorphism [41]. We also identified other chronological signatures with similar amino acid transitions, such as the PB2-54, 65, and 147. According to the domains to which the chronological signatures are mapped, we were able to identify the variations in the domains during different periods. The chronological associations between the signatures and the mapped domains could provide alternative insight into previous findings on influenza virus evolution. Our analytical approach could serve as a preprocess to identify prospective characteristic sites that warrant further investigation.

Conclusion
In this study, we proposed an alternative measure, based on the ARI, for the evaluation of the abilities of the genomic signatures of the IAV to distinguish the host. Using the data in the NCBI (in February, 2013), we identified 129 avian-human and 77 swine-human genomic signatures, including novel signatures that previous methods failed to recognize. Several of these novel signatures could be mapped to known domains to show the biological significance of the novel signatures, and indicate the value of the ARI in the evaluations. These novel signatures could potentially increase our understanding of genetic determinants and their potential combinations involved in host restriction. To chronologically analyze the genomic signatures, we divided the virus data into chronological groups, and then identified the genomic signatures from these groups. A comprehensive analysis of the chronological signatures throughout different periods indicated adaptation trends that were consistent with previously published results. Our chronological approach considers the underlying phylogenetic relationships of genomic signatures, and can identify adaptation trends more accurately than existing approaches.

Figure S1 The ARI of NS1 chronological signatures in each period. The X-axis shows the periods; the Y-axis represents the ARI. Several signatures show similar ARI transition patterns over the periods, such as NS1-23, 56, 98, 112, and 119.

Materials S1 Supporting information of host-specific genomic signatures and transitions of characteristic sites.
Table S1: Catalogue of reported domains in 8 internal proteins.
Table S2: NS1's chronological genomic signatures identified in 6 periods and their amino acid residues.
Table S3: M1's chronological genomic signatures identified in 6 periods and their amino acid residues.
Table S4: Transitions of amino acid residues on avian-human characteristic sites.
Table S5: Transitions of amino acid residues on swine-human characteristic sites.