Abstract
Influenza A virus genomes are composed of eight negative sense RNAs. To date, 16 proteins encoded by the eight positive sense RNAs have been identified. One putative protein coding sequence (PCS) encoded by the genomic strand RNA of segment 8 has been proposed previously. In this study, 95,608, 123,965 and 35,699 genomic strand RNA sequences from influenza A viruses from avian, human and mammalian hosts, respectively, were used to identify PCSs encoded by the genomic strand RNAs. In total, 326,069 PCSs with lengths equal to or longer than 80 amino acids were identified and clustered into 270 PCS groups (PCSGs). Twenty of the 270 PCSGs, each with a proportion greater than 10% in influenza A viruses from avian, human or mammalian hosts, were selected for detailed study. Maps of the 20 PCSGs in the influenza A virus genomes were constructed. The proportions of the 20 PCSGs in influenza A viruses from different hosts and serotypes were analyzed. One secretory and five membrane proteins were predicted from the PCSGs encoded by the genomic strand RNAs of segments 1, 2, 4, 6, 7 and 8. These results suggest that the influenza A virus genomic RNAs may be ambisense and may form a reservoir of coding sequences encoding potential pan proteomes of influenza A viruses.
Citation: Yang C-W, Chen M-F (2016) Uncovering the Potential Pan Proteomes Encoded by Genomic Strand RNAs of Influenza A Viruses. PLoS ONE 11(1): e0146936. https://doi.org/10.1371/journal.pone.0146936
Editor: John Stambas, Deakin University, AUSTRALIA
Received: July 26, 2015; Accepted: December 22, 2015; Published: January 13, 2016
Copyright: © 2016 Yang, Chen. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All influenza virus genomic sequence data used in this study are available from the NCBI Influenza Database (ftp://ftp.ncbi.nih.gov/genomes/INFLUENZA/). Additional data are available in the Supporting Information files.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Influenza A virus (IAV) genomes are composed of eight negative (genomic) sense RNAs [1,2]. Currently, 16 proteins encoded by the eight positive sense RNAs have been identified. Three proteins (PB1, PB1-F2 and N40) encoded by the positive sense RNA of segment 2 start at the 1st, 4th and 5th AUG, respectively [3,4,5]. Four proteins (PA, PA-X, PA-N155 and PA-N182) are encoded by the positive sense RNA of segment 3. The PA-X protein is a ribosomal frameshifting product composed of the N-terminal domain of the PA protein (191 amino acids) and a short C-terminal domain (61 amino acids) that results from a +1 frameshift of the PA open reading frame (ORF) [6,7]. The PA-N155 and PA-N182 proteins are translated from the 11th and 13th in-frame AUGs in the PA ORF and are, therefore, N-terminally truncated forms of the PA protein [8]. Three proteins (M1, M2 and M42) are encoded by the positive sense RNA of segment 7. An alternatively spliced mRNA encodes an M2 variant, called M42, which functionally complements M2 in vitro and in vivo [9]. The genomic map of these ORFs is summarized in S1 Fig. In addition to the proteins encoded by the eight positive sense RNAs, a hypothetical protein sequence encoded by the genomic strand RNA of segment 8 has been proposed [10,11,12]. These studies raise the possibility that the eight genomic strand RNAs of IAVs have coding potential.
In this study, a large-scale in silico investigation was performed using the IAV genome sequences from the NCBI Influenza Database to identify putative protein coding sequences (PCSs) in the genomic strand RNAs of IAVs. In total, 270 PCS groups (PCSGs) composed of 326,069 PCSs with lengths equal to or longer than 80 amino acids were identified. Twenty of the 270 PCSGs with proportions greater than 10% in IAVs from avian, human or mammalian hosts (AIAVs, HIAVs and MIAVs) were selected for further study.
Materials and Methods
Data Collection
In total, 322,235 IAV genome sequences were retrieved from the NCBI Influenza Database. After checking completeness by length and integrity of open reading frame in the positive strand, 255,273 IAV genomic strand RNA sequences were used. This data set includes 95,608, 123,965 and 35,699 genomic strand RNA sequences of IAVs from avian, human and mammalian hosts (AIAVs, HIAVs and MIAVs), respectively (S1 Table). A set of genomic strand RNA sequences from pandemic, outbreak or highly virulent HIAVs, including H1N1 1918 HIAV from the "Spanish Flu", H2N2 HIAV from the 1957 pandemic, H3N2 HIAV from the 1968 pandemic, H1N1 HIAV from the 1977 Russia outbreak, 2009 H1N1 HIAV from the “swine flu”, H5N1 HIAV from the 1997 Hong Kong outbreak, the 2004–2008 highly pathogenic H5N1 HIAVs from Vietnam, Indonesia and Thailand and the highly pathogenic H7N9 HIAV from China 2013 were used as models of special strains.
PCS Prediction, Sequence Clustering and Phylogenetic Analysis
The computer programs for data manipulation and putative protein coding sequence (PCS) prediction were written by the authors in the Perl programming language. In total, 326,069 PCSs with a length equal to or longer than 80 amino acids were identified from 255,273 IAV genomic strand RNAs. The PCSs predicted from each genome segment were clustered using the CD-HIT software [13]. After sequence clustering, CD-HIT chooses the longest sequence as the representative sequence of each cluster and computes the identities between the representative sequence and the other sequences within the cluster. A protein sequence identity of 60% was used as the cutoff for the first run of sequence clustering because the protein sequences from these PCSs exhibit both sequence and length variations. After the first run of clustering, 270 PCS groups (PCSGs) were obtained. Twenty of the 270 PCSGs with proportions greater than 10% in at least one of AIAVs, HIAVs or MIAVs were chosen for further study. Each of the 20 PCSGs was divided by IAV host type (AIAV, HIAV and MIAV) and serotype (H1N1, H2N2, H3N2, H5N1, H7N9 and other). Each of the 20 PCSGs was then subjected to a second run of sequence clustering based on 95% identity. Representative protein sequences of the sub-clusters from each of the 20 PCSGs were used for phylogenetic analysis with the MEGA6 software [14]. Phylogenetic trees of the 20 PCSGs were built by the neighbor-joining (NJ) method with 1,000 bootstrap replicates. One-way ANOVA of the sequence identity and length variations of the 20 PCSGs among the AIAVs, HIAVs and MIAVs was performed using Excel.
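The PCS prediction step can be illustrated with a minimal sketch. The authors' Perl programs are not reproduced in this article, so the Python code below is a stand-in and not their implementation. It assumes that a PCS is an ATG-initiated reading frame closed by a stop codon, that the three forward frames of the genomic strand are scanned, and that database entries deposited in positive sense are first converted with `reverse_complement`. The names `find_pcs`, `reverse_complement`, `translate` and `PCS` are hypothetical.

```python
from typing import List, NamedTuple

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class PCS(NamedTuple):
    frame: int     # reading frame on the scanned strand: 0, 1 or 2
    start: int     # 0-based index of the A of the start codon
    end: int       # 0-based index one past the stop codon
    protein: str   # translated sequence, stop codon excluded


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3' in DNA letters."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def translate(nt: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def find_pcs(strand: str, min_aa: int = 80) -> List[PCS]:
    """Scan the three forward frames of `strand` for ATG...stop frames.

    Assumption: the first ATG after the previous stop is taken as the
    start, and frames that run off the end without a stop are ignored.
    """
    strand = strand.upper().replace("U", "T")
    hits: List[PCS] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(strand) - 2, 3):
            codon = strand[i:i + 3]
            if CODON_TABLE.get(codon, "X") == "*":
                if start is not None and (i - start) // 3 >= min_aa:
                    hits.append(PCS(frame, start, i + 3,
                                    translate(strand[start:i])))
                start = None
            elif codon == "ATG" and start is None:
                start = i
    return hits


if __name__ == "__main__":
    # Toy genomic strand (hypothetical, not a real influenza sequence).
    toy_genomic = "ATG" + "GCT" * 5 + "TAA"
    for hit in find_pcs(toy_genomic, min_aa=5):
        print(hit)
```

The 80-amino-acid cutoff from the text is the default value of `min_aa`; the toy call lowers it only so that a short sequence yields a hit.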
Results
PCS groups identified in the genomic strands of IAV RNAs in silico
The genomic map of the 20 PCSGs (positions relative to the 10 proteins encoded by positive sense RNAs) is shown in Fig 1. In the figures, tables and text in the remainder of the article, PCSG y in the genomic strand RNA of IAV segment x is written as Sx PCS Gy. The eight ORFs (PB1, PB2, PA, HA, NP, NA, M1 and NS1) in the eight positive sense RNAs were used as reference reading frames (+1 reading frame) for the PCSGs encoded by the eight genomic strand RNAs. The length and location (start and end) of the PCSGs in the genomic strand RNAs vary between IAV genomes. Therefore, the genomic map in Fig 1 shows the regions covered by the shortest and longest PCSs of each PCSG in the genomic strand RNAs. The length distributions of the 20 PCSGs in AIAVs, HIAVs and MIAVs are shown in Fig 2. S4 PCS G3 was not identified in AIAVs; therefore, the sequence length of S4 PCS G3 in AIAVs is 0 and ANOVA was not performed. Overall, two PCSGs (S5 PCS G2 and S7 PCS G3) differ less in length among hosts (ANOVA p = 5.7E-8 and 3.28E-14) than the other 17 PCSGs (ANOVA p = 0). Sequence identities were computed by CD-HIT between the representative (longest) sequence of each sequence cluster and the other sequences within that cluster. The distributions of sequence identities for each PCSG in AIAVs, HIAVs and MIAVs are shown in Fig 3. S4 PCS G3 was not identified in AIAVs; therefore, the sequence identity of S4 PCS G3 in AIAVs is 0 and ANOVA was not performed. Overall, only three PCSGs (S4 PCS G1, S4 PCS G2 and S6 PCS G2) differ less in identity among hosts (ANOVA p = 1.3E-289, 3.6E-287 and 7.6E-273) than the other 16 PCSGs (ANOVA p = 0).
Blue arrows indicate open reading frames encoded by eight positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by eight genomic strand RNAs of influenza A viruses.
The y-axis indicates the length of the putative protein coding sequences. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. One-way ANOVA was performed and p values were used to indicate the differences between the putative protein coding sequence lengths among influenza A viruses from different hosts.
The y-axis indicates the sequence identity of putative protein coding sequences. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. One-way ANOVA was performed and p values were used to indicate the differences between the putative protein coding sequence identities among influenza A viruses from different hosts.
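The one-way ANOVA comparisons of PCS length and identity among hosts were run in Excel. As an illustration only, the sketch below computes the F statistic that underlies such a comparison in plain Python. The function name `one_way_anova_f` and the toy values are hypothetical, and the conversion of F to a p value (which requires the F distribution) is left out.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)                          # number of groups (hosts)
    n = sum(len(g) for g in groups)          # total number of observations
    means: List[float] = [fmean(g) for g in groups]
    grand = fmean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Toy PCS lengths for three hosts (hypothetical values, not study data).
    aiav = [1.0, 2.0, 3.0]
    hiav = [2.0, 3.0, 4.0]
    miav = [5.0, 6.0, 7.0]
    print(one_way_anova_f([aiav, hiav, miav]))
```

A large F with many observations per group corresponds to a very small p value.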
Phylogenetic analysis of the 20 PCSGs
The number of sequences in each of the 20 PCSGs is listed in S2 Table. Because such a large number of sequences cannot be used directly for phylogenetic analysis, the protein sequences in each of the 20 PCSGs were further clustered into subgroups based on 95% sequence identity. The number of sequence clusters in each of the 20 PCSGs is also listed in S2 Table. Representative sequences of the subgroups were used to perform multiple sequence alignment followed by phylogenetic analysis. The 20 phylogenetic trees of the corresponding 20 PCSGs are shown in S2 Fig. S7 PCS G2 exhibits the highest sequence-to-cluster ratio (14945:32), which indicates that this PCSG has the lowest sequence diversity. In contrast, S5 PCS G3 exhibits the lowest sequence-to-cluster ratio (7169:604), suggesting that this PCSG has the highest sequence diversity.
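The sequence-to-cluster ratio used above as a rough inverse measure of diversity is simple arithmetic. The sketch below computes it for the two PCSGs whose counts are quoted in the text; counts for the other 18 PCSGs are in S2 Table and are not reproduced here. The names `counts` and `sequences_per_cluster` are illustrative.

```python
# (number of sequences, number of 95%-identity clusters) as quoted in the text.
counts = {
    "S7 PCS G2": (14945, 32),
    "S5 PCS G3": (7169, 604),
}


def sequences_per_cluster(n_sequences: int, n_clusters: int) -> float:
    """Average number of sequences per 95%-identity cluster."""
    return n_sequences / n_clusters


ratios = {name: sequences_per_cluster(*c) for name, c in counts.items()}

# A higher ratio means more sequences collapse into each cluster,
# which is read here as lower sequence diversity.
least_diverse = max(ratios, key=ratios.get)
most_diverse = min(ratios, key=ratios.get)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} sequences per cluster")
```

With these counts, S7 PCS G2 averages about 467 sequences per cluster and S5 PCS G3 about 12.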
Evolving histories of the 20 PCSGs
The earliest and latest years of the sequences recorded in the NCBI Influenza Virus database for the PCSs of the 20 PCSGs are shown in Figs 4–11. Eight PCSGs (S1 PCS G1, S1 PCS G2, S2 PCS G2, S3 PCS G1, S3 PCS G2, S5 PCS G1, S5 PCS G3 and S8 PCS G1) were present in the records of AIAVs as early as 1902. Four of the eight PCSGs (S1 PCS G1, S3 PCS G1, S5 PCS G1 and S8 PCS G1) were present in the records of HIAVs as early as 1918. Three PCSGs (S2 PCS G1, S7 PCS G1 and S7 PCS G2) were present in the records of HIAVs as early as 1918 but did not appear in the records of AIAVs until 1949 and 1965. In contrast, S4 PCS G2 did not appear in the records of AIAVs and HIAVs until 2003 and 2007, respectively. S4 PCS G3 did not appear in the records of MIAVs until 2003. Most of the PCSGs first appeared in the records between 1930 and 1980. Once they appeared, all of the 20 PCSGs were continuously identified in at least one of the AIAVs, HIAVs or MIAVs until 2013 or 2014.
Blue arrows indicate open reading frames encoded by segment 1 positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by segment 1 genomic strand RNAs of influenza A viruses. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. The “HxNy” in the pie chart indicates the proportion of segment 1 genomic strand RNAs with putative protein coding sequences that belong to H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Other” in the pie chart indicates the proportion of segment 1 genomic strand RNAs with putative protein coding sequences that do not belong to the H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Without” in the pie chart indicates the proportion of segment 1 genomic strand RNAs without a putative protein coding sequence.
Blue arrows indicate open reading frames encoded by segment 2 positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by segment 2 genomic strand RNAs of influenza A viruses. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. The “HxNy” in the pie chart indicates the proportion of segment 2 genomic strand RNAs with putative protein coding sequences that belong to H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Other” in the pie chart indicates the proportion of segment 2 genomic strand RNAs with putative protein coding sequences that do not belong to the H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Without” in the pie chart indicates the proportion of segment 2 genomic strand RNAs without a putative protein coding sequence.
Blue arrows indicate open reading frames encoded by segment 3 positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by segment 3 genomic strand RNAs of influenza A viruses. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. The “HxNy” in the pie chart indicates the proportion of segment 3 genomic strand RNAs with putative protein coding sequences that belong to H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Other” in the pie chart indicates the proportion of segment 3 genomic strand RNAs with putative protein coding sequences that do not belong to the H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Without” in the pie chart indicates the proportion of segment 3 genomic strand RNAs without a putative protein coding sequence.
Blue arrows indicate open reading frames encoded by segment 4 positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by segment 4 genomic strand RNAs of influenza A viruses. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. The “HxNy” in the pie chart indicates the proportion of segment 4 genomic strand RNAs with putative protein coding sequences that belong to H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Other” in the pie chart indicates the proportion of segment 4 genomic strand RNAs with putative protein coding sequences that do not belong to the H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Without” in the pie chart indicates the proportion of segment 4 genomic strand RNAs without a putative protein coding sequence.
Blue arrows indicate open reading frames encoded by segment 5 positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by segment 5 genomic strand RNAs of influenza A viruses. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. The “HxNy” in the pie chart indicates the proportion of segment 5 genomic strand RNAs with putative protein coding sequences that belong to H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Other” in the pie chart indicates the proportion of segment 5 genomic strand RNAs with putative protein coding sequences that do not belong to the H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Without” in the pie chart indicates the proportion of segment 5 genomic strand RNAs without a putative protein coding sequence.
Blue arrows indicate open reading frames encoded by segment 6 positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by segment 6 genomic strand RNAs of influenza A viruses. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. The “HxNy” in the pie chart indicates the proportion of segment 6 genomic strand RNAs with putative protein coding sequences that belong to H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Other” in the pie chart indicates the proportion of segment 6 genomic strand RNAs with putative protein coding sequences that do not belong to the H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Without” in the pie chart indicates the proportion of segment 6 genomic strand RNAs without a putative protein coding sequence.
Blue arrows indicate open reading frames encoded by segment 7 positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by segment 7 genomic strand RNAs of influenza A viruses. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. The “HxNy” in the pie chart indicates the proportion of segment 7 genomic strand RNAs with putative protein coding sequences that belong to H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Other” in the pie chart indicates the proportion of segment 7 genomic strand RNAs with putative protein coding sequences that do not belong to the H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Without” in the pie chart indicates the proportion of segment 7 genomic strand RNAs without a putative protein coding sequence.
Blue arrows indicate open reading frames encoded by segment 8 positive sense RNAs of influenza A viruses. Red arrows indicate putative protein coding sequences encoded by segment 8 genomic strand RNAs of influenza A viruses. AIAV, HIAV and MIAV indicate influenza A viruses from avian, human and mammalian hosts, respectively. The “HxNy” in the pie chart indicates the proportion of segment 8 genomic strand RNAs with putative protein coding sequences that belong to H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Other” in the pie chart indicates the proportion of segment 8 genomic strand RNAs with putative protein coding sequences that do not belong to the H1N1, H2N2, H3N2, H5N1 or H7N9 serotypes. The “Without” in the pie chart indicates the proportion of segment 8 genomic strand RNAs without a putative protein coding sequence.
Proportions of the 20 PCSGs among IAVs from different hosts and serotypes
The proportions of the 20 PCSGs in the AIAVs, HIAVs and MIAVs are shown in Figs 4–11. Eight types of proportion distributions were found. The first PCSG type (order of proportions: AIAVs ≅ MIAVs ≅ HIAVs) has high proportions in AIAVs, MIAVs and HIAVs alike. For example, S1 PCS G1 has proportions of 99.8%, 99.99% and 99.5% in the segment 1 RNAs of AIAVs, HIAVs and MIAVs, respectively. Another example is S2 PCS G2, which has proportions of 99.99%, 99.99% and 99.1% in the segment 2 RNAs of AIAVs, HIAVs and MIAVs, respectively. The second PCSG type (order of proportions: AIAVs ≅ MIAVs ≅ HIAVs), composed of S3 PCS G1, has low proportions of 25.4%, 28.77% and 13.8% in the segment 3 RNAs of AIAVs, MIAVs and HIAVs, respectively. The third PCSG type (order of proportions: AIAVs < MIAVs > HIAVs) is composed of two PCSGs (S1 PCS G2 and S7 PCS G4). The fourth PCSG type (order of proportions: AIAVs > MIAVs < HIAVs) is composed of S7 PCS G2. The fifth PCSG type (order of proportions: AIAVs < MIAVs ≅ HIAVs) is composed of three PCSGs (S2 PCS G1, S4 PCS G1 and S7 PCS G3). The sixth PCSG type (order of proportions: AIAVs ≅ MIAVs < HIAVs) is composed of two PCSGs (S4 PCS G2 and S6 PCS G2). The seventh PCSG type (order of proportions: AIAVs > MIAVs ≅ HIAVs) is composed of two PCSGs (S5 PCS G3 and S7 PCS G1). The eighth PCSG type (order of proportions: AIAVs ≅ MIAVs > HIAVs) is composed of two PCSGs (S3 PCS G2 and S5 PCS G1).
The proportions of the 20 PCSGs in the IAVs of different serotypes from different hosts are shown in Figs 4–11. Five PCSGs (S1 PCS G2, S3 PCS G2, S3 PCS G3, S5 PCS G1 and S7 PCS G3) have higher proportions from H1N1 than other serotypes in the HIAVs. In contrast, five PCSGs (S4 PCS G1, S4 PCS G2, S6 PCS G2, S7 PCS G2 and S8 PCS G1) have higher proportions from H3N2 than other serotypes in the HIAVs.
PCSs encoded by the genomic RNAs of pandemic, outbreak and highly pathogenic IAVs
The PCSs encoded by the genomic strand RNAs of pandemic, outbreak and highly pathogenic IAVs from human hosts are listed in Table 1. The PCSs encoded by the genomic strand RNAs of the H1N1 WSN33 and H1N1 PR8 HIAVs, which are frequently used in laboratory experiments, are also listed as reference IAV strains. The genomic maps of these PCSs are shown in S3 Fig. Three PCSGs are worth noting. First, a 239-amino acid PCS belonging to S1 PCS G2 is present in segment 1 of H1N1 from the 2009 swine flu. Another 174-amino acid PCS belonging to S1 PCS G2 is present in segment 1 of the H5N1 HK 1997 and H7N9 2013 HIAVs. Second, a 109-amino acid PCS belonging to S3 PCS G2 is present in segment 3 of H1N1 1918 HIAV. Shorter PCSs of 95 to 98 amino acids belonging to S3 PCS G2 are also present in segment 3 of the H5N1 and H7N9 2013 HIAVs. Third, a 154-amino acid PCS belonging to S6 PCS G1 is present in segment 6 of the H5N1 HIAVs from Indonesia, Thailand and Vietnam. As shown in Figs 4, 6 and 9, the three PCSGs have higher proportions of H1N1 HIAVs than of other serotypes. An additional PCSG, which does not belong to the 20 PCSGs, was found encoded in the segment 5 genomic strand RNA of H5N1 Hong Kong 1997 (A/Hong Kong/156/97(H5N1)) (page 5 in S3 Fig). The proportions of this PCSG are 180/11165 (0.0161), 33/12806 (0.0026) and 48/3535 (0.0136) in AIAVs, HIAVs and MIAVs, respectively.
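The proportions quoted for the additional segment 5 PCSG can be recomputed from the counts, and compared with the 10% selection criterion described in Materials and Methods. The sketch below is illustrative: the helper `exceeds_threshold` and the variable names are hypothetical, and the S2 PCS G1 percentages are the ones given in the Discussion.

```python
# (sequences carrying the PCSG, segment 5 sequences examined), from the text.
segment5_extra = {
    "AIAV": (180, 11165),
    "HIAV": (33, 12806),
    "MIAV": (48, 3535),
}

proportions = {host: hits / total for host, (hits, total) in segment5_extra.items()}


def exceeds_threshold(props: dict, threshold: float = 0.10) -> bool:
    """True if the proportion is above the threshold in at least one host."""
    return any(p > threshold for p in props.values())


# S2 PCS G1 proportions as fractions, taken from the Discussion.
s2_pcs_g1 = {"AIAV": 0.249, "HIAV": 0.824, "MIAV": 0.762}

for host, p in proportions.items():
    print(f"{host}: {p:.4f}")
print("extra segment 5 PCSG above 10%:", exceeds_threshold(proportions))
print("S2 PCS G1 above 10%:", exceeds_threshold(s2_pcs_g1))
```

All three proportions of the additional PCSG fall below 10%, which is consistent with it not being among the 20 selected PCSGs.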
Protein function domains predicted in silico
Signal sequences (for the protein secretory pathway) and trans-membrane domains were predicted in six PCSGs (S1 PCS G1, S2 PCS G1, S4 PCS G2, S6 PCS G2, S7 PCS G4 and S8 PCS G1) in silico (Fig 12). A signal sequence (and no trans-membrane domain) was identified in S1 PCS G1, suggesting that S1 PCS G1 may encode a secretory protein (Fig 12A). Trans-membrane domains (and no signal sequence) were identified in S2 PCS G1, S4 PCS G2, S6 PCS G2, S7 PCS G4 and S8 PCS G1, suggesting that these PCSGs may encode membrane proteins located on organelle or plasma membranes (Fig 12B–12F). Except for the signal sequence and trans-membrane domains in these six PCSGs, no protein function domain was identified in the 20 PCSGs using the SMART Database and NCBI Conserved Domain Database.
The y-axis indicates the probability of prediction. An N-terminal sequence with probability between 0.6 and 0.7 may be interpreted as a signal sequence. An internal sequence with a probability greater than 0.7 may be interpreted as a trans-membrane domain.
Positional overlapping between critical amino acids in the proteins encoded by the positive strand RNAs and 20 PCSGs encoded by the genomic strand RNAs
Several amino acids in the proteins encoded by positive sense RNAs have been reported to be associated with virulence and host adaptation and can be used as genetic markers. Many of these amino acid sites overlap the 20 PCSGs identified in this study. Fig 13 maps the positional overlap between the 20 PCSGs and the critical amino acids in the proteins encoded by positive sense RNAs that are associated with virulence [18–26] (53 amino acid sites). Fig 14 maps the positional overlap between the 20 PCSGs and the critical amino acids in the proteins encoded by positive sense RNAs that are associated with genetic markers and host adaptation [27–40] (198 amino acid sites).
Amino acid sites in this figure are summarized from references 19–26.
Amino acid sites in this figure are summarized from references 27–40.
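Checking whether a critical amino acid site overlaps a PCS requires placing both on the same coordinate system, because the site is defined on the positive sense RNA and the PCS on the genomic strand. The sketch below shows one way to do this under stated assumptions: coordinates are 1-based and inclusive, and a genomic strand position p on a segment of length L corresponds to positive strand position L - p + 1. All function names and the numbers in the example are hypothetical and are not taken from Figs 13 or 14.

```python
from typing import Tuple

Span = Tuple[int, int]  # (first, last) nucleotide, 1-based inclusive


def codon_span(orf_start: int, aa_index: int) -> Span:
    """Positive-strand nucleotides of the codon for amino acid `aa_index`.

    `orf_start` is the position of the A of the ORF start codon.
    """
    first = orf_start + 3 * (aa_index - 1)
    return (first, first + 2)


def genomic_to_positive(segment_length: int, span: Span) -> Span:
    """Convert a span given on the genomic strand to positive-strand coordinates."""
    g_start, g_end = span
    return (segment_length - g_end + 1, segment_length - g_start + 1)


def overlaps(a: Span, b: Span) -> bool:
    """True if two inclusive spans share at least one nucleotide."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical example: a 2,341-nt segment, an ORF starting at nt 28,
    # amino acid site 627, and a PCS at genomic-strand positions 400-700.
    site = codon_span(orf_start=28, aa_index=627)
    pcs = genomic_to_positive(2341, (400, 700))
    print(site, pcs, overlaps(site, pcs))
```

The same check, repeated over every critical site and every PCS, would give the kind of overlap map described above.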
Discussion
In this study, 20 PCSGs were proposed to be encoded by the genomic strand of IAV RNAs. If the prediction is true, it has several implications. The first and simplest consequence is that IAV genome segments are “ambisense”. Originally, the term “ambisense” was used to describe the coding strategies of arenaviruses (family Arenaviridae) and members of the Phlebovirus genus of the Bunyaviridae, in which some proteins are encoded by viral-complementary RNA sequences and others are encoded by the viral RNA sequence [41]. In those cases, the ORFs do not overlap. In contrast, the 20 IAV PCSGs proposed in this study overlap with ORFs encoded by positive sense RNAs. Overlapping coding sequences may undergo co-evolution in a sophisticated manner. Amino acids encoded by overlapping coding sequences are structurally, functionally, and co-evolutionarily constrained.
The length of the 5'-UTR in human mRNA is distributed between 100 and 500 bp [42]. As shown in Fig 1, eight PCSGs (S1 PCS G1, S2 PCS G1, S3 PCS G1, S5 PCS G1, S6 PCS G1, S7 PCS G1, S7 PCS G2 and S8 PCS G1) have a start codon near the 5'-end (e.g. within 500 bp) of the IAV genomic strand RNAs. The PCSGs near the middle and 3'-end of the genomic strand RNAs may either need a mechanism for internal translation initiation for protein synthesis or have a very low efficiency of translation initiation and protein synthesis. Alternatively, they may form a reservoir of coding sequences. These potential coding sequences may provide additional protein motifs for coding sequences near the 5'-end when frameshift mutations occur or new splicing sites are generated by random mutations.
Differences in host adaptation and virulence between IAVs may derive from differences in the composition of the viral core and pan proteomes. The 10 well-studied viral proteins (PB2, PB1, PA, HA, NP, NA, M1, M2, NS1 and NS2) comprise the core proteome of IAVs. Sequence and length variations of IAV proteins in the core proteome among different virus strains may lead to function or activity diversity of proteins and differences in virulence among IAVs. For example, the C-terminal PDZ domain of the NS1 protein is associated with the virulence of IAVs. IAVs with C-terminal deletion of the NS1 protein exhibit relatively lower pathogenicity than IAVs harboring NS1 proteins with the C-terminal PDZ domain [23]. In contrast, based on the counts of CDSs from the NCBI Influenza A Virus Database, the frequencies of the PB1-F2 protein are 1, 0.76 and 0.61 for AIAV, MIAV and HIAV, respectively. The PB1-F2 protein may therefore be considered a protein that belongs to the pan proteome of IAVs. The PB1-F2 protein was reported to contribute to IAV pathogenesis in mice [43,44,45]. The lower frequency of the PB1-F2 protein in MIAVs and HIAVs than in AIAVs may suggest a disadvantage of the protein for IAVs in mammalian and human hosts. Similarly, the frequencies of PCSGs among AIAVs, MIAVs and HIAVs suggest that the functions of proteins encoded by these PCSGs may be associated with host adaptation (Figs 4–11). For example, S1 PCS G1 has frequencies of 99.8%, 99.99% and 99.5% in the segment 1 genomic strand RNAs of AIAVs, HIAVs and MIAVs, respectively. The function of the protein encoded by this PCSG may have large advantages for viral replication or survival. S2 PCS G1 has frequencies of 24.9%, 82.4% and 76.2% in the segment 2 genomic strand RNAs of AIAVs, HIAVs and MIAVs, respectively. The function of the protein encoded by this PCSG may have more advantages for viral adaptation in mammalian and human hosts than in avian hosts.
Similarly, the function of the protein encoded by S7 PCS G3 may also have greater advantages for viral adaptation in mammalian and human hosts than in avian hosts (Fig 10). Conversely, the function of the protein encoded by S7 PCS G1 may have greater advantages for viral adaptation in avian hosts than in mammalian and human hosts (Fig 10). The contributions of the 20 PCSGs to viral evolution, host adaptation and pathogenicity are worth further investigation.
The sequence and length variations of the 20 PCSGs among different virus strains may also lead to function or activity diversity of the proteins synthesized from these PCSGs. For instance, the protein encoded by S1 PCS G1 has a predicted signal sequence (approximately 21 amino acids in length) and may be a secretory protein. However, the H7N9 2013 HIAV has an N-terminal deletion (15 amino acids in length) in the S1 PCS G1 protein. Whether this deletion leads to the accumulation of the S1 PCS G1 protein inside cells and results in cell damage is worth further investigation.
Conclusions
The results of this study suggest the possibility of the ambisense nature of IAV genomes. A potential reservoir encoding the pan proteome may exist in the genomic strand RNAs of IAVs. The composition variations of the pan proteome (such as with or without the PB1-F2 protein) among IAV strains may contribute to viral evolution, host adaptation and pathogenicity.
Supporting Information
S1 Fig. A current model of the influenza A virus core proteome.
https://doi.org/10.1371/journal.pone.0146936.s001
(PDF)
S2 Fig. Phylogenetic trees of representative sequences from 20 putative protein coding sequence groups encoded by genomic strand RNAs of the influenza A viruses.
https://doi.org/10.1371/journal.pone.0146936.s002
(PDF)
S3 Fig. Genomic maps of 20 putative protein coding sequences encoded by eight genomic strand RNAs of pandemic, outbreak and highly pathogenic HIAVs.
https://doi.org/10.1371/journal.pone.0146936.s003
(PDF)
S1 Table. Number of IAV genomic RNA sequences used in this study.
https://doi.org/10.1371/journal.pone.0146936.s004
(DOC)
S2 Table. Number of sequences and sub-clusters of the 20 putative protein coding sequence groups.
https://doi.org/10.1371/journal.pone.0146936.s005
(DOC)
Author Contributions
Conceived and designed the experiments: CWY. Analyzed the data: CWY MFC. Contributed reagents/materials/analysis tools: CWY. Wrote the paper: CWY.
References
- 1. Neumann G, Brownlee GG, Fodor E, Kawaoka Y. Orthomyxovirus replication, transcription, and polyadenylation. Curr Top Microbiol Immunol. 2004;283:121–43. pmid:15298169
- 2. Vasin AV, Temkina OA, Egorov VV, Klotchenko SA, Plotnikova MA, Kiselev OI. Molecular mechanisms enhancing the proteome of influenza A viruses: an overview of recently discovered proteins. Virus Res. 2014 Jun 24;185:53–63. pmid:24675275
- 3. Chen W, Calvo PA, Malide D, Gibbs J, Schubert U, Bacik I, et al. A novel influenza A virus mitochondrial protein that induces cell death. Nat Med. 2001 Dec;7(12):1306–12. pmid:11726970
- 4. Wise HM, Foeglein A, Sun J, Dalton RM, Patel S, Howard W, et al. A complicated message: Identification of a novel PB1-related protein translated from influenza A virus segment 2 mRNA. J Virol. 2009 Aug;83(16):8021–31. pmid:19494001
- 5. Wise HM, Barbezange C, Jagger BW, Dalton RM, Gog JR, Curran MD, et al. Overlapping signals for translational regulation and packaging of influenza A virus segment 2. Nucleic Acids Res. 2011 Sep 1;39(17):7775–90. pmid:21693560
- 6. Jagger BW, Wise HM, Kash JC, Walters KA, Wills NM, Xiao YL, et al. An overlapping protein-coding region in influenza A virus segment 3 modulates the host response. Science. 2012 Jul 13;337(6091):199–204. pmid:22745253
- 7. Shi M, Jagger BW, Wise HM, Digard P, Holmes EC, Taubenberger JK. Evolutionary conservation of the PA-X open reading frame in segment 3 of influenza A virus. J Virol. 2012 Nov;86(22):12411–3. pmid:22951836
- 8. Muramoto Y, Noda T, Kawakami E, Akkina R, Kawaoka Y. Identification of novel influenza A virus proteins translated from PA mRNA. J Virol. 2013 Mar;87(5):2455–62. pmid:23236060
- 9. Wise HM, Hutchinson EC, Jagger BW, Stuart AD, Kang ZH, Robb N, et al. Identification of a novel splice variant form of the influenza A virus M2 ion channel with an antigenically distinct ectodomain. PLoS Pathog. 2012;8(11):e1002998. pmid:23133386
- 10. Baez M, Taussig R, Zazra JJ, Young JF, Palese P, Reisfeld A, et al. Complete nucleotide sequence of the influenza A/PR/8/34 virus NS gene and comparison with the NS genes of the A/Udorn/72 and A/FPV/Rostock/34 strains. Nucleic Acids Res. 1980 Dec 11;8(23):5845–58. pmid:7465426
- 11. Zhirnov OP, Poyarkov SV, Vorob'eva IV, Safonova OA, Malyshev NA, Klenk HD. Segment NS of influenza A virus contains an additional gene NSP in positive-sense orientation. Dokl Biochem Biophys. 2007 May-Jun;414:127–33. pmid:17695319
- 12. Clifford M, Twigg J, Upton C. Evidence for a novel gene associated with human influenza A viruses. Virol J. 2009 Nov 16;6:198. pmid:19917120
- 13. Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010 Mar 1;26(5):680–2. pmid:20053844
- 14. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 2013 Dec;30(12):2725–9. pmid:24132122
- 15. Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015 Jan;43(Database issue):D257–60. pmid:25300481
- 16. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, et al. CDD: NCBI's conserved domain database. Nucleic Acids Res. 2015 Jan;43(Database issue):D222–6. pmid:25414356
- 17. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001 Jan 19;305(3):567–80. pmid:11152613
- 18. Brown EG, Liu H, Kit LC, Baird S, Nesrallah M. Pattern of mutation in the genome of influenza A virus on adaptation to increased virulence in the mouse lung: identification of functional themes. Proc Natl Acad Sci U S A. 2001 Jun 5;98(12):6883–8. pmid:11371620
- 19. Lycett SJ, Ward MJ, Lewis FI, Poon AF, Kosakovsky Pond SL, Brown AJ. Detection of mammalian virulence determinants in highly pathogenic avian influenza H5N1 viruses: multivariate analysis of published data. J Virol. 2009 Oct;83(19):9901–10. pmid:19625397
- 20. Christman MC, Kedwaii A, Xu J, Donis RO, Lu G. Pandemic (H1N1) 2009 virus revisited: an evolutionary retrospective. Infect Genet Evol. 2011 Jul;11(5):803–11. pmid:21382522
- 21. Lee VJ, Yap J, Maurer-Stroh S, Lee RT, Eisenhaber F, Tay JK, et al. Investigation of causes of oseltamivir chemoprophylaxis failures during influenza A (H1N1-2009) outbreaks. J Clin Virol. 2011 Feb;50(2):104–8. pmid:21094080
- 22. Ping J, Keleta L, Forbes NE, Dankar S, Stecho W, Tyler S, et al. Genomic and protein structural maps of adaptive evolution of human influenza A virus to increased virulence in the mouse. PLoS One. 2011;6(6):e21740. pmid:21738783
- 23. Tscherne DM, García-Sastre A. Virulence determinants of pandemic influenza viruses. J Clin Invest. 2011 Jan;121(1):6–13. pmid:21206092
- 24. Zhang Z, Hu S, Li Z, Wang X, Liu M, Guo Z, et al. Multiple amino acid substitutions involved in enhanced pathogenicity of LPAI H9N2 in mice. Infect Genet Evol. 2011 Oct;11(7):1790–7. pmid:21896338
- 25. Bui VN, Ogawa H, Xininigen, Karibe K, Matsuo K, Awad SS, et al. H4N8 subtype avian influenza virus isolated from shorebirds contains a unique PB1 gene and causes severe respiratory disease in mice. Virology. 2012 Feb 5;423(1):77–88. pmid:22192630
- 26. Schrauwen EJ, de Graaf M, Herfst S, Rimmelzwaan GF, Osterhaus AD, Fouchier RA. Determinants of virulence of influenza A virus. Eur J Clin Microbiol Infect Dis. 2014 Apr;33(4):479–90. pmid:24078062
- 27. Ghedin E, Sengamalay NA, Shumway M, Zaborsky J, Feldblyum T, Subbu V, et al. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Nature. 2005 Oct 20;437(7062):1162–6. pmid:16208317
- 28. Chen GW, Chang SC, Mok CK, Lo YL, Kung YN, Huang JH, et al. Genomic signatures of human versus avian influenza A viruses. Emerg Infect Dis. 2006 Sep;12(9):1353–60. pmid:17073083
- 29. Finkelstein DB, Mukatira S, Mehta PK, Obenauer JC, Su X, Webster RG, et al. Persistent host markers in pandemic and H5N1 influenza viruses. J Virol. 2007 Oct;81(19):10292–9. pmid:17652405
- 30. Allen JE, Gardner SN, Vitalis EA, Slezak TR. Conserved amino acid markers from past influenza pandemic strains. BMC Microbiol. 2009 Apr 22;9:77. pmid:19386124
- 31. Chen GW, Shih SR. Genomic signatures of influenza A pandemic (H1N1) 2009 virus. Emerg Infect Dis. 2009 Dec;15(12):1897–903. pmid:19998748
- 32. Tamuri AU, Dos Reis M, Hay AJ, Goldstein RA. Identifying changes in selective constraints: host shifts in influenza. PLoS Comput Biol. 2009 Nov;5(11):e1000564. pmid:19911053
- 33. Miotto O, Heiny AT, Albrecht R, García-Sastre A, Tan TW, August JT, et al. Complete-proteome mapping of human influenza A adaptive mutations: implications for human transmissibility of zoonotic strains. PLoS One. 2010 Feb 3;5(2):e9025. pmid:20140252
- 34. Pan C, Cheung B, Tan S, Li C, Li L, Liu S, et al. Genomic signature and mutation trend analysis of pandemic (H1N1) 2009 influenza A virus. PLoS One. 2010 Mar 8;5(3):e9549. pmid:20221396
- 35. Li W, Shi W, Qiao H, Ho SY, Luo A, Zhang Y, et al. Positive selection on hemagglutinin and neuraminidase genes of H1N1 influenza viruses. Virol J. 2011 Apr 21;8:183. pmid:21507270
- 36. Kainov DE, Müller KH, Theisen LL, Anastasina M, Kaloinen M, Muller CP. Differential effects of NS1 proteins of human pandemic H1N1/2009, avian highly pathogenic H5N1, and low pathogenic H5N2 influenza A viruses on cellular pre-mRNA polyadenylation and mRNA translation. J Biol Chem. 2011 Mar 4;286(9):7239–47. pmid:21163951
- 37. Gíria MT, Rebelo de Andrade H, Santos LA, Correia VM, Pedro SV, Santos MA. Genomic signatures and antiviral drug susceptibility profile of A(H1N1)pdm09. J Clin Virol. 2012 Feb;53(2):140–4. pmid:22177272
- 38. Makkoch J, Suwannakarn K, Payungporn S, Prachayangprecha S, Cheiocharnsin T, Linsuwanon P, et al. Whole genome characterization, phylogenetic and genome signature analysis of human pandemic H1N1 virus in Thailand, 2009–2012. PLoS One. 2012;7(12):e51275. pmid:23251479
- 39. Forbes NE, Ping J, Dankar SK, Jia JJ, Selman M, Keleta L, et al. Multifunctional adaptive NS1 mutations are selected upon human influenza virus evolution in the mouse. PLoS One. 2012;7(2):e31839. pmid:22363747
- 40. Gutiérrez RA, Viari A, Godelle B, Frutos R, Buchy P. Biased mutational pattern and quasispecies hypothesis in H5N1 virus. Infect Genet Evol. 2013 Apr;15:69–76. pmid:22063822
- 41. Nguyen M, Haenni AL. Expression strategies of ambisense viruses. Virus Res. 2003 Jun;93(2):141–50. pmid:12782362
- 42. Cenik C, Derti A, Mellor JC, Berriz GF, Roth FP. Genome-wide functional analysis of human 5' untranslated region introns. Genome Biol. 2010;11(3):R29. pmid:20222956
- 43. Alymova IV, York IA, McCullers JA. Non-avian animal reservoirs present a source of influenza A PB1-F2 proteins with novel virulence-enhancing markers. PLoS One. 2014 Nov 4;9(11):e111603. pmid:25368997
- 44. Yoshizumi T, Ichinohe T, Sasaki O, Otera H, Kawabata S, Mihara K, et al. Influenza A virus protein PB1-F2 translocates into mitochondria via Tom40 channels and impairs innate immunity. Nat Commun. 2014 Aug 20;5:4713. pmid:25140902
- 45. Kamal RP, Kumar A, Davis CT, Tzeng WP, Nguyen T, Donis RO, et al. Emergence of highly pathogenic avian influenza A(H5N1) virus PB1-F2 variants and their virulence in BALB/c mice. J Virol. 2015 Jun;89(11):5835–46. pmid:25787281