Influenza A is a significant public health threat, partially because of its capacity to readily exchange gene segments between different host species to form novel pandemic strains. An understanding of the fundamental factors providing species barriers between different influenza hosts would facilitate identification of strains capable of leading to pandemic outbreaks and could also inform vaccine development. Here, we describe the difference in predicted RNA secondary structure stability that exists between avian, swine and human coding regions. The results predict that global ordered RNA structure exists in influenza A segments 1, 5, 7 and 8, and that ranges of free energies for secondary structure formation differ between host strains. The predicted free energy distributions for strains from avian, swine, and human species suggest criteria for segment reassortment and strains that might be ideal candidates for viral attenuation and vaccine development.
Citation: Priore SF, Moss WN, Turner DH (2012) Influenza A Virus Coding Regions Exhibit Host-Specific Global Ordered RNA Structure. PLoS ONE 7(4): e35989. doi:10.1371/journal.pone.0035989
Editor: Gajendra P. S. Raghava, CSIR-Institute of Microbial Technology, India
Received: February 2, 2012; Accepted: March 25, 2012; Published: April 25, 2012
Copyright: © 2012 Priore et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: S.F.P. was a trainee in the Medical Scientist Training Program funded by NIH T32 GM07356. This work was also supported by NIH R01 GM22939. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Influenza A is a (−)sense RNA virus of significant public health concern. Seasonal epidemics result in over 41,400 deaths and 200,000 serious illnesses each year in the United States. More concerning is the ability of Influenza A to cause pandemics. Pandemics are believed to arise because the genome consists of eight single-stranded RNA segments that can reassort from multiple host species to form novel strains to which humans have little or no prior immunity. The 2009 “swine flu” pandemic is thought to have involved reassortment of genes from avian, swine, and human strains. Because swine cells have surface glycoproteins that are recognized by both avian and human HA protein, they have been proposed as a “mixing vessel” for reassortment. Thus, it is believed that there is a natural species barrier largely preventing direct transmission between avian and human strains. Several studies have identified different aspects of influenza A that restrict host replication range, but fundamental factors that differentiate host influenza strains are not well defined.
One potential fundamental species barrier that has not been considered is RNA secondary structure. Many RNA viruses contain functionally important structures that are crucial for efficient replication. Stabilities of such structures will be dependent on temperature, which varies with host species. Bioinformatics approaches have identified several areas of the influenza genome that likely contain conserved RNA secondary structure, especially in the (+)RNA. While most known functional RNA structures are local in nature, extensive base pairing exists throughout the genomes of many (+)RNA viruses. This phenomenon is called Genome-scale Ordered RNA Structure (GORS) and is speculated to be involved in viral persistence and immune system avoidance. Whatever its purpose, it is clear that GORS plays an important role in the function of many viruses because evolutionary constraints on coding regions are apparently increased to maintain extensive base pairing. A previous study has demonstrated the potential of Influenza A NS1 mRNA to form extensive base pairing and has hypothesized about the role of RNA secondary structure in influenza evolution and pathogenesis. If there were no functional implications, it would be expected that a virus would evolve to have more plasticity in its coding region to adapt to changing immune responses and to acquire drug resistance. The influenza (−)RNA is transcribed to (+)RNA in the host cell without the need for a DNA intermediate. Therefore, it is possible that influenza virus possesses GORS-like features in its coding regions, and that this may be one factor that separates influenza viruses that replicate in different host species. In this paper the GORS acronym is redefined to mean Global Ordered RNA Structure, because both strand orientations are considered. A combination of minimum free energy predictions for native sequences and for matched randomized controls is used in this study.
Z-scores are calculated from these data to give a measure of how much “excess” stability a sequence possesses relative to that predicted for random sequences. A more negative z-score indicates a more stably folded native sequence. Comparisons of z-scores and of predicted thermodynamic stabilities at 37°C of RNA secondary structure for all eight segments in avian, swine and human (+)RNA coding regions reveal a pattern in which segments 1, 5, 7 and 8 exhibit host species-specific GORS and free energy distributions. These results contrast with the uniform lack of GORS in (−)RNA.
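As a hedged illustration, the z-score described above is conventionally written as shown below. The symbols are introduced here for clarity and are not taken from the original text: the numerator compares the predicted minimum free energy of the native sequence with the mean predicted free energy of its shuffled controls, and the denominator is the standard deviation of the shuffled free energies.

```latex
z = \frac{\Delta G^{\circ}_{\mathrm{native}} - \mu_{\mathrm{shuffled}}}{\sigma_{\mathrm{shuffled}}}
```

Because more stable folds have more negative free energies, a native sequence predicted to fold more stably than its shuffled controls gives z < 0.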
Segments 1(PB2), 2(PB1/PB1-F2), and 3(PA)
Segments 1, 2, and 3 code for the influenza A polymerase proteins. Segment 2 also has internal open reading frames that code for a pro-apoptotic peptide, PB1-F2, and a longer N40 protein of unknown function. The cumulative distribution of z-scores (see Materials and Methods) for all three segments is shown in Figure 1. The average predicted z-scores and free energies are presented in Tables 1 and 2, respectively. In the (+)RNA, segment 1 had the most negative average z-score of these segments with avian having the lowest (−1.46), followed by swine (−1.31) and then human (−0.89). Only the distribution of z-scores for segment 1 was significantly shifted into the negative region, while the segment 2 distribution centered around zero and the z-scores for segment 3 were mainly positive. All three segments of (−)RNA had z-scores of approximately zero (Figure 1 and Table 1).
Figure 1. Cumulative distributions of z-scores for segments 1–3. (+)RNA and (−)RNA are shown in the top and bottom panels, respectively. Z-scores are on the x-axis with z-scores predicting more stable RNA secondary structure toward the left of the plot. Fractions of sequences with z-scores greater than or equal to the abscissa are reported on the y-axis. Avian, swine and human sequences are represented by blue, red and green coloring, respectively.
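The quantity plotted on the y-axis of the cumulative distribution plots can be sketched as follows. This is only an illustration of the definition given in the caption (fraction of sequences with z-score greater than or equal to the abscissa); the function names and the example values are made up and are not data from the study.

```python
from bisect import bisect_left


def fraction_at_or_above(sorted_z_scores, threshold):
    """Fraction of z-scores >= threshold; input must be sorted ascending."""
    first = bisect_left(sorted_z_scores, threshold)  # index of first value >= threshold
    return (len(sorted_z_scores) - first) / len(sorted_z_scores)


def cumulative_curve(z_scores, thresholds):
    """Return (threshold, fraction) pairs for plotting one host's curve."""
    ordered = sorted(z_scores)
    return [(t, fraction_at_or_above(ordered, t)) for t in thresholds]


# Made-up z-scores for illustration only.
example_z = [-2.0, -1.5, -1.0, 0.0, 0.5]
curve = cumulative_curve(example_z, [-2.0, -1.0, 0.0, 1.0])
```

A curve that stays near 1 until well into negative z-scores and then drops corresponds to a population with little excess stability; a curve that drops early corresponds to a population shifted toward negative z-scores.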
Predicted free energy distributions for segments 1–3 are shown in Figure 2. There was a marked distinction between the three host species populations in segment 1. On average, human strains were predicted to be the least stable, with swine occupying an intermediate position between human and avian strains. While segments 1 and 2 were essentially the same length, the predicted average free energies for segment 1 were significantly more stable than for segment 2 for all three species. This difference was also apparent in the lower z-scores in segment 1 (Table 1). The (−)RNA of segment 1 generally had less stable predicted free energies compared to the (+)RNA. Segments 2 and 3 showed no significant change between strand orientations. Therefore, based on the distribution of z-scores, it appears that segment 1 has significant species-specific GORS, but segments 2 and 3 do not.
Figure 2. Predicted free energy distributions for segments 1–3. (+)RNA, (−)RNA, and 2009-present (+)RNA sequences are shown in the top, middle and bottom panels, respectively. Free energy bins are in 1 kcal/mol increments on the x-axis. Percentages of sequences in each bin are reported on the y-axis. Avian, swine and human sequences are represented by blue, red and green coloring, respectively.
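The binning described in this caption (1 kcal/mol bins, percentage of sequences per bin) can be sketched as below. The function name, the choice of labeling each bin by its lower edge, and the example energies are assumptions made for illustration.

```python
import math
from collections import Counter


def percent_per_bin(free_energies, bin_width=1.0):
    """Map each bin's lower edge (kcal/mol) to the percentage of sequences in it."""
    counts = Counter(math.floor(e / bin_width) * bin_width for e in free_energies)
    total = len(free_energies)
    return {edge: 100.0 * n / total for edge, n in sorted(counts.items())}


# Made-up predicted free energies (kcal/mol) for illustration only.
example_energies = [-500.2, -500.9, -499.5, -498.1]
histogram = percent_per_bin(example_energies)
```

Expressing each bin as a percentage of its own host population, rather than as a raw count, keeps hosts with very different numbers of sequences comparable on one plot.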
Another interesting feature of the predicted free energy distributions can be seen with the 2009 pandemic strains. The predicted free energies for the human 2009 H1N1 strains coincide with the main region of overlap between swine and avian strains for segments 1 and 3 (Figure 2). This is not surprising as the phylogenetic origin for these segments was avian. However, this suggests that strains in overlap regions of predicted free energy distributions might be more prone to reassortment than strains at the extremes.
Segments 4(HA) and 6(NA)
Segments 4 and 6 code for the antigenic proteins that allow influenza A virus to enter and exit host cells. These surface glycoproteins are also major contributors to the immune response in humans. While most influenza A segments have consistently sized coding regions, 4 and 6 have numerous subtypes with different lengths. To compensate for this difference, all sequences for segments 4 and 6 were normalized with respect to length.
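The text does not say how the length normalization was carried out. One simple possibility, shown here purely as an assumption, is to express each predicted free energy per nucleotide so that subtypes with coding regions of different lengths can share an axis. The function name and the example numbers are hypothetical.

```python
def free_energy_per_nucleotide(free_energy, sequence_length):
    """Divide a predicted free energy (kcal/mol) by sequence length (nt).

    Assumed normalization for illustration; the source does not specify
    the procedure actually used.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return free_energy / sequence_length


# Two made-up sequences of different length with the same stability per nucleotide.
long_subtype = free_energy_per_nucleotide(-425.0, 1700)
short_subtype = free_energy_per_nucleotide(-350.0, 1400)
```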
The cumulative distributions of z-scores for segments 4 and 6 are shown in Figure 3. The average z-scores in the (+)RNA were greater than −0.6 (Table 1) and the distributions did not appreciably fall within the negative range. The distributions of z-scores in the (−)RNA centered around zero (Figure 3). The detailed predicted free energies for segments 4 and 6 are shown in Figure 4. The average free energies were more stable in the (+)RNA than the (−)RNA for all three species (Table 2). The distributions for the (+)RNA were more distinct for each of the species than for the (−)RNA, but both orientations show a high degree of overlap between the three hosts. The slightly negative scores in the (+)RNA could be a sign of local RNA secondary structure present in these segments as previously predicted.
Figure 3. Cumulative distributions of z-scores for segments 4 and 6. (+)RNA and (−)RNA are shown in the top and bottom panels, respectively. See Figure 1 for annotations and details.
Figure 4. Predicted free energy distributions for segments 4 and 6. (+)RNA and (−)RNA are shown in the top and bottom panels, respectively. See Figure 2 for annotations and details.
Segment 5 (NP)
Segment 5 codes for the NP protein that binds RNA and serves as a scaffold for ribonucleoprotein production that is critical for viral replication. Segment 5 genes from avian strains are known to cause significant attenuation when crossed with human strains and replicated in mammalian cells both in vivo and in vitro.
In light of these findings, it is intriguing that, of all the influenza segments, segment 5 had the largest separation between the distributions of predicted (+)RNA free energies for avian and human strains (Figure 5). The cumulative distribution of z-scores for all three species was significantly shifted to below zero (Figure 5). Thus, segment 5 also appears to possess GORS in the (+)RNA. The distributions of predicted free energies for swine and human strains overlapped almost completely, while avian strains were predicted to be more stable. Sequences more stable than −506 kcal/mol only included avian strains; thus these sequences might be good candidates to test for viral attenuation of human influenza viruses.
Figure 5. Z-scores and predicted free energies for segment 5. Left: Cumulative distribution plots of z-scores for segment 5 in the (+)RNA (Top) and (−)RNA (Bottom). See Figure 1 for annotations and details. Right: Predicted free energy distributions at 37°C for segment 5 in the (+)RNA (Top) and (−)RNA (Bottom). See Figure 2 for annotations and details.
The (−)RNA z-scores for segment 5 centered around zero (Figure 5). Predicted free energies in the (−)RNA were significantly less stable with markedly more overlap of the distributions for the different host species relative to the (+)RNA (Figure 5).
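The avian-only region noted above for segment 5 (sequences more stable than −506 kcal/mol) is the kind of threshold that can be read off the data directly. A minimal sketch, assuming predicted free energies are already grouped by host; the function name, the dictionary layout, and the example values are hypothetical and are not data from the study.

```python
def host_exclusive_region(energies_by_host, host):
    """Find energies of `host` that are more stable than every other host's.

    Returns (threshold, exclusive), where threshold is the most stable
    (most negative) energy among the other hosts and exclusive lists the
    energies of `host` below that threshold.
    """
    others = [e for h, values in energies_by_host.items() if h != host for e in values]
    threshold = min(others)
    exclusive = [e for e in energies_by_host[host] if e < threshold]
    return threshold, exclusive


# Made-up predicted free energies (kcal/mol) for illustration only.
example = {
    "avian": [-520.0, -510.0, -495.0],
    "swine": [-500.0, -480.0],
    "human": [-505.0, -470.0],
}
threshold, avian_only = host_exclusive_region(example, "avian")
```

In this toy example the threshold is −505 kcal/mol and two avian sequences fall below it. A single outlier in another host would move the threshold, so in practice it may be worth checking how robust the boundary is.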
Segments 7 (M1/M2) and 8 (NS1/NEP)
Segments 7 and 8 are both alternatively spliced to code for two proteins each. Segment 7 codes for the M1 and M2 proteins, which are the structural components of the viral membrane and drive new virion assembly. Segment 8 codes for the multi-functional NS1 protein that controls splicing, retention of nuclear RNAs, and other functions. The smaller NEP protein is responsible for the export of viral RNAs out of the nucleus for packaging.
Cumulative distributions for segment 7 z-scores in M1 and M2 coding regions are shown in Figure 6. The distributions for the (+)RNA of M1 were clearly shifted into the negative region, while those for M2 were only slightly shifted below zero for human and swine sequences. Thus, it appears that M1 possesses GORS and M2 may not. Distributions of z-scores for the (−)RNA of M1 and M2 were centered at zero (Figure 6). The free energy distributions for M1 and M2 are shown in Figure 7. The majority of the swine M1 sequences had the least stable free energies, while human M1 sequences centered near −260 kcal/mol (Table 2). Avian M1 sequences were the most stable on average and displayed a bimodal distribution with peaks at −244 and −268 kcal/mol. As for segment 5, there was a region below −278 kcal/mol where only avian sequences were represented. The predicted free energy distributions for M1 and M2 in the (+) and (−) RNA showed no distinction between the three host species (Figure 7).
Figure 6. Cumulative distributions of z-scores for segment 7. M1 and M2 are shown in the left and right panels, respectively. (+)RNA and (−)RNA are shown in the top and bottom panels, respectively. See Figure 1 for annotations and details.
Figure 7. Predicted free energy distributions for segment 7. M1 and M2 are shown in the left and right panels, respectively. (+)RNA and (−)RNA are shown in the top and bottom panels, respectively. See Figure 2 for annotations and details.
Cumulative distributions of z-scores for segment 8 NS1 and NEP coding regions are shown in Figure 8. Average z-scores for the (+)RNA were the most negative of all segments for avian and swine strains, and with the exception of segment 7(M1), for human strains as well (Table 1). The z-score distributions for the (+)RNA of both NS1 and NEP were in the negative region, while the (−)RNA was centered around zero. Predicted free energy distributions for human strains of NS1 (+)RNA showed distinct peaks at −207 and −230 kcal/mol (Figure 9). Swine and avian strains were much more widely distributed; on average, swine strains were more stable than human strains but less stable than avian strains. There was a region below −251 kcal/mol where only avian strains were represented. The predicted free energy distributions for NEP (+)RNA showed distinct populations for avian and human strains with swine strains widely distributed between the two (Figure 9). Predicted free energy distributions for both NS1 and NEP in the (−)RNA did not show a distinction between host species (Figure 9). In addition, the average free energy in the (−)RNA was significantly less stable when compared to the (+)RNA (Table 2).
Figure 8. Cumulative distributions of z-scores for segment 8. NS1 and NEP are shown in the left and right panels, respectively. (+)RNA and (−)RNA are shown in the top and bottom panels, respectively. See Figure 1 for annotations and details.
This work demonstrates that GORS exists in (+)RNA segments 1, 5, 7, and 8 of Influenza A virus (Figures 1, 5, 6, and 8). For certain segments, this phenomenon is accompanied by distinct distributions of predicted free energies between avian, swine, and human strains. Except for segment 6 and segment 7 (M1 and M2), the most negative average z-scores were for avian strains (Table 1). Avian strains also had the most stable predicted free energy on average, with the exception of segment 7 (M2). It appears that global RNA structure for segments 1, 5, 7(M1), and 8 (NS1 and NEP) may have evolved to have host specificity. Avian, swine and human viruses replicate in distinct environments. Temperatures for the avian gut, and the swine and human respiratory epithelium are 41, 37, and 33°C, respectively. In addition, the pH of the avian gut is presumably much lower than in the human and swine respiratory tract. These changes in ambient temperature and pH are expected to influence the equilibrium of RNA base pairing. For example, segments 7 and 8 are thought to contain regions with temperature-dependent equilibria between two folds. It is possible that similar undiscovered equilibria may exist in segments 1 and 5. The original GORS paper postulated the importance of RNA structure to viral persistence and avoidance of host cell immunity. Other host cellular factors, such as protein/mRNA interactions, mRNA decay, and translation efficiency, could also play a role in directing the host-specific evolution of influenza mRNA stability. Thus, variation of structural stability could be one method of adapting to distinct host environments, but further experiments are needed to clarify the significance of these results. Whatever the reason, the presence of stable structures is likely important to the viral life cycle.
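The expected effect of host temperature on base-pairing equilibria mentioned above follows from standard thermodynamics. The relations below are textbook expressions rather than formulas from this study, and they assume that the enthalpy and entropy changes of folding are approximately independent of temperature over the 33 to 41°C range.

```latex
\Delta G^{\circ}(T) = \Delta H^{\circ} - T\,\Delta S^{\circ},
\qquad
K(T) = \exp\!\left(-\frac{\Delta G^{\circ}(T)}{RT}\right)
```

For helix formation both ΔH° and ΔS° are negative, so ΔG° becomes less negative as T increases. Under these assumptions, a given structure is expected to be less stable at the 41°C of the avian gut than at the 33°C of the human respiratory epithelium, whereas all free energies reported here were predicted at 37°C.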
While z-score distributions for segments 2, 3, 4, 6, and 7(M2) (+)RNA do not support GORS, there is still the possibility of locally conserved RNA secondary structure as described previously.
In contrast, the (−)RNA uniformly lacks stable structure. In every case, the distribution of z-scores in the (−)RNA is centered around zero with no distinction between avian, swine, and human strains. This supports previous results suggesting that conserved secondary structure is heavily favored in the (+)RNA. If the observed bias in stabilities of the (+)RNA were artifacts of sequence nucleotide composition or the free energy prediction model, then similar distributions should be seen in the (−)RNA. This, however, is not the case. The distributions for the (−)RNA are those expected from an ideal negative control for unstructured RNA. As above, the lack of GORS in the (−)RNA does not preclude locally conserved structure, especially in the untranslated regions which were not considered in this study.
The results from this study have several potential applications. Sequence-dependent thermodynamics could be a new criterion for gauging the ability of some segments to reassort between host species. This information would be valuable as reassortants can lead to pandemic strains. Segment 1 has very distinct populations for all three host species, while segments 5 and 8(NEP) have relatively large separation between human and avian strains. The areas of overlap on these distributions represent the most likely sequences to reassort, as can be seen with the 2009 H1N1 swine flu (Figure 2). These thermodynamic criteria may also be useful for tracking the evolution of influenza strains.
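One crude way to make the idea of an overlap region concrete is to take the free energy interval spanned by both hosts and ask what fraction of each host's sequences falls inside it. This range-based measure is an assumption chosen for simplicity; the study identifies overlap from the plotted distributions, and the function names and example values here are hypothetical.

```python
def overlap_interval(energies_a, energies_b):
    """Return the (low, high) free energy interval covered by both hosts, or None."""
    low = max(min(energies_a), min(energies_b))
    high = min(max(energies_a), max(energies_b))
    return (low, high) if low <= high else None


def fraction_in_interval(energies, interval):
    """Fraction of energies lying inside the closed interval."""
    low, high = interval
    inside = sum(1 for e in energies if low <= e <= high)
    return inside / len(energies)


# Made-up predicted free energies (kcal/mol) for illustration only.
avian = [-520.0, -510.0, -500.0]
human = [-505.0, -495.0, -490.0]
shared = overlap_interval(avian, human)
avian_fraction = fraction_in_interval(avian, shared)
```

Because it depends only on the extremes of each population, this measure is sensitive to outliers; comparing binned percentages would be a more robust alternative.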
Segments 5, 7(M1), and 8(NS1) all have areas of predicted free energy distributions where only avian strains are represented. These sequences may be good candidates for attenuation of human strains for vaccine development. While most seasonal influenza vaccines contain no live virus, the development of live attenuated virus vaccines may be desirable because of potential enhanced immunogenicity and commercial production efficiency.
It will be important to continue to increase our knowledge of the fundamental species barriers that separate different host strains of influenza virus and the factors that contribute to pandemic reassortment. This and previous studies highlight the need to elucidate the functional implications of global and local RNA structure of influenza A and possibly other viruses that infect multiple hosts.
Materials and Methods
Coding regions for all unique influenza A mRNAs were downloaded from the NCBI Influenza Virus Resource Page. The data were divided into two groups: strains acquired before 2009 and strains acquired from 2009 onward. This division was necessary to avoid over-representing 2009 H1N1 sequences, which have nearly doubled the size of the NCBI influenza database. Sequences were scanned to remove truncated sequences or those with ambiguous nucleotides. The nearest-neighbor thermodynamic model as implemented by RNAfold was used to predict RNA secondary structure. These calculations were run for both (+)RNA and (−)RNA. All predictions were calculated at 37°C to maximize accuracy and approximate physiological conditions.
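A minimal sketch of the sequence preparation described above, under assumptions not stated in the text: a sequence is treated as truncated when it is shorter than a caller-supplied minimum length (`min_length` is a hypothetical parameter), and the (−)RNA is taken to be the reverse complement of the (+)RNA coding region. Folding itself is left to RNAfold, as in the text, and is not reproduced here.

```python
VALID_NUCLEOTIDES = set("ACGU")
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(sequence):
    """Upper-case a sequence and convert DNA-style T to U."""
    return sequence.upper().replace("T", "U")


def is_usable(sequence, min_length):
    """Reject truncated sequences and those with ambiguous nucleotides.

    min_length is a hypothetical cutoff; the source does not say how
    truncation was detected.
    """
    rna = to_rna(sequence)
    return len(rna) >= min_length and set(rna) <= VALID_NUCLEOTIDES


def reverse_complement(rna):
    """Return the opposite-sense strand, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


# Toy inputs for illustration only.
plus_strand = to_rna("atggcc")
minus_strand = reverse_complement(plus_strand)
```

Keeping the filter separate from the folding step makes it easy to apply the same cleaned set of sequences to both strand orientations.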
Z-scores were calculated for every sequence in both (+)RNA and (−)RNA to test if minimum free energy predictions were more stable than predicted on average for random sequences. Sequences were randomized ten times using the shuffle algorithm in the Simmonics package to maintain dinucleotide frequencies. Free energies for shuffled sequences were calculated as described above.
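The procedure above can be sketched as follows. The shuffle shown is an Altschul-Erickson style dinucleotide-preserving shuffle written for illustration; it is not the Simmonics implementation used in the study. The z-score uses the sample standard deviation, which is an assumption because the text does not say which estimator was used. In practice the energies passed to `z_score` would come from a folding program such as RNAfold.

```python
import random
import statistics
from collections import Counter, defaultdict


def _reaches(vertex, last, final_edge):
    """True if following the chosen final edges from vertex leads to last."""
    seen = set()
    while vertex != last:
        if vertex in seen:
            return False
        seen.add(vertex)
        vertex = final_edge[vertex]
    return True


def dinucleotide_shuffle(seq, rng):
    """Shuffle seq while preserving its exact dinucleotide counts."""
    if len(seq) < 3:
        return seq
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    last = seq[-1]
    # Pick, for every nucleotide except the last one, the edge it will use last;
    # retry until those edges form a tree leading to the final nucleotide.
    while True:
        final_edge = {v: rng.choice(s) for v, s in successors.items() if v != last}
        if all(_reaches(v, last, final_edge) for v in final_edge):
            break
    order = {}
    for v, s in successors.items():
        rest = list(s)
        if v != last:
            rest.remove(final_edge[v])
        rng.shuffle(rest)
        if v != last:
            rest.append(final_edge[v])
        order[v] = rest
    # Walk the edges in the chosen order, starting from the original first nucleotide.
    out = [seq[0]]
    used = Counter()
    for _ in range(len(seq) - 1):
        cur = out[-1]
        out.append(order[cur][used[cur]])
        used[cur] += 1
    return "".join(out)


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in seq."""
    return Counter(zip(seq, seq[1:]))


def z_score(native_energy, shuffled_energies):
    """(native - mean of shuffled) / sample standard deviation of shuffled."""
    mean = statistics.mean(shuffled_energies)
    spread = statistics.stdev(shuffled_energies)
    return (native_energy - mean) / spread


# Toy sequence and made-up energies for illustration only.
rng = random.Random(0)
native = "AUGGCUAGCUAGGCAUCGAUCG"
controls = [dinucleotide_shuffle(native, rng) for _ in range(10)]
example_z = z_score(-10.0, [-8.0, -6.0, -7.0])
```

With only ten shuffled controls per sequence, as in the text, the standard deviation estimate is noisy, which is one reason to interpret distributions of z-scores across many sequences rather than individual values.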
Conceived and designed the experiments: SFP WNM. Performed the experiments: SFP WNM. Analyzed the data: SFP WNM. Contributed reagents/materials/analysis tools: SFP WNM DHT. Wrote the paper: SFP WNM DHT.
- 1. Dushoff J, Plotkin JB, Viboud C, Earn DJ, Simonsen L (2006) Mortality due to influenza in the United States–an annualized regression approach using multiple-cause mortality data. Am J Epidemiol 163: 181–187.
- 2. Taubenberger JK, Kash JC (2010) Influenza virus evolution, host adaptation, and pandemic formation. Cell Host Microbe 7: 440–451.
- 3. Smith GJ, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, et al. (2009) Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459: 1122–1125.
- 4. Hass J, Matuszewski S, Cieslik D, Haase M (2011) The role of swine as “mixing vessel" for interspecies transmission of the influenza A subtype H1N1: a simultaneous Bayesian inference of phylogeny and ancestral hosts. Infect Genet Evol 11: 437–441.
- 5. Ma W, Kahn RE, Richt JA (2008) The pig as a mixing vessel for influenza viruses: Human and veterinary implications. J Mol Genet Med 3: 158–166.
- 6. Kuiken T, Holmes EC, McCauley J, Rimmelzwaan GF, Williams CS, et al. (2006) Host species barriers to influenza virus infections. Science 312: 394–397.
- 7. Murphy BR, Hinshaw VS, Sly DL, London WT, Hosier NT, et al. (1982) Virulence of avian influenza A viruses for squirrel monkeys. Infect Immun 37: 1119–1126.
- 8. Snyder MH, Buckler-White AJ, London WT, Tierney EL, Murphy BR (1987) The avian influenza virus nucleoprotein gene and a specific constellation of avian and human virus polymerase genes each specify attenuation of avian-human influenza A/Pintail/79 reassortant viruses for monkeys. J Virol 61: 2857–2863.
- 9. Tian SF, Buckler-White AJ, London WT, Reck LJ, Chanock RM, et al. (1985) Nucleoprotein and membrane protein genes are associated with restriction of replication of influenza A/Mallard/NY/78 virus and its reassortants in squirrel monkey respiratory tract. J Virol 53: 771–775.
- 10. Baigent SJ, McCauley JW (2003) Influenza type A in humans, mammals and birds: determinants of virus virulence, host-range and interspecies transmission. Bioessays 25: 657–671.
- 11. McCauley JW, Penn CR (1990) The critical cut-off temperature of avian influenza viruses. Virus Res 17: 191–198.
- 12. Massin P, Kuntz-Simon G, Barbezange C, Deblanc C, Oger A, et al. (2010) Temperature sensitivity on growth and/or replication of H1N1, H1N2 and H3N2 influenza A viruses isolated from pigs and birds in mammalian cells. Vet Microbiol 142: 232–241.
- 13. Gultyaev AP, Fouchier RA, Olsthoorn RC (2010) Influenza virus RNA structure: unique and common features. Int Rev Immunol 29: 533–556.
- 14. Gultyaev AP, Heus HA, Olsthoorn RC (2007) An RNA conformational shift in recent H5N1 influenza A viruses. Bioinformatics 23: 272–276.
- 15. Gultyaev AP, Olsthoorn RC (2010) A family of non-classical pseudoknots in influenza A and B viruses. RNA Biol 7: 125–129.
- 16. Ilyinskii PO, Schmidt T, Lukashev D, Meriin AB, Thoidis G, et al. (2009) Importance of mRNA secondary structural elements for the expression of influenza virus genes. OMICS 13: 421–430.
- 17. Moss WN, Priore SF, Turner DH (2011) Identification of potential conserved RNA secondary structure throughout influenza A coding regions. RNA 17: 991–1011.
- 18. Simmonds P, Tuplin A, Evans DJ (2004) Detection of genome-scale ordered RNA structure (GORS) in genomes of positive-stranded RNA viruses: Implications for virus evolution and host persistence. RNA 10: 1337–1351.
- 19. Somvanshi P, Singh V, Arshad M (2008) Modeling of RNA secondary structure of non structural gene and evolutionary stability of the influenza virus through in silico methods. J Proteomics Bioinform 1: 219–226.
- 20. Clote P, Ferre F, Kranakis E, Krizanc D (2005) Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA 11: 578–591.
- 21. Wise HM, Barbezange C, Jagger BW, Dalton RM, Gog JR, et al. (2011) Overlapping signals for translational regulation and packaging of influenza A virus segment 2. Nucleic Acids Res 39: 7775–7790.
- 22. Chen W, Calvo PA, Malide D, Gibbs J, Schubert U, et al. (2001) A novel influenza A virus mitochondrial protein that induces cell death. Nat Med 7: 1306–1312.
- 23. Trifonov V, Khiabanian H, Rabadan R (2009) Geographic dependence, surveillance, and origins of the 2009 influenza A (H1N1) virus. N Engl J Med 361: 115–119.
- 24. Chen BJ, Leser GP, Jackson D, Lamb RA (2008) The influenza virus M2 protein cytoplasmic tail interacts with the M1 protein and influences virus assembly at the site of virus budding. J Virol 82: 10059–10070.
- 25. Gomez-Puertas P, Albo C, Perez-Pastrana E, Vivo A, Portela A (2000) Influenza virus matrix protein is the major driving force in virus budding. J Virol 74: 11538–11547.
- 26. Hale BG, Randall RE, Ortin J, Jackson D (2008) The multifunctional NS1 protein of influenza A viruses. J Gen Virol 89: 2359–2376.
- 27. O'Neill RE, Talon J, Palese P (1998) The influenza virus NEP (NS2 protein) mediates the nuclear export of viral ribonucleoproteins. EMBO J 17: 288–296.
- 28. Greenberg H, Kemble G (2011) Live attenuated influenza vaccine. In: Influenza Vaccines for the Future. Birkhäuser Advances in Infectious Diseases. Basel: Springer. pp. 273–291.
- 29. Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, et al. (2008) The influenza virus resource at the National Center for Biotechnology Information. J Virol 82: 596–601.
- 30. Mathews DH, Sabina J, Zuker M, Turner DH (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288: 911–940.
- 31. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, et al. (1994) Fast folding and comparison of RNA secondary structures. Monatsh Chem 125: 167–188.
- 32. Simmonds P, Smith DB (1999) Structural constraints on RNA virus evolution. J Virol 73: 5787–5794.