Locations and structures of influenza A virus packaging-associated signals and other functional elements via an in silico pipeline for predicting constrained features in RNA viruses

doi:10.1371/journal.pcbi.1012009

Table 1.

Summary of constrained regions found in influenza types, by analysis method.


Fig 1.

PB2 3′ vRNA region whose cRNA is deemed constrained by RNAdescent.

A: Example of graphical RNAdescent output, for analysis of raw constraint data in H5N8 PB2. Each dot/triangle represents a codon, with descents representing more constrained regions (see Methods and references [11, 12] for further details). Regions whose constraint is deemed significant by the algorithm are highlighted in orange, with manual annotation of the order in which the regions have been found and their bootstrap p-values. In this example, two constrained regions are found, corresponding to the packaging-associated regions at the ends of the vRNA. Plots for other combinations of subtype/host/type of data analysed may be found in S1 Code–S8 Code.

B: RNAalifold output from folding the PB2 3′ vRNA corresponding to the region of H5N8 cRNA deemed constrained in the RNAdescent analysis using raw constraint data. The fold is of the (− sense) vRNA, produced using RNAalifold [17, 18]. The termini of the predicted structure are labelled with the corresponding amino acid number of PB2. Portions of the two predicted stem-loops described in the main text are highlighted: an orange line highlights the nucleotides corresponding to codons 16 and 17 of PB2, and a purple line highlights the nucleotides corresponding to codons 27 and 28 of PB2. Base pairs are highlighted in deep/mid/light red when all/all but one/all but two sequences are capable of forming the pairs shown. Base pairs are highlighted in deep/mid/light yellow when all/all but one/all but two sequences are capable of forming the pair shown or one other pair (including GU pairs). Base pairs are highlighted in green when all sequences are capable of forming the pair shown or one of two other pairs (including GU pairs). RNAalifold was used with input options disallowing lonely pairs, allowing G-quadruplexes, and with the ribosum scoring matrix enabled. This avian strain fold was produced with the temperature set to 41°C. Fold predictions for other subtype/host/RNAdescent analysis type combinations may be found in Fig A of S1 Appendix.
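The colour scheme described in this caption can be illustrated with a minimal Python sketch. It is not RNAalifold's own code: the function name `classify_pair` and the toy alignment are hypothetical, and the sketch assumes that the hue follows the number of distinct pair types seen in the two alignment columns while the shade follows the number of sequences unable to pair. The caption describes green only for the case where all sequences can pair; extending the same shading pattern to green is an assumption.

```python
from typing import List, Optional, Tuple

# Watson-Crick and GU wobble pairs counted as "capable of pairing".
VALID_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

# Hue by number of distinct pair types; shade by number of non-pairing sequences.
HUES = {1: "red", 2: "yellow", 3: "green"}
SHADES = {0: "deep", 1: "mid", 2: "light"}


def classify_pair(alignment: List[str], i: int, j: int) -> Optional[Tuple[str, str]]:
    """Return (shade, hue) for alignment columns i and j (0-based).

    Returns None when the combination falls outside the scheme
    described in the caption (hypothetical helper, for illustration).
    """
    pair_types = set()
    incompatible = 0
    for seq in alignment:
        pair = (seq[i] + seq[j]).upper().replace("T", "U")
        if pair in VALID_PAIRS:
            pair_types.add(pair)
        else:
            incompatible += 1
    if len(pair_types) not in HUES or incompatible not in SHADES:
        return None
    return SHADES[incompatible], HUES[len(pair_types)]


# Toy alignment: columns 0 and 5 pair as GC in two sequences and GU in one.
toy_alignment = ["GAAAAC", "GAAAAU", "GAAAAC"]
result = classify_pair(toy_alignment, 0, 5)
```

Here `result` describes a pair in which every sequence can pair but two pair types (GC and GU) occur, the situation the caption shows in deep yellow.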


Fig 2.

Folds of PB1 3′ vRNA regions whose cRNA is deemed conserved by RNAdescent.

A small conserved stem-loop can be seen. Labels correspond to the influenza A subtype, host, and analysis method (raw constraint data or ranked constraint data) that predicted the constrained region. Folds are of the (− sense) vRNA, produced using RNAalifold. The folds displayed are truncated to the regions containing information on the stem-loop we discuss, using base pair predictions from the folds of the full regions predicted to be constrained; the full region folds may be found in S4 and S8 Figs. The folds are annotated with PB1 codon numbers. Base pairs are highlighted by RNAalifold in deep red when all sequences are capable of forming the pairs shown. Base pairs are highlighted in deep/mid yellow when all/all but one sequences are capable of forming the pair shown or one other pair (including GU pairs). RNAalifold was used with input options disallowing lonely pairs, allowing G-quadruplexes, and with the ribosum scoring matrix enabled. Avian strain folds were produced with the temperature set to 41°C.
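The folding settings named in these captions (no lonely pairs, G-quadruplexes allowed, ribosum scoring, 41°C for avian strains) could be passed to RNAalifold roughly as sketched below. The captions do not give the exact command line, so this is an assumption: the long option names are those of the ViennaRNA command-line tool as commonly documented and should be checked against the installed version. The helper name, the file name `pb1_avian.aln`, and the 37°C default (the ViennaRNA default, assumed here for non-avian hosts) are illustrative only.

```python
from typing import List


def rnaalifold_command(alignment_path: str, temperature_c: float = 37.0) -> List[str]:
    """Build an RNAalifold argument list matching the caption's settings.

    Hypothetical helper: it only assembles the arguments and does not
    run the program.
    """
    return [
        "RNAalifold",
        "--noLP",                   # disallow lonely (isolated) base pairs
        "--gquad",                  # allow G-quadruplex formation
        "--ribosum_scoring",        # use ribosum scoring for covariation
        f"--temp={temperature_c}",  # folding temperature in degrees Celsius
        alignment_path,
    ]


# Avian-host alignment folded at 41 degrees C, as stated in the caption.
avian_cmd = rnaalifold_command("pb1_avian.aln", temperature_c=41.0)
print(" ".join(avian_cmd))
```

The resulting list can be handed to `subprocess.run` where RNAalifold is installed.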


Fig 3.

Representative fold of stem-loop in PA 3′ vRNA region whose cRNA is deemed constrained by RNAdescent.

The fold displayed is of H7N9 avian host sequences, truncated to a region covering NC_026424.1 nucleotides 23–46, using base pair predictions from the folds of the full regions predicted to be constrained. The nucleotides corresponding to codons 9 and 15 are labelled. RNAalifold predicts similar folds for multiple analogous constrained regions in other subtype/host/RNAdescent analysis type combinations (S2, S6, S7 and S8 Figs), and one slightly differing fold in one analogue (S4 Fig). The displayed fold is of the (− sense) vRNA, produced using RNAalifold [17, 18]. Base pairs are highlighted by RNAalifold in deep red when all sequences are capable of forming the pairs shown. Base pairs are highlighted in deep/mid yellow when all/all but one sequences are capable of forming the pair shown or one other pair (including GU pairs). RNAalifold was used with input options disallowing lonely pairs, allowing G-quadruplexes, and with the ribosum scoring matrix enabled. Avian strain folds were produced with the temperature set to 41°C.


Fig 4.

Constraint analysis of PA 3′ cRNA (5′ vRNA) region.

Top: Examples of RNAdescent analysis output, for unweighted (raw) constraint data for H5N1 avian host sequences and for weighted constraint data for H7N9 human host sequences. Each point corresponds to one codon in PA, ordered 5′ to 3′ in cRNA. Codons that do not yield any information (weight zero) are plotted with open triangles. Orange points correspond to codons in regions deemed significantly constrained by the algorithm. In both cases, a steep descent can be seen in the centre of the plot corresponding to the conserved PA/PA-X out of frame overlap. A steep descent can also be seen to the 3′ side of the plot, indicating the long constrained regions found. Constraint is apparent throughout the entire regions. Bottom: Fold of (− sense) vRNA, from RNAalifold analysis of H5N8 avian host sequences. The region folded is the region of interest found in the 3′ cRNA from analysing ranked codon constraint data. To the right of the figure, a stem-loop corresponding to codons 685–689 may be seen. To the left of the figure, a stem-loop corresponding to codons 701–704 may be seen. The fold is of the (− sense) vRNA, produced using RNAalifold. Selected nucleotides are numbered with corresponding amino acid numbers. Base pairs are highlighted by RNAalifold in deep red when all sequences are capable of forming the pairs shown. Base pairs are highlighted in deep/mid yellow when all/all but one sequences are capable of forming the pair shown or one other pair (including GU pairs). RNAalifold was used with input options disallowing lonely pairs, allowing G-quadruplexes, and with the ribosum scoring matrix enabled. The fold was produced with the temperature set to 41°C.


Table 2.

Sequences of comparable constrained regions in the 5′ vRNA (3′ cRNA) packaging-associated region of HA.


Fig 5.

Predicted stem-loop in M vRNA in the region corresponding to M1 codons 65–72.

The image shows a truncation to the region of interest (nucleotides corresponding to codons 65 and 72 marked) of the RNAalifold prediction of the constrained region arising from RNAdescent analysis of H5N8 avian host sequences, using raw (unranked) constraint data. All RNAdescent analyses, except those of H1N1 human host sequences with ranked constraint data and H3N2 human host sequences with raw constraint data, predict constraint in this region and the base pairings displayed here are replicated in all other RNAalifold predictions. (Some RNAalifold analyses predict one or two additional base pairings that elongate the stem.) Base pairs are highlighted by RNAalifold in deep red when all sequences are capable of forming the pairs shown. Base pairs are highlighted in deep yellow when all sequences are capable of forming the pair shown or one other pair (including GU pairs). RNAalifold was used with input options disallowing lonely pairs, allowing G-quadruplexes, and with the ribosum scoring matrix enabled. Avian strain folds were produced with the temperature set to 41°C.


Fig 6.

Kozak contexts of PB1-N92 initiation codons in investigated mammalian and avian strains.

Although characteristic differences can be seen between individual codons for mammalian versus avian strains, all strains have strong initiation contexts. Note that the H7N9 (human) viruses have more in common with avian than with other mammalian viruses, as might be expected for a strain where infection in humans is predominantly a direct zoonosis. Plots were generated using WebLogo3 [59, 60].
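For readers unfamiliar with how an initiation context is usually graded, the following Python sketch applies the conventional Kozak rule of thumb: a purine at position −3 and a G at position +4 (counting the A of the start codon as +1). It assumes an AUG start codon, and the function name and example sequences are hypothetical; it is not the method used to produce the figure, which shows sequence logos rather than a score.

```python
def kozak_strength(seq: str, aug_index: int) -> str:
    """Grade the Kozak context of the AUG starting at aug_index (0-based).

    Rule of thumb (illustrative): 'strong' if both -3 is a purine and
    +4 is G, 'adequate' if only one holds, 'weak' if neither holds.
    """
    rna = seq.upper().replace("T", "U")
    if rna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    # Position -3 is three nucleotides upstream of the A of AUG.
    minus3 = rna[aug_index - 3] if aug_index >= 3 else ""
    # Position +4 is the nucleotide immediately after the AUG.
    plus4 = rna[aug_index + 3] if aug_index + 3 < len(rna) else ""
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[score]


# Made-up example sequences, not taken from the analysed strains.
strong_example = kozak_strength("GCCACCAUGG", 6)
weak_example = kozak_strength("UUUUUUAUGU", 6)
```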


Fig 7.

Examples of predicted structures near the PA/PA-X frameshift.

Predictions are shown using RNAalifold [18] on sections of the alignments for the labelled subtypes, with the fourth position in the slippery site UCC_UUU_CGU marked with black arrows and numbered “+1”. Folding was performed on subsequences beginning before and ending after this position as indicated in the labels; these subsequences correspond to regions deemed constrained by our algorithm. Where a long region was folded, only the portion of the fold containing the slippery site and stem-loop is displayed (where this results in discontinuous display of a portion of nucleic acid, a nucleotide is labelled to give its position in relation to the slippery site). Possible GAAA motifs (in some cases with an A substituted by G) that can base pair with the UUUC of the slippery site are marked in orange. A number of possible base pairings leading to stem-loop motifs are predicted. We postulate that an ensemble of such stem-loops is in fact seen, with the composition of the ensemble capable of modifying the relative PA/PA-X abundances (see main text): the differing predicted secondary structures would therefore arise because small differences in the input data result in the algorithm used reaching different, but numerically very close, modified free energy minima. Predicted structures for subtypes that were analysed but are not shown have similar topology to one of the structures displayed (S3–S8 Figs).
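The annotations described in this caption can be reproduced on a single sequence with a short Python sketch. It locates the slippery site UCC_UUU_CGU, reports the index of its fourth position (the one labelled “+1”), and lists candidate GAAA motifs, including variants with one A replaced by G, that could pair with UUUC. The function names and the example sequence are hypothetical, and the sketch only scans for motifs; it does not test whether pairing is structurally feasible.

```python
import re
from typing import List, Tuple

SLIPPERY_SITE = "UCCUUUCGU"  # UCC_UUU_CGU without the codon separators

# GAAA, or GAAA with one A substituted by G (G can wobble-pair with U).
# A lookahead is used so that overlapping motifs are all reported.
PARTNER_MOTIF = re.compile(r"(?=(GAAA|GGAA|GAGA|GAAG))")


def locate_plus_one(seq: str) -> int:
    """Return the 0-based index of the site's fourth position, or -1."""
    rna = seq.upper().replace("T", "U")
    start = rna.find(SLIPPERY_SITE)
    return -1 if start < 0 else start + 3


def partner_motifs(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start index, motif) for each candidate motif."""
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in PARTNER_MOTIF.finditer(rna)]


# Made-up sequence for illustration only.
example = "GAAACCUCCUUUCGUGGAGA"
plus_one = locate_plus_one(example)
candidates = partner_motifs(example)
```

In this toy sequence the “+1” position is the first U of UUU, and one candidate motif lies on each side of the slippery site.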


Fig 8.

Possible stem-loop structure in the NS intron.

Left: The displayed sequence is the H5N8 consensus sequence, with putative base pairs from manual inspection. NS1 codons 28–30 and 35–38 have high levels of constraint. The UCCU motif in the loop is complementary to the NS2 splice acceptor site; we note that some sequences contain an alternative UUCU motif (annotated in blue), which is still capable of pairing. Right: RNAalifold prediction of pairing in H7N7 sequences, using as input the region predicted by analysis of ranked constraint data. Base pairs are highlighted in deep/mid red when all/all but one sequences are capable of forming the pairs shown. Base pairs are highlighted in deep yellow when all sequences are capable of forming the pair shown or one other pair (including GU pairs). RNAalifold was used with input options disallowing lonely pairs, allowing G-quadruplexes, and with the ribosum scoring matrix enabled. The H7N7 fold was produced with the temperature set to 41°C. Other structures predicted by RNAalifold may be found in S4–S8 Figs.


Fig 9.

RNAalifold structure prediction of the conserved stem-loops near the HA cleavage site in H7N9 human host viruses.

Nucleotide numbering follows RefSeq NC_026425.1 (GenBank KF021597.1). RNAalifold [18] uses a minimum free energy algorithm, modified to take into account how many sequences in an alignment can form predicted base pairs. Red (light red, pink)/yellow (light yellow, pale yellow) highlighted base pairs indicate that all (all but one, all but two) sequences in the alignment can form the base pair, with no/one alternative base pairing. We reproduce the small previously predicted [28] stem-loop starting at nt1030; although we predict a larger stem-loop near the cleavage site, it differs from stem-loops predicted for other strains.
