Extensive C->U transition biases in the genomes of a wide range of mammalian RNA viruses; potential associations with transcriptional mutations, damage- or host-mediated editing of viral RNA

doi:10.1371/journal.ppat.1009596

Table 1.

COMPOSITIONAL FEATURES OF RNA VIRUS SEQUENCE DATASETS USED IN THE STUDY^¹.

Fig 1.

Transition frequencies and asymmetries in RNA virus alignments.

(A) Relative frequencies of each mutation type expressed as a percentage of all changes (y-axis) in the 36 RNA virus alignments at sites showing <5% heterogeneity. (B) Comparison of normalised transition asymmetry values; the dotted red line shows the expected unbiased transition asymmetry. For both graphs, distributions were compared using the Mann-Whitney U test; significant p values (p < 0.05) are shown. Box plots show maximum, upper interquartile range (IQR), median, lower IQR and minimum values of each distribution.

Fig 2.

Frequency related transition asymmetries.

Transition asymmetries of two virus datasets showing C->U / G->A asymmetries. Normalised values (y-axis) were calculated for sites showing different levels of sequence heterogeneity (x-axis): 0.02: 0.02 or less; 0.05: <0.05 and ≥0.02; 0.1: <0.1 and ≥0.05; 0.25: <0.25 and ≥0.1; 0.4: <0.4 and ≥0.25; 1.00: ≥0.4.
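
The banding of sites by heterogeneity described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `heterogeneity_band` is hypothetical, and because the caption lists 0.02 in both of the first two bands, the sketch assumes a value of exactly 0.02 falls in the lowest band.

```python
# Upper bounds of the intermediate heterogeneity bands named in the caption.
BAND_UPPER_BOUNDS = (0.05, 0.1, 0.25, 0.4)


def heterogeneity_band(h: float) -> float:
    """Return the x-axis label of the band that contains heterogeneity h."""
    if h < 0.0:
        raise ValueError("heterogeneity cannot be negative")
    # Assumption: exactly 0.02 is assigned to the lowest band ("0.02 or less").
    if h <= 0.02:
        return 0.02
    # Each remaining band is half-open: lower bound inclusive, upper exclusive.
    for upper in BAND_UPPER_BOUNDS:
        if h < upper:
            return upper
    # Everything at or above 0.4 is grouped under the label 1.00.
    return 1.00
```

For example, a site with heterogeneity 0.03 is labelled 0.05, while a site with heterogeneity 0.4 is labelled 1.00.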

Fig 3.

Distribution of C->U and U->C mutations in individual sequences.

Numbers of C->U and U->C transitions in individual coding region sequences of HCV-1a and EV-A71 plotted as frequency histograms. Distributions were fitted to Poisson distributions based around their mean numbers of substitutions (light and dark blue lines).
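
One way to obtain the Poisson curve that such a histogram is compared against is to take the mean number of substitutions per sequence as the Poisson parameter and scale the probabilities by the number of sequences. The sketch below assumes that approach; the function names are hypothetical and the caption does not state how the fit was actually computed.

```python
import math
from collections import Counter


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing exactly k events for a Poisson mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_expected_counts(per_sequence_counts):
    """Map each count k to (observed sequences, Poisson-expected sequences).

    per_sequence_counts holds one integer per sequence, for example the
    number of C->U transitions found in that sequence.
    """
    if not per_sequence_counts:
        raise ValueError("need at least one sequence")
    n = len(per_sequence_counts)
    mean = sum(per_sequence_counts) / n  # Poisson parameter taken from the data
    observed = Counter(per_sequence_counts)
    return {
        k: (observed.get(k, 0), n * poisson_pmf(k, mean))
        for k in range(max(per_sequence_counts) + 1)
    }
```

An observed histogram that is clearly wider than the expected counts would suggest that substitutions are not spread evenly across sequences.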

Fig 4.

Association of transition asymmetries with RNA secondary structure.

The association of transition asymmetry values with MFED values, indicative of the degree of genome RNA folding. Correlation values (R) and significance using linear regression for C->U / U->C and G->A / A->G asymmetries are shown.

Table 2.

PREDICTIVE FACTORS FOR G->A AND C->U TRANSITION ASYMMETRIES^¹ BY ANOVA.

Fig 5.

Influence of 5’ and 3’ bases on C->U mutation frequencies.

Influence of the identities of the immediate 5’ and 3’ bases on C->U mutation frequencies in a range of RNA viruses showing C->U/U->C transition asymmetry. Normalised C->U/U->C transition asymmetries in each 5’ and 3’ context were adjusted to account for 5’ or 3’ base frequencies. The y-axis shows the over- or under-representation of the asymmetry values in each context relative to the value for all contexts; the null expectation (no effect of 5’ or 3’ base) was 1.0 (red dotted line). Distributions of values for each context were compared by Mann-Whitney U test; p values < 0.05 are shown.

Fig 6.

Using the association index to determine informativeness of individual sites.

Schematic summary of the steps used to investigate site informativeness. Individual alignment positions are sequentially analysed for their concordance to a global phylogeny. Base identity is used to assign groups, which are then used to calculate an association value from group segregation in a neighbour-joining tree of the alignment in which branches without bootstrap support are collapsed. The association index (AI) is the ratio of this value to the mean association value of 10 controls with randomised sequence label order (representing the null expectation of no association). Finally, the Shannon entropy score, representing site heterogeneity, is recorded.
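
The last two steps of this scheme, the ratio to randomised controls and the Shannon entropy of a site, can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: both function names are hypothetical, the tree-based association value is taken as an already computed input, and the logarithm base for the entropy (2 here) is an assumption because the caption does not give one.

```python
import math
from collections import Counter


def shannon_entropy(column: str, base: float = 2.0) -> float:
    """Shannon entropy of one alignment column, given as a string of bases.

    Assumption: the column contains only the characters to be counted
    (gaps and ambiguity codes have already been removed).
    """
    counts = Counter(column)
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log(c / total, base) for c in counts.values()
    )


def association_index(observed: float, randomised_controls) -> float:
    """Ratio of the observed association value to the mean of the controls.

    The caption describes 10 label-randomised controls; any non-empty
    list is accepted here.
    """
    if not randomised_controls:
        raise ValueError("need at least one randomised control value")
    control_mean = sum(randomised_controls) / len(randomised_controls)
    if control_mean == 0:
        raise ValueError("mean control association value is zero")
    return observed / control_mean
```

With this ratio, a value near 1 means the site segregates on the tree no better than shuffled labels do, while values well below 1 indicate a phylogenetically informative site.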

Fig 7.

Distribution of association index values in virus datasets.

Frequency distributions of AI values at variable sites in alignments of representative viruses showing unbiased (left) or elevated (right) C->U/U->C transition asymmetries and with comparable overall sequence divergence (MPD values listed in Table 1). Histograms were sub-divided based on their Shannon entropy range (see key for colour coding; minimally variable sites (Shannon entropy < 0.3) were excluded). Insets show the corresponding tree topologies for each virus analysed; for large datasets (HCV, EV-A71, DENV1), trees based on randomly selected representative sequences are shown for clarity. Phylogenetic trees drawn to scale are provided in S2 Fig.

Fig 8.

Effect of association index values and site variability on transition frequencies.

Relative frequencies of different transitions at sites varying in AI value, reflecting their phylogenetic informativeness (A), and in sequence heterogeneity (B). Bar heights show means of the four component virus datasets; error bars show standard errors of the mean. Frequencies of C->U were compared with frequencies of the other three transitions in each band using the Mann-Whitney U test; significant values are shown in red.

Fig 9.

Proportionate excess of C->U over U->C transitions in phylogenetically informative and non-informative sites.

Excess C->U mutations (number of sites with a majority C->U transition minus the number of sites with U->C) expressed as a proportion of all variable sites in genome alignments of viruses showing C->U/U->C asymmetry. Proportions were normalised by mononucleotide base frequencies. Separate proportions were calculated by AI band, representing sites that were phylogenetically informative (low AI values) through to uninformative (high AI values).
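
The unnormalised part of this quantity is simple arithmetic and can be sketched as below. The function name is hypothetical, and the normalisation by mononucleotide base frequencies mentioned in the caption is deliberately left out because its exact form is not given here.

```python
def excess_cu_proportion(n_cu_sites: int, n_uc_sites: int,
                         n_variable_sites: int) -> float:
    """Excess of C->U over U->C sites as a proportion of all variable sites.

    Note: this omits the base-frequency normalisation described in the
    caption, whose formula is not specified there.
    """
    if n_variable_sites <= 0:
        raise ValueError("need at least one variable site")
    return (n_cu_sites - n_uc_sites) / n_variable_sites
```

For instance, 30 C->U sites and 10 U->C sites among 200 variable sites in one AI band give an excess of 0.1 for that band.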
