Fig 1.
MAF, mutation annotation format table. Each filtered MAF combines mutations from all samples into a single dataset. Details of mutation call filtering and grouping, as well as the abbreviations, are explained in the "Design of the analysis" section.
Fig 2.
Comparison of base substitution spectra between rubella virus and SARS-CoV-2 datasets from filtered MAFs.
The creation of the filtered SARS-CoV-2 MAFs and their abbreviated names (NoDupsNonFunc, NoDupsFunc, NoDups) are described in Fig 1 and in the associated text. The spectrum from each filtered SARS-CoV-2 MAF was compared with the rubella mutation spectrum. Bars represent densities of base substitutions in each dataset, calculated by dividing the count of each base substitution by the count of the substituted base in the reference sequence. Connecting lines visualize the overall parallelism between rubella and each filtered MAF. Inset boxes show Spearman r, its 95% CI, and the one-tailed p-value for the hypothesis of a positive correlation between the rubella and SARS-CoV-2 spectra. Source data are in S4 Table.
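The density and correlation steps in the Fig 2 legend can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes substitutions are given as hypothetical `(ref_base, alt_base)` tuples against a plus-strand RNA reference, and all sequences and calls in it are toy values. Spearman r is computed as the Pearson correlation of tie-averaged ranks; the 95% CI is not reproduced here, and a one-tailed p-value could instead be obtained with, for example, `scipy.stats.spearmanr(x, y, alternative="greater")`.

```python
from collections import Counter

BASES = "ACGU"  # RNA alphabet of the plus-strand reference

def substitution_densities(reference, substitutions):
    """Map each (ref, alt) base substitution to its density: the number of
    ref>alt calls divided by the number of ref bases in the reference."""
    base_counts = Counter(reference)
    sub_counts = Counter(substitutions)  # items are (ref_base, alt_base) tuples
    densities = {}
    for ref in BASES:
        for alt in BASES:
            if ref != alt:
                n_ref = base_counts[ref]
                densities[(ref, alt)] = sub_counts[(ref, alt)] / n_ref if n_ref else 0.0
    return densities

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman r computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical toy inputs, for illustration only.
ref = "ACGUACGGCUUA"
spectrum_a = substitution_densities(ref, [("C", "U"), ("C", "U"), ("G", "A"), ("A", "G")])
spectrum_b = substitution_densities(ref, [("C", "U"), ("G", "A"), ("G", "A"), ("U", "C")])
keys = sorted(spectrum_a)  # same 12 substitution types, same order, in both spectra
r = spearman_r([spectrum_a[k] for k in keys], [spectrum_b[k] for k in keys])
```

Normalizing by the count of the substituted base, rather than by genome length, keeps spectra comparable between genomes with different base composition, which is what allows the rubella and SARS-CoV-2 bars to be set side by side.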
Fig 3.
Comparison of base substitution mutagenesis between locations prone to loop or stem formation in viral RNA genomes.
The creation of the filtered SARS-CoV-2 MAFs and their abbreviated names (NoDupsNonFunc, NoDupsFunc, NoDups) are described in Fig 1 and in the associated text. Bars represent densities of base substitutions in stem- or loop-forming regions. Densities were calculated by dividing the count of each base substitution in loop- or stem-forming regions by the count of the substituted base in the corresponding loop- or stem-forming regions of the reference sequence. Stem and loop mutagenesis were compared for every base substitution by two-tailed Fisher's exact test, and p-values were corrected for multiple testing by FDR. Brackets indicate pairs passing FDR = 0.05. * <0.05, ** <0.005, *** <0.0005. Source data including exact p-values are in S5 Table.
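The stem vs loop comparison in the Fig 3 legend can be sketched as a per-substitution 2x2 Fisher's exact test followed by FDR correction. This is a sketch under stated assumptions, not the authors' pipeline: the 2x2 layout (rows: stem, loop; columns: mutated and non-mutated counts of the substituted base) is a hypothetical reading of the density definition, the legend does not name the FDR procedure so Benjamini-Hochberg is assumed, and the counts are invented. `scipy.stats.fisher_exact` offers an equivalent two-sided test if SciPy is available.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are included.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

def stem_vs_loop_p(mut_stem, bases_stem, mut_loop, bases_loop):
    """Hypothetical layout: rows are stem and loop regions, columns are
    mutated and non-mutated counts of the substituted base."""
    return fisher_exact_two_tailed(mut_stem, bases_stem - mut_stem,
                                   mut_loop, bases_loop - mut_loop)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: (mut_stem, bases_stem, mut_loop, bases_loop).
counts = {
    "C>U": (120, 3000, 60, 2500),
    "G>A": (40, 3100, 35, 2400),
    "A>G": (30, 2900, 45, 2600),
}
names = list(counts)
q = benjamini_hochberg([stem_vs_loop_p(*counts[k]) for k in names])
passing = [k for k, qk in zip(names, q) if qk <= 0.05]
```

Correcting across all base substitutions at once, rather than testing each in isolation, is what the "pairs passing FDR = 0.05" wording implies.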
Fig 4.
Trinucleotide-centered mutation motifs with statistically significant enrichment over random mutagenesis.
The creation of the filtered SARS-CoV-2 MAFs and their abbreviated names (NoDupsNonFunc, NoDupsFunc, NoDups) are described in Fig 1 and in the associated text. The order of data in each group is the same as in the panel's legend: Rubella, NoDupsNonFunc, NoDupsFunc, NoDups. Zero values are shown to make the order of values within each group easier to follow. Bars represent minimum estimates of the mutation load that can be assigned to a motif-specific mutagenic mechanism (MutLoad), as described in Methods. Enrichment was evaluated by two-tailed Fisher's exact test, and FDR correction was applied across the p-values for all 192 possible trinucleotide-centered base substitution motifs. MutLoad was set to 0 for motifs with FDR > 0.05. Only motifs involving the most frequent plus-strand base substitutions (C to U, G to A, A to G, U to C) are shown. Plus-strand reverse complements of the statistically significant motifs mutated in the minus-strand are shown in parentheses. If both plus-strand motifs of a reverse complement pair were statistically significantly enriched in at least one dataset (rubella or a filtered SARS-CoV-2 MAF), they are highlighted in red font. Source data including calculations for all 192 motifs are in S6 Table.
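The motif bookkeeping behind the Fig 4 legend can be sketched as below: 64 trinucleotides times 3 possible changes of the central base gives the 192 motifs, and a mutation in a minus-strand motif is recorded in the plus-strand sequence as the reverse complement of that motif (for example, C to U in UCA on the minus strand appears as G to A in UGA on the plus strand). The `(trinucleotide, alt)` representation and the function names are hypothetical, chosen only for illustration; MutLoad itself is defined in Methods and is not reproduced here.

```python
from itertools import product

BASES = "ACGU"
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}
# Most frequent plus-strand substitutions shown in the figure.
SHOWN = {("C", "U"), ("G", "A"), ("A", "G"), ("U", "C")}

def all_motifs():
    """Every trinucleotide-centered motif as (trinucleotide, alt): the central
    base is replaced by one of the three other bases, e.g. ("UCA", "U")."""
    return [(left + center + right, alt)
            for left, center, right in product(BASES, repeat=3)
            for alt in BASES if alt != center]

def reverse_complement(seq):
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def reverse_complement_motif(motif):
    """Plus-strand motif under which a mutation in the given motif on the
    minus strand is recorded, e.g. ("UCA", "U") -> ("UGA", "A")."""
    trinuc, alt = motif
    return reverse_complement(trinuc), COMPLEMENT[alt]

def shown_motifs():
    """Motifs whose central substitution is one of the four shown in the figure."""
    return [m for m in all_motifs() if (m[0][1], m[1]) in SHOWN]

motifs = all_motifs()
shown = shown_motifs()
```

Because the four shown substitutions form two reverse complement pairs (C to U with G to A, A to G with U to C), every shown motif has its partner among the shown motifs, which is what makes it possible to highlight pairs in which both members are enriched.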
Table 1.
Analyses of base substitutions prevailing in rubella virus and in SARS-CoV-2 plus-strand genomic RNAs.