Fig 1.

SARS-CoV-2 viral genomes accumulate specific sets of SNVs over time.

(A) Frequency histogram showing the steady increase of SNVs called per viral isolate over time (Collection Date), indicating their accumulation in SARS-CoV-2 genomes. (B) Distribution of substitutions at unique SNVs. C>T and A>G substitutions have been previously associated with APOBEC and ADAR deaminase activities, on the SARS-CoV-2 ssRNA(+) genome or its dsRNA intermediate, respectively. (C) Graphical representation of SNV substitution profiles at various SARS-CoV-2 ORFs, illustrating an intrinsic mutational bias: C>T dominates the mutation pattern in some ORFs (i.e. 1a and 1b) but is masked (likely by selection) in other ORFs, such as ORF2, which encodes the Spike region. ORF lengths (kb) are given in parentheses along the x axis.
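As an illustration of the kind of tally behind a substitution distribution such as panel B, the following minimal sketch counts substitution classes over a list of unique SNVs. It is not the authors' pipeline: the function name `substitution_spectrum`, the `(ref, alt)` input format, and the toy data are assumptions made for this example.

```python
from collections import Counter

BASES = set("ACGT")  # assumed alphabet; U is written as T, as in the caption


def substitution_spectrum(snvs):
    """Return the fraction of each substitution class (e.g. 'C>T').

    snvs: iterable of (ref, alt) single-base pairs, one per unique SNV site.
    Pairs that are not a change between two standard bases are skipped.
    """
    counts = Counter()
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        counts[f"{ref}>{alt}"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sub: n / total for sub, n in counts.items()}


# Toy input (made up for illustration, not data from the study).
example_snvs = [("C", "T"), ("C", "T"), ("A", "G"), ("G", "T")]
spectrum = substitution_spectrum(example_snvs)
```

Running the same tally separately for the SNVs in each ORF would give per-ORF profiles comparable in spirit to panel C.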

Table 1.

Nucleotide substitution ratios of synonymous to non-synonymous changes among transitions (G-to-A, A-to-G, C-to-U, U-to-C) in 2020.

Fig 2.

Accumulation of mutations in SARS-CoV-2 genomes and evolution of variants in 2020.

(A) Dot plot representation of missense mutations identified in the SARS-CoV-2 genome, of which fourteen were found in the most abundant SNVs, including Threonine-to-Isoleucine (T85I) and Proline-to-Leucine (P323L) changes in ORF1b (present in about 48.79% and 80.2% of the sequences, respectively) and the well-documented Aspartic-acid-to-Glycine (D614G) change in ORF2 (found in 80.76% of the sequences). A summary further detailing these predominant mutations is provided in Table 2. (B) Density histograms, showing how the most common mutations from Fig 2A change over time, reveal that these mutations can be grouped into four distinct patterns (A-D); these mutational co-occurrences thus indicate the presence of at least three major variants. (C) Unique profiles of co-occurring mutational signatures from the dataset were used to compile 48 putative sub-variant signatures (s1-s48; S2A Fig), distinct from the original Wuhan viral isolate (s0). Fourteen signatures, together with s0, were found in more than 0.1% of the sequences. A time-scaled phylogenetic tree of those 14 sub-variants and s0 (highlighted in red) reveals accumulation of mutations and more complex signatures, with an acute burst of mutations in the summer of 2020 likely leading to a novel homegrown variant (s48). The first and last sequences profiled by time (per signature) are denoted with light blue and red dots, respectively. The reference genome (Wuhan-Hu-1) is denoted with a dark purple dot. Gain of mutations in the clades is denoted with red letters for each specific mutation, and loss with grey letters. The most abundant signatures at the end of 2020 and in early 2021 are s6, s22 and s48 (also shown in S2B Fig).
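The idea of grouping genomes by their profile of co-occurring predominant mutations, as in panel C, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the function name `signature_frequencies`, the set-based input format, and the toy genomes are hypothetical; only the 0.1% threshold is taken from the caption.

```python
from collections import Counter


def signature_frequencies(genomes, tracked, min_fraction=0.001):
    """Group genomes by their profile of tracked mutations.

    genomes: iterable of sets of mutation labels, one set per genome.
    tracked: set of predominant mutations that define a signature.
    min_fraction: keep signatures found in more than this fraction of
        genomes (0.001 corresponds to the 0.1% cutoff in the caption).
    Returns a dict mapping each retained profile (a frozenset) to the
    fraction of genomes that carry exactly that profile.
    """
    profiles = Counter(frozenset(m for m in muts if m in tracked) for muts in genomes)
    total = sum(profiles.values())
    if total == 0:
        return {}
    return {p: n / total for p, n in profiles.items() if n / total > min_fraction}


# Toy genomes (made up for illustration, not data from the study).
toy_genomes = [
    {"D614G", "P323L"},
    {"D614G", "P323L", "L5F"},  # L5F is not tracked here, so the profile is unchanged
    set(),                      # no tracked mutations: reference-like profile
    {"D614G"},
]
freqs = signature_frequencies(toy_genomes, tracked={"D614G", "P323L"})
```

In this sketch the empty profile plays the role of the reference signature (s0), and each distinct non-empty profile would correspond to one putative sub-variant signature.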

Table 2.

Summary of predominant mutations detected in SARS-CoV-2 genomes in 2020, indicating their nucleotide position (relative to the reference genome), the ORF they are located in, the associated amino acid change, the related protein that recoding may impact, and the frequency (% of sequences) at which they occur.

Fig 3.

Some low frequency mutations in late 2020 rise to prevalence in 2021.

(A) Dot plots showing accumulation of multiple LFSMs (>0.1% of cohort) in ORF2 over time in 2020. The amino acid position per LFSM is shown at the bottom, while quartiles, from the first (Q1) through the last (Q4), are denoted on the right. Mutations in Spike domains are further denoted by shaded areas. A detailed list of all LFSMs in 2020 is provided in S3 File. (B) Dot plot representation of missense mutations identified in the SARS-CoV-2 genome in 2021, revealing the arrival of VOCs. 34 mutations were found in at least 10% of the genomes (dark purple), most of which are shared with those identified in 2020 (Fig 2A, Table 1), including P323L and D614G, which are present in 100% of the genomes analyzed in 2021. A number of newly predominant mutations in the Spike appeared in 2021 (red labels), including some previously found in 2020 Q4 (red labels with asterisk) as well as novel ones originating from VOCs. Not all mutations are named, but the main ones are S13I, A570D, P681H, W152C, S982A and D1118H. The complete set of mutations found in more than 10% of the sequences in 2021 is shown in S3A Fig.

Fig 4.

SARS-CoV-2 viral isolate signature frequencies change over time, but with different patterns across states, showing dynamic evolution by mutation, drift, selection and migration.

State-specific ridgeline plots indicate the density of each signature (y axis) over collection date (x axis). In each plot, peak colors gradually change to highlight the transition in time (x axis), with the pink-shaded areas corresponding to periods for which data were not available. States shown were selected solely on the basis of abundance of sequence data throughout the year (n, number of viral genomes per state). Of note, the reference strain s0 is virtually absent by June 2020, while signature s48 is common from July 2020 through the end of the first quartile of 2021. Several new signatures found predominantly in 2021 (in more than 0.1% of the genomes) reveal a new and complex mutational profile, with a number of them being related to the B.1.2 (blue labels) or B.1.1.7 (red labels) lineages (which were introduced by migration), as well as to other VOC lineages (S3A, S4 and S5 Figs).

Fig 5.

Variants of concern emerging in the United States include novel “add-on” mutations in key Spike protein functional domains.

6 variants of concern (VOCs), B.1.1.7, B.1.351, B.1.427, B.1.429 and P.1, were detected in the cohort of sequences we explored, primarily in the last quartile of 2020 and the first of 2021. VOC genomes are clearly accumulating a diverse set of new spike mutations, in addition to their defining mutations (S3B Fig). Bar plots show the number of sequences (n) of the 6 different VOCs per collection date. Red labels annotate new spike mutations introduced over time in more than 1% of the genomes of the respective VOC, while less frequent mutations (>0.1% of genomes) are shown with light blue labels. Of note, the most common mutation is L5F; its persistent presence over time and across variants suggests that it may be a recurrent mutation. Interestingly, other mutations previously seen as LFSMs in 2020 (Fig 3A), including Q677H or Q677R, T859I, E1202Q, V1040F and V1176F, are also present in distinct VOCs, suggesting multiple recurrent mutations that may reflect mutational or selection bias.
