Fig 1.
Most of the analyzed SARS-CoV-2 sequences differ from the WT spike protein, but exhibit only a few non-synonymous mutations.
(A) The histogram shows the number of non-synonymous spike protein mutations detected in the analyzed samples. (B) The mean (red) and median (blue) number of mutations per spike protein sequence increased over time. The top line gives the number n of samples per month. The boxplots indicate the monthly distributions of mutations per sample.
Fig 2.
Recurrent variants are found throughout the whole spike protein.
(A) Most of the detected variants were recurrent events occurring in at least two samples from the assembly or NGS data sets. (B, C) Each data point represents a distinct protein sequence mutation in the spike protein. The labels indicate the amino acid exchange for variants found in more than 1% of the assemblies (B) or NGS samples (C). The RBD is highlighted in red. (D) 1,637 variants (grey) were detected both in the assemblies and the NGS data. (E) A subset of 35 variants co-occurred in at least 5,000 of the mutated spike protein sequences (assemblies and NGS data combined). For better visibility, co-occurrences in fewer than 5,000 samples were set to 0 (white tiles).
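The thresholding described for panel E can be illustrated with a minimal sketch. It assumes each sample is available as a set of mutation labels; the function name `cooccurrence_counts`, the input layout, and the toy labels are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(samples, min_count=5000):
    """Count how often each pair of mutations occurs in the same sample.

    `samples` is an iterable of collections of mutation labels (assumed layout).
    Pairs seen in fewer than `min_count` samples are set to 0, mirroring the
    white tiles described for panel E.
    """
    counts = Counter()
    for mutations in samples:
        # Sort so that each unordered pair maps to a single key.
        for a, b in combinations(sorted(set(mutations)), 2):
            counts[(a, b)] += 1
    return {pair: (n if n >= min_count else 0) for pair, n in counts.items()}


# Toy input with placeholder labels and a small threshold for illustration.
toy_samples = [{"A", "B"}, {"A", "B", "C"}, {"C"}]
toy_result = cooccurrence_counts(toy_samples, min_count=2)
```

With the toy input, only the pair ("A", "B") reaches the threshold; the other pairs are reported as 0.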
Fig 3.
Variant frequencies of spike protein mutants indicate the presence of multiple SARS-CoV-2 mutants in some samples.
(A) The boxplot shows the distributions of the fraction of supporting reads of the mutations found in the NGS data. The numbers of underlying samples are indicated above the collection dates. Most of the observed variants have a variant allele frequency of ≥ 0.95 and can be considered clonal. (B) Variants were filtered for high-confidence subclonal events (green), requiring a sequencing depth of ≥ 30 reads and a fraction of supporting reads between 0.1 and 0.95. (C) Sample-wise depiction of high-confidence subclonal events. Some of the observed subclonal variants were recurrent (blue) and only a few were individual (red). The samples were ordered by collection date (see also color bar at the bottom of the plot) and point sizes indicate sequencing depth (log10 scale). Subclonal variants of the same sample are linked with grey lines. The fraction of supporting reads of variants found in the same sample differed notably in some cases.
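The clonal cutoff in panel A and the filter in panel B can be written down as a short sketch. The record layout (`VariantCall`), the function name `classify`, and the treatment of the interval boundaries (0.1 inclusive, 0.95 exclusive) are assumptions made for illustration; the caption does not state how boundary values were handled.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    sample: str      # sample identifier (hypothetical field)
    mutation: str    # amino acid exchange label (hypothetical field)
    depth: int       # total reads covering the position
    alt_reads: int   # reads supporting the variant


def classify(call, min_depth=30, low=0.1, high=0.95):
    """Label a call as 'clonal', 'subclonal', or 'low_confidence'.

    Thresholds follow the caption: a fraction of supporting reads of at
    least 0.95 counts as clonal; a depth of at least 30 reads with a
    fraction between 0.1 and 0.95 counts as high-confidence subclonal.
    Boundary handling is an assumption.
    """
    if call.depth <= 0:
        return "low_confidence"
    fraction = call.alt_reads / call.depth
    if fraction >= high:
        return "clonal"
    if call.depth >= min_depth and low <= fraction < high:
        return "subclonal"
    return "low_confidence"


# Invented example records, not data from the study.
examples = [
    VariantCall("s1", "mutA", depth=100, alt_reads=99),
    VariantCall("s2", "mutB", depth=100, alt_reads=40),
    VariantCall("s3", "mutC", depth=20, alt_reads=8),
]
labels = [classify(c) for c in examples]
```

Here the third record has a supporting fraction of 0.4 but is rejected because its depth is below 30 reads.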
Fig 4.
Variants affect antibody and T cell target sites.
(A) The number of published T cell epitopes (listed in the IEDB or recognized by CD8+ or CD4+ T cells as reported by Snyder et al. [25]) that are affected by recurrent or individual spike protein variants is depicted. Most of the variants hit at least one epitope. (B) Predicted binding affinity rank of the wild type epitope (WT) vs. the quotient of the predicted ranks of the wild type and mutant epitope (MUT), depicted as a heat map. Lower ranks indicate better binding. A small quotient indicates a worse binding prediction for the mutant epitope.
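The quotient used in panel B can be made concrete with a small sketch. It assumes the quotient is computed as WT rank divided by MUT rank, which is consistent with the statement that a small quotient indicates worse predicted binding of the mutant; the function name `rank_quotient` and the example ranks are hypothetical.

```python
def rank_quotient(wt_rank, mut_rank):
    """Return WT rank / MUT rank for one epitope pair.

    Lower ranks mean better predicted binding, so a quotient below 1 means
    the mutant epitope received a higher (worse) rank than the wild type.
    """
    if wt_rank <= 0 or mut_rank <= 0:
        raise ValueError("ranks must be positive")
    return wt_rank / mut_rank


# Invented ranks for illustration only.
worse_for_mutant = rank_quotient(0.5, 2.0)   # mutant ranks worse than WT
better_for_mutant = rank_quotient(2.0, 0.5)  # mutant ranks better than WT
```

In the first call the quotient is 0.25, reflecting a weaker predicted binding for the mutant; in the second it is 4.0.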