Fig 1.
Experimental design of the artificially subsampled and contaminated genomes for the 14 contamination levels (spanning low and high levels) at three sequencing depths (low, 12,500 reads; medium, 25,000 reads; and high, 50,000 reads). The controlled datasets were generated from known clinical SARS-CoV-2 samples. Each combination consisted either of a known omicron sequence mixed with a known alpha sequence or of a known delta sequence mixed with another known delta sequence. A textual representation of the experimental plan is given in Table 1. Created with BioRender.com.
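The mixing scheme in Fig 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes that a contamination level is the percentage of the total reads drawn from the contaminant sample, it models reads as plain strings rather than FASTQ records, and the helper names `read_split` and `mix_reads` are hypothetical.

```python
import random

# Design parameters as stated in the Fig 1 caption.
DEPTHS = {"low": 12_500, "medium": 25_000, "high": 50_000}
# 14 contamination levels: 1-10% in 1% steps, then 20%, 30%, 40%, 50%.
LEVELS = list(range(1, 11)) + [20, 30, 40, 50]


def read_split(total_reads: int, level_percent: int) -> tuple[int, int]:
    """Return (background_reads, contaminant_reads) for one mixture.

    Assumption: the contamination level is the percentage of all reads
    that come from the contaminant sample.
    """
    contaminant = total_reads * level_percent // 100
    return total_reads - contaminant, contaminant


def mix_reads(background: list[str], contaminant: list[str],
              total_reads: int, level_percent: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: subsample both pools without replacement, then shuffle."""
    n_background, n_contaminant = read_split(total_reads, level_percent)
    rng = random.Random(seed)  # fixed seed so each mixture is reproducible
    mixed = rng.sample(background, n_background) + rng.sample(contaminant, n_contaminant)
    rng.shuffle(mixed)
    return mixed


# Enumerate the 3 depths x 14 levels = 42 planned mixtures.
design = [(depth, level, *read_split(total, level))
          for depth, total in DEPTHS.items() for level in LEVELS]
print(len(design))  # 42
print(design[0])    # ('low', 1, 12375, 125)
```

Drawing each pool without replacement keeps every read unique within a mixture, and the fixed seed lets a given depth and level be regenerated exactly.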
Table 1.
Standardized terms and parameters of the artificially subsampled and contaminated genomes.
Fig 2.
Global nucleotide comparison of the artificially generated contaminated samples and the corresponding clinical SARS-CoV-2 reads used as background samples, at a low sequencing depth.
A) Heatmap of the pairwise p-distance comparison of the LSD_SV samples: a delta background sequence (AY.25.1) contaminated with a similar delta contaminant sequence (AY.27). B) Heatmap of the pairwise p-distance comparison of the LSD_DV samples: an omicron background sequence (BA.1) contaminated with an alpha contaminant sequence (B.1.1.7).
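The p-distance shown in Fig 2 is the proportion of aligned sites at which two sequences differ. Below is a minimal Python sketch, assuming equal-length aligned consensus sequences and skipping sites where either sequence has an `N` or a gap; the captions do not state how ambiguous sites were handled, so that rule is an assumption, and `p_distance` and `p_distance_matrix` are hypothetical names.

```python
from itertools import combinations


def p_distance(a: str, b: str, ignore: str = "N-") -> float:
    """Proportion of comparable aligned sites at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in ignore or y in ignore:
            continue  # assumption: skip ambiguous or gapped sites
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def p_distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise matrix, keyed by (name, name), ready to draw as a heatmap."""
    names = list(seqs)
    matrix = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = p_distance(seqs[x], seqs[y])
        matrix[(x, y)] = matrix[(y, x)] = d
    return matrix


seqs = {"bg": "ACGTACGTAC", "mix_1": "ACGTACGTAA", "mix_2": "ACGNACGTTA"}
m = p_distance_matrix(seqs)
print(round(m[("bg", "mix_1")], 3))  # 0.1
print(round(m[("bg", "mix_2")], 3))  # 0.222
```

The toy sequences are illustrative only; in `mix_2` the `N` at position 4 is excluded, so its distances are computed over 9 sites instead of 10.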
Fig 3.
Phylogenetic tree and heatmap showing single nucleotide variants (SNVs) at different positions of the SARS-CoV-2 genome for (A) AY.25.1 (delta variant) contaminated with an AY.27 (delta variant) sequence at contamination levels of 1–10%, 20%, 30%, 40%, and 50% at a low sequencing depth. (B) BA.1 (omicron variant) contaminated with a B.1.1.29 (alpha variant) sequence at contamination levels of 1–10%, 20%, 30%, 40%, and 50% at a low sequencing depth.
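An SNV heatmap such as the one in Fig 3 can be built from a presence/absence matrix of variant positions. The sketch below is a simplification under stated assumptions: each sample is summarized as a consensus sequence already aligned to a reference, whereas a real workflow would call variants from the reads. `call_snvs` and `snv_presence` are hypothetical helpers.

```python
def call_snvs(reference: str, consensus: str,
              ignore: str = "N-") -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, alternate base) for each SNV."""
    if len(reference) != len(consensus):
        raise ValueError("consensus must be aligned to the reference")
    snvs = []
    for pos, (ref, alt) in enumerate(zip(reference.upper(), consensus.upper()), start=1):
        if ref in ignore or alt in ignore:
            continue  # skip ambiguous or gapped sites
        if ref != alt:
            snvs.append((pos, ref, alt))
    return snvs


def snv_presence(reference: str,
                 consensuses: dict[str, str]) -> tuple[list[int], dict[str, list[int]]]:
    """Build a sample-by-position 0/1 matrix suitable for plotting as a heatmap."""
    calls = {name: {pos for pos, _, _ in call_snvs(reference, seq)}
             for name, seq in consensuses.items()}
    positions = sorted(set().union(*calls.values()))
    matrix = {name: [int(p in hits) for p in positions] for name, hits in calls.items()}
    return positions, matrix


reference = "ACGTACGTAC"
positions, matrix = snv_presence(reference, {"s1": "ACTTACGTAT", "s2": "ACGTACGAAC"})
print(positions)  # [3, 8, 10]
print(matrix)     # {'s1': [1, 0, 1], 's2': [0, 1, 0]}
```

Each row of the matrix corresponds to one sample and each column to one variable position, which is the layout a heatmap beside a phylogenetic tree typically uses.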
Table 2.
Quality control metrics comparison for artificially subsampled and contaminated genomes with contamination by similar variants at a low sequencing depth (all LSD_SV genomes).
Table 3.
Quality control metrics comparison for artificially subsampled and contaminated genomes with contamination by different variants at a low sequencing depth (all LSD_DV genomes).
Fig 4.
Comparison of the mutational profiles of the clinical SARS-CoV-2 genomes and the artificially generated genomes for (A) LSD_SV (AY.25.1 contaminated with an AY.27 variant) at contamination levels of 1–10%, 20%, 30%, 40%, and 50%. (B) LSD_DV (BA.1 contaminated with a B.1.1.29 variant) at contamination levels of 1–10%, 20%, 30%, 40%, and 50%.
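One simple way to compare mutational profiles like those in Fig 4 is set arithmetic over mutation labels. This sketch is an assumption, not the method used in the study: it treats each genome's profile as a set of strings such as `C241T`, reports shared, lost, and gained mutations, and summarizes the overlap with the Jaccard index. `compare_profiles` is a hypothetical name, and the labels in the usage example are illustrative, not results.

```python
def compare_profiles(clinical: set[str], artificial: set[str]) -> dict[str, object]:
    """Compare a clinical mutation set with that of an artificially contaminated genome."""
    shared = clinical & artificial
    union = clinical | artificial
    return {
        "shared": sorted(shared),
        # Present in the clinical genome but missing from the artificial one.
        "lost": sorted(clinical - artificial),
        # Present only in the artificial genome, e.g. introduced by contaminant reads.
        "gained": sorted(artificial - clinical),
        # Jaccard index: 1.0 means identical profiles (or both empty).
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


clinical = {"C241T", "C3037T", "A23403G"}
artificial = {"C241T", "A23403G", "G28881A"}
result = compare_profiles(clinical, artificial)
print(result["lost"], result["gained"], result["jaccard"])  # ['C3037T'] ['G28881A'] 0.5
```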
Table 4.
Summary of the identified thresholds for the artificially subsampled and contaminated reads, together with the origin of both the background and contaminant samples.