Table 1.
Basic sequencing metrics for the Illumina and Oxford Nanopore (ONT) outputs of the 37 samples.
Fig 1.
PHRED base call quality score distribution of samples sequenced by Illumina and ONT.
Distribution of PHRED quality scores (x axis), which express the per-base error probability on a logarithmic scale, and the error probability derived from each score (secondary x axis) for the data sets sequenced by Illumina (n = 37) and ONT (n = 37). ONT scores are shown in blue and Illumina scores in red. The dashed line marks the mean PHRED score/error probability for each technology. The mean PHRED score was 32.35 for Illumina reads and 10.78 for ONT reads.
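The relationship between the two x axes can be illustrated with the standard PHRED definition, P = 10^(-Q/10). The sketch below is only an illustration of that conversion applied to the two mean scores quoted in the caption; the function name is hypothetical and is not taken from the study's pipeline.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a PHRED quality score Q to a per-base error probability.

    Uses the standard definition P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


# Mean PHRED scores quoted in the caption.
for platform, mean_q in [("Illumina", 32.35), ("ONT", 10.78)]:
    p = phred_to_error_probability(mean_q)
    print(f"{platform}: Q = {mean_q} -> P(error) = {p:.2e}")
```

Under this definition, the Illumina mean corresponds to roughly 6 expected errors per 10,000 base calls and the ONT mean to roughly 8 per 100.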
Fig 2.
Comparison of amino acid changes detected in SARS-CoV-2 genomes by both sequencing technologies.
Annotated amino acid substitutions and deletions detected in each sample (n = 37). Mutations colored green were detected concordantly by both sequencing technologies, whereas yellow and red indicate mutations detected by only one technology. The x axis lists each amino acid change, denoted by the original amino acid, its position in the protein and the substitution or deletion. Amino acid deletions are denoted by “del”.
Fig 3.
Percentage of ambiguous bases (% of N) compared to the number of mutations (SNPs) detected in each sample.
The percentage of ambiguous bases is shown on the x axis and the number of mutations projected onto the consensus sequences on the y axis. The number displayed on each shape denotes the sample identification number (n = 37) sequenced by Illumina (red) or ONT (blue). Both technologies detected the maximum number of SNPs in samples with 1% - 5% ambiguous bases. The consensus sequences generated by Illumina had between 1% and 30% ambiguous bases, whereas ONT sequencing generated either < 5% or more than 40% ambiguous bases due to its longer read lengths. Overall, ONT detected more SNPs than Illumina in samples with 1% - 5% ambiguous bases.
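For readers who want to reproduce the x-axis quantity, the percentage of ambiguous bases can be computed directly from a consensus sequence. This is a minimal sketch that assumes "% of N" means the count of N characters divided by the full consensus length; the caption does not state the exact denominator used, and the toy sequence is hypothetical rather than data from the study.

```python
def percent_ambiguous(consensus: str) -> float:
    """Return the percentage of N bases in a consensus sequence.

    Assumption: the denominator is the full length of the consensus string.
    """
    seq = consensus.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count("N") / len(seq)


# Hypothetical toy consensus (20 bases, 4 of them N), not data from the study.
toy_consensus = "ACGTNNACGTACGTACGTNN"
print(f"{percent_ambiguous(toy_consensus):.1f}% ambiguous bases")
```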
Fig 4.
Combined maximum likelihood phylogenetic tree created using sequence pairs of the 37 samples.
The ML tree was generated from the consensus sequences of each sequencing technology with 1000 bootstrap replicates using the TIM2+F+R2 model. The tree is rooted on the SARS-CoV-2 reference MN908947.3, with samples sequenced by Illumina colored red and those sequenced by ONT colored blue. Bootstrap support values are shown on each branch. 21/37 samples that coupled together with < 98% bootstrap support had > 90% genome coverage in both the Illumina and ONT datasets, while 7/37 samples coupled together with less than 98% bootstrap support. The 9/37 samples that failed to couple with their counterpart from ONT or Illumina had moderate to high (3% - 31%) ambiguous bases in either sequence.
Fig 5.
Correlation of sub-consensus allele frequencies observed for SNVs and indels between the two sequencing technologies.
a) Correlation between sub-consensus single nucleotide substitution frequencies observed for Illumina and ONT. Nucleotide substitutions detected exclusively by one technology are indicated in green and blue, whereas substitutions detected by both technologies are colored red. Although more nucleotide substitutions exclusive to ONT were observed, there is a clear positive correlation (R² = 0.79) between the two sequencing technologies. b) Correlation between sub-consensus indel frequencies observed for Illumina and ONT. More indels exclusive to ONT can be seen, with a weak correlation (R² = 0.13) between the indel frequencies of the two technologies, suggesting that ONT tends to produce more false-positive indels.
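The comparison described in this caption can be sketched as follows. The sketch assumes R² is the squared Pearson correlation of paired allele frequencies and that a variant with zero frequency on one platform counts as exclusive to the other; the caption does not specify either choice. All names and frequencies below are hypothetical and are not data from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy ** 2 / (sxx * syy)


def classify(illumina_freq, ont_freq):
    """Label a variant by which platform(s) report a non-zero frequency."""
    if illumina_freq > 0 and ont_freq > 0:
        return "both"
    if illumina_freq > 0:
        return "illumina_only"
    if ont_freq > 0:
        return "ont_only"
    return "neither"


# Hypothetical variants: position -> (Illumina frequency, ONT frequency).
variants = {
    241: (0.12, 0.15),
    3037: (0.30, 0.26),
    14408: (0.00, 0.08),
    23403: (0.45, 0.50),
}

illumina = [freqs[0] for freqs in variants.values()]
ont = [freqs[1] for freqs in variants.values()]
print(f"R^2 = {r_squared(illumina, ont):.2f}")
for position, (f_ill, f_ont) in variants.items():
    print(position, classify(f_ill, f_ont))
```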