Nanopore sequencing of SARS-CoV-2: Comparison of short and long PCR-tiling amplicon protocols

doi:10.1371/journal.pone.0259277

Fig 1.

The percentage of successfully sequenced multiplexed samples over time.

A sample is considered successfully sequenced if the resulting sequence produced by the Artic pipeline has fewer than 500 bp (A) or 3 kb (B) marked as missing bases. Each run is represented by several time points; each point shows the percentage of successfully sequenced barcodes (y-axis) once a specified amount of sequenced data per barcode (x-axis) has been reached.
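The success criterion above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes that missing bases appear as `N` in the sequence, and the function names (`count_missing`, `is_successful`, `percent_successful`) are hypothetical.

```python
def count_missing(sequence: str) -> int:
    """Count bases marked as missing (assumed here to be 'N')."""
    return sequence.upper().count("N")


def is_successful(sequence: str, max_missing: int = 500) -> bool:
    """A sample passes if it has fewer than max_missing missing bases.

    max_missing=500 mirrors panel (A); use 3000 for panel (B).
    """
    return count_missing(sequence) < max_missing


def percent_successful(sequences: dict[str, str], max_missing: int = 500) -> float:
    """Percentage of barcodes whose sequence passes the threshold."""
    if not sequences:
        return 0.0
    passed = sum(is_successful(s, max_missing) for s in sequences.values())
    return 100.0 * passed / len(sequences)


if __name__ == "__main__":
    # Toy sequences keyed by a made-up barcode name.
    toy = {
        "barcode01": "ACGT" * 100,               # no missing bases
        "barcode02": "N" * 600 + "ACGT" * 100,   # 600 missing bases
    }
    print(percent_successful(toy, max_missing=500))
    print(percent_successful(toy, max_missing=3000))
```

Evaluating the same function on the sequences obtained at successive time points of a run would give one curve per run, as plotted in the figure.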

Table 1.

Overview of the MinION sequencing runs.

Fig 2.

Reasons for discarding reads in the Artic pipeline.

The sequencing reads must pass through a series of filters that ensure correct sample assignment and read quality. The bar graphs show the percentage of reads discarded for various reasons as well as the percentage passing all filters. Panel (A): Summary per run. Panel (B): Detailed per-barcode analysis for UKBA-2 samples, 2-kb amplicons, standard flow cell. Group (a): reads without barcode identification. Group (b): reads with only one barcode (the Artic pipeline requires barcodes on both ends to ensure that the whole read was sequenced and to decrease the probability of barcode bleeding). Group (c): low-quality reads (base caller quality less than 7). Group (d): reads that do not align to the SARS-CoV-2 reference. Group (e): reads that are too short (likely due to fragmentation). Group (f): reads that are too long (i.e. chimeric reads). The pipeline keeps reads with lengths between 1500 and 3000 bp for 2-kb amplicons and between 350 and 619 bp for 400-bp amplicons. The reads passing all filters are included in group (g).
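The filter cascade described in this caption can be illustrated with a short Python sketch. It is not the Artic pipeline itself: the `Read` record and its fields are hypothetical, the filters are applied in the order the groups are listed, and the length bounds are assumed to be inclusive.

```python
from collections import Counter
from dataclasses import dataclass

# Length bounds from the caption; treating them as inclusive is an assumption.
LENGTH_BOUNDS = {"2kb": (1500, 3000), "400bp": (350, 619)}
MIN_QUALITY = 7  # reads with base caller quality below this are discarded


@dataclass
class Read:
    """Hypothetical per-read summary used only for this illustration."""
    barcoded_ends: int         # 0, 1 or 2 ends with an identified barcode
    quality: float             # base caller quality of the read
    aligns_to_reference: bool  # aligns to the SARS-CoV-2 reference
    length: int                # read length in bp


def classify(read: Read, scheme: str) -> str:
    """Return the group (a-g) of the first filter the read fails, or 'g'."""
    shortest, longest = LENGTH_BOUNDS[scheme]
    if read.barcoded_ends == 0:
        return "a"  # no barcode identified
    if read.barcoded_ends == 1:
        return "b"  # barcode on one end only
    if read.quality < MIN_QUALITY:
        return "c"  # low quality
    if not read.aligns_to_reference:
        return "d"  # does not align to the reference
    if read.length < shortest:
        return "e"  # too short
    if read.length > longest:
        return "f"  # too long
    return "g"      # passes all filters


def group_percentages(reads: list[Read], scheme: str) -> dict[str, float]:
    """Percentage of reads falling into each group a-g."""
    counts = Counter(classify(r, scheme) for r in reads)
    total = len(reads)
    return {g: (100.0 * counts[g] / total if total else 0.0) for g in "abcdefg"}
```

Because each read is assigned to the first filter it fails, the seven percentages sum to 100, which matches the stacked bars in the figure.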

Fig 3.

Reads derived from the sub-genomic RNAs.

Sub-genomic RNAs (black), amplicons of primer pool 1 from the 2-kb primer set (red), and spliced alignments of a random sample of 50 reads classified as sub-genomic from barcode 07 of the UKBA-2 run with 2-kb amplicons (blue). The visualization was created with the UCSC Genome Browser [40].

Fig 4.

Coverage along the genome in two MinION runs for batch UKBA-2.

In both runs, an initial portion of the run containing on average 40 Mbp of sequencing data per barcode was used. Coverage values higher than 1000 were clipped at this value and are shown in blue. Coverage below 20 (the default Artic cutoff) is shown in red. Medians of 10-bp windows are shown for smoothing. The very start and end of the genome are not covered by amplicons and are thus displayed in red. The shaded area in the left column corresponds to amplicon 13. Some barcodes have a visible dip in coverage at the left end of this amplicon; this dip is caused by reads originating from sub-genomic RNAs corresponding to the S gene. Similar plots for additional runs are shown in S2 Fig.
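The smoothing and clipping used for these plots can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' plotting code: the windows are taken to be non-overlapping, and the function names are hypothetical.

```python
from statistics import median


def smooth_coverage(coverage: list[int], window: int = 10, clip: int = 1000) -> list[float]:
    """Median of each non-overlapping window, clipped at `clip`.

    Non-overlapping windows are an assumption; the caption only states
    that medians of 10-bp windows are shown.
    """
    smoothed = []
    for start in range(0, len(coverage), window):
        chunk = coverage[start:start + window]
        smoothed.append(min(median(chunk), clip))
    return smoothed


def low_coverage_windows(smoothed: list[float], cutoff: int = 20) -> list[int]:
    """Indices of windows below the cutoff (drawn in red in the figure)."""
    return [i for i, value in enumerate(smoothed) if value < cutoff]


if __name__ == "__main__":
    # Toy per-base coverage: an uncovered start, a moderate region, a deep region.
    toy = [0] * 10 + [50] * 10 + [5000] * 10
    smoothed = smooth_coverage(toy)
    print(smoothed)
    print(low_coverage_windows(smoothed))
```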

Table 2.

Percentage of reads derived from sub-genomic RNAs. Only reads that align to the SARS-CoV-2 genome and can be demultiplexed were considered.

Fig 5.

Coverage distribution in different sequencing runs.

For each barcode, coverage by reads passing the Artic filter was computed along the genome (shown in Fig 4 and S2 Fig), and the distribution of the coverage values was summarized as a violin plot (blue), cropped at coverage 1000. Orange dots represent the median coverage and green dots the 10th percentile (approx. 3,000 bases of the genome have coverage below the green dot value). In all runs, an initial portion containing on average 40 Mbp of sequencing data per barcode was used.
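The two summary statistics marked in the violin plots can be computed from a per-base coverage vector as in the sketch below. The nearest-rank percentile definition is an assumption made for illustration (the caption does not say which definition was used), and the function names are hypothetical.

```python
from statistics import median


def percentile_nearest_rank(values: list[int], percent: int) -> int:
    """Nearest-rank percentile: the smallest value such that at least
    `percent` % of the values are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of len * percent / 100, with a minimum rank of 1.
    rank = max(1, (len(ordered) * percent + 99) // 100)
    return ordered[rank - 1]


def summarize_coverage(coverage: list[int]) -> dict[str, float]:
    """Median (orange dot) and 10th percentile (green dot) for one barcode."""
    return {
        "median": median(coverage),
        "p10": percentile_nearest_rank(coverage, 10),
    }


if __name__ == "__main__":
    # Toy coverage vector with values 1..100 standing in for a genome.
    print(summarize_coverage(list(range(1, 101))))
```

For a genome of roughly 30,000 bases, the nearest-rank 10th percentile is the 3,000th smallest coverage value, which is consistent with the caption's note that approximately 3,000 bases fall below the green dot.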

Fig 6.

Genome coverage by long amplicons.

Average coverage along the genome for seven runs with 2-kb amplicons (batches UKBA-2,3,4,6,10,11,12) and two runs with 2.5-kb amplicons (UKBA-19 with the original primer set and UKBA-21 with the last primer pair replaced by its counterpart from the 2-kb scheme). Each line depicts the average coverage over all samples in a run at the time point when 40 Mbp per sample had been sequenced on average. Medians of 50-bp windows are shown for smoothing. Note the drop-out in amplicon 13 (2-kb scheme), which covers the 3’ end of orf1b and about a third of the S gene, including the region associated with mutations in Variants of Concern such as B.1.1.7.
