Oligonucleotide capture sequencing of the SARS-CoV-2 genome and subgenomic fragments from COVID-19 individuals

doi:10.1371/journal.pone.0244468

Fig 1.

a) Schematic workflow b) Capture pools.

The workflow shows the steps of the SARS-CoV-2 capture and sequencing methodology. Fig 1A (first row): RNA is isolated from mid-turbinate nasal swab samples and tested by real-time RT-PCR to detect SARS-CoV-2. Positive samples are quantified, and RNA is converted to cDNA. Fig 1A (second row): The cDNA is used to generate Illumina libraries with molecular barcodes; these libraries are grouped by Ct value into 6 pools and enriched using the SARS-CoV-2 capture probes. Additionally, Wuhan and UK strain (B.1.1.7) SARS-CoV-2 synthetic controls were hybridized as a pool or with patient samples, as shown in Fig 1B. Calculated Ct values of the controls were also included. Enriched libraries were then sequenced on the Illumina NovaSeq 6000 instrument to generate 2x150 bp reads. Data were analyzed to reconstruct genomes and to identify variants and junction reads.

Fig 2.

Sequence data.

Ct value vs percent of raw sequencing reads mapped to SARS-CoV-2 in (a) capture-enriched samples; (b) pre-capture samples; (c) positive and negative controls. The figure plots the percentage of reads mapped to the ‘SARS-CoV-2’ genome, the percentage mapped to the ‘human’ reference genome, and a third category, ‘reads others’, which combines trimmed reads and reads that fall into neither of the other two categories. Ct values in bold indicate samples that yielded full-length genome assemblies.
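The three read categories plotted in this figure can be derived from per-sample read counts. The following is a minimal sketch, not the authors' pipeline: the function name `read_fractions` and its parameters are hypothetical, and it assumes that ‘reads others’ is simply whatever remains of the raw reads after subtracting the SARS-CoV-2 and human mapped reads.

```python
def read_fractions(sars_cov_2_reads, human_reads, total_raw_reads):
    """Return the percentage of raw reads in each of the three categories.

    Assumption: 'others' is the remainder of the raw reads (trimmed reads
    plus reads that map to neither reference).
    """
    if total_raw_reads <= 0:
        raise ValueError("total_raw_reads must be positive")
    other_reads = total_raw_reads - sars_cov_2_reads - human_reads
    if other_reads < 0:
        raise ValueError("mapped reads exceed total raw reads")
    counts = {
        "SARS-CoV-2": sars_cov_2_reads,
        "human": human_reads,
        "others": other_reads,
    }
    # Convert each count to a percentage of the raw read total.
    return {name: 100.0 * count / total_raw_reads for name, count in counts.items()}
```

Called with illustrative counts, for example `read_fractions(250, 500, 1000)`, the three percentages sum to 100 by construction.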

Fig 3.

Scatter plot showing genome completeness as a function of Ct value.

Pink circles represent post-capture samples and black asterisks represent pre-capture samples.
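The caption does not define how genome completeness was computed. One common convention, sketched here as an assumption, is the percentage of reference positions that carry an unambiguous base call in the assembly. The function name and the treatment of `N` and other ambiguity codes as uncalled are illustrative choices; 29,903 is the length of the NC_045512.2 reference.

```python
REFERENCE_LENGTH = 29903  # length in bases of NCBI reference NC_045512.2


def genome_completeness(assembly, reference_length=REFERENCE_LENGTH):
    """Percentage of reference positions with an unambiguous base call.

    Assumption: any character other than A, C, G or T (for example N)
    counts as an uncalled position.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    called = sum(1 for base in assembly.upper() if base in "ACGT")
    return 100.0 * called / reference_length
```

Plotting this value against each sample's Ct value would give a scatter plot of the kind described in the caption.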

Fig 4.

Schematic representation of 192000051B assembly.

Black bars represent loci where the assembly called alleles different from the NCBI reference sequence NC_045512. Green bars represent mixed loci where both reference and alternative alleles were called. All mixed loci are in the ORF1ab gene and are listed in the table, along with the frequency of the alternative allele at each position and the predicted effect on translation.
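The alternative allele frequency at a locus is the share of reads at that position that support the non-reference allele. A minimal sketch follows; the function names and the 5% and 95% cutoffs for labeling a locus as mixed are hypothetical and are not taken from the paper.

```python
def alt_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a locus that support the alternative allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this locus")
    return alt_reads / depth


def classify_locus(ref_reads, alt_reads, low=0.05, high=0.95):
    """Label a locus as reference, alternative, or mixed.

    Assumption: the low/high cutoffs are illustrative defaults only.
    """
    freq = alt_allele_frequency(ref_reads, alt_reads)
    if freq < low:
        return "reference"
    if freq > high:
        return "alternative"
    return "mixed"
```

Under these assumed cutoffs, a locus with 60 reference reads and 40 alternative reads has a frequency of 0.4 and is labeled mixed.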

Fig 5.

SARS-CoV-2 subgenomic mRNAs.

(a) Junction read quantification per gene (log transformed), estimated using one million reads per sample from five pre-capture and 17 capture samples. Samples chosen for this analysis have genome completeness above 95%. The coverage level per sample is shown below the gene heatmap. Sample names in bold denote the same sample sequenced as both pre-capture and capture. (b) ORF read coverage shown as normalized read counts (RPKM) per gene for 17 capture samples. The boxes span the first to the third quartile of the normalized read counts for each ORF. The horizontal lines in the boxes represent the median values, and the whiskers represent the variability outside the lower and upper quartiles.
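RPKM (reads per kilobase per million mapped reads) normalizes a raw read count by both ORF length and sequencing depth, which is what makes the per-ORF values in panel (b) comparable across genes and samples. The sketch below uses the standard RPKM formula; the function name and the example numbers are illustrative and are not taken from the study's data.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative values only: 500 reads on a 1,000 bp ORF
# in a sample with one million mapped reads.
example_value = rpkm(500, 1000, 1_000_000)
```

Doubling the ORF length or the sequencing depth while holding the read count fixed halves the RPKM value.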
