A proposal of alternative primers for the ARTIC Network’s multiplex PCR to improve coverage of SARS-CoV-2 genome sequencing

A group of biologists, the ARTIC Network, proposed a multiplexed PCR primer set for whole-genome analysis of the novel coronavirus, SARS-CoV-2, soon after the COVID-19 epidemic was recognized. The primer set has been adopted by many researchers worldwide and has already contributed to high-quality, prompt genomic epidemiology of this rapidly spreading viral disease. We have also seen the strong performance of their primer set and protocol: the primer set amplifies all 98 desired PCR amplicons with fairly small amplification bias from clinical samples with relatively high viral load. However, we also observed an acute drop in reads derived from some amplicons, especially amplicons 18 and 76 in “pool 2”, as a sample’s viral load decreases. We suspected that this low-coverage issue was due to dimer formation between primers targeting those two amplicons. Indeed, replacing just one of those primers, nCoV-2019_76_RIGHT, with a newly designed primer resulted in a drastic improvement in coverage at both regions targeted by amplicons 18 and 76. Given this result, we further replaced four primers in “pool 1” with their respective alternatives. These modifications also improved coverage in eight amplicons, particularly in samples with low viral load. The results of our experiments clearly indicate that primer dimer formation is one critical cause of coverage bias in the ARTIC protocol. Importantly, some of the problematic primers can be detected by observing primer dimers in raw NGS sequence reads and then replaced with alternatives, as shown in this study. We expect that continuous improvement of the ARTIC primer set will extend the limit for completing SARS-CoV-2 genomes to samples with lower viral load, which supports better genomic epidemiology and mitigation of the spread of this pathogen.

the medical community around the world. In modern epidemiology, it is important to capture variations in genome sequence among isolates of such outbreak pathogens in order to monitor the pathogen's evolution or to track epidemiological chains at local to global scales.
The relatively large genome of the coronavirus (approx. 30 kb), however, makes it challenging to reconstruct the whole genome of the virus from samples with various viral loads in a cost-effective manner. Recently, a group of molecular biologists called the ARTIC Network

Results
In the ARTIC primer set, PCR amplicons 18 and 76 are amplified by the primer pairs nCoV-2019_18_LEFT and nCoV-2019_18_RIGHT, and nCoV-2019_76_LEFT and nCoV-2019_76_RIGHT, respectively, all of which are included in the same multiplexed reaction, "pool_2". We noticed that two of those primers, nCoV-2019_18_LEFT and nCoV-2019_76_RIGHT, are perfectly complementary to each other over a 10-nt sequence at their 3'-ends (Fig 1). From this observation, we reasoned that the rapid decrease in the amplification efficiency of those amplicons was due to primer dimer formation between nCoV-2019_18_LEFT and nCoV-2019_76_RIGHT, which could compete with amplification of the desired targets. Indeed, we observed many NGS reads derived from the predicted dimer in the raw FASTQ data (data not shown).
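The 3'-end complementarity described above can be screened for computationally. The following is a minimal sketch, not the procedure used in this study: the function names and the two toy primer sequences are hypothetical and are not actual ARTIC primers. It reports the length of the longest stretch in which the 3'-terminal bases of two primers pair perfectly with each other.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def three_prime_overlap(primer_a: str, primer_b: str, max_len: int = 15) -> int:
    """Longest k such that the last k bases of primer_a pair perfectly,
    in antiparallel orientation, with the last k bases of primer_b."""
    a = primer_a.upper()
    rc_b = reverse_complement(primer_b)
    best = 0
    for k in range(1, min(len(a), len(rc_b), max_len) + 1):
        # The 3' tail of a pairs with the 3' tail of b exactly when it
        # equals the first k bases of the reverse complement of b.
        if a[-k:] == rc_b[:k]:
            best = k
    return best


if __name__ == "__main__":
    # Hypothetical toy primers built to share a 10-nt complementary 3' end.
    toy_left = "GGCTAAGCTGACCATGCA"
    toy_right = "CCTTAGGATGCATGGTCA"
    print(three_prime_overlap(toy_left, toy_right))  # prints 10
```

Running such a check over every pair of primers within one pool would flag candidate dimers before sequencing. The threshold at which an overlap becomes problematic is not established here and would need to be chosen empirically.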
We then replaced one primer of this 'unlucky' pair, nCoV-2019_76_RIGHT, in pool_2 with a newly designed primer, nCoV-2019_76_RIGHTv2 (Table 1), which is located 48 nt downstream of nCoV-2019_76_RIGHT.
The above result indicated that the formation of primer dimers plays a critical role in coverage bias in samples with low viral load. Given this observation, we detected six additional primer dimers (Fig 3) in the raw NGS read data. The primers involved in these dimers were all included in pool 1. Interestingly, the eight amplicons related to those primers (7, 9, 15, 21, 29, 45, 73 and 89) consistently showed relatively low depth (Fig 2A and Fig S1, highlighted by red strips).
Fig 2A) is shown in Fig 2B. Using 35 PCR cycles resulted in a better-balanced yield of PCR product among samples with various viral loads than using 30 cycles (Fig 4), which eases multiplexing in NGS library construction. However, this increase in the number of PCR cycles enlarged the overall coverage bias regardless of whether the original or the new primer set was used (Fig 2AB and Fig S1). Nevertheless, improved coverage in the eight amplicons was seen with the modified primer set (Fig 2B and Fig S1, highlighted by red strips).
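The search for dimer-derived reads in raw FASTQ data mentioned above could be approximated as follows. This is a sketch under stated assumptions, not the authors' procedure: it assumes plain four-line FASTQ records, exact sequence matching, and a dimer product formed by extension across a known 3' overlap. All function names are hypothetical.

```python
from itertools import islice


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def dimer_product(primer_a, primer_b, overlap):
    """Expected dimer sequence when the 3' ends of the two primers anneal
    over `overlap` bases and both strands are extended."""
    return primer_a.upper() + reverse_complement(primer_b)[overlap:]


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    lines = iter(handle)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            return
        yield record[1].strip().upper()


def count_dimer_reads(handle, primer_a, primer_b, overlap):
    """Return (reads containing the dimer on either strand, total reads)."""
    product = dimer_product(primer_a, primer_b, overlap)
    product_rc = reverse_complement(product)
    hits = total = 0
    for seq in read_fastq_sequences(handle):
        total += 1
        if product in seq or product_rc in seq:
            hits += 1
    return hits, total
```

An uncompressed FASTQ file opened in text mode can be passed as `handle`. Exact matching will miss reads with sequencing errors or trimmed ends, so the counts are best treated as a lower bound.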

Discussion
Since we published the first version of this manuscript, which described the modification of the nCoV-2019_76_RIGHT primer, the ARTIC Network group has updated their primer set to 'V2' to cope with the dropout of amplicons 18 and 76 (https://github.com/articnetwork/artic-ncov2019/tree/master/primer_schemes/nCoV-2019/V2). Although the modification was made to the primer 2019_18_LEFT instead of 2019_76_RIGHT, their V2 primer set is likewise expected to improve the coverage of amplicons 18 and 76, as seen in this study.
In this version of the manuscript, we added the results of further modifications to four primers contained in pool_1 to prevent six dimer formations. This modification untangled the interference among a batch of primers involved in as many as eight PCR targets. With this modified primer set, one could recover more of the genome, particularly from samples with low viral loads (>30), with less sequencing effort.
The results of these experiments clearly indicate that primer dimer formation is one critical cause of coverage bias in the ARTIC protocol. Importantly, some of those problems can be fixed easily by observing primer dimers in raw NGS sequence reads and replacing the primers involved with alternatives, as shown in this study.

Materials and Methods
RNA extracted from clinical specimens (pharyngeal swabs) are reverse transcribed as
Obtained reads were mapped to the reference genome of SARS-CoV-2, MN908947.3 (Wu et al., 2020), using bwa mem (Li and Durbin, 2009). To estimate the coverage of each PCR product, we counted the depth of the genomic parts specific to each PCR product (Fig 4) using the samtools depth function (Li and Handsaker et al., 2009) with the '-a' option. The depth counts were summarized and visualized using Python 3.6 and the matplotlib library (Hunter 2007).
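The per-amplicon depth summary described above could be computed along these lines. This is a minimal sketch rather than the script used in this study. It assumes a single reference sequence, the three-column tab-separated output of `samtools depth -a` (reference name, 1-based position, depth), and amplicon coordinates supplied by the user as sorted, tiled, 1-based inclusive intervals. The function names and the example coordinates are hypothetical.

```python
from statistics import mean


def unique_regions(amplicons):
    """Given [(name, start, end), ...] sorted by start, return
    {name: (start, end)} for the part of each amplicon that is not
    overlapped by its immediate neighbours."""
    regions = {}
    for i, (name, start, end) in enumerate(amplicons):
        lo, hi = start, end
        if i > 0:
            lo = max(lo, amplicons[i - 1][2] + 1)   # skip overlap with previous
        if i + 1 < len(amplicons):
            hi = min(hi, amplicons[i + 1][1] - 1)   # skip overlap with next
        if lo <= hi:
            regions[name] = (lo, hi)
    return regions


def parse_depth(lines):
    """Parse `samtools depth` lines into {position: depth}."""
    depth = {}
    for line in lines:
        if not line.strip():
            continue
        _ref, pos, count = line.rstrip("\n").split("\t")[:3]
        depth[int(pos)] = int(count)
    return depth


def mean_depth_per_amplicon(depth, regions):
    """Mean depth over each amplicon-specific region; missing positions count as 0."""
    return {
        name: mean(depth.get(pos, 0) for pos in range(lo, hi + 1))
        for name, (lo, hi) in regions.items()
    }
```

The resulting dictionary can be passed to matplotlib (for example as a bar chart per amplicon) to compare primer sets. Whether primer-binding sites are excluded from the amplicon coordinates is left to the caller.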

Supporting figure legend
Fig S1 Depth plots on the SARS-CoV-2 genome for mapped NGS reads obtained with the ARTIC protocol for all eight clinical samples. Results are shown for the original or alternative pool_2 primer set and for the original or alternative pool_1 and pool_2 primer sets. Note that this experiment was conducted with cDNA samples diluted 4-fold relative to our usual protocol. The green and red vertical strips highlight regions of amplicons potentially affected by dimer formation among primers in pool_1 or pool_2, respectively.