Development of an efficient Sanger sequencing-based assay for detecting SARS-CoV-2 spike mutations

Novel strains of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) harboring nucleotide changes (mutations) in the spike gene have emerged and are spreading rapidly. These mutations are associated with SARS-CoV-2 transmissibility, virulence, or resistance to some neutralizing antibodies. Thus, the accurate detection of spike mutants is crucial for controlling SARS-CoV-2 transmission and identifying neutralizing antibody resistance caused by amino acid changes in the receptor-binding domain. Here, we developed five SARS-CoV-2 spike gene primer pairs (5-SSG primer assay; 69S, 144S, 417S, 484S, and 570S) and verified their ability to detect nine key spike mutations (ΔH69/V70, T95I, G142D, ΔY144, K417T/N, L452R, E484K/Q, N501Y, and H655Y) using a Sanger sequencing-based assay. The 5-SSG primer assay showed 100% specificity and a conservative limit of detection, expressed as the median tissue culture infective dose (TCID50), of 1.4 × 10² TCID50/mL. The accuracy of the 5-SSG primer assay was confirmed by next-generation sequencing, and the results of the two approaches showed 100% consistency. Taken together, these results indicate that the 5-SSG primer assay reliably and accurately detects key SARS-CoV-2 spike mutants. Thus, it is a useful tool for detecting SARS-CoV-2 spike gene mutants in a clinical setting, thereby helping to improve the management of patients with COVID-19.


Introduction
The World Health Organization (WHO) declared the coronavirus disease (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a global pandemic on March 11, 2020 [1]. SARS-CoV-2 is a highly transmissible virus and has a long incubation time before the manifestation of symptoms, such as fever, cough, shortness of breath, and diarrhea [2]. SARS-CoV-2 has a single-stranded, positive-sense RNA genome of approximately 29.9 kb, which encodes several proteins, including structural proteins such as spike (S) [3,4]. The S gene encodes the S1 and S2 subunits, and the S1 subunit contains an N-terminal domain and a receptor-binding domain, the latter of which is associated with human infections [5]. Across its genome, the virus accumulates mutations that are associated with its transmissibility, virulence, or resistance to some neutralizing antibodies [6,7].
SARS-CoV-2 variants have been recently identified, raising concerns about a subsequent wave of the pandemic. Since the emergence of multiple variants, the WHO and the Centers for Disease Control and Prevention (CDC) have set up a classification scheme for monitoring the potential impact of emerging variants. The variants are classified into variants of interest (VOIs), variants of concern (VOCs), and variants of high consequence (VOHCs) [8,9]. VOIs [10] have been associated with changes in receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic effects, or a predicted increase in transmissibility or disease severity. VOCs, including alpha (B.1.1.7), beta (B.1.351), delta (B.1.617.2), and gamma (P.1) [10], have been associated with an increase in transmissibility and disease severity, a significant reduction in neutralization by antibodies generated during previous infection or vaccination, reduced effectiveness of treatments or vaccines, or diagnostic detection failures. The D614G substitution is the most prevalent mutation observed [11,12]. However, VOHC-related lineages have not yet been classified [10].
These lineages have been analyzed using next-generation sequencing (NGS) methods and classified using Phylogenetic Assignment of Named Global Outbreak (PANGO) lineages [13]. NGS has been very useful for obtaining accurate information on genetic variability and transmission [14]. However, as outbreaks occur sporadically and cannot be predicted, the resources required to perform the tests necessary to detect SARS-CoV-2 variants are not always available, especially in resource-limited settings [15]. To overcome this limitation, both PCR and Sanger sequencing have been applied [16,17].
Here, we aimed to accurately and rapidly detect nine key S mutations (ΔH69/V70, T95I, G142D, ΔY144, K417T/N, L452R, E484K/Q, N501Y, and H655Y) in strains classified as VOCs and/or VOIs using our laboratory-developed five SARS-CoV-2 S gene (5-SSG) primers via PCR assay in conjunction with Sanger sequencing. In addition, we compared the results of our assay with those of a commercially available NGS assay to evaluate its accuracy and reliability in detecting and identifying variants.

Strain information and cultivation
To determine analytical specificity, 67 strains of viruses, bacteria, and fungi were used with or without respiratory pathogens, including 42 strains of virus (24 coronavirus strains comprising SARS-CoV-2 and coronaviruses OC43 and 229E, and 18 other viruses), 19 strains of bacteria, and six strains of fungi (Table 2 and S2 Table).

Clinical specimen collection and storage
Anonymized residual nasopharyngeal swab specimens (n = 17) from patients who tested positive for SARS-CoV-2 between February and June 2021 were obtained and used for this study. These specimens had been tested as part of the routine SARS-CoV-2 testing procedure using the Allplex™ SARS-CoV-2 assay (Seegene Inc., Seoul, Republic of Korea). All samples were processed using an automated nucleic acid extraction system, namely MagNA Pure 96 (Roche, Basel, Switzerland), according to the manufacturer's protocol, and stored at −80˚C until use [23].

One-step RT-PCR and agarose gel electrophoresis
The template (2.5 ng) from viral, bacterial, and fungal strains was added for one-step RT-PCR (Nanohelix Co., Daejeon, Republic of Korea) analysis, which was performed using a SeeAmp (Seegene) instrument. PCR assays with the 5-SSG primers were performed using the following thermal cycling conditions: 45˚C for 15 min (reverse transcription), followed by 94˚C for 15 min (initial denaturation), and 45 cycles of 94˚C for 10 s (denaturation), 64˚C for 30 s (annealing), and 72˚C for 30 s (extension). A final extension step was conducted at 72˚C for 5 min. Next, the PCR products were analyzed using 2% agarose gel electrophoresis with 0.5× TBE buffer, and the gels were stained with ethidium bromide (Biosesang, Seongnam, Republic of Korea). PCR amplicons from the 67 samples were analyzed using agarose gel electrophoresis in a horizontal unit (CBS Scientific, San Diego, CA, USA) operating at 280 V for 28 min, and the band sizes on ethidium bromide-stained gels were quantified using a Gel-Doc XR+ system (Bio-Rad Laboratories, Hercules, CA, USA).
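As a rough planning aid, the programmed hold times of the one-step RT-PCR profile described above can be summed. The following is a minimal sketch, not part of the published protocol: the variable and function names are illustrative assumptions, and ramp times between temperatures are ignored, so the actual instrument run time will be longer.

```python
# Hold steps as (label, temperature in degrees C, duration in seconds),
# transcribed from the thermal cycling conditions in the text.
# All names here are illustrative, not taken from any instrument software.
PRE_CYCLING = [
    ("reverse transcription", 45, 15 * 60),
    ("initial denaturation", 94, 15 * 60),
]
CYCLE = [
    ("denaturation", 94, 10),
    ("annealing", 64, 30),
    ("extension", 72, 30),
]
N_CYCLES = 45
POST_CYCLING = [("final extension", 72, 5 * 60)]


def programmed_seconds(pre, cycle, n_cycles, post):
    """Sum the hold times of a thermal profile, ignoring ramp times."""
    one_cycle = sum(seconds for _, _, seconds in cycle)
    return (
        sum(seconds for _, _, seconds in pre)
        + n_cycles * one_cycle
        + sum(seconds for _, _, seconds in post)
    )


total = programmed_seconds(PRE_CYCLING, CYCLE, N_CYCLES, POST_CYCLING)
print(f"Programmed hold time: {total} s ({total / 60:.1f} min)")
# prints "Programmed hold time: 5250 s (87.5 min)"
```

The 45 amplification cycles account for 3150 s of the total, so the cycle count dominates the run time.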

PCR product purification and sequence analysis
All PCR-positive products were purified with MEGAquick-spin™ plus (iNtRON Biotechnology, Seongnam, Republic of Korea), according to the manufacturer's instructions [25]. The sequence analysis of PCR products (partial S gene amplified to ~800 bp) was performed using the 5-SSG primers (5′-tagged M13 primer) and the BigDye Terminator v3.1 cycle sequencing kit reagent (Applied Biosystems, Foster City, CA, USA). The sequence analysis conditions were as follows: 96˚C for 1 min (incubation), followed by 25 cycles of 96˚C for 10 s (denaturation), 50˚C for 5 s (annealing), and 60˚C for 4 min (extension). Dye-labeled products were analyzed using an ABI 3730 sequencer (Applied Biosystems). Sequencing chromatograms were analyzed manually using Variant Reporter™ v3.0 software (Applied Biosystems). Samples were classified as mutants if the sequencing results from the specific regions matched the lineage information [26].
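Once mutations have been read from the chromatograms, checking them against the nine key S mutations targeted by the assay is a simple set lookup. The sketch below is a simplified illustration, not the classification procedure used in the study: the function name and the example input are hypothetical, and it only reports which panel mutations are present rather than assigning a lineage.

```python
import re

# The nine key spike mutations listed in the text. "K417T/N" and "E484K/Q"
# are expanded into their individual substitutions for exact matching.
KEY_MUTATIONS = {
    "ΔH69/V70", "T95I", "G142D", "ΔY144", "K417T", "K417N",
    "L452R", "E484K", "E484Q", "N501Y", "H655Y",
}


def residue_position(mutation):
    """Return the first residue number in a mutation label (e.g. 501 for N501Y)."""
    return int(re.search(r"\d+", mutation).group())


def key_mutations_found(observed):
    """Return the observed mutations that are in the key panel, sorted by position."""
    hits = set(observed) & KEY_MUTATIONS
    return sorted(hits, key=lambda m: (residue_position(m), m))


# Hypothetical calls from one sample (illustrative only, not study data).
observed = ["D614G", "N501Y", "ΔH69/V70", "ΔY144"]
print(key_mutations_found(observed))
# prints ['ΔH69/V70', 'ΔY144', 'N501Y']
```

D614G is dropped in this example because it is not among the nine mutations targeted by the assay.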

NGS and data analysis
NGS was performed using the SARS-CoV-2 FLEX Panels (Paragon Genomics, Hayward, CA, USA) and an Illumina MiSeq platform (Illumina, San Diego, CA, USA) in accordance with the manufacturer's instructions [27]. Reverse transcription was performed using 55 ng of nucleic acid, and multiplex PCR was performed using 343 pairs of primers. A second PCR was conducted using CleanPlex Dual-Indexed PCR Primers for Illumina Set A (Paragon Genomics).

Ethics statement
Ethical aspects of this study were reviewed and approved by the Seegene Medical Foundation Institutional Review Board (approval number, SMF-IRB-2021-006), provided that after conducting the laboratory diagnoses of SARS-CoV-2 testing, the remaining samples be destroyed. All data were fully anonymized administrative data without patient identifiers, and patient consent was waived by the institutional review board.

PCR efficiency and 5-SSG primers performance analysis
The analytical performance of the 5-SSG primers was confirmed using a total of 67 strains, including viruses, bacteria, and fungi. The PCR results were determined to be positive or negative based on the expected PCR product sizes (Tables 1, 2 and S2). As shown in Tables 2 and S2, the 5-SSG primer pairs achieved consistent results for 22 strains of SARS-CoV-2, whereas a negative result was obtained for the remaining 45 strains (other viruses, bacteria, and fungi).
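The specificity implied by these counts can be checked with the standard definition, true negatives divided by the sum of true negatives and false positives. The sketch below assumes that all 45 non-SARS-CoV-2 strains count as true negatives and that there were no false positives, as the text reports; the function name is illustrative.

```python
def specificity(true_negatives, false_positives):
    """Specificity = TN / (TN + FP), returned as a fraction between 0 and 1."""
    total_negatives = true_negatives + false_positives
    if total_negatives == 0:
        raise ValueError("specificity is undefined without negative samples")
    return true_negatives / total_negatives


# Counts from the text: 45 non-SARS-CoV-2 strains, all PCR-negative.
print(f"Specificity: {specificity(45, 0):.0%}")
# prints "Specificity: 100%"
```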

Sanger sequencing analysis
As shown in Table 2, the nucleotide sequences of the positive PCR products obtained from 22 strains were compared with the existing S mutations through Sanger sequencing using the M13 primer. The key deletion mutations ΔH69/V70 and ΔY144 were found in four strains (Twistbio-601443, Twistbio-7105258, NCCP-43381, and NCCP-43386). In addition, other substitution mutations were found to be 100% consistent with those in each strain's corresponding lineage, except for two cases in which substitutions at T95I for the NCCP-43390 strain and W152C for the NCCP-43384 strain were mismatched in the CDC classification (Figs 2 and S1, and S1 Table). Overall, Sanger sequencing confirmed that the 5-SSG primers can detect predominant S gene mutations of SARS-CoV-2 observed in the major mutant strain categories, VOIs and VOCs, with high sensitivity and efficiency.

Validation of clinical sample variants using the 5-SSG primers
To confirm the detection accuracy of the 5-SSG primer assay using clinical samples, Sanger sequencing results were compared with those of NGS analysis (Table 5). NGS assays and Sanger sequencing using the 5-SSG primers showed 100% consistent results for all strains. We concluded that the 5-SSG primer assay also performed very efficiently with clinical samples.
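A per-sample agreement rate of this kind can be computed by comparing the set of mutations called by each method. The following is a minimal sketch under stated assumptions: the function name, sample identifiers, and mutation calls are hypothetical and are not the study data, and a sample counts as concordant only when both methods report exactly the same set of mutations.

```python
def percent_agreement(sanger_calls, ngs_calls):
    """Percentage of shared samples for which both methods report identical mutation sets."""
    shared = sanger_calls.keys() & ngs_calls.keys()
    if not shared:
        raise ValueError("no samples in common between the two methods")
    matches = sum(
        set(sanger_calls[sample]) == set(ngs_calls[sample]) for sample in shared
    )
    return 100.0 * matches / len(shared)


# Hypothetical calls for three samples (illustrative only).
sanger = {
    "sample_01": ["ΔH69/V70", "ΔY144", "N501Y"],
    "sample_02": ["K417N", "E484K", "N501Y"],
    "sample_03": ["L452R"],
}
ngs = {
    "sample_01": ["N501Y", "ΔH69/V70", "ΔY144"],  # same calls, different order
    "sample_02": ["K417N", "E484K", "N501Y"],
    "sample_03": ["L452R"],
}
print(f"{percent_agreement(sanger, ngs):.1f}% agreement")
# prints "100.0% agreement"
```

Comparing sets rather than lists keeps the result independent of the order in which each method reports its calls.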

Discussion
In the ongoing COVID-19 pandemic, it has been demonstrated that the rapid detection of the pathogen is critical to prevent the rampant spread of the disease [30]. The emergence of SARS-CoV-2 variants, which are associated with increased transmission, disease severity, and resistance to vaccines, is a grave concern [31]. The alpha (B.1.1.7) and beta (B.1.351) lineages of SARS-CoV-2, which account for 98.7% of total variant cases, contain the mutations ΔH69/V70, E484K, and N501Y [32]. S protein-based vaccines might provide less protection against these mutants (ΔH69/V70, E484K, and N501Y) of SARS-CoV-2 [33]. Therefore, a simple and rapid screening assay to monitor the emergence and spread of these variants is essential to implement public health strategies [31].
In this study, we developed primers for the rapid and accurate detection of the key mutants of the S gene of SARS-CoV-2 and evaluated the reliability and reproducibility of these primers (Tables 2 and 3). The 5-SSG primers (69S, 144S, 417S, 484S, and 570S) had high analytical specificity for SARS-CoV-2 strains and no cross-reactivity with other strains (Tables 2 and S2). Results of Sanger sequencing using 5-SSG primers and commercial NGS were in 100% agreement; however, the three approaches differed in their ability to detect the E484K and D215G variants of the beta (B.1.351) lineage, E484K of the gamma (P.1) lineage, and G142D of the delta (B.1.617.2) lineage (Tables 4 and 5). These results indicate that, in NGS analysis, these mutations (G142D, D215G, and E484K) are detected at low read depth because target amplification is affected by mutations in the reverse primer-binding site (ΔE156/F157, R158G, ΔLAL242-244, and N501Y). In addition, NGS is limited to facilities where the equipment is installed, and it takes longer because it is more complex than typical Sanger sequencing [34]. Therefore, the Sanger sequencing-based 5-SSG primer assay system can rapidly and accurately detect key mutants of the S gene without resource constraints, and is a useful tool that can overcome the limitation of relatively low read depth caused by mutations in the primer-binding site during NGS analysis.
One limitation of this study is that the performance of the 5-SSG primers was tested using small numbers of clinical samples through Sanger sequencing and NGS analysis, and thus, further studies using a larger number of clinical samples should be performed. In addition, the current 5-SSG primer system can identify lambda (C.37) variants with the 417S primer set, but the ΔRSYLTPGD246-253N mutation affects the 144S reverse primer. Therefore, improvements in primer performance for detection of additional variants (e.g., B.1.617.3 and B.1.621) and the development of new primers should be pursued in future studies.
Collectively, the 5-SSG primer assay system has high PCR sensitivity specifically for SARS-CoV-2 and is a useful tool that can detect various S gene mutants very quickly and accurately, thereby contributing to the faster control of pathogen transmission in the population.