SARS-CoV-2, the virus responsible for the ongoing COVID-19 pandemic, shows particular evolutionary dynamics and extensive polymorphism, mainly in the Spike (S) gene. Monitoring S gene mutations is crucial for effective control measures and for detecting variants that can evade vaccine immunity. Even after the cost reductions brought about by the pandemic, next-generation sequencing methodologies remain unavailable to a large number of scientific groups. Therefore, to support the urgent surveillance of the SARS-CoV-2 S gene, this work describes a new, feasible protocol for complete nucleotide sequencing of the S gene using the Sanger technique. Such a methodology could be easily adopted by any laboratory with experience in sequencing, contributing to effective surveillance of SARS-CoV-2 spread and evolution.
Citation: Salles TS, Cavalcanti AC, da Costa FB, Dias VZ, de Souza LM, de Meneses MDF, et al. (2022) Genomic surveillance of SARS-CoV-2 Spike gene by sanger sequencing. PLoS ONE 17(1): e0262170. https://doi.org/10.1371/journal.pone.0262170
Editor: Etsuro Ito, Waseda University: Waseda Daigaku, JAPAN
Received: October 27, 2021; Accepted: December 16, 2021; Published: January 20, 2022
Copyright: © 2022 Salles et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: This methodology covered 100% of the S gene (3,822 bp). The sequences obtained were deposited at GISAID (accession numbers EPI_ISL_4496739, EPI_ISL_4497141, EPI_ISL_4497286) and GenBank (accession numbers OM064632, OM064633, OM064634).
Funding: This work received material support from Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro-FAPERJ [grant number E-26/201.840/2017] (RCA) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) – Public Notice Number 09/2020 - Prevention and Combat against Outbreaks, Endemics, Epidemics and Pandemics, process number 223038.014313/2020-19 (TSS and FBC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The virus responsible for the atypical pneumonia first reported in China at the end of 2019 was classified among the severe acute respiratory syndrome-related coronaviruses, as a member of the Betacoronavirus genus, Coronaviridae family, and was named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2).
Coronaviruses are enveloped, positive-sense, single-stranded RNA viruses with genomes of approximately 30,000 bases, the largest RNA genomes identified to date. The SARS-CoV-2 genome has several ORFs; the first, ORF1a/b, lies at the 5’ end of the RNA and encodes the non-structural proteins (nsP1–nsP16). The 3’ end holds the genes for the four structural proteins (E, M, N, and S) and the accessory proteins. In the mature virus particle, the S protein, a homotrimeric type I fusion glycoprotein, is located on the particle surface and is responsible for binding to the cell receptor. In humans, angiotensin-converting enzyme 2 (ACE-2) has been identified as the primary receptor for SARS-CoV-2.
Several research groups have solved the complete structure of the SARS-CoV-2 S protein, either bound or not bound to the ACE-2 receptor. This protein has approximately 1,273 amino acids, and its domains are well delimited. Because of their relevance for virus attachment and entry into susceptible cells, mutations in the receptor-binding domain (RBD) receive the greatest attention. In addition, mutations in other domains, such as the amino (N)-terminal domain (NTD), can also lead to conformational changes in the S protein structure and affect its function.
SARS-CoV-2 has particular evolutionary dynamics, and extensive polymorphism is observed. However, the mutation frequency is not uniform across the SARS-CoV-2 genome. Polymorphisms (SNPs) are mainly observed in the S protein, RNA polymerase, RNA primase, and nucleoprotein. According to the World Health Organization (WHO), isolates that present amino acid changes with a suspected or confirmed phenotypic impact are considered variants of interest (VOI). Furthermore, these variants are classified as variants of concern (VOC) when they are associated with increased transmissibility, increased virulence, changes in the clinical presentation of COVID-19, or reduced effectiveness of containment measures, such as escape from diagnostic tools or decreased effectiveness of vaccines and therapies.
Since the S protein is the primary target of neutralizing antibodies, monitoring insertions, deletions, or substitutions of amino acids can reveal variants with the potential to evade vaccine immunity. In this context, genomic information is quickly shared through initiatives like the GISAID platform, and variants are counted and georeferenced. Up to June 2021, four variants had been classified as variants of concern (VOC): B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617+ (Delta), first detected in the United Kingdom, South Africa, Brazil, and India, respectively (Fig 1).
Highlighted are the S gene and the main mutations described in the variants of concern. As a measure of comparison, the length of the S gene is already equivalent to the one of the whole Dengue virus genome.
Early identification of variants of concern (VOC) could provide valuable auxiliary information for decision making, allowing earlier action to contain the spread of the virus, such as reinforcing mobility restrictions, or relaxing such measures in areas where the variants are not present. Fig 2 shows the wide variation in COVID-19 lethality among countries around the world. Unfortunately, because of the large genome size of SARS-CoV-2, monitoring the evolution of this virus poses clear economic and laboratory challenges. Fig 3 shows the significant disparity between countries in the number of genomes shared.
We used data from the GISAID platform to estimate COVID-19 lethality and genome sharing per 10^5 inhabitants. The geospatial data for plotting the map were obtained from GeoPandas, an open-source library for the Python programming language. Areas in grey have no reported data.
Areas in grey have no reported data. The data were normalized to allow comparison of countries with different population sizes. The geospatial data for plotting the map were obtained from GeoPandas, an open-source library for the Python programming language (source: GISAID platform).
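The normalization mentioned in the figure legends can be illustrated with a minimal sketch. It assumes a simple scaling of raw counts to 100,000 inhabitants; the function name and the country figures below are hypothetical and are not taken from GISAID or from the authors' analysis.

```python
def rate_per_100k(count, population):
    """Scale a raw count to events per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that round numbers stay exact in floating point.
    return count * 100_000 / population


# Hypothetical figures, for illustration only (not real GISAID data).
example = {
    "Country A": {"genomes": 5_000, "population": 50_000_000},
    "Country B": {"genomes": 500, "population": 1_000_000},
}

rates = {
    name: rate_per_100k(values["genomes"], values["population"])
    for name, values in example.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} genomes per 100,000 inhabitants")
```

In this made-up example, Country B shares ten times fewer genomes in absolute terms but five times more per inhabitant, which is the kind of comparison the normalization is meant to allow.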
Despite the reduction in the cost of next-generation sequencing (NGS), implementing this technology still requires a significant financial investment, and the price per sample remains high for developing countries. The discrepancy between countries in the number of sequences deposited in databases reflects these difficulties, as also shown in Table 1.
Unlike NGS methodologies, nucleotide sequencing based on the Sanger technique is widespread worldwide. In addition, the costs of sequencing small fragments are affordable. Therefore, to support the urgent surveillance of changes in the SARS-CoV-2 S gene, this work describes a feasible protocol for complete nucleotide sequencing of the S gene using the Sanger technique. Any laboratory with experience in sequencing can adopt this protocol.
Materials & methods
Ethics and study population
This work was approved in advance by the Ethics Committee of Clementino Fraga Filho University Hospital (HUCFF/UFRJ) (number: 4.546.307). To evaluate this protocol, three samples from patients with confirmed COVID-19 and high viral load (Ct value < 20) were randomly selected. The patients were admitted to different hospitals in Rio de Janeiro, and a nasopharyngeal swab was collected to confirm the clinical diagnosis at the Rio de Janeiro Public Health Reference Laboratory (LACEN-RJ). Human samples were used after the conclusion of the diagnostic investigation. All patients’ personal information was anonymized; only the municipalities of residence were disclosed. Therefore, the ethics committee waived the requirement for informed consent from patients.
Viral RNA was obtained from 200 μL of respiratory secretion samples collected with nasopharyngeal swabs, using the commercial kit MagMax Viral Pathogen (Thermo Fisher, USA) on the automated KingFisher Apex instrument (Thermo Fisher, USA), according to the manufacturer’s instructions.
RT-qPCR for detection of SARS-CoV-2
Samples from suspected COVID-19 cases were tested in the diagnostic routine of the Noel Nutels Central Public Health Laboratory (LACEN-RJ) using the SARS-CoV-2 Duplex Kit (E/RP), Bio-Manguinhos (Fiocruz, Brazil). The reactions were performed on the QuantStudio 5 (Applied Biosystems, Thermo Fisher, USA). Samples with Ct values below 20 were selected for sequencing.
Amplification of the S protein gene
Six primer sets targeting the S gene and two sets flanking it were designed based on sequences deposited in GISAID up to September 2020. Twenty-nine samples from 13 regions were aligned using the Clustal W program, and conserved regions were chosen. An overlap of 100 nucleotides between fragments was planned (Fig 4). Table 2 presents the sequences of the primers used. Standard RT-PCR was performed using the Superscript III one-step RT-PCR kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions, with 0.7 μM primers and the temperature conditions listed in Table 3.
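The idea of choosing primer sites in conserved regions of an alignment can be sketched as follows. This is only an illustration under stated assumptions: it is not the authors' Clustal W workflow, the function name `conserved_windows` and the `min_len` threshold are hypothetical, and the input is assumed to be a list of already aligned, equal-length nucleotide strings.

```python
def conserved_windows(aligned_seqs, min_len=20):
    """Return (start, end) column ranges, 0-based and half-open, in which
    every aligned sequence carries the same unambiguous base."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    windows = []
    start = None  # first column of the conserved run currently open
    for i in range(length):
        column = {seq[i].upper() for seq in aligned_seqs}
        # Gaps ('-') and ambiguity codes (e.g. 'N') break a conserved run.
        conserved = len(column) == 1 and column <= set("ACGT")
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        windows.append((start, length))
    return windows


# Toy alignment: the sequences differ only at column 6.
toy = ["ACGTACGTAA", "ACGTACCTAA", "ACGTACGTAA"]
print(conserved_windows(toy, min_len=3))
```

Windows returned by such a scan are only candidate primer locations; melting temperature, GC content and secondary structure would still need to be checked with dedicated primer-design tools.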
Amplification of the S gene visualized by agarose gel electrophoresis (A). Schematic representation of the fragments targeted by each primer set (B). PR1 to PR8 represent primer sets 1 to 8, respectively.
The amplified fragments were visualized by 1.5% agarose gel electrophoresis. For sequencing, the samples were quantified with a NanoDrop One microvolume UV-Vis spectrophotometer (Thermo Scientific).
Nucleotide sequence determination and analysis
The nucleotide sequences were determined from 200 ng of amplicon using the BigDye Terminator kit 3.1 (Applied Biosystems), following the manufacturer’s procedure. Amplicons were sequenced on the ABI 3730 genetic analyzer (Applied Biosystems, USA) following the manufacturer’s protocol. Raw sequence data were aligned, edited, and assembled using the BioEdit Sequence Alignment Editor, Version 18.104.22.168.
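Assembly of overlapping fragments into a single contiguous sequence can be sketched as below. This is a deliberately simplified illustration, not the BioEdit procedure used in this work: it assumes error-free reads on the same strand and joins them only on an exact suffix/prefix match, and the function name and the `min_overlap` parameter are hypothetical.

```python
def merge_overlapping(left, right, min_overlap=20):
    """Join two fragments when a suffix of `left` exactly matches a prefix
    of `right`; return None if no overlap of at least `min_overlap` exists."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(left), len(right))
    # Try the longest possible overlap first, then shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


# Toy fragments sharing the 5-base overlap "CCGGG".
print(merge_overlapping("AAACCCGGG", "CCGGGTTT", min_overlap=3))
```

Real Sanger reads would first need quality trimming and a mismatch-tolerant alignment, which is why dedicated editors or assemblers are used in practice.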
Results and discussion
This methodology covered 100% of the S gene (3,822 bp). The sequences obtained were deposited at GISAID under accession numbers EPI_ISL_4496739, EPI_ISL_4497141, EPI_ISL_4497286.
All eight primer sets produced single amplicons for the three samples used to evaluate this protocol (Fig 4A); therefore, the sequencing reaction could be performed without excising the bands from the agarose gel. In addition, no mismatch that could cause known VOCs to escape amplification was observed in the primer regions (S2 File).
The samples sequenced in this study originated from Rio de Janeiro City, Santo Antônio de Pádua and Seropédica, in Rio de Janeiro state. The sequences obtained were aligned with reference sequences of each VOC in order to detect and compare mutations. The Spike protein of the Rio de Janeiro City sample displayed the same amino acid changes found in the reference sequence of the Gamma variant, suggesting that this sample probably belongs to the P.1 lineage (Table 4). According to the literature, the P.1 lineage (Gamma) emerged in Manaus, Amazonas, evolved from a B.1.1.28 clade in late November 2020, and replaced its parental lineage in less than two months. We found a strain whose Spike protein is similar to that of the P.1 lineage circulating in Rio de Janeiro as early as February 2021.
The samples from Santo Antônio de Pádua and Seropédica did not show mutation patterns similar to those of the Gamma VOC (Table 4); however, they presented some important mutations, such as E484K and D614G (Fig 5). The change from glutamic acid to lysine at amino acid position 484 of the Spike protein (E484K) has been recorded 228,871 times (4.27% of all samples with a Spike sequence) in 166 countries, according to GISAID Spike Glycoprotein Mutation Surveillance. This mutation has been reported in the literature to be related to enhanced host receptor binding and antigenic drift, either alone or in association with other mutations. The D614G mutation is widespread and has been recorded 5,285,437 times (98.51% of all samples with a Spike sequence) in 204 countries. It has been reported to be related to increased infectivity of SARS-CoV-2, higher viral loads, increased replication fitness, and virulence [11,12].
Apart from these highly important mutations, the sequence from Seropédica also presented some rare mutations. The amino acid substitutions D775V, T866P and M869K are present in fewer than three sequences in the GISAID database. The effects of these mutations are still unknown.
Because of its essential role in establishing infection and in inducing the immune response, genomic surveillance of the SARS-CoV-2 S protein is of paramount importance. Monitoring the emergence of new variants, and the interactions between their mutations, allows the scientific community to develop better strategies to control the pandemic.
The number of genomic sequences obtained in each country reveals a vast disproportion that becomes evident in surveillance platforms like GISAID. One of the reasons for this disparity is the limited access of most groups to NGS methodologies. Therefore, this work describes a protocol for complete nucleotide sequencing of the S gene using the Sanger technique, which could be helpful for tracking the evolution of the SARS-CoV-2 S protein.
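The notation used for these mutations (reference residue, position, observed residue) can be made concrete with a small sketch. It assumes two protein sequences that are already aligned, gap-free and of equal length, so it reports substitutions only and ignores insertions and deletions. The function name is hypothetical, and this is not the comparison tool used in this work.

```python
def call_substitutions(ref_protein, sample_protein):
    """List amino acid substitutions in the form <ref><position><sample>,
    using 1-based positions, for two equal-length gap-free sequences."""
    if len(ref_protein) != len(sample_protein):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_aa}{position}{sample_aa}"
        for position, (ref_aa, sample_aa) in enumerate(
            zip(ref_protein, sample_protein), start=1
        )
        if ref_aa != sample_aa
    ]


# Toy peptides: glutamic acid (E) replaced by lysine (K) at position 4.
print(call_substitutions("MFVEL", "MFVKL"))
```

Applied to full-length Spike sequences, the same comparison would label a glutamic acid to lysine change at position 484 as E484K.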
S1 File. The PDF of the protocol described in this peer-reviewed article published on protocols.io dx.doi.org/10.17504/protocols.io.bx6kprcw.
S2 File. Map of primer pairs in VOCs sequences.
We want to thank all health professionals, especially LACEN-RJ staff, for their collaboration during the implementation of this protocol and for all efforts in facing the COVID-19 pandemic.
- 1. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. The New England journal of medicine. 2020;382: 727–733. pmid:31978945
- 2. Wrapp D, Wang N, Corbett KS, Goldsmith JA, Hsieh C-L, Abiona O, et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science (New York, NY). 2020;367: 1260–1263. pmid:32075877
- 3. Resende PC, Delatorre E, Gräf T, Mir D, Motta FC, Appolinario LR, et al. Evolutionary Dynamics and Dissemination Pattern of the SARS-CoV-2 Lineage B.1.1.33 During the Early Pandemic Phase in Brazil. Frontiers in Microbiology. 2021;11: 3565. pmid:33679622
- 4. Yin C. Genotyping coronavirus SARS-CoV-2: methods and implications. Genomics. 2020;112: 3588–3596. pmid:32353474
- 5. WHO. World Health Organization: COVID-19 weekly epidemiological update, 25 February 2021. Special ed. 2021. https://apps.who.int/iris/handle/10665/339859.
- 6. Elbe S, Buckland-Merrett G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global challenges (Hoboken, NJ). 2017;1: 33–46. pmid:31565258
- 7. Naveca FG, Nascimento V, de Souza VC, Corado A de L, Nascimento F, Silva G, et al. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nature medicine. 2021;27: 1230–1238. pmid:34035535
- 8. Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KHD, Dingens AS, et al. Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell. 2020;182: 1295–1310.e20. pmid:32841599
- 9. Ho D, Wang P, Liu L, Iketani S, Luo Y, Guo Y, et al. Increased Resistance of SARS-CoV-2 Variants B.1.351 and B.1.1.7 to Antibody Neutralization. Research square. 2021. pmid:33532763
- 10. Wibmer CK, Ayres F, Hermanus T, Madzivhandila M, Kgagudi P, Oosthuysen B, et al. SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma. bioRxiv: the preprint server for biology. 2021.
- 11. Korber B, Fischer WM, Gnanakaran S, Yoon H, Theiler J, Abfalterer W, et al. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus. Cell. 2020;182: 812–827.e19. pmid:32697968
- 12. Hou YJ, Chiba S, Halfmann P, Ehre C, Kuroda M, Dinnon KH 3rd, et al. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo. Science (New York, NY). 2020;370: 1464–1468. pmid:33184236