Molecular epidemiology of SARS-CoV-2 in Cyprus

Whole genome sequencing of viral specimens following molecular diagnosis is a powerful analytical tool of molecular epidemiology that can critically assist in resolving chains of transmission, identifying of new variants or assessing pathogen evolution and allows a real-time view into the dynamics of a pandemic. In Cyprus, the first two cases of COVID-19 were identified on March 9, 2020 and since then 33,567 confirmed cases and 230 deaths were documented. In this study, viral whole genome sequencing was performed on 133 SARS-CoV-2 positive samples collected between March 2020 and January 2021. Phylogenetic analysis was conducted to evaluate the genomic diversity of circulating SARS-CoV-2 lineages in Cyprus. 15 different lineages were identified that clustered into three groups associated with the spring, summer and autumn/winter wave of SARS-CoV-2 incidence in Cyprus, respectively. The majority of the Cypriot samples belonged to the B.1.258 lineage first detected in September that spread rapidly and largely dominated the autumn/winter wave with a peak prevalence of 86% during the months of November and December. The B.1.1.7 UK variant (VOC-202012/01) was identified for the first time at the end of December and spread rapidly reaching 37% prevalence within one month. Overall, we describe the changing pattern of circulating SARS-CoV-2 lineages in Cyprus since the beginning of the pandemic until the end of January 2021. These findings highlight the role of importation of new variants through travel towards the emergence of successive waves of incidence in Cyprus and demonstrate the importance of genomic surveillance in determining viral genetic diversity and the timely identification of new variants for guiding public health intervention measures.


Introduction
Corona Virus Disease-2019 (COVID-19) is a pulmonary disease caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) coronavirus [1]. SARS-CoV-2 was first detected in December 2019 in Wuhan city in the Hubei province of China in a patient with acute pneumonia [2,3]. Following global spread of the virus, on March 11, 2020 the World Health Organization (WHO) characterized COVID-19 as a pandemic, which is still ongoing and as of February 24, 2021 over 110 million confirmed cases and 2.47 million deaths were reported worldwide (https://covid19.who.int/).
The complete genome of SARS-CoV-2 isolated from a patient in Wuhan, China was initial published on Jan 5, 2020 [2] and since then analysis of viral sequences worldwide is continuous with more than 614,000 complete genomes currently available in public databases such as the GISAID (https://www.gisaid.org/). The global real-time tracing of the viral spread through whole genome sequencing is important in the race to timely identify the emergence of novel SARS-CoV-2 variants that change the transmission, antigenic properties and/or pathogenicity of the virus [4]. The recent identification of three major Variants of Concern (VOC), with increased transmissibility and the potential to reduce vaccine effectiveness has led to increased surveillance efforts worldwide [5,6].
The SARS-CoV-2 genome consists of a single-stranded positive-sense RNA of around 30 kb in size with a 5' cap and 3'-polyA tail. It contains of six major open reading frames (ORFs) that encodes 27 different proteins, in which four are structural proteins: Envelope (E), Membrane (M), Nucleocapsid (N) and Spike (S). The E protein function includes virion assembly and morphogenesis and regulates cell stress response and apoptosis by promoting inflammation [7]. The M protein plays a key role in the virion assembly within the host cells [8], while the N protein, can bind to the RNA genome, interact with the viral membrane protein, and play a critical role in enhancing the efficiency of virus replication, transcription and assembly throughout viral infection [9]. The most important factor that mediates virus entry and a primary determinant of cell tropism and pathogenesis of SARS-CoV-2 is the S protein [10]. S protein has two functional subunits, S1 and S2. S1 subunit contains the receptor-binding domain (RBD) which binds directly to the host receptor angiotensin-converting enzyme 2 (ACE2), enabling virus entry into host cells [11]. The S2 subunit is responsible for virus cell fusion [12]. Therefore, the S protein defines the infectivity of the virus and its transmissibility in the host cell [13]. As this protein is the major antigen inducing protective immune responses [14,15], all vaccines developed or there are under development are directed against it.
The Republic of Cyprus is one of the countries in Europe least affected by the COVID-19 pandemic, presumably due to its rapid and effective response strategy that included high number of COVID-19 tests, effective tracing and isolation of cases and their contacts together with preventive measures (e.g. social distancing, wearing face masks and hand washing) [16]. In addition, being an island guaranteed high effectiveness of airport closure with regard to importing of new cases. The first two cases of COVID-19 in Cyprus were identified on March 9, 2020 and since then 33,567 confirmed cases and 230 deaths were documented (https:// covid19.ucy.ac.cy/).
The aim of our study was to analyse and evaluate the genomic diversity of circulating SARS-CoV-2 lineages in Cyprus. We report 133 full genome sequences obtained from individuals tested positive by RT-PCR between March 2020 and January 2021 and evaluate their characteristics and relationship with the progression of the pandemic in Cyprus. Public sharing and use of genomic analyses are important tools for disease surveillance systems to track and trace acquired COVID-19 cases for more accurate decision making and appropriate public health action.

Sample collection
The Department of Molecular Virology of Cyprus Institute of Neurology and Genetics was assigned as the reference laboratory for SARS-CoV-2 by the Cyprus Ministry of Health of the Republic of Cyprus. More than 180,000 samples from public as well as private hospitals were received and analysed since March 2020. For this retrospective observational study 133 SARS--CoV-2 positive nasopharyngeal swabs obtained from patients referred for diagnostic purposes between March 2020 and January 2021 were selected for full genome sequencing. A random, unselected approach for sample selection was taken without pre-screening for variants of interest to avoid sampling bias. The viral RNA had been previously detected using a qRT-PCR assay and all samples selected had a cycle threshold value (Ct) lower than 30. The study was approved by the Cyprus National Bioethics committee (EEBK 21.1.01.03). According to the approval, as samples were fully anonymized, the Bioethics committee waived the requirement for informed consent.

Next generation sequencing
Total RNA was extracted from 200 μl of nasopharyngeal swab fluid in a final volume of 50 μl, using the MagMAX Viral/Pathogen Nucleic Acid Isolation Kit (Applied Biosystems) on a KingFisher™ Flex Purification System (Thermo Fisher Scientific). Libraries were prepared using the QIAseq SARS-CoV-2 Primer Panel in conjunction with the QIAseq FX DNA Library Kit (Qiagen) according to manufacturer's instructions. In brief, viral RNA was reverse transcribed to synthesize cDNA using random hexamers. cDNAs were amplified in two high-fidelity multiplex PCR reactions using two different primer pools that together cover the entire SARS-CoV-2 genome. The two enriched pools per sample were combined, purified using AMPure XP beads (Beckman Coulter) and quantified using the Qubit dsDNA HS Assay kit (Invitrogen). Fragmentation, end-repair and A-tailing was performed in a combined reaction per sample using 200 ng DNA. Next, Illumina platformspecific adapters were ligated to both ends of the DNA fragments. Library size selection and purification were carried out using AMPure XP beads (Beckman Coulter) in two rounds (0.8x and 1x respectively). The libraries were quality analyzed using the Agilent High Sensitivity D1000 ScreenTape on a 2200 TapeStation system (Agilent) and quantified using the Qubit dsDNA HS Assay kit (Invitrogen). Equimolar quantities of libraries were pooled (24 samples/pool) and sequenced on a Illumina MiSeq sequencer. All sequences obtained were deposited at the GISAID EpiCov database accession numbers EPI_ISL_1164626 -EPI_ISL_1164751 (www.gisaid.org).

Phylogenetic analysis
All phylogenetic analyses were conducted in MEGA7 [22]. The alignment of consensus sequences was performed using MUSCLE. Bayesian information criterion (BIC) scores were calculated for different models to determine the best fitting nucleotide substitution model. In addition, jModelTest [23] was used for evaluating the best fitting nucleotide substitution model under BIC yielding the same result (TN93model + gamma distributed rates). This model was subsequently used to construct a Maximum Likelihood phylogenetic tree with 1000 bootstrap replicates. All positions containing alignment gaps and missing data were eliminated only in pairwise sequence comparisons.

Results and discussion
Since the beginning of the pandemic and until the 24/2/2021, the Republic of Cyprus reported 33,567 SARS-CoV-2 confirmed cases (3.63% of the population) along with 230 associated casualties (24.2/100,000) (https://covid19.ucy.ac.cy/). Based on the number of positives identified (see Fig 1) the course of the pandemic in Cyprus so far can be distinguished into three main phases: a first phase during Spring 2020, starting with the first two cases identified on the 9 th of March that peaked in the first week of April that was contained with a nationwide lockdown and lasted approximately until June; a second phase during the summer that started after the re-opening of the airports; and a third phase (which is still continuing) during autumn/winter that peaked at the beginning of January and accounted for the majority of all cases registered so far.
Overall, between March 11 th 2020 and January 29 th 2021, 15 different lineages were identified among the 133 samples sequenced ( Table 1). As shown in Fig 3 the phylogenetic analysis is in support of the PANGOLIN lineages assignment and reveals a picture of independent distinct, importations that are followed by local transmission (Fig 3). In the bubble chart in three groups of lineages are clearly distinguishable and associate with the three phases of the SARS-CoV-2 pandemic. Overall, 345 SNPs were identified within the 133 genomes when compared to the Wuhan-Hu-1 reference genome. S1  251) were identified reflecting several independent importation events followed by local transmission. However, concomitant with the decline in reported positive cases following the country-wide lockdown, by the end of June these SARS-CoV-2 variants were no longer detected and none of these re-emerged during the second or the third wave.
Following the re-opening of the airports, the summer months of July and August were characterized by multiple distinct introductions of a variety of new lineages (B. 1.36, B.1.236, B.1.2,  B.1.1.67, B.1.1.288, B.1.1.192), which marked the beginning of a small second wave (Fig 3). The majority of the samples belonged to lineage B.1.2 and B1.236 accounting for 86% of the samples in that wave, however, all these lineages again subsequently vanished. The B.1.2 lineage has been otherwise observed only infrequently in Europe, but has been very common in the United States and in Canada [24].
In September, two new lineages were identified for the first time, which were set to dominate the autumn/winter wave until December, namely the B.1.177 and the B.1.258 variants.
The B.1.177 lineage, which is characterized by the A222V spike mutation, was previously been shown to have emerged in Spain in early summer and subsequently became widespread across Europe as well as Canada, accounting for the majority of sequences by autumn 2020 [25,26]. However, no evidence of increased transmissibility of this variant was reported [27]. In Cyprus, this lineage was first identified on September 19 th and continued to circulate until the end of January at a relatively low but constant frequency fluctuating between 4% and 15%.
In September, the B.1.258 lineage was introduced, which spread rapidly and largely dominated the autumn/winter wave with a peak prevalence of 86% during the months of November  [28]. This deletion, which has been shown to enhance viral infectivity [29] has arisen at least six times independently and frequently followed receptor binding AA replacements (i.e. N501Y, N439K, Y453F) [28]. In addition, the B.1.258 variant is characterised by the N439K mutation that confers an increased binding affinity to the hACE2 receptor and leads simultaneously to immune escape from a panel of neutralizing monoclonal antibodies as well as from sera of persons recovered from infection [30].
By the end of December, the B.1.1.7 UK variant, also known as 20I/501Y.V1 and Variant of Concern 202012/01 (VOC-202012/01), was identified for the first time in the Cypriot sample set. This lineage was first reported on December 20 th in the United Kingdom and drew immediate attention due to its rapid spread and increased transmissibility [31]. The earliest sampled genome of the B.1.1.7 variant dates back to September 20 th . Since then it has grown rapidly in frequency in the UK becoming the most prevalent lineage there with a similar development also observed in a variety of other European countries, including Ireland, Spain and Greece [32]. It is characterised by an unusual large number of mutations including N501Y in the receptor-binding domain that increases ACE2 receptor affinity and P681H that creates a furin cleavage site between S1 and S2, which was previously to promote transmission. It is speculated that it may have originated in a chronically infected immunocompromised person. In our sample set the frequency of this variant increased in a similar rapid manner from 3.4% in December to 38% by the end of January despite declining case numbers highlighting its superior fitness. Similar studies have been reported from a plethora of countries around the world including the UK, Ireland, Russia, India, Qatar etc. [26,[33][34][35][36]. These studies provide valuable information on the routes of importation of new strains and the emergence of critical mutations that change key parameters such as transmissibility or virulence, as was recently highlighted by the discovery of the B.1.617 variant and its B.1.617.2 sub-lineage in India [37], which led to a surge of SARS-CoV-2 infections in Delhi. It could be shown that B.1.617.2 exhibits a very high transmissibility, as much as 50% greater than B.1.1.7, while no increase in the case-fatality ratio (CFR) was observed.
In Cyprus, the CFR after an initial peak exhibited a falling trend that stabilised over time (see S1 Fig). However, it is difficult to draw conclusions about the virus pathogeny from the CFR alone, as reason for the initially high CFR may lie in the lower testing capacity at the beginning of the pandemic leading to an underestimation of the total number positive cases. Other critical factors that were shown to impact on CFR and likely contributed to the observed reduction are hospital capacity and COVID-19 treatment protocols [38].

Conclusions
Whole genome sequencing of viral specimens following molecular diagnosis is a powerful analytical tool of molecular epidemiology that can critically assist in resolving chains of transmission, identifying of new variants or assessing pathogen evolution and allows a real-time view into the dynamics of a pandemic [39]. In this study we describe for the first time the changing pattern of circulating SARS-CoV-2 lineages in Cyprus since its appearance between March 2020 and the end of January 2021. Distinct lineages of SARS-CoV-2 contributing to three separate waves of infections reflective of the epidemiological pattern were observed.
The global real-time tracing of the viral spread through whole genome sequencing has led recently to the identification of three major Variants of Concern (VOC). Lineage B.1.1.7 (20I/ 501Y.V1, VOC-202012/01) better known as the UK variant possesses an unusual high number of mutations is believed to be more transmissible than the wild-type virus. Lineage B.1.351 (501.V2, 20H/501Y.V2, VOC-202012/02) first detected and reported in South Africa in early October 2020, shares several mutations with B.1.1.7 and is feared to reduce to some extend vaccine effectiveness [40]. In Brazil, the P.1 (20J/501Y.V3, VOC-202101/02) lineage emerged in December 2020 [41]. The P.1 lineage contains ten mutations in the spike protein that may affect its ability to be recognized by antibodies [42,43].
Of these three VOC only the B1.1.7 variant has so far been identified in Cyprus, which after being encountered for the first time at the end of December was able to reach 37% prevalence within one month. The majority of the Cypriot samples analysed belonged to the B.1.258Δ lineage, which has been first detected at the beginning of August in Switzerland and has since then spread to numerous European countries where it became one of the most prevalent lineages by December including the Czech Republic, Slovakia and Sweden [28,44]. The N439K receptor binding motif characteristic of this lineage has previously been reported to confer an increased binding affinity to the hACE2 receptor and simultaneously leading to immune escape from a panel of neutralizing monoclonal antibodies as well as from sera of persons recovered from infection [30]. The association of mutations found in the B.1.258Δ lineage with increased fitness and immune evasion along with the high prevalence in several European countries stipulates further characterisation of this variant. A continuous surveillance of SARS-CoV-2 by whole genome sequencing continues to be critical for timely detection of emerging variants, identifying transmission modes and guiding public health intervention.  Table. Frequency of SNPs in lineages observed. Single nucleotide polymorphisms (SNPs) were identified in the genomes with respect to the reference sequence (NC_045512). The frequency of SNPs is calculated by the number of sequences with the SNP/total number of sequences per lineage. SNPs are ordered by their position on the genome. Number of sequences of each lineage are indicated in brackets in the header. SNPs with frequency >75% are marked in green, SNPs with frequency between 50% and 75% are marked in yellow. (PDF)