Wave-wise comparative genomic study for revealing the complete scenario and dynamic nature of COVID-19 pandemic in Bangladesh

As the COVID-19 pandemic continues to ravage the globe and claim millions of lives, Bangladesh, like many parts of the world, has been hit by a second wave. This study aimed to characterize the causative agent, SARS-CoV-2, at the genomic and proteomic levels and to provide insights into the pathogenesis, evolution, strengths and weaknesses of the virus. As of mid-June 2021, over 1500 SARS-CoV-2 genome sequences from Bangladesh had been deposited in the GISAID database; these were extracted and categorized into two waves. Analysis of these genome sequences revealed that the wave-2 samples had a significantly greater average rate of mutation per sample (30.79%) than the wave-1 samples (12.32%). Wave-2 samples also had a higher frequency of deletion and transversion events. During the first wave, the GR clade was predominant, but it was replaced by the GH clade in the later wave. The B.1.1.25 variant showed the highest frequency in wave-1, whereas in wave-2 the B.1.351.3 variant was the most common. A notable presence of the delta variant, which is currently at the center of concern, was also observed. Comparison of the Spike protein of the reference and of the three most common lineages found in Bangladesh, namely B.1.1.7, B.1.351 and B.1.617, in terms of their ability to form stable complexes with the ACE2 receptor revealed that B.1.617 had the potential to be more transmissible than the others. Importantly, no indigenous variants have been detected so far, which implies that successfully preventing the import of foreign variants could diminish the outbreak in the country.


The Wuhan genome reference sequence (NC_045512.2) was retrieved from NCBI GenBank (60).

Solvent Accessible Surface Area (SASA) is used in molecular dynamics simulations to predict the hydrophobic core stability of proteins. In this study, SASA was calculated using the "sasa" module and the resulting graph was visualized using Xmgrace.
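
To make concrete what a SASA value measures, the sketch below estimates per-atom SASA for a static set of atoms with the classic Shrake-Rupley point-sampling approach. It is an illustration only, not necessarily the algorithm implemented by the "sasa" module used in the study; the 1.4 Å water probe radius, the 960-point test sphere and the atomic radii passed in are assumed values, and `shrake_rupley_sasa` and `sphere_points` are hypothetical helper names.

```python
import math


def sphere_points(n):
    """n roughly uniform unit vectors on a sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n  # evenly spaced heights give equal-area bands
        r = math.sqrt(1.0 - z * z)
        phi = k * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def shrake_rupley_sasa(centers, radii, probe=1.4, n_points=960):
    """Per-atom SASA estimate, in squared input length units (e.g. Å^2).

    Assumption: probe=1.4 approximates a water molecule; radii are supplied by the caller.
    """
    dirs = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(centers, radii)):
        big_r = ri + probe  # atom radius expanded by the solvent probe
        # Only atoms whose expanded spheres intersect this one can bury its test points.
        neighbours = [
            (cj, rj + probe)
            for j, (cj, rj) in enumerate(zip(centers, radii))
            if j != i and math.dist(ci, cj) < big_r + rj + probe
        ]
        exposed = 0
        for ux, uy, uz in dirs:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            # A test point is accessible if no neighbour's expanded sphere contains it.
            if all(math.dist(p, cj) >= rj_big for cj, rj_big in neighbours):
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

Summing the returned list gives the total SASA of one structure; in a simulation setting the same quantity would be computed frame by frame, which is what produces the SASA-versus-time graph described above.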

In comparison to the Wuhan reference sequence, all sequences from both waves had two or more mutations (Supplementary file 5). The average number of mutations per sample differed significantly between the two waves (Fig 1) based on a two-sided t-test.
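
The comparison above can be outlined as a two-sample t-test on per-sample mutation counts. The text does not state which form of the test was used, so the sketch below assumes Welch's unequal-variance t-test. `welch_t_test` and `reg_inc_beta` are hypothetical helper names; the two-sided p-value is obtained from the regularized incomplete beta function, evaluated with the standard continued-fraction method so that only the Python standard library is needed.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even and then the odd coefficient of the fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the fraction directly where it converges fast, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def welch_t_test(x, y):
    """Two-sided Welch's t-test on two samples. Returns (t, df, p)."""
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1  # squared standard error of each mean
    v2 = statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Two-sided tail probability of Student's t with df degrees of freedom.
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p
```

Here `welch_t_test(wave1_counts, wave2_counts)` would take, for each wave, a list with the number of mutations found in each genome. Where SciPy is available, `scipy.stats.ttest_ind(wave1_counts, wave2_counts, equal_var=False)` performs the same Welch test.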

The occurrence of several classes of mutations, as well as the percentage of each class in both waves, is documented in Supplementary file 6. Single-nucleotide polymorphisms (SNPs) were highly prevalent in both waves (58.89% in wave-1 and 61.5% in wave-2) (Fig 3). Extragenic mutations were also found to some extent, but all of them were located either in the 5'-UTR or in the 3'-UTR.

All mutational events were also classified into different variant types to explain the higher frequency of SNPs (Supplementary file 7). Although both transition SNPs (purine > purine, pyrimidine > pyrimidine) and transversion SNPs (purine > pyrimidine and vice versa) were observed among all samples, the C>T transition was the most frequent mutation in both waves (Fig 4). The percentage of occurrence of this transition was 45.75% in wave-1 and 42.44% in wave-2.
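
The transition/transversion distinction above depends only on whether the two bases belong to the same chemical class. A minimal sketch of that classification and of the per-type percentages follows; the function names are hypothetical, and taking the percentage over all single-base substitution events is an assumption, since the exact denominator used in Supplementary file 7 is not stated here.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def _percentages(counts, total):
    return {key: 100.0 * n / total for key, n in counts.items()}


def substitution_summary(events):
    """Percentages per substitution type (e.g. 'C>T') and per class.

    events: iterable of (ref, alt) single-base pairs (assumed input format).
    """
    types, classes = Counter(), Counter()
    for ref, alt in events:
        classes[substitution_class(ref, alt)] += 1
        types[f"{ref.upper()}>{alt.upper()}"] += 1
    total = sum(types.values())
    return _percentages(types, total), _percentages(classes, total)
```

For example, four events C>T, C>T, G>T and A>G give 50% C>T and a 75/25 split between transitions and transversions.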

While the A>G transition was the second most common event in wave-1 (12.53%), the G>T transversion held this position in wave-2 (13.57%). Oligonucleotide deletions were also common in the samples from wave-2: in the second wave of the COVID-19 pandemic in Bangladesh, two oligonucleotide deletion events (TCTGGTTTT and CTTGCTTTA) were much more pervasive (2.46% and 1.63%, respectively).
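
Deletions such as TCTGGTTTT and multi-base substitutions such as GGG>AAC can be separated from SNPs purely by comparing allele lengths. The sketch below is hypothetical: it assumes each event is stored as a (ref, alt) pair in which an empty alt string marks a deletion and an empty ref string an insertion, and "MNP" (multi-nucleotide polymorphism) is our label for equal-length multi-base substitutions.

```python
from collections import Counter


def event_type(ref, alt):
    """Classify an event by its reference and alternate alleles.

    Assumed convention: empty alt = deletion of the ref bases,
    empty ref = insertion of the alt bases.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if not alt:
        return "deletion"
    if not ref:
        return "insertion"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"  # MNP, e.g. GGG>AAC
    return "complex"  # unequal, non-empty alleles


def event_type_percentages(events):
    """Share of each event type among all (ref, alt) events, in percent."""
    counts = Counter(event_type(ref, alt) for ref, alt in events)
    total = sum(counts.values())
    return {kind: round(100.0 * n / total, 2) for kind, n in counts.items()}
```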

The presence of mutations at specific coordinates of the SARS-CoV-2 genome sequences was also analyzed in this study (Supplementary file 8). In both waves, the A23403G, C3037T, C14408T and C241T mutations showed a similar pattern of abundance (Fig 5). Although the GGG28881AAC trinucleotide substitution was the fifth most prevalent event in wave-1, its frequency was much lower in wave-2 (only 0.75%). Instead, the TCTGGTTTT11288 deletion was substantially more common in the second wave, which is consistent with the earlier findings of this study.
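
Labels such as A23403G, GGG28881AAC and TCTGGTTTT11288 encode the reference bases, the genome position and the alternate bases. The sketch below parses that notation and reports, for one wave, the percentage of samples carrying each mutation, ordered by position. The regular expression, the reading of a missing alternate allele as a deletion, and the per-sample denominator are assumptions inferred from the examples in the text; `parse_label` and `prevalence_by_position` are hypothetical names.

```python
import re

# Assumed label format: REF bases, 1-based position, ALT bases (empty ALT = deletion).
LABEL = re.compile(r"^([ACGT]+)(\d+)([ACGT]*)$")


def parse_label(label):
    """Split a label such as 'A23403G' into (ref, position, alt)."""
    m = LABEL.match(label.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    ref, pos, alt = m.groups()
    return ref, int(pos), alt


def prevalence_by_position(samples):
    """Percentage of samples carrying each label, ordered by genome position.

    samples: mapping of sample ID -> collection of mutation labels in that genome.
    """
    counts = {}
    for labels in samples.values():
        for label in set(labels):  # count each mutation at most once per sample
            counts[label] = counts.get(label, 0) + 1
    n = len(samples)
    ordered = sorted(counts, key=lambda lab: parse_label(lab)[1])
    return {lab: 100.0 * counts[lab] / n for lab in ordered}
```

Running the function separately on the wave-1 and wave-2 samples gives two comparable abundance profiles of the kind summarized in Fig 5.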

We also summarized the impact of these mutations on the protein sequence of SARS-CoV-2 in the final step of this mutational investigation (Supplementary file 9). The D614G (aspartate to frequency in the samples of both waves (Fig 6). From this observation it can be said that the G-

The four most frequent amino acid substitutions had a similar distribution in both waves. In addition, some mutations in the spike and nucleocapsid proteins (e.g., S:N501Y, N:T205I, S:D80A) were found at substantial frequency in wave-2. The ORF3a:Q57H variant, a marker variant for the GH clade, was also very common in wave-2.
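
Protein-level changes such as S:N501Y or ORF3a:Q57H follow a GENE:<ref><position><alt> pattern with one-letter amino acid codes. A minimal sketch for parsing these labels and ranking the most frequent substitutions across the samples of one wave is shown below; `parse_aa_change` and `top_changes` are hypothetical names, and deletion labels are not handled by the assumed pattern.

```python
import re
from collections import Counter

# Assumed notation: GENE:<ref aa><position><alt aa>, one-letter codes, '*' for stop.
AA_CHANGE = re.compile(r"^([A-Za-z0-9]+):([A-Z*])(\d+)([A-Z*])$")


def parse_aa_change(label):
    """Split 'S:D614G' into ('S', 'D', 614, 'G')."""
    m = AA_CHANGE.match(label)
    if m is None:
        raise ValueError(f"unrecognised amino acid change: {label!r}")
    gene, ref, pos, alt = m.groups()
    return gene, ref, int(pos), alt


def top_changes(samples, k=4):
    """Count each change once per sample and return the k most common (label, count)."""
    counts = Counter()
    for changes in samples:
        unique = set(changes)
        for change in unique:
            parse_aa_change(change)  # fail early on malformed labels
        counts.update(unique)
    return counts.most_common(k)
```

Comparing `top_changes` on the wave-1 and wave-2 sample lists would show whether the leading substitutions are shared between waves, as reported above.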

The distribution of SARS-CoV-2 clades and of the most frequent variants was also compared across the two waves in Bangladesh (Supplementary file 10). Throughout the pandemic in Bangladesh, the G clade and its derivatives (GH, GV, GR, GRY) remained dominant (Fig 7). Although the GR clade was predominant during wave-1 (75.86%), the GH clade took the lead in wave-2 (61.26%). The percentages of the other G-derived clades were similar in both phases of the pandemic. In contrast, the L, O and S clades had a very low frequency in wave-1, and the L and S clades disappeared in wave-2. Variants from the B lineage were extremely common in wave-1, with B.1.1.25 accounting for 72.46% of the total (Fig 8). The alpha variant (B.1.1.7), a variant of concern, also appeared to some extent. In wave-2, however, the B.1.351.3 variant (57.44%) dominated throughout the entire time frame.
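
The clade comparison reduces to computing, within each wave, the percentage of genomes assigned to each clade and picking the largest share. A minimal sketch follows; it assumes each genome has already been labelled with its wave and its GISAID clade, and the function names and the toy records in the usage line are hypothetical, not data from this study.

```python
from collections import Counter, defaultdict


def clade_distribution(records):
    """records: iterable of (wave, clade). Returns wave -> {clade: percentage}."""
    by_wave = defaultdict(Counter)
    for wave, clade in records:
        by_wave[wave][clade] += 1
    result = {}
    for wave, counts in by_wave.items():
        total = sum(counts.values())
        result[wave] = {clade: 100.0 * n / total for clade, n in counts.items()}
    return result


def predominant(distribution):
    """Clade with the largest share in each wave."""
    return {wave: max(pcts, key=pcts.get) for wave, pcts in distribution.items()}
```

With made-up records, three GR and one GH genome in wave-1 yield `{"GR": 75.0, "GH": 25.0}`, so `predominant` reports GR for that wave; the same functions apply unchanged to Pango lineages such as B.1.1.25 or B.1.351.3.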

During this wave, there was also a progressive increase in variants of concern (alpha, beta, delta).

were very similar in the sense that the atoms near the end of the complex were more flexible than the rest of the complex (Fig 9b). otherwise rest of the time (Fig 9c). 617 (Fig 9d).

SNPs, wave-2 tended to have a higher number of deletion events (Fig 3). Such recurrent deletion events in the SARS-CoV-2 genome have been reported to facilitate its transmission, with altered antigenicity and antibody escape (73). Furthermore, although C>T transitions prevailed in both waves, G>T transversions were comparatively frequent in wave-2 (Fig 4).

ORF3a:Q57H, N:G204R protein variants were equally abundant in both waves (Fig 6). All of these are marker variants for the G clade and its derivatives (Table 1), which explains why these clades were so widely distributed in Bangladesh (Fig 7).

Bangladesh (Fig 8).

of folding, and susceptibility to disruption by solvents (Fig 9). These findings coincide with those