Abstract
Global pandemic interventions have reshaped host-virus dynamics, potentially altering the evolution of endemic pathogens. Here, we report accelerated genomic evolution of human coronavirus OC43 (HCoV-OC43), a close relative of pandemic-associated coronaviruses, following recent worldwide epidemiological shifts. Bayesian analysis of longitudinal surveillance data revealed a 3.76-fold increase in the spike gene substitution rate of the currently dominant genotype K post-2020, to 8.9403 × 10⁻⁴ nucleotide substitutions/site/year (95% HPD: 4.9075 × 10⁻⁴, 1.3053 × 10⁻³). Positively selected mutations were mainly located in the spike protein, and some colocalize with antigenic epitopes. Crucially, structural modeling indicated that broadly neutralizing antibodies targeting conserved stem-helix (S2P6) and fusion-peptide (COV44–62/79, 76E1) epitopes of high-pathogenicity betacoronaviruses cross-bind HCoV-OC43 spike protein, providing a plausible mechanistic basis for immune-driven selection. These findings suggest that population-level immune imprinting may help drive mutations within key domains of HCoV-OC43, although further validation is required. Sustained co-surveillance of co-circulating coronaviruses is imperative to anticipate emergent variants with altered pathogenicity.
Author summary
Since the global outbreak of COVID-19, extensive public health interventions—such as mask wearing, social distancing, and travel restrictions—along with widespread vaccination and antiviral treatments, have profoundly influenced the transmission dynamics of various respiratory viruses. HCoV-OC43 and SARS-CoV-2 both belong to the genus Betacoronavirus and share structural and antigenic similarities, which may lead to potential cross-reactivity and cross-protective immunity between them. The pandemic may have exerted immune selection pressures that drive adaptive mutations in HCoV-OC43, enabling it to maintain endemic circulation. However, current research on the molecular basis of cross-immunity between these coronaviruses and its impact on viral evolution remains limited. By comparing substitution rates and analyzing antigen–antibody interactions, our study revealed an accelerated evolutionary trend of HCoV-OC43 under potential immune selection pressure. These findings underscore the importance of ongoing joint surveillance of co-circulating coronaviruses to anticipate the emergence of new variants with altered pathogenicity.
Citation: Lu S, Shen Q, Wang H, Cai M, Hu J, Li Y, et al. (2026) Pandemic-driven immune imprinting accelerates evolution of human coronavirus OC43. PLoS Negl Trop Dis 20(3): e0014109. https://doi.org/10.1371/journal.pntd.0014109
Editor: Esaki M. Shankar, Central University of Tamil Nadu, INDIA
Received: December 9, 2025; Accepted: March 2, 2026; Published: March 17, 2026
Copyright: © 2026 Lu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: This research was funded by the Shanghai Municipal Public Health Research Program (2024GKQ18 to ZT). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In 1967, McIntosh and colleagues first isolated human coronavirus OC43 (HCoV-OC43) from patients with respiratory disease using organ cultures of human embryonic trachea [1]. Among the seven coronaviruses known to infect humans, SARS-CoV, MERS-CoV, and SARS-CoV-2 are highly pathogenic, capable of causing large-scale outbreaks and fatal pneumonia. In contrast, HCoV-OC43, HCoV-229E, HCoV-NL63, and HCoV-HKU1 are endemic seasonal coronaviruses that typically cause mild upper respiratory tract illness [2,3]. Compared with the other three endemic seasonal coronaviruses, HCoV-OC43 is more prevalent in respiratory tract infections of both children and adults. Furthermore, novel genotypes of HCoV-OC43 are continuously being identified, posing a considerable threat to immunocompromised individuals, infants, and the elderly [4–12]. The genome of HCoV-OC43, like that of SARS-CoV-2, is a positive-sense single-stranded RNA (+ssRNA) approximately 30.7 kb in length. The first two-thirds encodes the ORF1ab polyprotein (ORF1ab), and the remaining one-third encodes structural and non-structural proteins, including the spike surface glycoprotein (S), envelope protein (E), membrane protein (M), nucleocapsid protein (N), hemagglutinin-esterase (HE), non-structural protein 2 (ns2), and non-structural protein 12.9 (ns12.9) [13–16].
SARS-CoV-2 was first detected at the end of December 2019 and subsequently gave rise to a global pandemic with far-reaching implications for human health and for medical research on coronaviruses. Given its structural similarity to SARS-CoV-2, HCoV-OC43 is often used as a low-risk model for studying SARS-CoV-2 [17]. Using seroepidemiology, Thoisy et al. found that the long-term endemic equilibrium of seasonal coronaviruses results from a dynamic balance among increasing population immunity through new infections, waning antibody levels, and the introduction of newly susceptible children [18]. Furthermore, several studies have reported cross-reactivity and partial cross-protective immunity between SARS-CoV-2 and endemic coronaviruses such as HCoV-OC43 in host immune responses, arising from frequent reinfections and widespread vaccination [19,20].
SARS-CoV-2 has been circulating for over five years, in contrast to HCoV-OC43, which has persisted for more than fifty years. The close phylogenetic relationship between the two viruses, along with their localized similarity in the primary or higher-order structures of several proteins, strongly suggests the potential for antibody cross-reactivity. Consequently, the SARS-CoV-2 pandemic may have disrupted the long-term endemic equilibrium of HCoV-OC43 and accelerated its evolution, particularly under immune selection pressure. In this context, we performed genotyping of HCoV-OC43 based on phylogenetic analysis using whole genome (WG) sequences as well as full-length S, RNA-dependent RNA polymerase (RdRp), and N gene sequences. Across these four genetic levels, we then estimated and compared the nucleotide substitution rates of the predominant genotypes before and after the pandemic and conducted selection pressure analyses. Moreover, at the genetic level of the mutation-prone spike protein, we analyzed the historical evolutionary patterns of positively selected sites, assessed polymorphism of amino acid variation sites, predicted both linear and conformational B-cell epitopes, and evaluated antigen–antibody binding interactions. These analyses aimed to explore the mechanisms of the sustained epidemic of HCoV-OC43 and the potential for cross-reactivity between SARS-CoV-2 and HCoV-OC43. This study provides insights for the prevention of future outbreaks of HCoV-OC43 and other related coronaviruses.
Results
Results of HCoV-OC43 genotyping
HCoV-OC43 exhibited distinct predominant genotypes during different periods (Fig 1A-1H). At the WG level, the predominant genotypes shifted over time prior to the emergence of SARS-CoV-2, from genotype A (1967) to genotype E (1985–1990), genotype C (1991–2006), genotype D (2007–2010), genotype F (2011–2013), genotype G (2014–2015), genotype H (2015), genotype I (2016–2017), and genotype K (2018–2019). Following the emergence and subsequent pandemic of SARS-CoV-2, genotypes J and K (K is classified as genotype I at the N-gene level) have become the predominant genotypes, with genotype K accounting for the majority of recent viral isolates. The fourteen sequences generated in this study (PV798427-PV798440) were all classified as genotype K. Notably, all HCoV-OC43 strains collected since 2024 belong to genotype K, indicating that it is very likely the currently dominant genotype (Fig 1E). In addition, comparison of phylogenetic trees constructed from different gene regions of the same viral strains revealed that, except for genotype A, strains assigned to a given genotype occupied partially inconsistent phylogenetic positions across the maximum likelihood (ML) trees based on the WG, S, RdRp, and N genes (Fig 1I). A detailed summary of the genotype assignments of the same virus strains at the four genotyping levels (WG, S, RdRp, and N) can be found in S1 Table.
A-D. genotyping results based on the WG, S, RdRp, and N gene sequences of HCoV-OC43 (A. WG genotyping results; B. S genotyping results; C. RdRp genotyping results; D. N genotyping results). The black dots on the branches indicate bootstrap values ≥ 70. Branches marked in red on the evolutionary tree indicate sequences obtained by sequencing in this study. E-H. distribution results of sequence collection years for different genotypes based on WG, S, RdRp, and N genotyping of HCoV-OC43 (E. distribution results based on WG genotyping; F. distribution results based on S genotyping; G. distribution results based on RdRp genotyping; H. distribution results based on N genotyping). I. WG, S, RdRp, and N genotyping results of the same virus strain of HCoV-OC43.
Bayesian evolutionary analysis
For genotype K, which has been dominant since the emergence of SARS-CoV-2, the overall nucleotide substitution rate at the WG level was estimated at 2.2156 × 10⁻⁴ nucleotide substitutions/site/year (95% HPD: 1.7252 × 10⁻⁴, 2.7172 × 10⁻⁴), and the time to the most recent common ancestor (tMRCA) was traced to approximately mid-June 2010 (2010.4476; 95% HPD: 2002.1353, 2015.0588) (S2 Table). Comparison of the overall nucleotide substitution rates of the S, RdRp, and N genes revealed that the S gene exhibited the highest rate, whereas the RdRp gene showed the lowest (S2 Table and Fig 2A-2D). Furthermore, comparison of the nucleotide substitution rates of the WG, S, RdRp, and N genes before and after the SARS-CoV-2 pandemic showed an accelerated evolutionary trend at all four genetic levels. The acceleration was most pronounced in the S gene, whose nucleotide substitution rate increased from 2.3757 × 10⁻⁴ (95% HPD: 9.4851 × 10⁻⁵, 3.9034 × 10⁻⁴) to 8.9403 × 10⁻⁴ (95% HPD: 4.9075 × 10⁻⁴, 1.3053 × 10⁻³) (S2 Table and Fig 2A-2D).
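The fold increase quoted for the genotype K spike gene can be reproduced directly from the two posterior means reported above. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the helper names `fold_change` and `intervals_overlap` are illustrative assumptions, and comparing HPD intervals for overlap is only a rough descriptive check, not a formal test of rate difference.

```python
# Posterior summaries for the genotype K spike gene, copied from the text
# (nucleotide substitutions/site/year). Helper names are hypothetical.
PRE = {"mean": 2.3757e-4, "hpd": (9.4851e-5, 3.9034e-4)}
POST = {"mean": 8.9403e-4, "hpd": (4.9075e-4, 1.3053e-3)}


def fold_change(pre_rate: float, post_rate: float) -> float:
    """Ratio of the post-pandemic rate to the pre-pandemic rate."""
    return post_rate / pre_rate


def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True if two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    ratio = fold_change(PRE["mean"], POST["mean"])
    print(f"fold change: {ratio:.2f}")  # prints "fold change: 3.76"
    # Descriptive check only: do the two 95% HPD intervals share any values?
    print("95% HPD intervals overlap:", intervals_overlap(PRE["hpd"], POST["hpd"]))
```

With the values above, the pre-pandemic upper bound (3.9034 × 10⁻⁴) lies below the post-pandemic lower bound (4.9075 × 10⁻⁴), so the two intervals do not overlap.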
A-D. nucleotide substitution rates of genotype K at the WG, S, RdRp, and N gene levels (expressed at the N gene level as genotype I). E-H. nucleotide substitution rates of genotype J at the WG, S, RdRp, and N gene levels. The all, pre-pandemic (pre-), and post-pandemic (post-) collected sequences from each dataset were imported into the BEAST software for analysis separately. The white line in the box plot indicates the median, and the black dot indicates the mean.
For genotype J, which held a subdominant position after the emergence of SARS-CoV-2, the WG-level overall nucleotide substitution rate was 2.5133 × 10⁻⁴ nucleotide substitutions/site/year (95% HPD: 1.5595 × 10⁻⁴, 3.4601 × 10⁻⁴), with the tMRCA estimated at mid-January 2006 (2006.0498; 95% HPD: 1995.1384, 2013.8474) (S2 Table). As in genotype K, the S gene evolved fastest and the RdRp gene slowest (S2 Table and Fig 2E-2H). However, in contrast to genotype K, post-pandemic analysis revealed accelerated evolution in the WG, RdRp, and N genes, whereas the S gene showed a deceleration in its nucleotide substitution rate (S2 Table and Fig 2E-2H).
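The tMRCA values are reported as decimal years alongside a calendar description. As a hedged illustration of how such a value maps to a date, the sketch below assumes the fractional part is a linear fraction of the calendar year; the function name `decimal_year_to_date` is hypothetical, and the exact convention used by the dating software may differ by a day or so.

```python
from datetime import date, timedelta


def decimal_year_to_date(value: float) -> date:
    """Convert a decimal year (e.g. 2010.4476) to a calendar date.

    Assumption: the fractional part is a linear fraction of that year's
    length (365 or 366 days), truncated to whole days.
    """
    year = int(value)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=int((value - year) * days_in_year))


if __name__ == "__main__":
    # tMRCA point estimates quoted in the text for genotypes K and J.
    for label, tmrca in (("genotype K", 2010.4476), ("genotype J", 2006.0498)):
        print(label, decimal_year_to_date(tmrca).isoformat())
```

Under this assumption the two point estimates fall in June 2010 and January 2006, in line with the "mid-June 2010" and "mid-January 2006" descriptions in the text.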
In addition, recombination analysis identified only a few potential recombinant sequences, all of which were sampled before the COVID-19 pandemic, and SimPlot validation did not reveal clear or stable parental fragment exchanges (details are given in S1 File). Therefore, the impact of recombination on the BEAST-based estimation of nucleotide substitution rates is expected to be minimal.
Selection pressure analysis and historical evolution analysis of positively selected sites
Positively selected sites were found only in the S and N proteins of genotype K (behaving as genotype I at the N gene level) and in the S protein of genotype J (S3 Table). No positively selected sites were detected in the remaining proteins. Specifically, seventeen positively selected sites (identified by at least one method) were detected in the S protein of genotype K: twelve—sites 25, 26, 38, 40, 67, 89, 195, 199, 265, 266, 270, and 271—were located in the N-terminal domain (NTD) of the S1 subunit; two—sites 504 and 506—were in the C-terminal domain (CTD) of the S1 subunit; one—site 900—was in the fusion peptide (FP) region of the S2 subunit; two—sites 1251 and 1338—were in the other region (site 1251 was mapped to the stem helix region) of the S protein (S3 Table). In the N protein of genotype K, three positively selected sites were identified at sites 49, 79, and 248 (S3 Table). In genotype J, nine positively selected sites (identified by at least one method) were detected. Among these, six positively selected sites—454, 472, 481, 503, 554, and 573—were identified in the CTD of the S1 subunit (S3 Table).
The historical evolution analysis showed that almost all amino acid mutations at positively selected sites occurred along internal nodes of the time-scaled maximum clade credibility (MCC) trees. Many mutations at positively selected sites in genotype K, including R26G/T, P38L/S, S40P, L67V, S265A, S270N, L271S, and D1251A, were observed at tree nodes or within subclades corresponding to sequences collected after the SARS-CoV-2 pandemic. In contrast, the positively selected sites in genotype J were primarily associated with tree nodes corresponding to pre-pandemic sequences (Fig 3).
A. MCC tree of all S protein gene sequences. Different genotypes are marked in different colors. B-C. MCC tree of genotypes J and K. Branch support values represent Posterior Probability. The tree nodes are labelled with amino acid mutations of positively selected sites. Red triangles represent sequences collected post-pandemic, pink squares represent sequences collected pre-pandemic, and red pentagrams represent sequences obtained by sequencing in this study.
Furthermore, the ML trees reconstructed by IQ-TREE, with branch lengths in units of substitutions per site, showed that within genotype K, sequences sampled after 2020 had accumulated more substitutions than those sampled before 2020, whereas genotype J exhibited the reverse pattern (S1 Fig). This provided qualitative, model-light support that was broadly consistent with the accelerated evolutionary patterns inferred from our BEAST analyses and largely in agreement with our historical evolution analysis of positively selected sites (Figs 2, 3 and S1).
Polymorphism analysis of amino acid variation sites
Three amino acid variation sites were identified in the S protein of genotype J and six in that of genotype K (Table 1). For genotype J, the major amino acid usage frequency at variation site 481 showed a statistically significant difference before and after the pandemic (P < 0.01); variation sites 481 and 573 coincided with positively selected sites (Table 1). For genotype K, all six variation sites—26, 38, 44, 67, 265, and 483—exhibited significant differences in the major amino acid usage frequencies (P < 0.01), among which sites 26, 38, 67, and 265 also corresponded to positively selected sites (Table 1).
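A before/after comparison of major amino acid usage at a single site can be framed as a 2 × 2 contingency table. The sketch below is an illustration only: this part of the text does not state which statistical test produced the P values, so the use of Fisher's exact test here is an assumption, and the counts in the example are hypothetical rather than taken from Table 1.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: pre-pandemic / post-pandemic sequences.
    Columns: major amino acid / any other amino acid at the site.
    The P value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: 40 of 45 pre-pandemic sequences carry residue X,
    # but only 6 of 50 post-pandemic sequences do.
    p = fisher_exact_two_sided(40, 5, 6, 44)
    print(f"P = {p:.3g}; significant at 0.01: {p < 0.01}")
```

An exact test is a cautious choice when some cell counts are small, as can happen when one sampling period contributes few sequences.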
Prediction of potential linear and conformational B-cell epitopes
In the β-turn and random coil regions, which are typically enriched for antigenic epitopes, seven potential linear B-cell epitopes were predicted in each of the S proteins of HCoV-OC43 genotypes J and K, with two epitopes shared between them, resulting in a total of twelve unique predicted potential linear B-cell epitopes (S2 Fig and S4 Table). Notably, in genotype K, sites 265, 266, 270, and 271 in the NTD of the S1 subunit, identified as both positively selected sites and predicted linear epitope sites, were located within the epitope 264-RPKDGFSP-271. Among these, site 265 was also identified as a statistically significant variation site. Positively selected site 472 was identified in the predicted linear epitope 472-VFKPQP-477 of genotype J (Tables 1, S3 and S4).
Based on AlphaFold-predicted three-dimensional (3D) structures of the S proteins of genotypes J and K, 447 and 418 potential conformational B-cell epitope residues were predicted, respectively (S5 Table). These residues were distributed across multiple regions of the S protein, including the NTD and CTD of the S1 subunit, the S1/S2 cleavage region, and the fusion peptide region of the S2 subunit (Fig 4 and S5 Table). Among them, the positively selected sites (sites 25, 26, 38, 89, 195, 199, 265 and 266) and statistically significant variation sites (sites 26, 38, 44, and 265) in the NTD of the S1 subunit of genotype K were also mapped to predicted conformational epitope residues; two positively selected sites 26 (25) and 44 (43) were identified within the predicted conformational epitope residues of genotype J (Tables 1, S3 and S5). Here, 25 and 43 in parentheses indicate the aligned conformational epitope sites.
A-B. potential B-cell conformational epitopes of genotypes J and K on the S protein displayed in three orthogonal views (A. genotype J; B. genotype K). Potential B-cell conformational epitopes were predicted by Discotope-3.0 based on the AlphaFold-predicted S protein structures. Three chains of the HCoV-OC43 spike protein are depicted in purple, pink, and green cartoons, with potential epitope sites represented as ball-and-stick atoms.
Antigen–antibody binding analysis
The 9-O-acetylated sialic acid (9-O-Ac-Sia) receptor (PDB: 6NZK) binds to the S1 NTD of the HCoV-OC43 S protein, inserting its acetyl group into two hydrophobic binding pockets, P1 and P2, which are defined by the L1 loop (27–NDKDTG–32) and the L2 loop (80–LKGSVLL–86) and separated by Trp90 (S3A-S3C Fig). Of the positively selected sites identified in the S protein of genotype K, sites 38 (33) and 40 (35) are situated near the binding region, whereas site 89 (84) resides directly within it, suggesting potential functional relevance (S3C Fig). Here, 38 (33) indicates that 38 represents the positively selected site, while 33 corresponds to the aligned position in the sequence used for the 6NZK structural prediction.
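The paired numbering used here, such as 38 (33), reflects a translation between positions in two aligned sequences. A minimal sketch of that bookkeeping follows; the function `map_position` and the toy sequences are hypothetical and do not reproduce the actual spike alignment, which is not given in the text.

```python
from typing import Optional


def map_position(aligned_query: str, aligned_target: str, query_pos: int) -> Optional[int]:
    """Map a 1-based residue position in the query to the target sequence.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    Returns None if the query residue is aligned to a gap in the target
    or if the position lies outside the query.
    """
    q = t = 0
    for q_char, t_char in zip(aligned_query, aligned_target):
        if q_char != "-":
            q += 1
        if t_char != "-":
            t += 1
        if q_char != "-" and q == query_pos:
            return t if t_char != "-" else None
    return None


if __name__ == "__main__":
    # Toy alignment: the target lacks the first five residues of the query,
    # so query numbering runs five ahead of target numbering.
    query = "MFLILLISLPTA"
    target = "-----LISLPTA"
    print(map_position(query, target, 8))  # prints 3
```

A constant offset, as in this toy case, is the simplest situation; internal insertions or deletions would make the offset vary along the sequence, which is why the mapping walks the alignment column by column.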
Additionally, focusing on four broadly neutralizing antibodies—S2P6, COV44–62, COV44–79, and 76E1—which were originally developed against SARS-CoV-2 but also exhibit cross-neutralization activity against HCoV-OC43, the antigen–antibody binding results showed that all four antibodies target relatively conserved regions in the S2 subunit of the S protein of HCoV-OC43. Specifically, S2P6 bound to the stem helix region, while COV44–62, COV44–79, and 76E1 bound to the fusion peptide region which includes the S2’ cleavage site (Fig 5). Detailed hydrogen bond interactions at the antigen–antibody interfaces are shown in Fig 5. Among these binding sites, sites 925 (921), 1247 (1243), and 1249 (1245) in genotype J, and sites 912 (902), 925 (915), 1247 (1237), 1249 (1239), and 1251 (1241) in genotype K were also identified as predicted conformational B-cell epitope residues (Fig 5 and S5 Table). Notably, site 1251 (1241) was also one of the previously identified positively selected sites (Fig 5 and S3 Table). Here, 925 (921) indicates that 921 represents the antigen–antibody interaction site, which also overlaps with the conformational epitope site predicted in this study, while 925 corresponds to the aligned site position in the original sequence used for selection pressure analysis. Furthermore, the stem helix and fusion peptide regions involved in antibody binding show a high degree of amino acid sequence similarity between SARS-CoV-2 and HCoV-OC43 (Fig 5). These findings, based on protein structural analyses of antigen–antibody interactions, provide structural evidence supporting potential cross-reactivity between SARS-CoV-2 and HCoV-OC43 within host cells.
A. HCoV-OC43 genotype J Spike stem helix peptide binding to S2P6 antibody Fab fragment (PDB: 7NRJ). B. HCoV-OC43 genotype K Spike stem helix peptide binding to S2P6 antibody Fab fragment (PDB: 7NRJ). C. HCoV-OC43 Spike fusion peptide binding to COV44-62 antibody Fab fragment (PDB: 8D36). D. HCoV-OC43 Spike fusion peptide binding to COV44-79 antibody Fab fragment (PDB: 8DAO). E. HCoV-OC43 Spike fusion peptide binding to 76E1 antibody Fab fragment (PDB: 7X9E). Because the spike fusion peptide sequences of genotypes J and K are identical, both genotypes were analyzed together in panels C, D, and E. Each panel shows genotype J at the top and genotype K at the bottom. The heavy chain of the antibody Fab antigen-binding fragment is depicted in pink cartoons, and the light chain is shown in green cartoons. The viral binding peptide is represented in purple cartoons. Antigen-antibody hydrogen-bonding interaction residues are highlighted in stick atoms and annotated with text labels. Blue dashed lines indicate hydrogen bonds. The sites highlighted in red on the peptide sequences indicate antigen–antibody hydrogen-bonding interaction residues. SARS-CoV-2 interaction residues were annotated based on the related source literature [21–23].
Discussion
In this study, we performed viral genotyping of HCoV-OC43 using phylogenetic analyses based on the WG, S, RdRp, and N gene sequences. Through the estimation and comparison of nucleotide substitution rates, we found an accelerated evolutionary trend in these genes of the currently dominant genotype K of HCoV-OC43, following the emergence of SARS-CoV-2. This finding suggests a potential shift in the evolutionary dynamics of HCoV-OC43, possibly influenced by changes in host immune landscapes or ecological pressures in the post-SARS-CoV-2 era.
A similar acceleration was reported in a 2024 analysis of the genomic evolution of human respiratory syncytial virus (HRSV) by Piñana et al. [24], who found that both the HRSV-A and HRSV-B subtypes showed increased nucleotide substitution rates after the SARS-CoV-2 pandemic. Such an acceleration is arguably even more plausible for the seasonal coronavirus HCoV-OC43, which is structurally similar to SARS-CoV-2 and belongs to the same genus, Betacoronavirus.
SARS-CoV-2 and HCoV-OC43 have similar structural and virological characteristics: both are transmitted by respiratory droplets or close contact, and both infect humans. The two viruses may therefore compete for the same ecological niche, that is, for the same hosts or biological resources, and this competition is exacerbated by the widespread implementation of social quarantine measures [25,26]. Moreover, SARS-CoV-2 shows greater transmission efficiency than HCoV-OC43 and spread rapidly and globally upon its emergence. The infectiousness of SARS-CoV-2 has been estimated using the basic reproduction number (R0), with an early R0 of about 2.2 and reported mean R0 values of 1.6-6.5, whereas the reported annual prevalence of endemic coronaviruses detected in hospital-based cohorts averaged only 6%, although most of these infections were HCoV-OC43 [8,25,27–31]. In addition, through frequent SARS-CoV-2 reinfections and widespread vaccination, SARS-CoV-2 may indirectly contribute to the evolution of HCoV-OC43 via host cross-reactivity. Several studies have found cross-reactivity between SARS-CoV-2 and seasonal coronaviruses in host cells [19,20]. In this study, predictive structural modeling yielded 3D structures of the antigen-antibody complexes, from which we identified the corresponding binding epitopes and mapped specific interaction sites within them. Notably, a positively selected site (1251) was detected within the binding region. Our findings therefore provide structural-level support for cross-reactivity between SARS-CoV-2 and HCoV-OC43 in host cells.
By performing selection pressure analysis and ancestral sequence reconstruction, we found that in genotype J, positively selected mutations are primarily concentrated in the CTD of the S1 subunit and are mostly located on phylogenetic tree nodes corresponding to sequences collected before the SARS-CoV-2 pandemic. In contrast, genotype K exhibits positively selected mutations mainly in the NTD of the S1 subunit and the fusion peptide region of the S2 subunit, predominantly on nodes associated with post-pandemic sequences. Based on tMRCA, genotype J likely emerged earlier than genotype K. Notably, since 2024, all globally collected viral strains have belonged to genotype K, with no detection of genotype J. These observations suggest that genotype K currently holds the dominant status. Moreover, the accumulation of adaptive mutations at multiple positively selected sites in the S1 NTD, S2 fusion peptide region, and N protein, along with possible co-evolution among these regions (NTD-FP-N), may enhance the viral ability to recognize and bind to host cell sialic acid receptors, improve membrane fusion and viral replication/assembly efficiency, and ultimately facilitate more efficient host cell entry and transmission. In contrast, genotype J appears to have entered a relatively stable adaptive phase. After undergoing adaptive evolution, its major advantageous mutations may have already become fixed, leading to a more stable genetic composition within the population and a reduced evolutionary rate. Consequently, post-pandemic mutations in the S protein are likely to be neutral or under purifying selection, resulting in a slower accumulation of new beneficial mutations. We therefore propose that genotype J may have largely adapted to current host and environmental conditions and is presently in a relatively stable, low-variability evolutionary stage, though not in a state of complete stasis or absolute equilibrium.
The spike protein of coronaviruses is known for its high variability and frequent mutations [14,32,33], whereas the RdRp and N proteins are relatively conserved [14,32]. In genotype J, we observed that the S gene exhibited a decelerated evolutionary trend after the pandemic, while the RdRp and N genes evolved more rapidly. This contrasting pattern may reflect differences in selective pressures and functional constraints. The S protein, having undergone strong adaptive evolution during earlier host adaptation, may have reached a relatively optimized state for receptor binding and immune evasion, leading to a reduced rate of further advantageous mutations. By contrast, the more conserved RdRp and N genes may be experiencing compensatory or fine-tuning mutations to maintain replication efficiency and genomic stability in response to accumulated changes elsewhere in the viral genome.
Driven by the SARS-CoV-2 pandemic, global efforts to develop antiviral drugs targeting SARS-CoV-2 have accelerated significantly. These antivirals fall into two main categories: one targets viral proteins, primarily viral enzymes, to disrupt the viral life cycle, while the other targets host proteins involved in viral entry and replication, such as host cell receptors or proteases [34]. Several antiviral drugs that have been approved or are in late-stage clinical trials primarily target conserved viral domains. These include Paxlovid (Nirmatrelvir/Ritonavir) [35], which inhibits the main protease (Mpro/nsp5), and Molnupiravir (EIDD-2801/MK-4482) [36,37] and Remdesivir [38], which act on the RNA-dependent RNA polymerase (RdRp/nsp12). These enzymes are highly conserved across coronaviruses and exhibit low tolerance to mutation, making them promising broad-spectrum antiviral targets. Notably, several studies have shown that Molnupiravir exhibits broad-spectrum activity against SARS-CoV-2, SARS-CoV, MERS-CoV, and bat-derived coronaviruses in human airway epithelial cell cultures [39] and humanized mouse models [40]. Moreover, some studies [41–43] indicate that coronaviruses tend to accumulate mutations more readily under drug pressure. Therefore, during the pandemic, the widespread use of broad-spectrum antiviral agents may have indirectly contributed to the accelerated evolution of structurally related viruses, such as HCoV-OC43.
The RdRp of coronaviruses possesses template-switching capability [44,45], which facilitates inter-template recombination during co-infection of a host by different HCoV-OC43 genotypes or closely related coronaviruses. This process can lead to genetic recombination and segment exchange, resulting in the continuous emergence and replacement of genotypes over time, thereby altering viral biological properties and antigenic structures, and enhancing adaptability and transmissibility [5,12]. In addition, during ongoing transmission, the S protein of HCoV-OC43 undergoes adaptive mutations under immune selection pressure [46]. As suggested by this study, adaptive mutations at positively selected sites in the S1 NTD may increase the viral binding affinity for the host cell surface receptor 9-O-Ac-Sia. These mutations may also facilitate immune evasion, either by altering antigenic epitopes to escape recognition by neutralizing antibodies or by gradually accumulating point mutations that drive antigenic drift, enabling continual adaptation to host immune responses. Through genetic recombination, HCoV-OC43 generates novel genotypes; adaptive mutations enhance infectivity and immune evasion; and antigenic drift leads to continuous changes in epitope antigenicity. These mechanisms drive the viral evolutionary response to immune selection pressure and sustain its long-term circulation in the human population, forming the core dynamics underlying the persistent endemicity of HCoV-OC43.
This study has several limitations. Most of the data were derived from publicly available databases, and the currently limited number of HCoV-OC43 genomic sequences may not fully reflect the virus’s true genetic diversity and circulation patterns. Linking positively selected sites with epitopes requires highly precise analyses, which our current study may not fully achieve. Moreover, our B-cell epitope prediction and antigen-antibody binding analyses were based on static structural models of the spike protein, which cannot fully account for receptor-induced dynamic conformational changes. Due to current limitations in available resources and experimental data, dynamic conformational modeling of the HCoV-OC43 spike protein is not feasible for us. Therefore, our work provides a preliminary exploration of potential associations between positively selected sites and predicted epitopes, highlighting possible patterns rather than definitive correlations. Future studies could refine these analyses by mapping individual positively selected sites onto experimentally validated epitopes, integrating structural and antigenicity data, or applying statistical models to quantify site-by-epitope correlations more rigorously. Furthermore, large-scale sequencing combined with experimental validation—such as neutralization assays to assess cross-reactive antibody responses, protein functional assays to evaluate the effects of specific mutations, and comparative studies of viral adaptability to different hosts—will be essential to clarify the biological relevance of the predicted cross-reactivity and adaptive evolutionary patterns.
In conclusion, our findings suggest that the SARS-CoV-2 pandemic may have promoted the evolution of HCoV-OC43. Under selection pressure, adaptive mutations at key amino acid sites in the spike and nucleocapsid proteins of HCoV-OC43 may enhance the virus’s ability to recognize and bind to host sialic acid receptors, facilitate membrane fusion and viral assembly, and promote efficient host cell entry. These evolutionary adaptations likely contribute to viral persistence and ecological competitiveness. Additionally, immune pressure resulting from cross-reactivity between SARS-CoV-2 and HCoV-OC43 in host cells may serve as a driving force for the accelerated evolution of HCoV-OC43.
Methods
Data download and preprocessing
The data for this study came from two sources. The first consisted of complete HCoV-OC43 genome sequences obtained by sequencing. Based on the 2024 comprehensive surveillance of acute respiratory infections in Jing’an District, Shanghai, we used the Real-Time PCR Diagnostic Kit Rapid Detection of multiple pathogens of national acute respiratory infectious diseases (SMS-D404AAYF-C-10T-01K) (Beijing Zhuocheng Huisheng Biotechnology Co., Ltd., China) and the real-time quantitative PCR instrument LightCycler 480 II (F. Hoffmann-La Roche Ltd., Basel, Switzerland) for multi-pathogen detection. This kit contains specific primers and fluorescent probes for detecting the corresponding pathogen genes. By collecting fluorescence signals during PCR amplification, we checked each detection channel for a sigmoidal (S-shaped) amplification curve with a Ct value ≤38, which allowed qualitative detection of multiple respiratory pathogens, including four subtypes of HCoVs. A total of 85 HCoV-positive samples were detected, including 6 cases of HCoV-229E, 43 of HCoV-NL63, 28 of HCoV-OC43, and 11 of HCoV-HKU1. Because some samples were co-infected with more than one subtype, the subtype counts sum to more than the total number of positive samples. Finally, based on sample quality and retesting of Ct values, we successfully sequenced 14 whole genomes of HCoV-OC43 using the Illumina Viral Surveillance Panel v2 (VSP2) reagent kit (hybrid capture enrichment method) and the Illumina MiSeq platform. Reference sequence-guided assembly was performed with MEGAHIT v1.1.3 [47] and RagTag v2.1.0 [48]. These sequences were aligned and trimmed according to the reference sequence NC_006213.1 to extract the S, RdRp, and N gene segments, which were subsequently incorporated into the collected sequence datasets. All 14 sequences have been submitted to GenBank under accession numbers PV798427-PV798440.
The second source comprised the whole-genome (WG), S, RdRp, and N gene sequences of HCoV-OC43 downloaded from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov) on January 15, 2025. We excluded sequences for which no collection date could be reliably obtained from either the NCBI database or publications, sequences with too many ambiguous bases (>1% of the total sequence length), sequences containing runs of more than 10 consecutive Ns, sequences with obvious anomalies, and sequences shorter than 90% of the respective gene or whole genome length. However, because fewer sequences were collected post-pandemic than pre-pandemic, we relaxed the restriction on consecutive unknown bases (N) for sequences collected post-pandemic.
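The exclusion criteria above can be expressed as a simple per-sequence check. The following is a minimal sketch, not the pipeline used in this study: the function name `passes_qc`, its parameters, and the choice to count all IUPAC ambiguity codes as "ambiguous" are assumptions made for illustration. The relaxed limit for post-pandemic sequences is not stated in the text, so the caller supplies it through `max_n_run`.

```python
import re

# Assumption: every IUPAC ambiguity code counts as an "ambiguous base".
AMBIGUOUS = set("NRYKMSWBDHV")


def passes_qc(seq, reference_length, has_date,
              max_ambiguous_frac=0.01, max_n_run=10, min_length_frac=0.90):
    """Return True if a sequence meets the inclusion criteria described above.

    seq               nucleotide string (alignment gaps are ignored)
    reference_length  length of the gene or whole genome it is compared to
    has_date          whether a reliable collection date is available
    """
    seq = seq.upper().replace("-", "")
    if not has_date:
        return False
    # Exclude sequences shorter than 90% of the reference length.
    if len(seq) < min_length_frac * reference_length:
        return False
    # Exclude sequences with more than 1% ambiguous bases.
    n_ambiguous = sum(1 for base in seq if base in AMBIGUOUS)
    if n_ambiguous > max_ambiguous_frac * len(seq):
        return False
    # Exclude sequences with a run of more than max_n_run consecutive Ns.
    longest_run = max((len(m.group()) for m in re.finditer(r"N+", seq)), default=0)
    return longest_run <= max_n_run
```

For post-pandemic sequences, a larger `max_n_run` (for example `passes_qc(seq, ref_len, True, max_n_run=50)`, where 50 is purely illustrative) would mimic the relaxed restriction. Sequences with "obvious anomalies" still require manual inspection and are not covered by this check.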
After data preprocessing, we obtained four HCoV-OC43 sequence datasets: WG, S, RdRp, and N. The WG dataset included 440 sequences in total, with 368 and 72 sequences collected pre-pandemic and post-pandemic, respectively; the S dataset included 621 sequences in total, with 545 and 76 collected pre-pandemic and post-pandemic, respectively; the RdRp dataset included 655 sequences in total, with 569 and 86 collected pre-pandemic and post-pandemic, respectively; and the N dataset included 725 sequences in total, with 637 and 88 collected pre-pandemic and post-pandemic, respectively.
Phylogenetic analysis and HCoV-OC43 genotyping
Each dataset described above was aligned using MAFFT v7.520 [49] and manually adjusted in MEGA v7.0 [50]. Maximum likelihood (ML) trees were then constructed with 5000 bootstrap replicates in IQ-TREE v2.2.0 (integrated ModelFinder) [51], with several bovine coronavirus (BCoV) sequences used as the outgroup to clarify the phylogenetic relationships. HCoV-OC43 genotyping was performed based on the reference typing sequences and phylogenetic relationships, using the naming convention proposed by Lau et al. [12]. All phylogenetic trees were annotated and visualized using the online tool iTOL [52]. The reference typing sequences were compiled from the available articles [5,12,53–59] on HCoV-OC43 genotyping of the WG, S, RdRp, and N genes. Specific reference typing information for these sequences is provided in S6 Table.
Bayesian evolutionary analysis
To explore changes in the nucleotide substitution rate of HCoV-OC43 before and after the SARS-CoV-2 pandemic, and to estimate the time to the most recent common ancestor (tMRCA), we analyzed the predominant genotypes in BEAST. The nucleotide substitution rate reflects the evolutionary rate, while the tMRCA suggests the earliest time of origin. Based on the genotyping results for the WG, S, RdRp, and N of HCoV-OC43, we generated separate datasets of all, pre-pandemic, and post-pandemic sequences for the predominant genotypes. Each dataset was aligned with MAFFT v7.520 and manually adjusted in MEGA v7.0. BEAST v1.10.4 [60] was used to estimate the rate of nucleotide substitutions per site per year and the tMRCA using the Bayesian Markov Chain Monte Carlo (MCMC) method, and to generate the maximum clade credibility (MCC) tree. The best-fit site substitution model was identified by the Bayesian Information Criterion (BIC) using IQ-TREE v2.2.0 (integrated ModelFinder, with 5000 bootstrap replicates). TreeTime v0.11.3 [61] was used to evaluate the temporal signal of each dataset, and sequences with aberrant temporal signal were removed. Following the parameter settings of previous research [12], all BEAST analyses were run independently using an uncorrelated relaxed molecular clock with an exponential distribution and a constant-size coalescent prior. Sequence collection dates were used for molecular clock calibration. The total number of generations varied among datasets, and each analysis yielded no fewer than 10,000 output samples. The resulting log files were then imported into Tracer v1.10.4 [62] to assess chain convergence. All parameters were estimated with an Effective Sample Size (ESS) over 200, indicating sufficient convergence.
The MCC tree was inferred with a burn-in of 10% using TreeAnnotator v1.10.4 [60] and visualized in FigTree v1.4.4 [60]. In addition, to exclude the potential impact of recombination on the comparison of nucleotide substitution rates, we constructed a representative sequence dataset for the different genotypes of HCoV-OC43 and performed recombination analyses. The detailed recombination methodology is provided in the S1 File.
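The idea behind a temporal-signal check can be illustrated with a root-to-tip regression: genetic distance from the root is regressed on sampling date, the slope gives a crude rate estimate, the x-intercept gives a crude root date, and tips far from the fitted line are candidates for removal. The sketch below is an illustration under stated assumptions and is not TreeTime's implementation: the function names are hypothetical, root-to-tip distances are taken as given inputs, and the interquartile-range rule (`n_iqr`) is one possible outlier criterion chosen here for simplicity.

```python
import statistics


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept); the slope is in substitutions/site/year
    when dates are decimal years and distances are substitutions/site.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept


def flag_outliers(dates, distances, n_iqr=3.0):
    """Indices of tips whose residual lies more than n_iqr interquartile
    ranges from the median residual (an illustrative criterion)."""
    slope, intercept = root_to_tip_regression(dates, distances)
    residuals = [d - (intercept + slope * t) for t, d in zip(dates, distances)]
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    iqr = q3 - q1
    if iqr == 0:
        return []
    centre = statistics.median(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - centre) > n_iqr * iqr]
```

With a positive slope, `-intercept / slope` gives the date at which the fitted distance is zero, a rough analogue of the root age. The Bayesian estimates reported in this study come from BEAST, not from a regression of this kind.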
Selection pressure analysis and historical evolution analysis of positively selected sites
Selection pressure was evaluated for the predominant genotypes of the S, RdRp, and N genes of HCoV-OC43 using the Fixed Effects Likelihood (FEL) [63], Mixed Effects Model of Evolution (MEME) [64], Single-Likelihood Ancestor Counting (SLAC) [63], and Fast Unconstrained Bayesian Approximation (FUBAR) [65] methods in HyPhy v2.5.62 [66]. Positively selected sites were defined as those reaching statistical significance (p-value < 0.1 in FEL, MEME, or SLAC, or posterior probability > 0.9 in FUBAR) in at least one method.
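The "at least one method" rule can be sketched as a union over per-method calls. This is an illustration only: the function name and input layout are hypothetical, the numbers in the usage example are invented, and the input is assumed to contain only evidence for positive (not negative) selection at each site. FUBAR is treated here with the conventional reading that a posterior probability above 0.9 supports positive selection.

```python
def positively_selected_sites(results, p_cutoff=0.1, posterior_cutoff=0.9):
    """Combine per-method results into a site -> supporting-methods mapping.

    results: {method: {site: value}}, where value is a p-value for
    FEL, MEME, and SLAC and a posterior probability for FUBAR.
    A site is reported if at least one method supports it.
    """
    support = {}
    for method, per_site in results.items():
        for site, value in per_site.items():
            if method == "FUBAR":
                significant = value > posterior_cutoff
            else:
                significant = value < p_cutoff
            if significant:
                support.setdefault(site, []).append(method)
    return dict(sorted(support.items()))


# Invented values for illustration; sites 10 and 30 are reported.
example = {
    "FEL": {10: 0.04, 20: 0.50},
    "MEME": {10: 0.08, 30: 0.20},
    "SLAC": {20: 0.30},
    "FUBAR": {20: 0.50, 30: 0.95},
}
print(positively_selected_sites(example))
```

Keeping the list of supporting methods for each site, rather than a bare yes/no call, makes it easy to see which sites rest on a single method.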
After selection pressure analysis, the nucleotide sequences of the S gene with positive selection sites were translated to amino acid sequences using MEGA v7.0. Then, the amino acid sequences and the related MCC tree generated after the BEAST analysis were imported together into TreeTime v0.11.3 for ancestral sequence reconstruction. Finally, we marked the mutations at the positively selected sites on the tree nodes of the previously constructed MCC tree, observing the historical evolution pattern of positively selected mutation sites.
Polymorphism analysis of amino acid variation sites
The S gene sequences of the predominant genotypes of HCoV-OC43 before and after the pandemic were aligned, adjusted, and translated into protein amino acid sequences, followed by a second alignment and adjustment. The majority consensus amino acid sequences were generated using Lasergene v11.1 MegAlign software from DNASTAR, Inc. All alignments and adjustments were performed with MAFFT v7.520 and MEGA v7.0. By comparing the majority consensus amino acid sequences before and after the pandemic, amino acid variation sites were identified, and the amino acid usage frequencies at these sites were calculated. Chi-square tests were performed using IBM SPSS Statistics for Windows v25.0 to evaluate whether the differences in the usage frequencies of major amino acids at variable sites before and after the pandemic were statistically significant.
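For a single variable site, the comparison described above reduces to a 2 × 2 table: counts of the major amino acid versus all other residues, before versus after the pandemic. The sketch below computes the Pearson chi-square statistic and its p-value for such a table using only the standard library. It is an illustration and not the SPSS procedure itself: the function name is hypothetical, no continuity correction is applied (whether one was used in this study is not stated), and the counts in the usage line are invented.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]].

    Rows: pre-pandemic, post-pandemic.
    Columns: major amino acid, any other amino acid.
    Returns (chi2, p_value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a marginal total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts for illustration only.
chi2, p = chi_square_2x2(20, 10, 10, 20)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

When expected cell counts are small, an exact test is generally preferred over the chi-square approximation.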
Prediction of potential linear and conformational B-cell epitopes
The majority consensus amino acid sequence of the full-length S protein of the predominant genotypes was obtained as described above. Protein secondary structure was predicted using the online server SOPMA (https://npsa.lyon.inserm.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html) with the number of conformational states set to 4 (Helix, Sheet, Turn, Coil) and all other parameters left at their defaults. Protein parameters, including hydrophobicity, flexibility, antigenic index, and surface probability, were predicted using Lasergene v11.1 Protean software from DNASTAR, Inc. Linear B-cell epitopes were predicted using the online server BepiPred-3.0 (https://services.healthtech.dtu.dk/services/BepiPred-3.0/), retaining the higher-confidence predictions (top 20%). Finally, potential linear B-cell epitopes were defined as predicted epitopes satisfying both of the following conditions: (i) they were not within alpha-helix or beta-bridge regions of the protein secondary structure; and (ii) they lay in regions with flexibility > 0, antigenic index > 0, surface probability > 0, and hydrophobicity < 0.
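The two filtering conditions for linear epitopes can be sketched as a per-residue test followed by merging consecutive passing residues into segments. This is an illustration under stated assumptions, not the procedure used in this study: the `Residue` record, the function names, and the per-residue application of the thresholds are hypothetical, and mapping "alpha-helix or beta-bridge" to the state letters `h` and `b` is an assumption that should be adjusted to the secondary-structure output actually used.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    position: int
    in_predicted_epitope: bool   # flagged by the linear-epitope predictor
    secondary_structure: str     # single-letter state, e.g. "h", "e", "t", "c"
    flexibility: float
    antigenic_index: float
    surface_probability: float
    hydrophobicity: float


def is_candidate(res, excluded_states=("h", "b")):
    """Apply both conditions: allowed secondary structure and
    favourable physicochemical profile."""
    return (
        res.in_predicted_epitope
        and res.secondary_structure not in excluded_states
        and res.flexibility > 0
        and res.antigenic_index > 0
        and res.surface_probability > 0
        and res.hydrophobicity < 0
    )


def candidate_segments(residues, excluded_states=("h", "b")):
    """Merge consecutive passing residues into (start, end) segments."""
    positions = sorted(
        r.position for r in residues if is_candidate(r, excluded_states)
    )
    segments = []
    for pos in positions:
        if segments and pos == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], pos)
        else:
            segments.append((pos, pos))
    return segments
```

A minimum segment length could be added as a further filter; none is specified in the text, so none is applied here.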
Based on the majority consensus amino acid sequences, the three-dimensional (3D) structure of the spike protein was predicted using the AlphaFold server (https://alphafoldserver.com/). Conformational B-cell epitopes were subsequently predicted from the 3D structure using the DiscoTope-3.0 server (https://services.healthtech.dtu.dk/services/DiscoTope-3.0/) with default parameters.
Antigen–antibody binding analysis
The cryo-EM-resolved 3D structure of the HCoV-OC43 spike protein in complex with the host cell 9-O-Ac-Sia sialic acid receptor (PDB: 6NZK) was obtained from the Protein Data Bank (PDB, https://www.rcsb.org/) and analyzed for receptor-binding regions based on the 3D structure and the source literature [67]. For antibodies against SARS-CoV-2 with broadly neutralizing activity but lacking resolved antigen–antibody complex structures with HCoV-OC43, antigen–antibody binding analyses were performed using the AlphaFold server based on amino acid sequences. Representative antibodies analyzed in this study included S2P6 [21], COV44–62 [22], COV44–79 [22], and 76E1 [23], referring to the related source literature [21–23] and the antigen–antibody complexes available in the PDB (PDB: 7RNJ, 8D36, 8DAO, and 7X9E). Protein structure visualization and annotation were conducted using ChimeraX v1.8 [68].
Supporting information
S1 Fig. Maximum-likelihood phylograms of the predominant genotypes J and K.
A-D. Maximum-likelihood (ML) trees based on the WG, S, RdRp, and N genes of HCoV-OC43 genotype K (expressed at the N gene level as genotype I) (A. WG ML tree; B. S ML tree; C. RdRp ML tree; D. N ML tree). E-H. ML trees based on the WG, S, RdRp, and N genes of HCoV-OC43 genotype J (E. WG ML tree; F. S ML tree; G. RdRp ML tree; H. N ML tree). Branch lengths are proportional to the number of substitutions per site, allowing visualization of relative genetic distances among sequences.
https://doi.org/10.1371/journal.pntd.0014109.s001
(PDF)
S2 Fig. Potential linear B-cell epitopes of the predominant genotypes J and K on the S protein of HCoV-OC43.
A. potential linear B-cell epitopes of genotype J on the S protein. B. potential linear B-cell epitopes of genotype K on the S protein. The letter h in blue lowercase indicates the alpha helix. The letter e in red lowercase indicates the extended strand. The letter t in green lowercase indicates the beta turn. The letter c in yellow lowercase indicates the random coil. The orange underlining indicates the peptide predicted by BepiPred-3.0. The green box indicates the peptide predicted by SOPMA. The red box indicates the peptide predicted by Protean.
https://doi.org/10.1371/journal.pntd.0014109.s002
(TIF)
S3 Fig. Structural analysis of HCoV-OC43 binding to the 9-O-Ac-Sia receptor based on the PDB database.
A. The 9-O-Ac-Sia receptor-binding region on the S protein of HCoV-OC43 (PDB: 6NZK). B-C. Enlarged views of selected areas in panel A. Hydrogen-bonding interaction residues are highlighted in orange. Blue dashed lines indicate hydrogen bonds. The structural interpretation follows Tortorici et al. [67].
https://doi.org/10.1371/journal.pntd.0014109.s003
(TIF)
S1 Table. Summary of the same virus strains of HCoV-OC43 based on WG, S, RdRp, and N genotyping.
https://doi.org/10.1371/journal.pntd.0014109.s004
(XLSX)
S2 Table. Nucleotide substitution rates and tMRCA of the HCoV-OC43 predominant genotypes.
https://doi.org/10.1371/journal.pntd.0014109.s005
(XLSX)
S3 Table. Sites under positive selection identified by selection pressure analysis of HCoV-OC43.
https://doi.org/10.1371/journal.pntd.0014109.s006
(XLSX)
S4 Table. Summary of potential linear B-cell epitopes in the spike protein of HCoV-OC43.
https://doi.org/10.1371/journal.pntd.0014109.s007
(XLSX)
S5 Table. Summary of potential conformational B-cell epitopes in the spike protein of HCoV-OC43.
https://doi.org/10.1371/journal.pntd.0014109.s008
(XLSX)
S6 Table. The reference genotypes of HCoV-OC43 in this study from references.
https://doi.org/10.1371/journal.pntd.0014109.s009
(XLSX)
Acknowledgments
We acknowledge the contributions of scientists and researchers from all over the world for depositing their HCoV-OC43 sequences in the National Center for Biotechnology Information. We are grateful to Professor Hongjie Yu from the School of Public Health at Fudan University, a leading scholar in infectious disease epidemiology, for his valuable guidance and support during the revision of this manuscript.
References
- 1. McIntosh K, Dees JH, Becker WB, Kapikian AZ, Chanock RM. Recovery in tracheal organ cultures of novel viruses from patients with respiratory disease. Proc Natl Acad Sci U S A. 1967;57(4):933–40. pmid:5231356
- 2. Kesheh MM, Hosseini P, Soltani S, Zandi M. An overview on the seven pathogenic human coronaviruses. Rev Med Virol. 2022;32(2):e2282. pmid:34339073
- 3. Park S, Lee Y, Michelow IC, Choe YJ. Global Seasonality of Human Coronaviruses: A Systematic Review. Open Forum Infect Dis. 2020;7(11):ofaa443. pmid:33204751
- 4. Dijkman R, Jebbink MF, Gaunt E, Rossen JWA, Templeton KE, Kuijpers TW, et al. The dominance of human coronavirus OC43 and NL63 infections in infants. J Clin Virol. 2012;53(2):135–9. pmid:22188723
- 5. Zhang Y, Li J, Xiao Y, Zhang J, Wang Y, Chen L, et al. Genotype shift in human coronavirus OC43 and emergence of a novel genotype by natural recombination. J Infect. 2015;70(6):641–50. pmid:25530469
- 6. Venter M, Lassaunière R, Kresfelder TL, Westerberg Y, Visser A. Contribution of common and recently described respiratory viruses to annual hospitalizations in children in South Africa. J Med Virol. 2011;83(8):1458–68. pmid:21678450
- 7. Ren L, Gonzalez R, Xu J, Xiao Y, Li Y, Zhou H, et al. Prevalence of human coronaviruses in adults with acute respiratory tract infections in Beijing, China. J Med Virol. 2011;83(2):291–7. pmid:21181925
- 8. Gaunt ER, Hardie A, Claas ECJ, Simmonds P, Templeton KE. Epidemiology and clinical presentations of the four human coronaviruses 229E, HKU1, NL63, and OC43 detected over 3 years using a novel multiplex real-time PCR method. J Clin Microbiol. 2010;48(8):2940–7. pmid:20554810
- 9. Nickbakhsh S, Ho A, Marques DFP, McMenamin J, Gunson RN, Murcia PR. Epidemiology of Seasonal Coronaviruses: Establishing the Context for the Emergence of Coronavirus Disease 2019. J Infect Dis. 2020;222(1):17–25. pmid:32296837
- 10. Xia S, Yan L, Xu W, Agrawal AS, Algaissi A, Tseng C-TK, et al. A pan-coronavirus fusion inhibitor targeting the HR1 domain of human coronavirus spike. Sci Adv. 2019;5(4):eaav4580. pmid:30989115
- 11. Oong XY, Ng KT, Takebe Y, Ng LJ, Chan KG, Chook JB, et al. Identification and evolutionary dynamics of two novel human coronavirus OC43 genotypes associated with acute respiratory infections: phylogenetic, spatiotemporal and transmission network analyses. Emerg Microbes Infect. 2017;6(1):e3. pmid:28050020
- 12. Lau SKP, Lee P, Tsang AKL, Yip CCY, Tse H, Lee RA, et al. Molecular epidemiology of human coronavirus OC43 reveals evolution of different genotypes over time and recent emergence of a novel genotype due to natural recombination. J Virol. 2011;85(21):11325–37. pmid:21849456
- 13. Lin P, Wang M, Wei Y, Kim T, Wei X. Coronavirus in human diseases: Mechanisms and advances in clinical treatment. MedComm (2020). 2020;1(3):270–301. pmid:33173860
- 14. Zhang R, Wang K, Ping X, Yu W, Qian Z, Xiong S, et al. The ns12.9 Accessory Protein of Human Coronavirus OC43 Is a Viroporin Involved in Virion Morphogenesis and Pathogenesis. J Virol. 2015;89(22):11383–95. pmid:26339053
- 15. Mounir S, Labonté P, Talbot PJ. Characterization of the nonstructural and spike proteins of the human respiratory coronavirus OC43: comparison with bovine enteric coronavirus. Adv Exp Med Biol. 1993;342:61–7. pmid:8209772
- 16. Masters PS. The molecular biology of coronaviruses. Adv Virus Res. 2006;66:193–292. pmid:16877062
- 17. Kim MI, Lee C. Human Coronavirus OC43 as a Low-Risk Model to Study COVID-19. Viruses. 2023;15(2):578. pmid:36851792
- 18. De Thoisy A, Woudenberg T, Pelleau S, Donnadieu F, Garcia L, Pinaud L, et al. Seroepidemiology of the Seasonal Human Coronaviruses NL63, 229E, OC43 and HKU1 in France. Open Forum Infect Dis. 2023;10(7):ofad340. pmid:37496603
- 19. Soni MK, Migliori E, Fu J, Assal A, Chan HT, Pan J, et al. The prospect of universal coronavirus immunity: characterization of reciprocal and non-reciprocal T cell responses against SARS-CoV2 and common human coronaviruses. Front Immunol. 2023;14:1212203. pmid:37901229
- 20. Dangi T, Palacio N, Sanchez S, Park M, Class J, Visvabharathy L, et al. Cross-protective immunity following coronavirus vaccination and coronavirus infection. J Clin Invest. 2021;131(24):e151969. pmid:34623973
- 21. Pinto D, Sauer MM, Czudnochowski N, Low JS, Tortorici MA, Housley MP, et al. Broad betacoronavirus neutralization by a stem helix-specific human antibody. Science. 2021;373(6559):1109–16. pmid:34344823
- 22. Dacon C, Tucker C, Peng L, Lee C-CD, Lin T-H, Yuan M, et al. Broadly neutralizing antibodies target the coronavirus fusion peptide. Science. 2022;377(6607):728–35. pmid:35857439
- 23. Sun X, Yi C, Zhu Y, Ding L, Xia S, Chen X, et al. Neutralization mechanism of a human antibody with pan-coronavirus reactivity including SARS-CoV-2. Nat Microbiol. 2022;7(7):1063–74. pmid:35773398
- 24. Piñana M, González-Sánchez A, Andrés C, Vila J, Creus-Costa A, Prats-Méndez I, et al. Genomic evolution of human respiratory syncytial virus during a decade (2013-2023): bridging the path to monoclonal antibody surveillance. J Infect. 2024;88(5):106153. pmid:38588960
- 25. Leung NHL. Transmissibility and transmission of respiratory viruses. Nat Rev Microbiol. 2021;19(8):528–45. pmid:33753932
- 26. Zhang J, Litvinova M, Liang Y, Wang Y, Wang W, Zhao S, et al. Changes in contact patterns shape the dynamics of the COVID-19 outbreak in China. Science. 2020;368(6498):1481–6. pmid:32350060
- 27. Delamater PL, Street EJ, Leslie TF, Yang YT, Jacobsen KH. Complexity of the Basic Reproduction Number (R0). Emerg Infect Dis. 2019;25(1):1–4. pmid:30560777
- 28. Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. N Engl J Med. 2020;382(13):1199–207. pmid:31995857
- 29. Wu JT, Leung K, Leung GM. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet. 2020;395(10225):689–97. pmid:32014114
- 30. Walsh EE, Shin JH, Falsey AR. Clinical impact of human coronaviruses 229E and OC43 infection in diverse adult populations. J Infect Dis. 2013;208(10):1634–42. pmid:23922367
- 31. Park M, Cook AR, Lim JT, Sun Y, Dickens BL. A systematic review of COVID-19 epidemiology based on current evidence. J Clin Med. 2020;9(4):967. pmid:32244365
- 32. Ziebuhr J, Snijder EJ, Gorbalenya AE. Virus-encoded proteinases and proteolytic processing in the Nidovirales. J Gen Virol. 2000;81(Pt 4):853–79. pmid:10725411
- 33. Ma W, Fu H, Jian F, Cao Y, Li M. Immune evasion and ACE2 binding affinity contribute to SARS-CoV-2 evolution. Nat Ecol Evol. 2023;7(9):1457–66. pmid:37443189
- 34. Li G, Hilgenfeld R, Whitley R, De Clercq E. Therapeutic strategies for COVID-19: progress and lessons learned. Nat Rev Drug Discov. 2023;22(6):449–75. pmid:37076602
- 35. Hashemian SMR, Sheida A, Taghizadieh M, Memar MY, Hamblin MR, Bannazadeh Baghi H. Paxlovid (Nirmatrelvir/Ritonavir): A new approach to Covid-19 therapy? Biomed Pharmacother. 2023;162:114367. pmid:37018987
- 36. Painter GR, Natchus MG, Cohen O, Holman W, Painter WP. Developing a direct acting, orally available antiviral agent in a pandemic: the evolution of molnupiravir as a potential treatment for COVID-19. Curr Opin Virol. 2021;50:17–22. pmid:34271264
- 37. Painter WP, Holman W, Bush JA, Almazedi F, Malik H, Eraut NCJE, et al. Human Safety, Tolerability, and Pharmacokinetics of Molnupiravir, a Novel Broad-Spectrum Oral Antiviral Agent with Activity Against SARS-CoV-2. Antimicrob Agents Chemother. 2021;65(5):e02428-20. pmid:33649113
- 38. Blair HA. Remdesivir: A Review in COVID-19. Drugs. 2023;83(13):1215–37. pmid:37589788
- 39. Sheahan TP, Sims AC, Zhou S, Graham RL, Pruijssers AJ, Agostini ML, et al. An orally bioavailable broad-spectrum antiviral inhibits SARS-CoV-2 in human airway epithelial cell cultures and multiple coronaviruses in mice. Sci Transl Med. 2020;12(541):eabb5883. pmid:32253226
- 40. Wahl A, Gralinski LE, Johnson CE, Yao W, Kovarova M, Dinnon KH 3rd. SARS-CoV-2 infection is effectively treated and prevented by EIDD-2801. Nature. 2021;591(7850):451–7. pmid:33561864
- 41. Korber B, Fischer WM, Gnanakaran S, Yoon H, Theiler J, Abfalterer W, et al. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus. Cell. 2020;182(4):812-827.e19. pmid:32697968
- 42. Li Q, Wu J, Nie J, Zhang L, Hao H, Liu S, et al. The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity. Cell. 2020;182(5):1284-1294.e9. pmid:32730807
- 43. Liu H, Wei P, Zhang Q, Chen Z, Aviszus K, Downing W, et al. 501Y.V2 and 501Y.V3 variants of SARS-CoV-2 lose binding to bamlanivimab in vitro. MAbs. 2021;13(1):1919285. pmid:34074219
- 44. Sola I, Almazán F, Zúñiga S, Enjuanes L. Continuous and Discontinuous RNA Synthesis in Coronaviruses. Annu Rev Virol. 2015;2(1):265–88. pmid:26958916
- 45. Yang Y, Yan W, Hall AB, Jiang X. Characterizing Transcriptional Regulatory Sequences in Coronaviruses and Their Role in Recombination. Mol Biol Evol. 2021;38(4):1241–8. pmid:33146390
- 46. Kistler KE, Bedford T. Evidence for adaptive evolution in the receptor-binding domain of seasonal coronaviruses OC43 and 229e. Elife. 2021;10:e64509. pmid:33463525
- 47. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–6. pmid:25609793
- 48. Alonge M, Lebeigle L, Kirsche M, Jenike K, Ou S, Aganezov S, et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 2022;23(1):258. pmid:36522651
- 49. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30(14):3059–66. pmid:12136088
- 50. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol. 2016;33(7):1870–4. pmid:27004904
- 51. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol. 2020;37(5):1530–4. pmid:32011700
- 52. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49(W1):W293–6. pmid:33885785
- 53. Kin N, Miszczak F, Lin W, Gouilh MA, Vabret A, EPICOREM Consortium. Genomic Analysis of 15 Human Coronaviruses OC43 (HCoV-OC43s) Circulating in France from 2001 to 2013 Reveals a High Intra-Specific Diversity with New Recombinant Genotypes. Viruses. 2015;7(5):2358–77. pmid:26008694
- 54. Ren L, Zhang Y, Li J, Xiao Y, Zhang J, Wang Y, et al. Genetic drift of human coronavirus OC43 spike gene during adaptive evolution. Sci Rep. 2015;5:11451. pmid:26099036
- 55. Abidha CA, Nyiro J, Kamau E, Abdullahi O, Nokes DJ, Agoti CN. Transmission and evolutionary dynamics of human coronavirus OC43 strains in coastal Kenya investigated by partial spike sequence analysis, 2015-16. Virus Evolution. 2020;6(1):veaa031. pmid:32523779
- 56. Zhu Y, Li C, Chen L, Xu B, Zhou Y, Cao L, et al. A novel human coronavirus OC43 genotype detected in mainland China. Emerg Microbes Infect. 2018;7(1):173. pmid:30377292
- 57. Zhang Z, Liu W, Zhang S, Wei P, Zhang L, Chen D, et al. Two novel human coronavirus OC43 genotypes circulating in hospitalized children with pneumonia in China. Emerg Microbes Infect. 2022;11(1):168–71. pmid:34907853
- 58. Lau SKP, Li KSM, Li X, Tsang K-Y, Sridhar S, Woo PCY. Fatal Pneumonia Associated With a Novel Genotype of Human Coronavirus OC43. Front Microbiol. 2022;12:795449. pmid:35095806
- 59. Alamri KA, Farrag MA, Aziz IM, Dudin GA, Mohammed AA, Almajhdi FN. Prevalence of Human Coronaviruses in Children and Phylogenetic Analysis of HCoV-OC43 during 2016-2022 in Riyadh, Saudi Arabia. Viruses. 2022;14(12):2592. pmid:36560596
- 60. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4(1):vey016. pmid:29942656
- 61. Sagulenko P, Puller V, Neher RA. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 2018;4(1):vex042. pmid:29340210
- 62. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst Biol. 2018;67(5):901–4. pmid:29718447
- 63. Kosakovsky Pond SL, Frost SDW. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol. 2005;22(5):1208–22. pmid:15703242
- 64. Murrell B, Wertheim JO, Moola S, Weighill T, Scheffler K, Kosakovsky Pond SL. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 2012;8(7):e1002764. pmid:22807683
- 65. Murrell B, Moola S, Mabona A, Weighill T, Sheward D, Kosakovsky Pond SL, et al. FUBAR: a fast, unconstrained bayesian approximation for inferring selection. Mol Biol Evol. 2013;30(5):1196–205. pmid:23420840
- 66. Kosakovsky Pond SL, Poon AFY, Velazquez R, Weaver S, Hepler NL, Murrell B, et al. HyPhy 2.5-A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies. Mol Biol Evol. 2020;37(1):295–9. pmid:31504749
- 67. Tortorici MA, Walls AC, Lang Y, Wang C, Li Z, Koerhuis D, et al. Structural basis for human coronavirus attachment to sialic acid receptors. Nat Struct Mol Biol. 2019;26(6):481–9. pmid:31160783
- 68. Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, et al. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 2021;30(1):70–82. pmid:32881101