Uncovering potential host proteins and pathways that may interact with eukaryotic short linear motifs in viral proteins of MERS, SARS and SARS2 coronaviruses that infect humans

A coronavirus pandemic caused by a novel coronavirus (SARS-CoV-2) has spread rapidly worldwide since December 2019. Improved understanding and new strategies to cope with novel coronaviruses are urgently needed. Viruses (especially RNA viruses) encode viral proteins that are limited in number and size (length of polypeptide chain) and must interact with host cell components to control (hijack) the host cell machinery. To achieve this goal, extensive mimicry of the short linear motifs (SLiMs) found in host proteins provides an effective strategy. However, little is known regarding SLiMs in coronavirus proteins and their potential targets in host cells. The objective of this study is to uncover SLiMs in those coronavirus proteins that are present within host cells. These SLiMs have a high likelihood of interacting with host intracellular proteins and hijacking the host cell machinery for virus replication and dissemination. In total, 1,479 SLiM hits were identified in the 16 proteins of 590 coronaviruses infecting humans. Overall, 106 host proteins were identified that may interact with SLiMs in the 16 coronavirus proteins. These SLiM-interacting proteins include many key intracellular regulators, such as receptors, transcription factors and kinases, and may contribute substantially to virus replication, immune evasion and viral pathogenesis. A total of 209 pathways containing proteins that may interact with SLiMs in coronavirus proteins were identified. This study uncovers potential mechanisms by which coronaviruses hijack the host cell machinery. These results provide potential therapeutic targets for viral infections.


Introduction
Six coronaviruses have been known to infect humans: human coronavirus 229E (229E-CoV), human coronavirus NL63 (NL63-CoV), human coronavirus OC43 (OC43-CoV), human coronavirus HKU1 (HKU1-CoV), severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). 229E-CoV and NL63-CoV are members of the Alphacoronavirus genus and are responsible for the common cold. OC43-CoV and HKU1-CoV are betacoronaviruses that cause self-limiting upper respiratory tract infections. SARS-CoV and MERS-CoV cause severe lower respiratory tract infections [1,2]. A coronavirus pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been spreading rapidly worldwide since December 2019. Over 194,000 confirmed cases had been identified by March 2020 [3]. Increased understanding and new strategies to cope with novel coronaviruses are urgently needed.
Short linear motifs (SLiMs) are short protein sequences (3-10 residues) that can mediate protein-protein interactions. Three characteristics differentiate SLiMs from the globular domains in proteins [4,5]. First, the functional interaction interface of a SLiM is encoded in a short and poorly conserved sequence. Because the motifs are so short, they readily arise or disappear through mutation (i.e., they can appear de novo in unrelated protein sequences). Second, SLiMs increase the richness of functional motifs within a given length of protein sequence. Third, the interactions mediated by SLiMs tend to be transient and of low binding affinity because only a small number of residues are involved. SLiMs are therefore well suited to mediating fast protein-protein interactions, such as interactions between phosphorylation sites and their binding partners in signal transduction pathways. These characteristics provide a flexible molecular basis that allows the rapidly evolving proteins of RNA viruses to achieve high versatility.
Viruses (especially RNA viruses) encode viral proteins that are limited in number and size (length of polypeptide chain) and must establish complex networks of interactions with host cell components to control (hijack) the host cell machinery. To achieve this goal, extensive mimicry of host protein SLiMs provides an effective strategy [6][7][8][9]. Related studies have identified SLiMs in influenza A virus ribonucleoproteins [10] and protein-protein interactions between HIV-1 and human proteins mediated by SLiMs [11]. However, little is known regarding SLiMs in coronavirus proteins and their potential targets in host cells. The objective of this study is to uncover SLiMs in coronavirus proteins that are present in the intracellular compartments of human cells. These SLiMs have a high likelihood of interacting with human intracellular proteins and hijacking the human cell machinery for virus replication and dissemination.

Data sets
Sequences of fifteen nonstructural proteins (nsp1-10 and nsp12-16 from the open reading frame 1ab, orf1ab, of coronaviruses) and capsid proteins from 590 full-length genomes of coronaviruses isolated from human hosts were used. These proteins were studied because they perform their functions intracellularly in host cells. Therefore, SLiMs identified in these proteins may have the potential to interact with intracellular proteins of host cells, consequently affecting intracellular pathways in host cells. Protein sequences including 25 229E-CoVs, 60 NL63-CoVs, 39 HKU1-CoVs, 139 OC43-CoVs, 42 SARS-CoVs, 123 MERS-CoVs, 34 SARS-CoV-2s and 128 unclassified coronaviruses (other) were retrieved from the Virus Pathogen Resource (ViPR, https://www.viprbrc.org/) [12]. The accession numbers of virus genomes encoding the viral proteins used in this study are listed in S1 Table. In total, 289 consensus sequences (regular expressions) of SLiMs and information on associated interaction proteins and pathways were retrieved from the Eukaryotic Linear Motif Resource for Functional Sites in Proteins (http://elm.eu.org/) [13].

Data analysis
The data analysis procedures are shown in Fig 1. Sequence manipulation and SLiM identification (by regular expression matching) were performed using Perl scripts written by the authors. Phylogenetic analysis of orf1ab and capsid protein sequences was performed with ClustalX 2.1 using the neighbor-joining algorithm with 1,000 bootstrap replicates [14]. Heat maps were produced using the heatmap.2 function of the gplots package in R (https://www.r-project.org/). The website http://bioinformatics.psb.ugent.be/webtools/Venn/ was used to identify common and unique elements (intersection and union) among data sets.
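The regular-expression matching step described above can be illustrated with a minimal Python sketch. It is not the authors' Perl code: the function name `find_slim_hits`, the motif identifiers and the two patterns are hypothetical and simplified for illustration, and they are not entries copied from the ELM resource. The sketch assumes that ELM-style consensus patterns can be used directly as regular expressions and that overlapping matches should be counted as separate hits.

```python
import re

# Hypothetical motif classes. The patterns are simplified illustrations
# written for this sketch, not entries copied from the ELM resource.
EXAMPLE_MOTIFS = {
    "EXAMPLE_CTERM": r"[ST].[ACVILF]$",  # anchored to the C-terminus
    "EXAMPLE_PXXP": r"P..P",             # proline-rich core
}

def find_slim_hits(protein_id, sequence, motifs=EXAMPLE_MOTIFS):
    """Return (protein, motif, start, end, matched text) for every match.

    Positions are 1-based and inclusive.
    """
    hits = []
    for motif_id, pattern in motifs.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        scanner = re.compile("(?=(" + pattern + "))")
        for match in scanner.finditer(sequence):
            start = match.start(1)
            text = match.group(1)
            hits.append((protein_id, motif_id, start + 1, start + len(text), text))
    return hits

if __name__ == "__main__":
    # Toy sequence, not a real viral protein.
    for hit in find_slim_hits("toy_protein", "MPPLPSTSV"):
        print(hit)
```

Running the scan over every protein of every genome and collecting the tuples would give a hit table comparable in shape to the one summarized in the study, although the exact counting rules used by the authors are not specified here.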

Identification of SLiMs in coronavirus proteins
In total, 1,479 SLiM hits were identified in the 16 coronavirus proteins (Table 1). The complete results are listed in S2 Table. As shown in Fig 2, the topologies of the SLiM composition trees (Fig 2A and 2B) and the phylogenetic trees based on protein sequence similarity (Fig 2C and 2D) are different. These results indicate that SLiM compositions provide information that cannot be observed from a comparison of sequence similarity alone. According to the SLiM compositions of orf1ab-encoded proteins and capsid proteins, the 590 coronaviruses were divided into four groups: (229E-CoV and NL63-CoV), (HKU1-CoV and OC43-CoV), (MERS-CoV and the unclassified group in the ViPR database), and (SARS-CoV and SARS-CoV-2). As a consequence, seven groups (g1-g7) of SLiMs were separated from the 1,479 SLiM hits. Since one SLiM can be identified repeatedly in one or more viral proteins, the nonredundant SLiMs of the seven groups are summarized in S3-S9 Tables.
Group 1 (g1) SLiMs were composed of 42 SLiMs identified in proteins of all 229E-CoVs and NL63-CoVs but not identified in proteins of other coronaviruses (S3 Table). Group 2 (g2) SLiMs were composed of 26 SLiMs identified in proteins of all HKU1-CoVs and OC43-CoVs but not identified in proteins of other coronaviruses (S4 Table). Group 3 (g3) SLiMs were composed of 42 SLiMs identified in proteins of all MERS-CoVs and unclassified coronaviruses but not identified in proteins of other coronaviruses (S5 Table). Group 4 (g4) SLiMs were composed of 46 SLiMs identified in proteins of all SARS-CoVs and SARS-CoV-2s but not identified in proteins of other coronaviruses (S6 Table). Group 5 (g5) SLiMs were composed of 66 SLiMs identified in proteins of all 590 coronaviruses (S7 Table). Group 6 (g6) SLiMs were composed of 12 SLiMs identified in proteins of all SARS-CoVs but not identified in proteins of other coronaviruses (S8 Table). Group 7 (g7) SLiMs were composed of 11 SLiMs identified in proteins of all SARS-CoV-2s but not identified in proteins of other coronaviruses (S9 Table).
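The grouping rule used above (a SLiM is found in all members of a viral group and in no virus outside it, or in every virus for g5) can be expressed with set operations. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the function names, virus identifiers and motif identifiers are hypothetical, and the input is assumed to be one set of nonredundant SLiM identifiers per genome.

```python
def group_specific_slims(slims_by_virus, group_of):
    """Return, per group label, the SLiMs found in every member of the
    group and in no virus outside the group.

    slims_by_virus: dict mapping virus id -> set of SLiM ids
    group_of:       dict mapping virus id -> group label (same keys)
    """
    specific = {}
    for label in sorted(set(group_of.values())):
        members = [v for v in slims_by_virus if group_of[v] == label]
        others = [v for v in slims_by_virus if group_of[v] != label]
        if not members:
            continue
        in_all_members = set.intersection(*(set(slims_by_virus[v]) for v in members))
        in_any_other = set().union(*(slims_by_virus[v] for v in others))
        specific[label] = in_all_members - in_any_other
    return specific

def universal_slims(slims_by_virus):
    """Return the SLiMs found in every virus (the analogue of g5)."""
    return set.intersection(*(set(s) for s in slims_by_virus.values()))

if __name__ == "__main__":
    # Toy data with hypothetical identifiers.
    toy = {
        "virus_a1": {"M1", "M2", "M5"},
        "virus_a2": {"M1", "M3", "M5"},
        "virus_b1": {"M3", "M4", "M5"},
    }
    labels = {"virus_a1": "A", "virus_a2": "A", "virus_b1": "B"}
    print(group_specific_slims(toy, labels))
    print(universal_slims(toy))
```

Nested groups such as g4 versus g6 and g7 would be obtained by calling the function with different labelings, for example once with SARS-CoV and SARS-CoV-2 sharing a label and once with separate labels.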

Human proteins interacting with SLiMs identified in coronavirus proteins
To reveal the target host proteins that interact with SLiMs identified in coronavirus proteins, the information on SLiM-interacting proteins was integrated into the seven groups of identified SLiMs and is shown in S10-S16 Tables. The SLiM-interacting proteins that are shared between, or unique to, the different viral groups are summarized in Fig 3. Fifty-seven human proteins may interact, through SLiMs, with viral proteins of the SARS-CoV, SARS-CoV-2 and MERS-CoV groups. Thirty-five of the fifty-seven human proteins interacting with viral proteins of SARS-CoV, SARS-CoV-2 and MERS-CoV are listed in Table 2. The other twenty-two of the fifty-seven human proteins, which interact with viral proteins of SARS-CoV, SARS-CoV-2 or MERS-CoV, are listed in Table 3. These results uncovered human intracellular proteins that may be specifically targeted by SARS-CoV, SARS-CoV-2 or MERS-CoV through SLiMs.
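The comparison of common and unique SLiM-interacting proteins between viral groups amounts to partitioning proteins by the exact set of groups in which they occur, which is what a Venn diagram displays. The Python sketch below shows one way to compute those regions. It is an assumption-laden illustration: the function name `partition_by_membership` and the protein placeholders are hypothetical, and the toy assignments do not reproduce the study's tables.

```python
from collections import defaultdict

def partition_by_membership(proteins_by_group):
    """Map each combination of group labels to the proteins found in
    exactly that combination of groups (one Venn region per key)."""
    membership = defaultdict(set)
    for group, proteins in proteins_by_group.items():
        for protein in proteins:
            membership[protein].add(group)
    regions = defaultdict(set)
    for protein, groups in membership.items():
        regions[frozenset(groups)].add(protein)
    return dict(regions)

if __name__ == "__main__":
    # Placeholder protein names; the assignments are invented for illustration.
    toy = {
        "SARS-CoV": {"PROT_1", "PROT_2"},
        "SARS-CoV-2": {"PROT_1", "PROT_3"},
        "MERS-CoV": {"PROT_1"},
    }
    for groups, proteins in partition_by_membership(toy).items():
        print(sorted(groups), sorted(proteins))
```

Using the exact group combination as the key keeps proteins shared by all three groups separate from those shared by only two, which matches the distinction drawn between Table 2 and Table 3.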

Potential pathways that may be targeted and affected by SLiMs in coronavirus proteins
To reveal the potential effects of SLiMs identified in viral proteins, information on SLiM-interacting protein-associated pathways was integrated into the seven groups of identified SLiMs and is shown in S17-S23 Tables. The common Table 4. These pathways may be specifically targeted by SARS-CoV, SARS-CoV-2 or MERS-CoV through SLiMs.
Three pathways (hsa04340: Hedgehog signaling pathway, hsa04723: Retrograde endocannabinoid signaling and hsa04970: Salivary secretion) may be targeted by SLiMs that are present only in MERS-CoV, SARS-CoV and SARS-CoV-2 proteins (Table 2). Emerging evidence indicates that the Hedgehog (Hh) signaling pathway is involved in postnatal processes such as tissue repair and adult immune responses. Accordingly, Hh signaling has been reported as a target through which some pathogens control the local infected environment [15,16]. These studies indicate that select populations of immune cells are primed to respond to the Hh signal, often resulting in proliferation and upregulation of a subset of cytokines. Upregulated Hh signaling, as occurs in certain tissues during HBV, HCV, EBV, and HIV infections [17][18][19][20], may shift an environment that supports modest replication toward one that supports high-level replication. In contrast, limited pathway activation, as seen in influenza A virus infection, may suppress the immune response, allowing the virus to evade it and/or protecting the host from detrimental outcomes [21].
The retrograde endocannabinoid system (RECS) contains the cannabinoid receptors CB1 and CB2. CB2 receptors are most abundant in the immune system (spleen, tonsils, thymus gland and immune cells, including macrophages and leucocytes). Endocannabinoids act rapidly on immune functions because immune cells and CB2 receptors are present at localized sites throughout the body [22]. Cannabinoids exhibit profoundly immunosuppressive effects on inflammatory and cellular antiviral responses. Many viral infection studies, both in vitro and in vivo, have demonstrated that cannabinoid treatment leads to disease progression, increased pathology, and sometimes host death [23]. Therefore, in many clinical settings, including acute infections caused by influenza A viruses and persistent infection of the liver caused by hepatitis C virus, cannabinoids lead to worsened disease outcomes [24,25]. Targeting retrograde endocannabinoid signaling through SLiMs in coronavirus proteins may therefore favor coronavirus infection.
The salivary gland has been proposed as a target of SARS-CoV and SARS-CoV-2 infections because the angiotensin-converting enzyme 2 (ACE2) protein is expressed in the salivary gland duct epithelium [26,27]. Several reports suggest the use of saliva for the diagnosis and monitoring of SARS-CoV and SARS-CoV-2 infections [28][29][30]. Targeting the salivary pathway by SLiMs in coronavirus proteins may be beneficial for coronavirus transmission.

Discussion
Viruses are obligate parasites completely dependent on host cells for replication and dissemination. Infection with viruses activates a series of host antiviral responses that inhibit viral replication and dissemination. Viruses have had to evolve mechanisms to evade and subvert those host antiviral responses. Encoding SLiMs in viral proteins provides a way to hijack, mimic and/or manipulate intracellular regulatory processes such as signal transduction, the cell cycle, DNA damage repair and host immune responses [6][7][8][9]. It has been reported that SARS-CoV can induce G0/G1-phase arrest of infected host cells [38][39][40][41]. The SARS-CoV 3b nonstructural protein can induce cell cycle arrest at the G0/G1 phase [42]. Additionally, the SARS-CoV 7a nonstructural protein can inhibit cell growth and induce G0/G1-phase arrest. Expression of 7a was shown to decrease the levels of cyclin D3 and inhibit phosphorylation of pRb [38]. Moreover, many cell cycle-associated proteins, such as cyclins, cdks and E2Fs, were found to be targeted by proteins of RNA viruses [39][40][41]. Activation of the DNA damage response by RNA viruses (hepatitis C virus, influenza A virus, human immunodeficiency virus 1 and human T-cell lymphotropic virus 1) has been reported [43]. The Wnt pathway, a key pathway in cell signaling, has been reported to be dysregulated by DNA viruses such as Epstein-Barr virus, hepatitis B virus and human papillomavirus and by RNA viruses, including hepatitis C virus and human immunodeficiency virus [44]. All of these proteins and pathways were identified in this study (Tables 2 and 3). The proteins listed in Table 2
The relationships between SLiMs, SLiM-interacting proteins and SLiM-interacting protein-associated pathways are many-to-many (multiple SLiMs to multiple interacting proteins to multiple pathways, forming a network) rather than one-to-one (S17-S23 Tables).
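The many-to-many relationship between SLiMs, interacting proteins and pathways can be made concrete with two lookup tables chained together. The Python sketch below is a minimal illustration under stated assumptions: the function name `pathway_support` and all identifiers are hypothetical placeholders, and the two dictionaries stand in for the ELM-derived annotations without reproducing them.

```python
def pathway_support(slims, slim_to_proteins, protein_to_pathways):
    """Return a dict mapping each pathway to the set of (SLiM, protein)
    pairs through which it could be reached."""
    support = {}
    for slim in slims:
        for protein in slim_to_proteins.get(slim, ()):
            for pathway in protein_to_pathways.get(protein, ()):
                support.setdefault(pathway, set()).add((slim, protein))
    return support

if __name__ == "__main__":
    # Placeholder annotations, invented for illustration only.
    slim_to_proteins = {
        "MOTIF_A": {"PROT_1", "PROT_2"},
        "MOTIF_B": {"PROT_2"},
    }
    protein_to_pathways = {
        "PROT_1": {"PATH_X"},
        "PROT_2": {"PATH_X", "PATH_Y"},
    }
    result = pathway_support(["MOTIF_A", "MOTIF_B"], slim_to_proteins, protein_to_pathways)
    for pathway in sorted(result):
        print(pathway, sorted(result[pathway]))
```

Counting the pairs per pathway gives a rough measure of how many distinct routes lead from a group's SLiMs to that pathway, which is one way to read the statement that a pathway can be reached through several SLiMs and proteins at once.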
Although 26 SLiM-interacting proteins were identified only in SARS-CoV and/or SARS-CoV-2 proteins (Table 2, g4, g6 and g7) and 18 SLiM-interacting proteins were identified only in MERS-CoV proteins (Table 2, g3), very few pathways are targeted exclusively by SLiMs identified in MERS-CoV, SARS-CoV and/or SARS-CoV-2. The results shown in S15 Table indicate that more potential interactions may exist between SLiMs in SARS-CoV proteins and their interacting proteins (such as JUN, NFATC1, CASP9, CREB3, PTEN, ROCK1, CD40, RB1, E2F1 and MAP2K4). In contrast, the results shown in S16 Table indicate that more potential interactions may exist between SLiMs in SARS-CoV-2 proteins and their interacting proteins (such as JUN, MYC, CCNE1, POLD3, HIF1A, CDC20, CDC25C, MAPKAPK2, PTTG1, SREBF1, etc.). As a consequence, different amounts and compositions of SLiMs in different types and strains of coronaviruses may lead to different strengths, different target sites and different effects on host intracellular pathways.
Categories and numbers of pathways containing proteins interacting with SLiMs identified in the 16 coronavirus proteins are summarized in Fig 5. Four cell growth and death pathways affected by coronavirus proteins have been reported [38][39][40][41][42][43][44]. Sixteen immunity pathways, 27 signal transduction pathways and 17 endocrine system-associated pathways suggest that multiple key regulatory pathways in host cells may be targeted by viral proteins through SLiMs. Moreover, 26 infectious disease-associated pathways suggest that key regulatory pathways targeted by different pathogens may be common. Forty-seven noninfectious disease-associated pathways suggest that key regulators among these pathogenic pathways may be common. These potential key regulators, which may also be potential therapeutic targets for viral infections, are present in the lists of SLiM-interacting proteins in this study (Table 2 and S10-S16 Tables).

Conclusion
This study revealed interacting proteins and associated pathways that may be targeted by coronavirus proteins through SLiMs. Different amounts and compositions of SLiMs in different types and strains of coronaviruses may lead to different target sites, different target strengths (variation in the affinity of protein-protein interactions and in the number of target sites in a pathway), and, as a consequence, different effects on human intracellular proteins and pathways. These results provide potential targets (virus-host protein interactions) for the design of antiviral strategies.
Supporting information S1