The proximal proteome of 17 SARS-CoV-2 proteins links to disrupted antiviral signaling and host translation

Viral proteins localize within subcellular compartments to subvert host machinery and promote pathogenesis. To study SARS-CoV-2 biology, we generated an atlas of 2422 human proteins vicinal to 17 SARS-CoV-2 viral proteins using proximity proteomics. This identified viral proteins at specific intracellular locations, such as association of accessory proteins with intracellular membranes, and projected SARS-CoV-2 impacts on innate immune signaling, ER-Golgi transport, and protein translation. It identified viral protein adjacency to specific host proteins whose regulatory variants are linked to COVID-19 severity, including the TRIM4 interferon signaling regulator which was found proximal to the SARS-CoV-2 M protein. Viral NSP1 protein adjacency to the EIF3 complex was associated with inhibited host protein translation whereas ORF6 localization with MAVS was associated with inhibited RIG-I 2CARD-mediated IFNB1 promoter activation. Quantitative proteomics identified candidate host targets for the NSP5 protease, with specific functional cleavage sequences in host proteins CWC22 and FANCD2. This data resource identifies host factors proximal to viral proteins in living human cells and nominates pathogenic mechanisms employed by SARS-CoV-2.


Introduction
Other coronaviruses have also been shown to block host translation [10,11], inhibit interferon signaling [12,13], antagonize viral RNA sensing [14,15], and degrade host mRNAs [16]. The degree of homology between SARS-CoV-2 and other coronaviruses suggests the existence of both shared and divergent host protein interactions between its viral proteins and those of the other members of the coronavirus family.
Here we used proximity proteomics to identify the human proteins vicinal to 17 major SARS-CoV-2 proteins and, from that data and validation studies, to predict their likely location and function. We examined the intersection of the resulting atlas of human factors adjacent to SARS-CoV-2 viral proteins with risk loci associated with severe COVID-19 by genome-wide association studies (GWAS). This nominated specific viral protein-adjacent host candidates whose natural variation in expression may contribute to differences in COVID-19 susceptibility in the population. We also demonstrated that multiple SARS-CoV-2 products can affect host translation and host innate immune signaling and defined a list of potential host targets and pathways for the NSP5 protease. Taken together, these resource data plot the location of the 17 major SARS-CoV-2 proteins within the cell, define an atlas of human host proteins adjacent to them, and offer insight into potential pathogenic mechanisms engaged by SARS-CoV-2.

Host proteins proximal to viral proteins and their subcellular localization
To identify the human host proteins vicinal to the 17 major SARS-CoV-2-encoded viral proteins, HA epitope-tagged fusions of BASU-BirA [6] were generated with each of these 17 viral ORFs (Fig 1A). BASU was introduced at the N and C terminus to minimize disruption as previously described [17]. Samples were prepared from plasmid-transfected 293T cells after 2 hours of biotin labeling and the biotinylated proteins were then isolated using streptavidin. Samples were divided for LC-MS/MS and immunoblotting (S1 Fig). MS data search was performed and protein lists were analyzed and scored using the Significance Analysis of Interactome (SAINT) method [18]. Using a cutoff of a SAINT score of 0.9 generated a list of 2421 host protein hits comprising 1119 different host proteins across the 17 viral proteins studied (Figs 1B and S2, and S1 Table), 513 of which were unique to a specific viral protein. These data comprise a compendium of candidate human proteins adjacent to SARS-CoV-2-encoded proteins.
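The filtering and counting step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input rows, the scores, and the helper name `summarize` are hypothetical, and the cutoff is assumed to be inclusive (score at or above 0.9).

```python
from collections import defaultdict

# Hypothetical (bait, prey, SAINT score) rows; scores are illustrative only.
rows = [
    ("NSP1", "EIF3A", 0.98),
    ("NSP1", "EIF3B", 0.95),
    ("ORF6", "MAVS", 0.97),
    ("ORF9b", "MAVS", 0.93),
    ("ORF6", "NUP98", 0.99),
    ("M", "TRIM4", 0.91),
    ("M", "ACTB", 0.40),  # below the cutoff, so it is discarded
]

SAINT_CUTOFF = 0.9  # assumed inclusive


def summarize(rows, cutoff=SAINT_CUTOFF):
    """Return (bait-prey hits, distinct preys, preys seen with exactly one bait)."""
    baits_by_prey = defaultdict(set)
    for bait, prey, score in rows:
        if score >= cutoff:
            baits_by_prey[prey].add(bait)
    n_hits = sum(len(baits) for baits in baits_by_prey.values())
    n_distinct = len(baits_by_prey)
    n_bait_specific = sum(1 for baits in baits_by_prey.values() if len(baits) == 1)
    return n_hits, n_distinct, n_bait_specific


print(summarize(rows))
```

The three returned numbers correspond, in this toy setting, to the total hit count, the number of different host proteins, and the number of host proteins unique to a single viral protein.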
The identity of these 1119 human proteins provided clues to SARS-CoV-2 biology. Gene ontology (GO) term analysis (Fig 1B, 1C, and 1D and S2 Table) identified processes associated with SARS-CoV-2 viral protein impacts. These included translation initiation, RNA binding, the 26S proteasome, signaling, and SNARE-associated intracellular transport. It also identified adjacencies to major histocompatibility (MHC) proteins and components of the nuclear pore complex (NPC). A number of these processes, such as protein translation, are known to be affected by coronaviruses, while others, such as RNA-binding, are less well characterized.
Coronavirus proteins are labeled in light blue and virus-host interactions are connected by red edges, while host-host protein interactions obtained from high confidence STRING interactions are labeled in grey. Highlighted node clusters of similar function, including 26S proteasome components (black), MHC Class I (red), nuclear pore (dark blue), RNA-binding (maroon), SNARE complex (purple), and translation initiation complex (green) proteins, were selected based on GO term analysis. C) Selected biological process GO term enrichment; enrichment scores are given as -Log10 p-values. Selected GO terms are nuclear pore organization, translational initiation, endosomal transport, and RNA splicing. D) Heatmap of molecular function GO term enrichment of SARS-CoV-2 proteins. All presented GO terms have a -Log10 p-value >3 for the Nucleoprotein, the listed non-structural proteins, or the listed open reading frames or a -Log10 p-value >5 for the M membrane protein. https://doi.org/10.1371/journal.ppat.1009412.g001

PLOS PATHOGENS
SARS-CoV-2 proteomics reveals host pathogen interactions

To begin to map putative localizations for the 17 studied SARS-CoV-2 proteins within the cell, cellular component GO-term enrichment analysis was performed (Fig 2A), which pointed to possible intracellular localizations for each viral protein based on curated knowledge of the host proteins identified adjacent to each viral protein. To validate and extend this, protein fractions were prepared from cells expressing each SARS-CoV-2 protein studied. These included four overlapping fractions: a) cytoplasm, b) cytoplasm/membrane, c) nucleus/membrane, and d) nucleus (Fig 2B). Integrating GO-term analysis with immunoblotting of these fractions enabled predictions of the likely intracellular localization of each viral protein (Fig 2C and 2D). Many SARS-CoV-2 accessory proteins concentrate in the ER or in ER-proximal membranes (M, ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, and ORF10). A number, however, appear to be predominantly cytoplasmic (NSP1, NSP2, NSP5, NSP15, ORF9b) and, interestingly, several appear to localize in part to the nucleus or nuclear membrane (NSP9, NSP14, ORF6, ORF9c). The localization predicted from these data is consistent with observations from other recent work [17,19]. Of the membrane localized proteins, subtle differences in location could be inferred. In the case of M protein, association with membranes in the endocytic pathway as well as lysosomal membranes was predicted. ORF8 and ORF10 clustered similarly with enrichment for ER interactions in the lumen.
Additionally, we sought to confirm the location of select viral proteins during infection. This would allow us both to assess whether localization matched under conditions where all viral proteins are expressed at normal levels and to rule out localization artifacts from transient overexpression of our BASU-tagged constructs. Specifically, A549 cells expressing the SARS-CoV-2 receptor, ACE2, were infected with SARS-CoV-2 at an MOI of 0.1, fixed with 4% paraformaldehyde, and stained with antibodies [20] against SARS-CoV-2 proteins (NSP1, NSP14, ORF6) along with host proteins identified from the BioID (EIF3A, Rae1, Nup98, MCM3AP) and Hoechst for nuclear staining (Fig 3). NSP1 displayed punctate cytoplasmic staining (Fig 3A), while NSP14 stained both perinuclear and nuclear regions (Fig 3B), and ORF6 stained mostly perinuclear regions (Fig 3C). The staining patterns of infected cells largely mirrored those observed in the BioID and fractionation studies. These data indicate that specific SARS-CoV-2 proteins may display increased localization to a variety of intracellular sites, including the cytoplasm, nucleus and distinct endomembranes.
As there exists an orthogonal proteomics dataset [17], our results were compared with it. Rather than using proximity labeling to identify interactors, that study used TAP-MS to identify co-purified interactors of SARS-CoV-2 viral proteins. These two approaches have been found to be complementary [21,22]: TAP-MS enriches for stable complexes, while proximity labeling enriches for neighboring proteins in the endogenous environment of the bait. To compare the host proteomes identified in the two studies, we first reanalyzed the data from the TAP-MS study [17] to call all hits with a SAINT score ≥ 0.9, so that high-confidence interactors were called equally in both datasets. This generated a list of 695 host proteins. Comparison between the two studies yielded 960 proteins unique to the current study, 536 unique to the TAP-MS study, and 159 proteins shared between the two (S3 Fig). GO-term analysis for molecular function, biological process, and cellular component was performed on the hits from the TAP-MS data (S4, S5, and S6 Figs). Proximity labeling methods are particularly suited for identifying the location of bait proteins [23,24]; therefore, cell component analysis was compared between the two lists (S7 Fig). Given the large number of unique hits, it was not surprising to see major differences in cell component enrichment between viral proteins. NSP1 uniquely enriched for hits associated with preinitiation complex and ribosomal components by BioID, but not by TAP-MS. In contrast, NSP9 enriched for similar factors only by TAP-MS. Despite the differences, there were important similarities between the two approaches. For example, ORF6 was found to enrich for nuclear pore components using either method. Taken together, the two approaches appear to give complementary insights into potential targets and pathways in SARS-CoV-2 biology, and both are valid for discovery.
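The dataset comparison above reduces to set arithmetic on the two high-confidence hit lists. The sketch below is illustrative only: the function name `compare_hit_lists` and the toy gene sets are hypothetical, and the final lines simply check that the counts reported in the text are mutually consistent.

```python
def compare_hit_lists(proximity_hits, tapms_hits):
    """Split two hit lists into method-specific and shared host proteins."""
    proximity_hits, tapms_hits = set(proximity_hits), set(tapms_hits)
    return {
        "proximity_only": proximity_hits - tapms_hits,
        "tapms_only": tapms_hits - proximity_hits,
        "shared": proximity_hits & tapms_hits,
    }


# Toy example with made-up membership, not the published hit lists.
toy = compare_hit_lists(
    {"EIF3A", "NUP98", "RAE1", "MAVS"},
    {"NUP98", "RAE1", "TOMM70"},
)
print({name: sorted(genes) for name, genes in toy.items()})

# Counts reported in the text: unique-to-study plus shared should equal each total.
reported = {"proximity_only": 960, "tapms_only": 536, "shared": 159}
proximity_total = reported["proximity_only"] + reported["shared"]  # 1119
tapms_total = reported["tapms_only"] + reported["shared"]          # 695
print(proximity_total, tapms_total)
```

The two totals recover the 1119 host proteins of the current study and the 695 host proteins of the reanalyzed TAP-MS study.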
CoV-2 viral protein-expressing HEK293T cell fractions; whole cell lysate (WCL), cytosol, cytosol/membrane, nucleus/membrane and nucleus fractions. Alpha-tubulin, calnexin, and histone H3 were used as fractionation controls for cytosol, membrane, and nucleus respectively. Schematic C) and table D) depicting the predicted location of all SARS-CoV-2 proteins surveyed in this study based on both the BioID and fractionation analysis. https://doi.org/10.1371/journal.ppat.1009412.g002

Viral proximal interactors include drug targetable host genes
There is a lack of antiviral therapies specific to SARS-CoV-2 or active against coronaviruses generally. Many current and experimental therapeutics were developed for activity against other viruses and are being tested for cross efficacy against SARS-CoV-2. Others are therapies known to have broad antiviral effects. There is significant interest in developing drugs that directly target SARS-CoV-2 viral proteins, but research and development may take years before use in patients. Another approach is to use drugs against host genes critical to virus infection and replication. For example, drugs targeting ACE2, the main receptor for SARS-CoV-2, or ACE2 expression and function have been pursued. To expand the list of possible drugs beyond entry inhibitors, we compared the viral proximal proteome generated in this study against the "druggable" genome, which includes databases of the gene targets of available drugs. This generated a list of 47 host genes (S8 Fig and S3 Table) and highlights, as previously reported [17], a group of cellular kinases associated with N protein. The viral nucleocapsid has been shown to be phosphorylated and phosphorylation is suggested to be important for its function [25,26]. This highlights cellular kinase inhibitors as drugs with possible activity against SARS-CoV-2.

GWAS-linked host proteins in the viral proximal proteome
The genetic basis for the wide spectrum of COVID-19 severity in different individuals within the human population is not fully understood. A number of recent genome-wide association studies (GWAS) have endeavored to map genetic risk loci associated with SARS-CoV-2 infection and COVID-19 clinical severity [27,28]. These studies leverage large numbers of patients to identify SNPs that are correlated with outcomes such as infection and severity of disease, including hospitalization and mortality. Such linkage studies have identified a number of non-coding variants that may perform a regulatory function, for example, by altering expression of effect genes (eGenes) important in host susceptibility to SARS-CoV-2.
To determine if any putative COVID-19 risk-linked regulatory variants might control the expression of host proteins proximal to SARS-CoV-2 viral proteins, the following analysis was performed. Using publicly available data from GWAS studies [27,28], all single nucleotide polymorphisms (SNPs) associated with increased risk of COVID-19 that reside in noncoding DNA were identified. These were filtered for variants localized to open chromatin, characteristic of regulatory DNA, in cell types relevant to COVID-19 pathogenesis, including immune and pulmonary cells. The resulting disease risk-linked variants were further distilled to those identified as expression quantitative trait loci (eQTLs) for specific putative eGene targets (Fig 4A). These eGenes, which represent a set of genes whose expression may be controlled by natural variants in the human population linked to COVID-19 risk, were then intersected with the atlas of host factors identified as adjacent to SARS-CoV-2 viral proteins by proximity proteomics. Publicly available protein interaction data were then integrated to project the connectedness of the resulting gene set (Fig 4B). The resulting network was notable for host proteins implicated in cytokine signaling, cell cycle control, transcription, and translation, suggesting that genetic susceptibility to COVID-19 may link to variations in the expression of proteins that mediate these processes.
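The variant-filtering steps described above can be sketched as a chain of filters followed by a set intersection. This is a schematic under stated assumptions: the `Variant` record, its boolean annotations, the function name, and all identifiers in the toy data are hypothetical placeholders rather than real GWAS, chromatin, or eQTL records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    rsid: str                 # placeholder identifier
    noncoding: bool           # resides in noncoding DNA
    open_chromatin: bool      # open chromatin in a relevant cell type
    egenes: tuple = ()        # eQTL target genes; empty if none reported


def risk_egenes_in_atlas(variants, atlas):
    """Return eGenes of filtered risk variants that also appear in the proximity atlas."""
    egenes = set()
    for v in variants:
        if v.noncoding and v.open_chromatin and v.egenes:
            egenes.update(v.egenes)
    return egenes & set(atlas)


# Toy inputs with made-up annotations.
variants = [
    Variant("rs_example_1", True, True, ("GENE_A",)),
    Variant("rs_example_2", True, False, ("GENE_B",)),  # closed chromatin: dropped
    Variant("rs_example_3", False, True, ("GENE_C",)),  # coding: dropped
    Variant("rs_example_4", True, True, ()),            # no eQTL target: dropped
    Variant("rs_example_5", True, True, ("GENE_D",)),   # eGene absent from atlas
]
atlas = {"GENE_A", "GENE_B", "GENE_C"}

print(risk_egenes_in_atlas(variants, atlas))
```

Only genes that survive every filter and are also present in the proximity atlas are returned, which mirrors the order of operations in the text.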
Among proteins identified by this analysis was TRIM4, a RING E3 ligase that activates type I interferon signaling through activation of the cytosolic RNA sensor RIG-I. TRIM4 was significantly associated with SARS-CoV-2 M protein in proximity proteomics data (S1 Table) and, using eQTLgen [29], a regulatory SNP (rs1569055) approximately 230 kb downstream of the TRIM4 promoter was recently associated with increased COVID-19 severity in patients [28]. Hi-C chromatin immunoprecipitation (HiChIP) data from the immortalized B-cell line GM12878 as well as primary T-cell populations demonstrated chromatin looping from the SNP to the TRIM4 promoter (Fig 4C). Looping from this SNP showed increased contact strength in naïve T cells, Tregs, and Th17 cells. TRIM4 is one of a group of ubiquitin ligases [30][31][32][33] that can activate RIG-I during RNA-sensing and subsequent antiviral signaling. Altered expression coupled with disruption by potential association with the SARS-CoV-2 M protein supports a model with the following features: a) individuals with this regulatory variant may express less TRIM4; b) physical association with SARS-CoV-2 M protein further reduces functional TRIM4; c) a relative reduction in biologically active TRIM4 leads to reduced innate immune signaling; and d) this reduction leads to increased susceptibility to SARS-CoV-2 pathogenesis. Integrating proximity proteomics data with genetic risk eQTL variants may help identify such candidate susceptibility mechanisms for natural variations in disease outcomes within the population.
In addition to GWAS studies, several groups have undertaken genome-wide CRISPR screens to identify host factors required for viral replication [34][35][36]. The intersection of significant hits from these CRISPR screens and the viral proximal proteome from this study yielded few hits. Given that accessory factors are non-essential for viral replication, the lack of overlap is not surprising. Nevertheless, NXF1 and MCM3AP, known mediators of mRNA export [37], were found as hits in both CRISPR studies and as proximal interactors of ORF6. NXF1 and MCM3AP are part of an RNA export complex [38] that docks at nuclear pores.

Predicted viral antagonism of host protein translation and antiviral response
NSP1 is a part of the viral polyprotein ORF1 and during normal viral replication is cleaved and liberated by the viral protease NSP3. Earlier work has identified NSP1 of SARS-CoV-1 as a potent inhibitor of translation in a mechanism that involves interactions with the host ribosomes [10,13]. Recently other groups have shown that NSP1 of SARS-CoV-2 similarly blocks translation through interaction with the 40S ribosomal subunit [39,40]. High confidence proteins proximal to NSP1 included EIF3A, EIF3B, EIF2G, and EIF4G2 (Fig 5A), of which the first 3 are components of the EIF3 translation initiation complex. Interestingly, members of the EIF3 complex were not identified as high-confidence interactors by traditional TAP-MS studies [17,19]. Having confirmed through immunofluorescent microscopy that NSP1 predominantly resides in the cytosol of infected cells but only weakly co-localized with EIF3A, a member of the translation initiation complex, we sought to test whether SARS-CoV-2 NSP1 inhibits host translation. NSP1 was expressed in HEK293T cells, followed 24 hours later by transfection of in vitro-transcribed, capped and polyadenylated mRNA expressing luciferase. NSP1 reduced luciferase signal by 49.7% as compared to GFP control (Fig 5B), demonstrating that NSP1 can inhibit host cap-dependent translation, consistent with data reported by others [39,40]. To determine if NSP1 could inhibit translation of host-derived 5' UTRs and host IRES elements, two host UTRs (IFIT1 and ISG15) were subcloned separately upstream of luciferase along with two host IRES sequences (XIAP1 and APAF1), and luciferase was measured in cells transfected with or without the NSP1 construct. NSP1 reduced luciferase signal of both 5' UTRs (IFIT1 = 55.2%, ISG15 = 53.1%) and IRES elements (XIAP1 = 55.0%, APAF1 = 40.0%), indicating a block in translation of these elements (Fig 5B). Lastly, NSP1 effects were tested on the SARS-CoV-2 5' UTR and the Cricket Paralysis Virus (CRPV) IRES.
The CRPV IRES is a minimal viral-derived IRES that initiates translation completely independent of EIF3. Surprisingly, NSP1 blocked both viral elements (SARS-CoV-2 = 59.1%, CRPV = 52.2%) compared to GFP control. NSP1 therefore exhibits broad translation inhibition of mRNAs containing various regulatory elements, suggesting NSP1 action on the initiating ribosome; however, additional actions, such as mRNA cleavage [10,16], may also be operative. This highlights the utility of using proximal proteomic labeling to identify functional partners of viral proteins, even when the interactions are weak or transient, that cannot be readily identified by TAP-MS.
Host innate immune detection and signaling pathways are heavily targeted by viral proteins, especially accessory proteins [41]. Mitochondrial antiviral-signaling protein (MAVS) is a critical signaling adaptor for the RIG-I-like receptor (RLR) cytosolic sensing pathway [42][43][44][45]. It recruits activated RLR sensors RIG-I and MDA-5 at mitochondrial and mitochondrial-proximal membranes and leads to the activation of both IRF3 and NF-κB and expression of type-I interferons [46]. RIG-I and MDA-5 recognize various types of non-host or aberrant RNA species and are critical for host defense against RNA viruses [47]. MAVS was found as a high confidence protein proximal to two SARS-CoV-2 proteins: ORF6 and ORF9b (Fig 5C and 5D). ORF6 has been found to inhibit type-I interferon in SARS-CoV-2 [48,49] and in the closely related SARS-CoV-1 [12,50]. One study demonstrated that ORF6 inhibition of type-I interferon expression was linked to ORF6 binding to the nuclear import complex Nup98/Rae1 [49], both of which were not only captured as proximal interactors but were also observed to localize in proximity to ORF6, but not ORF9b. We tested the ability of our SARS-CoV-2 ORF6 and ORF9b constructs to inhibit RLR signaling by co-transfecting constructs expressing ORF6 or ORF9b along with a reporter expressing nanoluciferase under the control of the IFNB1 promoter for interferon β1 and a second reporter constitutively expressing firefly luciferase. To activate RLR signaling we transfected a plasmid expressing a truncated version of RIG-I containing only the 2 CARD domains. This truncation is constitutively recruited to MAVS and initiates signaling in the absence of any RNA stimulus, and therefore tests the viral proteins' ability to block signaling downstream of sensing. ORF6 significantly inhibited RIG-I 2CARD activation of IFNB1 promoter activity by 96% (Fig 5E), while ORF9b showed no effect on IFNB1 promoter activity.
These data demonstrating ORF6 proximity to MAVS, along with ORF6 inhibition of IFNB1 promoter induction, implicate ORF6 impairment of MAVS in the RLR innate immune signaling pathway.
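A dual-reporter readout of this kind is typically summarized by normalizing the inducible signal to the constitutive one and then comparing conditions. The sketch below assumes that convention; the function names and the raw readings are hypothetical, with the toy values chosen only so the arithmetic is easy to follow.

```python
def normalized_activity(nanoluc, firefly):
    """IFNB1-promoter nanoluciferase signal normalized to constitutive firefly signal."""
    if firefly <= 0:
        raise ValueError("firefly control signal must be positive")
    return nanoluc / firefly


def percent_inhibition(control_reads, treated_reads):
    """Percent loss of normalized reporter activity relative to the control condition.

    Each argument is a (nanoluc, firefly) pair of raw readings.
    """
    baseline = normalized_activity(*control_reads)
    if baseline <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - normalized_activity(*treated_reads) / baseline)


# Made-up luminescence readings (arbitrary units), both with RIG-I 2CARD stimulation.
control_reads = (5000.0, 1000.0)  # control construct
viral_reads = (180.0, 900.0)      # viral protein construct

print(round(percent_inhibition(control_reads, viral_reads), 1))
```

Normalizing to the constitutive reporter guards against differences in transfection efficiency or cell number between wells, which matters when a viral protein may itself affect cell viability.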

NSP5 proteomics prediction of potential host cleavage targets
NSP5 is one of two critical proteases encoded by SARS-CoV-2 and is also known as SARS-CoV-2 3CLpro due to its similarity to the 3C proteases of picornaviruses and a number of other +ssRNA viruses. These proteases all contain chymotrypsin-like folds and a triad of residues harboring the critical cysteine residue [51]. 3CLpro-like proteases are considered important therapeutically since they are essential for cleaving large polyprotein products produced by +ssRNA viruses and chemical protease inhibitors may act broadly across members of a given virus family [52,53]. In addition to their necessity in the virus life cycle, many viral proteases can target host proteins and specifically affect antiviral responses or other cellular processes [54][55][56]. Complementing previous efforts to infer targets of the NSP5 protease, we identified 34 host proteins in the NSP5 proximal proteome (S2C Fig and S1 Table). To nominate possible host targets of NSP5 whose levels are decreased upon protease expression, we performed SILAC mass spectrometry comparing wild type SARS-CoV-2 NSP5 to the catalytically inactive NSP5 C145A mutant [17,57]. Residue 145 is the critical catalytic cysteine and mutation to alanine prevents protease activity [58]. Cell death was observed following transient expression of wild type NSP5, but not NSP5 C145A, so cells were collected 24 hours post-transfection to minimize those effects. Immunoblotting for NSP5 and C145A in whole cell lysates confirmed expression of the proteases in samples from both heavy and light media (S9 Fig). A number of host proteins showed significant depletion in cells expressing wild type NSP5, but not protease-inactive NSP5 C145A (Fig 6A). Combining the two datasets identified an additional 26 candidates, resulting in a pool of 60 potential host protein targets for NSP5 (Fig 6B).
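The pooling of proximity hits with SILAC-depleted proteins can be sketched as a threshold followed by a set union. The threshold value, the function name, and the protein labels below are hypothetical; the text states only that depletion was significant, not which cutoff was applied.

```python
def nominate_targets(proximal_hits, silac_log2_ratio, max_log2_ratio=-1.0):
    """Pool proximity hits with proteins depleted under wild type versus C145A.

    silac_log2_ratio maps protein -> log2(wild type / C145A) abundance.
    max_log2_ratio is an assumed depletion cutoff (here, at least 2-fold lower).
    Returns (combined candidate pool, candidates added by SILAC alone).
    """
    proximal_hits = set(proximal_hits)
    depleted = {p for p, r in silac_log2_ratio.items() if r <= max_log2_ratio}
    additional = depleted - proximal_hits
    return proximal_hits | depleted, additional


# Toy inputs with placeholder protein labels.
proximal = {"PROT_A", "PROT_B", "PROT_C"}
ratios = {"PROT_C": -1.8, "PROT_D": -2.3, "PROT_E": -0.2, "PROT_F": -1.0}

pool, added = nominate_targets(proximal, ratios)
print(sorted(pool), sorted(added))

# Counts reported in the text: 34 proximal proteins plus 26 additional candidates.
reported_pool = 34 + 26
print(reported_pool)
```

Counting only the candidates not already in the proximal proteome as "additional" is what makes the reported 34 and 26 sum to the pool of 60.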

Normalized FRET signal is shown comparing HEK293T cells expressing either wild type NSP5 or NSP5 C145A. PP1ab and CWC22 mutant (mut) sequences contain nucleic acid changes that result in a QS→AS mutation in the peptide sequence. Data shown are representative of three independent experiments and significance was calculated using Student's t test. * indicates p value <0.05; NS, not significant. Dose-dependent effect of coronavirus protease inhibitor GC376 on cleavage of peptide encoded by PP1ab-2 D) and FANCD2-2 E) sequences. https://doi.org/10.1371/journal.ppat.1009412.g006

To begin to examine potential cleavage of these candidate proteins by NSP5, we searched their peptide sequences for potential cleavage sites using a published cleavage prediction algorithm [59]. We then took these peptide sequences and tested them for cleavage by NSP5 using a loss of fluorescence resonance energy transfer (FRET) fluorescence assay. In brief, potential cleavage sites were inserted between a FRET pair and this construct was then cotransfected along with plasmids expressing either NSP5 or NSP5 C145A, with loss of FRET signal only after wild type NSP5 expression taken as indicative of cleavage. This assay can be used to screen potential peptides encoded by host target sequences. To this end, four sequences taken from the SARS-CoV-2 pp1ab polyprotein, which is normally cleaved by NSP5, were cleaved as expected, as demonstrated by loss of FRET signal (Fig 6C and 6D). Testing of sequences from human CDKN2AIP, CWC22, FANCD2, and P53 proteins indicated NSP5 cleavage of one CWC22 and two FANCD2 peptide sequences (Fig 6C and 6D). Neither the CDKN2AIP nor the P53 sequences tested were cleavable by NSP5 in our assay, and their depletion in the SILAC data may represent indirect effects of NSP5 activity. CWC22 is a component of the RNA spliceosome required for pre-mRNA splicing via promotion of exon-junction complex assembly [60,61]. FANCD2 is activated by ATM and localizes at BRCA1 foci during DNA damage [62]. These data suggest that SARS-CoV-2 may target host RNA splicing and DNA damage pathways via NSP5-mediated reduction in key proteins, namely CWC22 and FANCD2, that are involved in these processes.
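As a rough illustration of what a cleavage-site search involves, the sketch below scans a peptide for a simplified 3C-like protease motif. It is not the published prediction algorithm cited above: the motif (Leu, Phe, or Met at P2, Gln at P1, and a small residue at P1') is an assumed simplification, and the peptide string and function name are illustrative. Replacing the P1 glutamine with alanine removes the match, which is the logic behind mutant control sequences of the kind described in the figure legend.

```python
import re

# Assumed simplified motif: [L/F/M] at P2, Q at P1, then S/A/G/N at P1' (lookahead).
SITE = re.compile(r"[LFM]Q(?=[SAGN])")


def candidate_sites(peptide):
    """Return 0-based indices of P1 glutamines that match the simplified motif.

    Cleavage, if it occurred, would fall immediately after each returned index.
    """
    return [match.start() + 1 for match in SITE.finditer(peptide.upper())]


wild_type = "TSAVLQSGFRK"                   # illustrative peptide
mutant = wild_type.replace("QS", "AS")      # Gln-to-Ala change at P1

print(candidate_sites(wild_type))  # [5]
print(candidate_sites(mutant))     # []
```

A motif scan like this only nominates candidates; whether a sequence is actually cut still has to be tested experimentally, as in the FRET assay described above.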
As noted, viral protease inhibitors are a powerful class of drugs that potently block viral replication by preventing processing of viral polyproteins into functional subunits. Inhibition of viral proteases should also prevent cleavage of host proteins, which may serve to blunt toxic effects on infected cells. GC376 [63] is an NSP5 protease inhibitor developed against feline coronavirus, the causative agent of fatal feline infectious peritonitis. Recent reports showed GC376 to be effective against SARS-CoV-2 NSP5. We tested the effect of GC376 on the cleavage of the ORF1ab-2 and FANCD2-2 peptide sequences. Using a range of concentrations up to 80 μM, ORF1ab-2 showed a modest inhibition of cleavage by NSP5 as compared to NSP5 C145A in the FRET assay. FANCD2-2 showed a dramatic reduction in cleavage by NSP5 even at concentrations of 20 μM (Fig 6E and 6F). These data support GC376 inhibition of SARS-CoV-2 NSP5 action on viral and human host protein sequences cleavable by the viral protease.

Discussion
Here we present a compendium of human host proteins adjacent to 17 SARS-CoV-2 viral proteins, with a goal to offer insight into potential mechanisms that these viral proteins may engage during pathogenesis. These data encompass the less well understood SARS-CoV-2 accessory factors and predict the localization of each of these viral proteins as well as identify significant adjacencies to proteins that mediate core cellular processes, including translation, signaling, RNA interactions, and intracellular transport. Viral protein localization was further validated by immunofluorescent microscopy for select proteins in infected cells. For translation, SARS-CoV-2 NSP1 was found to be adjacent to subunits of the EIF3 translation initiation complex and proved a broad inhibitor of translation. For innate immune signaling, viral ORF6 was found proximal to the RLR pathway component, MAVS, with ORF6 potently inhibiting induction of the IFNB1 promoter. Integration of GWAS data in COVID-19 identified SNPs associated with natural variation in the expression of specific genes, including the viral M protein-proximal TRIM4 activator of type I interferon, that may contribute to disease susceptibility differences in the human population. Comparing wild type NSP5 with its catalytically inactive point mutant helped identify proteins whose levels were decreased by this viral protease and nominated cleavage sequences in human CWC22 and FANCD2, implicating specific candidates for viral disruption of normal pre-mRNA splicing and DNA damage pathways, respectively. We also observed a number of SARS-CoV-2 proteins (M, NSP2, NSP9, NSP15, ORF6, ORF7a, ORF7b, ORF8, ORF9c, and ORF10) vicinal to nuclear pore proteins. Given that coronavirus replication takes place exclusively in the cytosol of cells, these interactions, if functional, might point to a viral role in disrupting nuclear import/export. This is further supported by several lines of genetic evidence. GWAS data suggest the importance of nuclear pore component Nup43, and overlap of our proteomics data with whole genome CRISPR screen hits from recent studies [34][35][36] suggests that the mRNA export factors MCM3AP and NXF1 are important for viral replication. Taken together, these data indicate potential intracellular locations and candidate functions of the SARS-CoV-2 viral proteins studied and provide a resource for future studies of pandemic coronaviruses.
SARS-CoV-2, as the etiological agent of COVID-19, joins SARS-CoV and MERS-CoV as an important coronavirus pathogen. Very minor mutations in the viral spike protein [64][65][66][67] along with a number of animal reservoirs in endemic regions represent a significant risk for new pandemic coronavirus strains to emerge [68], underscoring the need to understand coronaviral accessory protein functions and virus-host interactions. Comparative studies that analyze multiple coronaviruses [19,69], including both highly pathogenic and nonpathogenic strains, will be very beneficial to understanding what distinguishes new, potentially pandemic virus strains. Such resources may allow the research community not only to address current concerns but also to gain insight into newly emerging coronaviruses in the future. Currently, S proteins capable of binding to the human ACE2 receptor have been seen both in highly pathogenic coronaviruses such as SARS-CoV and SARS-CoV-2 and in relatively non-pathogenic coronaviruses such as HCoV-NL63 [70]. Thus, comparing the actual molecular interactions and effects of viral proteins on the host between pathogenic and non-pathogenic virus strains may provide insight into what makes certain coronaviruses more dangerous and highlight critical virus-host interactions that may be targeted to reduce disease.
The viral envelope of SARS-CoV-2 must contain the proper structural components comprised of S, E, M, and N with a completed viral genome [71] and replication of both subgenomic and genomic RNA occurs in membranous compartments [72]. Accordingly, coronaviruses devote substantial portions of their large genomes to manipulating host processes involved in ER-Golgi transport and endocytic and exocytic activity, which was captured in the proximal interactome. We also found evidence of the interaction of MHC class I molecules with SARS-CoV-2 M, ORF7a, ORF7b, ORF8, and ORF10. Down-regulation of surface expressed proteins has been reported for SARS [73]. It is still an open question how SARS-CoV-2 affects surface expression of important host receptors, which viral proteins affect this process, and what the effects are on virus replication and disease.
Translation inhibition is a general strategy utilized by many virus families including other RNA viruses like orthomyxoviruses [74], picornaviruses [75,76], rhabdoviruses [77], and togaviruses [78]. Host translational blockade may broadly block antiviral responses and can also affect the viability of the infected cell. Some but not all viruses have strategies to overcome translational shutoff, biasing translation of viral mRNA, including the use of IRES elements [79]. Lung tissue from COVID patients, in particular, displayed proteomic changes associated with translation inhibition [80]. NSP1 from both SARS-CoV-1 [10] and SARS-CoV-2 [39,40] has been shown to be a potent inhibitor of host translation and is thought to act through at least two mechanisms: binding to and inhibition of the EIF3 translation initiation complex and direct cleavage of host mRNAs. Cryo-EM studies place a domain of NSP1 in the mRNA channel of the 40S ribosome. Our proximity proteomics data show NSP1 of SARS-CoV-2 binding to a significant number of EIF3 complex subunits and we demonstrate that NSP1 is able to block translation of capped transcripts as well as transcripts containing host and viral IRES elements. We also observe, as another study has shown [39], that NSP1-induced translational shutoff affects host and viral transcripts containing the leader sequence found in the viral 5' UTR. This element exists on all genomic and subgenomic viral RNAs [81]. Whether other SARS-CoV-2 factors are necessary to overcome NSP1 translational inhibition or if, during viral replication, the large number of viral transcripts simply outcompetes host transcripts, as seen in vesicular stomatitis virus [82], remains to be determined. A recent study [80] from autopsies of COVID patients characterized whole proteome changes in multiple organs.
Innate immune signaling is a central mechanism of the host cell response to viral infection. ORF6 of SARS-CoV-1 [12,50] and SARS-CoV-2 [49] were shown to be potent inhibitors of such antiviral signaling. One proposed mechanism is that ORF6, through association with specific nuclear pore complex (NPC) components (Rae1-Nup98), blocks import of activated transcription factors needed to induce IFNB1 transcripts and other primary interferon-stimulated genes. In this regard, we identified MAVS proximal to ORF6 and ORF9b. We observed that ORF6, but not ORF9b, inhibited RLR signaling downstream of RIG-I RNA binding. However, Nup98/Rae1 is most commonly associated with export of RNAs from the nucleus [83,84]. This study identified both NSP15 and ORF6 as proximal interactors of many components of the nuclear RNA export pathway, including Nup43, Nup98, Rae1, MCM3AP, NXF1, and others. Additional evidence from both GWAS and genome-wide CRISPR screens has identified RNA export as important for severe disease and viral replication. It will be important to reexamine the mechanism by which ORF6 impairs interferon signaling and to examine the effects NSP15 may have on RNA export. Thus, the ability of the virus to disrupt host RNA export pathways may be an attractive therapeutic target and warrants more attention. The effects of SARS-CoV-2 on host processes identified in this study are summarized in Fig 7.
Viral proteases, such as the SARS-CoV-2 NSP5 studied here, have been shown to be potent antiviral targets [85]. These proteases are essential for viral replication, and viral escape has proven difficult in resistance studies [86]. Coronaviruses encode two proteases, NSP3 and NSP5, with NSP5 classified as the main protease. Both are necessary for processing the ORF1ab polyprotein, which contains the viral replicase proteins. NSP5 shows similarity to proteases found in picornaviruses and noroviruses [87].
Beyond their importance in viral replication, these viral proteases can target host proteins containing their target residues [88]. NSP5 recognizes certain glutamine-serine/alanine/glycine residues, with added specificity being determined by two to three flanking residues [59]. Picornavirus virulence has been shown to be mediated in part by 3C protease cleavage of host proteins [55]. Using both BioID and SILAC metabolic labeling followed by mass spectrometry, we sought to identify candidate host proteins, and we used a modified FRET-based cleavage assay to determine whether these candidates contained sequences cleavable by NSP5. We identified human CWC22 and FANCD2 as candidates; both proteins contained sequences that could be cleaved by NSP5 in the assay used here, which can also be used to rapidly assess other potential host targets. The proteomic studies also identified clusters of host factors involved in DNA damage and repair and in RNA splicing. Furthermore, we show that GC376 [63], a protease inhibitor of feline coronavirus, displays evidence of inhibiting NSP5 cleavage activity. Consistent with this, GC376 has been shown to block viral replication of SARS-CoV-2 in early studies [89], and we observe that this protease inhibitor blocks NSP5 cleavage of both host and viral target peptide sequences.
The global impacts of the SARS-CoV-2 pandemic have focused attention on identifying new treatments and interventions. Given both the newness of the virus and the relative dearth of research into human coronaviruses, it is important that many resources are generated to better understand aspects of the virus-host interaction. The present work contains a proximal proteomic resource for 17 SARS-CoV-2 viral proteins, and combining such proximal proteomics with TAP-based proteomics may be helpful in leveraging the strengths associated with each technique. While proximity proteomics can identify transient, indirect, and weak binding events, including those dependent on intact membranes, TAP-based approaches can focus attention on complexes of proteins that stably associate with each other. We validate the quality of the present proximity data set by corroborating spatial insights with biochemical fractionation experiments. Taken together with other efforts to generate high-quality resources, these data should prove helpful both in generating hypotheses and in better understanding the dynamics of virus-host interactions with regard to human disease.

Cell culture
HEK293T cells were obtained from Takara Bio, cultured in DMEM supplemented with 10% FBS and 1% penicillin/streptomycin, and grown at 37˚C with 5% CO2. For SILAC experiments [90], the cells were cultured in a medium containing [13C6,15N2]-lysine and [13C6]-arginine for at least 2 weeks to promote complete incorporation of the stable isotope-labeled amino acids. Cells were

Transfection, biotin labeling, and streptavidin pulldown
All viral expression constructs were obtained from Addgene [17]. A summary of all plasmids used in this study is listed in Table 1. HA-BASU was cloned in frame with either an N-terminal or C-terminal linker as indicated. For BioID experiments, 5×10^6 HEK293T cells were plated and transfected with 5 ug of each viral expression plasmid. At 24 hours post transfection, biotin was added (50 uM final concentration) for 4 hours, then the media was exchanged twice with DPBS and the cells were harvested and lysed in RIPA buffer (Thermo Scientific) supplemented with protease inhibitors. Lysates were sonicated and then, using the KingFisher Flex automated purification system, incubated for six hours with 100 uL of ReSYN (ReSYN Biosciences) streptavidin microparticles and washed sequentially with 2% LDS buffer, Triton X-100 buffer (1% Triton X-100 0.1%, Sodium Deoxycholate 500mM, 1mM EDTA, 50mM HEPES pH 7.5), Igepal Wash Buffer (0.5% Igepal, 0.5% Sodium Deoxycholate, 10mM TRIS pH 7.5, 333.3mM LiCl, 20mM EDTA), and deposited into 50mM TRIS pH 7.4. Samples were washed with automated mixing for 30 minutes for each step. A portion of the whole cell lysate was saved and run on an SDS-PAGE gel, transferred to PVDF, and then probed with anti-HA antibody with 800CW anti-rabbit (LICOR) secondary along with 800CW streptavidin dye (LICOR) to confirm viral protein expression and total biotinylation. A summary of antibodies used in this study is listed in Table 2. For SILAC experiments with NSP5 and C145A, approximately 2×10^6 cells were harvested, washed three times with ice-cold PBS, and lysed by incubating on ice for 30 min with CelLytic M (Sigma) cell lysis reagent containing 1% protease inhibitor cocktail. The cell lysates were centrifuged at 7,000g at 4˚C for 15 min, and the resulting supernatants were collected.

Sample preparation for mass spectrometry
After washing and purification, samples contained bound proteins on beads in TRIS buffer. The proteins on the beads were reduced with dithiothreitol and alkylated with iodoacetamide. The processed proteins were subsequently digested with Trypsin/Lys-C (Promega) at an enzyme/substrate ratio of 1:100 in 50 mM NH4HCO3 (pH 8.5) at 37˚C overnight. For SILAC samples, the protein lysates prepared from cells with WT or mutant NSP5 were combined at a 1:1 ratio (by mass), and 30 μg of the mixed protein lysate was loaded onto a 10% SDS-PAGE gel. After electrophoresis, the gel lanes were cut into 11 slices according to the apparent molecular weight ranges of proteins (<20, 20-25, 25-30, 30-37, 37-42, 42-50, 50-62, 62-75, 75-100, 100-150, >150 kDa), reduced in-gel with dithiothreitol, and alkylated with iodoacetamide. The processed proteins were subsequently digested in-gel with Trypsin/Lys-C (Promega) at an enzyme/substrate ratio of 1:100 in 50 mM NH4HCO3 (pH 8.5) at 37˚C overnight. Subsequently, peptides were recovered from gels with a solution containing 5% acetic acid in H2O and then with a solution containing 2.5% acetic acid in an equi-volume mixture of CH3CN and H2O. The resulting peptide mixtures were subsequently dried in a Speed-vac and desalted using OMIX C18 pipet tips (Agilent Technologies, Santa Clara, CA). LC-MS/MS experiments were conducted on a Q Exactive Plus mass spectrometer equipped with an UltiMate 3000 UPLC system (Thermo Fisher Scientific).
The mass spectrometer was operated in data-dependent acquisition mode. Full-scan mass spectra were acquired in the range of m/z 350-1500 using the Orbitrap analyzer at a resolution of 70,000 at m/z 200. Up to the 25 most abundant ions found in MS with a charge state of 2 or above were sequentially isolated and collisionally activated in the HCD cell with a normalized collision energy of 28 to yield MS/MS.

Database search
MaxQuant, version 1.5.2.8, was used to analyze the LC-MS and MS/MS data for protein identification and quantification [91]. The database used for the search was the human IPI database, version 3.68, which contained 87,061 protein entries. The maximum number of missed cleavages for trypsin was two per peptide. Cysteine carbamidomethylation and methionine oxidation were set as fixed and variable modifications, respectively. The tolerances in mass accuracy were 20 ppm for both MS and MS/MS. The maximum false discovery rates (FDRs) were set at 0.01 at both peptide and protein levels, and the minimum required peptide length was 6 amino acids. Spectral match assignment files were collapsed to the gene level, and false positive matches and contaminants were removed. SAINT analysis [18] was run with the following parameters: 10,000 iterations, LowMode ON, Normalize ON, and the union of MinFold ON and OFF. Minimum interactome inclusion criteria were SAINT ≥ 0.9 and fold change over matched cell type control ≥ 4. Proteins with low normalized spectral counts were removed.
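The inclusion criteria above amount to a two-condition filter on each bait-prey pair. The following is a minimal sketch of that filter, not the pipeline used in the study; the `PreyScore` record, its field names, and the example values are hypothetical and serve only to illustrate the thresholds.

```python
from dataclasses import dataclass


@dataclass
class PreyScore:
    """Hypothetical record for one bait-prey pair (field names are assumptions)."""
    bait: str           # viral protein used as bait, e.g. "NSP1"
    prey: str           # host gene symbol
    saint: float        # SAINT probability score
    fold_change: float  # enrichment over the matched cell type control


def passes_inclusion(hit, min_saint=0.9, min_fold=4.0):
    """Apply both thresholds from the text: SAINT >= 0.9 and fold change >= 4."""
    return hit.saint >= min_saint and hit.fold_change >= min_fold


def filter_interactome(hits, min_saint=0.9, min_fold=4.0):
    """Keep only the bait-prey pairs that satisfy both inclusion criteria."""
    return [h for h in hits if passes_inclusion(h, min_saint, min_fold)]


# Made-up scores for illustration only; these are not values from the data set.
example_hits = [
    PreyScore("NSP1", "EIF3A", 0.99, 12.0),  # passes both criteria
    PreyScore("NSP1", "ACTB", 0.95, 1.5),    # fails the fold-change criterion
    PreyScore("ORF6", "MAVS", 0.90, 4.0),    # passes at the boundary
    PreyScore("ORF6", "GAPDH", 0.40, 9.0),   # fails the SAINT criterion
]
kept = filter_interactome(example_hits)
```

Both comparisons are inclusive, so a pair sitting exactly on either threshold is retained.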

Gene ontology
Gene Ontology (GO) term analyses were produced using the clusterProfiler R package [92]. Proteins with SAINT score ≥ 0.9 were classified as likely interactions and used to identify enriched GO terms for the individual SARS-CoV-2 protein interactomes. Highly redundant GO terms were removed for readability. Bar plots and heatmaps were produced in R with the ggplot2 [93] and heatmap [94] packages, respectively.

Host-virus interaction network
The host-virus interaction network was produced in Cytoscape [95] from BASU BioID interactions with a SAINT score ≥ 0.9. The network was further curated to emphasize the significantly enriched GO terms for each SARS-CoV-2 protein. Edges denoting host-virus protein interactions are indicated in red. Host-host interactions were determined from high-confidence (>0.700) STRING database interactions obtained from experimental evidence and database interactions for all of the curated proteins. Cell-endogenous protein interactions are denoted by grey edges. Clusters were highlighted based on highly enriched GO terms for SARS-CoV-2 proteins.

Cellular fractionation
Cellular fractionations were generated using a previous protocol [96] with minimal modification. Cells were transfected as described previously. Cell pellets were split into three separate samples. The first was lysed using RIPA buffer (Thermo Scientific) and was labeled whole cell lysate (WCL). The second sample was resuspended in buffer containing 0.3% Igepal, 10mM HEPES, 10mM KCl, and 1.5mM MgCl2. The sample was pelleted at 1500g, and the supernatant was collected and labeled cytoplasm/membrane fraction. The remaining pellet was washed once, then lysed in RIPA and labeled nuclear fraction. The third sample was lysed in buffer containing 100ug/mL digitonin, 50mM HEPES, and 150mM NaCl. The sample was pelleted at 2400g, and the supernatant was collected and labeled cytoplasm fraction. The remaining pellet was washed once, then lysed in RIPA and labeled nuclear/membrane fraction. Equal volumes of each fraction along with 20ug of WCL were loaded and run on a 4-12% Bis-Tris polyacrylamide gel (Invitrogen). Samples were transferred to PVDF and blotted for HA (viral proteins), alpha tubulin (cytoplasm control), calnexin (membrane control), and histone H3 (nuclear control) (Table 2).

Viral infection and immunofluorescent microscopy
SARS-CoV-2 (USA-WA1/2020) viral stocks were obtained from BEI and propagated on VeroE6 cells (ATCC) in DMEM with 2% FBS. The virus was passaged 3 times on VeroE6 cells and titers were calculated via plaque assay. Viral stocks were sequenced to confirm that no furin cleavage site or other mutations were acquired during passaging. A549-ACE2 cells were provided by Ralf Bartenschlager's lab and cultured in DMEM supplemented with 10% FBS and 2% Geneticin (Thermo Fisher). Cells were plated on 8-well chamber slides (ibidi) 24 hours before infection. On the day of infection, cells were inoculated with SARS-CoV-2 at an MOI of 0.1 and incubated at 37˚C under 5% CO2. At 48 hours, chamber slides were washed with PBS and submerged in 4% PFA for 30 minutes according to BSL3 fixation and inactivation protocols. All SARS-CoV-2 live virus experiments were conducted under BSL3 conditions at Stanford University. Cells were then permeabilized with PBS containing 2% BSA and 0.2% Triton X-100. Staining with primary antibodies against both viral (NSP1, NSP14, ORF6) and host (EIF3A, Nup98, Rae1, MCM3AP) targets was done by incubating the cells with antibody dilutions (1:500 for all viral targets, EIF3A 1:200, Nup98 1:100, Rae1 1:100, MCM3AP 1:100) at room temperature for 30 minutes, followed by 3 washes with PBS containing 2% BSA and 0.2% Triton X-100. Goat anti-sheep conjugated to Alexa Fluor 488 was used to detect viral primary antibodies and goat anti-rabbit conjugated to Alexa Fluor 657 was used to detect the host primary antibodies. Staining with secondary antibodies was done at 1:500 dilution for 30 minutes and followed by 3 washes. Hoechst staining was then performed at a final concentration of 1ug/mL for 5 minutes and followed by 3 washes with PBS. After the final PBS wash, fresh PBS was added to the chambers and images were taken on a Zeiss 880 confocal microscope using the 63X oil immersion lens. Additional antibody information is listed in Table 2.

HiChIP data processing and virtual 4C visualization
HiChIP all-valid-pair matrices for GM12878, naïve T cells, Th17 cells, and Treg cells were downloaded from GEO (GSE101498, [101]). v4C plots were generated from the HiChIP valid pair matrices. The interaction profile of a specific 5-kb bin containing the TRIM4 anchor was then plotted in R. H3K27ac ChIP-seq peaks for GM12878, naïve T cells, Tregs, and T helper cells were downloaded from ENCODE as 1D peak sets. The FitHiChIP pipeline was used to call loops with 5-kb bins, peak-to-all interaction type, loose background, and FDR < 0.01 [102]. The merged significant interaction files from the FitHiChIP pipeline, along with corresponding ATAC-seq profiles, were visualized in the WashU Epigenome web browser. Browser shots from WashU track sessions were then included in the v4C and interaction map anecdote.
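A virtual 4C profile is, in essence, the row of the contact matrix that belongs to the anchor bin. The sketch below illustrates that extraction under stated assumptions: contacts on a single chromosome are given as hypothetical `(bin_start_a, bin_start_b, count)` triples, and the coordinates are invented placeholders rather than the TRIM4 locus. It is not the R code used for the figures.

```python
def bin_start(position, bin_size=5000):
    """Return the start coordinate of the fixed-size bin containing `position`."""
    return (position // bin_size) * bin_size


def virtual_4c(contacts, anchor_position, bin_size=5000):
    """Collect contact counts between the anchor bin and every partner bin.

    `contacts` is an iterable of (bin_start_a, bin_start_b, count) triples for
    one chromosome (an assumed sparse-matrix layout, not a specific file format).
    Returns a dict mapping partner bin start to summed count, sorted by position.
    """
    anchor = bin_start(anchor_position, bin_size)
    profile = {}
    for a, b, count in contacts:
        if a == anchor:
            partner = b
        elif b == anchor:
            partner = a
        else:
            continue  # this pair does not involve the anchor bin
        profile[partner] = profile.get(partner, 0) + count
    return dict(sorted(profile.items()))


# Invented coordinates and counts for illustration only.
example_contacts = [
    (100000, 105000, 3),
    (95000, 100000, 2),
    (100000, 120000, 7),
    (200000, 205000, 9),  # unrelated pair, ignored for this anchor
]
example_profile = virtual_4c(example_contacts, anchor_position=102500)
```

Plotting the returned counts against bin position gives the v4C track for the chosen anchor.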

Luciferase assays
For NSP1 translation assays, in vitro transcribed (IVT) transcripts were generated by first PCR amplifying DNA containing a T7 promoter followed by UTR or IRES elements and firefly or renilla luciferase. Second, capped and polyadenylated transcripts were synthesized using the HiScribe T7 ARCA mRNA Kit (with tailing) (NEB). 5×10^5 293T cells were transfected with 2ug of plasmids expressing either GFP or NSP1 and then incubated overnight. The next day the cells were transfected with 2ug of the corresponding IVT transcripts and were harvested 8 hours after the second transfection. Cells were harvested with 400ul of Passive Lysis Buffer (Promega) and quadruplicate samples were plated on an opaque 96-well plate. 50ul of LARII firefly luciferase substrate (Promega) was added and the plate was read on the luminescence setting of the Spectramax i5 plate reader. 50ul of Stop & Glo renilla luciferase substrate was then added and the plates were reread. For IFNB1 promoter activity assays, 2.5×10^5 293T cells were transfected with 2ug of plasmids expressing either ORF6 or ORF9b, along with 1ug of a plasmid containing nanoluciferase under the control of the human IFNB1 promoter, 50ng of a plasmid containing firefly luciferase under the control of the constitutive TK promoter, and either 1ug of empty vector or 1ug of a plasmid expressing the 2-CARD domain of RIG-I. At 24 hours post transfection, cells were harvested in 200ul of Passive Lysis Buffer and triplicate samples were plated on an opaque 96-well plate. The Nano-Glo Dual-Luciferase Reporter Assay System (Promega) was used to obtain a nanoluciferase reading for IFNB1 promoter activity, normalized to the firefly luciferase transfection control.
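The reporter normalization can be written out explicitly. The following is a minimal sketch, assuming each well yields one nanoluciferase and one firefly reading and that activity is expressed relative to a reference condition set to one (the empty vector without RIG-I 2-CARD in this study); the function names and the readings are hypothetical.

```python
from statistics import mean


def normalized_ratio(nanoluc, firefly):
    """Reporter signal divided by the transfection-control signal for one well."""
    if firefly <= 0:
        raise ValueError("firefly control reading must be positive")
    return nanoluc / firefly


def relative_activity(condition_wells, reference_wells):
    """Mean normalized ratio of a condition, expressed relative to the reference.

    Each argument is a list of (nanoluc, firefly) readings from replicate wells.
    The reference condition evaluates to 1.0 by construction.
    """
    reference = mean(normalized_ratio(n, f) for n, f in reference_wells)
    condition = mean(normalized_ratio(n, f) for n, f in condition_wells)
    return condition / reference


# Invented luminescence readings for illustration only.
empty_vector = [(100, 1000), (120, 1000), (80, 1000)]
with_2card = [(5000, 1000), (5200, 1000), (4800, 1000)]
fold_induction = relative_activity(with_2card, empty_vector)
```

Dividing by the firefly signal first corrects for well-to-well differences in transfection efficiency before conditions are compared.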

NSP5 cleavage site prediction
Protein sequences for hits from SILAC and BASU-BioID proteomics experiments were run through the NetCorona algorithm [59] using the web application (https://services.healthtech.dtu.dk/service.php?NetCorona-1.0). For coronavirus polyprotein controls, the SARS-CoV-2 ORF1ab protein sequence (from UniProt FASTA UP000464024) was run through the NetCorona web application. A previously tested SARS-CoV-1 sequence [59], VATLQAENV, was found to be shared in the SARS-CoV-2 protein sequence and was also used as a control.
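As a rough illustration of what a cleavage site candidate looks like, the sketch below scans a protein sequence for a glutamine followed by serine, alanine, or glycine and reports the flanking residues. This simplified motif scan is an assumption made for illustration only; it is not the NetCorona algorithm, which scores sites with a trained model, and the function name and `flank` parameter are hypothetical.

```python
import re

# Simplified motif: glutamine (Q) immediately followed by S, A, or G.
# Cleavage would occur after the Q. This is far less specific than NetCorona.
MOTIF = re.compile(r"Q(?=[SAG])")


def candidate_sites(sequence, flank=4):
    """List candidate cleavage sites as (position, upstream, downstream).

    `position` is the 1-based index of the glutamine. `upstream` holds up to
    `flank` residues before the Q plus the Q itself, and `downstream` holds up
    to `flank` residues after it.
    """
    seq = sequence.upper()
    sites = []
    for match in MOTIF.finditer(seq):
        q = match.start()  # 0-based index of the glutamine
        upstream = seq[max(0, q - flank):q + 1]
        downstream = seq[q + 1:q + 1 + flank]
        sites.append((q + 1, upstream, downstream))
    return sites


# The control peptide named in the text contains one Q followed by A.
control_sites = candidate_sites("VATLQAENV")
```

A scan like this over-calls heavily, since most Q-S/A/G dipeptides are not cleaved, so it is best read as a pre-filter ahead of a scoring tool or the cleavage assay.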

FRET-based NSP5 cleavage assay
Predicted NSP5 cleavage sites were cloned into ECFP-TevS-YPET (Addgene Plasmid #100097) [103]. Briefly, the plasmid was re-cloned to put the Tev protease site between an XbaI site and a BsiWI site. Cleavage sequences were cloned between the XbaI and BsiWI sites, restoring both sites, with one glycine on each side of the predicted NSP5 cleavage sequences. This approach was based on a cloning strategy used previously to study norovirus protease cleavage sites [104]. For the protease cleavage assay, 3×10^4 HEK 293T cells were plated in DMEM + 10% FBS into 96-well black, clear-bottom plates (Greiner). At 24 hours later, cells were transfected in quadruplicate with 0.1ug of FRET plasmid containing the NSP5 cut site and 0.1ug of either WT-NSP5 or mutant NSP5C145A expression plasmids [17] with Lipofectamine 3000, following the manufacturer's protocol for 96-well plates. After a further 24 hours, media was removed, PBS was added to wells, and wells were imaged on a Spectramax i5 instrument (Molecular Devices) with the following wavelengths: 420/485 nm for ECFP, 485/535 nm for YPET, and 420/535 nm for FRET, as previously described [103]. After background subtraction of untransfected wells, FRET efficiency was calculated as FRET/ECFP.
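The final calculation can be sketched as follows. This is a minimal illustration that assumes the background for each channel is the mean of the untransfected wells and that each well is a dict with hypothetical `"fret"` and `"ecfp"` keys; the readings are invented.

```python
from statistics import mean


def fret_ratio(fret_raw, ecfp_raw, fret_bg, ecfp_bg):
    """Background-subtracted FRET signal divided by background-subtracted ECFP."""
    ecfp = ecfp_raw - ecfp_bg
    if ecfp <= 0:
        raise ValueError("ECFP signal is not above background")
    return (fret_raw - fret_bg) / ecfp


def mean_fret_ratio(wells, untransfected):
    """Average FRET/ECFP over replicate wells.

    The background for each channel is taken as the mean of the untransfected
    wells (an assumption about how the subtraction was done).
    """
    fret_bg = mean(w["fret"] for w in untransfected)
    ecfp_bg = mean(w["ecfp"] for w in untransfected)
    return mean(fret_ratio(w["fret"], w["ecfp"], fret_bg, ecfp_bg) for w in wells)


# Invented plate-reader values for illustration only.
blank_wells = [{"fret": 100, "ecfp": 100}, {"fret": 100, "ecfp": 100}]
mutant_wells = [{"fret": 2100, "ecfp": 1100}, {"fret": 2100, "ecfp": 1100}]
wt_wells = [{"fret": 1100, "ecfp": 2100}, {"fret": 1100, "ecfp": 2100}]

mutant_ratio = mean_fret_ratio(mutant_wells, blank_wells)
wt_ratio = mean_fret_ratio(wt_wells, blank_wells)
```

In this made-up example the ratio is lower with WT NSP5 than with the catalytic mutant, which is the direction expected when cleavage separates the ECFP and YPET pair.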

Quantification and statistical analysis
Gene ontology adjusted p-values were produced using the Benjamini-Hochberg method. For heatmaps, a threshold was set whereby at least one protein had a significant score for the presented GO terms. A -Log10 p-value threshold of >5 for M proteins and >1.3 for all others was used for 'non-stringent' heatmaps (with the exception of molecular function heatmaps, which use an M protein threshold of 3), and a -Log10 p-value threshold of >5 for M proteins and >3 for all others was used for the more stringent heatmaps presented in Figs 1D and 2A.
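For reference, the Benjamini-Hochberg adjustment and the -Log10 cutoff can be sketched in a few lines. This is a generic illustration of the standard procedure, not the clusterProfiler implementation; the function names and example p-values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_cutoff(adjusted_p, neg_log10_cutoff):
    """True when -log10(adjusted p) exceeds the cutoff, e.g. 1.3, 3, or 5."""
    if adjusted_p <= 0:
        return True  # a p-value of zero exceeds any finite cutoff
    return -math.log10(adjusted_p) > neg_log10_cutoff


# Invented raw p-values for illustration only.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A cutoff of 1.3 corresponds to an adjusted p-value just below 0.05, while cutoffs of 3 and 5 correspond to 0.001 and 0.00001.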
Graphed data are expressed as mean ± SEM, and sample size (N) represents independent experiments as noted. Statistical analysis was performed in GraphPad Prism 7 and is described in the figure legends.
Student's t test was performed comparing the means between GFP controls and the experimental conditions, where N is three independent experiments. For IFNB1 reporter assays, empty vector without RIG-I 2-CARD was set to one; all other conditions are expressed relative to that empty control and represent the average of three independent experiments. NSP5 FRET cleavage was calculated as described previously, and Student's t test was performed comparing the means between the NSP5C145A mutant and NSP5 WT, where N is three independent experiments.