The endogenous cellular protease inhibitor SPINT2 controls SARS-CoV-2 viral infection and is associated to disease severity

The COVID-19 outbreak is the biggest threat to human health in recent history. Currently, there are over 1.5 million related deaths and 75 million people infected around the world (as of 22/12/2020). The identification of virulence factors which determine disease susceptibility and severity in different cell types remains an essential challenge. The serine protease TMPRSS2 has been shown to be important for S protein priming and viral entry; however, little is known about its regulation. SPINT2 is a member of the family of Kunitz-type serine protease inhibitors and has been shown to inhibit TMPRSS2. Here, we explored the existence of a co-regulation between SPINT2 and TMPRSS2 and found a tightly regulated protease/inhibitor expression balance across tissues. We found that SPINT2 negatively correlates with SARS-CoV-2 expression in Calu-3 and Caco-2 cell lines and was down-regulated in secretory cells from COVID-19 patients. We validated our findings using Calu-3 cell lines and observed a strong increase in viral load after SPINT2 knockdown, while overexpression led to a drastic reduction of the viral load. Additionally, we evaluated the expression of SPINT2 in datasets from comorbid diseases using bulk and scRNA-seq data. We observed its down-regulation in colon, kidney and liver tumors as well as in alpha pancreatic islet cells from type 2 diabetes patients, which could have implications for the observed comorbidities in COVID-19 patients suffering from chronic diseases.


Introduction
SARS-CoV-2 entry requires a two-step process: first, the envelope protein spike (S) binds to the viral cellular receptor Angiotensin-converting enzyme 2 (ACE2) membrane protein [1] and is then proteolytically activated by cellular serine proteases (SPs) like TMPRSS2, TMPRSS4 and Furin [2][3][4]. A wider spectrum of serine proteases might also contribute to viral priming since ACE2 and TMPRSS2 co-expression at the gene expression level is mainly restricted to specific cell types like ciliated, alveolar type 2 and secretory cells in lungs and enterocytes in colon and ileum [5]. A possible role of serine proteases and their inhibitors in SARS-CoV-2 viral infection has been examined in recent reviews [6,7]. TMPRSS2 has been proposed as a putative drug target [3,8,9] and as a biomarker for COVID19 disease severity [10,11]. Despite its central role, the regulation of TMPRSS2 is poorly understood, although its activation by androgen response elements has been documented in normal and tumor prostate tissues [12].
SPINT2, a member of the Kunitz-type serine protease inhibitors [13], has been shown to inhibit TMPRSS2 protease activity [14,15], which could have implications for COVID-19 disease. A down-regulation of SPINT1, SPINT2 and SERPINA1 has been reported in colorectal Caco-2 cells infected with SARS-CoV-2 [16]; however, a clear association between SPINT2 activation and viral permissivity has not been confirmed. To fill this gap, we evaluated the existence of coregulation between SPINT2 and TMPRSS2 and found common transcription factors (TFs) associated with the genomic loci of both genes, which was also in line with a consistent correlation of the genes across cell types. This coregulation suggested a modulation of SARS-CoV-2 infection by SPINT2. We could corroborate a negative correlation between SPINT2 gene expression and SARS-CoV-2 viral load in Calu-3 and Caco-2 cell lines. To validate our findings, we knocked down SPINT2 in Calu-3 and A549 cell lines and observed an increase in the number of SARS-CoV-2 infected cells. We hypothesized that SPINT2 levels would be lower in SARS-CoV-2 target cells from COVID-19 patients with severe symptoms, which we could indeed observe in secretory cells from nasopharynx samples. This suggests that SPINT2 can be used as a biomarker for disease susceptibility. Finally, it is known that SPINT2 is down-regulated among different types of tumors [17] and we were able to corroborate this by systematically evaluating bulk and scRNA-seq datasets, which suggests a possible association with the COVID-19 comorbidity observed in cancer patients.

Results and discussions
SPINT2 and TMPRSS2 are coregulated across tissues
TMPRSS2 proteolytic activity inhibition by SPINT2 has been previously reported [14,15]. We investigated a coregulation between SPINT2 and TMPRSS2, as a similar shared regulation through the transcription factor (TF) CDX2 has been described for SPINT1 and ST14 (Matriptase) previously in enterocytes [18]. Since SPINT2 is also able to regulate ST14 activity in small and large intestines [19], we decided to use enterocytes as a model to test this hypothesis. In order to find common TF regulators of SPINT2 and TMPRSS2, we performed two independent analyses: i) a footprinting analysis of open chromatin regions using ATAC-seq data from Human Intestinal Organoids [20] to identify potential TF binding sites and ii) an inference of TF activities from scRNA-seq data of ileum-derived organoids [21], based on the gene expression of TF targets using the SCENIC algorithm [22]. TF activities were then correlated to SPINT2 and TMPRSS2 gene expression. We identified common TFs inferred to be bound to the open chromatin sites in the SPINT2 and TMPRSS2 genomic loci (Fig 1A, top and bottom, respectively) and those with TF activities positively correlated to both SPINT2 and TMPRSS2 gene expression (Fig 1B, top right quadrant). Comparing these two sets of TFs, we found ten shared regulators: ELF3, FOS, FOSL1, FOXC1, IRF1, IRF7, JUND, JUNB, ONECUT3 and KLF4. Interestingly, many of these regulators play a role in immune response upon infection, suggesting a possible feedback mechanism. SARS-CoV-2 infection has been shown to upregulate FOS expression in Huh7.5 and A549 cell lines [23]. IRF1 and IRF7 are interferon regulatory factors which regulate infection responses, and have been observed to be upregulated in COVID-19 patients [24]. JUNB has been found in SARS-CoV-2 infection gene expression signatures in Calu-3 and Caco-2 cell lines [25].
ELF3 is an important factor controlling the development of epithelium tissues [26] including intestinal epithelia [27]. Based on our analysis, we depict the regulatory model in which SPINT2 and TMPRSS2 gene expression are coregulated by common TFs in ileum enterocytes, hence maintaining the protease/inhibitor balance (Fig 1C). On the other hand, down-regulation of TMPRSS2 enzymatic activity by SPINT2 possibly maintains viral load at a low level. In line with this, we observed that the expression of SPINT2 and TMPRSS2 are positively correlated across normal human tissues (Fig 1D). Interestingly, tissues which have been shown to be targets for the virus display the highest correlation between SPINT2 and TMPRSS2 expression. Also, at single cell resolution, we observed that both genes are specifically co-expressed in many cell types [28] (Fig 1E) which corroborates the inferred coregulation.
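The comparison of the two TF sets described above can be sketched as a set intersection combined with a sign filter on the correlations. This is a minimal illustration, not the analysis pipeline used in the study: the placeholder names TF_X and TF_Y and all numeric correlation values are hypothetical, and the footprinting and SCENIC steps themselves are not reproduced.

```python
# Hypothetical inputs: ELF3, FOS, IRF1 and KLF4 are named in the text, but
# TF_X, TF_Y and every number below are made up for illustration only.
footprint_spint2 = {"ELF3", "FOS", "IRF1", "KLF4", "TF_X"}
footprint_tmprss2 = {"ELF3", "FOS", "IRF1", "KLF4", "TF_Y"}

# TF activity correlation with (SPINT2 expression, TMPRSS2 expression).
activity_corr = {
    "ELF3": (0.41, 0.35),
    "FOS": (0.22, 0.18),
    "IRF1": (0.30, 0.27),
    "KLF4": (0.15, -0.05),   # positive for one gene only, so excluded
    "TF_X": (0.50, 0.45),    # correlated, but bound at one locus only
}


def shared_regulators(fp_a, fp_b, corr):
    """TFs bound at both loci AND positively correlated with both genes."""
    bound_both = fp_a & fp_b
    positive_both = {tf for tf, (ra, rb) in corr.items() if ra > 0 and rb > 0}
    return sorted(bound_both & positive_both)


shared = shared_regulators(footprint_spint2, footprint_tmprss2, activity_corr)
print(shared)
```

Requiring both lines of evidence keeps only TFs in the top right quadrant of the correlation plot that also have a footprint at each locus.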

Deriving a SARS-CoV-2 permissivity signature
Given the previously observed association between SPINT2 and TMPRSS2, a major SARS-CoV-2 virulence factor [3], we asked whether SPINT2 could account for differences in viral permissivity. Calu-3 and H1299 cells have been previously reported as SARS-CoV-2 permissive and non-permissive cell lines, respectively [25]. To test our hypothesis, we inferred a SARS-CoV-2 permissivity signature using the following approach: we calculated Differentially Expressed Genes (DEGs) between non-infected Calu-3 and H1299 cells (set z in Fig 2A). Because we were interested in a priori permissivity factors, we excluded from this list any viral-induced genes. In order to do so, we calculated DEGs between the infected vs mock-infected cells from both Calu-3 (x) and H1299 cell lines (y) and then filtered out these genes from z to obtain non-viral inducible genes (i') (Fig 2A). Filtered genes (x + y), shown in S1A Fig and listed in S1 Table, are enriched in pathways related to RNA and protein synthesis and viral processes (S1B Fig), as has also been previously reported [29]. Using this approach, we identified a set of 480 candidate genes which might contribute to infection permissivity (S2 Table). We used normalized expression values for these 480 genes as input to train a Random Forest (RF) model for predicting the cumulative sum of viral gene expression in Calu-3 infected cells (Fig 2B). We then used the top 25% ranked genes for further analysis. SPINT2 was found among the top ranked genes (Fig 2B). Of the top ranked genes, 21 corresponded to genes with functional annotations related to viral infection (Fig 2B inset pie chart and S2 Table), four have been reported to participate and/or interact directly with SARS-CoV-2 [30], eight corresponded to curated receptors [31] or ligands in the CellPhoneDB database [32] and 13 are cell membrane surface proteins. Importantly, by intersecting the set of genes in the permissivity signature with a reported SARS-CoV-2 viral infection transcriptional signature [33], we only found 3 hits (CXCL5, LGALS3BP and EHF), confirming that most of the identified genes in our permissivity signature are not viral-induced, but likely represent a priori susceptibility factors. We found the following Heat Shock Proteins (HSP) to be differentially expressed: HSPB1, HSPA8 and HSPD1. In a previous study a different HSP, HSP90, was observed to correlate to SARS-CoV-2 viral load in Calu-3 cells and its inhibition reduced viral infection [25].
We also found several ribosomal proteins (RPL9, RPL23, RPL26, RPL28, RPL38, RPS7, RPS12 and RPS27A) and translation initiation factors (EIF3A, EIF4A2 and EIF4B) which could be related to viral protein translation and ER stress response [34]. In order to confirm that the permissivity signature is not just reflecting tissue-specific or immune signatures, a Pathway Enrichment Analysis (PEA) was performed using the top ranked genes. Interestingly, we found an enrichment of host-viral interaction processes, protein stabilization and Endoplasmic Reticulum (ER) trafficking pathways (Fig 2C).
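The filtering logic behind the permissivity signature (set z minus the union of x and y, followed by keeping the top 25% of ranked genes) can be sketched as follows. This is only an illustration under stated assumptions: the GENE_* names and the importance scores are hypothetical placeholders, and the Random Forest training that would produce real importances is not implemented here.

```python
def permissivity_candidates(z, x, y):
    """Genes differing between cell lines (z) that are not virus-induced
    in either cell line (x: Calu-3, y: H1299)."""
    return z - (x | y)


def top_quartile(importances):
    """Keep the top 25% of genes ranked by importance (at least one gene)."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    keep = max(1, len(ranked) // 4)
    return ranked[:keep]


# Hypothetical toy gene sets; only SPINT2 is a name taken from the text.
z = {"SPINT2", "GENE_A", "GENE_B", "GENE_C", "GENE_D"}
x = {"GENE_B"}
y = {"GENE_D", "GENE_E"}
candidates = permissivity_candidates(z, x, y)

# Hypothetical importance scores standing in for Random Forest output.
importances = {
    "SPINT2": 0.30, "GENE_A": 0.25, "GENE_C": 0.20, "GENE_F": 0.10,
    "GENE_G": 0.05, "GENE_H": 0.04, "GENE_I": 0.03, "GENE_J": 0.03,
}
top_genes = top_quartile(importances)
print(sorted(candidates), top_genes)
```

Removing x and y before ranking is what restricts the signature to factors present prior to infection.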
Next, we investigated if these susceptibility genes, identified from lung-derived cell lines, are also expressed in other cell types. Therefore, we ranked the cell types in the Human Cell Landscape dataset (HCL) [28] based on the permissivity score derived from our top ranked genes in the permissivity signature ( Fig 2D) and found that stratified epithelial, basal, AT2 lung cells and enterocytes were among the top-ranked cell types which correspond to cell types known to be infected by the virus [35,36].
In order to further refine our permissivity signature by going beyond transcriptional levels, we used protein expression levels of a previously released proteomic dataset from SARS-CoV-2 infected Caco-2 cells [16]. We determined the Spearman correlations of the translation rates for the top ranked genes to that of the N and S viral proteins (Fig 2E). Some of the highly correlated genes (both negative and positive) have been previously reported to participate in viral infection processes. For example, LGALS3BP is a glycoprotein secreted molecule with antiviral properties observed in HIV and Hantavirus infection [37,38] and in the regulation of LPS induced endotoxin shock in murine models [39]. CLIC1 has been previously identified as a virulence factor of Merkel Cell Polyomavirus (MCPyV) which is upregulated during infection and promotes the development of Merkel Cell Carcinoma [40]. HSPD1 has been shown to promote viral infection of HIV, HBV and Influenza viruses [41]. Interestingly, SPINT2 was consistently correlated to viral translation (Fig 2E). Furthermore, the correlation of SPINT2 with viral gene expression is negative and this trend is consistent in both Caco-2 and Calu-3 cell lines, indicating a repressive role on SARS-CoV-2 infection (S1C and S1D Fig). Hence, these findings suggest that SPINT2 represents a permissivity factor that negatively correlates with SARS-CoV-2 infection.
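For readers unfamiliar with the statistic, the Spearman correlation used above is the Pearson correlation computed on ranks. The sketch below is a plain standard-library implementation with average ranks for ties; the two input vectors are invented values, not data from the proteomic dataset.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical values: a host protein that falls as a viral protein rises.
host_protein = [5.0, 4.0, 3.0, 2.0, 1.0]
viral_protein = [1.0, 2.0, 4.0, 8.0, 16.0]
rho = spearman(host_protein, viral_protein)
print(rho)
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship and is insensitive to the nonlinear scale of viral protein accumulation.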

SPINT2 knockdown increases SARS-CoV-2 infection in a TMPRSS2-dependent manner in cell lines
To experimentally validate the negative correlation of SPINT2 expression with SARS-CoV-2 viral gene expression, we hypothesized that this gene could have a direct influence on SARS-CoV-2 infection by impairing early steps of viral entry. Hence, to test our hypothesis, we knocked down SPINT2 using short hairpin RNA (shRNA) in the human lung carcinoma derived cell line Calu-3. SPINT2 expression was readily detectable in wild-type (WT) Calu-3 cells (Fig 3A and 3B). When Calu-3 cells were transduced with a specific shRNA directed against SPINT2, SPINT2 levels were significantly decreased compared to WT cells or cells transduced with a scrambled shRNA at both the transcript (Fig 3A) and protein level (Fig 3B). To address the impact of SPINT2 knockdown on the permissivity of Calu-3 cells to SARS-CoV-2, WT, scrambled and SPINT2 knockdown cells were infected with SARS-CoV-2 using the same multiplicity of infection (MOI). At 24 hours post-infection (hpi), the impact of SPINT2 silencing on SARS-CoV-2 infection was assessed by immunofluorescence using an antibody directed against the nucleocapsid protein of SARS-CoV-2. Knockdown of SPINT2 resulted in a more than two-fold increase in the number of cells positive for SARS-CoV-2 at 24 hpi (Fig 3E and 3F). Concomitantly, knockdown of SPINT2 was also associated with an increase in SARS-CoV-2 replication as monitored by reverse transcription quantitative PCR against the viral genome at 24 hpi (Fig 3C). Evaluation of the number of infectious virus particles released by cells revealed no significant increase in virus production and release upon knockdown of SPINT2 (Fig 3D). Interestingly, even when using higher MOI, loss of SPINT2 always resulted in an increased infection of Calu-3 cells. To test whether SPINT2 modulation of viral load is dependent on TMPRSS2, we monitored its fold change in expression.
Interestingly, TMPRSS2 gene expression was found to be higher in SPINT2 knocked-down cells when compared to WT or scramble cells both in Calu-3 cells and in A549 cells (Figs 3G and S4A). Finally, to control that the increase in SARS-CoV-2 infection observed upon SPINT2 silencing was dependent on the TMPRSS2-mediated activation of the virus, we employed a pharmacological approach to inhibit the activity of the TMPRSS2 protease. Treatment of Calu-3 cells with the TMPRSS2 inhibitor (Camostat mesylate) resulted in an almost complete inhibition of viral infection as monitored by quantifying the number of SARS-CoV-2 infected cells at 24 hpi (Fig 3H) and by quantifying replication using quantitative RT-PCR against the SARS-CoV-2 genome (S4B Fig). As described above, knockdown of SPINT2 resulted in an increase of SARS-CoV-2 expression compared to cells treated with scrambled control shRNA and this increase was abrogated upon treatment of cells with the inhibitor of TMPRSS2 (Figs 3H and S4B).

SPINT2 overexpression decreases SARS-CoV-2 infection
As knockdown of SPINT2 resulted in an increase of SARS-CoV-2 infection, we hypothesized that a greater amount of SPINT2 in cells will result in an inhibition of TMPRSS2 which will in turn prevent infection. In order to test this hypothesis we monitored TMPRSS2 gene expression in Calu-3 cells overexpressing SPINT2. First, SPINT2 was overexpressed in Calu-3 cells (Fig 4A) and both WT and cells overexpressing SPINT2 were infected by SARS-CoV-2. At 24 hpi, cells were immunostained for the nucleocapsid N and quantification of the number and percentage of infected cells revealed that overexpression of SPINT2 negatively impacted SARS-CoV-2 infection (Fig 4B and 4C). We observed a decrease in viral replication by measuring SARS-CoV-2 genome levels using RT-PCR (Fig 4D). We also monitored gene expression of TMPRSS2 under SPINT2 overexpression and observed a downregulation of TMPRSS2 (Fig 4E) consistent with the SPINT2 knock-down results. Together, these data strongly suggest that SARS-CoV-2 infection negatively correlates with SPINT2 expression levels which is in full agreement with our observations using previously reported data in cell lines (Figs 2E and S1C and S1D).

SPINT2 modulates infection by controlling viral entry
To address which step of the SARS-CoV-2 lifecycle is favored upon SPINT2 knockdown, we quantified the number of SARS-CoV-2 infected Calu-3 cells over time (Fig 5A). SARS-CoV-2 infected cells (immunostained against the nucleocapsid N) were readily detectable as early as 4 hpi (Fig 5A and 5B), which was consistent with previous reports. Interestingly, at every time point post-infection, cells knocked down for SPINT2 were found to be more infected than cells treated with a control scrambled shRNA, as monitored by quantifying the number of infected cells (Fig 5B), viral replication (Fig 5C) or production of de novo infectious virus particles (Fig 5D). Together, these results suggest that knockdown of SPINT2 favors early steps of the virus lifecycle, most probably the entry step given the function of SPINT2 in controlling TMPRSS2 activity. To directly address whether loss of SPINT2 promotes SARS-CoV-2 entry into cells, we uncoupled the SARS-CoV-2 entry step from the rest of the viral lifecycle. For this, we exploited the Vesicular Stomatitis Virus (VSV) pseudotyped with the SARS-CoV-2 spike in A549 cells [42]. Entry of this engineered virus relies on the SARS-CoV-2 spike protein while the rest of its lifecycle corresponds to the VSV replication/assembly cycle. Infection of A549 cells overexpressing ACE2 with the spike-pseudotyped VSV encoding the green fluorescent protein (GFP) was readily detectable as early as 4 hpi. Interestingly, upon SPINT2 knockdown, we observed an increase in the number and percentage of spike-pseudotyped VSV infected cells compared to cells treated with a scrambled control shRNA (Fig 5E and 5F). Altogether, these data strongly suggest that loss of SPINT2 leads to greater infection by SARS-CoV-2 by promoting entry into the target cells.

SPINT2 is negatively correlated to viral load and is down-regulated in severe COVID-19 cases
Given the observed negative correlation between SPINT2 expression and SARS-CoV-2 infection in cell lines (Figs 2E and S1C and S1D), we next investigated if SPINT2 expression is associated with disease severity in COVID-19 patients. We used a publicly available scRNA-seq dataset on nasopharynx swab samples from patients with severe and mild symptoms [43]. We correlated a list of serine proteases and inhibitors (SPRGs, S3 Table) to the viral RNA reads and found that SPINT2 was the second most negatively correlated gene (Fig 6A). Then, we selected the cell cluster with the highest expression of SPINT2, which corresponds to secretory cells (S5A Fig), and among these cells observed a lower SPINT2 gene expression in cells from critical COVID-19 cases compared to moderate cases (Fig 6B). This finding is particularly relevant since secretory cells are primary targets of viral infection [44]. We also evaluated data on Peripheral Blood Mononuclear Cells (PBMC) from severe COVID-19 patients [45]. In this [...] eosinophils (S5B Fig). Among these cells, again, we observed lower SPINT2 expression in patients from Intensive Care Units (ICU) (Fig 6C). Additionally, we could also corroborate the negative correlation of SPINT2 and viral load using bulk RNA-seq data from lung autopsies of deceased COVID-19 patients [46]. We calculated the correlations of gene expression between SPINT2, ACE2 and TMPRSS2 and the E, M, N and S viral genes and observed a similar negative correlation (S5C Fig). Collectively, this evidence suggests that SPINT2 expression level could be associated with COVID-19 disease severity.

SPINT2 is down-regulated in multiple tumor types and pancreatic cells from T2D patients
COVID-19 patients with previous records of chronic diseases like cancer or diabetes are considered at higher risk [47][48][49][50][51]. Also, SPINT2 gene silencing by promoter hypermethylation has been reported in multiple tumor types, which promotes tumor progression [17,52,53,54]. For this reason, we hypothesized that SPINT2 down-regulation in tumor cells would increase viral infection permissivity, which could be one of the mechanisms behind the comorbidity observed in COVID-19 patients. We screened lung, colon, kidney and liver tumor datasets to evaluate the differences in SPINT2 gene expression between tumor and paired normal samples. We found statistically significant down-regulation of SPINT2 in kidney and liver tumors (S6A Fig). Similarly, using comparable tumor scRNA-seq datasets [55][56][57][58][59] we observed a down-regulation of SPINT2 in colon adenocarcinoma (epithelial cells), renal clear cell carcinoma (endothelial cells) and hepatocellular carcinoma (hepatocytes) (Fig 7). Interestingly, we were able to detect SPINT2 down-regulation in colorectal tumor epithelial cells at single cell level but not in bulk RNA-seq data, suggesting that SPINT2 expression might be modulated in specific cell subtypes (Figs 7 and S6A). In lung adenocarcinomas, we found SPINT2 upregulation in tumors both in TCGA and scRNA-seq data, which might reflect the existence of different determinants for comorbidity in lung tissues independent of SPINT2 modulation. We also looked at the expression of SPINT2 in pancreatic cells from type 2 diabetes (T2D) patients [59]. Islet cells have high SPINT2 expression when compared to other cell types like endothelial cells (S6B Fig). We observed a strong down-regulation of SPINT2 in alpha-cells of T2D patients, which have been shown to be primary targets of the SARS-CoV-2 virus. This down-regulation of the virulence-associated factor SPINT2 might contribute to the comorbidity between COVID-19 and T2D.

Fig 5 (legend fragment): ... the percentage of infected cells was quantified. Nuclei were stained with DAPI (blue). Error bars indicate standard deviation. n = 3 biological replicates. *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. Analysis was done by a two-tailed unpaired t-test with Welch's correction for the respective time point for B-D. A two-tailed unpaired t-test with Welch's correction was performed for E. https://doi.org/10.1371/journal.ppat.1009687.g005

Discussion
In this study, we describe a tight protease-inhibitor/protease balance at the gene expression level between SPINT2 and TMPRSS2, a major co-receptor of SARS-CoV-2. We found Transcription Factor Binding Sites (TFBS) for ten regulators, including IRF1, IRF7, JUNB, JUND and ELF3, whose TF activities were found to be correlated to both SPINT2 and TMPRSS2 gene expression, which suggests their possible role as common regulators of both genes. Interestingly, ELF3 and IRF7 TF activity has been found to be modulated in SARS-CoV-2 infected vs bystander enterocytes from ileum [21], which could point to viral load modulation mediated by TMPRSS2 and SPINT2 through these TFs. We show that SPINT2 and TMPRSS2 gene expression levels are correlated across cell types and tissues. Interestingly, known SARS-CoV-2 target tissues have high correlation values and co-expression for both genes, which suggests that SPINT2 could play a role in SARS-CoV-2 viral entry. Currently, it is unclear what the molecular signatures are that determine viral permissivity and how they are related to disease severity. We inferred a SARS-CoV-2 permissivity signature using differentially expressed genes between permissive and non-permissive cell lines, from which we removed viral-induced genes. We were able to find SPINT2 in this permissivity signature and observed a negative correlation to SARS-CoV-2 viral load in Calu-3 cells. We also corroborated this trend at the protein level in Caco-2 cells. During the preparation of this manuscript, a study from Bojkova D et al, 2020 was published suggesting a possible role of SPINT1, SPINT2 and SERPINA1 in viral infection by observing the down-regulation of their protein levels in infected cells and also by evaluating the effect of Aprotinin, a non-specific SP inhibitor, on viral load [9]. In contrast, we could not observe SPINT2 as a viral-induced gene. Such a discrepancy might be attributed to the different levels of information used (protein vs mRNA).
However, here for the first time, by knocking down and overexpressing SPINT2, we provide direct causal evidence that SPINT2 is indeed able to modulate SARS-CoV-2 infection. This modulation is dependent on TMPRSS2 and we provide evidence that SPINT2 knockdown impacts SARS-CoV-2 infection by affecting viral entry. SPINT2 inhibits TMPRSS2 enzymatic activity through its KD1 and KD2 domains [15]. Interestingly, beyond direct inhibition of TMPRSS2 enzymatic activity, we could observe an up-regulation of TMPRSS2 mRNA expression in the SPINT2 knockdown Calu-3 cells. We also observed the corresponding trend in the overexpression experiments, that is, TMPRSS2 was down-regulated after SPINT2 overexpression. Further investigation is needed to explore this regulation at the gene expression level. It has been previously reported that SPINT2 can regulate gene expression through different mechanisms apart from direct inhibition. For example, SPINT2 can modulate the serine protease ST14 protein activity by regulating its shedding from the cell membrane of mouse intestinal epithelial cells [19]. SPINT2 has also been reported to regulate transcription of certain genes like CDK1A via histone methylation [60]. We therefore speculate that SPINT2 levels could also regulate TMPRSS2 transcriptionally, independently of direct enzymatic activity inhibition. Our findings indicate that SPINT2 regulates SARS-CoV-2 viral infection through the inhibition of TMPRSS2, since a drug-induced inhibition of TMPRSS2 abrogates the infection-promoting effect of SPINT2 knockdown.
We found a lower expression of SPINT2 in secretory cells from COVID-19 patients with severe symptoms [43]. This could have implications for COVID-19 disease severity since secretory cells have been shown to be the target of SARS-CoV viral infection using organotypic human airway epithelial cultures [44]. We found SPINT2 in the permissivity signature from which we filtered out viral induced genes, suggesting that this gene could be used as a marker for predicting COVID-19 disease susceptibility prior to infection, however this needs to be further evaluated.
Serine proteases (SPs) have been reported to be abnormally regulated in diverse chronic diseases [17,61,62,63]. For example, during carcinogenic development SPs influence metastasis and cancer progression [64,65], while in the context of diabetes they control fibrinolysis, coagulation and inflammation, which in turn affects disease severity [62]. This led us to hypothesize that shared molecular mechanisms between some chronic diseases and COVID-19 could be explained in part by the regulation of SPINT2. We observed SPINT2 down-regulation in Hepatocellular Carcinoma (HCC), Colon Adenocarcinoma (COAD) and renal Clear Cell Carcinoma (rCCC) tumor cells. SPINT2 down-regulation in liver has been reported to contribute to the development of HCC by the binding and inhibition of the serine protease HGFA which transforms Hepatocyte Growth Factor (HGF) into its active form which in turn promotes metastasis, cell growth and angiogenesis [17,66] and the same mechanism has been suggested for rCCC [12]. A marked down-regulation of SPINT2 can be observed in pancreatic alpha islet cells from diabetes patients. It has been reported that islet cells can be infected by SARS-CoV-2, which could contribute to the onset of acute diabetes [67]. Hence, these results suggest that kidney, colon and liver tumor types as well as pancreatic islet cells from diabetic patients could be more permissive and susceptible to SARS-CoV-2 viral infection due to an imbalance of SPINT2 gene expression, which could lead to the disruption of the protease-inhibitor/protease balance [68].
In conclusion, we showed for the first time that SPINT2 is a permissivity factor that modulates SARS-CoV-2 infection. This modulation could be explained by the balance of TMPRSS2/ SPINT2 (serine protease/inhibitor) that we observed at the gene expression level across several tissues. We also found lower SPINT2 gene expression in samples from COVID-19 patients with severe symptoms, hence, this gene might represent a biomarker for predicting disease severity. We also found SPINT2 down-regulation in tumor types which could have implications for the observed comorbidities in COVID-19 patients with cancer.

Production of lentiviral constructs expressing shRNA against SPINT2 and lentiviral constructs overexpressing SPINT2
Oligonucleotides encoding the sequence for SPINT2 knockdown were designed from the TRC library based on the Genetic Perturbation Platform (GPP) Web Portal, cloneID: TRCN0000073581 (Box 1) [69]. Annealed oligonucleotides were ligated with the AgeI-HF and EcoRI-HF digested pLKO.1 puro vector (Addgene #8453) using the T4 DNA Ligase (New England Biolabs). The pENTR223 vector encoding the ORF of human SPINT2 (Genbank ID: CV023579) was obtained from the DKFZ Genomics & Proteomics Core Facility and subsequently cloned by recombinational Gateway Cloning into the expression vector pWPI. Both resulting plasmids were transformed into E. coli DH5α-competent cells.
Amplified plasmid DNA was purified using the NucleoBond PC 100 kit by Macherey-Nagel following the manufacturer's instructions.

Lentivirus production and selection of stable cell lines
HEK293T cells (ATCC CRL-3216) were seeded on 10 cm² dishes and allowed to adhere for 36 hours. Upon reaching 70% confluency, the cells were transfected with 4 μg of pMD2.G (Addgene #12259), 4 μg of psPAX2 (Addgene #12260) and 8 μg of purified pLKO.1 plasmid containing the shRNA constructs. Cell supernatant containing lentivirus was harvested 72 h post-transfection, filtered through a 0.45 μm Millex HA-filter (Merck Millipore) and purified by ultracentrifugation at 27,000 × g for 90 min. 2 × 10⁵ Calu-3 cells were seeded onto collagen-coated 6-well plates 24 h prior to transduction. Cell medium was replaced with 3 mL medium containing 20 μL of the purified lentivirus and 3 μL polybrene transfection reagent (Merck Millipore). Two to three days after transduction, medium was supplemented with 10 μg/mL puromycin for selection of successfully transduced cells.

SARS-CoV-2 viral infection
The SARS-CoV-2 isolate used in the experiments was obtained from the swab of a SARS-CoV-2 positive patient from the Heidelberg University Hospital. The virus was isolated and propagated in Vero E6 cells. All SARS-CoV-2 infections were performed with a multiplicity of infection of 0.04 as determined in Vero E6 cells. Prior to infection, culture media was removed and virus was added to cells and incubated for 1 hour at 37˚C. Fresh media was added back to the cells upon virus removal.
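As a worked illustration of what an MOI of 0.04 implies, the sketch below computes the inoculum volume for a given stock titer and the expected fraction of cells receiving at least one infectious unit. The titer and cell number are hypothetical (neither is stated for this step), and the Poisson model is a textbook assumption, not something reported in the study.

```python
import math


def inoculum_volume_ml(moi, n_cells, titer_per_ml):
    """Volume of virus stock (mL) delivering `moi` infectious units per cell."""
    return moi * n_cells / titer_per_ml


def expected_fraction_infected(moi):
    """Fraction of cells receiving >= 1 infectious unit, assuming Poisson."""
    return 1.0 - math.exp(-moi)


MOI = 0.04                 # value stated in the text
N_CELLS = 200_000          # hypothetical number of cells in the well
TITER = 1_000_000          # hypothetical stock titer, infectious units per mL

volume_ml = inoculum_volume_ml(MOI, N_CELLS, TITER)
fraction = expected_fraction_infected(MOI)
print(f"{volume_ml * 1000:.1f} uL of stock, "
      f"{fraction * 100:.1f}% of cells expected to be hit initially")
```

At such a low MOI, only a few percent of cells are expected to be infected in the first round, which leaves room to detect increases in infection after knockdown.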

SARS-CoV-2 spike protein pseudotyped VSV assay
A549 cells expressing ACE2 were seeded at 10,000 cells/well in a 96-well plate 24 h prior to infection. Spike-pseudotyped VSV was added to the wells and the infection was allowed to proceed until 8 hpi. Media was then removed, and samples were washed once with PBS and fixed in 4% paraformaldehyde (PFA) for 20 mins at room temperature (RT). Cells were washed in 1X PBS, permeabilized in 0.5% Triton-X for 15 mins at RT and then incubated with DAPI for 30 mins at RT. Cells were washed in 1X PBS three times and maintained in PBS. Cells were imaged on a Zeiss Cell Discoverer 7 to quantify the number of infected cells relative to the number of nuclei. For details about the VSV SARS-CoV-2 S Δ18 eGFP, refer to [42].

RNA isolation, cDNA synthesis and qPCR
Cells were harvested 24 hours post-infection for RNA isolation using the RNeasy RNA extraction kit (Qiagen) as per the manufacturer's instructions. Complementary DNA was synthesized using iScript reverse transcriptase (BioRad) from 250 ng of total RNA per 20 μL reaction according to the manufacturer's instructions. Quantitative RT-PCR was performed using iTaq SYBR green (BioRad) as per the manufacturer's instructions. The expression of target genes was normalized to the endogenous control TBP. Primer sequences are indicated in Box 2. The fold change in SARS-CoV-2 genome copy number was calculated using input as a reference. Input samples were harvested directly post-infection and accounted for the basal viral genome copy number detected due to viruses attaching to the cell membrane.
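The normalization described above can be illustrated with the widely used 2^-ΔΔCt calculation. The text does not name the exact formula, so this is an assumption, as is the roughly 100% amplification efficiency it requires; all Ct values below are invented.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the target normalized to the endogenous control (here TBP)."""
    return ct_target - ct_reference


def fold_change(ct_target_sample, ct_ref_sample,
                ct_target_calibrator, ct_ref_calibrator):
    """Relative quantity by 2^-ddCt, assuming ~100% PCR efficiency."""
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_calibrator, ct_ref_calibrator))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: viral genome at 24 hpi vs the input sample
# harvested directly post-infection, each normalized to TBP.
fold = fold_change(
    ct_target_sample=20.0, ct_ref_sample=25.0,         # 24 hpi
    ct_target_calibrator=24.0, ct_ref_calibrator=25.0, # input
)
print(fold)
```

Using the input sample as calibrator means a fold change near 1 reflects only virus attached to the cells, so values above 1 indicate genome replication.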

In-cell Western (TCID50)
Vero E6 cells were seeded at 20,000 cells/well into a 96-well plate 24 h prior to infection. 100 μL of supernatant was added to the first wells and seven 1:10 serial dilutions were made. The cells were incubated for 24 h and then fixed with 2% PFA for 20 mins at RT. Cells were washed twice with 1X PBS upon PFA removal and then permeabilized for 15 mins with 0.5% Triton-X in PBS. Blocking was carried out with a 1:2 dilution of Li-Cor blocking buffer (Li-Cor) in PBS for 30 mins at RT. Cells were then incubated with primary antibody against dsRNA, J2 (Scicons: 10010500, 1:1000) for 1 h at RT. Cells were washed three times with PBS containing 0.1% Tween 20. Cells were then incubated with secondary antibody (anti-mouse CW800) and the DNA dye Draq5 (Abcam) diluted 1:10,000 in blocking buffer for 1 h at RT. Cells were again washed three times with PBS containing 0.1% Tween 20. The plate was then imaged on a LICOR (Li-Cor) imager.

Indirect immunofluorescence assay
Cells were seeded on a 48-well plate at 50,000 cells/well. At 24 hours post infection, cells were fixed in 4% paraformaldehyde (PFA) for 20 mins at RT. Cells were washed in 1X PBS and permeabilized in 0.5% Triton-X for 15 mins at RT. Blocking was carried out for 30 mins using 3% BSA-PBS at RT. Mouse monoclonal antibody against SARS-CoV-2 Nucleocapsid (NC) protein (Sino biologicals MM05) was used as primary antibody, diluted in 1% BSA-phosphate-buffered saline (PBS) and incubated for 1 h at RT. Cells were washed with 1X PBS three times and incubated with secondary antibodies conjugated with AF488 (Molecular Probes) and DAPI for 30-45 mins at RT. Cells were washed in 1X PBS three times and maintained in PBS. Cells were imaged on a Zeiss Cell Discoverer 7 microscope to quantify the number of infected cells relative to the number of nuclei.

Western blot
Cells were rinsed once with 1X PBS and lysed with 1X RIPA (150 mM sodium chloride, 1.0% Triton X-100, 0.5% sodium deoxycholate, 0.1% sodium dodecyl sulfate (SDS), 50 mM Tris, pH 8.0 with phosphatase and protease inhibitors (Sigma-Aldrich)) for 5 mins at room temperature (RT). Lysates were collected and equal protein amounts were separated by SDS-PAGE and blotted onto a nitrocellulose membrane by wet-blotting (Bio-Rad). Membranes were blocked with 5% BSA in TBS containing 0.1% Tween 20 (TBS-T) for two hours at RT. Primary antibodies against SPINT2 (Sigma Aldrich HPA011101, 1:500) and α-tubulin (Sigma Aldrich T9026, 1:1000) were diluted in blocking buffer and incubated overnight at 4˚C. Membranes were washed 3X in TBS-T for 15 mins at RT. Secondary antibodies were diluted in blocking buffer and incubated at RT for 1 hour with rocking. Membranes were washed 3X in TBS-T for 15 mins at RT. HRP detection reagent (GE Healthcare) was mixed 1:1 and incubated at RT for 5 mins. Membranes were exposed to film and developed.

TMPRSS2 inhibition assay
Calu-3 cells were seeded onto a 48-well plate 24 hours prior to treatment. Cells were incubated with 5 μM of Camostat mesylate (Sigma Aldrich, SML0057) for 30 mins prior to virus infection and throughout the 1 hour virus infection. After infection, fresh media containing Camostat mesylate was added to the cells and incubated for another 24h. Cells were then harvested 24 hpi.

Statistics and computational analyses
In order to quantify infected cells in samples stained by indirect immunofluorescence, ilastik 1.2.0 was used on DAPI images to generate a mask representing each nucleus as an individual object. These masks were used in CellProfiler 3.1.9 to measure the intensity of the conjugated secondary antibodies in each nucleus. A threshold was set based on the basal fluorescence of non-infected samples, and all nuclei with a higher fluorescence were considered infected cells.
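
The thresholding step can be sketched in Python as follows. This is a minimal illustration rather than the ilastik/CellProfiler pipeline itself: the per-nucleus intensities are made up, and taking the threshold as the highest signal observed in mock wells is an assumption, since the exact rule is not specified above.

```python
def fraction_infected(nucleus_intensities, threshold):
    """Share of nuclei whose secondary-antibody signal exceeds the threshold."""
    if not nucleus_intensities:
        raise ValueError("no nuclei were segmented")
    infected = sum(1 for value in nucleus_intensities if value > threshold)
    return infected / len(nucleus_intensities)


# Hypothetical per-nucleus mean intensities (arbitrary units).
mock = [0.02, 0.03, 0.05, 0.04]
infected_well = [0.03, 0.40, 0.04, 0.75, 0.02]

# Assumption: threshold = highest basal signal seen in non-infected nuclei.
threshold = max(mock)
result = fraction_infected(infected_well, threshold)
print(result)
```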

Calu-3 and H1299 cells preprocessing
For Calu-3 cells, we filtered out cells with an extremely high number of detected genes (>50,000), which probably correspond to doublets. Because the H1299 line is non-permissive, few cells were detected as infected; in order to obtain DEGs, we therefore defined infected cells as those with a cumulative sum of viral gene expression >0.

Assessing non-viral induced permissivity signatures
As we wanted to differentiate between permissivity and infection signatures, we first looked for differentially expressed genes (DEGs) in SARS-CoV-2 permissive vs non-permissive cell lines and then removed all genes that were up- or down-regulated during infection (Fig 2A). We performed differential expression analysis (DEA) using Seurat [70] between Calu-3 and H1299 cells in non-infected mock cells at 4 hours of culture (z). Then, we obtained the DEGs of Calu-3 infected vs mock cells at 12 hours post infection (x); we did the same for H1299 infected vs non-infected cells (H1299 infected cells were defined as explained above). In all DEA we set a log fold change (FC) threshold of 0.25. Finally, we removed these infection signatures from the DEGs of Calu-3 vs H1299 to obtain the permissivity gene signature (i'). A gene set enrichment analysis was performed using enrichR to evaluate the composition of the removed genes.
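
The final subtraction amounts to a set difference, sketched below in Python. The gene names are placeholders and the function name is hypothetical; in the study, the three DEG lists come from the Seurat analyses described above.

```python
def permissivity_signature(cell_line_degs, infection_deg_sets):
    """Remove every infection-responsive gene from the cell-line DEGs."""
    infection_genes = set().union(*infection_deg_sets)
    return set(cell_line_degs) - infection_genes


# Placeholder gene sets, for illustration only.
calu3_vs_h1299_mock = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
calu3_infected_vs_mock = {"GENE_B", "GENE_X"}
h1299_infected_vs_mock = {"GENE_D"}

signature = permissivity_signature(
    calu3_vs_h1299_mock,
    [calu3_infected_vs_mock, h1299_infected_vs_mock],
)
print(sorted(signature))
```

Genes that differ between the two lines but also respond to infection in either line (here GENE_B and GENE_D) are dropped, leaving only baseline differences.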

Ranking genes using RF and pathway enrichment analysis
A Random Forest (RF) regression analysis was performed using the normalized gene expression of the permissivity signature to predict the cumulative sum of the expression of viral genes in Calu-3 cells at 12 hpi. We trained the RF using a random subsample of 75% and tested the results with the remaining set. Next, we estimated the feature importance for each of the permissivity signature genes and performed enrichment analysis on the top 25% ranked genes.
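
A minimal Python sketch of this ranking step is shown below, using scikit-learn on synthetic data. It is an illustration under stated assumptions, not the original analysis code: the number of trees, the random seeds, the use of impurity-based importances and the toy expression matrix are all our choices, as the text above does not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 cells x 8 hypothetical signature genes.
genes = [f"gene_{i}" for i in range(8)]
X = rng.normal(size=(200, len(genes)))
# Toy response: "viral load" driven mostly by gene_0 (illustration only).
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# 75% of the cells are used for training, the remaining 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
r2_test = model.score(X_test, y_test)  # R^2 on the held-out cells

# Rank genes by impurity-based importance and keep the top 25%.
order = np.argsort(model.feature_importances_)[::-1]
ranked = [genes[i] for i in order]
top_genes = ranked[: max(1, len(ranked) // 4)]
print(r2_test, top_genes)
```

The held-out score indicates whether the importances come from a model that generalizes. Permutation importance would be a reasonable alternative to the impurity-based measure assumed here.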

Scoring permissivity signatures
For the scoring of cells based on the permissivity signature among cell types in the HCL dataset, we used the top 25% RF ranked genes and applied the AddModuleScore function of Seurat setting nbin = 100.

SPINT2 expression correlation to viral gene expression
For the translatome correlation analysis, the summed intensity normalized values were used as provided in the study [16]. In order to compute the correlations of SPRGs (S3 Table) to the viral reads in the scRNA-seq data from Chua RL et al., 2020, the raw count matrices were extracted from the Seurat object provided by the authors, split by sample and then imputed using scimpute [71] with the following parameters: drop_thr = 0.5 and Kcluster equal to the number of annotated cell types in each matrix. The imputed matrices were then merged and log2 normalized. Finally, correlations were computed only on infected cells (viral read counts >0). For the bulk RNA-seq data from deceased COVID-19 patients, log2 RPM normalized counts were used. In both cases, correlations to viral genes were computed using Spearman coefficients.
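
The last step, a Spearman correlation restricted to infected cells, can be sketched in plain Python as follows. This is an illustrative re-implementation with made-up values, not the code used for the analysis; the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_in_infected(gene_expr, viral_counts):
    """Spearman correlation using only cells with viral read counts > 0."""
    pairs = [(g, v) for g, v in zip(gene_expr, viral_counts) if v > 0]
    expr, virus = zip(*pairs)
    return pearson(average_ranks(expr), average_ranks(virus))


# Hypothetical cells: expression of one gene and viral read counts.
expr = [5.0, 4.0, 3.0, 2.0, 1.0, 9.0]
viral = [1, 2, 4, 8, 16, 0]
rho = spearman_in_infected(expr, viral)
print(rho)
```

The last cell has no viral reads and is excluded, so the coefficient reflects only the relationship among infected cells.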

scRNA-seq data preprocessing
In order to have a standardized workflow for the processing of scRNA-seq data, we applied SCT normalization within the Seurat workflow to every dataset except the Human Cell Landscape data, for which log2 normalization and scaling were performed because the size of this dataset made SCT impractical. HCC data were downloaded from GEO (GSE149614) and reprocessed. We used the Louvain method implemented in Seurat for community detection, and clusters were identified using tissue markers. Specifically, we used the markers that characterize cell types in an independent scRNA-seq human liver atlas [72] to identify clusters of epithelial, endothelial, hepatocyte, Kupffer and NK cells. For the kidney, colon and prostate tumor datasets, as well as the datasets of pancreatic cells from T2D patients and of PBMC and airway epithelium from SARS-CoV-2 patients, the data and annotations were used as provided in the corresponding publications (see Data and script availability section).

SPINT2 expression in TCGA tumors data
TPM normalized counts from tumor samples in TCGA were downloaded (https://www.cancer.gov/tcga). Our analysis was restricted to tissue- and sample-matched tumor and normal samples only. Differences in average expression were assessed using the Wilcoxon test with Holm correction.
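
For illustration, the Holm step-down adjustment applied to a set of per-tissue p-values can be written in a few lines of Python. This sketch covers only the multiple-testing correction, not the Wilcoxon test itself, and the raw p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the ranking.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values, one per tissue comparison.
raw = [0.01, 0.04, 0.03, 0.5]
adj = holm_adjust(raw)
print(adj)
```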

Inference of transcription binding sites
A footprinting analysis was carried out using the TOBIAS pipeline [73] with default parameters and the MACS settings --nomodel --shift -100 --extsize 200 --broad. Then, we extracted the inferred transcription factor binding sites (TFBS) for those TFs whose activities were found to be positively correlated with both SPINT2 and TMPRSS2 in the single-cell RNA-seq data. TFBS were visualised using the TOBIAS PlotTracks function and the network was built in Cytoscape [74]. Edges in the network represent TF binding scores.

Table. List of DEG between infected and mock-infected cells in Calu-3 and H1299 cell lines. The 344 genes are likely to represent response genes upon infection, which we filter out of our permissivity signature. (XLSX)

S2 Table. List of genes in the permissivity signature. The table lists the statistical parameters of the differential expression analysis between the mock-infected Calu-3 and H1299 cell lines. (XLSX)