Integrative analysis of miRNA and mRNA sequencing data reveals potential regulatory mechanisms of ACE2 and TMPRSS2

Development of novel approaches for regulating the expression of angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2) is becoming increasingly important within the context of the ongoing COVID-19 pandemic since these enzymes play a crucial role in cell infection. In this work we searched for putative ACE2 and TMPRSS2 expression regulation networks mediated by various miRNA isoforms (isomiR) across different human organs using publicly available paired miRNA/mRNA-sequencing data from The Cancer Genome Atlas (TCGA) project. As a result, we identified several miRNA families targeting ACE2 and TMPRSS2 genes in multiple tissues. In particular, we found that lysine-specific demethylase 5B (JARID1B), encoded by the KDM5B gene, can indirectly affect ACE2 / TMPRSS2 expression by repressing transcription of hsa-let-7e / hsa-mir-125a and hsa-mir-141 / hsa-miR-200 miRNA families which are targeting these genes.


Introduction
The COVID-19 pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) had a dramatic impact on the health of millions of people and became a challenge for the worldwide health systems [1,2]. High mortality rate of the infection is generally due to lung failure induced by the acute respiratory distress syndrome (ARDS) [3]. Furthermore, there are indications that SARS-CoV-2 can affect other organs including the digestive system [4], kidney [5] and brain [6]. The mechanism of host cell infection by SARS-CoV-2 is not completely understood and remains an active research topic. However, it became evident that the key players during the virus entry into the host cell are angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2). Specifically, the interaction of PLOS ONE PLOS ONE | https://doi.org/10.1371/journal.pone.0235987 July 29, 2020 1 / 12 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 these membrane proteins with the viral spike protein (S-protein) is crucial for host-membrane fusion and endocytosis [7,8]. MicroRNA (miRNA) is a class of small (about 22 nt long) non-coding RNAs, whose main function is negative post-transcriptional and translational regulation of gene expression [9]. In most cases, this regulation is mediated by complementary binding of miRNA to the 3' UTR of the target mRNA and promoting its degradation and/or translation repression [10,11]. Furthermore, multiple reports have consistently shown that aberrant miRNA expression is associated with various pathologies including cancer [12][13][14], neurological [15,16] and cardiovascular diseases [17,18]. Finally, some host-and virus-encoded miRNAs can contribute to the process of infection by targeting certain mRNAs (both host and viral) [19,20].
Previous analysis of miRNA sequencing reads showed that miRNAs are usually present in various isoforms (isomiRs) which differ from each other by 1-3 nucleotides at the 5'-or 3'-termini of the molecules [21]. Importantly, multiple recent studies confirmed that isomiR expression profiles can deliver more insights as compared to canonical miRNAs profiles. Thus, in one recent report, the authors constructed a machine learning algorithm that allowed distinguishing thirty two different types of cancer based on the information about presence or absence of particular isomiRs but not canonical miRNAs [22]. It is also important to note that two 5'-isomiRs (i.e. isomiRs having different 5'-ends) of the same miRNA can have distinct set of target mRNAs due to the altered seed region [23,24].
In this work we explored the landscape of ACE2 and TMPRSS2 regulation mediated by miRNAs and isomiRs in different human organs using bioinformatic analysis of publicly available paired miRNA/mRNA-sequencing datasets. We observed negative correlation of various miRNA and mRNA expressions across a number of samples, what together with the predicted miRNA-binding domains allowed identifying and quantifying putative miRNA-mediated targeting of certain mRNAs [25]. These results can facilitate developing novel therapeutic approaches for preventing coronavirus infections and contribute to the fundamental mechanisms of virus-host interactions.

ACE2 and TMPRSS2 expression profiles across different organs
In order to examine expression profiles of ACE2 and TMPRSS2 in human tissues we used publicly available data from The Cancer Genome Atlas (TCGA; https://www.cancer.gov/tcga) which provides information on mRNA and miRNA expression in tumor as well as adjacent normal tissues for a variety of organs. Specifically, we collected data from 541 samples in total (11 different organs). We found that both enzymes were highly expressed in all analyzed tissues, see Fig 1 (numerical characteristics are listed in S1 Table). Thus, ACE2 showed the highest expression rate in the colon, kidney, stomach and liver, while in other organs its average expression level was 2-5 log 2 units less and stayed at approximately the same level. The TMPRSS2 expression was higher as compared to ACE2 almost in all analyzed tissues (average log 2 (fold change) = 3.18) reaching its peak in the prostate.

Landscape of interactions between isomiRs and ACE2 / TMPRSS2
To identify the putative interactions between isomiRs and ACE2 / TMPRSS2 genes in particular tissues we have applied the two-step procedure. Firstly, we used TargetScan software to generate a list of miRNAs that could potentially bind to the 3' UTR of target mRNAs. Next, we have performed a correlation analysis to identify highly expressed isomiRs having significant negative correlation with ACE2 / TMPRSS2 gene expression levels in each organ. As a result, we designated 10 and 23 isomiRs which could potentially target ACE2 and TMPRSS2, respectively. The highest number of ACE2 targeting isomiRs were found in kidney (6 isomiRs) and lung (3 isomiRs) tissues, while in the case of TMPRSS2 the highest number of regulating isomiRs were expressed in esophagus (11 isomiRs), stomach (8 isomiRs) and breast (6 iso-miRs) cells (Fig 2, correlations and p-values are presented in S2 Table).  The largest intersection corresponded to isomiRs regulating TMPRSS2 gene expression in esophagus and stomach (4 common isomiRs). Interestingly, some miRNAs (including hsa-miR-23b-3p, hsa-miR-335-3p and hsa-miR-452-5p) were expressed as both 5'-isomiRs (+1, -1 and +3 nucleotides respectively) and canonical forms. Furthermore, we identified two ACE2 / TMPRSS2 regulating miRNAs to be expressed in more than two tissues. The latter included hsa-miR-125a-5p (targeting ACE2 in esophagus, kidney and lung), as well as hsa-miR-199a-5p (regulating expression of TMPRSS2 in liver, stomach and uterine corpus, see Fig 3).
To further explore the interplay between different isomiRs and the corresponding mRNAs we have focused on genes hosting miRNAs in their introns. In total, we found 14 out of 33 isomiRs to be encoded within the introns in the sense orientation (see Table 1). Interestingly, hsa-miR-125a-5p (targeting ACE2 in lung, kidney and esophagus) and hsa-let-7e-5p (targeting TMPRSS2 in stomach and esophagus) were found to be encoded within the same intron of the SPACA6 gene.

Expression of JARID1B is neccesary for ACE2 and TMPRSS2 expression in the majority of human cells
To explore putative interactions between JARID1B and ACE2 / TMPRSS2 genes more deeply we analyzed publicly available single-cell RNA sequencing (scRNA-seq) data obtained from certain human organs (annotated and deposited by Sungnak et al. [26]). In general, the expression of all three genes was relatively low in most cell types which led to a high dropout rate. Specifically, the mean dropout rate (i.e. the ratio of the number of cells with no JARID1B, ACE2 or TMPRSS2 aligned reads to the total number of cells) across all cell types was 87.9% for KDM5B, 99.1% for ACE2 and 93.2% for TMPRSS2. Thus, a correlation between the expression of genes forming an arbitrary pair at a single-cell level is strongly biased due to a large fraction of cells with no reads.
To overcome the issue caused by high dropout rates we calculated the mean expression of each gene for all cell types from the explored datasets (see S3 Table). The latter was followed by a binarization of the obtained expression profiles. Specifically, a gene was considered "expressed" in a given cell type if its average expression in this cell type was greater than the first quartile of expressions across all cell types. As a result, the ACE2 gene was expressed in 114 out of 272 analyzed cells, while 100 cell types (87.7% of total) also expressed KDM5B. This observation indicates that the expression of KDM5B gene could be necessary for ACE2 expression (binomial test p = 2.2 × 10 −7 ) in the majority of cells. A similar conclusion can be done for KDM5B and TMPRSS2 genes since 135 out of 159 cell types (84.9%) expressing TMPRSS2 also expressed KDM5B (binomial test p = 1.61 × 10 −7 ).
Finally, we identified the cell types where expression levels of KDM5B, ACE2 and TMPRSS2 genes were higher than the corresponding upper quartiles (Table 2). Interestingly, the list of cell types having high expression levels of the aforementioned genes included: nasal cells (ciliated and secretory), bronchial cells (ciliated, secretory and basal) as well as ciliated and alveolar cells of lung parenchyma.

Discussion
In this work, we explored potential interactions between miRNAs and ACE2 / TMPRSS2 genes in various human organs. Along with the crucial role of both enzymes during SARS-CoV / SARS-CoV-2 entry into the cell, ACE2 and its targeting miRNAs have been also shown to affect the development of acute respiratory distress syndrome (ARDS). Thus, previous studies have indicated the critical role of ACE2 in acute lung injury, since ACE2 deficiency in the lungs enhanced the ARDS pathogenesis [27]. It was further shown that ACE2 gene becomes downregulated in the lungs during SARS-CoV infection [28], while another research team reported strong upregulation of hsa-miR-200c-3p / hsa-miR-141-3p cluster upon avian influenza virus H5N1 infection, presumably induced by viral proteins [29]. Interestingly, hsa-miR-200c-3p and hsa-miR-141-3p can directly target 3' UTR of ACE2 mRNA, since transfection of HEK293T cells with the corresponding miRNA mimics and inhibitors resulted in significant decrease or increase of ACE2 expression, respectively [29]. These results indicate the putative link between the viral infection and consecutive ARDS development caused by downregulation of ACE2 in the lung tissues. Furthermore, miRNA overexpression and knockdown experiments performed on HK-2 renal tubular epithelial cells indicated that hsa-miR-125b (which belongs to the same miRNA family as hsa-miR-125a) directly targets ACE2 mRNA [30].
Using the literature-curated interaction database we have found strong indications that the JARID1B gene can repress let-7e / miR-125a as well as miR-200 families presumably via epigenetic mechanisms. Thus, Mitra et al. [31] have previously provided experimental evidence that the JARID1B gene represses transcription of the let-7e and miR-125a by promoting H3K4me3 demethylation. Specifically, shRNA-mediated knockdown of JARID1B in MCF-7 and T47D cells led to a several-fold increase in expression of both miRNAs, while simultaneous ChIP analysis confirmed the H3K4 demethylation of the corresponding regulatory DNA sequences. In another report, Enkhbaatar et al. [32] showed that JARID1B represses transcription of miR-200 family by a similar mechanism. Specifically, the overexpression of JARID1B in lung cancer cell line A549 resulted in 3-fold decrease of miR-200a and miR-200c expression while JAR-ID1B knockdown led to 1.5-fold increase of their steady-state levels. Subsequent ChIP assay showed putative changes in the H3K4me3 methylation of the corresponding DNA regions. Therefore, our results go in accordance with the experimental data showing the existence of regulatory network involving let-7e / miR-125a / miR-200 families, histone demethylase JARID1B as well as ACE2 / TMPRSS2, and further suggest a new way for regulating ACE2 expression. Furthermore, our single-cell RNA sequencing data analysis strongly supports the existence of such interactions in other cells, and indicates that in the majority of human cells ACE2 and TMPRSS2 are not expressed without JARID1B. In particular, high expression levels of JARID1B, ACE2 and TMPRSS2 in human respiratory epithelium cells intimate that further investigation of the identified regulatory network could expand our understanding of the viral infection pathogenesis.
The potential interactions between ACE2 and histone modifiers such as HAT1, HDAC2, and JARID1B in the lung has been previously shown by Pinto and co-authors [33]. Furthermore, Gordon D. et al. suggested the inhibition of histone deacetylase 2 (HDAC2) to be used as a promising strategy for COVID-19 therapy [34]. Specifically, the authors described high-confidence interaction between HDAC2 protein and the main viral protease Nsp5, supposing that Nsp5 may inhibit HDAC2 transport into the nucleus thereby altering its functional activity. In addition, they suggest testing certain HDAC2 repressors including Valproic acid and Apicidin since targeting HDAC2 can result in antiviral activity. Interestingly, Valproic acid was also shown to directly inhibit JARID1B in human embryonic kidney cells (HEK 293) [35] and H9 human embryonic stem cells (H9 hESC) [36]. Thus, repression of histone modifiers can adjust the cell response to viral infection by engaging a complex regulatory mechanism involving ACE2 and TMPRSS2 genes. However, additional experiments are required to confirm the impact of the described network as well as drugs targeting the above mentioned enzymes.
Several miRNAs identified in this work have been previously reported to contribute to the viral infection process as well. Thus, nucleocapsid protein of the human coronavirus OC43 (HCoV-OC43) had been shown to bind to hsa-miR-9-5p, a prominent negative regulator of the transcription factor NF-κB [37]. The latter results in NF-κB activation and consequent alteration of the innate immune response. Finally, Mallick B. et al. [38] have shown that hsa-miR-98-5p targets 3' UTR of SARS-CoV Spike protein in bronchoalveolar stem cells. The authors further hypothesized that such interactions can be used by the virus to evade fast elimination by the immune system.
While several other miRNAs had been also reported to be aberrantly expressed during the coronavirus infection, they were not among the putative regulators of ACE2 or TMPRSS2 genes identified in this work. For instance, using deep sequencing of mouse lung miRNome upon SARS-CoV infection Peng and co-authors [39] showed that hsa-let-7f-5p, hsa-miR-139-3p, hsa-miR-139-5p were up-regulated directly after the infection in lung cells. A similar miRNA-seq experiment was performed on RNA isolated from Calu-3 human lung adenocarcinoma cells infected by MERS-CoV [40]. Four other miRNAs were aberrantly expressed, including down-regulated hsa-miR-98-5p as well as up-regulated hsa-miR-125a-5p, hsa-miR-125b-5p and hsa-miR-141-3p. Importantly, due to the structural similarity between MERS-CoV / SARS-CoV and SARS-CoV-2, we hypothesize that the described mRNA/miRNA interactions may contribute to maintenance of ACE2 / TMPRSS2 expression levels during coronavirus infection, thus regulating the infection process.  [41] was used to quantify mRNA expression, while reads normalized by upper quartile (R-UQ) were used as an isomiR expression scale, all values were log 2 -transformed. The expression of a 5'-isomiRs was calculated as a combined (summed up) value of expression of all isomiRs having the same 5'-end genomic coordinate according to miRBase v21 [42]. The transcripts having zero coverage in more than half of the analyzed samples for each tissue type were discarded. IsomiR nomenclature from [43] was used: number after a vertical bar denotes offset from 5'-end in the direction from 5'-to 3'-end. For example, hsa-miR-21-5p|+1 denotes isomiR which differs from the canonical form by the absence of one nucleotide from the 5'-end of the molecule.

IsomiR / miRNA analysis
TargetScan v7.2 software [44] was used to predict the isomiRs potentially targeting 3' UTR of ACE2 and TMPRSS2 genes. The entries having weighted context++ score below 0.8 quantile were removed from the downstream analysis. For each tissue type we calculated Spearman correlation coefficient between the expression of ACE2 / TMPRSS2 genes and the isomiRs predicted by TargetScan which were highly expressed in considered samples (20% of isomiRs with the highest median expression). The -0.3 correlation coefficient threshold and 0.05 pvalue threshold were applied on the obtained values. Hierarchical clustering of isomiRs across tissues was done using Jaccard metric.
The list of intragenic miRNAs and their host genes was constructed by the following procedure: 1. The list of miRNAs with their genomic coordinates (hsa.gff3 file from miRBase v21) was converted into BED format.
2. Human genome annotation was downloaded from Ensemble v97 (Homo_sapiens. GRCh38.97.chr.gff3 file for GRCh38.p12 version) [45]. This file was further split into gene and exon annotations. The resulting annotation files were converted into BED format.
3. The list of miRNAs was intersected with lists of genes and exons separately using bedtools intersect v2.26.0 [46]. Finally, exonic miRNAs were excluded from the intragenic miRNA list and the entries from different strands were filtered out from the resulting files.

Single-cell RNA sequencing data analysis
Annotated scRNA-seq files were downloaded from https://www.covid19cellatlas.org/ in � . h5ad format, see S3 Table for the list of file names and putative organs. Raw count values were normalized by the sum of counts in each cell and scaled using 10 4 factor. Obtained values were log 2 -transformed.

Software utilized
All code was written in Python 3 programming language with extensive use of Pandas [48] and NumPy [49] modules. Statistical analysis was performed using the SciPy [50]. Single-cell RNA sequencing data was processed using the Scanpy [51]. Plots were constructed using the Matplotlib / Seaborn [52]. All used data and source codes have been made available on GitHub (https://github.com/s-a-nersisyan/miRNA_ACE2_TMPRSS2).
Supporting information S1