Long non-coding RNAs (lncRNAs) NEAT1 and MALAT1 are differentially expressed in severe COVID-19 patients: An integrated single cell analysis

Hyperactive and damaging inflammation is a hallmark of severe rather than mild COVID-19 syndrome. To uncover key inflammatory differentiators between severe and mild COVID-19 disease, we applied an unbiased single-cell transcriptomic analysis. We integrated a bronchoalveolar lavage (BAL) dataset with a peripheral blood mononuclear cell dataset (PBMC) and analyzed the combined cell population, focusing on genes associated with disease severity. Distinct cell populations were detected in both BAL and PBMC where the immunomodulatory long non-coding RNAs (lncRNAs) NEAT1 and MALAT1 were highly differentially expressed between mild and severe patients. The detection of other severity associated genes involved in cellular stress response and apoptosis regulation suggests that the pro-inflammatory functions of these lncRNAs may foster cell stress and damage. The lncRNAs NEAT1 and MALAT1 are potential components of immune dysregulation in COVID-19 that may provide targets for severity related diagnostic measures or therapy.

Symptoms and progression of COVID-19 vary widely (3) as some patients may be asymptomatic while others exhibit disease with varying severity.(4,5) Common symptoms include fever, cough, and fatigue, which generally appear 2 to 14 days after exposure, (6,7) while rarer symptoms include dyspnea, headache/dizziness, nausea, diarrhea, and hemoptysis.(8) Severe cases of COVID-19 are distinguished by strong inflammatory responses that can lead to multiorgan damage and death.(9) The mechanisms that separate mild and severe disease remain poorly understood.
After viral exposure, the inflammatory response to COVID-19 commences with signaling cascades that lead to secretion of type I interferons, cytokines and chemokines.(10)This initial exposure also activates inflammasomes, multimeric protein complexes that play an important role in triggering inflammation and the subsequent initiation of an adaptive immune response.(11)(12)(13) The Nod-like receptor pyrin domain-containing 3 (NLRP3) inflammasome is a major cause of cytokine storm associated the with clinical manifestations of severe COVID-19 disease.(14) Furthermore, coronavirus viroporin proteins activate the NLRP3 inflammasome which regulates the secretion of IL-1β and IL-18.(15) Pyroptosis, a programmed cell death pathway that leads to immune cell depletion, is also regulated by activation of the NLRP3 inflammasome and is an important mechanism of viral pathogenesis in both SARS-CoV-2 and SARS-CoV.(16)(17)(18) These studies suggest that investigation of inflammasome regulation may elucidate understanding of COVID-19 disease pathophysiology.
Single-cell studies of COVID-19 patients have found dysregulated immune compartments in the respiratory tract as well as peripheral blood.(19)(20)(21)(22)(23)(24) However, it is challenging to directly compare results across studies in different tissues due to differences in cell cluster identification between physiological compartments.We postulated that simultaneous analysis of severe versus mild COVID-19 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021.; https://doi.org/10.1101/2021.03.26.21254445 doi: medRxiv preprint patients across respiratory and peripheral immune compartments using integrated clustering would uncover overall effectors of immune dysregulation in the COVID-19 immune response.To achieve this goal, we integrated single-cell datasets, one from bronchoalveolar lavage (BAL) and one from peripheral blood mononuclear cells (PBMC), (19,20) in order to examine disease transcriptomics across severities as well as between local and peripheral cellular environments.We utilized an unbiased analytical strategy that was agnostic to specific gene functions and focused on genes with severity dependent expression across different cell types.Taken together, we uncovered genes contributing to the dysregulated COVID-19 immune response prominent in severe relative to mild disease.Moreover, we identified cell types where these inflammatory regulators manifest in local and peripheral compartments. .

Dataset preprocessing and integration
We selected publicly available single-cell datasets with patient severity metrics and ample sequencing depth to pass our quality filters for integration into our combined dataset.The raw count matrices for BAL cells and PBMC cells were downloaded from the NCBI Gene Expression Omnibus (accession number GSE145926) and the COVID-19 Cell Atlas (https:/www.covid19cellatlas.org/#wilk20),respectively.Patients who were mechanically ventilated or had PaO 2 /FiO 2 I≤I300 mmHg indicating hypoxemia consistent with acute respiratory distress syndrome (ARDS) (25) were designated as severe patients while all others were considered to have mild disease (Table S1).The BAL dataset contained three healthy controls, while the PBMC dataset contained six (Table S2).Both datasets were preprocessed using the R program Seurat.( 26) Briefly, cells were filtered to only include cells with unique molecular identifier (UMI) counts greater than 1000, gene count between 200 and 6000, and less than 10% of genes mapping to mitochondrial genes.The function SCTransform from the Seurat package was applied to each dataset separately to regress out technical variability as well as the percentage of mitochondrial gene expression.(27) Transformed BAL and PBMC datasets were integrated with 3000 integration features and 50 integration anchors as recommended in Seurat.(28) We found that the "M3" mild patient sample from the BAL dataset contained only 369 total cells, while every other patient sample for BAL or PBMC had at least 1200 cells.The M3 sample was removed before differential expression analysis to avoid skewing results due to extremely low cell counts.

Clustering and Identification
The integrated dataset was dimension reduced using principal component analysis (PCA) and clustered with a resolution set to 0.5 and including the top 30 principle components.The clustering was .visualized using uniform manifold approximation and projection (UMAP).(29) The raw integrated dataset was normalized by applying SCTransform to the full integrated dataset.This normalized count matrix is utilized for all subsequent analysis.Marker genes for each cluster were computed using the FindAllMarkers function with the Model-based Analysis of Single-cell Transcriptomics algorithm (MAST) for differential expression with UMI count as a latent variable.(30,31) Cluster markers were then inspected and labeled according to known cell markers(Supplemental file 1).(32,33) A second round of clustering with a resolution of 1 was then conducted to further classify subtypes of identified cells.
Clusters with fewer than 300 cells were reassigned to larger clusters using Seurat integration label transfer.Cluster identities were scored and verified using a signature matrix generated from flow cytometry sorted RNA-seq data of immune cells.(34)Plots of cell clusters and key cell type markers were generated using Seurat's plotting functions.

Cell Proportions
Cell types were tallied for each sample, and the percentage abundance of each cell type was calculated.Cell proportions for healthy controls, mild patients, and severe patients were compared using a two-sided pairwise test of equal proportion with false discovery rate (FDR) p-value adjustment.
The resulting proportions were plotted using the ggplot2 R package.(35)

Differential Expression
For each cell type, differentially expressed genes (DEGs) were calculated separately for BAL and PBMC cells using MAST with UMI count as a latent variable.To support MAST differential gene expression analysis between three sample groups, the Seurat built-in differential expression function FindMarkers was modified to generating iterations of the hurdle model corresponding to each set of two compared conditions.Genes from each cell type that were differentially expressed across all three comparisons between healthy controls, mild, and severe patients were tallied.For BAL, only genes with FDR adjusted p-values less than 1e-7 across all comparisons were considered (17.4% of all DEGs in BAL), while PBMC DEGs were considered if all FDR adjusted p-values across comparisons for that gene were less than 0.05 (16.4% of all DEGs in PBMCs).The difference in p-value threshold was used to filter a similar proportion of genes from BAL and PBMC data due to the smaller number of DEGs detected in PBMCs (200-400 in PBMCs, 1000+ in BAL).These highly differentially expressed genes were further filtered by removing genes that were not differentially expressed across all conditions in at least 5 different cell types (1/3 of our total cell types).The lists of PBMC and BAL highly differentially expressed genes were then combined, removing duplicates.Some of the genes identified were highly cell type specific.These genes also had the highest residual variance.Since these genes represent intrinsic cell type differences rather than biologically interesting differentially expressed genes, they were removed.To determine this filter threshold, the residual variance of the top 100 variable genes were plotted in decreasing order to determine the "elbow" point where the variance stops decreasing at a rapid rate.(FigureS1) This resulted in the 21 most variable genes being removed.We termed the 50 remaining differentially expressed genes recurrent differentially expressed genes (rDEGs) since they were found in multiple cell types and showed differential expression between patients and healthy controls as well as between severities.The rDEG expression data was exported to Monocle 3 to generate modules for gene ontology (GO) enrichment analysis.We generated 4 modules from the 50 rDEGs using the find_gene_modules function in Monocle 3 with 30 principle components and a resolution of 0.8.(36-38)Differential module expression was calculated using ANOVA using aggregated module expression levels and processed with a tukey posthoc test.Module gene ontology enrichment was computed topGO with default settings.(39) Module and gene level plots were generated using the R packages ggplot2, ComplexHeatmap, and Circlize.(35,40,41) .

Validation
We compared rDEG trends from our analysis with two additional COVID-19 datasets, one with nasopharyngeal data to compare against the local inflammatory environment of our BAL data, and one in PBMCs to compare against the peripheral environment in our PBMC data.(22,23) Cell types from our analysis were transferred onto the validation datasets using Seurat to identify the validation cells for comparison.Each validation dataset was filtered and preprocessed separately using the same parameters as the main dataset.After preprocessing and label transfer, DEGs were generated independently for each validation set for our transferred cell types.We compared the rDEGs we focused on in the manuscript with DEG results from each validation set, noting whether the same DEG was detected and whether the direction of change was the same.

:
Integrated PBMC and BAL analysis identified 26 clusters consolidated into 15 cell types.
After quality filtering, we recovered 100,739 single cell transcriptomes.From these, we recovered 26 cell clusters from Seurat.(Figure S2).Since we identified cell types from the integrated dataset containing both PBMC and BAL cells, we were able to examine how each of our cell clusters behaves across both physiological compartments.The clusters did not aggregate based on sample type or patient condition (Figure S2), indicating successful integration clustering of the two datasets.We also identified two clusters of CD4+ T cells (CD4 and IL7R), one cluster of T regulatory cells (IL2RA and LAG3), (46,47) and three clusters of CD8+ T cells (CD8A).Two of the CD8+ T cell clusters were labeled as CD8+ memory cells due to their high CCL5 and GZMH expression.(48) Other identified immune cell clusters include natural killer (NK) cells (SPON2 and NCAM1), neutrophils (NAMPT), (34) naïve B cells (MS4A1), plasmablasts (IGJ and MZB1), plasmacytoid dendritic cells (IRF8 and PLD4), (49,50) and myeloid dendritic cells (CD1C)and LGALS2).(34,51) In addition to immune cell types, we also found two epithelial clusters.One contained a mixture of epithelial and granulocyte markers including KRT19 and SLPI (34,52) while the other also contained the additional markers PPIL6 and CFAP300 for pneumocytes and ciliary cells, respectively.(53,54) The 26 clusters were consolidated into 15 cell types (Figure S2) to streamline further analysis by combining clusters that are not distinguishable when examining their canonical marker expression levels.This consolidation also prevents cell groups with many clusters such as the M1 MoMa group from overshadowing those with fewer clusters in our subsequent differential gene expression ranking analysis.Cells within each cluster were compared against their original identifications from both their respective dataset, and most clusters identifications were consistent.The one exception was the .intermediate MoMa type, which was predominantly composed of macrophages from BAL, but also contained a mixture of monocytes, CD4+ and CD8+ T cell identifications from PBMCs (Table S3).
Proinflammatory cell types are enriched in severe COVID-19 patients.
Cell type proportions within BAL and PBMC sample groups showed distinct differences between patients and healthy controls as well as between mild and severe patients (Figure 1).In BAL samples of patients with severe disease, proinflammatory cells such as M1 MoMa and neutrophils showed increased abundance (p<<<.01,Table S4), while immunoregulatory cell types including M2, intermediate MoMa, and Tregs were less abundant (p<<<.01).BAL samples of patients with mild disease showed decreased abundance of M1 MoMa and neutrophils and increased Tregs and CD8+ memory T cells compared to healthy controls and severe patients.that are significant versus their respective controls are labeled with a triangle (p<.05).Conditions that are significant between severe and both mild as well as healthy controls are labeled with a star (p<.05).Conditions that are significant between severe and mild, but not between severe and healthy controls are labeled with a diamond (p<.05).
In PBMCs, the trends in M1 MoMa and neutrophils are reversed.Tregs and CD8+ memory T cells are less abundant in PBMCs of mild patients.These opposing patterns may illustrate heavy recruitment of the cell types abundant in BAL, resulting in depletion in the PBMCs that results in an increase in relative abundance of non-recruited cells in PBMCs.Mild patients also showed an increase of intermediate MoMa in PBMCs, reinforcing the pattern of relative increases in abundance of immunoregulatory cell types in mild patients in both BAL and PBMC compartments.
We identified an average of 1158 DEGs per cell type for BAL samples, and 260 DEGs per cell type for PBMC samples.(TableS5,S6) After filtering, we identified 50 rDEGs across our 15 cell types that formed 4 distinct modules (Figure 2).Module 1 showed significant GO enrichment for developmental processes (p<.05) but did not show differential expression between conditions.Module 2 showed significance for viral defense and Type I interferon GO terms.Most genes in this module were interferon induced genes including the first three IFIT family genes, ISG15, CXCL10, and MX1.This module was significantly overexpressed in BAL of all patients versus healthy controls (p<.01).Stress response, apoptosis, and viral entry related genes show severity dependent expression.
We analyzed individual rDEGs in each of our five most abundant cell types: M1 MoMa, M2 MoMa, CD4+ T cells, NK cells, and CD8+ memory T cells (Figure 3).This analysis confirmed previous reports of downregulation of HLA genes (20,58) such as HLA-DRA and HLA-DRB5 in COVID-19 patients, with severe patients showing the most downregulation.We also saw upregulation of interferon related genes including MX1, and IFIT1-3.This increase was greatest in mild patients, correlating with previous findings of immune exhaustion (Table S7).(59,60) Further examination showed additional severity dependent patterns of differential expression of transcripts related to the stress response, cell death, and viral entry in cell types involved in the viral immune response. .The pro-inflammatory lncRNA NEAT1 passed our rDEG threshold in BAL samples for nine different cell types, more than any other gene in our analysis.These cell types include M1, M2 and intermediate MoMa, NK cells, CD4+ T cells, CD8+ memory T cells, naïve B cells, myeloid dendritic cells, and epithelium/basal cells (Figure 4).NEAT1 is localized to the site of infection and inflammation since it is not differentially expressed in PBMCs.Additionally, among rDEGs, it has one of the highest averages in log2-fold change between severe and mild patients (Figure 4).NEAT1 is overexpressed in BAL of severe patients and underexpressed in mild patients.The epithelial/basal cell group is the exception where mild groups also show NEAT1 overexpression over healthy controls, but expression is still significantly higher in severe patients versus mild patients.

Validation of rDEG expression patterns.
We projected the cell type classifications in our analysis via Seurat's label transfer feature to two other COVID-19 datasets, one with nasopharyngeal samples (22) and one with PBMC samples (23).
Comparison of cell labels from the original datasets versus our transferred labels shows general agreement among the cell IDs, allowing us to use our cell type labels for direct comparison of rDEG patterns in the validation data (Figure S3&S4).We compared rDEG expression patterns of the genes .analyzed in the previous two sections in the same five cell types.Among each gene's statistically significant changes in our analysis between healthy, mild, and severe cases, 36.3% were significant in our validation cohorts.However, more than two thirds of the non-significant findings in validation were from comparisons with healthy control in the nasopharyngeal dataset.This is likely due to the small number of cells recovered from these controls after filtering.Only 1148 cells were recovered from control samples after filtering compared to 35715 and 25546 for moderate and severe cases respectively.Among the findings that were significant in both our data and in validation, we found that 79% were significant in the same direction.Notably, NEAT1 showed 100% agreement in validation of BAL, while MALAT1 showed 100% agreement in validation of PBMCs (Table S8,S9 & S10).Discussion: Our analysis of BAL and PBMC single cell data in COVID-19 patients has elucidated key differences between mild and severe disease.We were able to combine cells from both PBMC and BAL in an integrated analysis.Although our intermediate MoMa group had a mixed group of PBMC cells, our overall identifications were consistent across both datasets.Furthermore, the cells in the intermediate MoMa group consisted of cells with weak expression of a wide range of canonical markers.These cells may be intermediate immune cells from different lineages that share a similar transcriptomic profile.By conducting analysis simultaneously on cells from the local infection site in the lung as well as the peripheral immune system, we contrast how the disease manifests and interacts across both compartments.We have identified differentially expressed genes that vary with severity, are highly differentially expressed across multiple cell types, and represent key functions related to the hyperinflammatory disease state.NEAT1 was the most widely differentially expressed gene across cell types within BAL; it also exhibited a high log-fold change that correlated with disease severity.The ubiquity of NEAT1, its specific localization to BAL cells, and pro-inflammatory functions suggests that it may be a key mediator of the inflammation seen in severe COVID-19.NEAT1 is a well characterized   Supplementary Excel Tables S1 Table : Key characteristics of patients within each dataset.Patient information from the BAL and PBMC cohorts used in this analysis.Patients who were intubated or had PaO2/FiO2 Y≤Y300 mmHg were classified as severe.In BAL, patient "Mild 3" had only 369 cells recovered after filtering and was only used for initial clustering.Most patients in the BAL cohort.Exact ages were not available in the PBMC cohort.The first patient in this cohort was sampled twice, once while classified as a mild patient, and .

From the 26
clusters, 11 were identified as monocyte/macrophage (Mo/Ma) clusters.Since our clusters contain both monocytes from the PBMC sample as well as macrophages from the BAL, we designated them as MoMa clusters.Six of the MoMa clusters showed classically M1 associated transcriptomes with increased expression of VCAN, FCN1 and CD14 expression.(42,43)These clusters .also expressed other pro-inflammatory factors such as S100A8, CCL2, CCL3, CCL7, and CCL8.(44,45)Three other MoMa clusters showed M2 polarization with increased FN1 expression along with decreased VCAN and FCN1 expression.These M2 MoMa clusters also expressed Th2 associated inflammatory factors such as MRC1 and CCL18.(45)All MoMa clusters expressed FCGR3A (CD16a).(42)Two additional clusters were labeled as intermediate MoMa because they did not show distinct transcriptomes corresponding to either M1 or M2 groups.One intermediate MoMa cluster overexpressed MALAT1, while the other overexpressed metallothionein proteins including MT1F and MT1G.

Fig 1 :
Fig 1: Severe patients show increased proportions of proinflammatory cell types.A. Overall average abundance of each major cell type for all cells.B. Per patient abundance of all major cell types for all cells.C. Per cohort (bronchoaveolar lavage (BAL) and

Fig 2 :
Fig 2: rDEGs grouped into four distinct modules with immune regulation enriched GO terms.A. Heatmap of modules

Fig 3 :
Fig 3: rDEG expression in most abundant cell types highlights differential immune regulation between mild and severe patients in both BAL and PBMC cohorts. A. Heatmaps visualizing rDEGs within each of the top five most abundant cell types in

Fig 4 :
Fig 4: lncRNAs NEAT1 and MALAT1 are strongly differentially expressed between severe and mild patients and represent key inflammatory regulators in BAL and PBMC respectively.A. Violin plots showing overall expression level density across patient

.
activator of the NLRP3 inflammasome, as well as NLRC4 and AIM2 inflammasomes, which in turn amplify the inflammatory response.(55)However, an overactive immune response contributes to lasting tissue damage in severe COVID-19 disease.Intense inflammation through activation of the NLRP3 inflammasome can also lead to pyroptosis, driven by the upregulation of NEAT1.(55,61)These highly inflammatory and damaging effects of NEAT1 illustrate how overexpression in severe patients might lead to the inflammatory tissue damage seen in severe COVID-19.MALAT1 also exerts various immunological effects including the mediation of NLRP3 inflammasome activation.(62,63)MALAT1 has been linked to M1-like activity in macrophages, promoting inflammation.(64)Our finding that MALAT1 is overexpressed in BAL MoMa of severe versus mild patients suggests that it might be involved in precipitating a shift towards M1 macrophages that exacerbates inflammation.This is further supported by our findings that severe patients show expansion of M1 macrophages and decrease of M2 and intermediate macrophages in BAL, while mild patients show decrease of M1 macrophages.Furthermore, MALAT1 was overexpressed in CD4+ T cells of mild patients.This is also reflective of MALAT1's protective role in T cells.Loss of MALAT1 expression has been shown to push T cells towards the inflammatory Th1 and Th17 phenotype while also decreasing Treg differentiation.(65)This function matches our observed increase in abundance of Tregs in mild patients.Thus, the upregulation of MALAT1 we see in mild patients may be contributing to the more subdued immune response observed in these patients.The severity dependent differential expression of other genes in our analysis provides further evidence of increased cellular stress reflective of a NEAT1 and MALAT1 enhanced hyperinflammatory state.NF-κB is induced in COVID-19 infection.(66)Although we did not detect differential activity of NF-κB directly, we found upregulation of its inhibitor NFKBIA in BAL of severe patients which suggests a feedback response to strong NF-κB activity.NFKBIA's downregulation in PBMCs of severe patients may be due to localization of cells expressing NFKBIA to the site of infection in attempts to regulate the .hyperactiveinflammatory state.(67)The upregulation of BCL2A1 and MTRNR2L12 is also indicative of extensive cellular stress.(68,69)While MTRNR2L12 is upregulated in both mild and severe disease, BCL2A1 is upregulated exclusively in severe disease.The increased activity of these anti-apoptotic genes, particularly in BAL of severe patients, shows additional evidence of the cellular stress induced by infection and inflammation.These genes may be responding to pyroptosis pathways triggered by inflammasome activation via NEAT1 and MALAT1.Further evidence of inflammatory cell damage is seen in the downregulation of NUPR1 in BAL of M1 and M2 macrophages of severe patients with upregulation in mild patients.Downregulation of this stress response gene has been shown to cause mitochondrial dysfunction and ROS production that can lead to cell death.(70)Lastly, our observation that CTSL, a protein crucial for COVID-19 viral entry is upregulated across multiple cell types in severe patients provides a potential initial mechanism for the induction of the NEAT1 and MALAT1 mediated inflammatory state through increased efficiency of viral entry.(57)Limitations in our study include the small sample size, variable clinical presentation and treatment.Additionally, time from presentation to sample collection varied across patients.The stratification of patients as severe or mild may also introduce unknown factors due to patient variability in presentation and classification.Although our validation shows promising reproduction of expression patterns, additional studies with more subjects and stringent recruiting and sample collection would further elucidate these findings.We have demonstrated a clear ensemble of differential gene activity associated with severe disease in COVID-19 infection that revolves around the lncRNAs NEAT1 and MALAT1.Their specific activity changes in severe patients coupled with inflammasome promoting functions, suggest important roles in the COVID-19 hyperinflammatory process.These findings indicate that NEAT1 and MALAT1 may be candidates for treatment targeting or biological marker exploration.

.
numeric log frequency of cells that fit each corresponding set of labels.The size of each dot represents the percentage of each transferred cell types which is represented by each original cell type.From left to right, the full cell type names from the nasopharyngeal dataset are: non-resident macrophage (nrMa), monocyte-derived macrophage (MoD-Ma), monocyte-derived dendritic cell (moDC), resident macrophage (rMa), cytotoxic T cell (CTL), regulatory T cell (Treg), natural killer T cell (NKT), proliferating natural killer T cell (NKT-p), natural killer cells (NK), neutrophils (Neu), B cells, plasmacytoid dendritic cell (pDC), basal cell, ciliated cell, differentiating ciliated cell, FOXN4+ epithelial cell, ionocyte, interferon responsive cell (IRC), mast cell (MC), epithelial outliers, secretory, differentiating secretory, squamous, and unknown epithelial.
S4 Figure: Transferred cell identities correspond to original cell identities from the PBMC validation set.Dot plot shows cells from the PBMC validation set, identified by their transferred labels on the Y axis and their original labels on the X axis.Color of each dot indicates the numeric log frequency of cells that fit each corresponding set of labels.The size of each dot represents the percentage of each transferred cell types which is represented by each original cell type.This dataset contained several cell types where the same label was applied to more than one subcluster, resulting in numeric suffixes for similar cell types.mDCs correspond to myeloid dendritic cells and pDCs correspond to plasmacytoid dendritic cells.