Target-agnostic drug prediction integrated with medical record analysis uncovers differential associations of statins with increased survival in COVID-19 patients

Drug repurposing requires distinguishing established drug class targets from novel molecule-specific mechanisms and rapidly derisking their therapeutic potential in a time-critical manner, particularly in a pandemic scenario. In response to the challenge to rapidly identify treatment options for COVID-19, several studies reported that statins, as a drug class, reduce mortality in these patients. However, it is unknown if different statins exhibit consistent function or may have varying therapeutic benefit. A Bayesian network tool was used to predict drugs that shift the host transcriptomic response to SARS-CoV-2 infection towards a healthy state. Drugs were predicted using 14 RNA-sequencing datasets from 72 autopsy tissues and 465 COVID-19 patient samples or from cultured human cells and organoids infected with SARS-CoV-2. Top drug predictions included statins, which were then assessed using electronic medical records containing over 4,000 COVID-19 patients on statins to determine mortality risk in patients prescribed specific statins versus untreated matched controls. The same drugs were tested in Vero E6 cells infected with SARS-CoV-2 and human endothelial cells infected with a related OC43 coronavirus. Simvastatin was among the most highly predicted compounds (14/14 datasets) and five other statins, including atorvastatin, were predicted to be active in > 50% of analyses. Analysis of the clinical database revealed that reduced mortality risk was only observed in COVID-19 patients prescribed a subset of statins, including simvastatin and atorvastatin. In vitro testing of SARS-CoV-2 infected cells revealed simvastatin to be a potent direct inhibitor whereas most other statins were less effective. Simvastatin also inhibited OC43 infection and reduced cytokine production in endothelial cells. Statins may differ in their ability to sustain the lives of COVID-19 patients despite having a shared drug target and lipid-modifying mechanism of action. These findings highlight the value of target-agnostic drug prediction coupled with patient databases to identify and clinically evaluate non-obvious mechanisms and derisk and accelerate drug repurposing opportunities.


Introduction
The emergence of the COVID-19 pandemic presented an urgent need for new and effective therapeutics, and repurposing of approved drugs with known safety profiles offered a path to identify viable treatment options.Research programs that focused on drug repurposing for COVID-19 presented opportunities to broadly uncover and understand new features of existing drugs [1][2][3].Recent retrospective studies by members of our group and others have shown that COVID-19 patients taking drugs from one of the most prescribed drug classes in the world-statins-exhibit a reduced mortality rate, but these studies pooled all statin compounds (e.g., lovastatin, simvastatin, atorvastatin, etc.) together in their analyses [4][5][6].All statins are prescribed to lower lipid and cholesterol levels, and share a common mechanism involving inhibition of HMG-CoA reductase (HMGCR); however, statins are also known to have antiinflammatory and immunomodulatory properties, through mechanisms that involve several pathways [5][6][7][8][9][10][11], potentially by upregulating heme oxygenase-1 (HO-1) [7].In addition, while three retrospective studies that pooled all statins demonstrated a significant reduction in mortality risk, no improvement in mortality outcome could be detected in another study [10].This raises the possibility that different statins might differ in their ability to reduce morbidity and mortality in COVID-19 patients, which could influence the results of studies based on which drugs were included.Moreover, if true, it would be important to distribute this information widely because it could influence clinical decision-making with regards to statin selection during the current COVID-19 crisis [12,13].
Throughout the pandemic, multiple scientific teams predicted that existing drugs could be repurposed as potential COVID-19 therapeutics computationally through the application of high-throughput in silico screens based on artificial intelligence, network diffusion, or network proximity algorithms using the human interactome, SARS-CoV-2 targets, drug targets, docking structures, or biomedical literature as algorithmic inputs [1][2][3].These screens proposed hundreds of potential therapeutic options and led to further testing in SARS-CoV-2-infected culture systems and animal models [1].However, while in vitro and pre-clinical testing have offered promising predictions, clinical validation and translation of predicted compounds are much more challenging and few, if any, of these drugs proposed to be repurposed for COVID-19 have demonstrated clinical efficacy.Thus, there is a need for combining improved drug prediction capabilities, despite complex and often inadequately understood biology, with real world evidence, such as electronic health records (EHRs) [4,14], to better inform which predicted compounds should advance toward clinical evaluation (Fig 1A).A related approach has previously been used to identify repurposed drugs for coronary artery disease by integration of protein-protein interaction network proximity and large-scale patient-level longitudinal data [15].
With drug repurposing in mind, we used a Network Model for Causality-Aware Discovery (NeMoCAD) computational tool based on Bayesian statistical network modeling [16] to analyze transcriptomics signatures in tissue samples obtained from COVID-19 positive patients or SARS-CoV-2 infected human cell or organoid cultures to identify FDA-approved drugs that shift the host transcriptomic response to SARS-CoV-2 towards a healthy state (Fig 1B).This approach was agnostic as it was accomplished without an a priori defined drug target or mechanism of action.This computational analysis revealed that a subset of commonly administered statins was among the drugs most frequently predicted to revert the genome-wide gene expression profile of COVID-19 samples to that of a healthy state.Despite limited chemical diversity, statins induce a range of side effects that differ between individual statins [17,18] suggesting the potential for distinct biological activities outside their known shared HMGCR mechanism, which potentially could be harnessed for drug repurposing and eventual new drug development.To explore this possibility in a clinical setting, a retrospective analysis was carried out using a database containing EHRs of over 490,000 COVID-19 patients, more than 4,000 of which are from patients who are actively taking statins (Fig 1C).This analysis demonstrated that use of only a subset of statins, including simvastatin and atorvastatin, correlated with decreased morbidity and increased survival in COVID-19 patients, confirming hidden divergent activities within a seemingly homogeneous drug class.Experimental in vitro studies confirmed that the drug most frequently predicted to reverse the COVID-19 state and that correlated with decreased morbidity in EHRs-simvastatin-also potently inhibited infection of Vero6 cells by SARS-CoV2 in vitro.However, other statins were less effective, suggesting that individual statins might have molecule-specific activities beyond the shared HMGCR target and thus vary in their ability to protect COVID-19 patients.

Ethics statement
The EHR study was approved by the University of California, San Francisco, institutional review board.

Transcriptomic-based compound prediction for drug repurposing
The drug prediction software, NeMoCAD (Network Modeling for Causal Discovery), was used to predict compounds that would mimic the shift from a COVID-19-positive state to a control state [16].NeMoCAD is a drug repurposing algorithm that performs correlation analysis of transcriptional gene signatures and a Bayesian statistical analysis of a network comprised of drug-gene and drug-drug interactions to identify compounds capable of changing a transcriptional signature indicative of disease to a healthy state (e.g., reverting a disease state to a healthy state) [16].First, transcriptome-wide differential expression profiles are identified between two biological states in the input transcriptomic dataset (experimental or published) and this profile is used to define the target normalization signature (Fig 1B).We define this target signature as the subset of genes whose expression levels need to be reversed in order to revert one state to the other (e.g., normalize the diseased state).Then, pairwise analysis is carried out based on the comparison between gene expression profiles of compounds found in the LINCS database release v1 and the target normalization signature (Fig 1B).NeMoCAD computes multiple correlation statistics (e.g., Pearson correlation, cross entropy) across all differentially expressed genes as well as a combination score (e.g., Pearson correlation divided by the cross-entropy).The pairwise analyses implicitly assume genes are expressed independently of one another.
NeMoCAD then separately generates putative predictions that incorporate gene-gene dependencies based on Bayesian network analysis on a regulatory and drug-gene interaction network architecture defined using publicly available databases of gene-gene interactions based on single gene knockout datasets in human cells (KEGG, TRRUST) [19][20][21] and reference transcriptional signatures of drugs (LINCS, CTD) (Fig 1B) [22,23].NeMoCAD parses the perturbation signatures from the LINCS dataset, and estimates the conditional probability of upregulation of all LINCS genes given a certain drug is present (drug-gene probability).Druggene probabilities are averaged over the multiple treatment time points (6 and 24 hours) provided for each drug in LINCS.
NeMoCAD combines a directed unweighted network structure with interaction probabilities between drugs and genes for the connecting network edges.Together, these features generate a weighted directed graph consisting of all possible paths that connect at least 2 genes of interest from the relevant genes within the target transcriptomic normalization set and the drugs.This approach encodes the entire region of influence of a given list of genes and the drugs that can reverse their gene expression profiles in the desired manner.The network generated from database and transcriptomic information is then used as input for a message-passing algorithm (e.g., loopy belief propagation algorithm).The marginal probability distributions of drugs being "on" given the expression state of every gene are computed using the joint probability distribution.Drugs with a low probability of being "on" (probability < 0.5) are filtered out from prediction lists.Drugs are then ranked based on the combination score derived from the correlation metrics (Pearson correlation and cross entropy).As a robustness criterion, drug predictions were assessed at multiple fold change thresholds to identify drugs that are consistently associated with the subnetwork defined by the differential expression profile.Using 14 publicly available transcriptomic datasets derived from human patients, tissue samples, organoids, and cells (Table 1) [24][25][26][27][28][29][30], NeMoCAD identified transcriptome-wide differential expression profiles between the control and COVID-19 states for each dataset and defined a target normalization signature to mimic, which would shift the transcriptome from a COVID-19 disease to control state.The frequency of NeMoCAD predictions across input datasets was evaluated for each statin within the LINCS database and statins with LINCS gene expression probabilistic network profiles that most strongly correlate with the target normalization signature were identified.We assessed the frequency of predictions within the top 25% of drugs, ranked by combination score for each dataset.We selected the 25% threshold based on our prior experience using the NeMoCAD platform, during which we have typically selected predictions within the top quartile for further in vitro or in vivo screening.Prediction frequency was also stratified based on tissue source and type (Table 1).To understand differences in the reference transcriptional data that could influence drug predictions, we compared the LINCS drug-gene probability signatures for each statin by principal component analysis (PCA).Using the time-averaged drug-gene probabilities from LINCS for 12,328 genes across 7 statins of interest (atorvastatin, fluvastatin, lovastatin, pitavastatin, pravastatin, rosuvastatin, and simvastatin), we created a PCA object in the R package ggfortify using the autoplot function.The package ggplot2 was used for plotting customizations.R version 4.0.5 was used for all computations and plotting.

Viral infection of Vero6 cells with SARS-CoV-2 virus
All work with native SARS-CoV-2 virus was performed in a BSL3 laboratory and approved by our Institutional Biosafety Committee.All drug screens to assess SARS-CoV-2 inhibition and cytotoxicity were performed with Vero E6 (Vero6) cells (ATCC# CRL 1586) using published methods (Methods in S1 Text) [31].A curve fitting procedure was used to determine IC50 and CC50 values (Methods in S1 Text).

Viral infection and host response of HUVECs with the OC43 virus
To measure the impact of selected drugs on HCoV-OC43 infection, 96-well plates seeded with human umbilical vein endothelial cells (HUVECs) were infected with HCoV-OC43 and treated with drugs (Methods in S1 Text).Viral load, Hoechst fluorescence, and IP-10 measurements were measured and normalized to vehicle control samples for each assay.Each group was compared to vehicle controls using the Brown-Forsythe and Welch ANOVA tests and corrected for multiple comparisons using a Dunnett T3 test.

Visualizations
Plotting was performed in Prism 9 (GraphPad Software LLC) or in R versions 3.0.

Role of the funding source
Funding sources were not involved in study design, in the collection, analysis, and interpretation of data, or in the writing of the report.

Results
The NeMoCAD gene network analysis tool [16] was used to identify FDA-approved drugs predicted to normalize the COVID-19 gene expression profile based on transcriptomic signatures of human cells or organoids infected with SARS-CoV-2 as well as cells or tissues obtained from COVID-19 patients or healthy control subjects.NeMoCAD identified gene changes across the transcriptome, compared them with gene expression changes induced by approved drugs in existing databases (e.g., LINCS, KEGG, TRRUST, CTD), and then prioritized compounds based on their ability to shift the disease transcriptomic signature state back to a healthy state (Fig 1A).COVID-19 normalizing drugs were predicted based on 14 differential RNA-seq expression datasets (COVID-19 vs. healthy) from 12 independent transcriptomics studies (Table 1).Across all datasets, NeMoCAD prioritized a different number of drugs for each dataset (Table 1), with 172 drugs representing the intersection of all these results and therefore shared drugs relevant to all samples (Table 2).Of the 2,436 drugs we investigated across the 14 differential expression datasets, 1,477 of the 2,436 drugs were not predicted to normalize any disease signature.Across the 959 compounds predicted to normalize at least one disease signature, each drug was predicted on average by 8.1 of the 14 datasets.
Surprisingly, we found multiple statins that inhibit HMGCR to be predicted more frequently than expected by the average, with simvastatin predicted 14/14 times, pravastatin 13/14 times, and lovastatin 12/14 times (Fig 2A).Of the 9 statins included in the NIH LINCS program database, 8 were in the top 25% of drugs predicted for at least one dataset investigated (Fig 2B).Across all datasets, simvastatin and fluvastatin were most frequently among the top 25% of predicted compounds.We further assessed our predictions to understand how different types of input data might impact the types of compounds predicted.Stratification of the input datasets by sample source (COVID patient, autopsy sample, or cell culture/organoid) and tissue origin (lung/bronchi or other) revealed that simvastatin is frequently predicted across all dataset types (Fig 2C).In addition, atorvastatin is often predicted when cell culture and organoid samples are used as data inputs, whereas fluvastatin is commonly predicted in human autopsy samples, and lovastatin and pravastatin are predicted at an intermediate frequency using patient input data.Specific investigation of tissue origin revealed that simvastatin and fluvastatin are most frequently predicted when input datasets are derived from lung or bronchi tissue (Fig 2C).Simvastatin, atorvastatin, and lovastatin are also frequently predicted using samples from other non-lung tissues, including nasopharyngeal swabs, blood, liver, pancreas, and cardiac cells.This finding that statins might differ in their ability to suppress responses to SARS-CoV-2 infection and that these effects could be independent of their common lipid lowering activity induced us to explore whether different statins also exhibit disparate activities in COVID-19 patients.We used the Cerner Real World Data COVID-19 deidentified EHR database to assess the effects of various statins on the survival of COVID-19 patients who were prescribed these medications.This large database represents a diverse population of patients diagnosed with COVID-19 from January to September 2020 with a duration of follow-up of as long as 8 months in 87 health centers across the US.Among 70,308 eligible patients, we identified 4,330 patients with who were prescribed atorvastatin, lovastatin, pravastatin, rosuvastatin, or simvastatin (Fig 1C).There were no patients who were prescribed fluvastatin in this database.The remaining 65,978 patients had no history of statin exposure (control patients).Cohort characteristics are shown in Tables 3 and 4. As disease severity could vary between the medication exposed and unexposed groups, we accounted for the type of encounter (Urgent care, ER, Admission for Observation, or Inpatient) at the time of COVID diagnosis and found that after matching, there was adequate balance in the encounter type between the compared medication exposed and unexposed groups, with the absolute value of standardized mean difference (SMD) of less than 0.1 for encounter type (S1 and S2 Figs in S1 Text).Overall, the propensity score distributions and SMDs of all matched covariates between treated and control groups before and after matching showed adequate balance between groups after matching, with absolute SMD values of less than 0.1 for all covariates, including demographics, heart diseases and other COVID-19 comorbidities, as well as conditions for which statins are prescribed (S1 and S2 Figs in S1 Text).
Importantly, among individual statins, we found that only treatment with atorvastatin, rosuvastatin, or simvastatin was associated with a statistically significant decrease in the relative risk of death in statin-treated patients compared to matched controls (Table 5).The mortality rate among atorvastatin-treated patients was 16.1% (431 of 2676) versus 20.4% (545 of 2676) among matched untreated control patients, with a reduction of 14% in the RR (0.86 [95% CI, 0.83-0.93];Bonferroni adjusted p-value = 6.24E-05) (Table 5 and S1  Statins were also tested as part of a larger drug screening program in SARS-CoV-2-infected Vero6 cells.Within the statin drug class, simvastatin most potently inhibited infection with a half maximal inhibitory concentration (IC 50 ) of 0.8 μM and almost a 10-fold higher 50% cytotoxic concentration (CC 50 = 6.5 μM) (Fig 3A and 3B).The other statins were either unable to significantly inhibit SARS-CoV-2 infection in Vero6 cells or they were found to be toxic at doses required to see inhibitory effects (Fig 3A and S3 Fig in S1 Text).It is important to note that discordance observed between predictions made by NeMoCAD and SARS-CoV-2 inhibition in Vero6 cells is expected since NeMoCAD predicts drugs that will affect host response to infection, and not necessarily directly act on the virus to inhibit infection (e.g., reduce entry or replication).However, we also know that viral replication induces a host response and the transcriptional outcome of infection will always depend on interactions occurring within the virus-host system [32,33].Recent studies examining multiple coronavirus infections suggest that coronavirus infection signatures can be reversed in the host using known anti-SARS-CoV-2 inhibitors and that regulating cholesterol homeostasis and microtubule cytoskeleton organization in the host might contribute to antiviral efficacy [34,35].Therefore, we cannot completely decouple the effects of predicted drugs on viral inhibition versus more conventional host measurements, such as cytokine levels.Furthermore, we found that simvastatin also inhibited infection of HUVECs by a related coronavirus (OC43) (Fig 3C ) and potently reduced cytokine (IP-10) production without cytotoxic effects (S4 Fig in S1 Text).Similar results were observed with IL-6 and GM-CSF although the virus induced levels of these cytokines were variable.Thus, this particular statin appears to exhibit direct antiviral activity in addition to the HMGCR activity it shares with the other stains.

Discussion
Taken together, these data show that in silico prediction based on transcriptomics datasets from human patients, tissues, and cells combined with clinical database analyses offer a useful approach for identifying and validating non-obvious effects of FDA-approved drugs, thereby enabling rapid repurposing of these compounds.With multiple lines of evidence coming Past retrospective studies suggested that patients prescribed drugs within the statin class have an overall lower risk of mortality from COVID-19 [4,6,8], although this was not observed in one study [10].Predictions by NeMoCAD suggested that statins may differ in their ability to induce a shift from COVID-19 to healthy states, which could in part explain differences in results between these studies if the statin types differed [4][5][6]10].Specifically, NeMoCAD predicted that simvastatin, fluvastatin, and atorvastatin were the most likely statins to normalize the COVID-19 gene expression profile.Indeed, when we analyzed mortality in a large EHR database of patients with COVID-19, we confirmed that there are differences in the mortality risks of COVID-19 patients prescribed the different statins compared to their respective matched control cohort, with simvastatin, atorvastatin, and rosuvastatin associated with a significant reduction in the relative risk of death.We were unable to find a statistically significant

PLOS COMPUTATIONAL BIOLOGY
difference in mortality risk among patients prescribed lovastatin or pravastatin represented in our EHR database compared to their respective matched control cohorts.Unfortunately, there were no patients who took fluvastatin and only 13 who took pravastatin in our EHR database, so we cannot make any conclusions about the protective effects of these compounds.Exploring an EHR database with a greater number of patients prescribed these statins in the future should allow for greater insights into any differences in mortality risk associated with these particular drugs.Simvastatin and atorvastatin were predicted to be active by NeMoCAD, while rosuvastatin was not.Since the in silico predictions were based on the in vitro LINCS database, we expected some differences to be present between the predictions and EHR outcomes.NeMoCAD's transcriptomics-based predictions do not take into account the physical properties (S6 Table and S5 Fig in S1 Text) or potential antiviral activity for a drug.Structurally, rosuvastatin has a pyrimidine side ring, which makes it unique from the other statins which have pyrrole, naphthalene, or other side ring structures [36].Interestingly, remdesivir and molnupiravir, two of the few drugs to date that have received emergency use authorization (EUA) by the FDA for treatment of COVID-19 [37,38], are nucleoside analogues-chemically synthesized analogs of pyrimidines and purines [39].Remdesivir, molnupiravir, and other nucleosides have been shown to directly block SARS-CoV-2 infection in vitro and/or in animal models [39][40][41][42][43].The antiviral effects of nucleoside analogues is believed to result from their incorporation specifically by viral polymerases leading to defects in viral replication or through their antimetabolite activity where they compete with cellular enzymes for their natural ligands [39].We did not assess rosuvastatin in our in vitro screens of viral inhibition; however, in work by Ahmed et al, their structure-based drug repositioning approach for drugs with potential inhibitory effects on COVID-19 virus predicted anti-viral drugs and rosuvastatin among their top six hits and demonstrated the inhibitory effects on SARS-CoV-2 replication by rosuvastatin and other drugs in VeroE6 cells [44].The physical properties of rosuvastatin, in particular its structural similarity to nucleoside analogues, may confer this statin with additional antiviral properties not had by the other statins that we assessed, and could explain why our EHR analysis found rosuvastatin as associated with a reduction in the relative risk of death in patients with COVID-19; whereas, NeMoCAD did not predict this drug.
Moreover, assessment of the LINCS data that defines the probability of each drug affecting specific genes combined with PCA also revealed closer clustering amongst simvastatin, lovastatin, and atorvastatin, which is consistent with their frequent co-prediction, whereas rosuvastatin is more distant (Fig 4A).Global analysis of the mortality-reducing statins also shows greater similarity in LINCS probability distribution between simvastatin and atorvastatin, whereas a wider probability distribution is observed for rosuvastatin (Fig 4B ), and a detailed comparison of the shared top gene targets (> 90 th percentile) between these three statins similarly revealed that the most genes are shared between simvastatin and atorvastatin (Fig 4C).Therefore, these two statins may act in a more similar manner than rosuvastatin, which could influence the COVID-19 response through alternative mechanisms (e.g., direct antiviral and/ or post-transcriptomic effects) that would not be detected using our transcriptomics-based computational approach.Additionally, various statins are typically dosed based on their cholesterol and lipid lowering properties, not based on their immune modulating properties; thus, a drug that appears to be most effective in silico for one of its "side-effect" properties might not appear to have a similar effect in vivo simply because of dosing decisions in practice.The discordance observed between drug predictions and EHR outcomes suggests that either technique alone is insufficient to identify drugs for clinical use and therefore the combination of methods is essential for narrowing the list of drug candidates before inclusion in randomized trials.
Collectively, our results suggest that statins exhibit divergent effects on the host response to SARS-CoV-2 infection despite a shared annotated target and common mechanism for treating dyslipidemia.In addition to their host-modulating effects, earlier work indicates that patients on statins seem to have improved outcomes following bacterial infection, an effect which is especially pronounced with respiratory tract infections, including pneumonia [45].However, meta-analysis of these studies reveals mixed results, again potentially suggesting that statins do not act uniformly as infection modulators.These protective effects of statins against infection may be due to their well-documented anti-inflammatory and immunomodulatory properties [46,47].Although originally developed to lower serum cholesterol, accumulating evidence suggests that statins have strong anti-inflammatory effects that contribute to their beneficial effects in patients experiencing vascular disease like atherosclerosis [46].Furthermore, statins may upregulate HO-1 [7], which is a central modulator of the immune system, affecting antiinflammation and anti-oxidation, which could prevent the severe "cytokine storm" inflammatory response that is central to morbidity and mortality in COVID-19 patients [48].By upregulating HO-1, statins, including simvastatin, lovastatin, atorvastatin, or rosuvastatin, also can increase the production of carbon monoxide and bilirubin [9], both of which have immunomodulatory, antioxidative, and anti-inflammatory characteristics.In addition, statins may reduce the likelihood of graft-versus-host disease by inhibiting antigen presentation and shifting pro-inflammatory responses toward anti-inflammatory responses [47].We do not know precisely why certain statins appear to have superior disease-modifying activity than others.Unlike databases such as DrugBank that outline the known mechanisms of action, NeMoCAD surveys all possible interactions within the transcriptome and thus more work is needed to understand the clinical targets.However, longitudinal studies of the transcriptome in COVID-19 patients prescribed diverse statins could help clarify the mechanisms involved in these responses.
Despite strong evidence of their host modulating properties, there are also preliminary indications that some statins have more direct antibacterial and antiviral capabilities.Our in vitro studies with simvastatin suggest that at least this statin type can exhibit direct antiviral activity against both SARS-CoV-2 and the common cold coronavirus OC43, independently of its known lipid-lowering action.Simvastatin has previously been shown to exhibit superior antibacterial effects compared to fluvastatin and pravastatin, including against S. pneumoniae and M. catarrhalis infections [45,49].In the case of bacterial infection, it has been suggested that the activity of simvastatin is linked to its hydrophobicity, which may perturb the bacterial cell membrane compared to the more hydrophilic fluvastatin and pravastatin [49].The hydrophobicity of statins is also one characteristic that can influence their target-binding characteristics and pharmacokinetic profile, potentially impacting how statins act on a multitude of organ systems and disease states [50].Critically, the pro-drug form of simvastatin, predicted computationally and tested in our in vitro assays, is rapidly metabolized to its active acid form in vivo and therefore only a fraction of the total dose is maintained as the original pro-drug.We also observed much weaker inhibition of SARS-CoV-2 infection when treating in vitro with the hydroxy acid metabolized form of simvastatin in Vero6 cells relative to treatment with the pro-drug form of simvastatin (S6 Fig in S1 Text).Therefore, future work should investigate simvastatin combined with an inhibitor of cytochrome P450, a hemeprotein that plays a key role in the metabolism of drugs.By limiting drug metabolism, it would be possible to better assess if the simvastatin pro-drug is responsible for anti-viral effects in vivo.
NeMoCAD predicted a diverse array of drugs from COVID-19 datasets as it works outside of the very limited "on target mechanisms" reported to the FDA or otherwise well understood.This enables us to capture potential off-target effects that may be present across many drugs; however, the identification of specific pathways and/or genes that are key for the effects of the various drugs predicted by NeMoCAD remains challenging.Many of the potential COVID-19 therapeutics identified using past computational drug repurposing strategies failed when tested either using in vitro culture models, animals, or in the clinic.In fact, very preliminary in vivo studies by our team showed no effect of simvastatin on infection nor host response in SARS-CoV-2-infected hamsters and mice, despite attaining plasma levels that were greater than the IC 50 for inhibition of infection and host response in vitro.These contrasting results highlight the challenges involved in conducting translational investigation from in silico predictions to in vitro, pre-clinical, and clinical studies, in particular raising questions about the clinical translation relevance of any disease model.Leveraging large patient databases early in the drug repurposing process to validate drugs predicted by computational approaches makes it possible to estimate how potential repurposed drugs may perform in infected patients and clinical sub-populations who regularly take these medications for other diseases or disorders.Importantly, in these databases we cannot completely rule out a possibility of unaccounted confounders correlated both with the drug and mortality, nor can we be confident that the medical histories of all patients are represented accurately as some information may be incomplete.Moreover, the retrospective nature of EHR analysis only allows us to identify an association between statin treatment and COVID-19 mortality, but not causal effects.Therefore, we envision that the process of generating in silico predictions and validating in databases will be a means to further narrow drug candidates and identify a more curated collection of therapeutics for testing in randomized control trials.Considering the continuing challenges with vaccine distribution and uptake, as well as the vulnerability of older populations to COVID-19, understanding non-obvious effects of approved drugs on patient mortality from infectious disease will be useful for combating this pandemic as well as ones that are likely to emerge in the future.
Finally, our findings suggest that drug repurposing efforts may require consideration of the many molecule-specific effects rather than taking reported drug targets and mechanisms at face value.Indeed, we believe that our approach, which counterpoints computational network-level analysis of biological interactions with in vitro exploratory screening and retrospective clinical evidence analysis, may form the basis of an altogether more powerful repurposing strategy in a pandemic scenario.

Fig 1 .
Fig 1. Combination approach for pathway-agnostic identification of compounds for drug repositioning.(a) Overview of the combination approach to drug repositioning.(b)The NeMoCAD gene network analysis tool is a drug repurposing algorithm that uses Bayesian statistical network analysis combined with data from publicly available datasets (e.g., LINCS, KEGG, CTD, TRRUST) for reference transcriptional signatures and to define regulatory network architecture.The algorithm identifies transcriptome-wide differential expression profiles between two biological states (e.g., healthy vs. diseased) in experimental or published transcriptomic datasets and defines the target normalization signature, i.e., the subset of genes that would need to reverse their expression to revert one state to the other.The output of NeMoCAD includes correlation and causation predictions for numerous chemical compounds and approved drugs in the LINCS database based on their ability to reverse the differential expression profile of interest.(c) Patient selection and analysis of electronic health records.(Figure panels a & b created with BioRender.com.).https://doi.org/10.1371/journal.pcbi.1011050.g001 2 and 4.0.5.Schematic in Fig 1 was made in Biorender.

Fig 2 .
Fig 2. Simvastatin is identified by gene network-based predictions as the most likely drug in its category to reverse COVID-19 transcriptional profiles.(a) Statins are predicted to shift the COVID-19 state to a healthy state, with simvastatin predicted for all datasets analyzed (14).Rosuvastatin was the only statin not predicted (n.p.) for any COVID-19 transcriptomics signatures.(b) 8 of 9 statins in the LINCS database were in the top 25% of drugs predicted for at least one dataset investigated.(c) Frequency of prediction for each statin when input datasets are stratified by sample source and tissue origin.Prediction frequency is normalized by the number of input datasets from each sample source and tissue origin.https://doi.org/10.1371/journal.pcbi.1011050.g002

Fig 3 .
Fig 3. Effects of statins on coronavirus infections in vitro.(a) Inhibitory and cytotoxicity parameters from SARS-CoV-2 infection of Vero6 wild-type or GFP-expressing cells for a subset of statins contained in the LINCS database.Mean parameters for each statin are derived from two independent experiments.(b) Dose-response curves demonstrating the ability for simvastatin to inhibit GFP-SARS-CoV-2 infection (MOI = 0.1) in a dose-dependent manner in Vero6 cells.(c) Simvastatin also inhibits the human coronavirus, OC43, in HUVEC cells when added at 1 or 5 μM concentrations (***p < 0.001, **p < 0.01).Error bars represent average ± SD; repeated in n = 2 independent biological experiments displayed in light (experiment 1) and dark (experiment 2) shaded data points.https://doi.org/10.1371/journal.pcbi.1011050.g003

Fig 4 .
Fig 4. Comparison of LINCS drug-gene interaction probability data for each statin.(a) PCA plots of drug-gene interaction data for each statin reveals clustering of simvastatin, atorvastatin, and lovastatin when assessed across all genes in this dataset.(b) Distribution of LINCS drug-gene probabilities for statins that reduce mortality in patients.(c) Simvastatin, atorvastatin, and rosuvastatin share top gene targets (> 90 th percentile), with the most shared between atorvastatin and simvastatin.https://doi.org/10.1371/journal.pcbi.1011050.g004 Detailed methodology for electronic health record analyses and in vitro experiments.S1 Fig. Density plots of the distribution of propensity scores for the patients treated with (A) atorvastatin, (B) lovastatin, (C) pravastatin, (D) rosuvastatin, and (E) simvastatin, and control groups before and after matching.S2 Fig. Absolute standardized differences in means for matched categorical and continuous variables between patients treated with (A) atorvastatin, (B) lovastatin, (C) pravastatin, (D) rosuvastatin, and (E) simvastatin, and control groups for all data (before matching) and matched data.S3 Fig. Dose-response curves demonstrating the effects of statins on SARS-CoV-2 infection (MOI = 0.1) in Vero6 cells.S4 Fig. Simvastatin added at 1 or 5 μM concentrations in HUVEC cells infected with human coronavirus, OC43, modulates IP-10 levels without significant cytotoxicity detected by Hoechst fluorescence (***p < 0.001, **p < 0.01, *p < 0.05).Error bars represent average ± SD; repeated in n = 2 independent biological experiments displayed in light (experiment 1) and dark (experiment 2) shaded data points.S5 Fig.Chemical structures of different statins.S6 Fig. Doseresponse curves demonstrating the differential effects of (A) simvastatin and (B) its active acid form, simvastatin hydroxy acid, on SARS-CoV-2 infection (MOI = 0.1) in Vero6 cells.S1 Table.Propensity score (PS) matching by demographics, encounter type at the time of the first recorded COVID-19 diagnosis, COVID-19 comorbidities, and prescription indications: mortality rates for atorvastatin-exposed and PS-matched unexposed patients, a relative risk [95% confidence interval] and Benjamini-Hochberg adjusted p-value in each iteration.S2 Table.Propensity score (PS) matching by demographics, encounter type at the time of the first recorded COVID-19 diagnosis, COVID-19 comorbidities, and prescription indications: mortality rates for lovastatin-exposed and PS-matched unexposed patients, a relative risk [95% confidence interval] and Benjamini-Hochberg adjusted p-value in each iteration.S3 Table.Propensity score (PS) matching by demographics, encounter type at the time of the first recorded COVID-19 diagnosis, COVID-19 comorbidities, and prescription indications: mortality rates for pravastatin-exposed and PS-matched unexposed patients, a relative risk [95% confidence interval] and Benjamini-Hochberg adjusted p-value in each iteration.S4 Table.Propensity score (PS) matching by demographics, encounter type at the time of the first recorded COVID-19 diagnosis, COVID-19 comorbidities, and prescription indications: mortality rates for rosuvastatin-exposed and PS-matched unexposed patients, a relative risk [95% confidence interval] and Benjamini-Hochberg adjusted p-value in each iteration.S5 Table.Propensity score (PS) matching by demographics, encounter type at the time of the first recorded COVID-19 diagnosis, COVID-19 comorbidities, and prescription indications: mortality rates for simvastatin-exposed and PS-matched unexposed patients, a relative risk [95% confidence interval] and Benjamini-Hochberg adjusted p-value in each iteration.S6 Table.Statin properties.Adapted from references [36,51].(DOCX) S1 Form.NeMoCAD EULA Form.If you are a user from an academic or non-profit research institution, please fill out this form to obtain access to NeMoCAD.(DOCX)

Table 1 . RNA-sequencing datasets used as inputs for network-based drug predictions. Dataset Source Public Identifier/ Link Sample Source Tissue/Cell Type Conditions assessed for predictions COVID-19 samples Control samples Drugs predicted Greninger 17
Data from the Cerner Real World Data COVID-19 deidentified EHR database containing records of 490,373 patients with a diagnosis of COVID-19 or COVID-19 exposure across 87 health care centers were analyzed.The following statins were included: atorvastatin, fluvastatin, lovastatin, pitavastatin, pravastatin, rosuvastatin, and simvastatin.Primary outcome was death after the onset of COVID-19.Inclusion criteria, considered comorbidities, and statistical analysis are detailed in the Methods in S1 Text.

Table 3 . Cohort characteristics before propensity score matching (PSM), reflecting differing percentages of characteristics (demographics) with standardized mean differences (SMD) for those prescribed a specific statin compared to the control cohort not treated with a statin. Characteristic No Statins Atorvastatin SMD Lovastatin SMD Pitavastatin SMD Pravastatin SMD Rosuvastatin SMD Simvastatin SMD
together, we found that clinical observations cannot always be explained by experimental biological studies, and vice versa.However, we demonstrate here that by looking at multiple lines of evidence in parallel, drug repurposing candidates are derisked across multiple biological levels to enable actionable repurposing interventions and potential new target discovery efforts.