Drug Repositioning for Diabetes Based on 'Omics' Data Mining

Drug repositioning has a shorter development time, lower cost and less safety risk than the traditional drug development process. The current study aims to repurpose marketed drugs and clinical candidates for new indications in diabetes treatment by mining clinical ‘omics’ data. We analyzed data from genome wide association studies (GWAS), proteomics and metabolomics studies and identified a total of 992 proteins as potential anti-diabetic targets in humans. Information on the drugs that target these 992 proteins was retrieved from the Therapeutic Target Database (TTD), and 108 of these proteins are drug targets with drug project information. Research and preclinical drug targets were excluded, and 35 of the 108 proteins were selected as druggable proteins. Among them, five proteins were known targets for treating diabetes. Based on the pathogenesis knowledge gathered from the OMIM and PubMed databases, 12 protein targets of 58 drugs were found to have a new indication for treating diabetes. CMap (connectivity map) was used to compare the gene expression patterns of cells treated by these 58 drugs with those of cells treated by known anti-diabetic drugs or by compounds that raise diabetes risk. As a result, 9 drugs were found to have the potential to treat diabetes. Among the 9 drugs, 4 drugs (diflunisal, nabumetone, niflumic acid and valdecoxib) targeting COX2 (prostaglandin G/H synthase 2) were repurposed for treating type 1 diabetes, and 2 drugs (phenoxybenzamine and idazoxan) targeting ADRA2A (Alpha-2A adrenergic receptor) had a new indication for treating type 2 diabetes. These findings indicate that ‘omics’ data mining based drug repositioning is a potentially powerful tool to discover novel anti-diabetic indications from marketed drugs and clinical candidates. Furthermore, the results of our study could be related to other disorders, such as Alzheimer’s disease.


Introduction
Diabetes mellitus is one of the most prevalent diseases in the world, affecting approximately 382 million people worldwide in 2013 and costing at least $548 billion that year according to the International Diabetes Federation (IDF). Drug safety is a major concern during the development of new diabetic drugs. Avandia from GSK, for example, was found to be associated with risk of heart attack [1], resulting in a recommendation of suspension by the European Medicines Agency (EMA) in 2010. Aleglitazar from Roche, a peroxisome proliferator-activated receptor gamma (PPARG) agonist, was terminated in a phase III clinical trial in 2013 due to safety concerns for bone fractures, heart failure and gastrointestinal bleeding. Among the current diabetic drug developmental pipelines in leading pharmaceutical companies, 24 drugs have survived the early stages of drug development (phase I, II clinical trials) and are now in phase III clinical trials or post-market surveillance. Among the 24 drugs, 17 (71%) are incretin analogs, DPP4-inhibitors or insulin analogs (S1 Table). However, the association between incretin therapy and risk of pancreatitis and cancer is still uncertain and under investigation by the FDA and EMA [2]. It has long been recognized that the traditional drug development process requires a lot of time (10-17 years) and is extremely costly, but has a low success rate (< 10%) and high safety risk. Therefore, novel strategies are needed for developing new diabetic drugs more efficiently and with lower safety risks.
Drug repositioning (or repurposing) has long been used in the drug development process by reusing marketed drugs and clinical candidates for a new indication (such as treating another disease) [3]. Compared to de novo drug discoveries, drug repositioning may tremendously reduce the development time to 3-12 years, cost and safety risks. For instance, most repositioned candidates have already been assessed by phase I or II clinical trials regarding their original indications [4]. Therefore, toxicity information in animals and humans is often available.
There are multiple approaches for drug repositioning. The "Disease Focus" approach, for example, employs experimental data related to diseases (e.g. 'omics' data) and knowledge of how drugs modulate phenotypes related to diseases (e.g. side effects). Several methods, such as expression pattern comparison [5] (connectivity map, CMap), text mining [6] and networks analysis [7], have been established for mining 'omics' data. Meanwhile, computational methods have been applied to predict drug-protein interactions [8], drug off-targets [9] and drug side effects [10]. Recently, scientists started to use data from genome wide association studies (GWAS) [11] and pathogenesis knowledge from the Online Mendelian Inheritance in Man (OMIM) database [12] to perform drug repositioning.
With the technological advancement in genomics, proteomics and metabolomics, biomedical data are quickly emerging and can be utilized as a valuable resource for drug repositioning. GWAS data have been successfully utilized for drug repositioning [11]. Proteomics, assessing the whole proteome in cells, tissues or body fluids, is involved in different stages of target-based and phenotype-based drug discoveries, including target selection, target validation, lead selection/optimization and preclinical testing. Metabolomics plays an important role in translational medicine, preclinical research/biomarker discovery, and patient stratification [13]. Proteins are the most common targets of small compound drugs. Therefore, data from metabolomics and proteomics studies are a valuable resource for drug repositioning. However, no such effort has been made so far. The current study aims to systematically integrate GWAS, proteomics and metabolomics data for drug repositioning in diabetes treatment.

Literature search and data extraction
To obtain information on diabetes related genes, proteins and metabolites, we searched the PubMed database up to August 2014 using the keywords "diabetes and GWAS", "diabetes and proteomics", "diabetes and protein", "diabetes and metabolomics", "diabetes and metabolites". We included the literature in our study according to the following criteria: 1) all samples have to be human samples, such as serum, plasma or tissues; 2) the clinical phenotype has to be "type 1 diabetes" or "insulin dependent diabetes mellitus", "type 2 diabetes", "gestational diabetes", "impaired glucose tolerance", "impaired fasting glycemia" or "insulin resistance".
For the GWAS studies, we extracted information on 1) the genes associated with diabetes; 2) the SNPs; 3) patient ethnicity; 4) the phenotypes (diabetes type). For the proteomics studies, we extracted the following information: 1) the proteins associated with diabetes; 2) the direction of the change in protein level; 3) the methods used for measuring protein level; 4) the sample types; 5) the phenotypes. For the metabolomics studies, we extracted information on 1) the metabolites associated with diabetes; 2) the direction of the change in metabolite levels, 3) the sample types, 4) patient ethnicity, 5) the methods used for assessing the metabolites and 6) the phenotypes.

Mining diabetic metabolites related proteins
In vivo, enzymes and transporters are two groups of proteins directly associated with the turnover of human metabolites. By searching the Human Metabolome Database (HMDB, http://www.hmdb.ca), we obtained the names of enzymes or transporters associated with diabetes related metabolites that were discovered from previous metabolomics studies.

Constructing the diabetic metabolites-proteins network
To visualize the association between diabetic metabolites and their corresponding enzymes or transporters, Cytoscape was used (www.cytoscape.org) to construct the metabolites-proteins network [14].
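The network construction step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the three metabolite-protein pairs are placeholders chosen from relationships mentioned elsewhere in this paper, the function names are hypothetical, and the interaction label `mp` is an arbitrary assumption. The output lines follow the simple interaction format (SIF: source, interaction type, target) that Cytoscape can import.

```python
from collections import defaultdict

# Illustrative metabolite-protein pairs (placeholders, not the pairs
# retrieved from HMDB in the study).
pairs = [
    ("arachidonate", "COX2"),
    ("arachidonate", "PLA2"),
    ("carnitine", "CPT1"),
]


def build_network(pairs):
    """Return an undirected bipartite adjacency map: node -> set of neighbours."""
    adjacency = defaultdict(set)
    for metabolite, protein in pairs:
        adjacency[metabolite].add(protein)
        adjacency[protein].add(metabolite)
    return dict(adjacency)


def to_sif(pairs, interaction="mp"):
    """Format de-duplicated pairs as tab-delimited SIF lines."""
    return ["\t".join((m, interaction, p)) for m, p in sorted(set(pairs))]


network = build_network(pairs)
proteins = {protein for _, protein in pairs}  # unique proteins in the network
sif_lines = to_sif(pairs)
```

Writing `sif_lines` to a text file, one line per edge, would give a file that can be loaded into Cytoscape for layout and inspection; the degree of each node is simply `len(network[node])`.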

Mapping diabetes risk proteins to proteins with drug projects
Diabetes related genes or proteins retrieved from genomics and proteomics studies were combined with proteins related to diabetic metabolites retrieved from metabolomics data to generate a set of diabetic risk proteins. The Therapeutic Target Database (TTD version 4.3.02) contains information on 236 targets of 20667 drugs at the approved, clinical trial and experimental stages. TTD was used to assess whether those diabetes risk proteins have drug projects available [12]. We then selected the diabetic risk proteins with drug projects and gathered information on the 1) drug target, 2) current disease indication, 3) drug name, 4) drug development stage and 5) drug action mode. To focus on the most promising drugs to be repurposed in diabetes therapy, targets/drugs at the research or preclinical stages were excluded. Targets/drugs at the approved stage or in clinical trials were included in the following studies.
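The mapping and filtering logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: the protein identifiers (`P1` to `P9`), drug names and record fields are hypothetical and do not reflect the actual TTD schema or the study's data.

```python
# Hypothetical protein identifiers from each 'omics' layer.
gwas_proteins = {"P1", "P2", "P3"}
proteomics_proteins = {"P3", "P4"}
metabolite_linked_proteins = {"P4", "P5", "P6"}

# Hypothetical TTD-like records; the field names are assumptions.
ttd_records = [
    {"target": "P1", "drug": "drug_a", "stage": "approved", "mode": "inhibitor"},
    {"target": "P3", "drug": "drug_b", "stage": "preclinical", "mode": "agonist"},
    {"target": "P5", "drug": "drug_c", "stage": "phase II", "mode": "antagonist"},
    {"target": "P9", "drug": "drug_d", "stage": "approved", "mode": "inhibitor"},
]

# Stages without human toxicity information are dropped.
EXCLUDED_STAGES = {"research", "preclinical"}


def risk_proteins(*protein_sets):
    """Union of the protein sets, so shared proteins are counted once."""
    return set().union(*protein_sets)


def druggable_projects(proteins, records):
    """Keep drug projects whose target is a risk protein at an allowed stage."""
    return [
        r for r in records
        if r["target"] in proteins and r["stage"] not in EXCLUDED_STAGES
    ]


all_risk = risk_proteins(gwas_proteins, proteomics_proteins, metabolite_linked_proteins)
projects = druggable_projects(all_risk, ttd_records)
druggable_targets = {r["target"] for r in projects}
```

In this toy input, `P3` is dropped because its only project is preclinical, and `P9` is dropped because it is not a risk protein.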

Application of pathogenesis information into anti-diabetic drug repositioning
Most drugs are either antagonists or agonists; therefore, the pathogenesis of target proteins is a key basis for predicting whether a drug may improve or worsen the disease phenotype [12]. We employed the OMIM database (http://www.omim.org) and a literature search (PubMed) to gather knowledge on the pathogenesis of the anti-diabetic targets. Specifically, the gain of function (GOF) and the loss of function (LOF) roles in human or animal models were gathered to select anti-diabetes protein targets [15][16][17][18][19][20][21][22][23][24]. Following this strategy, we excluded those drugs with evidence of aggravating diabetic symptoms. For example, if drug D activates target T, and GOF of target T is known to increase diabetes risk, then drug D is more likely to cause diabetes than to treat it.
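The exclusion rule in the example above can be written as a small decision function. The sketch below is an illustration only: the function name, the mode vocabularies and the return labels are assumptions, and the real curation relied on manual review of OMIM and PubMed evidence rather than on a lookup of this kind.

```python
# Assumed vocabularies for drug action modes (not taken from TTD).
ACTIVATING_MODES = {"agonist", "activator"}
INHIBITING_MODES = {"antagonist", "inhibitor", "blocker"}


def predicted_effect(drug_mode, risk_raising_change):
    """Classify a drug-target pair.

    drug_mode: action mode of the drug on its target.
    risk_raising_change: "GOF" or "LOF", whichever change in target
        function is reported to increase diabetes risk.
    Returns "candidate", "exclude" or "unknown".
    """
    mode = drug_mode.lower()
    if mode in ACTIVATING_MODES:
        drug_change = "GOF"  # the drug mimics a gain of function
    elif mode in INHIBITING_MODES:
        drug_change = "LOF"  # the drug mimics a loss of function
    else:
        return "unknown"
    if risk_raising_change not in {"GOF", "LOF"}:
        return "unknown"
    # A drug that reproduces the risk-raising change may aggravate diabetes.
    return "exclude" if drug_change == risk_raising_change else "candidate"


examples = [
    ("COX2", "inhibitor", "GOF"),      # GOF of COX2 is linked to IDDM in this paper
    ("ADRA2A", "inhibitor", "GOF"),    # ADRA2A overexpression is linked to type 2 diabetes
    ("target T", "activator", "GOF"),  # the drug D / target T example above
]
results = {target: predicted_effect(mode, change) for target, mode, change in examples}
```

Pairs with an unrecognized action mode or no clear GOF/LOF evidence return "unknown", which corresponds to targets that need further manual review.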

CMap analysis
The Connectivity Map (CMap) combines a collection of genome-wide transcriptional expression data from cultured human cells treated with compounds with simple pattern-matching algorithms [5]. In the current study, the candidate drugs were input into CMap to evaluate whether they are positively associated with known anti-diabetic drugs (e.g. metformin) or inversely associated with known diabetes risk compounds (e.g. streptozocin).
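The idea of comparing expression patterns can be illustrated with a simplified stand-in. The sketch below uses cosine similarity between differential-expression vectors, which is not the rank-based Kolmogorov-Smirnov-style score that CMap itself computes; the gene names (`G1` to `G3`), expression values, function names and the 0.5 threshold are all hypothetical.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity over the genes shared by two signatures.

    Each signature maps gene -> differential expression (e.g. log fold change).
    """
    shared = sorted(set(sig_a) & set(sig_b))
    if not shared:
        return 0.0
    dot = sum(sig_a[g] * sig_b[g] for g in shared)
    norm_a = math.sqrt(sum(sig_a[g] ** 2 for g in shared))
    norm_b = math.sqrt(sum(sig_b[g] ** 2 for g in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def supports_repositioning(candidate, antidiabetic_ref, risk_ref, threshold=0.5):
    """True if the candidate resembles an anti-diabetic drug signature
    or opposes the signature of a diabetes risk compound."""
    return (
        cosine_similarity(candidate, antidiabetic_ref) >= threshold
        or cosine_similarity(candidate, risk_ref) <= -threshold
    )


# Hypothetical signatures.
candidate = {"G1": 1.0, "G2": -1.0, "G3": 0.5}
antidiabetic_ref = {"G1": 2.0, "G2": -2.0, "G3": 1.0}
risk_ref = {"G1": -2.0, "G2": 2.0, "G3": -1.0}

similar_to_drug = cosine_similarity(candidate, antidiabetic_ref)
similar_to_risk = cosine_similarity(candidate, risk_ref)
supported = supports_repositioning(candidate, antidiabetic_ref, risk_ref)
```

A positive score against the anti-diabetic reference and a negative score against the risk compound both count as supporting evidence, mirroring the two comparisons described above.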

Omics studies revealed diabetes related genes, proteins and metabolites
By searching Pubmed, we included in the current study 16 GWAS papers, 17 proteomics studies and 18 metabolomics papers studying diabetes (Fig 1). We selected 115 genes, 56 proteins and 227 metabolites that were reported to be significantly associated with diabetes or impaired glucose metabolism in humans (Fig 1, S2-S4 Tables).

Visualization of metabolite-protein network associated with diabetes
The 227 diabetes associated metabolites revealed from the metabolomics studies were linked to 840 enzymes or transporters (1660 metabolite-protein pairs) based on the HMDB (The Human Metabolome Database, http://www.hmdb.ca). The metabolite-protein network was generated using Cytoscape (V3.1.1) (Fig 2) and shows the highly connected metabolic pathways of various metabolites.
Fig 1. Flow-chart of drug repositioning by mining 'omics' data. We retrieved 17 GWAS studies, 18 proteomics studies and 19 metabolomics studies that assessed diabetes patients until August 2014. 115 genes, 56 proteins and 227 metabolites were significantly associated with diabetes. An HMDB search revealed 1660 metabolite-protein pairs corresponding to 840 proteins. Overall, 992 unique proteins associated with diabetes were gathered and mapped to the TTD database and 108 of them had drug projects information. After removing those under experimental and preclinical stages, we obtained 35 protein targets, including 5 known anti-diabetic targets (27 drugs projects) and 30 unknown anti-diabetic targets (167 drugs projects). Pathogenesis knowledge was retrieved from the OMIM and Pubmed databases, 12 targets corresponding to 58 drugs were indicated to have novel indication for diabetes treatment. CMap analysis indicated that 9 of the 58 drugs have the potential to treat diabetes.

Diabetes risk proteins mapping to drug projects
840 metabolic proteins associated with diabetic metabolites were combined with 115 genes and 56 proteins, leading to 992 unique diabetic risk proteins. Uniprot IDs were retrieved to map the corresponding proteins. A TTD database search showed that 108 of the 992 proteins have at least one drug project. To focus on the most promising candidates, we filtered out drug projects at the research or preclinical stage, because they only had information from in vitro or animal model experiments and had no toxicity information in humans. Therefore, we selected 35 druggable proteins for the following studies (Fig 1).

Five known and thirty unknown anti-diabetic drug targets discovered using the current repositioning strategy
Five of the 35 proteins (14.3%) (alpha-2A adrenergic receptor, insulin, lysophosphatidic acid transferase, glucokinase and PPAR gamma) are known anti-diabetic drug targets of 22 drugs on the market or in clinical trials for diabetic therapeutics (S5 Table), indicating that the current repositioning strategy works well and has the potential to reveal novel indications.
In addition, 30 of the 35 proteins with a current indication to treat other diseases may be repurposed to treat diabetes (S6 Table). They correspond to 167 drugs at the approved or clinical trial stages.

Pathogenesis knowledge leads to repositioning 12 targets for treating diabetes
The 'omics' results only suggest an association between proteins and risk of diabetes; they do not indicate the cause-effect mechanism. Drugs that block or activate target proteins usually cause the loss or gain of function of the target. Therefore, we cannot predict the outcome of the drugs without knowing the pathogenesis information of that specific target protein [12]. The current study used the OMIM database and a literature search of human or animal studies to gather knowledge of GOF or LOF for these 30 candidate targets. We excluded 3 targets (CD1a, 5HT 2B, DHOD) corresponding to 11 drug projects, because they have no direct pathogenesis links to diabetes. We also excluded 6 targets corresponding to 14 drugs that were associated with diabetic complications. According to the drug action mode information from TTD, we excluded 102 drug projects corresponding to 14 targets, since they may aggravate the diabetic symptoms.
Finally, 58 unique drugs corresponding to 12 protein targets had pathogenesis information that supports their therapeutic potential for diabetes (Table 1). Interestingly, one target (MTNR1B) has previously been repurposed for diabetes treatment [25,26]. Another target (Alpha-2A adrenergic receptor) has one drug under phase II clinical trial for diabetes treatment (Yohimbine).

CMap results support 'omics data' based repositioned drugs
CMap was previously used successfully for drug repositioning by measuring the similarity in gene expression profiles between compounds in mammalian cell lines. The current study analyzed 58 drugs by CMap and assessed their association with known anti-diabetic drugs or diabetic risk compounds. We found that 9 of the 58 drugs (15.5%) have CMap information relating to anti-diabetic drugs or diabetic risk compounds (S7 Table), 11 of the 58 (19.0%) drugs have CMap data but lack links to anti-diabetic drugs or diabetic risk compounds, and 38 of the 58 (65.5%) drugs have no CMap information.
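The coverage percentages reported above follow directly from the three counts, as the short check below shows. The category labels and function name are descriptive placeholders introduced for illustration.

```python
def coverage_breakdown(counts):
    """Return each category's share of the total, as a percentage
    rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts reported for the 58 candidate drugs.
cmap_counts = {
    "linked to anti-diabetic drugs or risk compounds": 9,
    "CMap data without such links": 11,
    "no CMap data": 38,
}

total_drugs = sum(cmap_counts.values())
shares = coverage_breakdown(cmap_counts)
```

The three categories sum to the 58 drugs analyzed, so about two thirds of the candidates could not be evaluated by CMap at all.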

Discussion
Using 'omics' data mining and pathogenesis information, the current study repurposed 58 drugs for potential diabetes treatment. Gene expression profile comparison indicated 9 drugs with a higher potential in treating diabetes. Among these 9 drugs, diflunisal, nabumetone, niflumic acid and valdecoxib have a common target of prostaglandin G/H synthase 2 (COX2). COX2 converts arachidonate to prostaglandin H2 in vivo. Importantly, arachidonate was reported to be increased in the serum of type 2 diabetes and gestational diabetes patients [27,28]. Moreover, LOF of COX2 may increase insulin secretion, and GOF of COX2 may induce insulin dependent diabetes mellitus (IDDM) [18,19]. Similar to other inhibitors of COX2, diflunisal is currently used for pain treatment, and nabumetone, niflumic acid and valdecoxib are used for treating rheumatoid arthritis and osteoarthritis. Importantly, CMap analysis showed that mammalian cells treated by these 4 drugs have a gene expression pattern similar to that of cells treated by anti-diabetic drugs (glimepiride and metformin) or resveratrol (an activator of Sirt1 and PGC1a, previously shown to improve glucose metabolism) [29]. Together, this evidence indicates that COX2 could be a potential drug target for type 1 diabetes (also called IDDM) treatment. Therefore, the COX2 inhibitors are promising candidates for treating diabetes due to their ability to block prostaglandin (PG) formation in monocytes and prevent antigen-presenting cell dysfunction, both of which could predispose a person to autoimmunity and IDDM (Fig 3). Interestingly, a Phase II clinical trial (ClinicalTrials.gov Identifier: NCT00506298) was conducted to assess the efficacy of CRx-401 (bezafibrate + diflunisal vs bezafibrate + placebo) in lowering fasting plasma glucose levels in patients taking metformin, but the results have not been reported.
Among the 9 drugs, phenoxybenzamine and idazoxan target ADRA2A (alpha-2A adrenergic receptor). A genetic variation of ADRA2A (risk A allele for rs553668) is known to cause overexpression of ADRA2A, which aggravates adrenergic suppression of insulin secretion and causes type 2 diabetes [30]. Therefore, ADRA2A inhibitors may be utilized to treat a subset of type 2 diabetes patients who carry the genetic risk variant in the ADRA2A gene. In fact, one drug targeting ADRA2A (yohimbine) is under phase II clinical trial for diabetes treatment.
The other 3 promising drugs are diflorasone, d-cycloserine and perhexiline. Diflorasone inhibits phospholipase A2, a protein previously shown to generate arachidonic acid [27,28] and disrupt beta cell insulin stores [15]. Therefore, diflorasone has the potential to improve beta cell function. Glycine, a co-agonist of the NMDA receptor, was shown to be reduced in type 2 diabetes or gestational diabetes patients in 4 independent studies (S4 Table). Activation of NMDA receptors in the brain by d-cycloserine (NMDA receptor agonist) may have the potential to reduce glucose production and treat diabetes [20]. Perhexiline is an inhibitor of carnitine O-palmitoyltransferase I, which converts carnitine (reduced in diabetes) to acyl-carnitine (increased in diabetes) during lipid beta oxidation. Perhexiline has the potential to improve insulin sensitivity and treat type 2 diabetes because inhibition of carnitine palmitoyltransferase-1 activity was reported to alleviate insulin resistance in diet-induced obese mice [22].
CMap results should be interpreted with caution, since this technique is subject to a "batch effect" [31]. For example, cells under the same culture conditions may have highly similar expression patterns even after treatment with different compounds. A strategy of calculating Bridge Adjusted Expression Similarity (BAES) [32] to improve data quality may be used to minimize the batch effect in the future.
The current study integrates biomolecular information associated with diabetes from genomics, proteomics and metabolomics studies. A network of diabetic metabolites and proteins was generated to give an overview of how diabetes alters metabolites. This map may be used to identify the dysfunctional metabolic enzymes in diabetic patients. As diabetes is a metabolic disorder caused by both genetic and environmental factors, analyzing genome data together with protein/metabolite data could provide an in-depth understanding of diabetic etiology. In terms of the origins of the proposed 9 drugs (S7 Table), 7 were discovered from metabolomics studies and 2 were repurposed from GWAS studies, indicating that the GWAS and metabolomics results provided the most valuable dataset for anti-diabetic drug repositioning in the current study. Interestingly, one of the most successful compound anti-diabetic drugs, metformin, targets a metabolic enzyme (ACC2). The current study is the first to include metabolomics data into drug repositioning, which may assist in the identification of dysfunctional metabolic enzymes or transporters underlying the altered metabolic profiles.
Mapping diabetes risk proteins to drug projects is a critical step in drug repositioning. In the current study, the well-known public Therapeutic Target Database (version 4.3.02) containing information on 236 targets and 20667 drugs was used to perform the mapping. In future studies, other databases such as DrugBank (http://www.drugbank.ca) may also be used to obtain additional information for disease related proteins and to validate initial findings.
In summary, drug repositioning through mining 'omics' data provides a powerful tool to find novel indications for marketed drugs and clinical candidates of complex human diseases, such as diabetes. By analyzing GWAS, proteomic and metabolomic data in diabetes, mapping diabetes related proteins to drug projects (TTD), and inputting pathogenesis knowledge, we repurposed 58 drugs with a potential indication for diabetes treatment. Preclinical or clinical trials might be initiated to establish the efficacy of these repositioned drugs for the purpose of diabetes treatment. Furthermore, the results of our study could be related to other common disorders. For instance, there is convincing evidence of an increased risk of dementia in people with diabetes, including a strong link between type 2 diabetes and Alzheimer's disease [33]. Of note, an anti-diabetic drug (Liraglutide) has been used for a Phase II clinical trial for treating Alzheimer's disease (ClinicalTrials.gov Identifier: NCT01843075).
Supporting Information S1