Computational drug repositioning has proved to be an effective approach for developing new uses for existing drugs. However, current strategies rely heavily on drug response gene signatures that are scattered across separate or individual experimental datasets, which makes them inefficient. A comprehensive database of drug response gene signatures would therefore be very helpful to these methods. We collected drug response microarray data and annotated the related drug and target information from public databases and the scientific literature. By selecting the top 500 up-regulated and down-regulated genes as drug signatures, we manually established the DrugSig database. DrugSig currently contains more than 1300 drugs, 7000 microarrays and 800 targets. Moreover, we developed signature based and target based functions to aid drug repositioning. The constructed database can serve as a resource to accelerate computational drug repositioning. Database URL: http://biotechlab.fudan.edu.cn/database/drugsig/.
Citation: Wu H, Huang J, Zhong Y, Huang Q (2017) DrugSig: A resource for computational drug repositioning utilizing gene expression signatures. PLoS ONE 12(5): e0177743. https://doi.org/10.1371/journal.pone.0177743
Editor: Ferdinando Di Cunto, Universita degli Studi di Torino, ITALY
Received: February 5, 2017; Accepted: May 2, 2017; Published: May 31, 2017
Copyright: © 2017 Wu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study was supported by the Major Scientific and Technological Specialized Project of China for ‘significant new formulation of new drugs’, funded by the National Health and Family Planning Commission of the People's Republic of China [Grant 2013ZX09102057, http://www.moh.gov.cn, QH]. No additional external funding was received for this study. Hongyu Wu is affiliated with Shanghai High-Tech United Bio-Technological R&D Co. Ltd (SHUB) as a paid employee and works on projects at the State Key Laboratory of Genetic Engineering (SKLGE) at Fudan University as a PhD student. Qingshan Huang is affiliated with Fudan University and is a paid consultant at SHUB. SHUB provided support in the form of salaries for authors QH and HW, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.
Competing interests: Hongyu Wu is affiliated with Shanghai High-Tech United Bio-Technological R&D Co. Ltd (SHUB) as a paid employee and works on projects at the State Key Laboratory of Genetic Engineering (SKLGE) at Fudan University as a PhD student. Jinjiang Huang is currently a PhD student majoring in Microbiology at SKLGE. Qingshan Huang is affiliated with Fudan University and is a paid consultant at SHUB. Dr Jinjiang Huang and Prof. Yang Zhong declare no potential conflict of interest. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials.
Over the past decades, developing a de novo drug has often taken billions of dollars and about 9–12 years. New drug discovery has become time-consuming and costly, which has directly resulted in few new drugs reaching the market and in high prices for those that do. Drug repositioning, which explores new clinical indications for existing drugs, has become an increasingly important strategy for drug development because of the established safety of these drugs and the shortened process of drug discovery and preparation [1–9]. However, traditional drug repurposing has mostly come about through serendipity or from a better understanding of a drug’s mechanism of action, and the efficiency of these routes is very low. As drug-related and genome-wide data initiatives have grown quickly, the mode of drug repositioning has shifted toward computational approaches.
By integrating data from various sources, such as pharmacological, genetic, chemical or clinical data, a set of new computational repositioning strategies and techniques has emerged [3,10,11]. In particular, the Connectivity Map (CMap) [12,13] project, which produced large-scale drug response gene expression profiles, led to the establishment and development of the ‘guilt-by-association’ and ‘signature reversion’ methods for computational drug repurposing. With these methods, Sirota et al. and Dudley et al. found that an antiulcer drug and an antiepileptic drug could be reused for lung cancer and inflammatory bowel disease, respectively, by comparing each of these disease signatures to each of the gene expression signatures for 164 drugs from CMap [15,16]. The quantity and quality of drug response gene signatures are clearly at the core of these computational approaches. However, these data are still scattered across separate or individual experimental datasets, which makes the approaches inefficient. A database archiving a sufficient number of drug response gene signatures would therefore be very helpful to computational drug repurposing.
Based on the above observations, we collected most of the drug response microarray data available from GEO or scattered across separate databases to develop the DrugSig database. Moreover, we manually inspected the target information for each drug extracted from the microarray data and archived it in DrugSig. Finally, we implemented two functions for repositioning old drugs, one using a signature based and the other a target based drug repositioning method. The constructed database will serve as a resource to accelerate computational drug repositioning.
Results and discussion
DrugSig was created as a resource for computational drug repositioning utilizing gene expression signatures. As a web based database, DrugSig provides a user-friendly web interface through which users can easily query and retrieve information on drug signatures. All the data in DrugSig can be accessed and retrieved directly from the web browser. Fig 1 describes the workflow used to create DrugSig. All raw data were manually collected from the literature and public databases. We processed these microarray, drug, target and literature data into seven MySQL tables: drugs, instances, drugsig, platform, targets, drug_target and papers. On the basis of these data, we developed tools for signature based and target based computational drug repurposing and recorded the computation history in the uses table.
DrugSig web interface
A concise navigational interface comprising the Home, Browse, Search, Tools and Guide options was designed to give the database a clearly structured layout that enables fast and easy navigation (Fig 2A). The Browse interface allows users to navigate all drugs included in DrugSig. The current database is composed of more than 1300 drugs. Clicking a drug displays a results page with four sections: drug summary, drug signature, drug targets and links (Fig 2B). The drug summary section consists of the drug name, chemical name, formula, CAS number, description and drug indications. It also provides a link to DrugBank for further investigation. The drug signature section shows the drug’s common signature, which comprises the top 50 up-regulated and down-regulated genes, together with its data source (a list of all related microarrays). For each microarray, there is a page that displays its signature. The drug targets section consists of the drug’s targets and their expression values in cells treated with that drug and with other drugs. The former expression level reflects the response of the target gene to the drug, while the latter shows the potential of other drugs to inhibit or stimulate the target. The Search interface can be used to retrieve specific information using either a quick or an advanced option (Fig 2C). The quick search accepts only a keyword field, while the advanced search accepts up to six separate fields: Drug Id, Drug Name, DrugBank Id, Disease, Target, and Signature Symbol. The user can query the database with a single condition or with a combination of conditions. A search on any of the first five fields produces a results page that lists all drugs meeting the specified conditions. A search on the final field produces a list of the expression values of the specified signature symbol wherever its expression level lies within the top 500 up-regulated or down-regulated genes.
The Tools interface implements the signature based and target based drug repositioning functions (see ‘Drug repurposing tools implemented in DrugSig’ section below for further details). The Guide interface provides detailed instructions to potential users on how to use DrugSig.
Drug repurposing tools implemented in DrugSig
The drug repurposing tools implemented in DrugSig consist of signature based and target based drug repositioning functions. The signature based drug repositioning function provides an interface in which the user enters a gene list to be computed against DrugSig (Fig 3A). After submitting the gene list, the user can click the start computing button to compute the scores; each score is the ratio of the number of genes shared between the user’s gene list and a given gene signature to the number of genes in the user’s list (Fig 3B). Once the computation has finished, DrugSig sends a notification email to the user. The results can be accessed later from the email or by searching the task history with the user’s email address. The result contains the queried gene list, the 50 top-scoring drugs that produce a reverse gene list and the 50 top-scoring drugs that produce a similar gene list (Fig 4A). The reverse and similar drugs suggest potential indications. Each drug has a page that displays its reverse gene list and similar gene list for further investigation (Fig 4B).
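The scoring rule described above can be illustrated with a minimal sketch. It is not the DrugSig implementation: the function names `overlap_score` and `rank_drugs`, the upper-casing of gene symbols and the toy signatures are assumptions made for illustration. Only the ratio itself (shared genes divided by the number of query genes) and the top 50 cutoff come from the text.

```python
from typing import Dict, Iterable, List, Tuple


def overlap_score(query: Iterable[str], signature: Iterable[str]) -> float:
    """Return the fraction of query genes that also occur in the signature."""
    # Upper-casing gene symbols is an assumption, used to make matching case-insensitive.
    query_set = {gene.upper() for gene in query}
    if not query_set:
        return 0.0
    signature_set = {gene.upper() for gene in signature}
    return len(query_set & signature_set) / len(query_set)


def rank_drugs(
    query: Iterable[str],
    signatures: Dict[str, List[str]],
    top_n: int = 50,
) -> List[Tuple[str, float]]:
    """Score every drug signature against the query and keep the top_n drugs."""
    query_list = list(query)
    scored = [
        (drug, overlap_score(query_list, genes))
        for drug, genes in signatures.items()
    ]
    # Highest score first; ties are broken alphabetically by drug name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical toy data: up-regulated gene lists for three made-up drugs.
toy_up_signatures = {
    "drug_a": ["TP53", "MYC", "EGFR"],
    "drug_b": ["BRCA1", "MYC"],
    "drug_c": ["GAPDH"],
}
query_up_genes = ["MYC", "EGFR", "KRAS", "TP53"]

ranking = rank_drugs(query_up_genes, toy_up_signatures)
for drug, score in ranking:
    print(f"{drug}\t{score:.2f}")
```

Under this reading, scoring a query of up-regulated genes against each drug’s up list would surface drugs with a similar effect, while scoring it against each drug’s down list would surface drugs with a reverse effect. How DrugSig combines the two directions is not specified in the text, so that pairing is also an assumption.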
Fig 3. (A) The input interface. (B) The computing interface.
Fig 4. (A) The drug list for signature based drug repositioning. (B) The gene list for each calculated drug.
The target based drug repositioning function provides an interface for exploring a specified target (Fig 5A) and the drugs that target it, as well as the expression level of the target gene in cells after treatment with drugs (Fig 5B). Drugs that share a target may have similar indications and are therefore candidates for repurposing. The expression level of the target gene gives a partial indication of whether a drug inhibits or enhances the target.
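A target based lookup of this kind could be sketched as follows. The data layout (a drug-to-targets mapping and per-drug log2 fold changes), the function name `target_report` and all drug names and values are hypothetical; the actual DrugSig tables and queries are not described at this level of detail.

```python
from typing import Dict, List, Optional, Set, Tuple


def target_report(
    target: str,
    drug_targets: Dict[str, Set[str]],
    log2_fc: Dict[str, Dict[str, float]],
) -> Tuple[List[Tuple[str, Optional[float]]], List[Tuple[str, float]]]:
    """Return (annotated drugs, other drugs) with the target's expression change.

    Annotated drugs are those whose known targets include `target`; their
    value is None when no expression measurement is available. Other drugs
    are those with a measurement for `target` but no annotation, sorted by
    the magnitude of the change.
    """
    annotated = sorted(d for d, targets in drug_targets.items() if target in targets)
    annotated_rows = [(d, log2_fc.get(d, {}).get(target)) for d in annotated]
    other_rows = [
        (d, changes[target])
        for d, changes in log2_fc.items()
        if d not in annotated and target in changes
    ]
    # Largest absolute change first, so strong inhibitors or inducers lead the list.
    other_rows.sort(key=lambda row: (-abs(row[1]), row[0]))
    return annotated_rows, other_rows


# Hypothetical toy data.
toy_drug_targets = {
    "drug_a": {"EGFR"},
    "drug_b": {"PTGS2"},
    "drug_c": {"EGFR", "ERBB2"},
}
toy_log2_fc = {
    "drug_a": {"EGFR": -1.8},
    "drug_b": {"EGFR": 0.3, "PTGS2": -2.0},
    "drug_d": {"EGFR": 2.1},
}

known, others = target_report("EGFR", toy_drug_targets, toy_log2_fc)
print(known)
print(others)
```

Separating annotated drugs from the rest mirrors the two expression views described for the drug targets section: the response of the target to its own drug, and the effect of other drugs on the same target.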
The purpose of establishing the DrugSig database is to aid drug repurposing. DrugSig serves not only as a tool for repositioning old drugs based on user input but also as an open resource with which users can develop new computational approaches for drug repositioning. The current DrugSig contains more than 1300 drugs, 7000 microarrays and 800 targets. DrugSig differs from existing webservers and databases for drug repositioning. Although several such resources exist, such as CMap, PREDICT, PROMISCUOUS, INDI and Mantra, each has certain shortcomings, such as covering a limited collection of drug response microarrays or containing only a computational framework. These shortcomings limit the accuracy and scope of computational drug repurposing and make the data harder for scientists to use. Although the main data in DrugSig were also collected from CMap, DrugSig additionally covers pertinent data from other experiments. Moreover, as pertinent experiments accumulate, DrugSig will contain more and more drugs and signatures.
Moreover, many projects focused on precision medicine will reveal relationships between diseases and genes. Recently, Rubio-Perez et al. developed an in silico drug prescription strategy based on the driver alterations in each tumor and their druggability options, and used it to identify druggable targets and promising repurposing opportunities. By applying such insights to DrugSig, these druggable targets and repurposing opportunities can be checked promptly. We therefore constructed a gene list from TCGA esophageal carcinoma (ESCA) data and submitted it for computation against DrugSig. The top compounds predicted to be therapeutic for ESCA were acetylsalicylic acid, an anti-inflammatory drug that has been reported to treat ESCA, and dizocilpine, an antiepileptic drug not previously described as having efficacy against ESCA. Related validation work is in progress.
Limitations and future prospects
Currently DrugSig holds only 1300 drugs and 6000 plus signatures. Moreover, the functions and methods implemented in the database are limited. In the future, we plan to update the data every six months and to integrate gene function analysis tools and other computational drug repurposing approaches into DrugSig, in order to improve its interactivity and to extend the functions that aid computational drug repositioning. In addition, we plan to develop open services that make it convenient for researchers to obtain gene signatures for developing new computational approaches for drug repositioning.
DrugSig is a web accessible database for computational drug repurposing studies. The current version of DrugSig includes more than 1300 drugs, 6000 plus signatures and 800 available targets (as of January 2017). The database can be queried either by simple keywords or by combinations of search conditions. DrugSig will not only help to expand our current understanding of drugs and their mechanisms of action but may also have implications for the development of new indications for existing drugs. DrugSig is now available at http://biotechlab.fudan.edu.cn/database/drugsig/.
Data acquisition and storage
The microarray data in DrugSig were obtained from the GEO database or from individual scientific studies. The curation of DrugSig consisted of collecting, processing and computing steps (Fig 1). We first searched PubMed for scientific literature containing drug response microarray experiments, using keywords such as “human cell AND treatment AND (‘gene signature’ OR ‘expression profile’) AND (genechip OR microarray OR ‘gene expression’) AND English [la]”, and collected the available drug response microarray data from the GEO database or from the special sources described in the literature. In total, we obtained raw data for more than 7000 microarrays. We then processed the data with the RMA method of the affy package in Bioconductor and constructed the drug induced signatures using one of two approaches, depending on the quantity of raw data. When there were fewer than 3 replicates in the raw data, we computed the drug signatures by simple fold change (FC > 2.0 or FC < 0.5). When there were 3 or more replicates, we computed the drug signatures with the limma package of Bioconductor (FC > 2.0 or FC < 0.5 and P value < 0.01), which uses linear models to identify differentially expressed genes. In addition, if fewer than 500 differentially expressed probes were found, we selected all of them as the signature. After extracting more than 1300 drugs from the related experiments, we looked up the drug related information in several public databases such as DrugBank [17,25], KEGG, CTD and TTD. Finally, 800 plus available targets were compiled according to descriptions from the literature. All of the collected information and computed data were classified and stored in seven relational tables in MySQL. Moreover, we constructed the up list and down list files from the drugsig table and used them to implement signature based drug repurposing.
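The simple fold-change branch of this procedure (fewer than 3 replicates) can be sketched as below. This is an illustration only, not the authors' pipeline, which used R and Bioconductor. It assumes that expression changes are supplied as log2 fold changes, as is usual after RMA, so FC > 2.0 corresponds to a log2 value above 1 and FC < 0.5 to a value below -1. The function name, the input layout and the probe identifiers are hypothetical, and the limma branch with its P value filter is not reproduced.

```python
import math
from typing import Dict, List, Tuple


def fold_change_signature(
    log2_fc: Dict[str, float],
    fc_cutoff: float = 2.0,
    max_probes: int = 500,
) -> Tuple[List[str], List[str]]:
    """Select up- and down-regulated probes by a symmetric fold-change cutoff.

    Probes with FC > fc_cutoff go to the up list and probes with
    FC < 1 / fc_cutoff go to the down list. Each list is ordered by the
    strength of the change and truncated to max_probes; when fewer probes
    pass the cutoff, all of them are kept.
    """
    threshold = math.log2(fc_cutoff)
    up = [probe for probe, value in log2_fc.items() if value > threshold]
    down = [probe for probe, value in log2_fc.items() if value < -threshold]
    up.sort(key=lambda probe: -log2_fc[probe])   # strongest induction first
    down.sort(key=lambda probe: log2_fc[probe])  # strongest repression first
    return up[:max_probes], down[:max_probes]


# Hypothetical toy input: probe id -> log2(treated / control).
toy_log2_fc = {
    "probe_1": 2.5,
    "probe_2": 1.2,
    "probe_3": 0.4,
    "probe_4": -1.5,
    "probe_5": -3.0,
    "probe_6": 1.0,  # exactly FC = 2.0, so it does not pass the strict cutoff
}

up_list, down_list = fold_change_signature(toy_log2_fc)
print("up:", up_list)
print("down:", down_list)
```

Truncating by list slicing covers both cases described in the text: a signature is capped at the top 500 probes per direction, and a shorter list is returned unchanged when fewer probes pass the cutoff.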
Database architecture and web interface
We would like to thank all of our colleagues at School of Life Sciences, Fudan University and at Shanghai High-Tech United Bio-Technological R&D Co., Ltd., of China for their contributions in the literature search and discussions regarding this manuscript.
- Conceptualization: HW QH.
- Data curation: HW JH.
- Formal analysis: HW.
- Funding acquisition: QH.
- Investigation: HW JH.
- Methodology: HW JH.
- Resources: YZ QH.
- Software: HW.
- Supervision: YZ QH.
- Writing – original draft: HW JH.
- Writing – review & editing: HW YZ QH.
- 1. Ashburn TT, Thor KB. (2004) Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov 3: 673–683. pmid:15286734
- 2. Li J, Zheng S, Chen B, Butte AJ, Swamidass SJ, Lu Z. (2015) A survey of current trends in computational drug repositioning. Brief Bioinform 17: 2–12. pmid:25832646
- 3. Strittmatter SM. (2014) Overcoming Drug Development Bottlenecks With Repurposing: Old drugs learn new tricks. Nature medicine 20: 590–591. pmid:24901567
- 4. Jahchan NS, Dudley JT, Mazur PK, Flores N, Yang D, Palmerton A, et al. (2013) A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors. Cancer Discov 3:1364–1377. pmid:24078773
- 5. Corbett A, Williams G, Ballard C (2013) Drug repositioning: an opportunity to develop novel treatments for Alzheimer's disease. Pharmaceuticals (Basel) 6: 1304–1321.
- 6. Corbett A, Pickett J, Burns A, Corcoran J, Dunnett SB, Edison P, et al. (2012) Drug repositioning for Alzheimer's disease. Nat Rev Drug Discov 11: 833–846. pmid:23123941
- 7. Sardana D, Zhu C, Zhang M, Gudivada RC, Yang L, Jegga AG. (2011) Drug repositioning for orphan diseases. Brief Bioinform 12: 346–356. pmid:21504985
- 8. Harrison C (2011) Signatures for drug repositioning. Nat Rev Genet 12: 668.
- 9. Chong CR, Sullivan DJ Jr. (2007) New uses for old drugs. Nature 448: 645–646. pmid:17687303
- 10. Rastegar-Mojarad M, Ye Z, Kolesar JM, Hebbring SJ, Lin SM (2015) Opportunities for drug repositioning from phenome-wide association studies. Nature biotechnology 33: 342–345. pmid:25850054
- 11. Wang Z-Y, Zhang H-Y (2013) Rational drug repositioning by medical genetics. Nature biotechnology 31: 1080–1082. pmid:24316641
- 12. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, et al. (2006) The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313: 1929–1935. pmid:17008526
- 13. Lamb J (2006) The Connectivity Map: a new tool for biomedical research. Nature reviews cancer 7: 54–60.
- 14. Iorio F, Rittman T, Ge H, Menden M, Saez-Rodriguez J (2013) Transcriptional data: a new gateway to drug repositioning? Drug Discov Today 18: 350–357. pmid:22897878
- 15. Sirota M, Dudley JT, Kim J, Chiang AP, Morgan AA, Sweet-Cordero A, et al. (2011) Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci Transl Med 3: 96ra77. pmid:21849665
- 16. Dudley JT, Sirota M, Shenoy M, Pai RK, Roedder S, Chiang AP, et al. (2011) Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci Transl Med 3: 96ra76. pmid:21849664
- 17. Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, et al. (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res 42: D1091–1097. pmid:24203711
- 18. Gottlieb A, Stein GY, Ruppin E, Sharan R (2011) PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol 7: 496–504. pmid:21654673
- 19. von Eichborn J, Murgueitio MS, Dunkel M, Koerner S, Bourne PE, Preissner R. (2011) PROMISCUOUS: a database for network-based drug-repositioning. Nucleic Acids Res 39: D1060–1066. pmid:21071407
- 20. Gottlieb A, Stein GY, Oron Y, Ruppin E, Sharan R (2012) INDI: a computational framework for inferring drug interactions and their associated recommendations. Mol Syst Biol 8: 592–603. pmid:22806140
- 21. Carrella D, Napolitano F, Rispoli R, Miglietta M, Carissimo A, Cutillo L, et al. (2014) Mantra 2.0: an online collaborative resource for drug mode of action and repurposing by network analysis. Bioinformatics 30: 1787–1788. pmid:24558125
- 22. Rubio-Perez C, Tamborero D, Schroeder MP, Antolín AA, Deu-Pons J, Perez-Llamas C, et al. (2015) In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals targeting opportunities. Cancer cell 27: 382–396. pmid:25759023
- 23. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. (2013) NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res 41: D991–995. pmid:23193258
- 24. Reimers M, Carey VJ (2006) Bioconductor: an open source framework for bioinformatics and computational biology. Methods Enzymol 411: 119–134. pmid:16939789
- 25. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, et al. (2011) DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res 39: D1035–1041. pmid:21059682
- 26. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res 32: D277–280. pmid:14681412
- 27. Davis AP, Grondin CJ, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, King BL, et al. (2015) The Comparative Toxicogenomics Database's 10th year anniversary: update 2015. Nucleic Acids Res 43: D914–920. pmid:25326323
- 28. Yang H, Qin C, Li YH, Tao L, Zhou J, Yu CY, et al. (2016) Therapeutic target database update 2016: enriched resource for bench to clinical drug target and targeted pathway information. Nucleic Acids Res 44: D1069–1074. pmid:26578601