Prediction of Cancer Drugs by Chemical-Chemical Interactions

Cancer, a leading cause of death worldwide, places a heavy burden on health-care systems. In this study, an order-prediction model was built to predict a series of cancer drug indications based on chemical-chemical interactions. According to the confidence scores of their interactions, the order from the most likely cancer to the least likely one was obtained for each query drug. The 1st order prediction accuracy on the training dataset was 55.93%, evaluated by Jackknife test, while it was 55.56% and 59.09% on a validation test dataset and an independent test dataset, respectively. The proposed method outperformed a popular method based on molecular descriptors. Moreover, literature evidence showed that some drugs were effective against their 'wrong' predicted indications, suggesting that some 'wrong' drug indications were actually correct. Given these promising results, the method may become a useful tool for the prediction of drug indications.


Introduction
Cancer is the main cause of death in both developed and developing countries [1]. In 2008 alone, there were 12.7 million new cancer cases and 7.6 million cancer deaths worldwide [1]. Meanwhile, the quantity of newly approved drugs diminished continually in spite of an increase of R&D investments [2]. R&D of a drug requires comprehensive experimental testing, which often costs millions of dollars, involves several thousand animals, and takes many years to complete. However, as a result, not many chemicals have undergone the degree of testing needed to support accurate health risk assessments or meet regulatory requirements for drug approval. Thus, it is very attractive to develop quick, reliable, and non-animal-involved prediction methods, e.g. using structure-activity relationships (SARs), to predict the anticancer activities of chemicals.
Some pioneer studies indicated that interactive proteins are more likely to share the same functions than non-interactive ones [3,4,5]. Likewise, interactive compounds are also more likely to share common properties [6,7,8]. STITCH (Search Tool for Interactions of Chemicals, http://stitch.embl.de/) is a well-known database containing interaction information of proteins and chemicals [9,10]. It provides three categories of interactive compounds: (1) those participating in the same reactions; (2) those sharing similar structures or activities; and (3) those with literature associations, such as binding the same target [9]. In this study, we attempted to build a drug-indication prediction model by quantifying the chemical-chemical interaction of every pair of interactive compounds. Briefly, drugs and their corresponding indications (i.e., 8 kinds of cancers) were extracted from KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) [11], a well-known database dealing with genomes, enzymatic pathways, and biological chemicals, and Drugbank [12], another database containing detailed information on drugs and their targets. Then, the score of each indication of the query compound was obtained from the confidence scores of the interactions between the query compound and its interactive compounds, using the indications of those interactive compounds. The order from the most likely indication to the least likely one was then obtained for each drug. Finally, the prediction quality of the model was evaluated by Jackknife test and some other measures.
In addition to building an effective prediction model, another aim of our study is to investigate the drug repositioning ability of our model. Drug repositioning, i.e. finding novel uses of existing drugs, is an alternative strategy for drug development because it has the potential to speed up the process of drug approval. Several drugs, such as thalidomide, sildenafil, bupropion and fluoxetine, have been successfully repositioned to new indications [13,14]. Experimental approaches for drug repositioning usually employ high throughput screening (HTS) to test libraries of drugs against biological targets of interest. More recently, several in silico models were developed to address drug repositioning. Iorio et al. predicted and validated new drug modes of action and drug repositioning from transcriptional responses [15]. Butte's group reported two successful examples of drug repositioning based on gene expression data from diseases and drugs [16,17]. Cheng et al. merged drug-based similarity inference (DBSI), target-based similarity inference (TBSI) and network-based inference (NBI) methods for drug-target association and drug repositioning [18]. In our study, under the assumption that interactive drugs are more likely to target the same indication, we investigated the repositioning possibility of some 'wrong' predicted drugs by retrieving references, and attempted to propose alternative indications for some drugs.

Materials
The information of 98 drugs that can treat cancers was retrieved from KEGG DISEASE in KEGG [11]. These drugs can treat the following 10 kinds of cancers:
(1) Cancers of the nervous system
(2) Cancers of the digestive system
(3) Cancers of haematopoietic and lymphoid tissues
(4) Cancers of the breast and female genital organs
(5) Cancers of soft tissues and bone
(6) Skin cancers
(7) Cancers of the urinary system and male genital organs
(8) Cancers of endocrine organs
(9) Head and neck cancers
(10) Cancers of the lung and pleura
Since some drugs have no information of chemical-chemical interactions, we discarded these drugs, resulting in 68 drugs. After that, we found that 'Skin cancers' and 'Head and neck cancers' only contained 3 and 4 drugs, respectively. It is not sufficient to establish an effective prediction model with only a few samples, thus these two kinds of cancers were abandoned. As a result, 68 drugs were obtained, comprising the benchmark dataset $S$. These 68 drugs were classified into 8 categories such that the drugs that can treat one kind of cancer comprised one category. The codes of the 68 drugs and their indications can be found in Table S1. The number of drugs in each category is listed in column 5 of Table 1. For convenience, we used tags $C_1, C_2, \ldots, C_8$ to represent each kind of cancer. See columns 1 and 2 of Table 1 for the correspondence between tags and cancers. It is observed from Table 1 that the sum of the numbers of drugs in each category is much larger than the number of distinct drugs in $S$, indicating that some drugs belong to more than one category, i.e. some drugs can treat more than one kind of cancer. In detail, 50 drugs can treat only one kind of cancer, while 18 drugs can treat at least two kinds of cancers. Please refer to Figure 1 for a plot of the number of drugs against the number of cancers they can treat.
Thus, it is a multi-label classification problem which needs to assign each drug to the aforementioned 8 categories in descending order. The classifier only providing one candidate cancer that a query drug can treat is not an optimal choice. Similar to the situation when dealing with proteins and compounds with multiple attributions [7,19], the proposed method also needs to provide a series of candidate cancers, ranging from the most likely cancer to the least likely one.
To better evaluate the proposed method, the benchmark dataset $S$ was divided into one training dataset $S_{tr}$ and one validation test dataset $S_{te}$, i.e. $S = S_{tr} \cup S_{te}$ and $S_{tr} \cap S_{te} = \emptyset$, where drugs that can treat exactly one kind of cancer and half of the drugs that can treat at least two kinds of cancers comprised $S_{tr}$, while $S_{te}$ contained the rest of the drugs in $S$. The numbers of drugs in each category for $S_{tr}$ and $S_{te}$ are listed in columns 3 and 4 of Table 1, respectively.
In addition, to test the generalization of the proposed method, we extracted 59 drug compounds from Drugbank [12] that are not in the benchmark dataset $S$. After excluding drug compounds without information of chemical-chemical interactions, 44 drugs were obtained, comprising the independent test dataset $S_{ite}$. The number of drugs in each category of $S_{ite}$ is listed in column 6 of Table 1, and the detailed information of these drug compounds, including their codes and indications, can be found in Table S2.
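The split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not say how the half of the multi-indication drugs was chosen, so the seeded random choice, the function name `split_benchmark` and the synthetic drug IDs are all assumptions.

```python
import random

def split_benchmark(indications, seed=0):
    """Split drugs into (training, validation) sets of drug IDs.

    indications maps a drug ID to the set of cancer tags it can treat.
    """
    single = [d for d, cancers in indications.items() if len(cancers) == 1]
    multi = sorted(d for d, cancers in indications.items() if len(cancers) >= 2)
    # Assumption: the half that goes to training is picked at random.
    random.Random(seed).shuffle(multi)
    half = len(multi) // 2
    training = set(single) | set(multi[:half])
    validation = set(multi[half:])
    return training, validation

# Synthetic stand-in with the same shape as S: 50 single-indication drugs
# and 18 drugs with two indications each (IDs and tags are made up).
toy = {f"single{i}": {1 + i % 8} for i in range(50)}
toy.update({f"multi{i}": {1 + i % 8, 1 + (i + 1) % 8} for i in range(18)})

train, valid = split_benchmark(toy)
print(len(train), len(valid))  # 59 9
```

With 50 single-indication and 18 multi-indication drugs, any such split yields the 59 and 9 drugs reported for the training and validation test datasets.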

Chemical-chemical Interactions
In recent years, the information of chemical-chemical interactions has increasingly been used to predict various attributes of compounds [7,8,20]. The basic idea is that interactive compounds are more likely to share common functions than non-interactive ones. Compared with information based on chemical structure alone, it includes other essential properties of compounds, such as compound activities, reactions, and so on.
The information of interactive compounds was downloaded from STITCH (chemical_chemical.links.detailed.v3.1.tsv.gz) [9]. In the obtained file, each interaction consists of two compounds and five kinds of scores entitled 'Similarity', 'Experimental', 'Database', 'Textmining' and 'Combined_score'. In detail, the first four kinds of scores are calculated based on the compound structures, activities, reactions, and co-occurrence in literature, respectively, while the last kind of score, 'Combined_score', integrates the aforementioned four scores. Thus, it is used in this study to indicate the interactivity of two compounds, i.e. two compounds are interactive compounds if and only if the 'Combined_score' of the interaction between them is greater than zero. In fact, the value of 'Combined_score' also indicates the strength of the interaction, i.e. the likelihood of the interaction's occurrence. Thus, it is termed the confidence score in this study. For convenience, we denote the confidence score of the interaction between $c_1$ and $c_2$ by $S(c_1, c_2)$. In particular, if $c_1$ and $c_2$ are non-interactive compounds, $S(c_1, c_2)$ is set to zero.
112 drug compounds were investigated in this study as described in Section "Materials", and 1,393 chemical-chemical interactions whose confidence scores were greater than zero were obtained. Among these interactions, 50 had a 'Similarity' score, 4 had an 'Experimental' score, 114 had a 'Database' score, and 1,352 had a 'Textmining' score. It is necessary to point out that some interactions had two or more kinds of scores. As far as the quantity of chemical-chemical interactions is concerned, the 'Textmining' score contributed most to the construction of the prediction method described in Section "The method based on chemical-chemical interactions".
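As a rough illustration of how the confidence score $S(c_1, c_2)$ could be read from such a file, the sketch below parses a small tab-separated excerpt. The column names, compound IDs and score values are assumptions made for the example; they are not taken from the STITCH download.

```python
import csv
import io

# Hypothetical excerpt in the layout described above (two compounds plus five
# scores per row); IDs and values are made up for illustration.
SAMPLE = (
    "chemical1\tchemical2\tsimilarity\texperimental\tdatabase\ttextmining\tcombined_score\n"
    "CID000000001\tCID000000002\t0\t0\t0\t450\t450\n"
    "CID000000001\tCID000000003\t300\t0\t900\t200\t920\n"
    "CID000000002\tCID000000003\t0\t0\t0\t0\t0\n"
)

def load_confidence_scores(handle):
    """Return {frozenset({c1, c2}): combined_score}, keeping only scores > 0."""
    scores = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        value = int(row["combined_score"])
        if value > 0:
            scores[frozenset((row["chemical1"], row["chemical2"]))] = value
    return scores

def confidence(scores, c1, c2):
    """S(c1, c2); zero when the two compounds are non-interactive."""
    return scores.get(frozenset((c1, c2)), 0)

scores = load_confidence_scores(io.StringIO(SAMPLE))
print(confidence(scores, "CID000000002", "CID000000001"))  # 450
print(confidence(scores, "CID000000002", "CID000000003"))  # 0
```

Storing each pair as an unordered `frozenset` makes the lookup symmetric, which matches the way $S(c_1, c_2)$ is used in the text.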

Prediction Method
The Method Based on Chemical-chemical Interactions
Systems biology has been applied extensively to the prediction of properties of proteins and compounds and is deemed to be more efficient than some conventional methods [7,20,21,22]. In this study, we attempt to classify cancer drugs into the aforementioned 8 categories based on chemical interactions. Suppose there are $n$ drugs in the training set $S'$, say $d_1, d_2, \ldots, d_n$. The cancers that $d_i$ can treat are represented as follows:

$$t(d_i) = [t_{i,1}, t_{i,2}, \ldots, t_{i,8}]^{T} \tag{1}$$

where $T$ is the transpose operator and

$$t_{i,j} = \begin{cases} 1, & \text{if } d_i \text{ can treat cancer } C_j \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

For a query drug $d_q$, which cancer it can treat can be determined by its interactive compounds in $S'$. To evaluate the likelihood that $d_q$ can treat cancer $C_j$, we calculated a score as follows:

$$P(d_q \in C_j) = \sum_{i=1}^{n} S(d_q, d_i)\, t_{i,j}, \quad j = 1, 2, \ldots, 8 \tag{3}$$

A larger score $P(d_q \in C_j)$ indicates that it is more likely that the query drug can treat cancer $C_j$, and $P(d_q \in C_j) = 0$ suggests that the probability that the query drug can treat cancer $C_j$ is zero, because there are no interactive compounds in $S'$ that can treat cancer $C_j$.
As mentioned in Section "Materials", predicting which cancers a drug can treat is a multi-label classification problem. A reliable classifier should provide not only the most likely cancer but also a series of candidate cancers, ranging from the most likely one to the least likely one. According to the results of Eq. 3, it is easy to arrange the candidate cancers in decreasing order of the corresponding scores. For example, if the results of Eq. 3 are

$$P(d_q \in C_3) > P(d_q \in C_1) > P(d_q \in C_5) > 0, \quad P(d_q \in C_j) = 0 \text{ for } j \notin \{1, 3, 5\} \tag{4}$$

it means that there are three candidate cancers of $d_q$, where the most likely cancer it can treat is $C_3$, followed by $C_1$ and $C_5$. Furthermore, $C_3$ is called the 1st order prediction, $C_1$ is the 2nd order prediction, and so forth.
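The scoring and ordering step can be sketched as below, under the reading that the score of cancer $C_j$ is the sum of the confidence scores between the query drug and those training drugs that can treat $C_j$. The drug names, the score values and the tie-breaking rule (lower cancer index first) are assumptions for illustration; the text does not say how ties are handled.

```python
def indication_scores(query, training, confidence, n_classes=8):
    """Score each cancer C_j by summing S(query, d_i) over drugs d_i treating C_j."""
    totals = [0.0] * n_classes
    for drug, cancers in training.items():
        s = confidence(query, drug)
        for j in cancers:            # cancers: set of 1-based cancer indices
            totals[j - 1] += s
    return totals

def ranked_candidates(totals):
    """Cancers with a non-zero score, highest score first."""
    order = sorted(range(len(totals)), key=lambda j: (-totals[j], j))
    return [j + 1 for j in order if totals[j] > 0]

# Toy data (hypothetical): three interacting training drugs, one without a link.
training = {"dA": {3}, "dB": {1, 3}, "dC": {5}, "dD": {7}}
pairs = {("q", "dA"): 900, ("q", "dB"): 400, ("q", "dC"): 200}

def confidence(a, b):
    return pairs.get((a, b), pairs.get((b, a), 0))

totals = indication_scores("q", training, confidence)
print(ranked_candidates(totals))  # [3, 1, 5]
```

In this toy case the 1st order prediction is $C_3$, followed by $C_1$ and $C_5$, while $C_7$ is not a candidate because the query has no interaction with any drug that treats it.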

The Method Based on Molecular Descriptors
To compare our method with other methods, the method based on molecular descriptors was constructed as follows. The structure optimization of each drug compound was performed using the AM1 semi-empirical method implemented in AMPAC 8.16 [23]. 454 descriptors including constitutional, topological, geometrical, electrostatic, and quantum-chemical descriptors were calculated by Codessa 2.7.2 [24]. To encode each drug compound effectively, the descriptors with missing values were discarded, resulting in 355 descriptors, i.e. each drug compound d can be represented by a 355-D (dimension) vector which can be formulated as follows: where T is the transpose operator. Accordingly, the relationship of two drugs d 1 and d 2 can be calculated by the following formula: Similar to the method based on chemical-chemical interactions, the score that a query drug d q can treat cancer C j can be calculated by the following formula: The rest of the procedure is the same as that of the method based on chemical-chemical interactions, which also provides a series of candidate cancers that d q can treat, ranging from the most likely one to the least likely one.
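One way to picture a descriptor-based counterpart of the interaction score is sketched below. The relationship formula between two drugs is not reproduced in this text, so cosine similarity between descriptor vectors is used here purely as an assumption; the short vectors stand in for the 355-dimensional ones and are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed relationship measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def descriptor_scores(query_vec, training, n_classes=8):
    """Sum the similarity to each training drug over the cancers it can treat."""
    totals = [0.0] * n_classes
    for vec, cancers in training:
        q = cosine(query_vec, vec)
        for j in cancers:            # cancers: set of 1-based cancer indices
            totals[j - 1] += q
    return totals

# Toy 2-D descriptors (hypothetical) in place of 355-D vectors.
training = [([1.0, 0.0], {2}), ([0.0, 1.0], {4}), ([1.0, 1.0], {2, 7})]
totals = descriptor_scores([1.0, 0.0], training)
print([round(t, 3) for t in totals])
```

Unlike confidence scores, a cosine value can be negative for real descriptor vectors, so the ordering step would need a rule for negative totals if this measure were actually used.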

Validation and Evaluation
Jackknife test is one of the most popular methods for evaluating the performance of classifiers. During the test, each sample is singled out one-by-one and predicted by the classifier trained on the rest of the samples in the dataset. The test procedure involves no random partition of the data, thereby avoiding arbitrariness [7]. Therefore, the outcome obtained by Jackknife test is always unique for a given dataset. In view of this, many investigators have adopted it to evaluate the accuracies of their classifiers in recent years [25,26,27,28,29].
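The hold-one-out loop behind the Jackknife test can be sketched as follows. The predictor used here (most frequent cancer among the remaining drugs) is only a placeholder to keep the example self-contained; it is not the method of this study, and the toy dataset is made up.

```python
from collections import Counter

def jackknife(dataset, predict):
    """Predict every sample from a model built on all the other samples."""
    results = {}
    for held_out in dataset:
        rest = {k: v for k, v in dataset.items() if k != held_out}
        results[held_out] = predict(held_out, rest)
    return results

def most_common_label(_query, training):
    """Placeholder predictor: the most frequent cancer tag in the training part."""
    counts = Counter(tag for tags in training.values() for tag in tags)
    return min(counts, key=lambda tag: (-counts[tag], tag))

toy = {"d1": {1}, "d2": {1, 2}, "d3": {2}, "d4": {2}}
print(jackknife(toy, most_common_label))
# {'d1': 2, 'd2': 2, 'd3': 1, 'd4': 1}
```

Because every sample is held out exactly once and nothing is drawn at random, running the loop again on the same dataset gives the same result.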
As described in Section "Prediction method", the methods in this study can provide a series of candidate cancers for a given query drug. The j-th order prediction accuracy is computed by the following formula [7,8]:

$$ACC_j = \frac{h_j}{N} \times 100\% \tag{8}$$

where $N$ is the total number of drugs in the dataset and $h_j$ is the number of drugs whose j-th order predictions are true cancers that they can treat. It is obvious that $ACC_j$ measures the quality of the j-th order prediction. If the true cancers that a query drug can treat are ranked at low order numbers, i.e. near the top of the list, the predicted result is deemed optimal. Thus, high $ACC_j$ with low order number j and low $ACC_j$ with high order number j indicate a good performance of the classifier. $ACC_1$ is the most important indicator of the performance of the classifier.
To evaluate the methods more thoroughly, we calculated the prediction accuracy on cancer $C_j$ for the i-th order prediction as follows:

$$ACC_{i,j} = \frac{v_{i,j}}{N_j} \times 100\% \tag{9}$$

where $N_j$ is the number of drugs that can treat cancer $C_j$ in the dataset and $v_{i,j}$ is the number of drugs whose i-th order prediction is correctly predicted to be cancer $C_j$. In addition, another measurement was taken, which was adopted in some previous studies [6,7,8] and can be calculated as follows:

$$V_m = \frac{\sum_{i=1}^{N} W_{i,m}}{\sum_{i=1}^{N} n_i} \times 100\% \tag{10}$$

where $m$ represents the first $m$ predictions that are taken into consideration, $W_{i,m}$ is the number of correct predictions of the i-th drug compound among its first $m$ predictions, and $n_i$ is the number of cancers that the i-th drug compound can treat. It is easy to deduce that $V_m$ is the proportion of all true cancers that the samples in the dataset can treat covered by the first $m$ predictions of each sample in it. It can be seen from Figure 1 that different drug compounds may have different numbers of cancers they can treat. In view of this, the parameter $m$ in Eq. 10 usually takes the value of the smallest integer no less than the average number of cancers that drug compounds in the dataset can treat. It can be computed by

$$m = \left\lceil \frac{\sum_{i=1}^{N} n_i}{N} \right\rceil \tag{11}$$

Generally speaking, higher $V_m$ suggests better performance of the method.
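The j-th order prediction accuracy can be computed from ranked predictions as in the sketch below, where the count of drugs whose j-th prediction is a true cancer is divided by the number of drugs. The toy rankings and true indications are made up for illustration.

```python
def order_accuracies(predictions, truths, n_orders=8):
    """Fraction of drugs whose j-th ranked cancer is a true indication, j = 1..n_orders."""
    n = len(predictions)
    hits = [0] * n_orders
    for drug, ranking in predictions.items():
        for j, cancer in enumerate(ranking[:n_orders]):
            if cancer in truths[drug]:
                hits[j] += 1
    return [h / n for h in hits]

# Hypothetical ranked candidates and true indications for three drugs.
predictions = {"a": [3, 1, 5], "b": [2, 4], "c": [2]}
truths = {"a": {3, 5}, "b": {4}, "c": {2}}

acc = order_accuracies(predictions, truths, n_orders=3)
print([round(x, 4) for x in acc])  # [0.6667, 0.3333, 0.3333]
```

A drug with fewer than j candidate cancers simply contributes no hit at order j, which is why accuracies at high order numbers tend to be low.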
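The coverage measure and the choice of m can be sketched as follows, assuming m is the smallest integer not less than the average number of true cancers per drug, as the text describes. The toy predictions are made up; the label and drug counts in the final check are the ones reported for the three datasets in this paper.

```python
import math

def first_m(total_true_cancers, n_drugs):
    """Smallest integer not less than the mean number of true cancers per drug."""
    return math.ceil(total_true_cancers / n_drugs)

def coverage(predictions, truths, m):
    """Share of all true cancers that appear among each drug's first m predictions."""
    covered = sum(len(set(predictions[d][:m]) & truths[d]) for d in truths)
    total = sum(len(t) for t in truths.values())
    return covered / total

# Hypothetical ranked candidates and true indications for three drugs.
predictions = {"a": [3, 1, 5], "b": [2, 4], "c": [2]}
truths = {"a": {3, 5}, "b": {4}, "c": {2}}

m = first_m(sum(len(t) for t in truths.values()), len(truths))
print(m, coverage(predictions, truths, m))  # 2 0.75

# Totals reported in the text: 77/59, 34/9 and 58/44 true cancers per drug.
print(first_m(77, 59), first_m(34, 9), first_m(58, 44))  # 2 4 2
```

The last line reproduces the "first 2", "first 4" and "first 2" order predictions considered for the training, validation test and independent test datasets.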

Results and Discussion
As described in Section "Materials", the benchmark dataset $S$ was divided into a training dataset $S_{tr}$ and a validation test dataset $S_{te}$, which contained 59 and 9 drugs, respectively. In addition, an independent test dataset $S_{ite}$ containing 44 drugs was constructed to test the generalization of the method. The prediction method introduced in Section "The method based on chemical-chemical interactions" was used to make predictions. The detailed results are given as follows.

Performance of the Method Based on Chemical-chemical Interactions on the Training Dataset
For the 59 drugs in the training dataset $S_{tr}$, the predictor was evaluated by Jackknife test. Listed in column 2 of Table 2 are the 8 prediction accuracies calculated by Eq. 8, from which we can see that the 1st order prediction accuracy was 55.93%, while the 2nd order prediction accuracy was 22.73%. It is also observed from column 2 of Table 2 that the prediction accuracies generally followed a descending trend with the increase of the order number, indicating that the proposed method arranged the candidate cancers in the training dataset quite well. In detail, for each order prediction, we calculated the accuracies of each kind of cancer according to Eq. 9, which are listed in rows 2-9 of Table 3. It can be seen that most of the 0.00% accuracies occurred when the prediction order was high, indicating that each kind of cancer was better predicted at lower order numbers. The average number of cancers that drugs in $S_{tr}$ can treat was 1.31 (77/59), calculated by Eq. 11. It means that the average success rate would be only 16.38% if one made predictions by random guesses, i.e. randomly assigned a cancer indication to each sample, which is much lower than the 1st order prediction accuracy obtained by our method. Because the average number of cancers a drug can treat is 1.31, the first 2 order predictions of each sample in $S_{tr}$ were taken to calculate the proportion of true cancers that samples in $S_{tr}$ can treat covered by these predictions according to Eq. 10, obtaining a ratio of 61.04%.

Performance of the Method Based on Chemical-chemical Interactions on the Validation Test Dataset
For the 9 drugs in the validation test dataset $S_{te}$, the candidate cancers were predicted by the method described in Section "The method based on chemical-chemical interactions" based on the information of the drugs in $S_{tr}$. The 8 prediction accuracies calculated by Eq. 8 are listed in column 3 of Table 2. It can be seen that the 1st order prediction accuracy was 55.56%, while the 2nd order one was 66.67%. It is also observed from Table 2 that the prediction accuracies on this dataset were generally higher than those on the training dataset, because all drugs in $S_{te}$ can treat two or more kinds of cancers, while most drugs in $S_{tr}$ can only treat one kind of cancer. Similarly, we calculated the accuracies of each kind of cancer for the 1st, 2nd, …, 8th order predictions by Eq. 9; they are listed in rows 10-17 of Table 3. The average number of cancers that drugs in $S_{te}$ can treat was 3.78 (34/9), indicating that if one made predictions by random guesses, the average success rate would be 47.22%, which is lower than the 1st and 2nd order accuracies listed in column 3 of Table 2. This suggests that the performance of the method on the validation test dataset is fairly good. Since the average number of cancers that drugs in $S_{te}$ can treat was 3.78, the first 4 order predictions of each sample in $S_{te}$ were considered. According to Eq. 10, 61.76% of true cancers were correctly predicted by the first 4 order predictions.

Performance of the Method Based on Chemical-chemical Interactions on the Independent Test Dataset
The candidate cancers of the 44 drugs in the independent test dataset $S_{ite}$ were also predicted by our predictor based on the drug information in $S_{tr}$. The 8 prediction accuracies obtained are listed in column 4 of Table 2, from which we can see that the 1st order prediction accuracy was 59.09%, while the 2nd order prediction accuracy was 29.55%. To better evaluate the method, the prediction accuracies on each kind of cancer for the 8 order predictions were calculated by Eq. 9 and are listed in rows 18-25 of Table 3. The average number of cancers that drugs in $S_{ite}$ can treat was 1.32 (58/44), suggesting that if one made predictions by random guesses, the average success rate would be 16.5%, which is much lower than the 1st order prediction accuracy obtained by our method. Because the average number of drug indications was 1.32, the first 2 order predictions of each sample in $S_{ite}$ were considered. According to Eq. 10, 67.24% of true cancers were correctly predicted by the first 2 order predictions.
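The random-guess baselines quoted in the three subsections above follow from a simple ratio: a single random guess among 8 cancers hits a true indication of a drug with $n_i$ indications with probability $n_i/8$, so the expected success rate is the average number of indications divided by 8. The sketch below reproduces the arithmetic from the totals given in the text; the function name and the `round_mean` switch are illustrative assumptions.

```python
N_CANCERS = 8

# (total number of true indications, number of drugs) as reported in the text.
datasets = {"S_tr": (77, 59), "S_te": (34, 9), "S_ite": (58, 44)}

def random_hit_rate(total_indications, n_drugs, round_mean=False):
    """Expected success rate of one uniform random guess per drug."""
    mean = total_indications / n_drugs
    if round_mean:
        mean = round(mean, 2)   # use the two-decimal average, as printed in the text
    return mean / N_CANCERS

for name, (labels, drugs) in datasets.items():
    exact = random_hit_rate(labels, drugs)
    rounded = random_hit_rate(labels, drugs, round_mean=True)
    print(f"{name}: exact {exact:.2%}, from rounded mean {rounded:.2%}")
```

The quoted 16.38% and 16.5% correspond to dividing the rounded averages 1.31 and 1.32 by 8, whereas 47.22% corresponds to the unrounded 34/9; the differences are a few hundredths of a percentage point and do not affect the comparison.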

Comparison with other Methods
To demonstrate the effectiveness of our method for the prediction of cancer drug indications, some other methods were built for comparison.
The method based on molecular descriptors described in Section "The method based on molecular descriptors" was conducted on $S_{tr}$ with its performance evaluated by Jackknife test. The 8 prediction accuracies calculated by Eq. 8 are listed in column 2 of Table 4, from which we can see that the 1st order prediction accuracy was 41.38%. It is much lower than the 1st order prediction accuracy of 55.93% obtained by the method based on chemical-chemical interactions. Also, for drugs in $S_{te}$ and $S_{ite}$, their cancer indications were predicted by molecular descriptors based on $S_{tr}$. The prediction accuracies are listed in columns 3 and 4 of Table 4. In detail, the 1st order prediction accuracies on $S_{te}$ and $S_{ite}$ were 55.56% and 44.19%, respectively. Compared with the prediction accuracies of 55.56% on $S_{te}$ and 59.09% on $S_{ite}$ using chemical interactions, the two methods performed at the same level on $S_{te}$, while chemical interactions performed much better than molecular descriptors on $S_{ite}$. In addition, we considered the first 2-order, 4-order and 2-order predictions on $S_{tr}$, $S_{te}$, and $S_{ite}$, respectively, according to the average number of cancers that drugs in these datasets can treat. The proportions of true cancers that samples in $S_{tr}$, $S_{te}$, and $S_{ite}$ can treat covered by these predictions were 51.39%, 58.82% and 49.12%, respectively, which were all lower than the corresponding proportions of 61.04%, 61.76% and 67.24% obtained by the method based on chemical-chemical interactions. Therefore, the method based on chemical interactions was superior to the method based on molecular descriptors. As described in the above three sections, the performance of our method was much better than that of random guesses, which randomly assigned a cancer indication to a query drug. Here, another random guess method was applied to evaluate our method from a different aspect.
For any query drug $d_q$, we randomly selected a drug compound in the training set, say $d$, and assigned the true cancers that $d$ can treat to $d_q$, i.e. the predicted cancers of $d_q$ were the same as the true cancers that $d$ can treat. Since there is no order information in the predicted candidate cancers for each sample, the measures provided in Section "Validation and evaluation" cannot evaluate the performance of this method. Thus, Recall and Precision [30,31] were employed to evaluate its performance, which can be computed by

$$\text{Recall} = \frac{\sum_{i=1}^{N} TP_i}{\sum_{i=1}^{N} R_i} \times 100\%, \qquad \text{Precision} = \frac{\sum_{i=1}^{N} TP_i}{\sum_{i=1}^{N} P_i} \times 100\%$$

where $TP_i$ is the number of correctly predicted cancers for the i-th drug compound, $R_i$ represents the number of cancers which the i-th drug compound can treat, $P_i$ represents the number of predicted cancers for the i-th drug compound, and $N$ is the total number of tested samples. The random guess method described in the above paragraph was conducted on $S_{tr}$ with its performance evaluated by Jackknife test. The Precision and Recall were 15.29% and 16.88%, respectively. For the predicted results on $S_{tr}$ by chemical-chemical interactions, the 1st order prediction of each sample was picked, obtaining a Precision of 55.93% and a Recall of 42.86%, which were much higher than those of the random guess method.
It is easy to see that our method depends deeply on the confidence scores of chemical-chemical interactions. To test the importance of these scores, we randomly exchanged the confidence scores of some interactions. Based on the random permutations, the data were evaluated by Jackknife test on the training dataset $S_{tr}$. The 1st order prediction accuracy was 23.73%, while the prediction accuracies of the 2nd, 3rd, …, 8th order predictions were 18.64%, 11.86%, 18.64%, 20.34%, 15.25%, 13.56%, and 8.47%, respectively. It is observed that the 1st order prediction accuracy obtained by random permutation was much lower than the 55.93% obtained by chemical interactions. Furthermore, the 8 prediction accuracies did not follow a descending trend with the increase of the order number, indicating that the candidate cancers were not arranged well. This implies that confidence scores are very important to the predictions.

Discussion
Twenty-six 1st order predictions were 'wrong' in the training dataset, that is, the predicted cancer indications of these drugs were not recorded in KEGG. These 26 drugs and their 1st order predictions are available in Table S3. However, some references reported that 23 of these 26 drugs were actually effective against their 'wrong' indications, and it was the same with 3 of the 4 drugs in the validation test dataset (see Table S3 for the 4 drugs and their 1st order predictions) and 13 of the 18 drugs in the independent test dataset (see Table S3 for the 18 drugs and their 1st order predictions). Thus, we hope that our prediction model can provide some information for drug repositioning. In the following paragraphs, we cite some references to support our predicted results.

Twenty-three Wrong Predicted Pairs of Drug and Indication in the Training Dataset
Cisplatin-Cancers of haematopoietic and lymphoid tissues. Cisplatin (KEGG ID: D00275), the "penicillin of cancer drugs", is widely prescribed for many cancer treatments, such as testicular, ovarian, bladder, lung, stomach cancers, and lymphoma [32,33,34]. Prasad et al. investigated the effect of cisplatin on Dalton's lymphoma, and concluded that cisplatin can induce complete regression of ascites Dalton's lymphoma in mice [35].

Ifosfamide-Cancers of haematopoietic and lymphoid tissues. Ifosfamide (D00343) can be used to treat germ cell testicular cancer, cervical cancer, small cell lung cancer, non-Hodgkin's lymphoma, and so on [36]. Extranodal natural killer/T-cell lymphoma, nasal type (ENKL) is an Epstein-Barr virus-associated lymphoid malignancy, and patients with stage IV, relapsed or refractory ENKL have dismal prognoses. Yamaguchi et al. explored a new regimen, SMILE, including the steroid dexamethasone, methotrexate, ifosfamide, L-asparaginase, and etoposide, and concluded that SMILE was effective for this kind of disease [37,38].

Lomustine-Cancers of haematopoietic and lymphoid tissues. Lomustine (D00363) is a component of the combination chemotherapy for treating primary and metastatic brain tumors, and is also used as a secondary therapy for refractory or relapsed Hodgkin's disease [39]. Moreover, previous studies reported that lomustine can be considered for the treatment of lymphoma in dogs [40,41,42,43], although it induced common but not life-threatening toxicity [44].
Mitotane-Cancers of the urinary system and male genital organs. Mitotane (D00420) is the first-line drug for metastatic adrenocortical carcinoma [45,46,47], and is also used for adjuvant therapy after removal of the primary tumor [48]. However, mitotane treatment can induce some side effects, such as adrenal insufficiency and male hypogonadism [49].
Temozolomide-Cancers of haematopoietic and lymphoid tissues. Temozolomide (D06067) is an oral alkylating agent used for the treatment of anaplastic astrocytoma and glioblastoma multiforme [53]. Reni et al. reported that temozolomide was effective for immunocompetent patients with recurrent primary brain lymphoma, and its toxicity was negligible [54].
Thiotepa-Cancers of haematopoietic and lymphoid tissues. Thiotepa (D00583) is an alkylating agent used to treat breast, ovarian, and bladder cancer [55]. A regimen of reduced-intensity conditioning with thiotepa, fludarabine, and melphalan produced remissions and a limited transplant mortality rate in most multiple myeloma patients [56]. Moreover, Kolb et al. studied a phase II nonrandomized single-arm trial using the TVTG regimen (topotecan, vinorelbine, thiotepa, dexamethasone, and gemcitabine) for relapsed or refractory leukemia, and reported a 47% response rate and acceptable toxicities [57].
Floxuridine-Cancers of the digestive system. Floxuridine (D04197) is used to treat hepatic metastases of gastrointestinal adenocarcinomas, and also used for palliation of cancers in the liver and gastrointestinal tract [58]. Moreover, hepatic arterial infusion (HAI) can significantly enhance the antitumor activity of floxuridine against colorectal liver metastases, as compared with systemic infusion [59].
Carboplatin-Cancers of haematopoietic and lymphoid tissues. Carboplatin (D01363) has fewer side effects in clinical treatment than its parent compound cisplatin, and is mainly used to treat ovarian, lung, head cancers, and so on [34]. Through a phase II trial, Gopal et al. reported that GCD (gemcitabine, carboplatin, dexamethasone, and rituximab) was a safe and effective outpatient salvage regimen for relapsed lymphoma [60]. Moskowitz et al. also reported that the ICE regimen (ifosfamide, carboplatin, and etoposide) was effective for patients with non-Hodgkin's lymphoma [61].
Epirubicin-Cancers of haematopoietic and lymphoid tissues. Epirubicin (D02214) is a component of adjuvant therapy in patients after resection of the primary breast cancer [62]. When used to treat chronic lymphocytic leukaemia, the combination of fludarabine and epirubicin achieved a higher response rate and a more rapid response, as compared with fludarabine alone [63].

Gemcitabine-Cancers of haematopoietic and lymphoid tissues. Gemcitabine (D01155) is a nucleoside analog that can treat breast, non-small cell lung, and pancreatic cancer [64]. Moreover, a regimen including gemcitabine, carboplatin, dexamethasone, and rituximab was reported to be effective for relapsed lymphoma [60].
Vinorelbine-Cancers of the breast and female genital organs. Vinorelbine (D01935) is used to treat non-small cell lung cancer [65]. Aapro et al. explored the effects of vinorelbine on metastatic breast cancer (MBC), and concluded that oral vinorelbine was highly effective and well tolerated in patients with MBC, whether used as a single agent or in combination with other agents [66]. Moreover, vinorelbine was also considered a promising alternative for older patients with advanced breast cancers because of its clinical activity and low side effects [67].
Irinotecan-Cancers of the breast and female genital organs. Irinotecan (D01061) is used to treat metastatic colorectal cancer and extensive small cell lung cancer [68]. Previous studies reported that irinotecan was effective for the refractory metastatic breast cancer after anthracyclines or taxanes treatment [69,70]. Moreover, the combination of irinotecan and docetaxel also achieved a high response rate in pre-treated advanced breast cancer patients [71].
Capecitabine-Cancers of the breast and female genital organs. Capecitabine (D01223) is an oral agent used for the treatment of metastatic breast cancers, and toxicities are generally manageable [72,73,74].
Gefitinib-Cancers of the breast and female genital organs. Gefitinib (D01977) is used for the continued treatment of patients with locally advanced or metastatic non-small cell lung cancer after failure of either platinum-based or docetaxel chemotherapies [75]. Moreover, gefitinib is the first selective inhibitor of the epidermal growth factor receptor (EGFR) tyrosine kinase, which controls cell proliferation by activating the Ras signal transduction cascade [75]. Thus, gefitinib may be a promising agent used for the treatment of metaplastic breast carcinoma with frequent expresses of EGFR [76].
Sorafenib-Cancers of the lung and pleura. Sorafenib (D06272) is a multi-kinase inhibitor that targets the Raf/MEK/ERK pathway, and is approved for the treatment of advanced renal cell carcinoma and advanced hepatocellular carcinoma [77]. Blumenschein et al. reported that continuous treatment with sorafenib 400 mg twice daily helped stabilize disease in patients with advanced non-small cell lung cancer, which is associated with the Raf/MEK/ERK pathway [78].
Paclitaxel-Cancers of the lung and pleura. Paclitaxel (D05333) is used for the treatment of Kaposi's sarcoma, lung cancer, ovarian cancer, and breast cancer [79]. Hensing et al. explored the effects of carboplatin and paclitaxel (C/P) on elderly patients with advanced non-small cell lung cancer, as compared with younger patients. The study indicated that the survival rates and quality of life of the elderly and younger groups did not differ, so C/P should be a reasonable regimen for elderly patients with this kind of cancer [80].
Dacarbazine-Cancers of the breast and female genital organs. Dacarbazine (D00288) is used to treat metastatic malignant melanoma and Hodgkin's disease [81]. Moreover, the regimen including cisplatin, adriamycin, and dacarbazine was reported to be effective for patients with metastatic uterine and ovarian mixed mesodermal sarcomas [82].
Sunitinib-Cancers of the breast and female genital organs. Sunitinib (D06402) is an approved drug for the treatment of renal cell carcinoma and imatinib-resistant gastrointestinal stromal tumor [83]. Moreover, a previous study reported that single-agent sunitinib achieved an objective response rate of 11% in MBC [84], and the combination of sunitinib and paclitaxel was also well tolerated in patients with locally advanced or metastatic breast cancer [85].
Leucovorin-Cancers of the breast and female genital organs. Leucovorin (D01211) is used to treat osteosarcoma after high-dose methotrexate therapy [88]. Moreover, a phase II study showed that the regimen of weekly mitoxantrone, 5-fluorouracil, and leucovorin (MFL) was well tolerated and moderately effective in treating MBC [89]. A phase III trial of eniluracil + 5-fluorouracil + leucovorin in MBC is also ongoing [90].
Goserelin-Cancers of the breast and female genital organs. Goserelin (D00573) is a luteinizing hormone blocker, and reduces the oestrogen level. Thus, goserelin can improve the long-term survival of premenopausal women with early breast cancer [91].
Fluorouracil-Cancers of haematopoietic and lymphoid tissues. Fluorouracil (5-FU, D00584) is used to treat multiple actinic and solar keratoses [92]. Takeno et al. reported that a patient with advanced esophageal cancer accompanied by multiple lymph node metastases was successfully treated with the combination of docetaxel, cisplatin, and fluorouracil [93].

Three Wrong Predicted Pairs of Drug and Indication in the Validation Test Dataset
Dactinomycin-Cancers of haematopoietic and lymphoid tissues. Dactinomycin (D00214) is an antineoplastic agent that can treat Wilms' tumor and rhabdomyosarcoma [94]. However, it is reasonable to consider this compound for the treatment of cancers of lymphoid tissues because it induced tumor regression in childhood lymphoma [95].

Mitomycin-Cancers of haematopoietic and lymphoid tissues. Mitomycin (D00208) is a chemotherapy drug for treating cancers of the lip, oral cavity, digestive organs, and so on [96]. Mitomycin was used to treat a case of localized conjunctival mucosa-associated lymphoid tissue lymphoma, with minimal and controllable local side effects [97]. Moreover, mitomycin was about 5 times more potent than porfiromycin (methyl mitomycin) in inhibiting tumor growth in the lymphoma L1210 [98], but M-83 (7-N-(p-hydroxyphenyl)mitomycin) showed significantly higher therapeutic activity than mitomycin in lymphoma EL4 [99].
Etoposide-Cancers of the breast and female genital organs. Etoposide (D04107) is used to treat refractory testicular tumors, small cell lung cancer, lymphoma, non-lymphocytic leukemia, glioblastoma multiforme, and so on [100]. Poplin et al. reported that oral etoposide had modest activity in chemonaive patients with metastatic endometrial cancer, but the minimal toxicity of this drug made it suitable for combination chemotherapy [101]. Moreover, etoposide was reported to be one of the most effective agents for trophoblastic disease [102], and the combination of etoposide, ifosfamide/mesna, and cisplatin (VIP) appeared to be active in advanced cervical cancer [103].

Thirteen Wrong Predicted Pairs of Drug and Indication in the Independent Test Dataset
Diethylstilbestrol-Cancers of the breast and female genital organs. Diethylstilbestrol (DrugBank ID: DB00255) is used for the treatment of prostate cancer [104]. Moreover, Peethambaram et al. reported that diethylstilbestrol was more effective than tamoxifen in postmenopausal women with MBC, but this treatment was usually associated with toxicity such as nausea, edema, vaginal bleeding, and cardiac problems [105].
Bleomycin-Cancers of the nervous system. Bleomycin (DB00290) is a drug for the palliative treatment of malignant neoplasms, such as lung cancers and lymphomas [106]. Moreover, Takeuchi et al. reported that bleomycin was effective for patients with gliomas, with a response rate of more than 50% [107]. In addition, electrochemotherapy enhanced bleomycin uptake and achieved 69% complete elimination of glial cell derived tumor cells [108].
Bexarotene-Cancers of the lung and pleura. Bexarotene (DB00307) is used orally to treat skin manifestations of cutaneous T-cell lymphoma in patients after at least one prior systemic therapy [109]. Moreover, bexarotene was effective for preventing the growth and progression of lung tumor in mice [110], and the combination of bexarotene+paclitaxel or bexarotene+vinorelbine had significantly greater antitumor effects than the single agent [111].
Dexrazoxane-Cancers of haematopoietic and lymphoid tissues. Dexrazoxane (DB00380) can reduce the incidence and severity of cardiomyopathy associated with doxorubicin administration in women with MBC [112]. Moreover, dexrazoxane was used as a cardioprotective agent that can attenuate the QT and QTc dispersion associated with epirubicin-based chemotherapy in patients with aggressive non-Hodgkin lymphoma [113], and prevent or reduce cardiac injury associated with doxorubicin administration for childhood acute lymphoblastic leukemia [114,115].
Zoledronate-Cancers of the breast and female genital organs. Zoledronate (DB00399) is used, in combination with standard antitumor therapy, for the treatment of patients with multiple myeloma and bone metastases from solid tumors [119]. Moreover, Steinman et al. reported that zoledronate increased disease-free survival in postmenopausal and in premenopausal, hormone-suppressed breast cancer patients, but had no antitumor effect in premenopausal patients without ovarian suppression [120].
Pemetrexed-Cancers of the digestive system. Pemetrexed (DB00642) is used as a single agent to treat locally advanced or metastatic NSCLC after prior chemotherapy, and is also used in combination with cisplatin for the treatment of malignant pleural mesothelioma in adults [121]. A phase II study reported that pemetrexed disodium was effective for patients with advanced gastric cancer, and that folic acid supplementation decreased toxicity with no compromise in efficacy [122].
Fluoxymesterone-Cancers of haematopoietic and lymphoid tissues. Fluoxymesterone (DB01185) is used for the palliative treatment of androgen-responsive recurrent mammary cancer in women who are more than one year but less than five years postmenopausal [123]. Moreover, Bai et al. reported that fluoxymesterone stimulated the proliferation and differentiation of normal erythropoietic burst-forming units that are affected by inhibitory factors produced by leukemic cells [124].
Genistein-Cancers of the lung and pleura. Genistein (DB01645) is an experimental agent for the treatment of prostate cancer [125]. Moreover, Lian et al. reported that genistein may be a promising agent to treat NSCLC because it induced apoptosis of NSCLC cells by a p53-independent pathway [126].
Vorinostat-Cancers of the urinary system and male genital organs. Vorinostat (DB02546) is used to treat skin manifestations of cutaneous T-cell lymphoma in patients with progressive, persistent or recurrent disease on or after two systemic therapies [127]. Pratap et al. reported that vorinostat inhibited tumor growth and associated osteolysis in prostate cancer cells, but increased normal bone loss [128].
Ixabepilone-Cancers of the digestive system. Ixabepilone (DB04845) is investigated for the treatment of breast cancer, head and neck cancer, lung cancer, and so on [129]. Moreover, ixabepilone was reported to be active against advanced or metastatic gastric cancers [130,131].
Trabectedin-Cancers of the lung and pleura. Trabectedin (DB05109) is used to treat soft tissue sarcoma and ovarian cancer, and is also being investigated for the treatment of gastric cancer, and so on [132]. Moreover, Massuti et al. reported that trabectedin had modest activity in NSCLC patients pretreated with platinum [133].

Conclusions
In this study, an order-prediction model for drugs and their indications was built using the chemical-chemical interaction information extracted from STITCH. The performance of the model indicated that it is feasible for drug-indication prediction, i.e., interactive chemicals are more likely to treat the same cancers than non-interactive ones. Moreover, literature evidence suggested that most of the 'wrong' predictions might actually be right, which may help reposition drugs to new indications according to the prediction results.
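The ranking idea summarized above can be illustrated with a minimal Python sketch. It assumes that the score of an indication is the sum of the confidence scores of the interactions between the query drug and the drugs already known to treat that indication; this aggregation rule is an assumption made for illustration, and the exact formula is the one given in the Methods. The function name, drug identifiers, indication labels, and confidence values below are all hypothetical toy data, not values taken from STITCH, KEGG, or DrugBank.

```python
def rank_indications(query, interactions, known_indications, all_indications):
    """Order indications from most to least likely for `query`.

    interactions: dict mapping an unordered drug pair (a, b) to a
        confidence score (hypothetical values in this sketch).
    known_indications: dict mapping a drug to the set of indications
        it is known to treat.
    Assumption: an indication's score is the SUM of the confidence
    scores of the query's interactions with drugs that treat it.
    """
    scores = {ind: 0.0 for ind in all_indications}
    for (a, b), confidence in interactions.items():
        if query not in (a, b):
            continue  # this interaction does not involve the query drug
        neighbor = b if a == query else a
        if neighbor == query:
            continue  # ignore self-interactions
        for ind in known_indications.get(neighbor, ()):
            scores[ind] += confidence
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda ind: (-scores[ind], ind))


# Toy example with made-up drugs and scores.
toy_interactions = {
    ("query_drug", "drug_a"): 900,
    ("query_drug", "drug_b"): 400,
    ("drug_c", "query_drug"): 300,
    ("drug_a", "drug_b"): 800,  # does not involve the query, so it is ignored
}
toy_known = {
    "drug_a": {"breast"},
    "drug_b": {"breast", "lung"},
    "drug_c": {"lung"},
}
toy_indications = ["breast", "lung", "lymphoid"]

ranking = rank_indications(
    "query_drug", toy_interactions, toy_known, toy_indications
)
print(ranking)
```

In this toy case the first entry of `ranking` plays the role of the 1st order prediction; a drug whose known indication appears lower in the list would be counted as a 'wrong' 1st order prediction of the kind discussed in the sections above.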

Supporting Information
Table S1 List of 68 drugs retrieved from KEGG and cancers they can treat. (PDF)