Functional Module Connectivity Map (FMCM): A Framework for Searching Repurposed Drug Compounds for Systems Treatment of Cancer and an Application to Colorectal Adenocarcinoma

Drug repurposing has become an increasingly attractive approach to drug development owing to the ever-growing cost of new drug discovery and frequent withdrawal of successful drugs caused by side effect issues. Here, we devised Functional Module Connectivity Map (FMCM) for the discovery of repurposed drug compounds for systems treatment of complex diseases, and applied it to colorectal adenocarcinoma. FMCM used multiple functional gene modules to query the Connectivity Map (CMap). The functional modules were built around hub genes identified, through a gene selection by trend-of-disease-progression (GSToP) procedure, from condition-specific gene-gene interaction networks constructed from sets of cohort gene expression microarrays. The candidate drug compounds were restricted to drugs exhibiting predicted minimal intracellular harmful side effects. We tested FMCM against the common practice of selecting drugs using a genomic signature represented by a single set of individual genes to query CMap (IGCM), and found FMCM to have higher robustness, accuracy, specificity, and reproducibility in identifying known anti-cancer agents. Among the 46 drug candidates selected by FMCM for colorectal adenocarcinoma treatment, 65% had literature support for association with anti-cancer activities, and 60% of the drugs predicted to have harmful effects on cancer had been reported to be associated with carcinogens/immune suppressors. Compounds were formed from the selected drug candidates where in each compound the component drugs collectively were beneficial to all the functional modules while no single component drug was harmful to any of the modules. In cell viability tests, we identified four candidate drugs: GW-8510, etacrynic acid, ginkgolide A, and 6-azathymine, as having high inhibitory activities against cancer cells. Through microarray experiments we confirmed the novel functional links predicted for three candidate drugs: phenoxybenzamine (broad effects), GW-8510 (cell cycle), and imipenem (immune system). We believe FMCM can be usefully applied to repurposed drug discovery for systems treatment of other types of cancer and other complex diseases.


Introduction
An important goal for biomedical research is to understand the underling genetic mechanisms of human diseases and discover therapeutic drugs for the diseases. Drug discovery is expensive; the average research and development (R&D) cost in the past 15 years for developing a new drug exceeds 1 billion US dollars [1]. Anticancer agents are especially costly [2]. The standard R&D procedure includes compound identification, toxicity tests in cell and animal models, safety evaluation on early clinical trials, and efficacy in late phase trials. The very high failure rate has led to a crisis known as innovation gap in new drug discovery [3]. The crisis is further compounded by the withdrawal of many previously thought successful drugs, mostly through issues related to harmful side effects [4][5][6]. Such issues may be a corollary of the prevailing method for new drug discovery, which is to find specific biomolecules as targets, typically membrane receptors [7]. However, a biological target may regulate more than one biological pathway, only one of which may be disease related. If this is the case, then altering the function of a biological target by a drug can lead to unintended results of disruption of healthy pathways [8].
The strategy of finding specific biological targets for drug treatment may have also contributed to the disappointing progress made in the last 40 years in reducing the overall mortality rates for most types of cancer [9]. This is partly because cancer is a disease involving the dysfunction of multiple parallel pathways controlling many fundamental processes [10]. Cancer cells accumulate multiple genetic mutations that equip it with a of myriad of survival and death-avoiding capabilities: for inducing angiogenesis; to maintain proliferative signalling; to escape suicidal apoptotic programs; for enabling replicative immortality; and to activate invasion and metastasis [11]. Evidences are emerging that the pathology of cancer is more a consequence of small abnormalities on many genes, than a major abnormality on a single gene [12,13], and that drug compounds acting on multiple targets may be a more effective treatment strategy than a single drug on a single target [14]. In short, cancer is a systems disease and ought to be dealt with by a systems treatment [15].
Here, we present a computational drug-screening procedure that addresses the issues raised above. Our program has two main aims: to surmount the innovation gap through drug repurposing, and to find drug compounds for a systems treatment of cancer. Drug repurposing is the search for novel indications for already approved drugs [1,16]. Because an approved dug has already been optimized for safety and efficacy for its originally designed indication, its route for approval for a novel indication may be significantly shorter and likely far less costly than that for a new drug.
Recently the computational screening drugs for repurposing has been greatly facilitated by the advent of the Connectivity Map (CMap), a comprehensive and continuously updated database of the genomic profiles of many existing drugs [17]. CMap provides a platform for utilizing a pattern-matching strategy to determine the similarity, or the opposite, in genomic signatures among diseases, functional gene sets, and drugs. It has been employed in many studies for discovering repurposing drugs against common diseases, including diabetes and Alzheimer's disease [17], and for treating solid tumours, including those associated with colon cancer [18], breast cancer [19], and lung adenocarcinoma [20].
The basic concept of CMap-based repurposing drugs discovery studies is the identification of disease associated genomic signatures that reversely correlate with perturbation in genomic signature associated with the administration of molecules or drugs [17,21,22]. In these studies, the essence of the protocol -the individual-gene CMap approach (IGMP) -for identifying drugs for treating a specific disease is straightforward: find a set of differentially expression genes (DEGs) obtained by, say, comparing two sets -controls and patients -of gene expression microarrays, score the match between the DEG set and genomic profiles of drugs given by CMap, and rank the drugs by score. Candidate drugs are those with the highest scores. Because it draws on the entire genomic information of the patients and of the drug, one may view this approach as an attempt at systems treatment. However, it suffers from being too crude an approach. In particular it makes no specific reference to any of the many altered states of biological functions associated with the disease. By not paying attention to individual biological functions, a ''best'' drug could very well be a compromise, chosen for having strong beneficial effects on a subset of functions at the expense of being harmful to some other functions.
Another study that utilizes variable gene signatures to screen repurposed drugs has successfully identified many heterogeneous Food and Drug Administration (FDA) approved drug candidates for breast, myelogenous leukemia, and prostate cancer [23]. This method typically yields a long list of heterogeneous drug candidates without providing details that may help in differentiating the drugs, details such as how a drug differently impact the multi-functionalities of (a specific) cancer. Other more sophisticated methods based on computational network models have been developed to identify novel therapeutic targets for the purpose of treating regulatory cellular networks [24,25]. The effectiveness of these approaches, which aim to elevate the relative activity of certain cell regulatory networks, and base their predictions on elaborate models optimally tuned to fit existing temporal and spatial data, may be restricted by the limited existing knowledge on networks and parameters describing protein activities.
Here, we present a novel analytical framework, called Functional Module Connectivity Map (FMCM), for the discovery of drug compounds for systems cancer treatment. We constructed condition-specific function-function networks (FFNs) and applied a gene-selection-by-trend-of-progression procedure (GSToP) [26] to identify complexly connected and highly expressed hub genes in the FFNs. We then used functional modules constructed around the hub genes to query CMap for the discovery of ontologyspecific repurposing drugs, and further screened the drugs by requiring that they exhibit minimum intracellular harmful side effects. Relative to the standard IGCM protocol, FMCM was more robust in its drug selection and it more consistently predicted higher hit rates (,65%) on effective drugs against early tumorigenesis in colorectal cancer. When checked against known drug indications in Therapeutic Target Database (TTD), FMCM showed significantly higher accuracy and lower false positive rates on the discovery of the anti-cancer agents than IGCM, except for the immune system. Our viability tests on eight of the candidate drugs showed three, GW-8510, ginkgolide A, and 6-azathymine, represented high inhibitory activities against the survival of cancer cell lines with specific concentrations and administration durations. Follow-up microarray experiments confirmed that both the CMap and our datasets showed consistent results on three independent drugs -phenoxybenzamine (broad effects), GW-8510 (cell cycle), and imipenem (immune system). These results demonstrated the effectiveness of FMCM, and suggested its potential for formulating repurposed drug regimes with minimum harmful side effects in cancer patients.

Data sources
Gene expression data for 32 patients with sporadic colorectal polyps (adenoma) and corresponding adjacent normal mucosa from the same individuals were obtained from Gene Expression Omnibus (GEO) database (accession number: GSE8671) [27]. We extracted 30,047 protein entries and 39,194 protein-protein interactions (PPIs) from the Human Protein Reference Database (HPRD) [28] and used Gene Ontology (GO) [29] for functional information.

External database
We used the Connectivity Map database (CMap) build 02 [17], with 6,100 treatment expression profiles representing 1,309 drugs (and compounds), to compute enrichment scores (ES) of gene set against drugs.
For reference on drug indication we used L01 class, antineoplastic agents, Anatomical Therapeutic Chemical (ATC) Classification System, World Health Organization (WHO) (http://www. whocc.no/). We extracted information on known therapeutic protein targets, relevant diseases or cancers, and corresponding drugs (787 drugs; 60% of CMap datasets) from the Therapeutic Target Database (TTD: http://bidd.nus.edu.sg/group/ttd/) [30]. In addition, we queried key words on searching engines to define relative therapeutic drugs on cancer treatment.
We downloaded the annotated 4,884 gene-sets from the Molecular Signatures Database (MSigDB: http://www. broadinstitute.org/gsea/msigdb/index.jsp) [31]. The gene-sets are of four types: C2: curated gene-sets from known pathways, online databases, and knowledge of domain experts; C3: motif gene-sets based on conservative cis-regulatory motifs from human, mouse, rat, and dog genomes; C4: computational gene-sets determined by co-expression neighbourhoods centered on 380 cancer-related genes; C5: gene-ontology gene-sets collected from the same GO annotations of genes. Gene symbols in each gene-set were combined and converted into HG-U133A Affymetrix ID according to the updated annotation file at the website http:// www.affymetrix.com/estore/.
Gene selection by individual gene analysis (IGA) and Individual gene connectivity map (IGCM) Differentially expressed genes (DEGs) were selected using the Significance Analysis of Microarrays algorithm (SAM) [32]. Unless otherwise stated, threshold values for false discovery rate (FDR) ,0.05 and fold change (FC) .2 were used. Enrichment score (ES) matching gene set to drug was computed through CMap [17].

Beneficial and harmful drugs
Given a gene set, a drug was designated beneficial or harmful if the ES is ,20.5 or .0.3. For drugs to be designated beneficial a randomization p-value,0.005 was required, unless otherwise stated.

Construction of gene-gene interaction network (GGIN) and function-function interaction network (FFN)
For a given condition -control (Nor) or adenoma patient (Ade) -and a Pearson p-value (see below) threshold p 0 , we included a pair of genes in the GGIN if: (1) the p-value for the pair was not greater than p 0 ; (2) the protein pair encoded by the gene pair was linked in the PPI data. For a given set of n microarrays, a Pearson's correlation coefficient (PCC) between a pair of genes was calculated using the two sets of n intensities of the pair. Each PPC is assigned a Pearson p-value based on permutation tests and t-statistics. Genes in each type-specific GGIN were assigned to over-represented biological functions, called functional modules, through Gene Ontology term association [29]. Enrichment analyses based on conditional hypergeometric test [33] were made using the R package GOstats downloaded from the Bioconductor website. Each GGIN was reduced to functionfunction network (FFN) using functional modules as nodes.

GSToP and the functional module connectivity map (FMCM) framework
The FMCM framework for selecting therapeutic drug compound consisted of two segments, selection of functional modules of predicted cancer genes based on the GSToP procedure [26] (steps 1-5 below), and multiple queries, one for each functional module, of the CMap for drug identification (steps [6][7][8]. Steps in the selection procedure ( Figure 1 [17] to obtain for each function two lists respectively for predicted beneficial (ES ,20.5) and harmful (ES .0.3) drugs (see above for requirement on randomization p-value). (7) Function-drug association map (FDAM). Use the drug lists to construct a map with two kinds of nodes, function module and drug, and two kinds of function-drug links, beneficial and harmful. Include in FDAM only drugs that have at least one beneficial link. (8) Construct from FDAM all predicted drug compounds, where a compound is minimum set of purely beneficial drugs that covered all functions.

Accuracy, specificity, and reproducibility in performance tests
Positives NB and negatives NA were drugs predicted to be beneficial and harmful, respectively; true positives TP and false negatives FN were known anti-tumor agents predicted to be beneficial and harmful, respectively; true negatives is TN = NA-FN, and false positive is FP = NB-TP. Accuracy was defined as (TP+TN)/(NB+NA), and specificity as TN/(FP+TN).
For reproducibility a drug prediction procedure (FMCM or IGCM) was repeated 10 times, each time working on a set of 40 randomly chosen microarrays, 20 each from controls and patients, and reproducibility was measured over the 1069/2 = 45 pairs of results. For each pair reproducibility was (the size of) the intersection of the two sets of selected drugs divided by the geometric mean of the two sets.

Cell cultures and reagents
Human colon cancer cell lines (HCT116, RKO, SW403, and SW620), and breast cancer cell lines (MCF7) were obtained from ATCC (American Type Culture Collection, Manassas, VA) and maintained as suggested by ATCC. The growth media for all cell lines were supplemented with 10% fetal bovine serum (FBS), 50 units/ml of penicillin and streptomycin, and incubated at 37uC with 5% carbon dioxide. In experiments, cells were treated with ethanol, water or DMSO as corresponding vehicle control. Phenoxybenzamine, GW-8510, etacrynic acid, ginkgolide A, triflusal, imipenem, 6-azathymine were purchased from Santa cruz (CA). Phthalylsulfathiazole was purchased from Sigma (St. Louis, MO). Phenoxybenzamine, phthalylsulfathiazole, etacrynic acid, ginkgolide were dissolved in ethanol. Imipenem was dissolved in water. The remaining drugs were dissolved in Dimethyl sulfoxide (DMSO).

Cell Proliferation assay
Proliferation activities of five cell lines -colon cancer, HCT116, RKO, SW403, and SW620, and breast cancer, MCF7 -were monitored by Alamar Blue (Molecular Probes, Invitrogen Corporation), an oxidation-reduction reagent, and determined by measuring the reduction of resazurin (blue, non-fluorescent) to resorufin (red, highly fluorescent). Cells were seeded in 96-well cultured plates and, following the study design of the CMap [17], treated with single drugs with concentration of 0, 0.1, 1,10, 30 mM for 5 days, then assayed for proliferation activities. One-tenth volume of alamar blue reagent was added and plates were incubated at 37uC for 2-3 hours. Cell viability was determined by measuring fluorescence with excitation at 550 nm and emission at 590 nm on Synergy HT (BioTek Instruments, Winooski, VT). Cell survival was calculated as relative value of the difference between the reductions of Alamar Blue in treated versus controls.

RNA extraction and microarrays
Cells were seeded in 100 mm dishes and treated with drugs. After drug treatment for 6 hours, total cellular RNA was isolated using mirVana miRNA Isolation Kit (Ambion, Austin, TX) according to the manufacturer's instructions. 250 ng of total RNA was used for microarray experiments. Extracted RNA was labelled with GeneChipH 39 IVT Express Kit (Affymetrix, Santa Clara, CA, USA) and hybridized onto Affymetrix GeneChipH Prime View Human Gene Expression Array. This array contained approximately 530,000 probes covering more than 36,000 transcripts and variants. Raw images were analysed by Affymetrix GeneChipH Operating Software. We performed microarray analysis of the effect of imipenem and phenoxybenzamine (PB) (treated and non-treated) on HCT116 and MCF7, and GW-8510 on HCT116 under the U219 (primeview) platform, all in duplicates. Treatment dosages and duration times were the same as in [17].

Microarray experiments and analysis by IGA and gene-set approach (GSA)
Genome-wide gene expression profiles from drug-perturbed tumour cells evaluated by the Affymetrix GeneChipH Prime View platform were analyzed in R environment (version 2.15.1). Two cell lines, HCT116 and MCF7, were treated with three drugs, GW-8510, phenoxybenzamine (PB), and imipenem, with the same dosages (10 uM, 11.8 uM, 13.4 uM, respectively) and time (6 hours after overnight culture) as in [17]. The microarray profiles were compared with ten profiles from the CMap for MCF7 treated with the three drugs. Gene expression intensities were normalized by Robust Multi-array Average (RMA) [34]. In the IGA approach DEGs were identified by one-way ANOVA using the eBayes function in the limma package [35]. In a gene-set approach (GSA), given a list of ranked differential gene expressions, we used GSEA [31] to convert the 4,884 annotated gene sets in MSigDB [31] to a list of 4,884 ranked ESs, then applied one-way ANOVA to find differential gene sets. In IGA (or GSA) a gene (or a gene-set) with false discovery rate (FDR) less than 0.01 was considered significant and selected for two-way hierarchical clustering of the microarray set. GO terms for overrepresented gene (or gene-set) clusters in the IGA (or GSA) heatmap were determined using DAVID [36].

Function-function networks
High quality of microarray data was indicated by the clean separation of control (from 32 normal tissues) and sample (32 patients) in Principle Component Analysis ( Figure S1A). The 2,164 DEGs selected by SAM (with thresholds FDR ,0.01 and FC .2) correctly classified the controls and sample in hierarchical clustering ( Figure S1B). Clustering results were not sensitive to moderate variations in threshold values (not shown). Gene-gene interaction networks (GGINs) were constructed with a threshold of Pearson p,0.001 from the control and adenoma cohort microarray data ( Figure S2). The adenoma GGIN has 6.4% more genes (1,792 vs. 1,684) and 32% more links (2,656 vs. 2,017) than the control GGIN. The difference between the two cases became evident when the GGINs were reduced to function-function networks (FFNs) having functional modules as nodes ( Figure 2, Table S1). Cell cycle, DNA replication, and DNA repair functional modules were much larger in the adenoma FFN and exhibited much high levels of intra-function activity. There were also more inter-module activities in adenoma than in control. In a noted exception, the inter-module activity between immune system process and cell proliferation was weaker in adenoma than in control.

Repurposed therapeutic drugs selected by IGCM
CMap gives enrichment scores (ES) to gene lists not longer than 1,000 entries. We complied with this constraint (i.e., restricting the size of the DEGs) by requiring FDR ,0.01. Five DEG lists with FC thresholds of 3.0 to 5.0 with 0.5 intervals were generated and their ES's for the 1,308 drugs (or small compounds) were obtained by querying CMap. The list of beneficial (i.e., anti-adenoma) drugs was sensitive to (the threshold value of) FC, with the size of the list decreasing with increasing FC ( Figure 3A). The number of beneficial validated drugs decreased with increasing FC ( Figure  S3). According to TTD, many known therapeutic anti-cancer drugs, such as chrysin (pink, TTD id: DNC004715), GW-8510 (red, TTD id: DNC004631), daunorubicin (cyan, TTD id: DAP000788), apigenin (light purple, TTD id: DNC004714), resveratrol (yellow green, TTD id: DNC001205), coincidentally all changed from beneficial at FC = 3 to harmful at FC = 3.5 ( Figure 3A). At FC = 5.0, the most stringent threshold that we used mostly, AG-012559 was the only beneficial drug under permutation p,0.005 ( Figure S3).

Repurposed therapeutic drugs selected by FMCM
In the FMCM program, genes selected in each functional module (Table S2) were used separately to query CMap, yielding separate functional specific drug lists. Each functional module was the union of the control and adenoma functional modules given by the respective FFNs, filtered by the GSToP procedure (see Methods). In FMCM the gene size of the module had a much stronger dependence on the value of FC than IGCM ( Figure 3). In IGCM, size of DEG dropped from just above 600 to 200 when the value of FC threshold was raised from 3 to 5. In FMCM module gene size dropped from about 600 to about 30 as (the threshold) FC was raised from 1 to 3, and became too small for CMap application when FC was raised above 3. Even so, within the respective range of FC used, FMCM provided a much more stable and robust environment for drug screening by CMap than IGCM; in FMCM the character of selected drugs (i.e., beneficial or harmful) changed very little (21 out of 256, Figure 3B) while in IGCM changed occurred to 54.5% of selected drugs (12 out of 22, Figure 3A). Function-drug association map (FDAM) and therapeutic drug compounds Purely beneficial and harmful drugs (see Material and Methods) that were beneficial to at least one FM were identified in the FMCM program (FC .2) and used to construct FDAM. The 46 drugs in the FDAM (Table 1) were much more numerous than the corresponding list found in traditional IGCM approach (which had 22 drugs). Thirty of the 46, or 65%, have either been studied individually as anti-tumour agents or have been certified to have preventive effects against a broad range of cancers (Table 1 and  Table S3). The five drugs, thapsigargin, pyrvinium, trifluoperazine, ellipticine, and 0297417-0002B, which in our FDAM were harmful to at least one module, have been reported to show evidence for carcinogenesis/immune suppression activities (Figure 4, Table 1 and Table S3). We view the 41 drugs on the FDAM with no harmful links as candidate therapeutic drugs. The number of modules, or degrees (Table 1), to which a candidate drug was beneficial varied from 1 to 7. There were two degree-7 drugs, phenoxybenzamine and GW-8510, and three degree-5 drugs, thapsigargin, phthalylsulfathiazole, and medrysone (see Table 1 for full details on degrees and drug-module relation). A therapeutic drug compound is a minimum set of drugs culled from the list of candidate therapeutic drugs that covered all the modules. Many compounds could be constructed from the candidate drug list. There were two 2-compnent compounds, phenoxybenzamine+ISP and GW-8510+ISP, and 20 compounds with up to six drug components ( Table 2). Barring drug-drug interaction, we predict these compounds to be free of harmful side effects at the intracellular level.

Comparison between IGCM and FMCM
Stability. As mentioned earlier, the characterization of a drug, namely beneficial or harmful, was much more stable FMCM than in IGCM (Figure 3).
Accuracy and specificity. We used anti-tumor agents in the Therapeutic Target Database (TTD) to evaluate the accuracy and specificity (Material and Methods) of the FMCM and IGCM predictions. The ''true'' drug set in the test was the intersection of TTD and the CMap list, which included about 40% of TTD. For simplicity, we denote by IG3 the IGCM query at FC = 3 and the rest (queries with FC .3) by IGL. We found that FMCM had overall accuracy ( Figure 5A) and specificity ( Figure S4) similar to IG3 and higher than IGL, except for the immune system process module, where FMCM was worse than IGL.
Reproducibility. We tested the reproducibility (Material and Methods) of the drug predictions by repeating 10 times the FMCM and IGCM procedures, each time working on a set of 40 randomly chosen microarrays, 20 each from controls and patients. In the FMCM procedure, GGINs were constructed using DEGs selected by SAM at FDR ,0.01 and FC .2, and the selected drug set was the sum of beneficial and harmful drugs. FMCM had significantly higher reproducibility (on average ,80%) than IGCM (on average ,50%), except for the module immune system process (on average ,60%) ( Figure 5B, two-sample t-test p,10 215 ).
Clinical application I. We took four known anti-cancer drugs, irinotecan (no. 29 in Table 1 Figure 6). The first drug, irinotecan (trade mark Camptosar), is in current use, in particular in combination with other chemotherapy agents such as 5-fluorouracil and leucovorin in a common colorectal cancer regimen called FOLFIRI [37], was significantly beneficial only to the apoptosis module by FMCM ( Figure 6A). Thapsigargin, a known endoplasmic reticulum Ca2+ ATPase inhibitor [38], was both significantly beneficial and harmful to colorectal cancer in FMCM (apoptosis and RNA metabolic process, respectively, Figure 6B), but was neutral IGCM. The third example, 8-azaguanine, a purine analog that exhibits anti-neoplastic activity and have been used in the treatment of acute leukemia [39], was significantly beneficial to the apoptosis and cell proliferation modules in FMCM, but was harmful (not significantly) in all IGCM tests ( Figure 6C). Vorinostat (trade mark Zolinza), a member of a larger class of compounds that inhibit histone deacetylases (HDACs), and known to arrest cancer cells epigenetically, was significantly and broadly beneficial in both IGCM (FC from 3 to 4.5) and FMCM (cell proliferation, signal transduction, and transcription) and showed no harmful effects ( Figure 6D). These results suggested that FMCM have higher resolution to detect known ant-cancer agents than IGCM.
Clinical application II. Using the IGCM and FMCM gene lists we examined the ES versus FC patterns of 27 chemo-drugs, not necessarily specific to colon cancer treatment, listed in the Anatomical Therapeutic Chemical (ATC) classification system (L01 class; anti-neoplastic agents), and partitioned the group by pattern into six types ( Figure S5A-F). An overall characterization of our results was that IGL, which represent queries using more stringently selected DEGs, on the one hand and IG3/FMCM on the other tended to give contrast indications to many of the antineoplastic agents. In Figure S5A, to which the drug irinotecan belongs, drugs were mostly harmful in the IGL queries but were beneficial in the IG3 and most of the FMCM queries. In Figure  S5B, the IGL queries were mostly beneficial and the IG3/FMCM mostly harmful. In Figure S5C and S5D, all IGCM queries (IGL and IG3, with one exception) were beneficial while the FMCM queries had significant harmful components. The most beneficial drug was vorinostat, the single entry in Figure S5E, where all queries were beneficial. The most harmful drugs were the allharmful celecoxib and paclitaxel, with carmustine and imatinib close behind ( Figure S5F). Vorinostat, doxorubicin, daunorubicin, irinotecan -all in Figure S5A -satisfied our stringent criteria for inclusion in Table 1 as components for therapeutic drug compounds.

Cell viability on single predicted drugs
Eight beneficial drugs from Table 1, phenoxybenzamine (PB; beneficial to 7 functional modules), GW-8510 (GW; 7), phthalylsulfathiazole (PS; 5), etacrynic acid (EA; 2), ginkgolide A (GA; 1), triflusal (TF; 1), imipenem (IM; 1), and 6-azathymine (6-AT; 1), were selected for their commercial availability and degree of beneficence for preliminary cell model experimental validation on five cell lines: colon cancer, HCT116, RKO, SW403, and SW620, and breast cancer, MCF7. GW had the strongest effect on the cell lines and MCF7 was the cell line most susceptible to the tested drugs ( Figure 7). GW, EA, GA, and 6-AT could selectively or broadly inhibit cell viability on the cell lines. GW, a known CDK2 inhibitor used in protection of hair-loss in chemotherapy, exhibited strong inhibitions against HCT116 and MCF7, moderate effects against RKO and SW620, and weak effects against SW602. EA, GA, and 6-AT moderately to weakly inhibited the viability of MCF7 (Table 3).

Microarray results and test of the perturbagen concept
We tested the implicit perturbagen assumption [17] that CMap data on gene expression profiles from drug treatments on one cell line (MCF7) are useful for drawing inferences more generally on the effects of the drug, in particular, its effect on different cell lines. We generated five microarray global gene expression profiles, of     PB and IM on HCT116 and MCF7, and of GW on HCT116 (the efficacy of GW on MCF7 is similar). As control we extracted from the CMap database 10 corresponding datasets of the three drugs on MCF7 (4, 4, and 2 datasets from PB, GW, and IM, respectively). We carried out separate two-way hierarchical clustering of the 15 profiles employing the IGA and GSA procedures. In spite of different cell lines, microarray platforms, and laboratory conditions, the samples, especially the five GW administered samples, clustered more according to drugs than not in both procedures (Figure 8). The outstanding qualitative difference between the IGA and GSA procedures was that the IGA heatmap was dominated by a single GO term, cell cycle ( Figure 8A, Table S4, S5, S6), whereas the GSA heatmap was characterized by four terms: monosaccharide metabolic process, response to hormone stimulus, inflammatory response, and cell cycle phases ( Figure 8B, Table S7, S8, S9, S10). The IGA heatmap corroborates the result mentioned earlier, that GW, alone among all the drugs, had high negative impact on cell survival. The GSA heatmap allowed the effects of smaller gene sets to be displayed and had a closer correspondence to our FMCM approach to drug selection. For example, it provided an independent support that IM, but not the other drugs tested, exhibited a strong beneficial effect on the immune system process in colorectal cancer cells (Table 1).

Discussion
CMap has been widely applied to drug discovery for treatment of complex diseases, including cancer. Methodologies employed for applying CMap for this purpose are simply variations of a [17]. The important differences between the FMCM procedure used here and IGCM include: (i) In addition to differential expression, genes selected from querying CMap were screened by the GSToP procedure making use of GGINs [26]; (ii) functional modules (FMs) were built from selected genes and used for separate querying of CMap to select function specific beneficial drugs (Table 1); (iii) querying results were used to form drug compounds, each compound being a minimum set of drugs that collectively were beneficial to all the FMs and not harmful to any FM ( Table 2).
Our tests showed FMCM to perform better than IGCM in terms of prediction stability (Figure 3), accuracy ( Figure 5A), specificity ( Figure S4), and reproducibility ( Figure 5B). In short, FMCM was much more robust than IGCM in drugs selected. One reason for the relative robustness of FMCM prediction over IGCM may be that most modern drugs were designed to affect a specific biological function, say, by targeting a transcription factor, not to affect all functions. An FM-drug association is therefore expected to be more stable than that between the whole DEG set and the drug. Because IGCM does not test drugs for function specific harmfulness, drugs selected by IGCM may be deemed overall beneficial yet still harmful to some functions, or vice versa. This was the case for three well-known drugs used in the cancer chemotherapy listed in the CMap database: thapsigargin, 8azaguanine, and irinotecan ( Figure 6). Drugs selected by FMCM had high sensitivity and low falsepositive rate. Of the total 46 candidate drugs (Table 1), 41 were entirely beneficial and five were harmful to some FMs. Thirty of 46 drugs have been reported in the literature to have properties related to cancer. Of the 41 putative entirely beneficial drugs, 25 have been reported to have anticancer properties and one has been reported to be carcinogenic (or mutagenic). Of the five putative partly harmful drugs, four have been reported to be both anticancer and carcinogenic.
Phenoxybenzamine, an a-adrenergic-antagonist, was our only false-positive case. We identified it as a degree-7 beneficial drug, but there has been no literature suggesting its anti-tumorigenicity. Instead, there are two clinical studies suggesting possible carcinogenic effects in patients [40,41].
A novel feature of the FMCM approach was its ability to discover intracellular harmful side effect in agents known to be anti-cancer. This is not surprising given the widespread practice of targeting specific biomolecule in drug designs when it is now known that the typical regulatory relation between transcription factors and biological networks is many-to-many. We identified pyrvinium, trifluoperazine, ellipticine, and 0297417-0002B to be harmful to immune system process but otherwise beneficial. The first three -there is no literature on 0297417-0002B -has been reported to have anti-tumor properties but also have lethal effects in cell or animal models (Table 1). We identified thapsigargin to be beneficial to the apoptosis FM but broadly harmful to the signal transduction, transcription, cell proliferation, and RNA metabolic process FMs. This drug has been reported to have the ability to promote apoptosis on prostate and breast cancer cells but also to stimulate cell growth in mouse keratinocyte models (Table 1) [42][43][44][45][46][47].
Based on our in silico screening and cell viability experiments, we selected the four drugs, GW-8510 (GW, no. 2 in Table 1), etacrynic acid (EA, no. 16), ginkgolide A (GA, no. 39), and 6azathymine (6-AT, no. 44), as potential therapeutic agents for colorectal adenoma. GW, which exhibited clear inhibitory effects against colon cell growth (Figure 7) in our viability experiments, is a cyclin-dependent kinase 2 (CDK2) inhibitor used for preventing hair loss in chemotherapy. It was suggested that the observed antitumor efficacy of GW's derives from its inhibition of tumor growth via cell cycle control [48], a suggestion supported by our in silico study. EA is a potent inhibitor of glutathione S-transferase (GST) family members and has been used to treat high blood pressure and swelling caused by kidney failure. It has been suggested that EA may inhibit cell growth and induce cancer cell death through apoptosis [49][50][51], a notion that correlated well with our findings (Figure 4). In our analysis the ES's for EA were 20.891 and 20.875 versus the apoptosis and cell cycle modules, The  respectively (Table 1). EA has also been reported to have therapeutic potential in cancer therapy by reversing drug resistance [52][53][54]. GA, a Ginkgo biloba leaf extract, has been used for treatment in a wide variety of cognitive and vascular disorders, including dementia and peripheral arterial occlusive diseases [55][56][57]. Its structural homolog Ginkgolide B has been reported to possess anti-inflammatory anti-allergic, anti-oxidant, anti-cancer, and neuroprotective effects [58]. In our analysis GA had ES = 20.834 against the immune system process module (Table 1). Recent studies conducted with various molecular, cellular, and animal model experiments have concluded that Ginkgolide B may have chemopreventive abilities associated with anti-angiogenic, antioxidant, and gene-regulatory events [59][60][61][62]. 6-AT, an analog of thymine, has been shown to inhibit the pathway of biosynthesis of nuclear acids in cancer tissues [63,64]. It had ES = 20.813 against the signal transduction module (Table 1) and inhibited tumor growth in the cell line MCF7 (Figure 7). Our study showed that the pair GW and GA combined would yield beneficial effects on all eight FMs (Table 2). We were not able to find another gene expression microarray set on colon adenoma that passed our quality test (Principal Component Analysis, Figure S1). We did find a dataset on colorectal cancer (CRC) that did, from GEO database (accession number: GSE32323). The dataset, Affymetrix HG-U133 plus 2.0 arrays, contained 17 pairs of cancer and healthy colon tissues from CRC patients [65]. We subjected the set to the same IGCM and FMCM analyses as we did the adenoma data. For the predicted repurposed candidate drugs, FMCM performed better than IGCM in stability and reproducibility, and was comparable with IGCM in accuracy and specificity. Through FMCM we selected 43 candidate drugs for CRC. Among this drug set, 20 also belonged to the set of 46 drugs for adenoma (Figure 9). This  Table 1). Given that IGCMbased drug screening for disease specific but different data sets is known to lead to drug sets that are highly variable, and that colon adenoma and CRC are related but not the same conditions, the degrees of overlap between the colon adenoma and CRC drug sets are encouraging. The overlap set contained 5 of the 8 adenoma drugs selected for cell viability tests. This is significant because the tests were conducted prior to the analysis of CRC data.
There are limitations inherent to our approach. It depends on the availability of gene expression profiles of drugs; many FDAapproved drugs are not among the more than 1,000 compounds with profiles given in CMap database 2.0 release, the version used in this work; the implicit perturbagen assumption that the drugspecific genomic profiles given by CMap are only mildly dependent on the specific (mostly breast cancer) cell lines requires validation in every application; the reliability of our results depends on the quality and accuracies of patient gene expression profiles, PPI, and GO data; and, specific to this work, how the drug screening results may depend on the selection of FMs for drug screening has not been thoroughly studied.
Our results need further experimental validation because therapeutic efficacy of a drug is always more complex than just a simple matching of expression profiles, even when conducted in the refined fashion of FMCM. In addition, not known at this stage are how components in a compound (Table 2) would interact with each other and how the interaction would impact its predicted property. Such effects can only be assessed in animal model tests when the selected drugs are applied in compound form.
These limitations are common to all CMap-based or similar drug discovery methods, yet our method has merits not possessed by other screening methods. We believe our method is useful for drug discovery for therapeutic systems treatment not only for colorectal adenoma, but also for other types of cancer as well as for other complex diseases and conditions.  Table 3. Inhibitory effects of single predicted drugs on colon cancer and breast cancer cell lines.   Figure S4 Specificity of predicted drugs. Specificity is true negative (known cancer-inducing agent predicted to be harmful) over all drugs predicted to be harmful; higher specificity implies lower false positive. Seven of the eight FMCM results (red), except immune systems process (cyan), have higher specificities than the five IGCM results (black). (TIF) Figure S5 Enrichment scores of 27 chemo-drugs. The 27 chemo-drugs, selected from the L01 class (antineoplastic agents) in the Anatomical Therapeutic Chemical system, are not specific to colon cancer treatment. The ES is those from five IGCM (FC threshold 3 to 5) and eight FMCM runs (FC .0.2). Solid symbol indicates an ES with permutation p value,0.05. The 27 drugs are clustered into six groups according to overall pattern. (TIF) Table S1 Gene ontology enrichment analysis for functional modules.
(XLS) Table S5 GO terms analysis for genes in the pink block in the IGA heatmap ( Figure 8A). Top-10 gene ontology annotation clusters were determined by DAVID [36].
(XLS) Table S6 GO terms analysis for genes in the purple block in the IGA heatmap ( Figure 8A). Top-10 gene ontology annotation clusters were determined by DAVID [36].
(XLS) Table S7 GO terms analysis for genes in the green block in the GSA heatmap ( Figure 8B). Top-10 gene ontology annotation clusters were determined by DAVID [36].
(XLS) Table S8 GO terms analysis for genes in the blue block in the GSA heatmap ( Figure 8B). Top-10 gene ontology annotation clusters were determined by DAVID [36].
(XLS) Table S9 GO terms analysis for genes in the orange block in the GSA heatmap ( Figure 8B). Top-10 gene ontology annotation clusters were determined by DAVID [36].
Table S10 GO terms analysis for genes in the purple block in the GSA heatmap ( Figure 8B). Top-10 gene ontology annotation clusters were determined by DAVID [36].  . Overlap of candidate repurposed drug sets curated from colon adenoma and colorectal cancer data sets. Numbers in brackets correspond to those given in the first column of Table 1, which lists the drug set for colon adenoma. The overlap includes 9 of the 13 drugs in Table 1 with degrees not less than 3, and 5 of the 8 drugs selected for cell viability tests marked by ''*''. doi:10.1371/journal.pone.0086299.g009