Large-scale interpretation of chemical-genetic interaction profiles using a genetic interaction network

Genetic interactions provide a key for interpreting the functional information contained in chemical-genetic interaction profiles. However, they have remained underutilized in this capacity across recent chemical-genetic interaction screening efforts and their ability to interpret chemical-genetic interaction profiles on a large scale has not been tested. We developed a method, which we refer to as CG-TARGET (Chemical Genetic Translation via A Reference Genetic nETwork), that integrates the data from large-scale chemical-genetic interaction screens with genetic interaction data to predict the biological processes perturbed by compounds. CG-TARGET compared favorably to a standard enrichment approach across a variety of benchmarks, achieving similar performance on measures of accuracy and substantial improvement in the ability to control the false discovery rate of its predictions. We found that one-third to one-half of gene mutants in the data contribute to the highest-confidence biological process predictions and that these contributions overwhelmingly come from negative chemical-genetic interactions. This method was used to prioritize over 1500 out of over 13,000 compounds for further study in a recently-completed chemical-genetic interaction screen in Saccharomyces cerevisiae, enabling the rapid functional annotation of unknown compounds to biological processes through targeted biological validations. We present here a detailed characterization of the method and further biological validations to demonstrate the utility of genetic interactions in the interpretation of chemical-genetic interaction profiles and the effectiveness of our implementation of this concept.


INTRODUCTION
The ability to discover chemical compounds with desirable and/or interesting biological activity is essential to understanding the way compounds and biological systems interact. One way to characterize the biological activity of a compound in an unbiased manner is to profile its activity across a genome-wide array of genetic mutants, also known as chemical-genetic interaction screening [1]. In the resulting chemical-genetic interaction profiles, the identities of the gene mutations that confer sensitivity or resistance to a compound provide functional information regarding the actions performed by that compound inside the cell.

Genetic interaction profiles provide analogous information regarding gene function, and as such can be used to interpret the functional information contained in chemical-genetic interaction profiles [2]. Specifically, shared interactions between chemical-genetic and genetic interaction profiles may implicate a particular gene or group of genes (e.g. a biological process or protein complex) as the target of a compound's actions in the cell (Figure 1). This scheme for interpreting chemical-genetic interaction profiles does not depend on the existence of chemical-genetic interaction profiles for well-characterized compounds, and thus enables the discovery of compounds with novel modes of action.

Recent advances in whole-genome chemical-genetic interaction screening technology have opened the possibility of using chemical genomics as a high-throughput screening approach [3-5]. This would, for example, enable functional profiling of compound bioactivity at earlier points in the drug discovery process, providing an additional means of prioritizing promising compounds (and discarding compounds with less obvious, yet undesirable activities) before investing large amounts of resources in them. However, genetic interactions have remained essentially unused in these screens for the systematic interpretation of chemical-genetic interaction profiles. As such, a systematic interpretation of chemical-genetic interaction profiles using genetic interaction profiles has not been demonstrated on a large scale (1,000s to 10,000s of compounds). A study of this type would provide insights into the compatibility between chemical-genetic and genetic interaction profiles and the ability of a genetic interaction-based method to prioritize compounds with high-confidence predictions while controlling for issues typically associated with high-throughput chemical screening.

In this manuscript, we present the use of genetic interaction profiles to systematically interpret chemical-genetic interaction profiles on a large scale. Specifically, we developed a method, called CG-TARGET (Chemical Genetic Translation via A Reference Genetic nETwork), that incorporates genetic interaction data and different sources of experimental variation to predict the biological processes perturbed by compounds. We applied this method to a high-throughput chemical-genetic interaction screen of more than 13,000 compounds in S. cerevisiae [6], using profiles from the corresponding yeast genetic interaction network [7,8] to interpret the chemical-genetic interaction profiles. CG-TARGET recapitulated known information for well-characterized compounds and showed a marked improvement in the ability to control the false discovery rate (and, as a result, to prioritize interesting compounds) compared to a baseline approach. We also confirmed, through a global analysis, the compatibility between chemical-genetic and genetic interaction profiles for the purpose of predicting perturbed biological processes. CG-TARGET is available, free for academic use and licensed for commercial use, at github.com/csbio/CG-TARGET.

RESULTS
Predicting perturbed biological processes from chemical-genetic interaction profiles
When developing a method that uses genetic interaction profiles to interpret chemical-genetic interaction profiles obtained at scale, it was important to consider scenarios in which experimental artifacts or common signatures in the chemical-genetic interaction profiles could strongly influence the similarities between chemical-genetic and genetic interaction profiles, leading to biased and inaccurate process predictions. For example, common similarity measures are blind to the variance of individual gene mutants across all chemical-genetic interaction profiles. As a result, gene mutants with highly variable interaction scores in chemical-genetic interaction experiments have the potential to drive the prediction of processes in a nonspecific manner. While this can be addressed with negative experimental controls, it is conceivable that certain gene mutants would respond nonspecifically only in the presence of compound, requiring a correction derived from the dataset itself. Additionally, spurious correlations introduced by applying normalized similarity measures (Pearson correlation coefficient, cosine similarity) to weak chemical-genetic interaction profiles can be further amplified by the redundancy in the genetic interaction network, leading to false discoveries.

We developed CG-TARGET (Chemical Genetic Translation via A Reference Genetic nETwork) to address these concerns surrounding the prediction of perturbed biological processes at scale (Figure 1). Predicting a compound's perturbed biological processes using CG-TARGET requires three input datasets (chemical-genetic interaction profiles, genetic interaction profiles, and a mapping from the query genes in the genetic interaction profiles to biological processes) and involves four distinct steps. First, a set of resampled chemical-genetic interaction profiles is generated, each of which consists of one randomly sampled interaction score for each gene mutant across all compound treatment profiles in the chemical-genetic interaction dataset. Second, scores reflecting both the strength of each compound's chemical-genetic interaction profile and its similarity to the profile of each gene mutant are obtained by computing a dot product between all chemical-genetic interaction profiles (comprising compound treatment, experimental control, and resampled profiles) and all L2-normalized query genetic interaction profiles. Third, these "gene-level" prediction scores, which possess per-compound ranks equivalent to those obtained using cosine similarity but prioritize compounds with stronger profiles, are aggregated into process predictions; the z-score and empirical p-value for each compound-process prediction are obtained by mapping the gene-level prediction scores to the genes in the process of interest and comparing these scores to those from shuffled gene-level prediction scores and to distributions of the scores derived from experimental control and resampled profiles. Finally, the false discovery rates for these predictions are estimated by calculating the frequency at which experimental control and resampled profiles predict processes across a range of significance thresholds, compared to the compound treatment profiles.
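The first two steps can be illustrated with a minimal sketch in plain Python. It is not the CG-TARGET implementation; the function names, the toy matrix, and the query gene labels are hypothetical, and zero-length profiles are simply left unscaled as an assumption of the sketch.

```python
import math
import random

def l2_normalize(profile):
    """Scale a profile to unit Euclidean length (all-zero profiles are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in profile))
    return [x / norm for x in profile] if norm > 0 else list(profile)

def gene_level_scores(cg_profile, gi_profiles):
    """Dot product of one chemical-genetic profile with each L2-normalized query profile."""
    scores = {}
    for query_gene, gi_profile in gi_profiles.items():
        unit = l2_normalize(gi_profile)
        scores[query_gene] = sum(c * g for c, g in zip(cg_profile, unit))
    return scores

def resample_profile(cg_matrix, rng):
    """Build one resampled profile: for each gene mutant (column), draw one score
    at random from that mutant's scores across all compound treatment profiles."""
    n_mutants = len(cg_matrix[0])
    return [rng.choice([row[j] for row in cg_matrix]) for j in range(n_mutants)]

# Toy data (hypothetical): 3 compounds x 4 gene mutants, 2 query genes.
cg_matrix = [
    [-4.0, -1.0, 0.0, 0.5],
    [-2.0, -0.5, 0.0, 0.25],  # same shape as the first compound, half the strength
    [0.0, 0.0, -3.0, 1.0],
]
gi_profiles = {
    "queryA": [-2.0, -1.0, 0.0, 0.0],
    "queryB": [0.0, 0.0, -1.0, 0.0],
}

strong = gene_level_scores(cg_matrix[0], gi_profiles)
weak = gene_level_scores(cg_matrix[1], gi_profiles)
resampled = resample_profile(cg_matrix, random.Random(0))
```

Because only the genetic interaction profiles are normalized, the two toy compounds with identically shaped profiles rank the query genes identically, but the stronger profile receives a score twice as large. This is the property that distinguishes the dot product from cosine similarity in the description above.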

Application to and evaluation on large-scale chemical-genetic interaction screening data
We applied CG-TARGET to the problem of predicting biological target processes from two recent large-scale chemical-genetic interaction screens in S. cerevisiae. The first screen was performed on 9850 compounds from the RIKEN Natural Product Depository [9] (the "RIKEN" screen) and the second was performed on 4116 compounds from the NCI Open Chemical Repository's plated compound libraries, the NIH Clinical Collection, and GlaxoSmithKline's [...] genetic interaction profiles obtained from each screen was 8418 and 3565, respectively. In both screens, the chemical-genetic interaction profiles were determined across a diagnostic set of approximately 300 haploid gene deletion mutants. Genetic interaction profiles were obtained from a compendium of genetic interaction profiles in S. cerevisiae [7], with the query genes mapped to propagated Gene Ontology biological process terms [11,12] to define the process targets.

To provide a baseline approach for benchmarking the performance of CG-TARGET on these large screens, we implemented a standard enrichment approach that tests for the enrichment of processes in each compound's top-k gene-level prediction scores. Using experimental control and resampled profiles, we assessed the ability of the enrichment-based prediction method to control the false discovery rate at various values of k. The best-performing value of k was then used to further benchmark the accuracy of the process predictions made by CG-TARGET.

CG-TARGET was successful in controlling the false discovery rate across both chemical-genetic interaction screens, identifying 848 out of 8418 compounds from the RIKEN screen (10%) and 705 compounds from the NCI/NIH/GSK screen (20%) with at least one prediction that achieved false discovery rates of 25 and 27%, respectively (Table 1, Figure 2A-D). In contrast, the best-performing top-k enrichment approach (k=100) identified only 57 compounds with an equivalent false discovery rate when applied to the profiles from the RIKEN screen (Figure 2E-F). In all cases, the false discovery rates derived from resampled profiles were more conservative than those derived from experimental controls, suggesting that some sources of variance in each gene mutant's interaction scores arise only upon treatment with compound and therefore cannot be corrected using only negative experimental controls. The compounds from the RIKEN and NCI/NIH/GSK screens with at least one prediction at or below the respective false discovery rate cutoff set for each screen will be referred to as the RIKEN and NCI/NIH/GSK "high-confidence sets," respectively.

We benchmarked the accuracy of CG-TARGET's process predictions against a set of 35 gold-standard compound-process annotations curated from the literature and observed performance comparable to that of predictions generated using the top-100 enrichment approach. More specifically, we computed the rank of each compound's gold-standard process within its set of process predictions, and compared this rank to those obtained from randomly shuffled predictions. CG-TARGET performed slightly worse overall when comparing the ranks of the gold-standard processes (12 versus 14 in the top ten, Figure 3A) and the number of compounds with significant ranks (22 versus 23, Table 2). However, only 2 out of 23 significantly-ranked gold-standard predictions made using the top-100 enrichment approach achieved a false discovery rate of 25% or less, while 16 out of 22 predictions from CG-TARGET achieved this same false discovery rate. As such, CG-TARGET discovered 8-fold more compounds with significantly-ranked gold-standard process annotations within the RIKEN high-confidence set. This result provides further evidence supporting the utility of genetic interaction profiles in the interpretation of chemical-genetic interaction profiles, while simultaneously demonstrating that the predictive power of genetic interaction profiles improved when combined with additional experimental data.

In a more global benchmarking effort, we also observed that CG-TARGET improved the prioritization of process predictions when applied to simulated chemical-genetic interaction profiles. The set of simulated profiles was designed to contain three compounds that target each query gene in the genetic interaction dataset; each simulated profile thus inherited the process annotations of its parent genetic interaction profile, providing a gold standard with which to evaluate their predictions. Evaluation performed on the top process prediction for each simulated compound revealed that CG-TARGET captured 15% more gold-standard annotations than did top-100 enrichment. While both methods only captured a gold-standard annotation in the top process prediction for approximately 30% of the simulated compounds, this still represents a 56-fold enrichment over the background expectation of 0.00533. In addition, CG-TARGET more successfully prioritized its true positive annotations, as shown by the consistent improvement in precision over top-100 enrichment, especially at low recall values (Figure 3B). The diversity of the true positive process predictions, when mapped to a set of 17 broad functional neighborhoods, was also improved using CG-TARGET (Shannon index = 2.58 for CG-TARGET versus 2.33 for top-100 enrichment predictions), likely due to a substantial reduction in the number of compounds mapped to the "vesicle traffic" neighborhood (150 for CG-TARGET vs. 367 for top-100 enrichment).
In addition to benchmarking, we investigated the potential to expand the use cases of CG-TARGET to the prediction of perturbed protein complexes. For protein complex prediction on the RIKEN screen data, 714 compounds were identified with at least one prediction achieving a false discovery rate of 25% or less. Of these 714 compounds, 603 were also identified in the high-confidence RIKEN process predictions, suggesting the potential to map prioritized process predictions to more specific, defined predictions of compound targets in the cell. For example, the top protein complex prediction for NPD1409 was "Kornberg's mediator (SRB) complex," which plays an important role in the initiation of transcription as well as chromatin looping [13,14]. This prediction agrees with the compound's top process predictions from the RIKEN screen to perturb "chromosome organization," "DNA metabolic process," and/or "RNA polymerase II transcriptional preinitiation complex assembly," and points to a more direct target for testing in experimental validation efforts.

Given previous demonstrations [2,7] and the evaluations presented here, it should be clear that the use of genetic interaction profiles to interpret chemical-genetic interaction profiles is both appropriate and useful. However, a further investigation of the inner workings of this approach was warranted to more comprehensively understand the extent to which these two types of profiles can be combined and how this affects the prediction of processes. Here we present visualizations that reveal insights into 1) the interpretation of a chemical-genetic interaction profile to predict a biological process and 2) the different ways in which a process can be predicted using chemical-genetic and genetic interaction profiles. Finally, we quantify, across the RIKEN high-confidence set of compounds, the relationship between chemical-genetic interactions and their importance to the prediction of perturbed biological processes.

To better understand the interpretation of chemical-genetic interaction profiles, we quantified, for every compound, the contribution of each gene mutant to the prediction of individual biological processes. For a single compound and predicted process, these "importance scores" were obtained by 1) computing the Hadamard product (elementwise multiplication) between the compound's chemical-genetic interaction profile and each L2-normalized query genetic interaction profile mapped to the predicted process and 2) for each gene mutant, computing the mean of this product across the genetic interaction profiles. These scores can be positive, indicating agreement in the sign of chemical-genetic and genetic interactions for a particular gene mutant, or they can be negative, indicating that the interactions do not agree for that gene mutant. As such, the importance scores summarize the concordance between chemical-genetic and genetic interaction profiles, conditioned on an individual compound and a perturbed process of interest.

The prediction of NPD4142, a compound from the RIKEN Natural Product Depository, to the "mRNA transport" process can be used to illustrate how the overlap between chemical-genetic and genetic interactions leads to process predictions (Figure 4A). A qualitative examination revealed that, indeed, NPD4142 possesses a pattern of chemical-genetic interactions similar to the genetic interactions for the query genes annotated to mRNA transport. However, a quantitative assessment added nuance to this comparison. While the POM152 deletion mutant possessed the strongest negative interaction with NPD4142, the importance scores revealed that it was not the most important gene mutant for making this prediction; instead, the deletion mutant for NUP133, which possessed a weaker chemical-genetic interaction score but more genetic interactions with the mRNA transport-annotated query genes, emerged as the most important for predicting mRNA transport.

We also compared the concordance of chemical-genetic and genetic interaction profiles across multiple compounds predicted to the same process, revealing that individual processes were predicted by both homogeneous and heterogeneous sets of chemical-genetic interaction profiles. For example, all predictions made to "proteasome assembly" depended almost entirely on a strong negative chemical-genetic interaction with RPN4, which was captured most clearly by the relevant importance scores (Figure 4B). This uniformity in the prediction of a process is contrasted by the diversity of profiles captured within "fungal-type cell wall organization" predictions (Figure 4C). Here, filtering on the importance scores showed that chemical-genetic interactions with four genes (GAS1, SMI1, ABP1, and DFG5) were primarily responsible for predictions to this term, but with low agreement regarding their relative importance for each compound's prediction. In the latter case, the concordance of chemical-genetic and genetic interactions was not particularly obvious, yet was sufficient to enable the prediction of a perturbed process.

More globally, we found broad contribution across a large fraction of observed chemical-genetic interactions, primarily negative interactions, to the prediction of perturbed processes (Figure 4D). By comparing the chemical-genetic interactions for each compound to their corresponding importance scores for that compound's top process prediction, we observed that nearly one-third (5398 / 16464) of chemical-genetic interactions contributed to top process predictions, the fraction of which increased to nearly one-half (5087 / 10281) when considering only negative interactions. While positive chemical-genetic interactions were much less frequently observed overall (and only 7% positively contributed to a top process prediction), they were 3.5 times more likely to contribute negatively to a process prediction than were negative interactions (1.47% vs. 0.42%). Overall, 199 gene mutants (72%) contributed to at least one top process prediction, while half (143) contributed to at least five predictions (importance score > 0.1). Based on a more stringent threshold on importance scores (> 1.0), 65 gene mutants (23% of mutants) were observed to be very strong contributors to certain process predictions, with 17 of these contributing strongly to the top process prediction of at least five compounds. While some gene mutants certainly contributed disproportionately to a subset of predictions, the broad majority of predictions required contributions from a much larger fraction of gene mutants.

Experimental validation of compound-process predictions

Phenotypic analysis of cell cycle progression
Several compounds from the RIKEN dataset were predicted to perturb processes related to the cell cycle. We chose to test 19 of these compounds, 14 of which had high-confidence predictions to the "spindle assembly checkpoint" process active in the M phase of the cell cycle, to determine if our predictions captured the biological activity of these compounds. Indeed, we observed that 7 of the 19 compounds induced a cell cycle phenotype, with 6 of the 14 compounds annotated to spindle assembly checkpoint inducing abnormally large buds on cells, changes in the budding index of cells, and increases in cellular DNA content, indicative of arrest in G2/M phase (Figure 5A-C). These phenotypes were not observed when performing the same experiments on a set of 10 active compounds from the high-confidence set whose predictions were to processes unrelated to the cell cycle. This difference in the rate of validation between predicted active compounds and negative controls (6 / 14 vs. 0 / 10) was statistically significant (p < 0.03, proportion test). Two of the selected compounds were predicted to perturb "cell cycle phase," one of which induced phenotypes consistent with G1 arrest (Figure 5A-C). This provided one demonstration of CG-TARGET's ability to prioritize compounds that perturb a particular function in the cell.

Inhibition of tubulin polymerization
Compounds that disrupt microtubules are useful for studying cell organization and division, and remain promising candidates as antitumor agents [15-17]. We therefore chose to experimentally validate our predictions in a way that might identify compounds that possess such activities. All compounds with the strongest predictions (FDR = 0%) to "tubulin complex assembly" were selected for biochemical validation in an in vitro tubulin polymerization assay (Figure 5D). Similar to the previous validation, a negative control set of compounds was selected to contain active compounds (process predictions with FDR ≤ 25%) whose predictions were not related to microtubules or related processes. We observed that the novel compound NPD2784 inhibited tubulin polymerization nearly as well as the drug nocodazole and more strongly than the microtubule probe benomyl. In addition, the entire set of compounds predicted to perturb tubulin complex assembly showed significantly increased inhibition of tubulin polymerization when compared to the negative control compounds (p < 0.005, Wilcoxon rank-sum test). These confirmatory results showed the translation of our predictions to a specific biochemical validation, even in the context of a different species.

DISCUSSION
The scaling of chemical-genetic interaction screens from tens or hundreds of compounds to tens of thousands of compounds has provided the opportunity, and the necessity, to more comprehensively characterize appropriate methods for interpreting the interaction profiles and prioritizing high-confidence compounds. We developed a method, CG-TARGET, to address this need and used it to predict perturbed biological processes for more than 13,000 interaction profiles from a recent high-throughput chemical-genetic interaction screen [6]. CG-TARGET demonstrated the ability to recapitulate known compound function while controlling the false discovery rate, prioritizing 1522 compounds for further study. Further investigation of the profiles from these high-confidence compounds revealed broad compatibility between chemical-genetic and genetic interaction profiles. In addition to these findings, the predictions made using CG-TARGET were experimentally validated on a large scale for 67 compounds in an orthogonal cell cycle assay and revealed insights into the distribution of functions perturbed by compounds in large compound libraries [6].

In high-throughput chemical screens, it is important to prioritize the compounds most likely to demonstrate desired biological activity in further studies. While CG-TARGET and a baseline approach performed similarly on the task of ranking gold-standard compound-process annotations, CG-TARGET was 8 times better at prioritizing these compounds as high-confidence predictions. Surprisingly, CG-TARGET outperformed the same baseline method when predicting and prioritizing perturbed processes for simulated chemical-genetic interaction profiles derived from genetic interaction profiles across the genome, providing evidence for its value in discovering compounds with modes of action not previously characterized in the literature. This is particularly important, as our gold standard set of compound-process annotations, consisting of 35 compounds across 17 biological processes (4 of which are DNA-related), does not enable prediction across the range of biological processes present in the cell. Genetic interactions thus provide the most comprehensive reference for interpreting chemical-genetic interaction profiles in an unbiased, genome-wide manner.

While we demonstrated the ability to predict perturbed processes for compounds and prioritize the highest-confidence predictions, many further steps are required to identify lead compounds and ultimately develop molecular probes or even pharmaceutical agents. Perturbing a biological process does not necessarily require perturbing a specific protein target, and as such,

Mapping biological processes to functional neighborhoods
We expanded an initial standard of 488 Gene Ontology biological process terms annotated to 17 functional neighborhoods [31] using a k-nearest-neighbors approach. For each previously unannotated process in our set of processes, we identified the 3 most similar processes in the process-neighborhood standard (by Jaccard overlap on gene annotations), assigned their similarity scores to their respective functional neighborhoods, and annotated the new process to the functional neighborhood with the highest sum of similarity scores. In the case of a tie, the process was annotated to both functional neighborhoods.
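The neighborhood assignment can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code; the function names, the toy gene sets, and the neighborhood labels are hypothetical, and ties are detected by exact equality of the summed scores.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard overlap between two collections of gene annotations."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_neighborhoods(new_genes, standard, k=3):
    """standard maps process name -> (gene set, functional neighborhood).
    Returns every neighborhood tied for the highest summed similarity
    over the k most similar annotated processes."""
    sims = sorted(
        ((jaccard(new_genes, genes), hood) for genes, hood in standard.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    totals = defaultdict(float)
    for sim, hood in sims:
        totals[hood] += sim
    best = max(totals.values())
    return sorted(hood for hood, total in totals.items() if total == best)

# Toy standard (hypothetical processes, genes, and neighborhood labels).
standard = {
    "procA": ({"g1", "g2", "g3"}, "Cell Wall"),
    "procB": ({"g1", "g2"}, "Cell Wall"),
    "procC": ({"g3", "g4", "g5"}, "Mitosis"),
    "procD": ({"g9"}, "Transcription"),
}
single = assign_neighborhoods({"g1", "g2", "g4"}, standard)

tied_standard = {
    "p1": ({"a", "b"}, "X"),
    "p2": ({"a", "c"}, "Y"),
}
tied = assign_neighborhoods({"a"}, tied_standard)
```

In the first toy case the two "Cell Wall" processes contribute 2/3 and 1/2, outweighing the single "Mitosis" neighbor at 1/5; in the second, both neighborhoods score 1/2 and both are returned.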

Our method to predict biological processes perturbed by compounds is described in the recent study from which the chemical-genetic interaction profiles were obtained [6]. We describe here the modifications to this approach to implement the top-k enrichment method for benchmarking.

Given a set of gene-level similarity scores for each compound and a set of gene-process annotations, the enrichment of each process in the set of the top-k most similar genes for each compound was computed. This is reflected in an enrichment factor (the fraction of the k selected genes annotated to a particular process divided by the fraction of total genes annotated to that process) and a p-value obtained using a hypergeometric test. These enrichment factors and p-values were substituted in place of the z-scores and p-values obtained using CG-TARGET for subsequent analyses.
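A minimal sketch of the enrichment factor and hypergeometric p-value for one compound and one process is given below. The function names and toy scores are hypothetical, the upper-tail form of the test is assumed (the text does not state the tail), and the process is assumed to have at least one annotated gene among the scored genes.

```python
from math import comb

def hypergeom_sf(x, N, K, n):
    """P(X >= x) when drawing n genes from N, of which K are annotated to the process."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, upper + 1))
    return hits / comb(N, n)

def top_k_enrichment(scores, process_genes, k):
    """scores maps query gene -> gene-level similarity score for one compound.
    Returns (enrichment factor, upper-tail hypergeometric p-value)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:k])
    N = len(scores)                          # all scored genes
    K = len(process_genes & set(scores))     # genes annotated to the process
    x = len(top & process_genes)             # annotated genes among the top k
    factor = (x / k) / (K / N)
    return factor, hypergeom_sf(x, N, K, k)

# Toy example (hypothetical): 10 query genes, scores decreasing from g0 to g9.
scores = {f"g{i}": 10.0 - i for i in range(10)}
process_genes = {"g0", "g1", "g5"}
factor, p_value = top_k_enrichment(scores, process_genes, k=3)
```

Here two of the three top-ranked genes belong to a process that covers three of ten genes, so the enrichment factor is (2/3)/(3/10) and the p-value is 22/120.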

Performance on gold-standard compounds
The predicted perturbed processes for each of the gold-standard compounds were sorted, first by their p-value (ascending) and then by their z-score (for CG-TARGET, descending) or enrichment factor (top-100 enrichment, descending), and the rank of each of their gold-standard process annotations was recorded. To assess the significance of each rank, each pair of p-value and z-score was assigned to a new process, the lists were re-ordered, and the ranks of each compound's target process were re-computed. The empirical p-value for each gold-standard compound-process pair was computed as the number of times the shuffled processes yielded a rank equal to or better than the observed rank.
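The rank-shuffling procedure can be sketched as below. The number of shuffles, the fixed seed, and the normalization of the count by the number of shuffles are assumptions of this sketch (the text reports only the count), and the toy predictions are hypothetical.

```python
import random

def rank_of(target, predictions):
    """predictions maps process -> (p_value, z_score).
    Returns the 1-based rank of target after sorting by p-value
    (ascending) and then z-score (descending)."""
    ordered = sorted(predictions, key=lambda p: (predictions[p][0], -predictions[p][1]))
    return ordered.index(target) + 1

def empirical_rank_pvalue(target, predictions, n_shuffles=1000, seed=0):
    """Reassign the (p-value, z-score) pairs to processes at random and
    count how often the target ranks at least as well as observed."""
    rng = random.Random(seed)
    observed = rank_of(target, predictions)
    processes = list(predictions)
    pairs = list(predictions.values())
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pairs)
        shuffled = dict(zip(processes, pairs))
        if rank_of(target, shuffled) <= observed:
            hits += 1
    # Assumption: report the count as a fraction of shuffles.
    return observed, hits / n_shuffles

# Toy predictions (hypothetical): 20 processes, "proc0" is the best-scoring one.
predictions = {f"proc{i}": (0.01 * (i + 1), 5.0 - 0.1 * i) for i in range(20)}
observed_rank, p_emp = empirical_rank_pvalue("proc0", predictions)
```

With 20 processes and a target observed at rank 1, a random reassignment matches that rank about one time in twenty, so the estimate should fall near 0.05.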

Performance on genetic interaction profiles
We generated a set of simulated chemical-genetic interaction profiles derived from the genetic interaction profiles [6]. Each simulated chemical-genetic interaction profile was a query genetic interaction profile augmented with noise sampled from a Gaussian distribution with a mean of 0 and a variance for each array gene twice that of the same array gene in the genetic interaction dataset. Three simulated profiles were generated based on each query gene, resulting in 4515 total profiles. Because each simulated chemical-genetic interaction profile was derived from a query genetic interaction profile, it inherited the gold-standard process annotations from its parent genetic interaction profile in subsequent benchmarking efforts.

We then used CG-TARGET and the top-100 enrichment method to predict perturbed processes for this set of 4515 simulated chemicals x 289 deletion mutants. For each simulated chemical, its top process prediction was compared to the set of inherited gold-standard process annotations, counting as a true positive if the top prediction matched an existing annotation and a false positive if it did not. Precision-recall curves were then generated by sorting the list of each simulated chemical's top process predictions (p-value ascending, z-score or enrichment factor descending) and computing the precision (true positives / (true positives + false positives)) and recall (true positives) at each point in this list.

The set of true positive process predictions from both methods was mapped to functional neighborhoods via the expanded process-neighborhood mapping ("Mapping biological processes to functional neighborhoods"). The proportion of processes mapped to each neighborhood was used to compute diversity via the Shannon index.
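The three computations in this section (noise augmentation, the precision-recall curve, and the Shannon index) can be sketched as follows. The function names and toy inputs are hypothetical; recall is tracked as a raw true-positive count, following the definition above, and the natural logarithm in the Shannon index is an assumption since the text does not state the base.

```python
import math
import random

def simulate_profile(gi_profile, array_variances, rng):
    """Add zero-mean Gaussian noise whose variance is twice each array gene's variance."""
    return [x + rng.gauss(0.0, math.sqrt(2.0 * v))
            for x, v in zip(gi_profile, array_variances)]

def precision_recall(top_predictions):
    """top_predictions: one (p_value, score, is_true_positive) tuple per simulated chemical.
    Returns (precision, true-positive count) at each position of the list
    sorted by p-value (ascending) and then score (descending)."""
    ordered = sorted(top_predictions, key=lambda t: (t[0], -t[1]))
    curve, true_positives = [], 0
    for position, (_, _, hit) in enumerate(ordered, start=1):
        true_positives += int(hit)
        curve.append((true_positives / position, true_positives))
    return curve

def shannon_index(counts):
    """Shannon diversity of neighborhood counts (natural log; the base is an assumption)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Toy inputs (hypothetical values).
simulated = simulate_profile([-2.0, 0.0, 1.0], [0.5, 0.1, 0.2], random.Random(1))
curve = precision_recall([
    (0.01, 5.0, True),
    (0.02, 4.0, False),
    (0.001, 6.0, True),
    (0.02, 4.5, True),
])
even_diversity = shannon_index([1, 1, 1, 1])
```

An even spread over four neighborhoods gives the maximum index for four categories, ln 4; a distribution dominated by one neighborhood would score lower.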

Analysis of process prediction drivers in chemical-genetic interaction data
Given a compound and a predicted process, a profile of "importance scores" describes the contribution of each gene mutant to that compound's process prediction. To obtain this score, a Hadamard product (elementwise multiplication) is first computed between the compound's chemical-genetic interaction profile and each L2-normalized genetic interaction profile for which the dot product between them is 2 or greater and the genetic interaction profile is annotated to the process of interest. The final importance profile consists of the mean of each gene's elementwise products across all selected genetic interaction profiles.
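A minimal sketch of the importance score calculation follows. The function name, toy profiles, and query gene labels are hypothetical; the sketch assumes no genetic interaction profile is all zeros, and returning zeros when no profile passes the threshold is a choice made here for illustration.

```python
import math

def importance_scores(cg_profile, gi_profiles, process_queries, min_score=2.0):
    """Mean elementwise product between a chemical-genetic profile and each
    L2-normalized genetic interaction profile that is annotated to the process
    and whose dot product with the chemical-genetic profile is >= min_score."""
    selected = []
    for query in process_queries:
        gi = gi_profiles[query]
        norm = math.sqrt(sum(x * x for x in gi))  # assumes a nonzero profile
        product = [c * (g / norm) for c, g in zip(cg_profile, gi)]
        if sum(product) >= min_score:  # the sum of the Hadamard product is the dot product
            selected.append(product)
    if not selected:
        return [0.0] * len(cg_profile)
    return [sum(column) / len(selected) for column in zip(*selected)]

# Toy data (hypothetical): one compound, four gene mutants, three annotated query genes.
cg_profile = [-5.0, 1.0, 0.0, 2.0]
gi_profiles = {
    "q1": [-3.0, -4.0, 0.0, 0.0],  # dot product 2.2: selected
    "q2": [0.0, 0.0, 0.0, 1.0],    # dot product 2.0: selected
    "q3": [0.0, 0.0, 1.0, 0.0],    # dot product 0.0: excluded
}
importance = importance_scores(cg_profile, gi_profiles, ["q1", "q2", "q3"])
```

In the toy result the first mutant has a positive importance (its chemical-genetic and genetic interactions agree in sign), while the second is negative because the compound's positive interaction opposes the negative genetic interaction in "q1".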

Phenotypic assessment of cell cycle
To examine the effect of compounds on arresting cells in G2/M phase, we looked for differences in budding index and cell DNA content between compounds predicted to perturb the cell cycle and negative control compounds. Nineteen compounds with high-confidence predictions to cell cycle-related biological processes, 14 of them to "spindle assembly checkpoint", were selected for validation, while ten compounds with predictions of false discovery rate ≤ 25% to processes not mapped to the functional neighborhoods of "Cell Cycle Signaling and Progression" and "Mitosis and Chromosome Segregation" (see "Mapping biological processes to functional neighborhoods") were selected as bioactive negative controls. Two compounds predicted to perturb "cell cycle phase" were also tested in these experiments. All compounds were tested at a concentration of 10 µg/mL, which was also the concentration used to obtain their chemical-genetic interaction profiles [6].

The proportions of predicted active compounds and negative controls with positive phenotypic results were compared using the prop.test function in R to assess significance.
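For readers working outside R, the comparison of two proportions can be approximated with the Python sketch below. It implements a two-sided chi-squared test with Yates' continuity correction, which is what prop.test computes for two groups under its default settings; whether the defaults were used in this study is an assumption, and the function name is hypothetical. The counts in the usage note are made up and are not results from the study.

```python
import math


def two_proportion_test(successes_a, total_a, successes_b, total_b):
    """Chi-squared test (1 degree of freedom) for equality of two proportions.

    Applies a Yates continuity correction capped at the absolute deviation of
    observed from expected counts. Returns (statistic, p_value).
    """
    pooled = (successes_a + successes_b) / (total_a + total_b)
    if pooled in (0.0, 1.0):
        # Degenerate case: identical all-or-nothing groups. Returning
        # "no evidence of a difference" is a choice made for this sketch.
        return 0.0, 1.0
    observed = [
        successes_a, total_a - successes_a,
        successes_b, total_b - successes_b,
    ]
    expected = [
        total_a * pooled, total_a * (1.0 - pooled),
        total_b * pooled, total_b * (1.0 - pooled),
    ]
    correction = min(0.5, abs(observed[0] - expected[0]))
    statistic = sum(
        (abs(o - e) - correction) ** 2 / e for o, e in zip(observed, expected)
    )
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

As a usage example with hypothetical counts, `two_proportion_test(15, 20, 5, 20)` compares 15 of 20 positives in one group against 5 of 20 in the other.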

Tubulin polymerization assay and analysis
We performed in vitro tubulin polymerization assays using the Cytoskeleton fluorescence-based porcine tubulin polymerization assay (BK011P) following manufacturer specifications. We tested the compounds at a concentration of 10 µg/mL, which was identical to the concentration at which they were screened to generate their chemical-genetic interaction profiles. Nine out of the 10 compounds predicted to perturb "tubulin complex assembly" with an estimated false discovery rate of 0% were selected for testing in the tubulin polymerization assay. Twelve compounds with predictions of false discovery rate ≤ 25% to processes not mapped to the "Mitosis and Chromosome Segregation" functional neighborhood were selected as bioactive negative controls.

We compared the Vmax of tubulin polymerization between the tubulin-predicted compounds and the negative controls to determine if the tubulin-predicted compounds inhibited polymerization to a greater degree than the controls.

[...] interactions on a large scale. Chemical-genetic interaction profiles, obtained by measuring the sensitivity or resistance of a library of gene mutants to a particular compound, are compared against genetic interaction profiles consisting of double mutant interaction scores. The resulting similarities are aggregated at the level of biological processes to predict the process(es) perturbed by the compound. Better agreement between chemical-genetic and genetic interaction profiles leads to stronger process predictions.

Figure 2. Rate of compound discovery and control of the false discovery rate for the prediction of biological processes from chemical-genetic interaction profiles. Biological processes were predicted for compounds, negative controls (DMSO), and resampled compound profiles from the RIKEN and NCI/NIH/GSK datasets.
(A,C,E) The number of compounds, [...]

Table 1. Comparison of number of compounds discovered at selected false discovery rate thresholds for CG-TARGET vs. the best-performing enrichment method (top 100 gene target candidates). The CG-TARGET method for predicting chemical-process targets was applied to two large-scale chemical-genetic interaction screens, one of compounds from the RIKEN Natural Product Depository (RIKEN) and the other consisting of 6 chemical compound collections from the National Cancer Institute, National Institutes of Health, and GlaxoSmithKline (NCI/NIH/GSK).