Development of a Drug-Response Modeling Framework to Identify Cell Line Derived Translational Biomarkers That Can Predict Treatment Outcome to Erlotinib or Sorafenib

Development of drug responsive biomarkers from pre-clinical data is a critical step in drug discovery, as it enables patient stratification in clinical trial design. Such translational biomarkers can be validated in early clinical trial phases and utilized as a patient inclusion parameter in later stage trials. Here we present a study on building accurate and selective drug sensitivity models for Erlotinib or Sorafenib from pre-clinical in vitro data, followed by validation of individual models on corresponding treatment arms from patient data generated in the BATTLE clinical trial. A Partial Least Squares Regression (PLSR) based modeling framework was designed and implemented, using a special splitting strategy and canonical pathways to capture robust information for model building. Erlotinib and Sorafenib predictive models could be used to identify a sub-group of patients that respond better to the corresponding treatment, and these models are specific to the corresponding drugs. The model derived signature genes reflect each drug’s known mechanism of action. Also, the models predict each drug’s potential cancer indications consistent with clinical trial results from a selection of globally normalized GEO expression datasets.


Introduction
It is well known that many cancer drugs are effective only in a subset of cancer patients, and selection of drug responsive biomarkers is crucial to find the right patients for the right drug. Also, early identification of biomarkers that can select sensitive and resistant patient subpopulations is particularly important in order to be able to test biomarker hypotheses in early clinical drug development and utilize those biomarkers during later clinical development to stratify patients into appropriate treatment arms. This will facilitate the development of companion and Sorafenib response using a 240 cancer cell line panel (OncoPanel: http://www. Eurofinspanlabs.com). IC50 values were used as dependent variables and gene expression data from untreated cells was used as independent variables to build the predictive models. A Partial Least Squares Regression (PLSR) modeling approach was utilized because it can effectively handle a high number of independent variables with minimal demands on sample size [24][25][26]. Importantly, a special splitting strategy was implemented to capture consensus features in the training dataset, followed by a pathway-based filtering step to highly reduce the signature gene set without losing model performance. In the present study, performance of the functional signature was validated to predict Erlotinib or Sorafenib patients' response and linked it to progression free survival (PFS) in lung cancer patients from the BATTLE clinical trial [27]. The in vitro-derived predictive models demonstrated significant in vivo accuracy, were mechanistically linked to the drugs' mechanism of action and were highly drug-specific.

Results
A PLSR-based modeling framework to build predictive models for drug sensitivities The main goal of this study was to test whether drug sensitivity models derived from cell line data could be used to predict patient response to the drug. The whole cell line panel was used as the training dataset, and evaluation of the model performance was carried out using gene expression data generated from tumor samples of patients treated with the same drug. No information from the testing dataset was used in training the drug sensitivity predictive models.
Erlotinib was selected as a case study target as it has clear mechanism of action as an EGFR inhibitor, and because molecular data sets coupled with patient response to treatment are publicly available (the MD Anderson BATTLE clinic trial). A 240 tumor cell line panel (Oncopanel) was used to identify cell lines that were sensitive or resistant to Erlotinib treatment (using the median IC50 value as a cutoff if the IC50 distribution is normal-like, or using a data driven cutoff value if the IC50 distribution follows a bi-model distribution). The Oncopanel contains cancer cell lines covering a variety of cancer indications (S1 Fig). For each drug against each cell line in the panel, a cell proliferation assay was conducted using 10 doses of Erlotinib (3-fold dilution) and IC50 values were generated. For model building on drug sensitivity, baseline gene expression was used as the independent variable and IC50 as the dependent variable.
A PLSR model workflow was developed and an Erlotinib sensitivity model was trained using OncoPanel data (Fig 1). The key steps shown on the left include multiple steps of data reduction, feature selection, a special splitting strategy to capture consistent features across the dataset, selection of least-overlapping top models, calculation of consensus genes weights followed by selection of the core signature gene set, and ontology enrichment filtering to obtain the pathway-based PLSR model (Fig 1A).
Data reduction. First, an RMA normalization of the baseline gene expression of the Onco-Panel cell lines was conducted. Next, an intensity cutoff of 40% of the whole genome to remove genes with low intensities was applied. Third, a variance cutoff of 1 to keep genes with the highest variability of expression among the panel cell lines was applied. This data reduction step narrowed down the number of probesets by an order of magnitude from 54,675 to 3,787.
Feature selection. Feature selection was conducted on the 3787 probesets based on correlation between gene expression level and drug response (log 2 (IC50)) for all cell lines. For each probeset, permutation testing was run by randomly assigning a response sign to the cell lines panel. After Fisher's transformation (f = 0.5 Ã ln ((1+r) / (1-r), where r is the correlation coefficient on each permutation), the permutation resulted in f-values that follow a normal distribution and a raw p-value of each gene was calculated based on the fitted mean and standard deviation from the permutation data. An adjusted p-value was calculated using Benjamini-Hochberg control of false discovery rate [28]. Feature probesets had to pass a raw p-value cutoff of 0.01. Then, the highest intensity probesets were selected as representative for each gene; typically these were also the highest variance probesets for the gene. As a result, 485 genes were selected and used as the input gene set for PLSR model training.
A splitting strategy to capture consensus features in the training dataset. In order to select the top-performing models that also capture consensus features in the dataset, a special splitting approach was developed. First, a "balanced split" was conducted to divide the data into training (70%) and balance validation (30%) subsets: we first equally divided the cell lines into three groups by IC50 values, then conducted random splits on each group to get training and validation subsets. When merging the training and validation subsets from these three groups, both training and validation subsets having some sensitive and some resistant cell lines (termed "balanced split") were generated. The training subset was then further divided into random training (60% of training) and random validation (40% of the training) subsets. Overall, the training set was divided into three parts: random training (42%), random validation (28%) and balance validation (30%) subsets ( Fig 1B). Hundreds of thousands of splits were generated, then the top models were selected by comparing model performance on both random validation and balance validation results.
Identifying the not-significant-overlapping models. The best models were selected based on multiple parameters, using both random validation and balance validation results ( Fig 1C). The selection criteria were as follows: 1. the models should have good performance on random validation on both correlation and Area Under Curve (AUC) measures (red spots in Fig 1C). The correlation is defined as Pearson correlation between experimental and predicted IC50s. The AUC measure uses a data-driven cutoff to divide cell lines into sensitive-and resistant-groups using experimental IC50s, and use a series of cutoffs on predicted IC50s to define predicted sensitive-and resistant groups, to generate the Receiver Operating Characteristic (ROC) curve and calculate AUC values. In this sense, although a regression-based modeling framework was built, some classification measures to help select top models were still used; 2. Among all splits, a correlated AUC and correlation measures on random validation (Fig 1C-left panel) were preferred; 3. Balance validation (from balanced split) should have much more narrow performance distribution than the random validation ( Fig 1C- Five top balance split resulted model sets (each contains 1000 random splits) were selected based on all five criteria out of 150 initial balance split model sets. Then, one top model with best correlation between predicted and experimental IC50 in balance validation set was selected as a top model in each of the top five model sets, resulting in 5 top models from 150 Ã 1000 = 150000 total splits. Pairwise overlapping scores among these models were then calculated (the overlapping score was defined as Jaccard similarity index), on the real random training cell line sets as well as generated an overlapping score distribution on randomly select two sets with the same size of cell lines as in the random training (a permutation test). In order to remove top models that were significantly overlapping to each other, a 90% quantile cutoff on pairwise overlapping scores was used as a significant overlapping cutoff.
Calculating consensus genes weights and selecting the core signature gene set. There are multiple potential ways to identify key biomarker genes from a single training/testing dataset, for example, several sparse PLSR approaches were developed by Chun and Keles [29,30], Lê Cao et al. [31,32], and Witten and co-workers [33], which can effectively perform internal variable selections. On the other hand, since a key goal in the current work is to capture consensus information among a pan-cancer panel of cell lines, we chose to use a forward selection procedure to identify consistent gene weights among not-significantly overlapping top models. After ranking models by the split strategy described above, genes' consensus weights were calculated using the Singular Value Decomposition method (SVD) [34] to summarize genes' loading values from each top model. Then, genes' loading values from each top model were compared against genes' consensus weightings, as well as all pairwise comparisons of genes' loadings between each pair of top models (S2 Fig). An individual top model that showed the highest similarity to the consensus weightings was selected as a representative model for later steps.
Starting from the five highest weighted genes and adding one gene at a time, the representative PLSR model was retrained/retested by increasing signature gene size one at a time (a forward selection approach). The core PLSR model was identified as an early plateau point on the model performance curves on both AUC and correlation measures (Fig 1D). The resulting core PLSR model contained 191 genes.
Generation of a pathway-based "functional" PLSR model. A subset of the core PLSR model as the pathway-based PLSR model was selected using the functional analysis platform MetaCore (Thomson Reuters) [35]. The 191 genes of the core model were subjected to enrichment analysis in the canonical pathway maps ontology. The 51 genes situated on statistically significant (p-value <0.01) pathway maps were selected as the pathway-based classifier and retested on the same OncoPanel cell lines. Importantly, the EGFR ligand (NRG1) was among the signature genes, which is consistent with the fact that Erlotinib is an anti-EGFR compound.
In the 51-gene signature (S1 Table), 22 genes correlated positively and 29 genes negatively with IC50 data. Further the genes were referred to as resistance-specific and sensitivity-specific, correspondingly. In order to understand how the Erlotinib signature genes are related to the biology of drug mechanism of action, we connected the corresponding functional signatures genes into molecular networks, using data on canonical pathways and interactions from Meta-Core (Thomson Reuters) as network building blocks (Fig 2; please see S3 Fig and S1 Text for details). The networks contained genes, whose variants are significantly associated with the corresponding drug response. Specifically, EGFR mutations and amplifications of PI3KA class A are associated with sensitivity and K-Ras mutations with resistance to Erlotinib (Fig 2A, please, see S3 Table and S2 Text for details). This is consistent with the results from the BAT-TLE study, where EGFR mutations were associated with better response to Erlotinib [27]. Causal network to depict functional relations between sensitivity-specific and resistance-specific signature genes. The network was reconstructed from canonical signaling pathways regulated by signature genes and a signature specific direct interaction network. Sensitivity-specific signature genes are highlighted with blue thermometers, resistance-specific genes with red thermometers. doi:10.1371/journal.pone.0130700.g002

Building a Sorafenib sensitivity model
To test for drug-specificity of the Erlotinib PLSR model, a predictive model to Sorafenib was built using data from Oncopanel cell lines treated with Sorafenib to generate IC50 values as well as the same baseline gene expression data and modeling framework as for Erlotinib. Therefore, the data reduction is the same for both Erlotinib and Sorafenib and the models are directly comparable.
Unlike Erlotinib's IC50 distribution (S5 Fig(A)), the Sorafenib IC50 values follow a normallike distribution (S5 Fig(B), upper panel). To better separate Sorafenib sensitive vs resistant cases, the middle one-third of the cell line panel was removed before the model training process (S5 Fig(B), lower panel). Even with this filtering, it was still more difficult to identify a good predictive Sorafenib model compared to Erlotinib. An example of the random validation and balance validation performance of the Sorafenib model (S6 Fig) demonstrates it does not perform as well as the Erlotinib models ( Fig 1C). For Sorafenib predictive model building, 903 feature genes were identified, with 550 core model genes, and eventually 113 genes for the final pathway signature (S2 Table).
A network of Sorafenib signature genes was generated using protein-protein interactions and canonical pathway information ( Fig 2B, please see S1 Text and S4 Fig for details). The networks contained genes with variants significantly associated with the corresponding drug response. The genes positively correlated with IC50s and associated with resistance, tend to populate signaling pathways parallel or cross-talking to the drug target signaling pathway (Fig 2). Activation of parallel pathways by overexpression or mutations conveys the common resistance mechanism for Sorafenib. One of the Sorafenib-resistance pathways, EGFR signaling, is represented by 4 overexpressed ligands. Moreover, different genetic events in EGFR signaling are associated with Sorafenib resistance among OncoPanel cell lines (S3 Table and S2 Text for details). Importantly, EGFR mutations were associated with worse response to Sorafenib in the BATTLE study.
Testing model performance of Erlotinib and Sorafenib predictive models using the BATTLE clinical trial as an independent testing dataset The BATTLE clinical trial data was used as an independent testing dataset, to evaluate the performance of the OncoPanel cell line data derived drug sensitivity models. In this Phase II clinical trial [27], subsets of 255 NSCLC patients were treated with either Erlotinib, Vandetanib, Sorafenib or Erlotinib + bexaroten combination. Among the 255 patients, there were 131 patients with tumor samples sufficient for molecular profiling and clinically evaluable (GSE33072). Among them, data from 25 patients in the Erlotinib arm and 39 patients in the Sorafenib arm was usable for PLSR model testing.
The baseline tumor gene expression data in patients was generated using the Affymetrix HG Gene 1.0 ST array, which is a different platform than the U133plus2 array used in the OncoPanel experiments. To address the issue of platform incompatibility, the Affymetrix "U133PlusVsHuGene_BestMatch" file was used to identify matching probesets between U133plus2 and HG Gene 1.0 ST arrays, then a quantile normalization of the BATTLE data against OncoPanel gene expression was performed. All 51 genes from the Erlotinib functional pathway-based model were present in the BATTLE dataset, as well as all 113 Sorafenib signature genes.
The cutoffs of Erlotinib or Sorafenib predicted scores were data-driven by re-predicting IC50s on the Oncopanel cell lines, using the corresponding drug sensitivity models (S7 Fig). The distribution of Sorafenib predicted IC values do not have a clear separation point, so a somewhat arbitrary cutoff of 2 was selected. This happens to be the same boundary value for experimental IC50s after removing the middle one-third cell lines for Sorafenib predictive model training, providing some supporting evidence on selecting this cutoff. Since the BAT-TLE patients' baseline gene expression was normalized against the OncoPanel gene expression using the whole genome, the cutoffs defined from the OncoPanel dataset were applied to the BATTLE dataset.
The patients' PFS cutoffs for Erlotinib and Sorafenib were selected at 2.4 and 4.2 months based on their PFS distributions, respectively (S8 Fig). We are aware that these cutoff value selections are somewhat arbitrary, since the Erlotinib or Sorafinib arm in the BATTLE clinical trials has small patient numbers (25 and 39 patients for Erlotinib and Sorafenib arms, respectively). However, as shown as red horizontal lines in S8 Fig and Fig 3, these cutoffs showed reasonable PFS separation on patients treated by Erlotinib or Sorafenib, respectively. Erlotinib model predicts Erlotinib response. As shown in Fig 3A, the Erlotinib model built from the OncoPanel Erlotinib screen data (IC50s) predicted response for the Erlotinib treated patients reasonably well. The Erlotinib model performance for accuracy, sensitivity, specificity, positive predictive value, and negative predictive value are 84%, 63%, 94%, 83% and 84%, respectively.
Erlotinib model does not predict Sorafenib response. The Erlotinib model was further tested to predict the response of Sorafenib treated patients. For these heavily pre-treated patients in the BATTLE clinical trial, both the PFS and model-predicted scores suggested that the majority of the patients are not Erlotinib sensitive. The overall accuracy of the Erlotinb model in predicting Sorafenib treatment outcome was 64% (Fig 3C). Interestingly, the Erlotinib model predicted four patients in the Sorafenib treated arm to be Erlotinib sensitive but none of them was actually sensitive to Sorafenib treatment, which corresponds to a positive predictive value of 0%. In comparison, when the Erlotinib model was used to predict Erlotinib treated patients' response, the positive predictive value was 83%.
Sorafenib model does not predict Erlotinib response. The Sorafenib model failed to predict the BATTLE Erlotinib treated patients' response, with a positive predictive value of 14% and an overall accuracy of 48% (Fig 3D). It is worth noting that even in the case of using the Sorafenib model to predict Sorafenib treated patients' response, the positive predictive value was only 53% (much lower than that of the Erlotinb model's 83%). On the other hand, the Sorafenib's model had a very high negative predictive value at 96%, so that the overall model accuracy was 79% (still slightly lower than Erlotinib model's 84%).
Putting together, the Erlotinib or Sorafenib models trained from Oncopanel data can be used to predict patients' response upon corresponding treatments. However, the models performed poorly in cross-evaluation (i.e. Sorafenib signature for predicting Erlotinib response and vice versa). This suggests that both the Erlotinib and the Sorafenib models are drug specific.
Using Erlotinib and Sorafenib drug sensitivity models to predict progression free survival (PFS) in the BATTLE clinical trial Another way to assess cell line data derived drug sensitivity models was to test the models for their ability to predict PFS. Patients in the BATTLE clinical trial were assigned to a marker positive (drug sensitive) or a marker negative (drug resistant) sub-group based on the corresponding PLSR drug predicted sensitivity scores, then marker +/-patient groups were compared based on the clinical output (PFS). When the Erlotinib model is used to stratify BATTLE patients in the Erlotinib arms, the median PFS for the model-predicted Erlotinib-sensitive patient group was 3.84 month while the PFS for model-predicted Erlotinib-resistant patients was 1.84 month, corresponding to a pvalue of 0.09 and Hazard ratio of 0.43 (95% CI, 0.16 to 1.14, Fig 4A). The median PFS for all the patients in the BATTLE trial was 1.90 month, suggesting that the Erlotinib model, indeed, selected the patient group with twice as long survival. However, the survival difference between model-predicted Erlotinib sensitive vs resistant groups is not statistical significant (p-value at 0.09 and 95% CI of Hazard ratio at 0.16 to 1.14). Since all the patients in the Erlotinb treatment arm were EGFR wild-type, it was not possible to assess the Erlotinib models ability to predict outcome relative to the EGFR mutation biomarker. On the other hand, the current gene expression based biomarker works reasonably well on separating Erltonib sensitive versus resistant patients.
When using the Sorafenib model to predict PFS of BATTLE patients in the Sorafenib arm, the model identified marker-sensitive group had a survival benefit of 2.66 months (PFS) over the marker-resistant group (Fig 4B), with a p-value of 0.006 and a Hazard-ratio of 0.32 (95% CI, 0.15 to 0.72). The median PFS survival was 4.53 and 1.87 months, for marker-sensitive and marker-resistant groups, respectively. Importantly, the signatures were drug-specific and did not predict across arms. The Erlotinib predictive model failed to separate marker-sensitive vs. marker-resistant groups for Sorafenib treatment arm and vice versa, suggesting that the predictive models are drug specific (Fig 4C and 4D). Applying drug sensitive models to predict percentage of drug sensitive samples in different cancer indications After the Erlotinib and Sorafenib sensitivity models were tested using the BATTLE clinical trial data, they were further used to predict patients' tumor samples from gene expression data in the public domain. The percentage of predicted Erlotinib or Sorafenib sensitive samples were generated based on cancer indications, and a higher percentage of predicted drug sensitivity for a cancer type was considered as a potential indication that the drug may work well on treating that type of cancer.
An internal data repository was built based on the tranSMART translational medicine platform [36] using data from public GEO datasets. As GEO microarray datasets are not uniformly processed, various methods were proposed to remove batch effects among studies (please see Lazar's recent review [22] for more information). We choose the fRMA approach [37] for globally normalizing GEO datasets for two reasons: (1). fRMA uses a "frozen" reference set that makes it very easy to incrementally normalize new datasets; (2). fRMA uses the same approach as in RMA that is used for single dataset normalization. After global normalization, samples in the GEO datasets can be merged into cancer indications (see Materials & Methods for details). In total, 484 GEO studies with 16096 samples were normalized on Affymetrix microarray platform U133plus2 and merged into various cancer indications.
Extensive manual curation and text mining were performed to standardize metadata on patient and clinic features. For the purpose of applying the drug sensitivity models on patients' cancer samples, our quality control contains the following aspects: (1). Remove cell line data; (2). Remove normal tissue samples; (3). In some solid tumor studies, the samples were actually collected from blood or saliva-we also removed data from these studies. After these quality control steps, the resulting 11331 samples was categorized into various cancer indications.
Each sample's baseline gene expression was used to predict its potential Erlotinib or Sorafenib response, and percentage of predicted drug sensitive samples were calculated upon each cancer indication. Overall, differential pattern of drug sensitive indications is generally consistent with clinical trials outcomes for Erlotinib and Sorafenib, respectively (Table 1). Lung cancer was predicted to have 15.81% of Erlotinib sensitive samples, which is consistent with the fact that Erlotinib was approved to treat lung cancer patients. On the other hand, kidney cancer was predicted to have 0.46% of Erlotinib sensitive samples, which is consistent with a recent Erlotinib Phase III clinical trial that showed single agent Erlotinib treatment failed to show benefits for kidney cancer patients. For Sorafenib, 31.76% and 24.77% of kidney and liver cancer samples were predicted to be Sorafenib sensitive, respectively (Table 1), which are consistent with FDA approval of Sorafenib on treating kidney and liver cancer patients.
Several interesting predictions on other cancer types were also observed. For example, 94.05% of head & neck samples were predicted to be Erlotinib sensitive. Interestingly, there are a number of ongoing Erlotinib trials for this cancer type (http://www.clinicaltrials.gov/). In addition, Sorafenib model predictions suggested there are a high percentage of drug sensitive samples in bone cancer and haematopoietic neoplasms (Table 1). There were 5.64% and 6.47% of breast cancer samples predicted to be sensitive to Erlotinib or Sarafenib treatment, respectively. Given the high total number of breast cancer patients, these sub-population of patients may still be clinical meaningful. On the other hand, colorectal or pancreatic cancer was predicted to have very low percentage of Erlotinib or Sorafenib sensitive cases (Table 1).

Key features of the modeling framework reported in this work
Using cell line derived drug sensitivity models to predict patients' response. Systematic differences between expression patterns in human tumors and in vitro tumor cell line models [38,39] contributes to uncertainty of predictive performance of signatures generated from one source of samples (cell lines) and applied to another source (primary tumors). This issue was addressed in a number of previous cell line based signature generation studies. For example, in one NCI-60 study, the "co-expression extrapolation" (COXEN) algorithm was developed for selecting the few genes expressed in sync between cell lines and primary tumors [3]; both tumor and cell lines expression patterns were used for deriving a multi-gene predictor. One potential limitation for the COXEN algorithm was its feature selection approach: clustering analysis was conducted on both preclinical and clinical data to select input genes for model training [3]. In contrast, our model was built entirely on cell line panel data, so the BATTLE clinical dataset can be used as an independent testing dataset.
Recently, Blumenschein and coworkers identified a Sorafenib sensitivity signature from 68 NSCLC cell lines, then applied the signature to retrospectively stratify Sorafenib sensitive patients in the BATTLE clinical trial [40]. Our Sorafenib sensitivity model has similar performance in predicting patients' response to Sorafenib. However, we built our model using a mixture of cancer cell types instead of NSCLC cell lines only. As a result, our Sorafenib model was further applied to predict potential Sorafenib sensitive cancer indications (Table 1), which was consistent with FDA approved Sorafenib indications.
A specially designed splitting strategy to help us identify consensus information for predictive model building. It was reasoned that if one can identify the consensus information within a training dataset, the resulting predictive model should be robust and have high predictive power. An important feature of our approach is a specially designed splitting strategy (division on random training, random validation and balance validation sample subsets), which The predictive models were derived from cell line Oncopanel expression data. Patient data normalization is described in the result section. doi:10.1371/journal.pone.0130700.t001 creates balance validation subsets as well as random validation subsets. The feature selection was done on the whole dataset, so that the same feature genes could be used to train and compare models among different splits. This also enabled finding the best model that represented consensus information among the whole dataset. By identifying minimally overlapping high score models and calculating consensus weighting among them, the common features in the training dataset were captured. Besides the Erlotinib and Sorafenib case studies described in the current work, the method was further validated through our participation in the NCI--DREAM challenge. The NCI-DREAM challenge was a community effort comprised of 44 teams to predict drug-sensitivity to 28 drugs [41].
Cell line data derived drug sensitivity models that accurately predict patients' PFS-a translational medicine case study A translational medicine case study consisting of generating predictive models of Erlotinib and Sorafenib sensitivity from a cell line panel dataset, followed by model validation in a clinical setting is presented. The models predict clear clinical benefits: for example, the Erlotinib drug sensitivity model separated marker-sensitive and marker-resistant patient groups with the former demonstrating twice as long survival than the latter (2 months difference with a Hazard ratio of 0.43, Fig 4A).
The signature gene set comprising the Erlotinib predictive model correctly captured Erlotinib's mechanism of action as an EGFR inhibitor (Fig 2A). Moreover, the reconstructed signaling network for the Erlotinib signature featured several highly expressed growth factors linked to Erlotinib resistance (Fig 2A, please see S3 Fig and S1 Text for details) [42,43]. Similarly, the network built for the Sorafenib signature was also clearly linked to its mechanism of action ( Fig  2B, please see S4 Fig and S1 Text for details).
The training and testing datasets and the end-points were different: the models were built from a 2D in vitro cell line panel with IC50's representing drug sensitivity and the validation study was conducted on primary tumors from BATTLE clinical trial patients with progress free survival (PFS) as the clinical end-point. Moreover, the training data (Oncopanel) covers a mixture of cancer indications, while non-small cell lung cancer (NSCLC) was the only cancer indication in the BATTLE trial. Also, the signatures were generated on the Affymetrix U133plus2 platform and tested on the data generated on Affymetrix Human Gene 1.0 ST platform. Despite all these differences, the cell line derived Erlotinib and Sorafenib sensitivity models predicted BATTLE trial PFS outcomes with high accuracy of 84% and 79%, respectively (Fig 3).
Given the fact the Erlotinib is an EGFR inhibitor while Sorafenib has multiple tyrosine kinase receptors as targets, one would expect that model specificity between Erlotinib and Sorafenib would be difficult to achieve. Strikingly, the Erlotinib and Sorafenib drug sensitivity models were drug specific, i.e. the Erlotinib model failed to predict Sorafenib patients' PFS and vice versa (Figs 3 and 4).
The drug sensitivity models were used to predict percentage of drug sensitive samples per cancer indication for a large number of cancer expression datasets in various indications. The predicted Erlotinib or Sorafenib sensitive cancer indications were generally consistent with known FDA approvals on Erlotinib or Sorafenib (Table 1). Therefore, it indicates that cell-line derived models could be used to infer potential drug sensitive cancer indications. To the best of our knowledge, this is the first report on successful application of a predictive model generated from a mixture of cancer cell line screen dataset to predict potential drug-sensitive cancer indications.

Baseline gene expression datasets and data preprocess
Among the 240 Eurofin cancer cell lines, baseline gene expression data was downloaded from 170 cell lines from ArrayExpress (E-MTAB-37). The remaining 70 cell lines gene expression data was purchased from Eurofins (https://www.eurofinspanlabs.com). For testing the Erlotinib predictive model, the BATTLE trial patients' baseline gene expression [27] (GSE33072) from the GEO database was downloaded. A RMA normalization was done on each dataset separately, using Affy package from Bioconductor [44].
Since gene expression data was generated from different microarray platforms, it is necessary to normalize them before model training/testing. A "U133PlusVsHuGene_BestMatch" file was downloaded from Affymetrix website (http://www.affymetrix.com) and was used to identify good matching probesets in Affymetrix U133plus2 array (for training dataset) and in Human Gene 1.0 ST array (for testing dataset). Based on the observations from both MAQC II project [16,45] and other reports [46,47], gene expression data from different studies and/or platforms can be first normalized and then analyzed. In the current work, a quantile-based reference-RMA normalization approach [47] was used to normalize the testing dataset against the training dataset.

Compound sensitivity screen data on OncoPanel cancer cell line panel
Erlotinib and Sorafenib sensitivity screen data were purchased from Eurofins (https://www. eurofinspanlabs.com). There are 213 cell line IC50 data on both Erlotinib and Sorafenib datasets.
Two-layers of cell line data QC were conducted that reduced the number of cell lines from 213 to 183 before model training and testing. The first layer involved curve-fitting on doseresponse data for each compound. The following criteria were used to QC each compound's dose-response curve: (1). Cell lines were removed if there were three or more missing data points on the dose-response curve; (2). Cell lines were removed if there were four or more data points with CV > 30%; (3). Cell lines where the dose-response curve displays unusual behavior (e.g. has a stronger inhibitory effect at lower concentrations then at higher concentrations) were removed; (4). Cell lines that only demonstrate inhibitory effects at the highest tested concentration were removed. The second layer of QC evaluates whether cell line screen results actually reflect a drug-treatment effect. Clustering analysis was conducted among a group of internal and standard of care compounds. Compounds that have similar mechanism of action were clustered together, suggesting that cell line screen data actually reflects real drug-treatment effect and therefore could be possible to identify drug-sensitive predictive models from cell line data (data not shown).

The PLSR based modeling framework
We designed and implemented a PLSR based modeling framework on drug-sensitivity predictive model building. More details on the PLSR method and its biological applications have been previously published [24][25][26]. We selected PLS regression from various regression approaches based on the following reasons: (1). PLS regression project both independent variables (gene expression values) and dependent variables (IC50s) in new directions to find maximum co-variance between these variables, while approaches like PCA analysis only capture variance in the independent variable space; (2). PLS regression is not subject to the multiple collinearity issue; (3). PLSR works very well on high number of independent variables vs limited number of dependent variables (number of feature genes >> number of samples), which fits very well with gene signature identification for cell line screens and clinical patient samples.
In the current work, we use the "pls" package from R (version 3.0.2) for PLS regression. The loading numbers from first and second PLS components were used to calculate gene weights as sqrt(PLS1^2 + PLS2^2). The gene weights were used to compare top splitting cases and forward feature selection steps.

Reconstruction of causal network from signature genes
To reveal functional connectivity between the genes from Sorafenib and Erlotinib predictive models, we reconstructed the causal molecular network by connecting these genes via canonical molecular pathways. The network was built in several steps.
First, in addition to the predictive models' genes, we identified "topologically significant" genes for the signature genes. The algorithm of selection of topologically-significant direct regulators, or "overconnected" genes, is described in [48] and remote regulators, or "hidden" nodes, is described in [49,50].
Next, we selected molecular pathways significantly enriched with both genes of predictive signature and topologically significant genes using enrichment analysis in the Pathway Maps ontology in MetaCore (Thomson Reuters).
Finally, the identified enriched molecular pathways were manually combined in the causal molecular network. The resulting network included both sensitivity (negative correlation of expression with IC50) and resistance-specific (positive correlation of expression with IC50) genes. The networks were further populated with direct interactions between proteins encoded by signature genes. Also, we included genes with genetic alterations correlated with the drug response phenotype in OncoPanel cancer cell line panel in the reconstructed causal networks.  Table. The 51 Erlotinib signature genes. A raw p-value of each gene was calculated during feature selection step based on the fitted mean and standard deviation from the permutation data (see Methods in the main text section). Adjusted p-value was calculated using Benjamini-Hochberg control of false discovery rate. (DOCX) S2 Table. The 113 gene Sorafenib signature genes. A raw p-value of each gene was calculated during feature selection step based on the fitted mean and standard deviation from the permutation data (see Methods in the main text section). Adjusted p-value was calculated using Benjamini-Hochberg control of false discovery rate. (DOCX) S3 Table. Association analysis of genetic events and drug response. Pearson's correlation coefficient was calculated for a vector of log2(IC50) and a vector of genetic events. Positive value of correlation for a given gene was interpreted as correlation with resistance. Negative value can was interpreted as an association with sensitivity. Correlation P-values were calculated by a permutation test with 1000 shuffles. Fisher's exact test was used for association analysis between resistance/sensitivity classes and genetic events. "R" stands for resistance, "S" stands for sensitivity. (DOCX) S1 Text. Description of causal models reconstructed from Erlotinib and Sorafenib predictive models. Description and discussion of reconstructed causal models omitted from main manuscript and shown in S3 and S4 Fig and Fig 2 in the main manuscript section. (DOCX) S2 Text. Association analysis of genetic events. Additional results and methods omitted from the main manuscript. We provide results of association analysis between genetic events and drug response supporting major observations in the manuscript. (DOCX)