Fig 1.
(a) Flowchart of our general classification approach, showing the network smoothing procedure applied to multiple data types: somatic mutations, differentially expressed genes, and protein targets for a particular drug. Smoothed mutations, differential expression, and drug targets are combined into network proximity measures by computing the element-wise minimum of the smoothed scores. Correlation is computed between LINCS expression profiles and tumor gene expression measurements. UPMC and TCGA samples are handled identically for most of the analysis pipeline, up to cross-validation, where UPMC samples are used both in isolation and in combination with TCGA samples. (b) Feature availability for the data types used in this analysis.
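The combination step in panel (a) can be illustrated with a minimal sketch. It assumes two smoothed score matrices of identical shape (samples × genes) are already available. The function name `network_proximity` and the toy values are hypothetical and are not taken from the analysis pipeline; the network smoothing step itself is not shown.

```python
import numpy as np


def network_proximity(smoothed_a, smoothed_b):
    """Element-wise minimum of two smoothed sample-by-gene score matrices.

    Hypothetical helper: a gene scores highly for a sample only when it is
    close, in the network, to BOTH sources of signal.
    """
    a = np.asarray(smoothed_a, dtype=float)
    b = np.asarray(smoothed_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("smoothed matrices must have the same shape")
    return np.minimum(a, b)


# Toy values (2 samples x 3 genes), invented for illustration only.
smoothed_mutations = np.array([[0.8, 0.1, 0.0],
                               [0.2, 0.5, 0.3]])
smoothed_targets = np.array([[0.4, 0.3, 0.0],
                             [0.6, 0.1, 0.3]])

proximity = network_proximity(smoothed_mutations, smoothed_targets)
print(proximity)
```

Taking the minimum rather than, say, the product keeps the combined score on the same scale as the inputs, which is one plausible reason for this choice; the caption does not state the motivation.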
Fig 2.
(a) Leave-one-out cross-validation results for prediction of non-response to all aromatase inhibitors, using random forests and probabilistic support vector machines. (b) Feature importance from the random forest cross-validation results, showing which constructed features contribute most to the random forest fit. Features prefixed with “Min.” denote the element-wise minimum of a pair of matrices; for example, the first feature listed is the element-wise minimum of the smoothed (“sm.”) drug targets of Arimidex and the smoothed binary differential expression. “sm. {ESR1, ESR2}” denotes network proximity to the ESR1 and ESR2 genes. Sample×gene matrices are collapsed across genes in various ways to produce feature values for samples: the mean or standard deviation across all genes, or a PCA decomposition. Categorical clinical features are represented with one-hot encoding and are shown as “feature name_column name”, e.g. “er_cell_percentage_90-99%”.
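The feature construction described in panel (b) can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used for the analysis. The helper names `collapse_features` and `one_hot` are hypothetical, PCA is computed here via an SVD of the column-centered matrix, and the sample standard deviation (`ddof=1`) is an assumption because the caption does not say which estimator was used.

```python
import numpy as np


def collapse_features(matrix, n_components=2):
    """Collapse a sample-by-gene matrix into per-sample feature vectors."""
    x = np.asarray(matrix, dtype=float)
    features = {
        "mean": x.mean(axis=1),        # mean across genes, per sample
        "sd": x.std(axis=1, ddof=1),   # assumption: sample standard deviation
    }
    # PCA scores from the SVD of the column-centered matrix.
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    for k in range(scores.shape[1]):
        features[f"pc{k + 1}"] = scores[:, k]
    return features


def one_hot(feature_name, values):
    """One-hot encode a categorical feature as 'feature name_column name'."""
    levels = sorted(set(values))
    return {
        f"{feature_name}_{level}": [int(v == level) for v in values]
        for level in levels
    }


# Toy inputs (3 samples x 3 genes), invented for illustration only.
toy_matrix = [[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 0.0, 3.0]]
numeric = collapse_features(toy_matrix)
clinical = one_hot("er_cell_percentage", ["90-99%", "<10%", "90-99%"])
print(sorted(numeric), sorted(clinical))
```

Each returned vector has one entry per sample, so the numeric and one-hot columns can be concatenated into a single feature table before cross-validation.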
Fig 3.
(a) Cell line growth ratios (day 5 cell count / day 0 cell count), with and without 1 nM serum estrogen. (b) The cell line growth measure defined in Eq 6. A threshold of 1.0 was used to classify cell lines as responsive (green) or non-responsive (red).
Fig 4.
Cross-validation prediction results for non-response to anastrozole.
(a) Results for non-response to anastrozole with all available TCGA samples. (b) Prediction results restricted to UPMC patients.
Fig 5.
Performance comparison between multiple prediction strategies, for prediction of aromatase inhibitor non-response.