Network-guided prediction of aromatase inhibitor response in breast cancer

Prediction of response to specific cancer treatments is complicated by significant heterogeneity between tumors in terms of mutational profiles, gene expression, and clinical measures. Here we focus on the response of Estrogen Receptor (ER)+ post-menopausal breast cancer tumors to aromatase inhibitors (AI). We use a network smoothing algorithm to learn novel features that integrate several types of high-throughput data and new cell line experiments. These features greatly improve the ability to predict response to AI when compared to prior methods. For a subset of the patients, for whom we obtained more detailed clinical information, we can further predict response to a specific AI drug.


Cell Lines
shows information for the cell lines studied in this work.

Comparison with other methods
Reijm et al.
Reijm et al. [1] describe an eight-gene classifier for prediction of aromatase inhibitor response, and present t-statistic values for the association of these genes with patient response. Though there are conceptual differences between t-statistic values and logistic regression coefficients, we can nonetheless use these t-values to produce continuous predictions of patient response from log-fold gene expression data.
We extract a subset of the gene expression data, E, with rows as patients and columns as these eight genes, and collect the t-statistic values into a vector τ with matching ordering of genes. Note that Reijm et al. denote higher association with response as higher t-statistic values, whereas we predict non-response; we therefore use the additive inverse of their values in the vector τ. We then compute the prediction score p_i for patient i as:
$$p_i = \sum_{j=1}^{8} E_{ij}\,\tau_j \tag{1}$$
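The scoring step described above can be sketched in a few lines. This is a minimal illustration, assuming the score is the inner product of each patient's row of E with τ; the function name `reijm_scores` and the toy numbers are hypothetical and are not taken from Reijm et al. or from our data.

```python
def reijm_scores(expression, t_statistics):
    """Score each patient as a weighted sum of expression values.

    expression   -- list of patient rows, one log-fold value per gene
    t_statistics -- published t-statistics, in the same gene order as the rows
    """
    # Negate the t-statistics: higher published values indicate response,
    # whereas the score here is meant to rise with non-response.
    tau = [-t for t in t_statistics]
    scores = []
    for row in expression:
        if len(row) != len(tau):
            raise ValueError("each patient row needs one value per gene")
        scores.append(sum(e * w for e, w in zip(row, tau)))
    return scores


# Toy input with three hypothetical genes (the classifier itself uses eight).
toy_expression = [[1.0, -2.0, 0.5], [0.0, 1.0, -1.0]]
toy_t_statistics = [2.0, -1.0, 4.0]
toy_scores = reijm_scores(toy_expression, toy_t_statistics)
```

Because the scores are continuous, they can be ranked directly for an ROC analysis without choosing a decision threshold.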

Hofree et al., network-based stratification of tumor mutations
We apply Hofree et al.'s network-based stratification of tumor mutations [2] to this mutation data, with parameter k chosen to produce two unordered clusters of patients. We evaluate both permutations of these two clusters, and use the numeric cluster assignments c ∈ {1, 2} from the optimal clustering as patient predictions. We see that the optimal permutation of clusters is not particularly informative for predicting response either to all aromatase inhibitors or to anastrozole specifically (ROC AUC 0.5316 and 0.5630, respectively).
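The evaluation of both cluster permutations can be illustrated as follows. This is a sketch only, not the NBS implementation: it assumes cluster assignments are already available, uses a pairwise (Mann-Whitney) estimate of ROC AUC, and the names `roc_auc` and `best_cluster_orientation` as well as the toy labels are hypothetical.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC.

    labels -- 1 for the positive class (non-responder), 0 otherwise
    scores -- numeric predictions; ties count as half a win
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cluster_orientation(labels, clusters):
    """Try both labelings of a two-cluster assignment (values in {1, 2})
    and return (auc, predictions) for the better one."""
    swapped = [3 - c for c in clusters]  # maps 1 -> 2 and 2 -> 1
    candidates = [(roc_auc(labels, clusters), clusters),
                  (roc_auc(labels, swapped), swapped)]
    return max(candidates, key=lambda pair: pair[0])


# Toy data: five hypothetical patients.
toy_labels = [1, 1, 0, 0, 1]
toy_clusters = [1, 1, 2, 2, 2]
toy_auc, toy_predictions = best_cluster_orientation(toy_labels, toy_clusters)
```

Swapping the two cluster labels turns an AUC of a into 1 − a, so selecting the better permutation always yields a value of at least 0.5.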

Leiserson et al., WExT mutational exclusivity
We also use Leiserson et al.'s WExT method [3] to identify sets of significantly mutually exclusive mutations across the entire patient cohort. Genes common to the same pathway often show mutual exclusivity in somatic mutations [4], and identification of such gene sets shows promise in de novo pathway identification [5]. We therefore treat gene sets identified by WExT with significant mutational exclusivity (P < 0.002 as reported by the tool) as potential pathways. For each of the 18 such gene sets, we compute a Boolean vector for our patient cohort, with 1 denoting that the patient has a somatic mutation in that gene set, and 0 otherwise. This produces a binary gene set membership matrix G with rows as patients and 18 columns, corresponding to these exclusive gene sets. We compute the row-wise sum of this matrix, assigning each patient an integer score in {0, ..., 18} equal to the number of putative de novo pathways that are mutated in that patient. We hypothesize that a higher mutational load will correspond to drug non-response, and indeed this "WExT Mutation Set Count" score is reasonably informative in prediction of non-response to anastrozole (ROC AUC 0.6212), though it is less informative in the "all aromatase inhibitor" prediction task (ROC AUC 0.5509).
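The construction of the membership matrix G and the resulting count can be sketched as below. The sketch assumes the exclusive gene sets have already been obtained from WExT; the function name `mutation_set_counts`, the placeholder gene names, and the toy patients are hypothetical.

```python
def mutation_set_counts(patient_mutations, gene_sets):
    """Build the binary membership matrix and its row sums.

    patient_mutations -- dict mapping patient ID to the set of mutated genes
    gene_sets         -- list of gene sets treated as putative pathways
    Returns (membership, counts), both keyed by patient ID.
    """
    membership = {}
    for patient, mutated in patient_mutations.items():
        # Entry is 1 if the patient has any mutated gene in the set, else 0.
        membership[patient] = [1 if mutated & gene_set else 0
                               for gene_set in gene_sets]
    counts = {patient: sum(row) for patient, row in membership.items()}
    return membership, counts


# Toy input: three placeholder gene sets (the analysis above uses 18).
toy_gene_sets = [{"GENE_A", "GENE_B"}, {"GENE_C"}, {"GENE_D", "GENE_E"}]
toy_mutations = {
    "patient_1": {"GENE_A", "GENE_D"},
    "patient_2": {"GENE_C"},
    "patient_3": set(),
}
toy_membership, toy_counts = mutation_set_counts(toy_mutations, toy_gene_sets)
```

A patient with several mutated genes in the same set still contributes only 1 for that set, so the count reflects the number of affected sets rather than the number of mutations.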

Wang et al., similarity network fusion
We apply Wang et al.'s similarity network fusion [6] method to our data, using a concatenation of the binary somatic mutation matrix M and the log-fold expression matrix E. We use parameters suggested in the SNF paper and package documentation: number of neighbors K = 20, affinityMatrix hyperparameter α = 0.5, SNF iterations T = 20. As in SNF example code, we perform spectral clustering on the SNF results to separate the samples into two groups. As in our usage of Hofree et al.'s NBS method, we use the group assignments as class labels, and select the optimal permutation of these class labels as response predictions for samples.

Cell Line Experiments
Figure L shows our predictions of cell line non-response to serum estrogen, using classifiers trained on patient response to aromatase inhibitors. Figure M shows correlation between these predictions and the cell line growth measure defined in the main text.

Figure Q shows a consolidated ROC plot for non-response to all aromatase inhibitors, including additional variants of our analysis that are not shown in Figure 5. Figure R shows a similar ROC plot specifically for non-response to anastrozole, including only samples which were administered that drug.

TP53 CDH1 EGFR LINC01194 BLM RNF39 SLC2A12 TDRD12 FCAMR AJAP1 PADI1 AIFM2 MMP13 GPX2 BRCA1 KDM4D RPS15P2 USP29 DUSP26 TTLL5 KIAA0087 TEP1 RRM2B TSPAN10 CYP2C19 NEIL3 C10orf90 NRF1 ZMIZ1 ZEB1 ZIC3 613037 AGBL2 DMTF1 VEZT GREB1 ITPKC MAP9 ZNF385A TRIM59 CDC14B STAT5A TP53INP1 JMY CDON PLK3 TOP1MT CABLES2 CA11