Advertisement
  • Loading metrics

A regularized functional regression model enabling transcriptome-wide dosage-dependent association study of cancer drug response

  • Evanthia Koukouli ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    e.koukouli@lancaster.ac.uk

    Affiliation Department of Mathematics and Statistics, Fylde College, Lancaster University, Bailrigg, Lancaster, UK

  • Dennis Wang,

    Roles Investigation, Validation, Visualization, Writing – review & editing

    Affiliations Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK, Department of Computer Science, University of Sheffield, Sheffield, UK

  • Frank Dondelinger ,

    Contributed equally to this work with: Frank Dondelinger, Juhyun Park

    Roles Conceptualization, Investigation, Methodology, Project administration, Supervision, Writing – review & editing

    Affiliation Centre for Health Informatics and Statistics, Lancaster Medical School, Lancaster University, Bailrigg, Lancaster, UK

  • Juhyun Park

    Contributed equally to this work with: Frank Dondelinger, Juhyun Park

    Roles Conceptualization, Investigation, Methodology, Project administration, Supervision, Writing – review & editing

    Affiliation Department of Mathematics and Statistics, Fylde College, Lancaster University, Bailrigg, Lancaster, UK

A regularized functional regression model enabling transcriptome-wide dosage-dependent association study of cancer drug response

  • Evanthia Koukouli, 
  • Dennis Wang, 
  • Frank Dondelinger, 
  • Juhyun Park
PLOS
x

Abstract

Cancer treatments can be highly toxic and frequently only a subset of the patient population will benefit from a given treatment. Tumour genetic makeup plays an important role in cancer drug sensitivity. We suspect that gene expression markers could be used as a decision aid for treatment selection or dosage tuning. Using in vitro cancer cell line dose-response and gene expression data from the Genomics of Drug Sensitivity in Cancer (GDSC) project, we build a dose-varying regression model. Unlike existing approaches, this allows us to estimate dosage-dependent associations with gene expression. We include the transcriptomic profiles as dose-invariant covariates into the regression model and assume that their effect varies smoothly over the dosage levels. A two-stage variable selection algorithm (variable screening followed by penalized regression) is used to identify genetic factors that are associated with drug response over the varying dosages. We evaluate the effectiveness of our method using simulation studies focusing on the choice of tuning parameters and cross-validation for predictive accuracy assessment. We further apply the model to data from five BRAF targeted compounds applied to different cancer cell lines under different dosage levels. We highlight the dosage-dependent dynamics of the associations between the selected genes and drug response, and we perform pathway enrichment analysis to show that the selected genes play an important role in pathways related to tumorigenesis and DNA damage response.

Author summary

Tumour cell lines allow scientists to test anticancer drugs in a laboratory environment. Cells are exposed to the drug in increasing concentrations, and the drug response, or amount of surviving cells, is measured. Generally, drug response is summarized via a single number such as the concentration at which 50% of the cells have died (IC50). To avoid relying on such summary measures, we adopted a functional regression approach that takes the dose-response curves as inputs, and uses them to find biomarkers of drug response. One major advantage of our approach is that it describes how the effect of a biomarker on the drug response changes with the drug dosage. This is useful for determining optimal treatment dosages and predicting drug response curves for unseen drug-cell line combinations. Our method scales to large numbers of biomarkers by using regularization and, in contrast with existing literature, selects the most informative genes by accounting for drug response at untested dosages. We demonstrate its value using data from the Genomics of Drug Sensitivity in Cancer project to identify genes whose expression is associated with drug response. We show that the selected genes recapitulate prior biological knowledge, and belong to known cancer pathways.

This is a PLOS Computational Biology Methods paper.

Introduction

Cancer is a heterogeneous disease, with individual tumours showing sometimes very different mutational and molecular profiles. The genetic makeup of a tumour influences how it reacts to a given anti-cancer drug. However, due to lack of predictive markers of tumour response, often patients with very different tumour genetic makeup will receive the same therapy, resulting in high rates of treatment failure [1]. Large clinical trials in rapidly lethal diseases are expensive, complex and often lead to failure due to lack of efficacy at a given dosage [2]. One major issue for some cancer treatments, e.g. chemotherapies, are cytotoxic effects that result in collateral damage of the healthy host tissue [3]. Patient remission depends not only on the selection of the right drug but also on the determination of the optimal dosage, especially when drugs with small therapeutic range, high toxicity levels or both are administered. Genetic factors can help fine-tune the dosage for individual patients, so that the minimal effective dosage can be delivered [4].

Treatment response in patients with specific cancers had been intensely examined in relation to the molecular characteristics of the tumours [5]. However, cellular heterogeneity within the tumour and the lack of standard metrics for quantifying drug response in patients can make it difficult to computationally model response as a function of molecular features. Cancer cell line drug screens can provide valuable information about the effect of genetic features on drug dose-response in a controlled setting. During the last decade, there have been several systematic studies that examined the relationship between genetic variants and drug response in cell lines [610]. There have also been studies that measured transcriptional profiles [11, 12] and drug response in cancer cells after administering anticancer drugs at various dosages [13, 14]. By comparing multiple genomic features of cell lines to drug response, the investigators were able to identify gene signatures for drug responsiveness in specific cancer types. However, these signatures were selected based on a single summary statistic of response, usually IC50, that may not always be the most useful metric for differentiating drugs [15], and only provides information on one dose concentration. While these existing signatures of drug response provide a way towards selecting the right drug for a patient, none of them characterize gene-dose relationships that may ultimately identify the optimal dose for a drug to use in the clinic.

With regards to the high-dimensional nature of genomic data sets, it is worth noting that highly-complex data sets with non-stationary trends are not easily amenable to analysis by classic parametric or semi-parametric mixed models. Such effects, e.g. the effect of genes on drug response over different drug dosages (dose-varying effect), can be examined using varying coefficient models which allow for the covariate effect to be varying instead of constant [16]. Methods to estimate varying covariate effects include global and local smoothing, e.g. kernel estimators [17, 18], basis approximation [19] or penalized splines [20]. Although non-parametric techniques can reduce modeling biases [21], they often suffer from the “curse of dimensionality” [22]. Inference in these models becomes impossible as the number of predictors increases, and often selecting a smaller number of important variables for inclusion into the model is clinically beneficial. Sparse regression has enabled a more flexible and computationally “inexpensive” way of choosing the best subset of predictors [23]. However, these methods cannot handle ultra-high dimensional problems without losing statistical accuracy and algorithmic stability, since they handle all of the predictors jointly. Consequently, there is a need of prior univariate tests focused on filtering out the unimportant predictors by estimating the association of each predictor to the outcome variable separately [21, 24, 25]. The advantage of using varying coefficient models along with a variable screening algorithm on genomic data sets was first introduced to explore the effect of genetic mutations on lung function [24]. Recently, Wang et al. [26] and Tansey et al. [27] independently proposed methods for modeling drug-response curves via Gaussian processes and linking them to biomarkers. In both cases, the authors did not use their models for dosage-dependent inference of biomarker effects. Additionally, the highly non-linear neural network model in Tansey et al. [27] makes interpretation of biomarker effects challenging.

Here, we extended the methodology of Chu et al. [24] to the objective of assessing the transcriptomic effect on anti-cancer drug response, where our coefficient functions were allowed to vary with dosage. We developed a functional regression framework to study the effectiveness of multiple anticancer agents applied in different cancer cell lines under different dosage levels, adjusting for the transcriptomic profiles of the cell lines under treatment. We considered a dose-varying coefficient model, along with a two-stage variable selection method in order to detect and evaluate drug-gene relationships, and then applied this method to data extracted from the Genomics for Drug Sensitivity in Cancer (GDSC) project [7]. To compare and differentiate similar treatments, we examined a case study of five BRAF targeted compounds under different dosages to almost 1000 cancer cell lines. We used baseline gene expression measurements for the cancer cell lines to investigate gene-drug response relationships for almost 18000 genes. Gene rankings were obtained based on the estimated effects of the genes on the drug response. The resulting model describes the whole dose-response curve, rather than a summary statistic of drug response (e.g. IC50), which allowed us to identify trends in the gene-drug association at untested dose concentrations.

Materials and methods

The Genomics of drug sensitivity in cancer data

Drug sensitivity data and molecular measures derived from 951 cancer cell lines used for the screening of 138 anticancer compounds were downloaded from the GDSC database (https://www.cancerrxgene.org/). We specifically focused on cell lines of cancers of epithelial, mesenchymal and haematopoietic origin treated by five BRAF targeted inhibitors (PLX-4720, Dabrafenib, HG6-64-1, SB590885 and AZ628; GDSC1 data). The maximum screening concentration for each different drug was: 10.00 uM for PLX-4720 and Dabrafenib, 5.12 uM for HG6-64-1, 5.00 uM for SB590885 and 4.00 uM for AZ628. Additionally, we used the independently generated GDSC2 data set to validate our approach on drugs targeting MEK1, MEK2 genes (Trametinib−1.00 uM; Selumetinib−10.00 uM, and; PD0325901−0.250 uM) and the PI3K/MTOR signalling pathway (Alpelisib−10.00 uM; AMG-319−10.00 uM, and; AZD8186−10.00 uM). The drug sensitivity measurement was obtained via fluorescence-based cell viability assays 72 hours after drug administration [7]. Approximately 66% of drug sensitivity responses were measured over nine dose concentrations (2-fold dilutions) and 34% were measured over five drug concentrations (4-fold dilutions). In total, we considered 3805 cancer cell line-drug combinations (experimental units). The distribution of different tissues of origin treated were similar across the different drugs tested (for additional information see S1 Fig). Paired microarray gene expression data (17737 genes) were available together with the dose-response data.

The dose-response data also included a blank response for cells on the experimental plate that had not been seeded with cells or treated with a drug. Blank responses have been used to adjust for the magnitude of the observation error while measuring the amount of cells in each plate. We used an affine transformation to the reported responses in order to normalise them within the drug concentration interval, 0 (0% of the maximum dosage) to 1 (100% of the maximum dosage). In particular, for the normalising procedure, we have used the formula: (1) where Rij is the response of the ith experimental unit at the jth dosage level, CRi is the response under no drug administration (zero dose, ni = 1), BRi is the blank response of the ith experimental unit as described above and NRij is the new score taken from the transformation, i = 1, …, 3805, j = 1, …, ni.

A two-stage algorithm for identification of gene-drug associations

Non-parametric techniques are a great tool for reducing modeling bias and producing data driven inference. However, flexible modeling techniques applied on high-dimensional genomic data sets can often cause real problems in statistical inference. Sparse regression techniques, such as the LASSO, can be used as dimensionality reduction techniques, but cannot handle ultra-high dimensional problems without introducing statistical inaccuracies, algorithmic instability and a huge computational burden [23]. Hence, the need for a feature screening algorithm which will marginally filter unimportant variables becomes essential. Below we further explain the two-stage algorithm that has been built in order to detect and explore dose-dependent gene-response associations.

Let the repeated measures data {(dij, yij, zi, xi):j = 1, …, ni, i = 1, …, n}, where yij is the response of the ith experimental unit (corresponds to a drug sensitivity assay of a specific drug on a specific cell line) at the jth drug dosage level dij and zi along with xi are the corresponding vectors of scalar (dose-invariant) covariates. The covariate vector zi = (1, zi1, …, zip)T is a low-dimensional vector of predictors that should be included in the model, whereas xi = (xi1, xi2, …, xiG)T is a high-dimensional vector, i.e. 17737 gene expression measurements, that needs to be screened. We assumed that only a small number of x-variables (in our case, genes) are truly associated with the response while most of them are expected to be irrelevant (sparsity assumption).

To explore potential dose-varying effects between the covariates and the drug response, we consider the following varying coefficient model: (2) where {βk(⋅), k = 0, …, p} and {γg(⋅), g = 1, …, G} are smooth functions of dosage level , where is a closed and bounded interval of . The errors εij were assumed to be independent across subjects and potentially dependent within the same subject with conditional mean equal to zero and variance Var(ε) = σ2(d) = V(d).

Methods for estimating the coefficient functions in Eq (2) include local and global smoothing methods, such as kernel smoothing, local polynomial smoothing, basis approximation smoothing etc. For computational convenience, in this application we used basis approximation smoothing via B-splines.

Let the sets of basis functions {Blk(⋅):l = 1, …, Lk} and and constants {ζlk: l = 1, …, Lk} and {ηlg: l = 1, …, Lg} where k = 0, …, p and g = 1, …, G such that, , βk(d) and γg(d) can be approximated by the expansion (3) (4) Substituting βk(⋅) and γg(⋅) of Eq (2) with Eqs (3) and (4), we approximated Eq (2) by (5) If Bk(⋅) and are groups of B-spline basis functions of degree qk and qg respectively, and and are the corresponding knots, then Lk = Kk + qk and Lg = Kg + qg.

Using the approximation Eq (5), the coefficients ζ = (ζ0, ζ1, …, ζp)T and η = (η1, η2, …, ηG)T can be estimated by minimizing the squared error (6) where wij are known non-negative weights.

In cases where p + G >>n though, minimisation of Eq (6) is infeasible. Our aim was to identify factors of the covariate vector x = (x1, x2, …, xG)T (genes) that are truly associated with the response (cancer cell line sensitivity to the drug). In addition, we wanted to explore potential dose-varying effects on the drug response.

We make the following sparsity assumption: any valid solution will have for the majority of components g. To detect non-zero coefficient functions, we applied a two-stage approach which incorporated a variable screening step and a further variable selection step.

Screening.

The sparsity assumption applies only to components of x, the high-dimensional covariate vector in Eq (2).

Let the set of indices (7) where ‖⋅‖2 is the L2-norm. In order to rank the different components of x, we fitted the marginal non-parametric regression model for the gth x-predictor: (8) where: and are sets of coefficient functions; and are constants to be estimated, k = 0, …, p; and, ε(g) is the error term similar to Eq (5). We then computed the following weighted mean squared error for each g ∈ {1, …, G}, (9) to quantify the importance of the gth variable. Here, (10) where is the ni × ni diagonal matrix consisting of the dose-varying variance (11) and Ri(ϕ) = (Rjk) the ni × ni working correlation matrix for the ith subject. By ϕ, we denoted the s × 1 vector that fully characterizes the correlation structure. The estimate of ϕ, , was obtained by taking the moment estimators for the parameters ϕ in the correlation structure based on the residuals obtained from fitting the following model (12) The variance function V(d) in Eq (11) was estimated using techniques described in [24].

After having obtained {}, we sorted gene utilities in an increasing order, where smaller values indicate stronger marginal associations. The x-predictors included in the screened submodel are, then, given by (13) where τn(ν) corresponds to the size of the submodel which is chosen to be smaller than the sample size n.

Variable selection using a group SCAD (gSCAD) penalty.

Screening algorithms aim to discard all unimportant variables but tend to be conservative. In order to preserve only the most important x-predictors in the final model, we considered a model including the first τn(ν) outranked genes and we applied a gSCAD penalty by minimising the following criterion: (14) (15) where α is a scale parameter, λ controls for the penalty size and ‖⋅‖ is the Euclidean ℓ2-norm. At this point, note that grouping is applied for the coefficients ηg that correspond to the same coefficient function. In addition, in order to reduce the bias introduced when applying a LASSO penalty, we alternatively chose the SCAD, which coincides with the LASSO until u = λ, then transits to a quadratic function until u = αλ and then it remains constant ∀u > αλ, meaning that it retains the penalization and bias rates of the LASSO for small coefficients but at the same time relaxes the rate of penalization as the absolute value of the coefficients increases. In Fig 1 the reader can find a brief overview of the employed methodology.

thumbnail
Fig 1. The two-stage algorithm for identifying dose-dependent associations between genes and drugs.

Gene expression and drug response data from a drug screening study (e.g. GDSC) are used to fit our dose-varying coefficients model to estimate the dose-varying effect between covariates and drug response. A two-stage variable screening and selection algorithm is applied to rank gene-drug associations. The selected genes can then be used to predict dose-dependent response for the drugs of interest.

https://doi.org/10.1371/journal.pcbi.1008066.g001

Tuning parameter selection

We used knots placed at the median of the observed data values along with cubic B-splines with 1 interior knot. The suitable number of interior knots was calculated using the formula proposed and applied by [19, 28, 29]. Due to the computational burden this would add, we did not apply cross-validation.

As for the screening threshold τn, its magnitude could be determined by the fraction , ν ∈ {1, 2, 3, …}. We conducted a pilot simulation study in order to decide the most appropriate size (for further details see S1 Text). We also considered an automated algorithm for its selection (Greedy Iterative Non-parametric Independence Screening-Greedy INIS, [21]). Finally, the penalty size for the gSCAD step λ was determined using a 5-fold cross-validation.

Simulation study

Monte Carlo simulations were conducted to examine the ability of our model to detect the genes that are truly associated with the drug response. This had a key role in tuning model parameters and simultaneously assessing model goodness-of-fit using a fraction of the original data set in order to reduce the computational burden of conducting a simulation study under the original dimensions of the data. Responses over different dosage levels were generated based on a subset of genes, the corresponding low-dimensional GDSC data covariates (drug and cancer type) and some prespecified smooth coefficient functions (see S1 Text). In particular, we repeatedly sampled without replacement 190 experimental units and 886 genes based on which the simulated responses have been generated. The performance of the employed methodology has been assessed based on 1000 simulations using three screening thresholds ( and τn(ν) chosen using the greedy-INIS algorithm [21]) and two estimated covariance structure scenarios (independence and rational quadratic covariance structure). Cubic B-splines and knots placed at the median of the observed data values have been used for estimating the coefficient functions.

To evaluate the performance of the proposed procedure we used the following summary measures: TP−number of genes correctly identified as active; FP−number of the genes incorrectly identified as active; TN−number of the genes correctly identified as inactive; FN−number of the genes incorrectly identified as inactive.

Simulation results suggested that our method accurately detects the drug associated genes from the simulated responses under most of the examined scenarios (S1 Text). A screening threshold of size and regression weights adjusted for the covariance structure of the data were identified as the scenario where our method reached its maximum accuracy. Consequently, for the GDSC application, we chose the screening threshold to be the maximum possible, i.e. 923 genes derived from the formula , and weights derived by assuming a rational quadratic covariance structure for the repeated measures.

Software availability

The analysis has been conducted using R version 3.6.3. Code for applying the two-stage variable selection algorithm is available online as an R package at https://github.com/koukoulEv/fbioSelect.

Results and discussion

Dose-dependent associations with gene expression in a large-scale drug sensitivity assay

We applied the two stage variable selection algorithm under the dose-varying coefficient model framework described above. Gene rankings and predicted mean drug effects over different dosage levels were obtained. Our algorithm identified 230 candidate genes associated with drug response. The effect of each of those genes was assessed with respect to:

  1. the area under the estimated coefficient curve (AUC) and its corresponding standard deviation (estimated using bootstrapping);
  2. the effect on cell survival (overall positive, overall negative, mixed);
  3. Spearman correlation between the coefficient function value and the dosage level;
  4. the mean fold change of the expression of cell lines carrying BRAF mutations with respect to wild type; and,
  5. the protein-protein interaction network distance between the BRAF gene and the selected genes using the Omnipath database [30].

The 230 genes were ranked based on the estimated AUC value (S1 Table), and the top 30 genes were highlighted for further analysis (Table 1). The higher the AUC, the larger the effect of the gene on the drug response. The overall effect on cell survival can be either positive, negative or vary over the different dosage levels as determined by the range of the estimated coefficient function. Spearman’s rank correlation was used as an indicator of the coefficient function’s monotonicity by characterising the progress of the genetic effect over different dosage levels. For instance, high expression of the C3orf58 gene at baseline has a positive effect on cancer cell survival, which becomes stronger as the dosage increases (Spearman correlation = 0.922). In other words, high expression of this gene can be an indicator of drug resistant cell lines. On the other hand, the DLC1 gene has an overall decreasing and negative effect on cancer cell survival (Spearman correlation = -0.928) which suggested that as the dosage increases, higher baseline expression of this gene can indicate higher drug sensitivity at higher dosage. Elevated expression of DLC1 has been observed in melanoma and is a well known tumour suppressor that could be a novel marker of BRAF inhibition [31]. Finally, in cases where the overall effect varies (changes between positive and negative), the effect of gene expression on the drug response depends on the drug dosage. In particular, the effect of DLX6 increases and then decreases at higher dosages (Fig 2). Given the biological and technical variation in drug screens, we should treat the mean effect estimates with caution and consider the confidence intervals of the coefficient functions in order to derive conclusions about the exact effect of the selected genes on the dose response (Fig 2).

thumbnail
Table 1. Top 30 gene rankings based on the estimated area under the coefficient function curve.

https://doi.org/10.1371/journal.pcbi.1008066.t001

thumbnail
Fig 2. Estimated coefficient functions for the low-dimensional predictors and three of the selected genes.

Estimated coefficient functions for the intercept, different drugs, tissue of origin and three of the selected genes along with 95% bootstrap confidence intervals. Baseline corresponds to BRAF mutant cell lines treated with Dabrafenib in skin tumours.

https://doi.org/10.1371/journal.pcbi.1008066.g002

Coefficient function estimates provide a lot of information about the dosage, cancer type and genetic effects on drug response. Fig 2 illustrates the estimated coefficient functions for different drugs, cancer types and three genes in relation to the model intercept, Dabrafenib response in BRAF mutant cell lines originating from the skin (melanoma). Except from HG6-64-1, all other BRAF inhibitors (AZ628, SB590885 and PLX4720) showed no additional effect compared to the intercept. Similar patterns can be observed for cancer cell lines coming from most of the tissues examined. This result indicates that the examined drugs may have similar or worse behaviour over the different dosages for most of the examined cancer types. We observed greater efficacy (negative values of the coefficient function) for cell lines originating from the endocrine system, autonomic ganglia and heamatopoietic and lymphoid tissues at lower dosages. The observed effect in endocrine system cell lines reflects the Dabrafenib responses observed in anaplastic thyroid cancer patients [32]. Interestingly, the drug Trametinib, taken in combination with Dabrafenib is a MEK inhibitor, and genes interacting with MEK (MAP2K1) were selected features from our model (Fig 3A). Together these results provide important insights into the effectiveness of the five BRAF targeted drugs examined on different cancer types, highlighting the potential for effective treatment of a wide range of cancers given the tumour genetic characteristics.

thumbnail
Fig 3. Protein-protein interaction network for the genes selected from the two-stage variable selection algorithm.

(A) Undirected protein-protein interaction network between the 230 selected (blue) and the BRAF (red) genes (full scale analysis). (B) Undirected protein-protein interaction network between the 65 genes selected from the two-stage variable selection algorithm for the cell lines resistant to BRAF inhibitors (blue) and the BRAF (red) gene. In both panels genes depicted with black are the interaction mediators. Common mediators include the HRAS, MAPK1, MAP2K1 and BAD genes.

https://doi.org/10.1371/journal.pcbi.1008066.g003

Since the BRAF gene is the target of the drugs, mean fold change and protein-protein interaction network distance were used to examine whether and how the selected genes are related to the target of the inhibition. From the selected genes, 120 genes had a mean fold change greater than 1 whereas the rest had a mean fold change between 1 and 0.792. Some of the genes with the highest mean fold change of BRAF mutation were PSMC3IP, KIF3C, UBE2Q2, SERPIND1 and PLAT, however only PLAT is displayed in Table 1. From the genes identified through the two-stage algorithm, 35% of them encode proteins interacting with the BRAF gene, though none of them directly. Most of the selected genes interact with the BRAF gene via pathways mediated by HRAS, MAPK1 (ERK), MAP2K1 (MEK) and BAD (Fig 3A).

Since HRAS mutations are frequent in patients receiving BRAF targeted therapies [33], we examined the mean estimated trajectory over different dosages under treatment with BRAF inhibitors tested in six cancer cell lines with and without BRAF and HRAS mutations (Fig 4). As stated previously, we observed that in most cases HG6-64-1 seems to be the most effective drug. The estimated coefficient functions facilitate drug examination and response prediction under the different dosages. In some instances, we observed different drugs having similar behaviour for lower drug dosages and larger divergence for higher dosages. In most cases, regardless of the cell line origin, our method successfully estimates the expected survival rates of the cancer cell lines for the different drugs given their gene expression information.

thumbnail
Fig 4. Estimated mean drug response trajectories for six cancer cell lines with BRAF and HRAS mutations.

Observed responses (points) and estimated mean trajectory (lines) of cell concentration for cancer cell lines with and without BRAF and HRAS mutations after treatment with the five anticancer compounds examined.

https://doi.org/10.1371/journal.pcbi.1008066.g004

For validation purposes, we performed the same analysis using the independently-generated GDSC2 data set, with a different set of drugs. Note that the drug set in GDSC2 only partially overlaps with the one used in GDSC1. The results are reported in supplementary S2 Fig, and show similar properties to the analysis of the BRAF drugs in Fig 4. As the GDSC2 dose-responses are produced from independently-generated experiments, the measured drug response is different for some of the drug-cell line combinations. We observe some divergences between GDSC1 and GDCS2 estimated trajectories for Dabrafenib and PLX-4720, which can be explained due to measurement error in the GDSC experiments themselves.

Variable selection algorithm identifies cancer pathways associated with BRAF inhibitor response

Using our functional regression approach, we identified 230 genes that were selected via the SCAD step (observed gene set). We used the Enrichr [34, 35] and WikiPathways [36] databases to see if the selected genes can be grouped into common functional classes or pathways. In total, 183 were identified, of which 11 were statistically significant at 5% level, including apoptosis modulation, NOTCH1 regulation, and MAPK signaling (S2 Table). The model identified genes (IKBKB, RASGRF1, DUSP16, DUSP8, DUSP6, MAPT and IL1R2) downstream of the MAPK signaling pathway targeted by BRAF inhibitors.

Previous studies of these pathways have found associations with tumorigenesis and cancer treatment [3740]. Genes in more than one of these pathways include IKBKB, PLAT, IL1R2 and PDPK1. The IKB kinase composed of IKBKB had previously been suggested as a marker of sensitivity for combination therapy with BRAF inhibitors [41]. Taken together, these results suggest that the identified associations between the drug response and the observed genes may reveal new predictive markers of tumour response to the examined BRAF inhibitors.

In addition to the pathway enrichment analysis, we used the Molecular Signatures Database (MSigDB database v7.0 updated August 2019: [42]) to compute overlaps between the observed gene set and known oncogenic gene sets. Fig 5A and S3 Table display the 29 overlaps found. Interestingly, we identified three instances where the observed gene set significantly overlapped with gene sets over-expressing an oncogenic form of the KRAS gene.

thumbnail
Fig 5. Overlaps between the observed gene set and oncogenic signatures in the Molecular Signatures Database (full data analysis); signalling pathways enriched for genes predictive of BRAF inhibitor response (resistant cell lines).

(A) Full gene set names can be found in S3 Table. Overlaps have been detected using gene set enrichment analysis performed using a hypergeometric distribution. The false discovery rate analog of the hypergeometric p-value is displayed after correction for multiple hypothesis testing according to Benjamini and Hochberg [43]. (B) Top 20 enriched signalling pathways along with the adjusted p-values and the number of overlapping genes obtained after pathway enrichment analysis to the resistant cell line analysis results (for the full list of the pathways identified see S6 Table).

https://doi.org/10.1371/journal.pcbi.1008066.g005

We further explored potential biologically relevant pathways using the Reactome database [44, 45]. More than 40 enriched pathways were identified at a 5% significance level. The top 40 pathways are depicted in Fig 6 along with the pathway-gene network of the top 5 pathways (for the full list, see S4 Table). Interestingly, axon guidance and VEGF signalling were among the enriched pathways, confirming relevance of the selected genes to the intended role of the examined compounds, since BRAF kinase activity drives axon growth in the central nervous system [46] and VEGF blockade has potential anti-tumour effects when combined with BRAF inhibitors [47]. Note that axon growth has not, thus far, been directly implicated in BRAF inhibitor activity, and, consequently, our analysis provides important evidence towards this theory.

thumbnail
Fig 6. Pathway enrichment analysis using the Reactome database.

(A) Top 40 enriched signalling pathways along with the adjusted p-values. For the full list, see S4 Table. (B) Pathway-gene network of the top 5 enriched signalling pathways as found using Reactome [44, 45].

https://doi.org/10.1371/journal.pcbi.1008066.g006

Identifying dose-dependent genes in drug-resistance conditions

Acquired resistance to BRAF inhibitors is often observed in the clinic [48]. To further examine the utility of the employed methodology, we applied the variable selection algorithm to a data subset containing only cell lines with mutations activating resistance mechanisms to BRAF inhibitors [49]. Out of the 951 cell lines in the data, 191 had some mutation in any of the following: RAC1 gene, NRAS gene, cnaPANCAN44 or cnaPANCAN315. We identified 65 genes associated with dose-response, though none of them were directly associated with the MAPK/ERK pathway. However, from these, 25 genes have been found to indirectly interact with the BRAF gene (Fig 3B) and 21 to overlap with three oncogenic gene sets in the Molecular Signatures Database (genes down-regulated in NCI-60 panel of cell lines with mutated TP53; genes up-regulated in Sez-4 cells (T lymphocyte) that were first starved of IL2 and then stimulated with IL21, and; genes down-regulated in mouse fibroblasts over-expressing E2F1 gene; S5 Table). Finally, we found 34 pathways enriched for genes predicting drug response of the mutated cell lines to the examined BRAF inhibitors, of which the top 20 are depicted in Fig 5B (for the full list of the pathways identified see S6 Table).

Table 2 presents gene rankings based on the AUC and the overall coefficient function effect (sign) for the 42 genes in either the enriched pathways, the three oncogenic gene sets discussed above or the protein-protein interaction network with the BRAF gene (full list available in S7 Table). Eight of the selected genes in the current implementation were also selected from the algorithm implemented on the full data: ASB9, PRSS33, GJA3, PLAT, KLF9, BFSP1, MTARC1 and UCN2.

thumbnail
Table 2. Rankings of the genes identified from the pathway and oncogenic gene set enrichment analysis.

https://doi.org/10.1371/journal.pcbi.1008066.t002

Predictive performance of dose-dependent models

As discussed above, the employed methodology gives a good overview of the baseline genetic effect on drug response. We assessed the overall predictive performance of our method using 10-fold cross validation under two different scenarios. For the first, we split the data into training and test set holding out the experimental units (cancer cell line-drug combinations) and for the second, holding out cancer cell lines. The absolute mean error for both cases was around 0.12. Our analysis showed robust cross-validated performance when it comes to predicting sensitivity to the administered drugs, as shown in S3 Fig which displays the correlation between predicted and true response. This result was further validated by repeating the analysis on the independently-generated GDSC2 data set, using a different set of drugs (S4 Fig), which demonstrated comparable predictive performance.

Predictive accuracy for the dose-response curves was evaluated under four different sub-scenarios: prediction of the most effective drug-dosage combination for the 951 cell lines in the data set; prediction of the most effective drug given a cell line; prediction of the most effective dosage given treatment with a particular drug and prediction of the most effective dosage range given treatment with a drug (Table 3). The proposed model performs well when it comes to predicting the most effective drug or dosage range (≈79% in both scenarios). Results are less reliable when it comes to prediction of the exact dosage or drug-dosage combination (≈48-49% and ≈57-58% in both scenarios) but this can be due to either the large variability observed in the observed responses or due to the small number of cell lines for some predictor level combinations. Results were similar for both cross-validation scenarios (differences range from 0 to <2%, Table 3), meaning that as long as a cell line has similar genetic characteristics to those observed, the model can be reliable in predicting the outcome after anticancer drug administration.

thumbnail
Table 3. Predictive performance of the employed model (mean absolute error = 0.121).

https://doi.org/10.1371/journal.pcbi.1008066.t003

We additionally compared the performance of our two-stage algorithm approach to a penalized linear (LASSO) regression for predicting the IC50 and area under the dose-response curve (AUC). Note that our functional regression model is not directly predicting either of these values, but rather predicts the full drug-response curve. As this is a harder problem, we would expect the LASSO to have a natural advantage; however, our method has the added benefit of being able to detect dose-dependent associations, which is not possible when predicting summary statistics of the dose-response curve directly. We employed 10-fold cross-validation to evaluate the predictive error in terms of root mean squared error (RMSE), and we used a sigmoid curve fit for estimating the IC50 values from the predicted dose-response curves with our two-stage method. Our method outperformed standard LASSO in terms of predicting the AUC (RMSE2−stage = 0.176; RMSELASSO = 0.347) and performed well on predicting the IC50, although the LASSO performed better (RMSE2−stage = 1.969; RMSELASSO = 1.134). This could be expected, as estimating the IC50 from the predicted dose-response curve adds a further level of complexity, compared to directly predicting this value using the LASSO.

Conclusion

Genetic alternations and gene expression in tumours are known to affect disease progression and response to treatment. Here, we studied dosage-dependent associations between gene expression and drug response, using a functional regression approach which adjusts for genetic factors. We analysed data from the Genomics of Drug Sensitivity in Cancer project relating to drug effectiveness for suspending cancer cell proliferation under different dosages, and examined five BRAF targeted inhibitors, each applied in a number of common and rare types of cancer cell lines. Our implementation of a two-stage screening algorithm revealed a number of genes that are potentially associated with drug response. Gene, drug and cancer type trajectories have been modeled using a varying coefficient modeling framework. The proposed methodology allows for dose-dependent analysis of genetic associations with drug response data. It enables us to study the effect of different drugs simultaneously, which results in high accuracy of drug response prediction. Drug comparisons using the proposed methodology could support drug repositioning, especially in diseases where existing treatment options are limited. In addition, our methodology can help to reveal unknown potential relationships between genetic characteristics and drug efficacy. Hence, the good predictive performance of our method could be due to the fact that some genes may act as proxies for unmeasured phenotypes that are directly relevant to drug sensitivity.

Our work relies on two major assumptions. First, that out of tens of thousands genes regulating protein composition only a small proportion is actually associated with cancer cell survival in a dosage-dependent manner. In other words, transcriptomic profiles exert influence on disease progress after drug administration in a sparse and dynamic way. However, if a large number of genes is associated with the drug response, our method may produce biased results, and some important information about the biological mechanisms can be lost. Secondly, we assume that the different drugs are comparable on the scale of maximum dosage percentage level for our joint model. We acknowledge that different drugs have different chemical structure and maximum screening concentrations. Our focus is to identify genetic components that could be informative for dose response given drugs that belong to a particular family, for example BRAF targeted therapies. However, our methodology is flexible enough to allow each drug to be examined separately if it appears to be clinically appropriate.

Drug response prediction from gene expression data has been widely studied in the literature. Sparse regression methods, gene selection algorithms such as the Ping-pong algorithm [50], or a combination of network analysis and penalized regression, e.g. the sparse network-regularized partial least squares method [51], have all been employed to simultaneously predict drug response and select genetic factors that seem to be associated with the drug response. However, none of these methods are able to quantify the effect of drug dosage on the response, which is one of the main contributions of this work. Employing the proposed dose-varying model gives a detailed picture of different drug effects and can be extremely valuable in predicting drug response for agents with small therapeutic range and high toxicity levels. Our algorithm showed moderate predictive performance due to the complexity of predicting whole drug-response curves. Methods for further enhancing the performance of the proposed methodology, such as judicious use of prior information and leveraging information sharing across multiple data sources should be explored in the future in order to overcome this issue and make good use of its full potentials.

To conclude, the main purpose of this paper is to examine the dose-dependent associations between genes and drugs. The proposed methodology, by using the raw data to infer the effects of interest, allows to obtain a more comprehensive picture of the biological mechanisms that undergo cancer treatment and the role of drug dose on that. In addition, due to its simple structure, it allows extension to different types of molecular data (e.g. RNA-seq gene expression, methylation or mutational profiles) and enrichment with further information, such as drug chemical composition.

Supporting information

S1 Text. Accurate detection of drug associated genes from simulated responses.

Simulated responses have been generated to examine the accuracy of the employed method in detecting the genes that are truly associated to drug response. Three screening thresholds, three active gene sets and two covariance structure scenarios for the repeated measurements simulation have been considered. This text includes all the details of the simulation study that we conducted.

https://doi.org/10.1371/journal.pcbi.1008066.s001

(PDF)

S1 Fig. Distribution of tissue of origin across the five BRAF compounds used for cell line screening in the Genomics of Drug Sensitivity in Cancer data.

Overall, similar proportion of cell lines have been treated with all of the compounds examined with smaller number of cell lines been treated with AZ628, Dabrafenib and PLX-4720. Larger number of cell lines in the data set were originated from the lungs, the gastrointestinal tract and the haematopoietic and lymphoid tissues.

https://doi.org/10.1371/journal.pcbi.1008066.s002

(TIF)

S2 Fig. Estimated mean drug response trajectories for BRAF and HRAS mutated and non-mutated cancer cell lines: analysis performed on GDSC2 data.

Observed responses (points) and estimated mean trajectory (lines) of cell concentration for cancer cell lines with and without BRAF and HRAS mutations after treatment with the eight anticancer compounds examined using data from GDSC2.

https://doi.org/10.1371/journal.pcbi.1008066.s003

(TIF)

S3 Fig. Prediction accuracy for each different drug and scenario.

Pearson correlation was estimated across observed and predicted AUC values. AUC values have been computed by calculating the area under the coefficient function curve (both observed and predicted). Training and test sets have been considered based on either the experimental units or on cancer cell lines only.

https://doi.org/10.1371/journal.pcbi.1008066.s004

(TIF)

S4 Fig. Prediction accuracy for each different drug and scenario: analysis performed on GDSC2 data.

Pearson correlation across observed and predicted AUC values. AUC values have been computed by calculating the area under the coefficient function curve (both observed and predicted) using the GDSC2 data. Training and test sets have been considered based on either the experimental units or on cancer cell lines only.

https://doi.org/10.1371/journal.pcbi.1008066.s005

(TIF)

S1 Table. Full gene rankings based on the estimated area under the coefficient function curve (analysis on the full data set).

Gene rankings of all selected genes based on the magnitude of the genetic effect on drug response. A positive (+) sign translates to a positive effect on cells survival after drug administration, a negative (-) sign translates to a negative effect on cells survival and a mixed (0) effect translates to a varying effect on cells survival which depends on drug dosage. Spearman’s correlation is calculated between drug dosage and gene estimated coefficient function values as an indicator of the magnitude change of the gene effect over the increasing dosage. Area corresponds to the area under the estimated coefficient curve and the SD corresponds to the standard deviation of the area based on bootstrapping. Mean fold change is calculated between the selected gene expression values of the cell lines carrying BRAF mutations with respect to wild type. Protein-protein interaction network distance is computed based on the shortest interaction path between the BRAF gene and each of the selected genes. Here, NI denotes absence of any interaction.

https://doi.org/10.1371/journal.pcbi.1008066.s006

(XLSX)

S2 Table. Signalling pathways linked to genes predictive of BRAF inhibitor response (analysis on the full data set).

Signalling pathways along with the adjusted p-values and the number of overlapping genes obtained after pathway enrichment analysis applied to the full scale analysis results.

https://doi.org/10.1371/journal.pcbi.1008066.s007

(XLSX)

S3 Table. Overlaps between the observed gene set and oncogenic signatures in the Molecular Signatures database (analysis on the full data set).

Overlaps have been detected using gene set enrichment analysis performed using a hypergeometric distribution. The false discovery rate analog of the hypergeometric p-value is displayed after correction for multiple hypothesis testing according to Benjamini and Hochberg.

https://doi.org/10.1371/journal.pcbi.1008066.s008

(XLSX)

S4 Table. Signalling pathways linked to genes predictive of BRAF inhibitor response (analysis on the full data set)-Reactome.

Signalling pathways along with the adjusted p-values and the number of overlapping genes obtained after pathway enrichment analysis applied to the full scale analysis results using the Reactome database.

https://doi.org/10.1371/journal.pcbi.1008066.s009

(XLSX)

S5 Table. Overlaps between the observed gene set and oncogenic signatures in the Molecular Signatures database (resistant cell lines analysis).

Overlaps have been detected using gene set enrichment analysis performed using a hypergeometric distribution. The false discovery rate analog of the hypergeometric p-value is displayed after correction for multiple hypothesis testing according to Benjamini and Hochberg.

https://doi.org/10.1371/journal.pcbi.1008066.s010

(XLSX)

S6 Table. Signalling pathways linked to genes predictive of BRAF inhibitor response (analysis on resistant cell lines).

Signalling pathways along with the adjusted p-values and the number of overlapping genes obtained after pathway enrichment analysis applied to the resistant cell line analysis results.

https://doi.org/10.1371/journal.pcbi.1008066.s011

(XLSX)

S7 Table. Full gene rankings based on the estimated area under the coefficient function curve (analysis on resistant cell lines).

Gene rankings of all selected genes based on the magnitude of the genetic effect on drug response. A positive (+) sign translates to a positive effect on cells survival after drug administration, a negative (-) sign translates to a negative effect on cells survival and a mixed (0) effect translates to a varying effect on cells survival which depends on drug dosage. Spearman’s correlation is calculated between drug dosage and gene estimated coefficient function values as an indicator of the magnitude change of the gene effect over the increasing dosage. Area corresponds to the area under the estimated coefficient curve and the SD corresponds to the standard deviation of the area based on bootstrapping. Mean fold change is calculated between the selected gene expression values of the cell lines carrying BRAF mutations with respect to wild type. Protein-protein interaction network distance is computed based on the shortest interaction path between the BRAF gene and each of the selected genes. Here, NI denotes absence of any interaction.

https://doi.org/10.1371/journal.pcbi.1008066.s012

(XLSX)

Acknowledgments

We acknowledge Emily Chambers and the Sheffield Bioinformatics Core Facility for their assistance in GDSC2 data pre-processing. Their input to this study is greatly appreciated.

References

  1. 1. Chang JC, Wooten EC, Tsimelzon A, Hilsenbeck SG, Gutierrez MC, Elledge R, et al. Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. The Lancet. 2003;362(9381):362–369. pmid:12907009
  2. 2. Cook D, Brown D, Alexander R, March R, Morgan P, Satterthwaite G, et al. Lessons learned from the fate of AstraZeneca’s drug pipeline: a five-dimensional framework. Nature Reviews Drug Discovery. 2014;13(6):419–431. pmid:24833294
  3. 3. Corrie PG. Cytotoxic chemotherapy: clinical aspects. Medicine. 2008;36(1):24–28.
  4. 4. Relling MV, Dervieux T. Pharmacogenetics and cancer therapy. Nature Reviews Cancer. 2001;1(2):99. pmid:11905809
  5. 5. Zhang W, Liu Y, Sun N, Wang D, Boyd-Kirkup J, Dou X, et al. Integrating genomic, epigenomic, and transcriptomic features reveals modular signatures underlying poor prognosis in ovarian cancer. Cell Reports. 2013;4(3):542–553. pmid:23933257
  6. 6. Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, et al. A landscape of pharmacogenomic interactions in cancer. Cell. 2016;166(3):740–754. pmid:27397505
  7. 7. Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Research. 2012;41(D1):D955–D961. pmid:23180760
  8. 8. Hyman DM, Taylor BS, Baselga J. Implementing genome-driven oncology. Cell. 2017;168(4):584–599. pmid:28187282
  9. 9. Ben-David U, Siranosian B, Ha G, Tang H, Oren Y, Hinohara K, et al. Genetic and transcriptional evolution alters cancer cell line drug response. Nature. 2018;560:325–330. pmid:30089904
  10. 10. Tavassoly I, Hu Y, Zhao S, Mariottini C, Boran A, Chen Y, et al. Genomic signatures defining responsiveness to allopurinol and combination therapy for lung cancer identified by systems therapeutics analyses. Molecular Oncology. 2019;13(8):1725–1743. pmid:31116490
  11. 11. Ji RR, de Silva H, Jin Y, Bruccoleri RE, Cao J, He A, et al. Transcriptional profiling of the dose response: a more powerful approach for characterizing drug activities. PLoS Computational Biology. 2009;5(9). pmid:19763178
  12. 12. Delpuech O, Rooney C, Mooney L, Baker D, Shaw R, Dymond M, et al. Identification of pharmacodynamic transcript biomarkers in response to FGFR inhibition by AZD4547. Molecular Cancer Therapeutics. 2016;15(11):2802–2813. pmid:27550940
  13. 13. Falcetta F, Lupi M, Colombo V, Ubezio P. Dynamic rendering of the heterogeneous cell response to anticancer treatments. PLoS Computational Biology. 2013;9(10):e1003293. pmid:24146610
  14. 14. Silverbush D, Grosskurth S, Wang D, Powell F, Gottgens B, Dry J, et al. Cell-specific computational modeling of the PIM pathway in acute myeloid leukemia. Cancer Research. 2017;77(4):827–838. pmid:27965317
  15. 15. Keshava N, Toh TS, Yuan H, Yang B, Menden MP, Wang D. Defining subpopulations of differential drug response to reveal novel target populations. NPJ Systems Biology and Applications. 2019;5:36. pmid:31602313
  16. 16. Hastie T, Tibshirani R. Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological). 1993;55(4):757–779.
  17. 17. Wu CO, Chiang CT, Hoover DR. Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. Journal of the American Statistical Association. 1998;93(444):1388–1402.
  18. 18. Wu CO, Chiang CT. Kernel smoothing on varying coefficient models with longitudinal dependent variable. Statistica Sinica. 2000; p. 433–456.
  19. 19. Huang JZ, Wu CO, Zhou L. Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica. 2004; p. 763–788.
  20. 20. Qu A, Li R. Quadratic inference functions for varying-coefficient models with longitudinal data. Biometrics. 2006;62(2):379–391. pmid:16918902
  21. 21. Fan J, Feng Y, Song R. Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association. 2011;106(494):544–557. pmid:22279246
  22. 22. Geenens G, et al. Curse of dimensionality and related issues in nonparametric functional regression. Statistics Surveys. 2011;5:30–43.
  23. 23. Song R, Yi F, Zou H. On varying-coefficient independence screening for high-dimensional varying-coefficient models. Statistica Sinica. 2014;24(4):1735. pmid:25484548
  24. 24. Chu W, Li R, Reimherr M. Feature screening for time-varying coefficient models with ultrahigh dimensional longitudinal data. The Annals of Applied Statistics. 2016;10(2):596. pmid:27630755
  25. 25. Fan J, Ma Y, Dai W. Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. Journal of the American Statistical Association. 2014;109(507):1270–1284. pmid:25309009
  26. 26. Wang D, Hensman J, Kukaite G, Toh TS, Dry JR, Saez-Rodriguez J, et al. A statistical framework for assessing pharmacological response and biomarkers with confidence. BioRxiv. 2020.
  27. 27. Tansey W, Li K, Zhang H, Linderman SW, Rabadan R, Blei DM, et al. Dose-response modeling in high-throughput cancer drug screenings: An end-to-end approach. arXiv preprint arXiv:181205691. 2018.
  28. 28. Xue L, Qu A, Zhou J. Consistent model selection for marginal generalized additive model for correlated data. Journal of the American Statistical Association. 2010;105(492):1518–1530.
  29. 29. Xue L, Qu A. Variable selection in high-dimensional varying-coefficient models with global optimality. Journal of Machine Learning Research. 2012;13(Jun):1973–1998.
  30. 30. Türei D, Korcsmáros T, Saez-Rodriguez J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nature Methods. 2016;13(12):966. pmid:27898060
  31. 31. Yang X, Hu F, Liu JA, Yu S, Cheung MPL, Liu X, et al. Nuclear DLC1 exerts oncogenic function through association with FOXK1 for cooperative activation of MMP9 expression in melanoma. Oncogene. 2020;39(20):4061–4076. pmid:32214200
  32. 32. Subbiah V, Kreitman RJ, Wainberg ZA, Cho JY, Schellens JH, Soria JC, et al. Dabrafenib and trametinib treatment in patients with locally advanced or metastatic BRAF V600–mutant anaplastic thyroid cancer. Journal of Clinical Oncology. 2018;36(1):7. pmid:29072975
  33. 33. Sharma SP. RAS mutations and the development of secondary tumours in patients given BRAF inhibitors. The Lancet Oncology. 2012;13(3):e91.
  34. 34. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14(1):128. pmid:23586463
  35. 35. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research. 2016;44(W1):W90–W97. pmid:27141961
  36. 36. Slenter DN, Kutmon M, Hanspers K, Riutta A, Windsor J, Nunes N, et al. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Research. 2017;46(D1):D661–D667.
  37. 37. Rangaswami H, Bulbule A, Kundu GC. Osteopontin: role in cell signaling and cancer progression. Trends in Cell Biology. 2006;16(2):79–87. pmid:16406521
  38. 38. Sharma N, Jha S. NLR-regulated pathways in cancer: opportunities and obstacles for therapeutic interventions. Cellular and Molecular Life Sciences. 2016;73(9):1741–1764. pmid:26708292
  39. 39. Whyte J, Bergin O, Bianchi A, McNally S, Martin F. Key signalling nodes in mammary gland development and cancer. Mitogen-activated protein kinase signalling in experimental models of breast cancer progression and in mammary gland development. Breast Cancer Research. 2009;11(5):209. pmid:19818165
  40. 40. Mortezaee K, Salehi E, Mirtavoos-mahyari H, Motevaseli E, Najafi M, Farhood B, et al. Mechanisms of apoptosis modulation by curcumin: Implications for cancer therapy. Journal of Cellular Physiology. 2019;234(8):12537–12550. pmid:30623450
  41. 41. Colomer C, Margalef P, Villanueva A, Vert A, Pecharroman I, Solé L, et al. IKKα kinase regulates the DNA damage response and drives chemo-resistance in cancer. Molecular Cell. 2019;75(4):669–682. pmid:31302002
  42. 42. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545–15550. pmid:16199517
  43. 43. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 1995;57(1):289–300.
  44. 44. Wilke CO. Bringing molecules back into molecular evolution. PLoS Computational Biology. 2012;8(6):e1002572. pmid:22761562
  45. 45. Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, et al. The reactome pathway knowledgebase. Nucleic Acids Research. 2020;48(D1):D498–D503. pmid:31691815
  46. 46. O’Donovan KJ, Ma K, Guo H, Wang C, Sun F, Han SB, et al. B-RAF kinase drives developmental axon growth and promotes axon regeneration in the injured mature CNS. Journal of Experimental Medicine. 2014;211(5):801–814. pmid:24733831
  47. 47. Comunanza V, Corà D, Orso F, Consonni FM, Middonti E, Di Nicolantonio F, et al. VEGF blockade enhances the antitumor effect of BRAFV 600E inhibition. EMBO Molecular Medicine. 2017;9(2):219–237. pmid:27974353
  48. 48. Solit DB, Rosen N. Resistance to BRAF inhibition in melanomas. New England Journal of Medicine. 2011;364(8):772–774. pmid:21345109
  49. 49. Manzano JL, Layos L, Bugés C, de los Llanos Gil M, Vila L, Martinez-Balibrea E, et al. Resistant mechanisms to BRAF inhibitors in melanoma. Annals of Translational Medicine. 2016;4(12). pmid:27429963
  50. 50. Kutalik Z, Beckmann JS, Bergmann S. A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nature Biotechnology. 2008;26(5):531. pmid:18464786
  51. 51. Chen J, Zhang S. Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data. Bioinformatics. 2016;32(11):1724–1732. pmid:26833341