
The authors have declared that no competing interests exist.

Conceived and designed the experiments: EJM AK WW CS . Performed the experiments: WW MLM NPG XJ PK QH. Analyzed the data: EJM AK. Contributed reagents/materials/analysis tools: GM DBS CAP. Wrote the paper: EJM AK WW AP CS. Reverse Phase Protein Array Platform: GM. Cell lines: DBS CAP. Development and optimization of code: EJM AK AB AP. Statistical physics: MW AB AP RZ. Network modeling with synthetic data: EJM. Network modeling with biological data: AK.

We present a powerful experimental-computational technology for inferring network models that predict the response of cells to perturbations, and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is quantified in terms of relative changes in the measured levels of proteins, phospho-proteins and cellular phenotypes such as viability. Computational network models are derived from these response profiles and used to predict the effects of novel perturbations.

Drugs that target specific effects of signaling proteins are promising agents for treating cancer. One of the many obstacles facing optimal drug design is inadequate quantitative understanding of the coordinated interactions between signaling proteins.

Abnormal biomolecular information flow as a result of genetic or epigenetic alterations may lead to tumorigenic transformation and malignancy, and is classically modeled as changes in signaling pathways.

High throughput measurements of the response profiles of living cells to multiple perturbations, such as drug combinations, provide a rich set of information for constructing quantitative cell biology models. In this paper, we construct context-specific network models of signaling in cancer cells from such perturbation response data.

Previous mathematical models of molecular signaling in cells have been effective in modeling pathways and enhancing drug discovery.

We take a unique modeling approach to construct context-specific, quantitative network models directly from perturbation response data.

Perturbing cancer cells with targeted drugs singly and in pairs (A) reveals context-specific response to therapies and illuminates protein interactions. We construct dynamic mathematical models of the cells' response to drugs that have both quantitative parameters (B) and a qualitative network interpretation (C). We use an inference algorithm called Belief Propagation (BP) to construct a set of good, i.e., predictive models (D).

The largest obstacle to inferring such network models is the combinatorial complexity of the space of possible model configurations.

An ingenious, two-step approach to deal with network inference in larger systems is based on first calculating probability distributions for each possible interaction in the model and then computing distinct solutions by sampling these probability distributions. For this purpose, we employ a probability model of network configurations inspired by statistical physics principles. Following a set of approximations to simplify the probability model, we apply a custom adaptation of an iterative algorithm called Belief Propagation (BP). BP involves local optimization updates to probability distributions of individual model parameters that converge to a stable set of probability distributions, which collectively describe a set of good network model solutions.

Our algorithmic network pharmacology approach involves four major steps: (i) perturbation experiments with combinations of targeted compounds; (ii) high-throughput quantitative measurements of proteomic changes (e.g., reverse phase protein arrays or mass spectrometry) and phenotypic changes (e.g., cell viability or apoptosis); (iii) inference of quantitative network models of protein signaling that explain and link these changes; and (iv) use of the network models to predict cellular and molecular responses to diverse perturbations, beyond the conditions on which the network models are derived.

In this work, we adapt BP to construct quantitative network models of signaling pathways from systematic perturbation experiments. We evaluate the speed and accuracy of BP on toy data generated from biologically inspired network structures. The inference on this toy data reveals that BP offers a significant improvement in computational efficiency compared to traditional Monte Carlo simulations without a sacrifice in accuracy. Furthermore, we construct network models of signaling in a RAF inhibitor resistant melanoma cell line (SKMEL-133), which has the BRAFV600E mutation.

Key decisions in modeling a biological cellular system include the choice of variables and the mathematical framework for representing system dynamics. Here, we work with a fairly simple but powerful nonlinear ordinary differential equation framework, in which each variable x_i^μ(t) represents the value of node i at time t under perturbation condition μ, and evolves as

dx_i^μ(t)/dt = ε_i tanh( Σ_j w_ij x_j^μ(t) + u_i^μ ) − α_i x_i^μ(t)   (Equation 1)

where u_i^μ represents the perturbation applied to node i in condition μ, ε_i sets the maximal rate of change, and α_i is the decay rate of node i.

The system variables represent the measured molecular and phenotypic quantities: protein and phospho-protein levels, cellular phenotypes such as viability, and the (unmeasured) activities of the perturbed drug targets.

Theoretically, the model variables can quantify any measure of interest. While absolute protein concentrations are one option, such data are difficult to acquire in high throughput assays. In this study, we focus on log2-ratios of abundances in perturbed conditions to abundances in the unperturbed condition. Consequently, model variables can take both negative and positive values, which denote decreased or increased quantities of the corresponding biological entity. We choose to normalize all measurement levels against unperturbed levels in order to focus on signaling differences due directly to perturbation.
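As a concrete illustration of this normalization (a minimal sketch with made-up measurement values, not data from the study), halving, no change, and doubling of a measured level map to responses of −1, 0 and +1:

```python
import numpy as np

# Hypothetical measured levels (arbitrary units) for one protein:
# an unperturbed control and three perturbed conditions.
unperturbed = 1200.0
perturbed = np.array([600.0, 1200.0, 2400.0])

# Model variables are log2-ratios of perturbed to unperturbed levels,
# so halving gives -1, no change gives 0, doubling gives +1.
responses = np.log2(perturbed / unperturbed)
print(responses)  # [-1.  0.  1.]
```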

The rate of change of any variable, in this formulation, is predominantly influenced by the additive linear combination of the upstream nodes {x_j}, transformed through a saturating nonlinearity and balanced by a decay term that returns the variable to its unperturbed level (Equation 1).

The network models are parameterized by the square interaction matrix W, whose N^2 entries w_ij quantify the influence of (upstream) node j on (downstream) node i: w_ij = 0 denotes no interaction, while positive and negative values denote activating and inhibitory interactions, respectively.
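The dynamics described above can be sketched numerically. The following is a minimal illustration, assuming a tanh nonlinearity with unit production and decay rates (ε_i = α_i = 1) and a perturbation entering as a constant bias; the exact functional form and parameter values in the study may differ:

```python
import numpy as np

def simulate(W, x0, u, eps=1.0, alpha=1.0, dt=0.01, steps=5000):
    """Euler-integrate dx_i/dt = eps * tanh(sum_j W_ij x_j + u_i) - alpha * x_i.

    W : (N, N) interaction matrix (w_ij: effect of node j on node i)
    x0: (N,) initial responses (log2-ratios; zero means unperturbed)
    u : (N,) constant perturbation applied to each node
    """
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (eps * np.tanh(W @ x + u) - alpha * x)
    return x  # approximate steady state

# Toy 3-node cascade: node 0 activates node 1, node 1 activates node 2.
W = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
u = np.array([-1.0, 0.0, 0.0])  # constant inhibitory perturbation on node 0
x_ss = simulate(W, np.zeros(3), u)
print(x_ss)  # the perturbation propagates down the cascade
```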

The problem of deriving useful models of a (biological) system is called ‘model inference’. The objective of model inference, given a mathematical framework like that described above, is to find a set of parameters such that the model equations best reproduce a training set of experimental data and have predictive power beyond the training set. In the present modeling framework, we aim to find numerical values for the N^2 interaction parameters {w_ij} that minimize the discrepancy between model-predicted and experimentally measured responses.

In principle, to infer optimal network configurations one has to compute the cost of all possible network configurations. However, explicit enumeration and cost calculation of all possible parameter configurations {w_ij} is intractable: for a system of the size considered here, the number of discrete configurations is on the order of 10^190, obviously a very large number, making explicit enumeration prohibitive.

The solution space refers to the set of all possible model configurations. A reasonably clever strategy to traverse this enormous solution space is guided random exploration, e.g., by a traditional Monte Carlo search, in which random moves in multi-dimensional parameter space are kept or rejected based on the cost of the resulting configuration, with a non-zero but small probability of accepting higher cost configurations in order to facilitate the escape from local minima. In an earlier study, we successfully used a Monte Carlo search followed by a modified gradient descent method to derive a set of low cost models for a relatively small system. This earlier algorithm achieved a reasonable exploration of solution space for a system of 14 variables, as assessed by the recurrence of dominant interactions across the set of a few hundred low-cost models and the agreement of those interactions with well-established knowledge of signaling pathways in cell biology. However, the computational cost of such a search grows rapidly with the number of model variables, rendering Monte Carlo exploration impractical for substantially larger systems.
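A Metropolis-style Monte Carlo search of the kind described above can be sketched as follows; the cost function, discretization and inverse temperature here are illustrative stand-ins, not the settings of the earlier study:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_search(cost, W0, values, beta=50.0, n_steps=20000):
    """Metropolis search over discrete interaction parameters.

    cost  : function mapping a candidate matrix W to a scalar error
    W0    : initial (N, N) parameter matrix
    values: allowed discrete parameter values
    """
    W, c = W0.copy(), cost(W0)
    best_W, best_c = W.copy(), c
    for _ in range(n_steps):
        i, j = rng.integers(W.shape[0]), rng.integers(W.shape[1])
        old = W[i, j]
        W[i, j] = rng.choice(values)        # propose a single-parameter move
        c_new = cost(W)
        # accept downhill moves always; uphill moves with small probability
        if c_new <= c or rng.random() < np.exp(-beta * (c_new - c)):
            c = c_new
            if c < best_c:
                best_W, best_c = W.copy(), c
        else:
            W[i, j] = old                    # reject: restore previous value
    return best_W, best_c

# Toy check: recover a known sparse matrix from a squared-error cost.
values = np.round(np.linspace(-1.0, 1.0, 11), 1)
W_true = np.array([[0.0, 0.6, 0.0],
                   [-0.4, 0.0, 0.0],
                   [0.0, 0.0, 0.8]])
best_W, best_c = mc_search(lambda W: np.sum((W - W_true) ** 2),
                           np.zeros((3, 3)), values)
```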

In search of a more efficient algorithm, we adopt an idea originally developed in statistical physics, and widely used in solving complicated optimization problems in computer science and other areas. Instead of sampling a prohibitively large, unrestricted solution space by traversing a set of individual configurations, the idea is to first calculate high probability regions and then restrict exploration to this smaller solution space. In particular, we describe high probability regions by calculating probability distributions of individual model parameters over possible value assignments. Then, we can generate distinct model configurations by sampling from the calculated probability distributions.

Models with a large error (or cost) have low probability, while those with a low error have high probability. More precisely, the probability of any particular model can be computed from its cost C(W), which depends on the parameters in the interaction matrix W:

P(W) = (1/Z) exp( −β C(W) )   (Equation 3)

The variable Z is the partition function, which ensures that the sum of the probabilities over all model configurations is equal to one. In the statistical physics analogy, the exponents contain interaction energies and the parameter β is an inverse temperature (1/T), such that higher values of β concentrate the probability mass on low-cost models.
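The cost-to-probability mapping and the role of β can be illustrated with a toy calculation over three hypothetical models:

```python
import numpy as np

costs = np.array([0.1, 0.5, 2.0])       # errors of three candidate models

def model_probs(costs, beta):
    # P(model) = exp(-beta * cost) / Z, with Z normalizing the sum to 1
    w = np.exp(-beta * costs)
    return w / w.sum()                   # w.sum() plays the role of Z

p_low = model_probs(costs, beta=1.0)
p_high = model_probs(costs, beta=10.0)
# Higher beta (lower temperature) concentrates probability on low-cost models.
print(p_low, p_high)
```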

An effective solution is to use an iterative algorithm to approximate the probability distributions of the individual parameters by themselves, often called marginal probability distributions, or simply ‘marginals’. From these marginals, we can describe high probability model configurations for the full system. This iterative algorithm begins with a set of random marginals. In each iteration step, one assumes approximate knowledge of all parameter marginals (‘global information’) and then performs optimization updates on an individual marginal (‘local update’). The local update takes immediate effect and becomes part of the ‘global information’ for successive iterations as the algorithm traverses over all marginals for individual updating. The iteration terminates when it converges to a stable set of marginals. The nature of any local update to a single parameter (e.g., a node-node interaction parameter) is a calculation optimizing a balance of fitness to experimental data and consistency with the global information. The iterative application of this ‘global to local and back’ optimization strategy results in marginals for all system parameters given a probabilistic model. Such optimal marginals are informative by themselves, but are also useful for constructing a population of explicit individual high probability model configurations, which are useful for model simulation studies.

This type of probabilistic method originates in statistical physics and has been generalized to a number of hard optimization problems in statistical physics and computer science. An early application of such probabilistic inference was inverse parameter inference for disordered diluted spin systems.

The BP approach, also known as the Bethe-Peierls approximation or cavity method in statistical physics, provides an approximate method for computing marginal probability distributions on a class of probabilistic graphical models called factor graphs. In general, a joint probability distribution over many variables may factorize into a product of factors. A factor in a factor graph represents an independent contribution to the joint probability distribution, and is connected to the variables that depend on that factor. Typically, a factor defines a constraint on a subset of variables. The BP method is proven to be exact on tree-shaped factor graphs. It has many useful applications in approximating distributions on sparse factor-graphs.

Top panel: the global information consists of collecting the probability distributions of the non-cavity parameters without the contribution from the cavity condition. This is a simple product over the marginals of all parameters other than the cavity parameter w_ik.

A major advantage of BP algorithms is the reduction of computational complexity. This not only leads to a substantial reduction in computational effort for smaller systems but also opens the door to solving inference problems for larger systems, which would otherwise be prohibitive.

We use a series of assumptions, described below, to factorize the probability model in Equation 3 into a form that can be efficiently calculated without sacrificing the quantitative and predictive nature of the models. The assumptions below reduce the problem from a probability distribution over whole model configurations (Equation 3) into a collection of marginal probability distributions for each individual parameter. Subsequent sampling from these individual marginals will result in efficient exploration of high probability model configurations.

To simplify the probability model, we compute the probability distributions for model parameters over discrete values, from a set Ω, rather than for continuous values. The choice of discretization is an important detail and affects the convergence properties of the BP algorithm and the quality of the resulting marginals. Empirically, with the data set at hand we find that a set of 11 discrete values, centered at zero, rarely fails to converge to a stable set of marginals. Conversely, searching over only 3 weight values results in a high rate of non-convergence. As for the quality of the resulting marginals, the entropies are close to zero if we limit the search to only 3 discrete values (see Supplemental Information).
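The effect of discretization on marginal informativeness can be illustrated with the Shannon entropy of a marginal; the 11-value grid below is an assumed example, not necessarily the exact grid used in the study:

```python
import numpy as np

# A possible discretization: 11 values centered at zero.
omega = np.linspace(-1.0, 1.0, 11)

def entropy(p):
    """Shannon entropy (in bits) of a marginal over the discrete values."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

uniform = np.full(11, 1.0 / 11)   # maximally uncertain marginal
peaked = np.zeros(11)
peaked[5] = 1.0                   # marginal locked at the zero value
print(entropy(uniform), entropy(peaked))  # ~3.46 bits vs 0 bits
```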

In the dynamic model, the system variables are coupled such that a change to any variable propagates to all others via the time derivatives as in Equation 1. A rigorous way to compute the fitness of a configuration is to simulate a configuration and then compare simulation output to the training data. Such a computation, while feasible in principle, is very costly. An alternative is to take advantage of the relationship at the steady state (Equation 4) where the time-derivative is equal to zero.

Equation 4 is a system of self-consistent equations for all variables {x_i}. Rather than solving this system explicitly for every candidate model, we approximate the model-predicted steady-state values of the upstream variables x_j^μ with their experimentally measured values x_j^μ*.

This approximation decouples the variables from each other: the model-predicted value of x_i^μ depends only on the i-th row of the interaction weight matrix W and on the measured values x_j^μ*.
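Under this decoupling, the cost of one row of W can be evaluated directly from the measured responses. A minimal sketch, assuming the steady-state relation x_i = (ε/α)·tanh(Σ_j w_ij x_j*) and using synthetic numbers in place of measurements:

```python
import numpy as np

def row_cost(w_i, X_star, y_star, eps=1.0, alpha=1.0):
    """Cost of one row of W under the steady-state decoupling.

    w_i   : (N,) candidate interaction parameters into node i
    X_star: (M, N) measured responses of all nodes over M conditions
    y_star: (M,) measured responses of node i in those conditions
    """
    pred = (eps / alpha) * np.tanh(X_star @ w_i)  # predicted steady states
    return float(np.sum((pred - y_star) ** 2))

# Synthetic stand-in for measured data, consistent with a known row.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))
w_true = np.array([0.8, 0.0, -0.6, 0.0])
y = np.tanh(X @ w_true)
print(row_cost(w_true, X, y), row_cost(np.zeros(4), X, y))
```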

We introduce the notation W_i for the i-th row of the interaction matrix, containing the parameters {w_ij} for all interactions into node i. With the decoupling above, the cost function separates into independent contributions, one per row W_i, so each row can be inferred separately.

As already mentioned above, the BP method consists of randomly ordered updates to the marginal probabilities for individual parameters, one at a time. Updates continue until convergence, when the marginal probabilities do not change between consecutive updates. We describe the method in detail below for a single row W_i.

A local update takes place inside an abstract ‘cavity’, which isolates a single parameter whose marginal distribution is to be updated (w_ik).

Recall that we are optimizing the parameters in a single row of W. Each experimental condition μ contributes a factor to the probability model, and each factor depends on all parameters w_ij in that row.

The BP algorithm begins with random initial messages between the condition factors μ and the parameters w_ij, and then iteratively updates the cavity marginal of each parameter w_ik.

The exponent contains the cost contribution of condition μ, which is a function of the row parameters (w_ij); in the cavity update it is evaluated with the cavity parameter w_ik held at each candidate value.

We define W_i\k as the i-th row of the interaction matrix excluding the cavity parameter w_ik. The cavity update for w_ik under condition μ then requires a sum over the configurations of W_i\k, weighted by the current cavity marginals of those parameters.

In the field of optimization algorithms, these equations are sometimes referred to as messages because they communicate information between variable nodes and factor nodes on the factor graph, where in this case the variable nodes in the factor graph correspond to model parameters. Thus, BP belongs to a class of ‘message-passing’ algorithms. It is common to see Equation 7 referred to as the messages from the variable nodes to the factor nodes.

Conversely, the messages from the factor nodes to the variable nodes, ρ^μ_ik(w_ik), are computed by summing the cost contribution of condition μ over all configurations of the remaining parameters W_i\k, weighted by their cavity marginals.

Therefore, the updated cavity marginal of w_ik is proportional to the product of the incoming messages ρ^ν_ik(w_ik) from all conditions ν other than the cavity condition.

This equation is equivalent to the sum-product formulation, which is standard in the BP literature.

Another complication in Equation 10 is that a brute force implementation of the sum operation requires enumeration over an exponentially large number of configurations, K^(N−1) in total. Here, we replace the sum over multivariate configurations of W_i\k with a tractable approximation of the distribution of their combined contribution.

To complete the substitution of the sum over W_i\k, we compute the mean and variance of the combined contribution of the parameters in W_i\k under their cavity marginals.

The explicit calculation of these quantities, and the resulting update equations, are given in the Methods.

In summary, the following BP equations are calculated for each cavity update iteratively until convergence.

When the above iterative process converges, the final marginals are calculated from the set of factors, reflecting the information from experimental constraints.
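To make the ‘global to local and back’ loop concrete, the following is a deliberately simplified, mean-field style caricature of the row update: each candidate value of one parameter is scored against the data while the other parameters sit at the means of their current marginals. This omits the cavity messages and approximations of the real algorithm and is meant only to illustrate the structure of the iteration:

```python
import numpy as np

def infer_row_marginals(X, y, omega, beta=20.0, n_sweeps=50):
    """Mean-field caricature of the BP row update (not the exact algorithm).

    X    : (M, N) measured upstream responses over M conditions
    y    : (M,) measured responses of the target node
    omega: allowed discrete parameter values
    """
    M, N = X.shape
    P = np.full((N, len(omega)), 1.0 / len(omega))  # start from flat marginals
    for _ in range(n_sweeps):
        means = P @ omega                           # current 'global information'
        for k in range(N):
            cost = np.empty(len(omega))
            for a, v in enumerate(omega):
                w = means.copy()
                w[k] = v                            # candidate local assignment
                cost[a] = np.sum((np.tanh(X @ w) - y) ** 2)
            p = np.exp(-beta * (cost - cost.min()))
            P[k] = p / p.sum()                      # local update, used immediately
            means[k] = P[k] @ omega
    return P

# Synthetic check: marginals should concentrate near the generating row.
omega = np.linspace(-1.0, 1.0, 11)
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))
w_true = np.array([0.8, -0.4, 0.0])
y = np.tanh(X @ w_true)
P = infer_row_marginals(X, y, omega)
means = P @ omega
```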

The BP algorithm provides marginals characterizing a set of good models. Thus, we reduce the unbounded model search space to a set of tractable probability distributions for model parameters. Next, one must generate high probability models by drawing from the BP calculated marginals.

We need distinct model solutions to proceed with predictive and quantitative analysis of signaling pathways via explicit model simulations. Distinct solutions are derived by the BP guided decimation algorithm: a single parameter is selected and fixed to a value ω_kl sampled from its BP computed marginal (w_kl = ω_kl), BP is re-run to convergence under this constraint, and the procedure is repeated until every parameter is fixed, yielding one complete network model. Repeating the whole procedure with different random choices yields a population of distinct models.
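The decimation loop can be sketched as follows; run_bp is a placeholder for re-running the BP iteration with the already-fixed parameters clamped, and the ordering heuristic (fixing the most polarized parameter first) is one reasonable choice rather than necessarily the one used in the study:

```python
import numpy as np

rng = np.random.default_rng(2)

def decimate(run_bp, omega, n_params):
    """Sketch of BP guided decimation.

    run_bp(fixed) -> (n_params, len(omega)) marginals, where `fixed` maps a
    parameter index to an already-decimated value. After each fix, BP is
    re-run so the remaining marginals account for the new constraint.
    """
    fixed = {}
    while len(fixed) < n_params:
        P = run_bp(fixed)
        free = [k for k in range(n_params) if k not in fixed]
        # fix the most polarized free parameter first
        k = max(free, key=lambda idx: P[idx].max())
        fixed[k] = rng.choice(omega, p=P[k])  # sample w_k from its marginal
    return fixed

# Toy check: with marginals pinned to single values, decimation recovers them.
omega = np.array([-1.0, 0.0, 1.0])
pinned = np.eye(3)[[2, 0, 1]]   # one-hot marginals for three parameters
fixed = decimate(lambda _fixed: pinned, omega, 3)
```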

Due to limitations in experimental measurements, not all proteins and their many phosphorylated states are directly measurable. The result is that many key intermediate players are excluded from the model. For this reason, direct interactions in our model do not necessarily imply direct biological interactions (Supplemental Information).

The success of BP depends on whether or not the simulations of the models taken from BP are quantitatively predictive of cellular response to drug combinations. It is also useful, as an exercise, to evaluate the overall performance of the BP algorithm on data sets engineered from completely known networks. With such toy datasets we achieve the following: (i) demonstrate that BP converges quickly and correctly; (ii) compare BP network models to a known data-generating network; and (iii) evaluate performance in biologically realistic conditions of noisy data from sparse perturbations. The synthetic data is generated without the assumptions used in the formulation of BP, and therefore serves as a reasonable test of the sensitivity of the BP method to those assumptions. See Methods for more information on how the toy data is generated.

Monte Carlo (MC) simulation and optimization is a strategy for sampling the space of explicit solutions, in which full parameter configurations are searched as a whole. Short of infinite coverage, a thorough MC search yields reasonably accurate approximations of the ‘true’ probability distributions: both posterior probability distributions of explicit configurations, and marginal probabilities of individual parameters, which are calculated by counting the frequency of any parameter assignment across the set of good solutions. MC is a frequently used optimization strategy in statistical physics, and thus a valuable candidate for comparison. We examine speed and accuracy performance of MC and BP for increasingly large models. To do this, one toy data generator is constructed for each of the ten different sizes from N = 10 to N = 100. In each case, the number of training patterns equals the number of nodes for consistency of comparison, i.e., M = N. Both methods search a very large parameter space, with 41 possible assignments for each parameter; even a single row of the interaction matrix has 41^N possible configurations.

The first criterion of interest is time of convergence.

(A) BP converges three orders of magnitude faster than MC, even as the size of the system increases to 100 nodes. In this test, the number of training patterns equals the number of nodes in both BP and MC. (B) The means of the distributions from BP are plotted against the true non-zero parameters from the set of the data generators. BP has a high correlation (R = 0.7) with the true parameter values, with many points exactly on the diagonal. (C) MC and BP produce low errors per data point compared to random interaction assignments (Red bar).

The second criterion is accuracy. While we are interested primarily in the accuracy of predicting responses to new perturbations, we are also interested in the accuracy of the inferred interactions as an indicator of the models' explanatory and predictive power. In practice, we find that BP does no worse than MC on these datasets given reasonable termination conditions for MC (Supplemental Information).

A Pearson correlation coefficient between the set of non-zero parameters in the data generators and the average values inferred by BP is a reasonable measure of agreement between true and inferred parameters. BP results in a correlation of R = 0.7, with many points exactly on the diagonal.

While other parameter search methods such as genetic algorithms are available, MC provides the most natural benchmark here because it samples the same cost-based probability model as BP.

BP inference is fast and almost perfect when the system has been sufficiently explored by perturbations. In the case of toy data, one can perturb any set of nodes simultaneously with complete control, and generate information-rich data sets. Use of rich data sets provided a sufficient training set for BP to nearly perfectly infer the underlying system (Supplemental Information).

Here, we evaluate BP performance in biologically realistic conditions: a small number of sparse perturbations applied individually and in pairs. The inference is repeated with added noise to evaluate sensitivity to noise. The Gene Network Generator GeNGe is used to construct a biologically realistic synthetic network and its corresponding data generator.

Ultimately, the predictive power of the inferred models can only be assessed by explicit simulation of individual models and comparison with experiment. However, the average value of the BP inferred probability distributions can be used in a descriptive sense and either guide human intuitive understanding of biological pathways or be compared to prior knowledge.

The performance features for evaluating the inferred interactions are recall and precision. Recall is the fraction of interactions from the data-generating network that are correctly inferred by BP marginals. False negatives decrease the recall fraction. Precision is the fraction of interactions inferred from BP marginals that are also in the data-generator. False positives lower the precision fraction.
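These two measures can be computed directly from edge sets; a toy example with a hypothetical four-edge true network:

```python
# Directed edges of a hypothetical data-generating network and an inferred one.
true_edges = {(0, 1), (1, 2), (2, 3), (0, 3)}
inferred_edges = {(0, 1), (1, 2), (0, 2)}

true_positives = true_edges & inferred_edges
recall = len(true_positives) / len(true_edges)         # misses lower recall
precision = len(true_positives) / len(inferred_edges)  # false positives lower precision
print(recall, precision)  # 0.5 and 2/3
```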

The average interactions from the BP marginals yield a sparse network containing a significant number of the true interactions.

The average parameters from the BP distributions are compared with the true interactions in the synthetic data generator. The color-coded matrix (A) summarizes all inferred and true interactions. While BP recovers many of the true interactions, some of the interactions are missing (orange; false negatives) while others are incorrect (yellow; false positives). We identified three compensatory motifs (B), which relate false positives to false negatives. Collectively, these classes of compensatory motifs contribute to most of the false negatives (C) and false positives (D). In D, we've also included a category for interactions that have a significant probability of being zero (a non interaction). Even in the presence of considerable noise, (E, F) a significant number of interactions are correctly captured and most of the falsely inferred edges participate in compensatory motifs.

Interestingly, false positives and false negatives are structurally related: BP tends to miss one or more true interactions (false negatives) and replace them with one or more compensatory interactions (false positives) that are structurally adjacent to the missed interactions. We observe three common structural correlation motifs, which we refer to as (a) upstream, (b) symmetric, and (c) co-regulation motifs.

In the upstream motif, nodes A and C are connected through an intermediate node B, but an edge is inferred from A to C directly. In the symmetric motif, a false positive connects two directly connected nodes but in the wrong direction. In the co-regulation motif, node A directly regulates B and C separately, but a false positive exists between B and C directly. In addition to being structurally correlated, the numerical correlation between nodes involved in false positives is observably high in the training data (Supplemental Information).

BP misses 17 of the 60 true interactions, giving a recall of 74%. Only 4 of these 17 false-negatives are not involved in one of the three structural correlation motifs. Meanwhile, BP predicts 37 false positive interactions for a precision of 55%. However, 29 of these are either involved in one of the three motifs or have a significant probability of being zero (and therefore ceasing to be false-positives). Consequently, we conclude that while many of the false positives may seem worrisome, they are supported in the data and in the underlying data-generating network.

The results on this toy data confirm that this implementation of BP has trouble disambiguating correlation and causation from steady-state data, which is a difficulty common in the analysis of steady-state data.

The assumptions inherent in this BP algorithm likely cause the incorrect edge predictions, in particular assumption 2, which separates the likelihood function from the dynamics of the system. We expect that a tailored likelihood function incorporating time-series data may dramatically improve the ability of a similar BP method to infer causality.

With toy data, we can accurately analyze the effect that noisy data has on the accuracy of inferring network interactions. Noise from measurement technology can have deleterious effects on network inference and introduce sensitivity to data outliers. For example, RPPA produces Gaussian distributed data in the absence of substantial biological variability. We therefore add Gaussian noise (N(0, γ)) with a mean of zero and a standard deviation γ representing the coefficient of variation (CV), as in Equation 16.
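One plausible reading of this noise model (additive zero-mean Gaussian noise on the responses, with standard deviation equal to the CV) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(3)

def add_noise(data, cv):
    """Add zero-mean Gaussian noise with standard deviation cv
    (e.g. 0.15 for a 15% coefficient of variation)."""
    return data + rng.normal(0.0, cv, size=data.shape)

clean = np.zeros(100000)          # stand-in for noiseless responses
noisy = add_noise(clean, cv=0.15)
print(noisy.mean(), noisy.std())  # mean near 0, std near 0.15
```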

We construct two data sets with added Gaussian noise: one with a realistic CV of 15%, and one with considerably higher noise.

This analysis of BP inferred interactions is limited to a thorough examination of a single set of interactions, taken as the average of each parameter from the BP generated marginal probabilities. We demonstrate that BP is fast, accurate and minimally sensitive to realistic amounts of noise. Moreover, BP is reasonably effective at distinguishing causal from merely correlated relationships, even though we are currently limited to steady-state data from a small set of perturbation conditions. Because a perfect model of the data generator exists in this scenario, we know exactly how good the inference of interactions is. The comparison gives us an idea of the structural plausibility of the interactions inferred in real biological contexts.

The probabilistic nature of the BP algorithm is the key feature that enables efficient inference of predictive network models from experimental data at a biologically relevant scale.

We systematically perturb SKMEL-133 cells with a panel of 8 targeted drugs, applied as single agents and in paired combinations.

(A) Perturbation experiments with systematic combinations of eight small molecule inhibitors, applied in pairs and as single agents in low (light green) and high (dark green) doses. The perturbation agents target specific signaling molecules, detailed in the table. The listed drug dose is the standard drug dose (light green), and two times the standard dose was used for the high dose conditions (dark green). The degree of response is the approximate ratio of downstream effector levels in treated condition compared to untreated condition. (B) The response profile of melanoma cells to perturbations. The response profile includes changes in 16 protein levels (total and phospho-levels, measured with RPPA technology) and cell viability phenotype relative to those in no-drug applied condition. The slashed-zero superscript denotes the unperturbed data.

As an example, the AKT inhibitor (AKTi) concentration is chosen based on reduction in AKT phosphorylation at S473 (AKTpS473). The drug dose response curve indicates that ∼5000 nM of AKTi is required to reduce AKTpS473 levels by 40% compared to untreated controls. Therefore, 5000 nM is the so-called protein IC40. In this study, we choose to work with protein IC40 concentrations, which is a compromise between the competing requirements of gentle perturbations and observable effects.

The main intent of the systematic perturbations is to explore diverse aspects of the signaling response and to maximize information in the response profiles for model inference.

We use an array technique (reverse phase protein arrays, RPPA) to measure the levels of proteins and phospho-proteins in each perturbed condition. Each measurement is converted to a log2-ratio of the measured level in the perturbed condition against the measured level in the untreated control condition. In this work, a response refers to a log2-ratio value. Viability of the cells after drug perturbation is measured using a resazurin assay 72 hours after the cells are perturbed.

Collectively, the responses for 16 protein/phospho-proteins and the cell viability phenotype from the 44 perturbation conditions constitute the training data for network model inference.

In order to test the accuracy and the predictive power of the models, we use a leave-k-out cross validation test. For each leave-k-out test, we withhold k experiments from the training data, infer network models from the corresponding subset of training data, predict response profiles (via simulation) for the withheld perturbation conditions, and then compare the predicted profiles against those from the withheld test data. Specifically, each leave-k-out test focuses on the removal of a single drug from the training data; all combinations involving the drug of interest are removed, leaving only data from the experiment in which the drug of interest was applied alone (single dose). For each test, we generate 1000 network models with the BP guided decimation algorithm and keep only the top 100 models (those with lowest error in the training set). Overall, eight sets of network models are generated; one for each of the 8 unique drugs.
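The leave-one-drug-out loop can be sketched schematically; fit_models and simulate are hypothetical placeholders for BP inference and model simulation, and the data structures are illustrative:

```python
def leave_drug_out(conditions, data, drugs, fit_models, simulate):
    """Schematic leave-k-out test: for each drug, withhold all of its
    combination conditions (keeping the single-agent condition in
    training), fit on the rest, and predict the held-out profiles."""
    results = {}
    for drug in drugs:
        held_out = {c for c, applied in conditions.items()
                    if drug in applied and len(applied) > 1}
        train = {c: data[c] for c in conditions if c not in held_out}
        models = fit_models(train)
        results[drug] = {c: simulate(models, conditions[c]) for c in held_out}
    return results

# Toy check with three conditions: drug A alone, drug B alone, A+B combined.
conditions = {0: {'A'}, 1: {'B'}, 2: {'A', 'B'}}
data = {0: 1.0, 1: 2.0, 2: 3.0}
results = leave_drug_out(conditions, data, ['A', 'B'],
                         fit_models=lambda train: train,
                         simulate=lambda models, applied: sorted(applied))
```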

High correlations between simulated and withheld experimental profiles indicate substantial agreement for each of the 8 independent tests.

Eight distinct leave-7-out cross validation calculations indicate a strong fit between the predicted and experimental response profiles. In each cross-validation experiment, network models are inferred with partial data, which lacks responses to all combinations of a given drug. Next, network models are executed with in silico perturbations corresponding to the withheld conditions, and the simulated responses are compared to the withheld experimental profiles.

1000 unique models are drawn via the BP guided decimation algorithm. From those, the top 100 models are kept for the analysis. The marginal distributions before decimation characterize the complete ensemble of solutions.

The probability distribution of edge values (w_ij) across the model ensemble indicates which interactions are inferred consistently and which vary across solutions.

For simplicity of interpretation and visualization, a single network representing the set of average interactions, called the ‘average network’, is presented.

The network's description of pathways is limited to the scope of observed model variables. The details of our network may deviate from the intricate details of canonical pathway models due largely to the existence of unobserved nodes. As a natural consequence, the predictions we generate from our networks are also limited by the scope of observed model variables. As the scope expands, the network models may converge to the detailed pathway descriptions with additional capacity for context-specific quantitative predictions.

The canonical PI3K/AKT pathway is characterized by a series of complex interactions resulting from the activity of the upstream kinase PI3K and reaching more downstream regulatory proteins.

The indirect positive effect of PI3K on AKT phosphorylation, phosphorylation of TSC2 on T1462 by AKT activity, and the regulation of S6 and 4E-BP1 by the PI3K/AKT pathway are all represented in the inferred average network. In our network, amTOR, the activity of mTOR, has no upstream node. This is an artifact of our modeling approach, which prohibits incoming interactions to unmeasured activity nodes. However, the interaction connecting TSC2pT1462 to 4E-BP1pS65 links the upstream components of the pathway to the downstream components. An inhibitory interaction from amTOR to AKTpS473 in the PI3K/AKT pathway is reminiscent of the reported feedback loops in this pathway.

In the RAF/MEK/ERK pathway, the average network captures many of the known interactions that link the MAPK activity to Cyclin D1 levels and Rb phosphorylation.

In addition to the agreement with well-studied biological interactions, the average network also indicates a series of potentially novel interactions.

First, a strong bidirectional interaction is inferred between RbpS807 and MEKpS217. One hypothesis for the MEK to Rb interaction is that it stands in for the known direct RAF to Rb interaction in the absence of any measurements of phosphorylated RAF. Disruption of the RAF to Rb interaction induces apoptosis in melanoma.

An inhibitory interaction is also inferred from aHDAC (activity of HDAC) to SRCpY527, a critical phosphorylation site for the auto-inhibition of SRC.

Given a set of network models of SKMEL-133 cells, we predict the effect of arbitrary, previously untested perturbations through explicit simulation of the models.

As an example, we predict the cell viability response of SKMEL-133 cells to various targeted perturbations via explicit numerical simulations of our network models. The simulations for each of the network models produce predictions of the temporal trajectories for all model nodes: proteins, phospho-proteins and phenotypes, alike. In each simulation, a single perturbation is applied to a single protein node i by fixing the node's value x_i at a 50% reduction (log2(0.5) = −1), which in turn propagates throughout the network model. The simulated steady-state responses to four particular, individually perturbed target nodes predict a substantial decrease in cell viability.
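A perturbation simulation of this kind can be sketched by clamping one node at log2(0.5) = −1 and integrating the network dynamics to steady state; the three-node network and the unit rate constants below are illustrative assumptions:

```python
import numpy as np

def simulate_clamped(W, clamp_idx, clamp_val=-1.0, dt=0.01, steps=5000):
    """Simulate the network with one node held fixed at clamp_val
    (log2(0.5) = -1 corresponds to a 50% knockdown), while all other
    nodes follow dx_i/dt = tanh(sum_j W_ij x_j) - x_i."""
    N = W.shape[0]
    x = np.zeros(N)
    x[clamp_idx] = clamp_val
    for _ in range(steps):
        x = x + dt * (np.tanh(W @ x) - x)
        x[clamp_idx] = clamp_val   # re-impose the perturbation each step
    return x

# Toy network: node 0 activates node 1, which activates a 'viability' node 2.
W = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
x = simulate_clamped(W, clamp_idx=0)
print(x[2])  # knocking down node 0 yields a negative viability response
```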

(A) The histogram of phenotypic response profiles to the four most effective virtual perturbations from the best 100 network models. The response to STAT3pY705 reflects the effect of PKCi on cell viability. Viability changes in response to perturbations of the cell cycle proteins PLK1 and Cyclin B1 are genuine predictions from the network models. Perturbation of TSC2pT1462 (an inhibitory phosphorylation) downregulates the PI3K/AKT pathway and leads to a decrease in cell viability in the PTEN-null SKMEL-133 cell line. (B) The perturbed nodes that lead to a reduction in cell viability in the context of the average network model (circled in red).

The most significant reduction in cell viability comes from the predicted response to perturbation of STAT3pY705, whose mean predicted inhibition of cell viability is roughly 56% (i.e., 44% of the unperturbed cell viability).

We subsequently tested one of these predictions in the lab by treating the SKMEL-133 cells with the PLK1 inhibitor (BI 2536) and measuring the cell viability response with the resazurin assay.

Qualitative analysis of networks from

The fourth perturbation predicted to produce a significant change in cell viability is inhibition of TSC2pT1462, which has a central role in the average network model, particularly in the PI3K/AKT pathway. It is regulated by AKTpS473 and also interacts with MAPK14pT180, which is upstream of PLK1 and Cyclin B1 in the models. Note that the SKMEL-133 cell line is PTEN-null and that a constitutively active PI3K/AKT pathway may play a role in drug resistance in this cell line

The predicted effect on cell viability for each of these four computationally perturbed nodes has at least one of the following explanations: consistent with known perturbation responses of immediately adjacent nodes, present in model training; consistent with the known genetic background of the model cell line (e.g., SKMEL-133 is PTEN-null); or represents a genuinely novel drug target, predicted to be highly efficacious and subsequently validated experimentally.

We describe a combination of experimental and computational methods, in the field of network pharmacology, to construct quantitative and predictive network models of signaling pathways. The particular contribution is a set of algorithmic advances, adapted from statistical physics, to infer network models of sizes and complexities not reachable by classical gene-by-gene molecular biology. The necessity of inferring complex network models stems from the fact that classical methods, in which a small number of perturbation experiments leads to a functional description of carefully selected sets of genes and gene products, are reaching technical limits. High-throughput proteomic and genomic profiling technologies provide much richer and more complex information about cellular responses than can be analyzed by a scientist's thought processes alone. At these levels of completeness and detail, predicting changes in physiological attributes from molecular data requires computational modeling and quantitative analysis. Our quantitative network models not only capture already known biological interactions but also nominate novel interactions and may detect complex regulatory mechanisms, such as feedback loops, in specific biological contexts. The quantitative analysis of molecular and cellular behavior in these models provides a detailed understanding of the coupling between signaling processes and global cellular behavior. Such understanding is hard to achieve by reductionist approaches that focus on the relation between single or few molecules and cellular processes. Furthermore, we provide a systems biology platform to predict the cellular (i.e., proteomic and phenotypic) response profiles to multiple perturbations, such as those in combinatorial cancer therapies.

This investigation constitutes a proof-of-principle of a particular technology: a combination of a network inference algorithm and a technology for perturbing cells and measuring their response. The overall context is network pharmacology, by which we mean the science of using network models to derive and then test effective therapeutic targets and combinations. The particular challenge in cancer biology is twofold: the complexity and individual variation of genetic and epigenetic alterations that are plausibly cancer causing, and thus modulate the response to therapy; and the emergence of resistance to successful targeted therapies, such as EGFR inhibitors in lung cancer or RAF inhibitors in melanoma. In our view, the fairly fragmented gene-by-gene classical methods of molecular biology, while extremely powerful as a reductionist approach, are reaching a clear limit. Those methods struggle with effects such as ‘cross-pathway coupling’, ‘multigenic diseases’ or individual variation of response to therapy. More comprehensively quantitative and computationally predictive methods (‘systems biology’) are likely not only to increase our predictive abilities but also to save substantial overall effort by computationally testing large numbers of cellular states on a large variety of genetic backgrounds and in exhaustively explored perturbation conditions.

The implementation of our BP algorithm enables us to infer much larger network models than are reachable with standard Monte Carlo search methods. First, we use biologically realistic toy data to illustrate the dramatic speed improvement of the BP method over standard MC methods. BP convergence times scale favorably with the size of the models at no measurable loss of accuracy. This is a desirable property for constructing large network models

We previously published an MCF7 breast cancer cell line dataset, for which network models were derived with a nested MC search algorithm (CoPIA)

We are able to capture the known interactions in MAPK and AKT/PI3K canonical pathways through

We are aware that drugs do not usually have a single specific target. While this is potentially problematic for modeling and simulating drug effects, we find that the inference is driven largely by the correlations in the data arising from the effect of perturbations on the overall system. We estimate that BP is most sensitive to strong and untrue assumptions about the direct effects of a given drug. The use of activity nodes is an indirect way of dealing with off-target drug effects. These nodes, which are perturbed but not measured, represent the coupling of a drug to the rest of the system. Such coupling includes both specific and off-target effects. BP may infer interactions from these activity nodes to any number of other measured nodes, thus simultaneously inferring targets for the particular drug used. Drug specificity also affects the predictive power of the resulting models. All effects of the AKT inhibitor, for example, are assigned solely to inhibition of the aAKT node, even if other targets of the AKT inhibitor are partially responsible for the measured outcomes. Therefore, any simulation-based prediction regarding perturbation of the aAKT node will be tethered to the off-target effects of the AKT inhibitor used in training. A complete solution of the drug specificity problem requires a more comprehensive and systematic analysis of the influence of off-target effects on the quality and predictive power of inferred models.

The leave-k-out analysis reveals a few outlier points, where the predicted response profiles largely disagree with the test data. These (mis)predictions fall into two major categories. In the first category, (mis)predictions arise from measurements with very low signal-to-noise ratios and high experimental uncertainties. The (mis)predicted S6pS240 levels and cell viability phenotypes in some of the perturbation conditions fall into this category. Note that the models are trained and simulated after logarithmic conversion (i.e., log2-ratios of measured signals), which exaggerates the errors for low signals. In linear space, no such outliers are observed, suggesting that these outliers are an artifact of our analysis in logarithmic space. In the second category, (mis)predictions arise from insufficient experimental constraints. (Mis)prediction of AKTpS473 levels after mTOR inhibition falls into this category. mTOR inhibition leads to an increase in AKT phosphorylation, possibly due to the disruption of a feedback loop. In our current analysis, mTOR inhibition and steady-state measurements of AKT phosphorylation are the sole experimental inputs for detecting changes in this feedback mechanism. When mTOR inhibition is withheld in the leave-k-out tests, the experimental constraints become insufficient to describe the regulation of AKT phosphorylation, thus leading to the (mis)predictions. A systematic experimental design to enrich the perturbations, better characterization of the dynamic range of AKT phosphorylation, and richer proteomic measurements on this part of the signaling pathways are possible ways to improve the quality of these particular predictions. In general, careful optimization of perturbation conditions (drugs and combinations) and observations (protein arrays, mass spectrometry target lists), within available resources, would significantly enhance the predictive power of this approach.
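The log-space error amplification described above is easy to make concrete: the same absolute measurement error shifts the log2-ratio of a weak signal far more than that of a strong one. The signal levels and noise magnitude below are invented for illustration.

```python
import numpy as np

def log2_ratio_error(signal, control, noise):
    """Shift in the log2-ratio caused by an additive measurement error `noise`."""
    return np.log2((signal + noise) / control) - np.log2(signal / control)

control = 100.0
noise = 5.0   # the same absolute error applied to both measurements
strong = log2_ratio_error(80.0, control, noise)  # high signal-to-noise readout
weak = log2_ratio_error(4.0, control, noise)     # low signal-to-noise readout
# The identical absolute error distorts the weak signal's log2-ratio far more,
# which is why low-signal measurements appear as outliers in log space only.
```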

Predicting the effect of drug combinations is highly challenging and has been the subject of many studies, e.g.,

Beyond the computational power of well-constrained and robustly derived network models, one may expect to achieve a conceptual understanding of the principles of epistasis of drug effects and the mechanisms of resistance to targeted therapeutics. For example, the initial system response to drug intervention on the scale of minutes to days may be indicative of subsequent epigenetic and genetic changes in a population of treated cells that represent the long-term and hard-to-treat emergence of drug resistance. In this context, reliable dynamic network modeling may be an excellent guide to strategies for blocking the emergence of resistance in the first place. The pre-clinical consequence is the selection of combinations of therapeutic interventions that not only are effective in slowing the proliferation or promoting the elimination of cancer cells, but also counteract resistance to otherwise effective treatments, such as RAF inhibitors in melanoma or AR inhibitors in advanced prostate cancer, in clinical trials.

While the approximate solutions to the problem of network inference deliver interpretable biological results, there is much room for improving the power and information value of the method. For example, when time-dependent response measurements after perturbation are available, one can use analogous, extended algorithms to infer probability distributions over (time-independent) interaction parameters

A straightforward and powerful extension is the systematic use of prior information in the form of directed interactions adapted from the current scientific literature or pathway databases. Such prior information is easily incorporated as a set of additional constraints on the probability distributions of

The network pharmacology approach described here provides a strong tool for a system level description of signaling events in cancer cells. Moreover, it presents a step forward in quantitative prediction of responses of cancer cells to drug perturbations. Beyond cancer biology, there is no reason to believe that the proposed technology cannot be used to derive accurate quantitative and predictive network models for biological cellular systems in general, provided sufficiently diverse experimental perturbations and sufficiently rich readouts are accessible. In this way, we hope to extend the power of classical molecular biology to a broad spectrum of cellular systems with targeted, and possibly clinical, applications.

Eight small molecule drugs targeting mainly the MAPK or AKT/PI3K pathways were chosen based on knowledge of target specificity and relevance for exploring BRAF signaling in SKMEL-133 cells.

For Western blotting and reverse-phase protein arrays (RPPA) assays, BRAFV600E mutant SKMEL-133 cells were grown in 6-well plates to around 40% confluence in RPMI-1640 medium containing 10% fetal bovine serum (FBS). In a series of perturbation experiments, cells were perturbed with 8 drugs either singly or in paired combinations, and harvested after 24 hours by collecting and freezing the cell pellet. Non-perturbed control cells were treated with drug vehicle (DMSO) for 24 hours (elsewhere called “no-drug control”). Cells were thawed, lysed and protein concentrations were determined by the Bradford assay. Protein concentrations were adjusted to 1–1.5 mg/mL and proteins denatured in 2% SDS for 5 minutes at 95°C. For RPPA, cell lysates were spotted on nitrocellulose-coated slides in Gordon Mills' laboratory at MD Anderson Cancer Center, as described previously

Cells were grown in 6-well plates and perturbed in the same way as for the RPPA assays. After 72 hr of drug treatment, resazurin (Sigma-Aldrich, Catalog # R7017) was added to each well at a final concentration of 44 µM, and the fluorescent signals were measured after 1 hr of incubation, using a 530 nm excitation wavelength and a 590 nm emission wavelength. For control wells (0 hr drug treatment), the fluorescent signals were monitored after 4 hr of incubation. Standard curves of cell numbers were also generated to back-calculate the cell numbers in the different wells. Cell viability measurements at 72 hours are used to ensure that the phenotypic responses have reached steady state. Significant phenotypic response is observed as a consequence of changes in relatively early proteomic responses to drug perturbations. Indeed, analysis of cell viability changes at 0, 24, 72 and 120 hours after drug perturbation revealed no significant changes in cell viability between 72 and 120 hours. Conversely, the cell viability response at 24 hours had not yet reached steady state.

Network models are constructed using the measured proteomic and phenotypic response profiles to drug perturbations as experimental data. The reported network models contain 25 nodes and are trained on protein plus phenotype response profiles from 44 experimental observations. Each measured protein level is log-normalized with respect to its measured level in the no-drug control condition. For quantification of the activity nodes, see below. The probability distribution for each possible interaction strength in the system is computed using the belief propagation algorithm. In the current implementation, the edge strengths can assume values within the interval [−1, 1] in discrete steps of 0.2. The initial messages are sampled from a uniform random distribution, and the BP algorithm is run until the difference between marginals in consecutive iterations is less than 10^{−6}.
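The discrete weight grid and the convergence test on marginals can be illustrated with a toy single-edge example. The update rule below (a damped Boltzmann reweighting of the squared data-fit error) is only a schematic stand-in for the true BP cavity update, and all data and parameter values are invented; the sketch shows the bookkeeping, not the algorithm's internals.

```python
import numpy as np

# Discrete grid of allowed edge strengths, as in the text: [-1, 1] in steps of 0.2.
GRID = np.round(np.linspace(-1.0, 1.0, 11), 1)

def edge_marginal(x_j, x_i, beta=5.0, damping=0.5, tol=1e-6, max_iter=10000):
    """Damped fixed-point iteration on the marginal P(w_ij) over the grid,
    stopped when consecutive marginals differ by less than `tol`. The target
    distribution (a Boltzmann weight on the squared data-fit error) is a
    schematic stand-in for the true BP cavity update."""
    p = np.full(GRID.size, 1.0 / GRID.size)          # uniform initial marginal
    sse = np.array([np.sum((x_i - w * x_j) ** 2) for w in GRID])
    target = np.exp(-beta * sse)
    target /= target.sum()
    for _ in range(max_iter):
        p_new = damping * p + (1.0 - damping) * target
        if np.max(np.abs(p_new - p)) < tol:          # the 1e-6 stopping rule
            return p_new
        p = p_new
    return p

# Toy responses: node i follows node j with true strength 0.6 plus small noise.
rng = np.random.default_rng(0)
x_j = rng.normal(size=40)
x_i = 0.6 * x_j + 0.05 * rng.normal(size=40)
marginal = edge_marginal(x_j, x_i)
w_map = GRID[np.argmax(marginal)]                    # most probable edge strength
```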

The inverse-temperature scaling constant β is taken as 1, and ε_i is estimated from the dynamic range of each proteomic measurement sampled in the biological dataset. The ε_i and α_i parameters are further optimized with a gradient descent algorithm.
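A minimal sketch of such a parameter refinement, assuming the Equation-1 steady state of a single uncoupled node, x* = (ε/α)·tanh(u), and using plain gradient descent with numerical gradients (an illustration, not the paper's optimizer). Note that the steady state of a single node constrains only the ratio ε/α.

```python
import numpy as np

def steady_state(eps, alpha, u):
    """Steady state of a single uncoupled node under the assumed Equation 1:
    eps * tanh(u) - alpha * x = 0  =>  x* = (eps / alpha) * tanh(u)."""
    return (eps / alpha) * np.tanh(u)

def fit_eps_alpha(u_obs, x_obs, lr=0.01, steps=3000, h=1e-5):
    """Plain gradient descent on the squared steady-state error, using
    central-difference numerical gradients. Only the ratio eps/alpha is
    identifiable from single-node steady-state data."""
    theta = np.array([1.0, 1.0])                     # initial eps, alpha
    def loss(t):
        return np.sum((steady_state(t[0], t[1], u_obs) - x_obs) ** 2)
    for _ in range(steps):
        grad = np.array([(loss(theta + h * e) - loss(theta - h * e)) / (2 * h)
                         for e in np.eye(2)])
        theta = theta - lr * grad
    return theta

# Synthetic steady-state observations generated with "true" eps=2.0, alpha=1.0.
u_obs = np.array([-1.0, -0.5, 0.5, 1.0])
x_obs = steady_state(2.0, 1.0, u_obs)
eps_fit, alpha_fit = fit_eps_alpha(u_obs, x_obs)
```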

Distinct model solutions are computed with the BP-guided decimation algorithm. The interaction parameters in each model are further optimized using the Pineda gradient descent algorithm

Each network solution is simulated individually with specific virtual perturbations according to the model Equation 1 until the system reaches its steady state (Supplementary

“Activity node” is a technical term defined within the context of the applied perturbations and derived network models. An activity node represents the perturbed but unmeasured activity of a drug target, and its perturbed value is expressed on the same log2 scale as the measured nodes; for example, a level reduced to 55% of the unperturbed control corresponds to log2(0.55) = −0.863. Based on the model equations (Equation 1), the value of an activity node is constrained by the measured responses of the nodes downstream of it.

A constant perturbation u_i is applied to node i, while the interaction parameters (w_ij) are independent of the perturbation. As modeled by Equation 1, the dynamic properties and steady-state value of node i are functions of the combined influence from all upstream nodes j, weighted by the interaction strengths w_ij, and of the strength of the perturbation. The perturbation term in Equation 1 models the effect of targeted interventions such as targeted small molecules. The model equation can also incorporate other perturbation forms such as genetic alterations or RNA interference (RNAi). In the case of genetic alterations, the impact can be modeled by fixing the value of node i to a constant; in the case of RNAi, by a constant negative perturbation u_i on the silenced node.
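These perturbation forms can be made concrete in a small simulation under the assumed Equation-1 dynamics (with ε = α = 1 for brevity): drugs and RNAi enter as a constant input term u, while a genetic alteration can be emulated by clamping a node's value. The two-node network below is invented for illustration.

```python
import numpy as np

def simulate(W, u=None, clamp=None, dt=0.05, steps=20000):
    """Steady state of the assumed Equation-1 dynamics (eps = alpha = 1) under
    two intervention types: a constant perturbation vector `u` (targeted drug
    or RNAi) and `clamp`, a {node: value} dict fixing nodes outright
    (mimicking a genetic alteration)."""
    n = W.shape[0]
    u = np.zeros(n) if u is None else u
    clamp = clamp or {}
    x = np.zeros(n)
    for node, value in clamp.items():
        x[node] = value
    for _ in range(steps):
        x = x + dt * (np.tanh(W @ x + u) - x)
        for node, value in clamp.items():        # clamped nodes ignore dynamics
            x[node] = value
    return x

# Invented two-node network: node 0 activates node 1.
W = np.array([[0.0, 0.0],
              [0.9, 0.0]])
x_drug = simulate(W, u=np.array([-1.0, 0.0]))    # drug-like constant input
x_genetic = simulate(W, clamp={0: -1.0})         # node 0 fixed by an alteration
```

The clamped node transmits a stronger downstream effect than the drug-like input, since the constant perturbation is attenuated by the node's own saturating dynamics.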

We generated toy data based on toy network models in order to test the performance of BP against a known set of true interactions. The toy models are generated by first fixing a topology with positive and negative edge signs, which are then assigned real values drawn from a uniform distribution between 0 and a maximum strength of 2. The topologies are designed to represent cascade-like hierarchical networks and include parallel chains, feed-forward and feedback motifs. For the analysis focusing on true interactions, the topology is generated with the web service Gene Network Generator (GeNGe). At the time of this analysis, popular toy data generators such as GeneNetWeaver focused on scale-free network topologies that are common in gene regulatory networks but not typical of signal transduction pathways. Given the network model, the data is generated by simulating the model according to Equation 1 in response to external perturbations until the system reaches a stable steady state. The steady-state values for all nodes are recorded in each perturbation condition. In rare cases, the simulations encountered perpetual oscillations; these results are excluded from the final toy dataset. The simulations are purely deterministic, as no stochasticity is incorporated; noise is added to the data post-simulation. We chose to simulate the dynamics of the toy networks with Equation 1 so as to remove the choice of model equation as a source of error.
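A compact sketch of this toy-data pipeline, under the same assumed form of Equation 1 (with ε_i = α_i = 1 for brevity): draw a random cascade-like topology (here purely feed-forward) with uniformly distributed strengths and random signs, simulate each single-node inhibition to steady state, and add noise post-simulation. GeNGe itself is a web service, so a random DAG stands in for its output here; all sizes and rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_cascade(n_nodes, p_edge=0.3, w_max=2.0):
    """Cascade-like topology: edges run only downstream (a feed-forward DAG);
    each edge gets a random sign and a strength drawn uniformly in (0, w_max)."""
    W = np.zeros((n_nodes, n_nodes))
    for i in range(1, n_nodes):
        for j in range(i):                       # j is upstream of i
            if rng.random() < p_edge:
                W[i, j] = rng.choice([-1.0, 1.0]) * rng.uniform(0.0, w_max)
    return W

def steady_state(W, u, dt=0.05, tol=1e-6, max_steps=100000):
    """Simulate the assumed Equation-1 dynamics (eps = alpha = 1) until the
    state stops changing; return None for non-converging (oscillating) runs."""
    x = np.zeros(W.shape[0])
    for _ in range(max_steps):
        dx = np.tanh(W @ x + u) - x
        x = x + dt * dx
        if np.max(np.abs(dx)) < tol:
            return x
    return None

# One perturbation condition per node; noise is added post-simulation.
n = 8
W_true = random_cascade(n)
data = []
for target in range(n):
    u = np.zeros(n)
    u[target] = -1.0                             # inhibit one node per condition
    x = steady_state(W_true, u)
    if x is not None:                            # exclude oscillating conditions
        data.append(x + rng.normal(scale=0.05, size=n))
data = np.array(data)
```

With a feed-forward topology every condition converges; feedback motifs are where the oscillating, excluded cases can arise.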

(ZIP)

(ZIP)

(ZIP)

(TIF)

(TIF)

(TIF)

(TIF)

(TIF)


(TIF)

(TIF)

(TIF)

log_{2}([AKTpS473]_{final}/[AKTpS473]_{initial}) = −1.

(TIF)

(TIF)

(TIF)

(TIF)

(TIF)

(TIF)

(TIF)

(TIF)

(PDF)

(DOCX)

(DOCX)

(DOCX)

(DOCX)

(DOCX)

(DOCX)

(DOCX)

(DOCX)

We gratefully acknowledge the help of Deb Bemis in organizing and Ed Reznik in improving the manuscript. We thank Debora Marks, Doug Lauffenburger and Peter Sorger for discussions and Sven Nelander for early guidance, discussions and comments on the manuscript.