Constraint-Based Modeling and Kinetic Analysis of the Smad Dependent TGF-β Signaling Pathway

Background Investigation of dynamics and regulation of the TGF-β signaling pathway is central to the understanding of complex cellular processes such as growth, apoptosis, and differentiation. In this study, we aim at using systems biology approach to provide dynamic analysis on this pathway. Methodology/Principal Findings We proposed a constraint-based modeling method to build a comprehensive mathematical model for the Smad dependent TGF-β signaling pathway by fitting the experimental data and incorporating the qualitative constraints from the experimental analysis. The performance of the model generated by constraint-based modeling method is significantly improved compared to the model obtained by only fitting the quantitative data. The model agrees well with the experimental analysis of TGF-β pathway, such as the time course of nuclear phosphorylated Smad, the subcellular location of Smad and signal response of Smad phosphorylation to different doses of TGF-β. Conclusions/Significance The simulation results indicate that the signal response to TGF-β is regulated by the balance between clathrin dependent endocytosis and non-clathrin mediated endocytosis. This model is useful to be built upon as new precise experimental data are emerging. The constraint-based modeling method can also be applied to quantitative modeling of other signaling pathways.


INTRODUCTION
The transforming growth factor b (TGF-b) superfamily consists of TGF-bs, bone morphogenetic proteins (BMPs), activins and related proteins. These proteins regulate numerous cellular processes, such as cell proliferation, differentiation, apoptosis and specification of developmental fate [1,2]. TGF-b initiates signaling by forming a ligand-receptor complex with the type I and type II receptors at cell surface. The activated receptor complex propagates the signal inside by phosphorylating the receptorregulated Smad (R-Smad). Activated R-Smad then forms a heteromeric complex with common mediated Smad (Co-Smad), Smad4. These complexes accumulate in the nucleus and regulate gene expression in a cell-type-specific manner through interactions with transcription factors, coactivators and corepressors [3]. The nuclear Smad complexes are then dephosphorylated by Smad phosphatase [4]. Another important player is inhibitory Smad (I-Smad), Smad7, which recruits Smurf to the TGF-b receptor complex to facilitate the ubiquitin dependent degradation of receptors [5]. Both R-Smad and Smad4 continuously shuttle between the cytoplasm and nucleus in uninduced cells and also in presence of TGF-b signal [6][7][8]. The receptors and activated ligand-receptor complex are internalized through two distinct endocytic routes, the clathrin-dependent endocytosis and the caveolar lipid-raft mediated endocytosis [9,10]. Figure 1 schematically depicts the Smad dependent TGF-b signaling pathways.
Quantitative modeling studies of signaling pathways have been successfully applied to understand complex cellular processes [11][12][13][14]. Traditionally, the quantitative models are validated by fitting western-blot data. Due to the complexity of the model and the quality of the data, over-fitting is a common problem for parameter estimation. When the model has many estimated parameters and the corresponding data is a few, the over-fitting problem might lead to some unwarranted conclusions because the parameter set in the model might be a special domain of possible parameter sets which can similarly fit the data well, but result in different predictions. On the other hand, genome-scale constraintbased models of metabolism have been constructed and used to successfully interpret and predict cellular behavior [15]. Constraint-based modeling is an effective method to narrow the range of the possible parameter space for quantitative models. However, there is less attention to apply the constraint-based modeling for the quantitative analysis of signaling pathways. Here, we proposed a constraint-based modeling method to build a comprehensive mathematical model for the Smad-dependent TGF-b signaling pathway by fitting experimental data and incorporating the qualitative constraints from the experimental analysis.
So far, several mathematical models have been proposed for the TGF-b signaling pathways. Vilar et al. proposed a concise model for the TGF-b receptor trafficking network [16]. On the other hand, Clarke et al. [17] and Melke et al. [18] proposed the models for the Smad nucleocytoplasmic shuttling and Smad phosphorylation response to the TGF-b signal without considering the receptor trafficking steps. In this study, we established a comprehensive mathematical model for the TGF-b signaling pathway, which includes the signal transduction, the receptor endocytosis, the Smad nucleocytoplasmic shuttling, and the ligand induced negative feedback. The simulation analysis of the model agrees well with the experimental analysis of TGF-b pathway, such as the time course of nuclear phosphorylated Smad, the subcellular location of Smad, and Smad phosphorylation response to different concentrations of TGF-b. We employed the mathematical model to study the dynamic relationship between receptor trafficking at the cell surface and the activation of the phosphorylated Smad in the nucleus. The simulation results indicate that the signal response to TGF-b is regulated by the balance between clathrin dependent endocytosis and non-clathrin mediated endocytosis.

Mathematical model of the Smad dependent TGF-b signaling pathway
We proposed here a comprehensive model for Smad dependent TGF-b signaling pathway in mammalian cells, which includes three modules: receptor trafficking and signaling; Smad nucleocytoplasmic shuttling and signaling and I-Smad negative regulation. In the module of receptor trafficking, the model takes into account the following essential elements: (i) constitutive receptor synthesis and degradation; (ii) receptor and ligand-receptor complex trafficking by two distinct endocytic routes; (iii) the distribution of receptors and ligand-receptor complex in different pools, such as membrane surface, clathrin-coated pit, early endosome and caveolar lipid-raft; (iv) the formation of activated receptor complex induced by TGF-b. In the module of Smad nucleocytoplasmic shuttling, the signal events we consider are (i) Smad nucleocytoplasmic shuttling; (ii) Smad complex formation; (iii) nuclear Smad complex dephosphorylation by nuclear phosphatase. In the module of I-Smad negative regulation, the negative feedback contributing to the ligand-induced degradation of the receptors by I-Smad is simplified and modeled as a black box from the nuclear phosphorylated Smad. We made the following assumptions for the model: (1) Experimental analyses indicate that TGF-b receptor trafficking is not affected by TGF-b stimulation [9,10]. We assume that the corresponding internalization and recycling rates for type I receptor, type II receptor and activated ligand-receptors are the same. (2) Referring to the concise model of signal processing in the TGF-b ligand-receptor network [16], we assume that the rate of sequential formation of ligand-receptor complex is proportional to the amount of ligand, type I receptor and type II receptor. (3) The experimental data shown by Lin, et al. indicate that the variation of total amount of Smad is small [4]. We assume that the total amount of Smad is constant. Therefore, the production and degradation of Smad are not considered in this model. (4) Previous experimental analyses suggest that I-Smad/Smurf complex targets the lipid-raft-bound receptor for degradation which leads to the ligand-induced negative feedback [5,9]. We assume that the rate of ligand-induced degradation of receptors is proportional to the amount of nuclear phoshporylated Smad and the receptor complex in caveolar lipidraft.
In order to compare with the published data, we used Smad2 denoting R-Smad. We used the law of mass-action to describe the rate of signal transduction steps. The time-dependent changes of the concentrations of the signaling proteins and protein complexes are determined by the following system of differential equations: The model is composed of an ordinary differential equations system with 16 state variables and 20 parameters. The values and the corresponding biological meaning of the parameters are listed in Table 1. The initial conditions and the biological meaning of the variables are listed in Table 2.

Derivation of the parameter values
In order to keep the consistency of the parameter values, we derived the parameter values from experimental analysis in epithelial cells. In particularly, most of the data used in this study is from experimental analysis of HaCaT (Human keratinocyte cell line) cells. The rational guided the derivation of relative parameter values is described in the following: ), respectively. On the other hand, Figure 3 of reference [9] shows that only 30% of the initial labeled receptor complex remain in the cell after 8 h when the caveolar endocytosis is inhibited. Based on this information, we can derive the constitutive degradation rate constant of ligand- ).
N Rate constant of receptor internalization and recycling: Vilar et al. derived the rate constant of receptor internalization and recycling based on the experimental observations [16]. On the other hand, reference [9] indicate that receptors are internalized through clathrin and nonclathrin independent endocytosis route with similar rates. Therefore, we choose the internalization rate constant of receptors (ki EE and ki Cave ) with the value of 0.33 min 21 which is estimated in reference [16]. The recycling rate constant (kr EE ) for the receptors recycled from early endosome back to the cell surface is set to a value of 0.033 min 21 , which is derived in reference [16].
N Rate constant for Smad nuclear import and export: The nuclear import rate of Smad2 (k Smad2 imp ) was experimentally measured with the value of 0.16 min 21 in reference [8].
Reference [8] also measured that the ratio of mean nuclear fluorescence to mean cytoplasmic fluorescence (R) of Smad2 in uninduced cells is about 0.5. The R ratio of Smad2 is equivalent to the ratio of nuclear to cytoplasmic Smad2 concentrations. We can derive R~k  Figure 5A of reference [8] also shows that the nuclear export rate of Smad4 is about half of the nuclear export rate of Smad2. Hence, the nuclear export rate constant of Smad4 (k Smad4 exp ) is 0.5 min 21 . Since the R ratio of Smad4 is measured with a value of about 0.5 [8], we can derive the nuclear import rate constant of Smad4 (k Smad4 imp ) a value of 0.08 min 21 .
N Nuclear import rate constant of phosphorylated Smad: Figure 3 of reference [22] shows that nuclear import rate between phoshporylated and unphosphorylated Smad2 is similar. Therefore, we set the import rate constant for the phosphorylated Smad complex (k Smads_Complex imp ) to the same value as the import rate constant for unphosphorylated Smad2, which is 0. 16 The steady state concentrations of type I receptor at cell surface, in the lipid-raft and in the early endosome, obtained by solving the systems of algebraic equations (17)(18)(19), are The ratio of nuclear to cytoplasmic Smad2 concentration corresponds to the ratio of mean nuclear fluorescence to mean cytoplasmic fluorescence (R) of Smad2, according to equations (29)(30), which is R~½ Smad2 nuc ss ½Smad2 cyt ss~k The total amount of Smad2 (N total Smad2 ) and Smad4 (N total Smad4 ) are estimated to be 6610 219 mol per cell and 1.4610 218 mol per cell in HaCaT cells, which corresponds to about 3.6610 5 and 8.4610 5 molecules per cell, respectively [23].
Without the treatment of TGF-b, the concentrations of signaling proteins will arrive at steady state balancing protein production, degradation, receptor endocytosis and Smad protein nucleocytoplasmic shuttling. We set the initial concentrations of the signaling proteins as their steady state values before TGF-b is added.

Parameter estimation by constraint-based modeling
The parameter estimation for the 7 unknown values of rate constants was done using a modified version of the tool SBML-PET [24], which incorporates stochastic ranking evolution strategy (SRES) for parameter estimation jobs. SRES is an evolutionary optimization algorithm that uses stochastic ranking as the constraint handling technique [25]. The objective of the para-meter estimation is to find the most feasible parameters in the model that reproduce the quantitative experimental data for the TGF-b signaling pathway. At the same time, the model with the estimated parameters should satisfy some qualitative experimental observation of this pathway. Therefore, the corresponding quantitative time course data is used in the objective function definition and the qualitative data is coded as constraints during the optimization process. We used two time courses of Smad phosphorylation for parameter estimation: (1) The quantitative western blot data reported in Figure 1A in Lin et al. [4] gives us the time course of Smad2 in the whole HaCaT cell extract in the presence of continuous TGFb stimulation for 24 hours. We normalized the data to its maximum value which gives us the dynamic change of the protein. We also normalized corresponding simulation data to its maximum value so that we can compare the experimental data and the simulation data. (2) The quantitative western blot data reported in Figure 1C in Lin et al. [4] gives us the time course of Smad2 phosphorylation under a different condition. TGF-b (2 ng/ml) is added at first 30 minutes, then washed out and type I receptor kinase inhibitor SB431542 is added to terminate the signal.
Other qualitative information from the literatures is used as constraints encoded within SRES method. fusions [8]. The study indicates that the ratio of nuclear to cytoplasmic Smad2 concentrations is about 2.5 after 1 hour TGF-b treatment. After TGF-b stimulus, about 80% of the Smad2 in the nuclues is phosphorylated or complexed. We set constraints for the corresponding Smad2 distribution: the ratio of nuclear to cytoplasmic Smad2 concentrations should be in the range of 1.5,3.5 and 60%,90% of the Smad2 in the nucleus is phosphorylated or complexed after 1 hour TGF-b treatment. (4) Previous experimental data indicate that 0.5 ng/ml (20 pM) of TGF-b is sufficient to yield a maximal response of Smad phosphorylation after 1 hour TGF-b treatment [27,28]. We encode this information as a constraint that the ratio of Smad phosphorylation level with the dose of 0.5 ng/ml to that with the dose of 2 ng/ml TGF-b at 60th minute should be larger than 0.9 and smaller than 1.1.

Comparison of kinetic simulation with experimental analysis
We first check whether the model obtained by constraint-based modeling method can reproduce the experimental data used for parameter estimation, which can give us the information about quality of ''in-sample fit'' (how good is the model fitting the data used for parameter estimation). Experiments reported that the signal would peak at about 30-60 min after TGF-b addition. The result shown in Figure 2A indicates that the simulated time courses of Smad2 phosphorylation agree well with the experimental data [4]. The model is also able to reproduce the experimental observation that the treatment with type I receptor kinase inhibitor SB431542 will cause rapid decrease of the nuclear phosphorylated Smad2 level [4,29] ( Figure 2B). Taken these simulation results together, the model has good ''in-sample fit'' for the data. We next asked whether the model has a good match to other experimental data that were not used for parameter estimation. This test can be regarded as ''out-sample fit'' or model validation. The result shown in Figure 2C indicates that the simulated time courses of nuclear phosphorylated Smad2 agree well with the experimental data [29]. As a further test, we calculated the subcellular location of Smad2 from the simulation result of the model. The results are in agreement with previous reports that TGF-b causes a change in the overall Smad2 distribution from predominantly cytoplasmic to predominantly nuclear [6][7][8]29]. After TGF-b treatment, Smad2 proteins are rapidly accumulated in the nucleus and then return to the cytoplasm ( Figure 2D). Finally, we tested whether the model can predict well the signal response to different dose of TGF-b. Previous experimental data indicate that the response of Smad phosphorylation after 1 hour TGF-b treatment will be saturated when the concentration of TGF-b is larger than 0.5 ng/ml [27,28]. The model successfully predicts the dose-response of Smad phosphorylation upon different concentrations of TGF-b (Figure 3).

Model performance is significantly improved by constraint-based modeling
We compared the performance of the model generated by constraint-based modeling method and that of the model obtained  Figure 1A in Lin et al. [4]. (B) Effect of type I receptor kinase inhibitor SB431542. Cells were treated with TGF-b for 30 minutes, then were washed out TGF-b at 30th minute and added SB431542 to prevent rephosphorylation of Smad2. The experimental data is normalized from Figure 1C in Lin et al. [4]. (C) Comparison of the model time course with an experimental time course of nuclear phosphorylated Smad2 after TGF-b treatment (80 pM, 2 ng/ml). The western-blot data reported by Inman et al. (Fig. 1A, top panel) is quantified with Scion Image software [29]. (D) Subcellular location of Smad2 after TGF-b treatment (80 pM). The concentrations shown here refer to the local concentrations in cytoplasm and nucleus. doi:10.1371/journal.pone.0000936.g002 by only fitting the time course data. We first compared the experimental data and simulation results from the model obtained by only fitting the time course data. The results shown in Figure 4 indicate that the model has been over-fitted to the data used for parameter estimation ( Figure 4A-4B), but it has bad predictions for the data that are not used for parameter estimation. For example, experimental evidence shows that the response of Smad phosphorylation after 1 hour TGF-b treatment will be saturated when the concentration of TGF-b is larger than 0.5 ng/ml [27,28]. However, the predicted time course of phosphorylated Smad2 in the model, obtained by only fitting the time course data, is not saturated even for a very high dose of TGF-b (10 ng/ml), which contradicts the experimental results ( Figure 4C).
On the other hand, we compared the possible variation of parameter sets of the models by these two different methods. For each method, we independently generated 1000 parameter sets which make the model have similar goodness of fitting the time course data of Smad phosphorylation (Table S1 and Table S2). According to the statistical result for the parameter sets shown in Table 3, the ranges of the variation for the 7 estimated parameter values are significantly narrowed by constraint-based modeling method.
Previous modeling studies of TGF-b signaling pathway indicate that the Smad phosphorylation response to TGF-b is robust to a large variation of some parameters [17,18]. The large variations in the estimated parameter sets are usually caused by two reasons. The data quality is the main reason for the large variation of parameter sets obtained by only fitting the time course data. When we fit the western-blot data, we actually fit the scaled data. However, the parameter for the scaled coefficient is unknown, which can vary in a large range if we only fitting the scaled time course data without considering other qualitative constraints. Another reason comes from the insensitivity of some parameters to the signal response. For example, the sensitivities of the parameters k LRC , v T2R and k lid are very small ( Figure 5), which means the output of the model (Smad phosphorylation level) is robust to the variation of these insensitive parameters.

Sensitivity analysis of the model
We next systematically investigated the sensitivity of all the rate constants and found those whose perturbation the pathway is most sensitive or most robust against. Response sensitivities were used for quantifying the effects of all the rate constants on the concentration of signaling proteins. In this model, we regard nuclear phosphorylated Smad complex (nuclear phosphorylated Smad2) as the readout of the signal in this pathway because experimental results indicate that nuclear phosphorylated Smad complex acts as transcription factor to induce the expression of many genes [1,3]. We determined the sensitivity of the integral concentrations of the nuclear phosphorylated Smad complex from the beginning of the TGF-b activation to the end of the simulation time (8 hours). The definition of response sensitivity for nuclear phosphorylated Smad is as following: As shown in Figure 5, we can divide the parameters into three groups: positive control, negative control and almost no control to the changes of nuclear Smad complex. The strong positive and negative control groups belong to the reactions participating phosphorylation of Smad, receptor endocytosis, type I receptor production. The most negative control on the concentration of nuclear Smad complex is the rate constants corresponding to the dephosphorylation of nuclear Smad complex, which implies  Experimental data is the same as described in Figure 2A. (B) Effect of type I receptor kinase inhibitor SB431542. Experimental data is the same as described in Figure 2B. The regulation of the signal: balance between clathrin dependent endocytosis and non-clathrin mediated endocytosis In the experimental studies, potassium depletion can be used to interfere with clathrin-dependent trafficking of receptors, which can inhibit the TGF-b signal [9,30]. On the other hand, nystatin treatment causes the inhibition of the non-clathrin endocytosis pathway [9]. The simulation analysis of the model indicates that the inhibition of clathrin dependent endocytosis causes a transient response of Smad2 phosphorylation ( Figure 6A). The simulation result also shows that inhibition of non-clathrin dependent endocytosis increases TGF-b signal amplitude, which produces a prolonged signal ( Figure 6C). What will happen if both clathrin dependent and non-clathrin dependent endocytosis are inhibited? The simulation results indicate that the key quantity is the ratio of clathrin to nonclathrin dependent endocytosis rate. A transient response of Smad2 phosphorylation appears upon the combination of a strong inhibition of clathrin dependent endocytosis and a weak inhibition of non-clathrin mediated endocytosis, which corresponds to a low ratio of clathrin to non-clathrin dependent endocytosis rate ( Figure 6D). Furthermore, when both clathrin dependent endocytosis and non-clathrin dependent endocytosis are equally inhibited (a medium ratio of clathrin to non-clathrin dependent endocytosis rate), the signal response of Smad2 phosphorylation is The values of the 1000 parameter sets obtained by each method are available in Table S1 and   similar as the control ( Figure 6E). Finally, a prolonged signal response of Smad2 phosphorylation is observed ( Figure 6F) with the combination of a weak inhibition of clathrin dependent endocytosis and a strong non-clathrin dependent endocytosis (a high ratio of clathrin to non-clathrin dependent endocytosis rate). Therefore, the TGF-b signal response is regulated by the balance between the strength of signal initiation from clathrin dependent endocytosis and the strength of negative feedback in the venue of non-clathrin mediated endocytosis. Recently Vilar et al. proposed a concise computational model of signal processing in TGF-b superfamily ligand-receptor network [16]. This work indicates that the key quantity that determines the qualitative behavior of the pathway is the ratio of the constitutive to the ligand-induced rate of receptor degradation (CIR, constitutive-to-induced degradation ratio). Low CIR causes a transient increase of signal activity that returns to pre-stimulus levels. On the contrast, high CIR produces a permanently elevated level of signal activity. The concept of CIR refers to the rates of two degradations process rather than the simple expression of the degrading rate constants. This conclusion is affirmed by our analysis of the balance between clathrin dependent endocytosis versus non-clathrin mediated endocytosis. Our model shows the TGF-b signal response is regulated by the ratio of clathrin to nonclathrin endocytosis rate. On the other hand, the concise model doesn't distinguish the receptors in early endosomes and in caveolar lipid-raft. For ligand-induced degradation of receptors, the concise model simply regards the processes of the non-clathrin dependent internalization, recycling and the degradation of the receptors in caveolar lipid-raft as one-step of ligand induced receptor degradation at cell surface. Therefore, the control role of CIR ratio proposed in the concise model is consistent with regulation role of the clathrin to non-clathrin endocytosis rate ratio in the comprehensive model of this work.