Data-based stochastic modeling reveals sources of activity bursts in single-cell TGF-β signaling

Cells sense their surrounding by employing intracellular signaling pathways that transmit hormonal signals from the cell membrane to the nucleus. TGF-β/SMAD signaling encodes various cell fates, controls tissue homeostasis and is deregulated in diseases such as cancer. The pathway shows strong heterogeneity at the single-cell level, but quantitative insights into mechanisms underlying fluctuations at various time scales are still missing, partly due to inefficiency in the calibration of stochastic models that mechanistically describe signaling processes. In this work we analyze single-cell TGF-β/SMAD signaling and show that it exhibits temporal stochastic bursts which are dose-dependent and whose number and magnitude correlate with cell migration. We propose a stochastic modeling approach to mechanistically describe these pathway fluctuations with high computational efficiency. Employing high-order numerical integration and fitting to burst statistics we enable efficient quantitative parameter estimation and discriminate models that assume noise in different reactions at the receptor level. This modeling approach suggests that stochasticity in the internalization of TGF-β receptors into endosomes plays a key role in the observed temporal bursting. Further, the model predicts the single-cell dynamics of TGF-β/SMAD signaling in untested conditions, e.g., successfully reflects memory effects of signaling noise and cellular sensitivity towards repeated stimulation. Taken together, our computational framework based on burst analysis, noise modeling and path computation scheme is a suitable tool for the data-based modeling of complex signaling pathways, capable of identifying the source of temporal noise.


Introduction
During development and homeostasis of mammalian tissues, proliferation and differentiation are coordinated among thousands of cells. To communicate with each other, cells use cytokines, a group of extracellular signaling molecules. The cytokine transforming growth factor beta (TGF-β) and other members of the TGF-β superfamily play an important role in tissue homeostasis, as they induce antiproliferative and apoptotic responses in adult cells to effectively limit tissue growth [24,59]. Furthermore TGF-β signaling induces a loss of cell-cell junctions, cytoskeletal reorganization and migration [42], thereby allowing epithelial cells to evade from their original location by acquiring a migratory, mesenchymal phenotype in a process called epithelial-to-mesenchymal transition (EMT).
Since the TGF-β signaling pathway is involved in such central regulatory processes, its deregulation is associated with diseases like fibrosis and cancer. In cancer progression, TGF-β signaling plays a dual role: In normal cells the pathway typically provides a cytostatic response and thus acts as a tumour suppressor. In early tumor stages, the tumor-suppressive function is frequently lost, whereas in late-stage tumors TGF-β signaling may induce EMT and promote metastasis [27]. Hence, TGF-β signaling undergoes a specificity switch from a tumor-suppressing to a migration-enhancing function. To better understand this specificity switch, deeper insights into cellular information processing in the TGF-β pathway are required.
The molecular mechanisms of TGF-β signal transmission within cells are well characterized: Signaling is initiated by binding of TGF-β to the TGF-β receptor type II (TGFBR2). This receptor then recruits a type I receptor (TGFBR1) to form the receptor-ligand complex, which phosphorylates SMAD2 and SMAD3, the intracellular transducers of the signal. In the cytoplasm, phosphorylated SMAD2/3 proteins form trimers with another protein called SMAD4, and the SMAD trimers translocate into the nucleus to act as transcription factors regulating the expression of downstream target genes. Among the induced genes are both positive and negative feedback regulators, e.g., SMAD7 which inhibits TGF-β receptors by inhibiting their catalytic activity and inducing their degradation [57].
Previous studies indicate that the cellular response to TGF-β stimulation is correlated to the temporal dynamics of SMAD activation [44,60,62]. For instance, in cancer cell lines EMT and cell migration seems to be induced upon transient SMAD activation, while the cytostatic response requires sustained signaling [44]. Moreover, we recently analyzed the dynamics of SMAD nuclear translocation in single MCF10A cells using live-cell imaging and found that cells with sustained signaling tended to divide less when compared to cells showing a transient signal [62]. Due to high temporal and spatial resolution, this data allowed us to accurately observe the temporal heterogeneity of SMAD signaling at the single-cell level. Interestingly, we found temporal fluctuations in the nuclear translocation of SMAD2 on an intermediate time scale of 30 min to 2 hours which we refer to as activity bursts in the following. These bursts may be physiologically relevant, as stochastic or periodic pulsatile changes in signaling proteins were found to have an influence on the cellular response for several other pathways including calcium [4], NF-kB [3,53], ERK [1,2] and P53 [49] signaling.
The temporal regulation of SMAD translocation is well studied and understood by deterministic mathematical models describing the average dynamics of large cell populations, see e.g. [46,62,67,72,77]. Using these models, ligand degradation due to receptor internalization and the negative feedback via SMAD7 have been identified as important factors in the termination of signal. However, deterministic models do not account for the temporal activity bursts we observe at the single-cell level. For a better understanding of the pathway dynamics and to reproduce the effect of bursting events on the pathway, stochastic simulations are required. A common but computationally expensive way to integrate stochastic effects in time trajectories is to use the Gillespie algorithm [20], also known as kinetic Monte Carlo method. The algorithm delivers a solution of the underlying chemical master equation by sampling from a large number of realizations [16,22]. Equivalently, the chemical Langevin equation introduces a system of stochastic differential equations (SDEs), that assumes noise in the concentrations of the chemical species to approximate the stochastic dynamics [21,35]. Other types of SDE and random ordinary differential equation models have been applied to cellular dynamics for example in [55,65,75]. The complexity of the TGF-β system under investigation is reflected by a large number of dynamic variables and parameters acting on different time scales leading to a stiff system of ordinary differential equations (ODEs). This reduces the efficiency of stochastic simulations and complicates the quantitative calibration of stochastic models based on large-scale single-cell datasets.
Species that are present in low copy numbers contribute most to the stochasticity. In contrast, species with large copy numbers or fast reaction rates necessitate smaller time steps in the Gillespie algorithm, thereby increasing the required computing time. In common splitting methods, either the species [51] or the reactions are split into a slow and a fast group [54,73], or combinations of both [5] under quasi-steady-state or partial equilibrium assumption. In other hybrid methods, based on a separation of time scales, fast components are modeled deterministically while other components are treated with a stochastic model [33]. Further hybrid methods determine their stochastic species based on the variance [25,39]. Derived from fundamental chemical reactions of simple molecules without conformational changes, all of these methods assume Poissonor normal distributions of reaction events.
In contrast to these methods we propose a hybrid deterministic-stochastic modeling framework, which allows for multiparametric stochastic processes corresponding to more complex processes, still isolating the underlying and unknown complexity in a systems biology manner. To this end our efficient approach applies stochasticity by means of SDEs to a subset of selected variables within the system. We further propose an analysis framework that uses species splitting to our advantage since it offers the possibility to attribute sources of noise. To quantitatively reproduce bursting events in the SMAD signaling pathway, we apply an automated detection algorithm enabling us to analyze how burst properties vary across the population. Applying the same burst analysis to both the measured data and the simulations, we define an objective function and thereby compare different model variants that consider stochasticity in distinct parts of the signaling cascade to the data. Thus, we were able to determine the contribution of different pathway reactions to generating activity bursts in SMAD signaling and found that TGF-β receptor internalization may play a key role in this respect. Finally, we compared the predictions of our model to experimental data in untested conditions and show that our model faithfully reproduces the temporal heterogeneity of the SMAD signaling network. The proposed framework may be generally applicable to quantitatively model stochastic biochemical signaling networks based on live-cell imaging data.

Quantification of temporal stochasticity in single cells by burst analysis
To analyze activity bursts in TGF-β/SMAD signaling, we used previously published measurements of SMAD2 nuclear translocation at the single-cell level [62]. In these experiments, MCF10A breast epithelial cells stably expressing SMAD2 fused to a fluorescent protein were analyzed by time-resolved live-cell microscopy. The nuclear and cytoplasmic SMAD2 concentrations were quantified over a 24h time interval after stimulation with different concentrations of TGF-β. To determine pathway activity, the ratio of nuclear to cytoplasmic SMAD2 (nuc/cyt SMAD2 ratio) was used as a quantitative measure of the translocation of SMAD2 to the nucleus (see Fig. 1).
The analysis in [62] showed that upon stimulation with a saturating TGF-stimulus (100 pM), the population-average response summarizing 352 single cells shows a transient peak that is followed by a plateau above the baseline (black line in Fig. 1B). In contrast to [62], where heterogeneity in the cell population was investigated, we here focus on temporal heterogeneity. At the single-cell level, we observed fluctuations in the nuc/cyt SMAD2 ratio on three different time scales: On a short time scale of less then 100 minutes, there are temporally uncorrelated fluctuations that most likely reflect measurement noise. On a larger time scale of more than 12 hours, we observed drifts of the trajectories possibly caused by photobleaching or environmental conditions like the density of the cells. On an intermediate time scale we observed changes in nuc/cyt SMAD2 ratio that we interpreted as bursting events in SMAD signaling (cf. Fig. 1B). As shown in [62]  This burst detection workflow was applied to experimental single-cell data recorded at low (2.5 pM) and high (100 pM) doses of TGF-β, or in the absence of stimulation (0 pM). In the stimulation datasets, the first signal detected as a burst corresponds to the timing of the initial peak observed in the population average (black line in Fig. 1B). The amplitude of these single-cell peaks increased in a dose-dependent manner much like the population-average response (compare Fig. 1D). Subsequent activity bursts, enumerated by the burst indices 2-5, reflect stochastic events. These show a substantially lower amplitude than the first burst, but are still elevated in a dose-dependent manner in most cells treated with 2.5 and 100 pM TGF-β when compared to the 0 pM control. Moreover, the total burst count per single cell within the first 24h after stimulation gradually increased with the ligand dose (Fig. 1E). In contrast, the duration of the first and subsequent activity bursts remained essentially constant across varying TGF-β doses (compare Fig. 1F), which agrees to estimations of the signals self-similarity by means of autocorrelations (see S1 Fig and S2 Fig). Taken together, these data suggest that certain burst features change in a dose-and time-dependent manner (burst amplitude, total burst count), whereas others apparently do not depend on the external stimulus (burst duration).
Stochastic burst-like events may contribute to regulating cellular responses to TGF-β stimulation such as cell migration. To quantify the relationship between bursts and cell migration, a motility score for individual cells was calculated as the summed absolute displacement of cells between consecutive time points, as previously described in [62] (see Methods). By relating the motility of individual cells to bursting events we confirmed that cells with higher SMAD burst counts also tend to exhibit a higher motility than cells with lower burst counts, see Fig. 1G. As cells with a lower burst count are also expected to exhibit a lower total SMAD signal, this analysis was performed for various bins summarizing single cells with a similar total area-under-curve (AUC). Even after this correction for the total SMAD signal, cells with a higher total burst count (>12) persisted to exhibit a higher mobility score than those with fewer bursts (<8). This suggest that bursting events in the nuc/cyt SMAD2 ratio are partially correlated to cellular migration and may be causally related to this cellular response.

Fig 1: Quantification of bursts in TGF-β-induced SMAD translocation
A: Scheme of microscopy-based single-cell analysis of TGF-β signaling. In unstimulated cells the SMAD2-GFP fusion protein is primarily located in the cytoplasm (top). TGF-β stimulation promotes the transport of SMAD2-GFP to the nucleus and hence increases the nuc/cyt SMAD2 ratio that was quantified from microscopy images (bottom). See [62] for details on the data analysis. B: Bursts in SMAD2 nuclear translocation. Time-resolved measurements of the nuc/cyt SMAD2 ratio of four individual cells (colored lines) are compared to the population average response over 352 cells (black line) upon stimulation with 100 pM TGF-β. A stochastic deflection (burst) is highlighted (arrow, dotted lines indicate beginning and end). C: Burst detection process. Bursts are detected in the nuc/cyt SMAD2 ratio of a single-cell upon stimulation with TGF-β. Top: Peaks in the measured trajectory of the nuc/cyt SMAD2 ratio (black line) are approximated by a sum of Gaussian hills (green lines) which are subtracted from a smoothened version of the signal to estimate the long-term trend (blue line). Bottom: Afterwards, the long term trend is subtracted from the measurement data (black line), short-term effects are smoothed out by a Gaussian filter (blue line) and the result is analyzed by a peak finding algorithm which detects 5 bursts in this example (indicated in red). As pathway events might occur shortly after each other we allowed for overlapping bursts. The bursts are characterized by their height and duration (red dashed lines). See Methods for details. D: Burst height increases with TGF-β dose. Violin plots show the distributions of the burst height for the first 5 bursts (burst index) in the nuc/cyt SMAD2 ratio trajectories upon stimulation with 5 pM (blue) and 25 pM (red) TGF-β and in a control population (0 pM). Median and mean are marked by red and black horizontal lines. E: Burst count increases with TGF-β dose. Distribution of burst counts in the nuc/cyt SMAD2 ratio trajectories after stimulation with 5 pM and 25 pM TGF-β and in a control population (0 pM). Higher doses of TGF-β lead to higher burst counts. Median and mean are marked by red and black horizontal lines. F: Burst duration is independent of TGF-β dose. Violin plots show the distributions of burst duration for the first 5 bursts (burst index) in the nuc/cyt SMAD ratio trajectories upon stimulation with 5 pM and 25 pM TGF-β and in a control population (0 pM). Median and mean are marked by red and black horizontal lines. G: Bursts count correlates to cell motility at the single-cell level. Single cells were binned according to their area-under-curve (AUC), measured as the integral of the nuc/cyt SMAD2 ratio over all time points (x-axis), and were further sub-classified in terms of their total burst count (see legend). The cell motility (y-axis) was calculated by taking the sum over the distance moved by the cells in consecutive time points. Among cells with similar total signal (AUC), cells with 8 or less bursts (blue) exhibit lower motility than cells with 12 or more bursts (red). The SMAD-signaling model: from population averages to stochastic single cells A: Topology of the TGF-β pathway model. Extracellular TGF-β (yellow) binds to free TGF-β receptors on the cell membrane (blue ovals) to form a receptor-ligand complex (gray ovals). This complex is then internalized into the endosome (R1R2Le) and there functions as an enzyme that phosphorylates SMAD2 (blue rectangle). Phosphorylated SMAD forms homo-and heterotrimers, which are transported into the nucleus. Nuclear SMAD further induces the expression of a generic feedback regulator (light green) inhibiting TGF-β receptors. Extensions in the scheme indicate endosomal (e), nuclear (n) and phosphorylated (p) species. State transitions and intercompartmental shuttling are indicated with arrows, enzyme catalysis with circle headed bars, and feedback inhibition with blunt headed bars. Colored circles mark parts of the pathway where stochastic noise was introduced in the 'degradation', 'receptor-ligand' and 'internalization' model variants. This panel was modified from [62]. B: CIR model dynamics for kinetic parameters. Temporal noise in kinetic parameters of the model is shown. For combinations of the variance and reversion parameters and in the CIR model (3) chosen according to the graph in the bottom right, multiple trajectories are shown, along with their temporal distributions, which resemble log-normal distributions. In the top left, the noise parameter was low, whereas it was high otherwise, either with a high (bottom left) or low (top right) reversion parameter . C: Flowchart of the modeling approach. Starting from the deterministic model, which was fitted to the population mean, processes within the signaling pathway (colored reaction steps in panel A) were chosen that are sensitive to temporally stochastic occurrences in the cells. Then, random noise (panel B) was introduced in the kinetic parameters connected to the respective critical process (block model). This enabled the reproduction of and model fitting to single cell behavior. D: Propagation of stochastic noise in the model. Noise was introduced in the receptor internalization rates as done in the stochastic internalization model, see Table 3. The noise controlling parameters of the CIR model were chosen as indicated by the same colors in panel B. Simulated trajectories of the dynamic model concentrations accounting for the activated complex on the cell surface ( 7 , left), the activated endosomal complex ( 8 , center) and the nuc/cyt SMAD2 ratio (right) are depicted. For details on model species see Table 2. The trajectories show how the temporal noise propagates through the pathway for different parameters in the CIR model.

Modeling burst like behaviour in single cells by generalizing a population-average model
Given the dose-dependent nature of SMAD activity bursts and their potential relation to cell migration, we sought to describe them using a stochastic modeling approach. To this end, we built on a mechanistic model of the pathway published in [62] that is based on ordinary differential equations (ODEs). This model comprises three modules describing TGF-β receptor dynamics [11], SMAD phosphorylation/complex formation [69] and SMAD-induced feedback regulation [36], see Fig. 2A for details.
For a description of bursting, we converted the population-average version of this model into a stochastic description of a complete cell population. This was done by simulating an ensemble of cells, in which each cell exhibits independent stochastic fluctuations in certain kinetic parameter values. The individual cells of this ensemble share a common extracellular TGF-β pool that serves as an input into the single-cell model, but is also in turn influenced by the single cells, as these internalize receptor-ligand complexes and thereby degrade the ligand. Hence, the model consists of interdependent equations describing the extracellular TGF-β and intracellular signaling protein dynamics.
The temporal dynamic of the external input, which is shared among many single cells, is given by where denotes the concentration of the free ligand TGF-β and Y is a vector including the dynamic concentrations 1 , . . . 22 relevant to the SMAD pathway in cell out of cells. Similarly, the vector P contains the relevant kinetic parameters 1 , . . . , 55 in cell , see Table 3 for details. The right hand side of equation (1) describes the addition of TGF-β into the cell culture dish and its degradation due to internalization into cells. The detailed equations governing the function are given in Table 2 and the dependence on the population size is shown in S12   Table 2 for mathematical details.
To account for bursting at the single-cell level, we introduced noise in a subset of kinetic parameters by employing SDEs. This approach leads to different temporal trajectories of the protein concentrations in each cell considered in (1)- (2). We use the Cox-Ingersoll-Ross (CIR) model [9], to describe the parameter change at time point by a reversion force towards the initial value 0 and a stochastic contribution due to a Brownian motion . The standard deviation parameter controls the time scale of the fluctuations for the corresponding model parameters (see Fig 2B), while the parameter determines the strength of an additional mean reversion force pulling the concentration back to its initial value 0 . For a suitable model of noise in kinetic parameter we expect first that parameters never fall below zero and thus become nonphysical. Second, we expect temporal stability of the kinetic parameter in the sense that its expectation remains constant and its variance bounded over time. And third, the model should lead to log-normal-like distribution of kinetic parameters over time (compare Figure 2B) as fluctuations in many cellular processes, e.g., in gene expression, typically lead to this type of distribution [45].
The non-negativity of model (3) is verified (see also Methods for non-negativity of its discretization), although this does not apply to fundamental Brownian motion and the more commonly used Ornstein-Uhlenbeck (OU) model. Moreover, the model preserves the mean and its variance is bound over time which is unlike Brownian noise and random ODE models whose variance steadily increases with time . We could further verify by simulations that the relative frequency of kinetic parameters over time resembles log-normal distributions (compare Figure 2B). Refer also to S3 Fig and S4 Table for a comparison of the fitted CIR model to OU.
The mean reversion property of the model controlled by the parameter leads to burst-like dynamics already in the paths of the kinetic parameters (see Figure 2B) and our model allowed us to assess how these dynamics propagate to the experimental readout, the nuc/cyt SMAD2 ratio. This ratio is computed from the dynamic concentrations as the sum of nuclear divided by the cytoplasmic SMAD2 species as described in the Methods. The occurrence and strength of fluctuations in the nuc/cyt SMAD2 ratio predicted by the model depend not only on the mean reversion force and variance of the stochastic kinetic parameter. They are also heavily influenced by the choice of the parameter in which stochasticity is introduced. This sensitivity allowed us to discriminate different model assumptions and identify key parameters for which added noise can reproduce the burst features of the single cell data shown in Fig. 1.

Quantitative model calibration by fitting to single-cell datasets
By quantitative fitting to experimental data we aimed to reproduce TGF-β-induced bursting in single cells (Fig. 1). To reduce the complexity and to gain insights into the origin of stochastic fluctuations, we considered block models in which we allowed randomness only in sets of parameters describing similar reactions. Technically, each block model uses (3) only for a subset of kinetic parameters ∈ ⊂ {1, 2, . . . , 55}. The remaining kinetic parameters are assumed to be temporally constant. For introducing stochastic fluctuations, we focused on reactions on the receptor level, as receptor dynamics are rate-limiting in TGF-β signaling [28,62]. Moreover, we observed in our published single-cell data that complete inhibition of the TGF-β receptors by a small-molecule inhibitor during a stimulation experiment leads to homogeneous decay of SMAD nuclear translocation that showed no sign of stochastic SMAD dynamics at the single-cell level [62].
We considered five stochastic block models (see Table 3 for details). The receptor/ligand block model introduces noise in the reactions related to the binding of TGF-β to its receptors of type 1 and 2 on the cell surface. The internalization block model allows for stochasticity in the transport of receptors and TGF-β/receptor complexes from the cell membrane into the cell. In the endosomal traffic block model noise is considered in the intracellular transport of internalized ligand/receptor complexes. Stochastic synthesis of feedback proteins and type 1 and 2 TGF-β receptors is assumed in the synthesis block model. Finally, the degradation block model adds systematic noise to the degradation rates of the TGF-β receptors and the feedback regulator.
To compare the stochastic models to the experiments, we applied the burst detection and analysis used to characterize the data in Fig. 1 to the model output. This allowed us to define an objective function, which computes a distance measure between simulated and measured cell populations. This function compares summary statistics of the population snapshot distributions (mean and standard deviation of nuc/cyt SMAD2 ratio, Fig. 3B) as well as single-cell characteristics (count, height and duration of bursts, Fig. 3C). Snapshot summary statistics were computed based on the nuc/cyt SMAD2 ratio of all cells at a given time point using the same (5 min) sampling intervals as in the experimental data. As depicted in Fig. 3A, the objective function is a weighted sum of the distance values resulting from the comparison of these single-cell and population statistics (see Methods for further details). The weights were balanced to consider all single-cell features and the standard deviation of the snapshot distribution equally while strictly penalizing deviations from the the snapshot mean through a larger corresponding weight to ensure that the model reproduces the population-average response to TGF-β stimulation. We validated the objective function by applying it to computationally generated data sets that clearly contained differences in the features we aimed to quantify (not shown). Furthermore, for series of data sets with gradually changing features from a reference data set, we verified that the objective function varied only gradually as well, see Fig. 4B.
To calibrate the stochastic models we simulated the nuc/cyt SMAD2 ratio in 375 cells and compared them to the the nuc/cyt SMAD2 ratio measured in 730 cells upon stimulation with 100 pM TGF-β using the objective function. For the simulation of the cells, we implemented a high-order stiff SDE solver that takes into account both the coupling of the ligand degradation by many cells and the stiffness of the system and confirmed its efficiency and accuracy (see Methods and Fig. 4). Interestingly, even in the absence of TGF-β stimulation stochastic events were observed in SMAD nuclear translocation in the experimental data, possibly due to noisy basal shuttling or due to measurement noise. To account for this in the stochastic models, we have added measurement data (normalized by subtracting the mean) from a control experiment without stimulation to the simulated time paths after their numerical computation. As a result even the model version with constant parameters, to which we refer to as 'deterministic model', includes temporal noise.
The kinetic parameters of the deterministic ODE model had been previously fitted to time-resolved measurement data obtained in experiments with different doses of TGF-β in [62]. With the exception of the kinetic parameters in which we introduced noise, we maintained these originally obtained parameters in our stochastic model. Hence, only the stochastic parts of the models had to be estimated using the objective function. Independent noise was assumed for each stochastic parameter in a block model, i.e., three values were to be determined per (the initial value ( ) 0 , the noise parameter and the reversion parameter ). Depending on the number of stochastic parameters included in a block model between 3 and 9 parameters were estimated during parameter optimization. The extended scatter search [14] within the software package MEIGO [13] was used to carry out a global parameter search within the parameter space. This evolutionary algorithm uses latin-hypercube sampling to identify an initial population of parameter states, which then evolves by forming combinations between its members. These combination states can replace the original members for the following iteration. The search was stopped when the distance between subsequent populations was lower than a defined level for an extended sequence of iterations. This algorithm together with our high-order path computation scheme could optimize the noise parameters in feasible time.  The slope in the graph corresponds to the strong order of convergence. While for larger time increments the scheme exhibits an experimental convergence order only slightly larger than one (yellow line), this order increases to approximately 1.5 for small time increments (red line). Thus, the high order of the scheme is verified (opposed to the commonly used Euler-Maruyama scheme of strong order 0.5) and accurate numerical integration is ensured for small time increments. B: Sensitivity of the objective function with respect to the dose of TGF-β. Single-cell data from experiments with different doses of TGF-β was compared using our objective function (see Fig. 3). The computed function values (indicated by numbers and color code) show that the objective function increases gradually as the difference in dose increases.

TGF-β internalization explains temporal stochasticity in nuclear SMAD
The optimization results are summarized in Fig. 5 and Table 1. Distributions of the burst height, width and frequency in all block models are presented in Fig 5B. In Table 1 we show the fitting errors of the block models with respect to the objective function. In Fig 5A various paths of the internalization and the synthesis block models can be compared to the nuc/cyt SMAD2 ratio observed in the experiments.
In summary, our results show that the internalization block model achieved the best fit to the data (compare Fig. 5A). All tested models succeeded in being consistent with the population average of the nuc/cyt SMAD2 ratio. However, in terms of temporal stochastic fluctuations significant differences between the models were observed: The block models endosomal traffic, receptor-ligand and synthesis could barely achieve any variability in the observable unlike the internalization and degradation block models (Fig. 5B). Moreover, the latter two models attain single cell statistics similar to those of the data, while the other models underestimate the burst count and the variance in their height and duration.
As reductions of the full complexity of intracellular processes, the stochastic models cannot explain the full variability in the data. Yet, the results are accurate enough to discriminate different models. Our model comparison implies that stochastic processes in the receptor-ligand internalization may play a key role in the temporal heterogeneity of the nuc/cyt SMAD2 ratio in the cell. We hypothesize that the predicted stochastic internalization may arise from low receptor numbers on the cell surface combined simultaneous sorting of dozens of receptors into internalization vesicles [70,74]. Given that TGF-β receptor become only activated after internalization, the formation of a vesicle will give rise to a sudden increase in the signal and thus to a burst in the nuc/cyt SMAD2 ratio (see Discussion).
Stochastic bursting of SMAD2 is a dose-dependent phenomenon, as the burst number and amplitude are reduced at low TGF-β concentrations, whereas the burst duration essentially remains constant (Fig. 1). Since the stochastic model has been calibrated solely on the 100 pM TGF-β stimulus data, we asked whether it could successfully predict changes in bursting dynamics at 5 and 25 pM TGF-β, respectively. Indeed, we found that the internalization model also generated realistic predictions of population average and single cell responses for these doses, which had not been employed for model fitting (see Fig. 6).
Since the stochastic parameters were not fitted to the new stimulation doses, these predictions show the generality of the model. They are also an indication that the stochasticity within TGF-β internalization is an intrinsic property independent of the strength of the signaling. In other words, the processes that lead to internalization noise do not depend on the stimulation   Distance measurements achieved by the five block models after fitting to experimental data for a stimulation with 100 pM TGF-β. The objective function with final value shown in the last column is the weighted sum of mean and standard deviation of the nuc/cyt SMAD2 ratio, as well as burst characteristics in single cells including count, height and duration, for details see Fig. 3 and Methods. Distance measures for burst height and duration are shown for the first four bursts in each model. In case of single cell statistics the table shows summed contributions of median and standard deviation. For the fitted models no computational instabilities occurred and hence the component accounting for failed simulations is zero. The stochastic internalization model achieved the best fit. An alternative validation in terms of the Akaike information criterion led to consistent conclusions, see S5 Table. dose and the dose mainly influences how strongly the internalization noise is transferred to the nuc/cyt SMAD2 ratio.

Temporally stable and unstable noise contributions explain variability in response to repeated stimulation
The stochastic noise in our model leads to signaling fluctuations with a short memory, as the bursts typically decay within a few hours after their appearance (Figs. 5A and 6A). As shown in [62], SMAD signaling also exhibits a temporally stable noise component, since signaling in sister cells remains similar over longer time scales of more than a few hours after cell division.
Restimulation experiments, in which the same cell population is subjected to a second TGF-β stimulus 6h after the initial treatment, probe the relative contribution of temporally stable and unstable fluctuations, and are thus an interesting test case for our stochastic model: If the response to the first and second stimuli are highly correlated at the single-cell level, the variability is dominated by temporally stable fluctuations. In contrast, temporal fluctuations lead to uncorrelated restimulation responses and feedback loops in SMAD signaling additionally contribute. To investigate the long-term memory of the pathway, we compared the best-fit internalization model to restimulation experiments published in [62] (see Methods for details).
The model-data comparison was performed at the level of population-average trajectories (Fig. 7, top row) and by relating single-cell peak amplitudes approximately 90 minutes after first and second stimulus, respectively (Fig. 7, bottom row). The stochastic model approximates the measured population-average trajectories reasonably well and reproduces that the second peak reaches only approximately 80% of the magnitude of the first peak, thereby indicating signaling refractoriness after the initial stimulus (Fig. 7A). At the single-cell level, the stochastic model performs less well, as the simulated noise reproduces only 48% of the measured variance of the first peak (initial stimulation), and 62% of the variance of the second peak (restimulation). Furthermore, the first and second peaks upon restimulation are negatively correlated in the model (Pearson correlation of r=-0.19), whereas their correlation is positive in the data (r=0.34) (Fig. 7A, bottom). Taken together, the best-fit stochastic model fails to fully reproduce the restimulation behavior of the SMAD pathway.
In our model, SMAD signaling is controlled bytranscriptional negative feedbackloops ( Fig. 2A)which may introduce pathway refractoriness and are known to reduce signaling variability [46]. To investigate whether the feedback strength may be overestimated in the model, we lowered the parameter of SMAD-dependent feedback induction to 30% of its best-fit value (see also S6 Fig for a prediction of feedback reduction in case of single stimulation). In line with a role in pathway refractoriness, the reduction of feedback increased the second peak at the population-average level (Fig. 7B, top) and diminished the negative correlation of first and second peaks at the single-cell level (Fig. 7B, bottom). Furthermore, the feedback-reduced model showed a higher cell-to-cell variability of both signaling peaks as expected (Fig. 7B, bottom). However, significant model improvements at the single-cell level could not be achieved without strong deviations between model and data at the population- Experimental data (red) was compared to the best-fit internalization model (blue). Shades indicate the 25 to 75 % quantiles. Bottom: Scatter plots comparing the height of the first peak (x-coordinate) with the height of the second peak (y-coordinate) in single cells, each dot representing one cell. The height of each peak in single cells was defined as the nuc/cyt SMAD2 ratio at the peak times in the population-average (55 minutes and 8h, respectively). Contour lines show the two dimensional densities of the cell population. The first and the second peak show a positive correlation in the data (Pearson correlation r=0.34) but a negative correlation in our simulation (Pearson correlation r=-0.19). B: Model with reduced feedback strength shows less pronounced refractoriness to restimulation. Same as in A, but the experimental data is compared to the model with feedback induction ( 39 in Table 3) reduced by 70 % (yellow). Compared to the orginal model (A), the population-average is increased, especially for the second peak, and the single-cell peak correlation is closer to 0 (Pearson correlation r=-0.12). Both indicate a reduced refractoriness to restimulation. C: Minor reduction in the initial SMAD4 level has pronounced effect on stochastic model behavior. The best-fit internalization model is compared to a model variant in which the total SMAD4 concentration is homogeneously reduced by 40% in all cells of the stochastic ensemble (dark green). A reduction in SMAD4 increases the nuc/cyt SMAD2 ratio by indirect effects on negative feedback regulation. D: A model comprising both temporally stable and unstable noise contributions provides a good match to the restimulation data. Artificial cell populations generated by the best-fit model (blue in panel C) and the model variant with reduced SMAD4 (green in panel C) were merged (cyan) and compared to the experimental data (red). The merged population reproduced the variability of the data and the positive correlation of the peak height (Pearson correlation r=0.08).
average level (Fig. 7B, top). Therefore, feedback strength alone does not account for differences between model and data, and this model variant was not considered further.
In [62], we showed that temporally stable SMAD signaling fluctuations arecaused bydifferences of signaling protein expression levels across cells. Therefore, we argued that a combination of stochastic noise and stable signaling protein fluctuations may be required to reproduce the restimulation experiments.
To assess the effect of varying initial protein concentrations in the best-fit stochastic model, we focused on total SMAD4 levels, as this parameter showed the strongest difference between cellular subpopulationsin our previouswork [62]. In Fig. 7C, we homogeneously decreased the total SMAD4 concentration by 40% in all cells of the stochastic model ensemble (green). Compared to the best-fit stochastic model (blue), the SMAD4-perturbed model shows a higher population-average response and a higher peak variance across cells (Fig. 7C). Hence, minor variations in SMAD4 levelscause a noticeable change in the behavior of the stochastic model.
To combine temporally stable SMAD4 fluctuations with stochastic noise, we constructed a cellular ensemble (Fig. 7D, cyan) that consists of two stochastic cell populations, one with the best-fit parameters, and one with a homogeneous 40% reduction in total SMAD4 levels in all cells (Fig. 7C, blue and green). This combined model reproduced the major aspects of the restimulation data including the population-average and the variability of the first and second signaling peaks (Fig. 7D). Furthermore, the single-cell correlation between the signaling peaks was slightly positive (r=0.08), and thus more consistent with the experimental data than the previous stochastic model variants (Figs. 7A-C).
Even though the more realistic simulations proposed in this paper were required to capture all aspects of the restimulation data, these results further support our earlier observation that SMAD signaling involves a temporally stable noise component, likely due to variations in signaling protein levels. In addition, stochastic noise and negative feedback also contribute to the restimulation response, e.g., by dampening the positive correlation of the first and second signaling peaks. Therefore, the pathway responds in a complex and non-deterministic fashion to repeated stimulation.
In summary, our restimulation analysis helped to assess the interplay of noise sources and feedback mechanisms in determining the long-term heterogeneity of SMAD signaling in single cells.

Discussion
In this work, we employed stochastic differential equations to model temporal variability observed in single-cell SMAD signaling as stochastic fluctuations in kinetic parameter values. Assuming a CIR process in TGF-β receptor internalization, the proposed model realistically reproduced noisy signaling at the single-cell level. To quantitatively calibrate the model based on experimental data, we developed a method to detect and quantify bursting events. In combination with an adjusted high-order time integration scheme, we thereby developed an efficient optimization scheme to discriminate different model variants based on single-cell measurements.
In previous modeling studies, the SMAD pathway was mostly described at the population-average level, i.e., the simulations described the dynamics of one average cell [6-8, 32, 40, 43, 58, 67, 69, 72, 76]. Now that more and more single-cell datasets of SMAD dynamics are available [18,37,66,76], quantitative modeling of cell-to-cell variability in the pathway becomes feasible. In [62], a first attempt was made by sampling the initial protein concentrations in a deterministic model. Thereby, a heterogeneous ensemble of single cells was simulated and the heterogeneity was assumed to be temporally stable. Here, we have now focused on understanding the source of temporal fluctuations observed in live-cell imaging of SMAD2 nucleocytoplasmic shuttling.
Specifically, we analyzed SMAD2 bursting events which may contribute to variability in the cellular response to TGF-β, since cells with more bursts exhibit higher motility in the cell culture dish, clf. Fig. 1G. Apparently, stochastic bursts were previously reported for the nuclear translocation of the closely related SMAD4 protein during the development of Xenopus embryos [71]. Our implementation of automated burst detection allowed us to quantify burst height, width and frequency at the level of individual cells. We found that the number and height of bursting events are dose-dependent, and therefore contain information about the extracellular TGF-β concentration. To describe this bursting phenomenon, we have introduced temporal noise in specific kinetic parameter values and analyzed how this noise propagates to the experimentally observed nuc/cyt SMAD2 ratio, clf. Fig. 2B and 2D.
Quantitative stochastic modeling in biology so far is mostly limited to a description of simple reaction networks, e.g., promoter cycles describing the activity of single genes [19,38,41,47,50]. Our large-scale signaling pathway model (see Fig. 2) contains 45 kinetic parameters and therefore poses a challenge to stochastic modeling, especially when calibrated based on a singlecell dataset with high temporal resolution (0-24h in 5 min intervals) and a large number of cells (n=730). To overcome this challenge, we proposed a hybrid deterministic-stochastic approach where temporal fluctuations were modeled by a CIR process in a subset of the kinetic parameters. In contrast to the Wiener process, which is commonly used in stochastic modeling, the CIR process guarantees non-negative dynamic variables. The properties of the CIR process are well suited for biological modeling of noise: the temporal distribution of the noise tends to a log-normal distribution, clf. Fig. 2B, which was reported for kinetic rates also from experiments, see for example [45].
In this work we have implemented a time integration scheme that is capable to achieve reliable simulations and fast enough to fit the complex model to detailed data. The scheme's efficiency has been obtained by high order and semi-implicit time stepping. The non-negativity of the CIR process has further contributed to the stability during parameter estimation. The identification of appropriate noise parameters for our model required a calibration that also takes properties of the fluctuations into account. Instead of comparing full distributions, which could be achieved for example using the Wasserstein distance, we used the first two statistical moments of the burst feature in our objective function. While the duration of the bursts did not exhibit significant differences among several tested models the burst height and their count played a crucial role in their discrimination, clf. Fig. 3B.
Using our burst detection technique, we could discriminate different models and quantify the contribution of one or several parameters to the observed noise. We showed that the receptor internalization model performed better than the degradation and the synthesis models, clf. Fig. 3. In the synthesis model, we observed low variability close to the results of the deterministic model. The degradation model variability is closer to the experimental data in terms of burst statistics, but is still outperformed by the internalization model. These results indicate that bursts might be caused by stochastic events in receptor internalization.
In recent papers studying the internalization of Epo receptors and of the yeast methionine transporter Mup1 by live-cell imaging it has indeed been shown that receptor transport processes can exhibit strong heterogeneity [10,30]. A plausible explanation for noise arising in receptor internalization is that large numbers of receptors on the cell surface get constricted and internalized at once during vesicle formation. Given that TGF-β receptor internalization promotes downstream signaling [74], the formation of a vesicle will give rise to a sudden increase in the signal and thus to a burst in the nuc/cyt SMAD2 ratio (see also S10 Fig. for an alternative internalization model). For EGFR receptors, it was indeed observed that each internalized vesicle in the endosome contains around 100 receptors [68]. Given that TGF-β receptors are present on the cell surface in low amounts of few hundreds to thousand molecules [70,74] and are internalized via coated pits [15], these internalization bursts may indeed give rise to significant fluctuations in downstream signaling as we predict here.
Our findings about the role of receptor internalization in the SMAD pathway suggest promising ways to derive simplified models with similar burst dynamics in the future (e.g. through quantized state systems methods).
The best-fit internalization model was independently validated by analyzing the bursting behavior at low TGF-β concentrations. The successful prediction of bursting at various TGF-β levels proved that the noise parameters found by fitting to the 100 pM are not specific to this condition. Hence, the internalization noise remained constant across conditions, but its propagation through the signaling network changed, so that the net result is a lower signaling output noise in terms of burst count and amplitude upon weak TGF-β stimulation (Figs. 5 and 6).
The restimulation experiments showed that the stochastic model only partially accounted for the pathway behavior upon repeated TGF-β application (Fig. 7): In the experimental data, the first and second signaling peaks are positively correlated, whereas the model, which focuses on temporally unstable noise, predicted a negative correlation between peaks. By performing detailed simulations, we could show that correlated restimulation behavior is explained by the stochastic model if we additionally assumed a temporally stable noise contribution. As in our previous work [62], this temporally stable noise contribution was implemented as cell-to-cell differences in the total signaling protein concentrations which are important determinants of signaling fluctuations in various pathways [12, 17, 29-31, 56, 61]. The different time scales of these two noise contributions explain why sister cells rapidly desynchronize their SMAD signaling behavior after a division event (temporally unstable noise), but remain partially similar over long time scales (stable noise) [62]. Likewise, the single-cell correlation of the first and second peaks in a restimulation experiment will likely become more and more dissimilar if the second treatment is delayed relative to the first one. Our simulations have shown that stochastic modeling is a valuable tool for the analysis of such experiments which provides insights into the underlying mechanisms.
To the best of our knowledge, this work is the first approach to model bursting events in signaling pathways using CIR processes in kinetic parameters. Our results demonstrated that this approach is suitable to simulate single-cell dynamics and to analyze the origin of noise in large systems. The optimization framework combining burst detection and an efficient time integration scheme are not limited to TGF-β signaling and can be generalized to other systems in the future.

Single-cell data
The experimental datasets used in this paper are published in [62] and are available online at the Dryad digital repository [64] and in the GitHub repository [34]. In brief, human MCF10A cells expressing SMAD2 fused to the yellow fluorescent protein mVenus and H2B fused to the cyan fluorescent protein mCerulean under the control of the Ubiquitin C promoter were imaged on a Nikon Ti inverted fluorescence microscope with a CCD camera and a 20x plan apo objective using appropriate filter sets. The microscope was surrounded by a custom enclosure to maintain constant temperature and atmosphere. Images were acquired every 5 minutes for the duration of the experiments. Cells were tracked in the corresponding images using custom-written scripts and fluorescent intensities quantified. For parameter estimation of the stochastic models (Fig. 7), the measured nuc/cyt SMAD2 ratio after stimulation with 100 pM of TGF-β from the published dose-titration experiment ( Fig. 2A-D in [62]) was merged with a second, previously unpublished dataset with the same experimental setup (N = 378 cells). For model validation (Fig. 6), we used the nuc/cyt SMAD2 ratios from the 5 pM and 25 pM conditions from the same dose titration experiments. In the restimulation analysis (Fig. 7), the experimental data from repeated 5 pM TGF-β stimulation (Fig. 4F in [64]) was used.

Burst analysis
To successfully apply burst detection to experimental data, we tuned the parameters of the above-mentioned processing steps, such as the bandwidth of the Gaussian filters and the height thresholds of the peak detection. To create an in silico benchmark for burst detection, we generated synthetic trajectories by model simulations and added Gaussian hills as artificial burst events as well as fluctuations from unstimulated cells as noise. We chose the burst detection parameters such that in silico added bursts are detected reliably (high sensitivity: high true positive detection rate), while keeping the detection of bursts in unstimulated cells low (high specificity: low false detection rate). Specifically, we randomly sampled all burst detection parameters and calculated sensitivity and specificity for each burst parameter candidate. Thereby, a receiver operating characteristic (ROC) curve was generated [48] and the best parameter set achieved a false detection rate below 25% and a sensitivity of more than 75% (data not shown).

Objective function
The objective function that we used for the model validation and parameter estimation is a weighted sum of several distance measures accounting for single-cell characteristics and summary statistics of the population distribution, see also Fig. 3. All these distance measures were calculated based on the SMAD2 nuc/cyt ratio which in the model is given by Hereby, we refer to Table 2 for details on the dynamic variables employed.

Single cell characteristics
To quantify the fit in burst height and duration we considered the relative difference between model and data in median and standard deviation across simulated and measured cell population. In more details, consider the empirical population including the nuc/cyt SMAD2 ratio in cells either given through measurement data or the model. We define the distribution of the height of the −th burst computed with our burst detection by height ( ). We note that a) the burst detection might detect less than bursts and b) in case of a model population the path computation might fail to compute some of the paths. Hence we assume that height ( ) considers the -th burst height only if a) and b) can be excluded. Moreover we define ( ) ≥ 0 to be the relative number of paths where either a) or b) holds for the -the burst deducting the relative number of paths where a) holds in the data. By denoting measurement data by and model results by we set where˜denotes the median of the distribution . Hence, the distance measure increases if the burst height is different between model and data, and/or if a) or b) holds for a large number of cells. Similarly, we quantify the distance in terms of standard deviation ,std with ( ) referring to the standard deviation of the distribution . Denoting by dur ( ) the distribution of the duration of the −th burst in an empirical population , we can define the distances ,med dur ( , ) and ,std dur ( , ) in the same way and obtain the corresponding formulas by exchanging the distributions in (5) and (6). The distance measures with respect to both burst height and burst duration are considered for the first four measured bursts and hence, including median and standard deviation, they account for 16 components in the objective function.
The distance with respect to the burst count is, similarly as their height and duration, measured in terms of relative difference in mean and standard deviation. Denoting by count ( ) and count ( ) the distribution of detected bursts in the model and in the data we computed distance measures in mean and standard deviation like in (5) and (6) and included them in the objective function. In the computation we took the relative number of failed path computations, 0 ( ), into account. To avoid parameter ranges that cause instabilities during the parameter estimation we included this rate as a separate component increasing the value of the objective function.
Population mean and standard deviation The mean and standard deviation of the nuc/cyt SMAD2 ratio across the simulated and measured cell population account for two further components of the objective function. At any fixed point in time these quantities can be computed employing the snapshot distribution across all cells. Thus, for populations and time-dependent means and variances , , , can be considered. The global distances of the populations with respect to mean and standard deviation sum the differences over time and take relative values using the formulas Weighting A balanced weighting of the introduced distance measures with respect to single-cell characteristics and population statistics allowed us to compare measured and simulated populations in terms of their response to TGF-β stimulation. In particular, we considered the height and duration of the first four bursts and weighted burst height, burst duration, burst count and standard deviation of the observable equally. To prevent deviations from the population mean and failed path computations in the parameter estimation we assigned large weights to the corresponding components. Moreover, we weighted differences in median and mean higher than differences in standard deviation.
In more detail, for measurement data and model results , we defined the output of the objective function as the vector and considered as scalar distance the sum over its squared components.

Path computation
For efficient and accurate numerical approximation of the model (1), (2), (3) we employed a method based on the Itō-Taylor formula of strong order = 1.5 from [23]. The formula includes the control parameter , which we chose = 1/2 resulting in a semi-explicit scheme that can both handle the stiff reaction terms of the SMAD signaling model and resolve the dynamics accurately without the need to compute second-order derivatives. Its deterministic counterpart is known as the Crank-Nicolson-method and commonly used for parabolic differential equations. The scheme in its full form reads , and , require the generation of further random variables and its formulas are given in Table 5.
To solve the nonlinear systems in each time step, we used an adaptive Newton method, where the Newton-error is individually computed for each cell. If this error fell below the error tolerance the corresponding cell was removed from the system in the next Newton iteration. This was iterated until convergence was achieved in each cell. This technique significantly reduced the computational costs. We optimized the performance further by employing a sparse Jacobian that we symbolically pre-computed offline. To handle the stiff reactions caused by the step-like addition of the TGF-β stimulus we employed small time increments of size Δ = 0.04 in the stimulation phase, which included only a minor part of the total time interval. For the remaining time steps we switched to Δ = 0.6 for fast and robust computations.
Although our model leads to non-negative protein concentrations and kinetic parameters, the scheme cannot guarantee nonnegativity. To avoid nonphysical parameters in simulations we applied clipping and set computationally obtained negative state variables to zero. Further, in the case of kinetic parameters equal to zero we ignored the high-order corrections in the scheme for the next time step. We experimentally verified the strong convergence of the resulting scheme and observed a strong order higher than 1, see Fig. 4A.

Simulations of restimulation experiments
In restimulation experiments (Fig. 7), the system is subjected twice to a 5 pM TGF-β stimulus at 0 and 6h. To simulate these experiments, the concentration of external ligand in the model was raised from 0 to 5 pM at the beginning of the experiment, and from the value at the time point before 6h to 5 pM at t = 6h. The restimulation experiment was performed using a different charge of recombinant TGF-β when compared to the dose-titration experiments used for model fitting and validation (Figs. 5 and 6). Thus, the initial peak amplitudes of the signal 1h after stimulation were slightly different between both types of experiments (compare Figs. 6A and 7A), likely because different ligand charges have slightly different biological activities. To account for this ligand differences in our simulations, the forward and backward reaction rates of ligand binding ( 49 and 20 , clf. Table 3) were adapted and a dose factor changing the effective external ligand concentration was introduced. To avoid an over-adaptation, only data-points the second stimulus in the restimulation experiments were used in the estimation of these ligand-adjustment parameters. The autocorrelation trajectories vary among each other, they mostly drop fast in the first 120 minutes but then approach 0 only slowly. In species with slower decay of self-similarity like nuclear and cytoplasmic unphosphorylated SMAD2, the noise led to faster decay. In species with faster decay of self-similarity on the other hand, like nuclear phosphorylated SMAD2 and heterotromer, it led to slower decay. In both cases the stochasticity brings the signal closer to the autocorrelation of the nuc/cyt SMAD2 ratio. No species alone shows an autocorrelation similar to the one of the SMAD2 nuc/cyt ratio (compare S1 Fig). Table 1). The best-fit parameters from fitting the CIR internalisation model were used in the OU internalisation model. Parameters could not be a priori estimated for the OU internalisation model due to instabilities caused by negative stochastic parameter values. The components of the objective function in the OU internalization model are consistent and comparable albeit of an increased magnitude in comparison to the results of the CIR internalization model.  S2 Table). In comparison to the CIR internalisation model qualitatively similar bursting is observed in the single cells (compare Fig. 3A). https://doi.org/10.6084/m9.figshare.19064663 S5 Table. Log-likelihood and Akaike information criterion of block models In analogy to simultaneous fitting of an error model proposed in [52], the scaling of the residuals of our cost-function was re-normalized by introducing the rescaling factors for the components of the objective function minimizing ( / ) 2 + 2 log( ). Factors for all components referring to burst height and those referring to burst duration were assumed equal, respectively. This re-scaled cost-function met the requirements on the variance of the residuals to calculate the log-likelihood (LL) and the Akaike information criterion (AIC) of the block models in case of the 100 pM TGF-β stimulation. The AIC ratio was computed with respect to the AIC of the internalization block model and indicates that the internalization model outperforms the other models also taking the degrees of freedom into account. https://doi.org/10.6084/m9.figshare.19064558 Using burst analysis, the model predictions were compared to data from a SMAD7 knockout experiment. Simulation data and burst analysis is presented analog to Fig. 5. The population average is well described by the model prediction. Furthermore, the model predicts a slight increase in the mean burst count, amplitude and duration in the SMAD7 knockout. Moreover the burst amplitude and duration variance is predicted to increase. Most of these predictions are indeed observed qualitatively in the SMAD7 knockout compared to wildtype data, although the effect sizes are not very large. The prediction was similarly accurate to those of 5 pM and 25 pM doses of TGF-β (compare Fig. 6). https://doi.org/10.6084/m9.figshare.19064687

S7 Fig. Comparison to full and hybrid stochastic simulation techniques
We applied -leaping SSA and a hybrid SSA solver to the TGF-β model in case of stimulation with 100 pM TGF-β using the COPASI toolbox [26] and the model from [63]. In a direct comparison between -leaping SSA and the internalization CIR model by running 375 paths on the same hardware, the CIR internalization model performed nearly 100x faster (2.2 sec/path vs. 210 sec/path). The hybrid approach combined SSA with a Runge-Kutta method and selected the stochastically treated species based on the number of particles, which also correlates with the expected variance. The publicly available hybrid solver has been much slower than our approach. Simulation data and burst analysis is presented analog to Fig. 5. While the computationally expensive SSA reproduced experimental data relatively accurately, the hybrid approach failed to accurately describe the population average and underestimated the number of bursts. Unlike in the CIR internalization model, in both the full SSA and the hybrid method, the nuc/cyt SMAD2 ratio tends to stick to the level before stimulation once the trajectory reached this level again and subsequent stochastic events were less frequently observed. This behaviour was even more prominent in the hybrid model. Moreover the full SSA model predicts a higher number of bursts with high amplitude, in comparison to both data and CIR internalization model, which all attain a similar maximum level. Both SSA models overestimate the heights of later bursts, which was not observed in the CIR internalization model. The predicted bursts thus vary significantly from the ones observed in the experimental data and those predicted by the CIR internalization model. https://doi.org/10.6084/m9.figshare.19064702 S8 Table. Comparison to full and hybrid stochastic simulation techniques Distance measurements of -leaping SSA and a hybrid SSA solver (see S7 Fig.) in comparison to the CIR internalization and the deterministic model to experimental data for a stimulation with 100 pM TGF-β (compare Table 1). Distance in terms of burst statistics of both -leaping SSA and hybrid SSA is increased in comparison to the CIR internalization model. https://doi.org/10.6084/m9.figshare.19064621

S9 Fig. Bursts in the nuc/cyt SMAD2 ratio are caused by anti-correlated changes in nuclear and cytoplasmic SMAD2
To validate that the bursting events in the nuc/cyt SMAD2 ratio do not arise from technical artefacts, we analyzed the behavior of cytoplasmic as well as nuclear SMAD2 pools in trajectories of single cells. If the bursts are related to nuclear translocation, the two pools must be negatively correlated in single cells during bursting events. Even though over all cells and considered time points, nuclear and cytoplasmic SMAD2 are positively correlated, in the progress of the initial burst, they are indeed negatively correlated. In the scatter plot on the left each trajectory is represented by a blue dot and a red dot encoding nuclear and cytoplasmic SMAD2 at the time before and within a burst event respectively. The histogram on the right shows the distribution of angles of the line connecting the initial state to the burst peak in the cell population, clearly indicating that for most cells, the increase in nuclear SMAD2 matched the decrease of the cytoplasmic SMAD2. This excludes the existence of globally correlated measurement noise and suggests that nuclear and cytoplasmic SMAD2 is negatively correlated within bursts. https://doi.org/10.6084/m9.figshare.19064699 S10 Fig. Comparison to alternative vesicle model To assess the role of receptor sorting into vesicles we considered an alternative model. This vesicle model assumed the receptors to be attributed to a particular area on the cell surface. We randomly decided whether each vesicle got internalized or not. For each vesicle that gets internalized, all attributed receptors were internalized, independent of their binding states. For simplicity, after internalisation, the receptors are assumed to be released and recycled or degraded according to the ODE model. Thus the underlying hypothesis of vesicles is not yet consistently translated into the new model. Still, even without parameter fitting, the number of bursting events agreed well to the experimental data. Simulation data and burst analysis is presented analog to Fig. 5. While the variance predicted by this model is clearly underestimated, the average height and duration of bursting events were well predicted (see also S11 Table). The variance of the burst properties though, were underestimated. While the results of the vesicle model are promising, further research and modeling are required to better describe the data. https://doi.org/10.6084/m9.figshare.19064675.v1 S11 Table. Comparison to alternative vesicle model Distance measurements of the vesicle model (see S10 Fig.) in comparison to the CIR model and the deterministic model relative to experimental data for stimulation with 100 pM TGF-β (compare Table 1). While the number of bursts predicted by the vesicle model matches the experimental data, the fit in standard deviation and population average is increased compared to the CIR model.

Data availability statement
The source code and data used to produce the results and analyses presented in this manuscript are available from the GitHub repository [34].  cytoplasmic heterotrimer of two phosphorylated SMAD2 proteins and one SMAD4 protein 13  Description of the dynamic variables and their corresponding right hand sides used in the model equations (1) and (2). For brevity, the dependence of the dynamic concentrations on (indicating the cell) is omitted in the descriptions of 1 , . . . , 22 . Details about the kinetic and experimental parameters used here can be found in Tables 3 and 4.  Parameters of experimental conditions that were used in the model (see Table 2). The parameter 1 describes the dose dependent bolus induction. The Parameters 2 , 3 , 4 and 5 describe the production of TGFBR1, TGFBR2, SMAD2 and SMAD4, respectively. Unbinding from complexes is determined by 6 (TGFBR1 from activated complex of TGF-β, TGFBR1 and TGFBR2 ( 7 )), 7 (TGFBR2 from activated complex of TGF-β and TGFBR2 ( 6 )) and 8 (SMAD7 from inactivated complex of TGF-β, TGFBR1, TGFBR2 and SMAD7 ( 9 )). The parameter 9 introduces a short delay in some reactions for numerical stability.
Formulas of the high order corrections used in the numerical scheme. M , and N , denote independent identically distributed standard normal random variables.