The authors have declared that no competing interests exist.

Conceived and designed the experiments: JH CH TH. Performed the experiments: CH. Analyzed the data: JH CH. Contributed reagents/materials/analysis tools: TH. Wrote the paper: JH CH. Developed and implemented the analysis tools: JH CH FJT.

Functional cell-to-cell variability is ubiquitous in multicellular organisms as well as bacterial populations. Even genetically identical cells of the same cell type can respond differently to identical stimuli. Methods have been developed to analyse heterogeneous populations, e.g., mixture models and stochastic population models. The available methods are, however, either incapable of simultaneously analysing different experimental conditions or are computationally demanding and difficult to apply. Furthermore, they do not account for biological information available in the literature. To overcome disadvantages of existing methods, we combine mixture models and ordinary differential equation (ODE) models. The ODE models provide a mechanistic description of the underlying processes while mixture models provide an easy way to capture variability. In a simulation study, we show that the class of ODE constrained mixture models can unravel the subpopulation structure and determine the sources of cell-to-cell variability. In addition, the method provides reliable estimates for kinetic rates and subpopulation characteristics. We use ODE constrained mixture modelling to study NGF-induced Erk1/2 phosphorylation in primary sensory neurones, a process relevant in inflammatory and neuropathic pain. We propose a mechanistic pathway model for this process and reconstruct static and dynamical subpopulation characteristics across experimental conditions. We validate the model predictions experimentally, which verifies the capabilities of ODE constrained mixture models. These results illustrate that ODE constrained mixture models can reveal novel mechanistic insights and possess a high sensitivity.

In this manuscript, we introduce ODE constrained mixture models for the analysis of population snapshot data of kinetics and dose responses. Population snapshot data can for instance be derived from flow cytometry or single-cell microscopy and provide information about the population structure and the dynamics of subpopulations. Currently available methods enable, however, only the extraction of this information if the subpopulations are very different. By combining pathway-specific ODE and mixture models, a more sensitive method is obtained, which can simultaneously analyse a variety of experimental conditions. ODE constrained mixture models facilitate the reconstruction of subpopulation sizes and dynamics, even in situations where the subpopulations are hardly distinguishable. This is shown for a simulation example as well as for the process of NGF-induced Erk1/2 phosphorylation in primary sensory neurones. We find that the proposed method allows for a simple but pervasive analysis of heterogeneous cell systems and more profound, mechanistic insights.

Multi-cellular organisms are faced with diverse, ever changing environments. To ensure survival and evolutionary success, microbial systems exploit cell-to-cell variability originating from bet-hedging strategies which increase the robustness against environmental changes

Heterogeneous cell populations are usually investigated using molecular and cell-biological methods with single cell resolution. Currently available methods include microscopy

(A) Heterogeneous population consisting of two homogeneous subpopulations with very different response levels. Snapshot data provide information about the biological state of single cells at different time points (filled circles). This allows for the characterisation of the kinetics of the subpopulations using threshold, histogram and kernel density estimate (KDE) based methods as well as mixture modelling. (B) Heterogeneous population consisting of two heterogeneous subpopulations with a large overlap of the dose response behaviour, rendering an analysis using snapshot data difficult. (C) Table including the available analysis tools for population snapshot data and the proposed ODE constrained mixture modelling along with key properties of the methods. (D) Sketch of ODE constrained mixture modelling, which combines mixture modelling of the measurement data with pathway information, thereby allowing for an improved quantification of subpopulation properties and mechanistic insights.

The analysis of population snapshot data can be approached using a multitude of statistical methods, e.g., thresholding, density based methods and mixture modelling. The selection of the method is highly problem specific

In addition to the aforementioned shortcoming, currently available statistical methods can only analyse measured snapshot data. None of the methods directly provides mechanistic insights, predictions for hidden network components, hypotheses regarding causal factors for the population heterogeneity or estimates for reaction rates. To gain such additional insight and to simultaneously analyse multiple snapshots, a mechanistic description of the underlying process is required. Mostly, such descriptions are based on ordinary differential equations (ODEs). Commonly used ODE models, however, do not allow for the integration of distributional information but only use the measured mean concentration

In the following, we propose ODE constrained mixture models (ODE-MMs), a combination of mixture models and ODE based pathway models which exploits their individual advantages (

Exemplarily, ODE-MMs are applied to investigate NGF-induced Erk1/2 phosphorylation in primary sensory neurones, a signalling pathway regulating pain sensitisation. Due to the diverse functional roles of sensory neurones, the cell system is highly heterogeneous. We introduce a dynamical model for NGF-induced Erk1/2 phosphorylation in primary sensory neurones and attempt the unraveling of the subpopulation structure and the source of heterogeneity using ODE-MMs. The results are validated using co-labelling experiments.

All animal experiments were reported to the responsible authority, the

In this work we consider collections

The analysis of the individual population snapshots

The individual mixture components are often regarded as subpopulations with different characteristics, e.g., different expression levels. To analyse collections of snapshots

To circumvent shortcomings of mixture modelling, we propose to complement it with pathway information. The responses of subpopulations to different experimental conditions are ultimately determined by the involved metabolic, signalling and gene regulatory pathways. Accordingly, experimental conditions can be matched using models of the underlying biochemical pathway.

Biochemical pathways are mostly modelled using reaction rate equations (RREs)

While RRE based modelling of heterogeneous cell populations consisting of different subpopulations is not desirable, RREs might be used to model the dynamics of rather homogeneous subpopulations. In the following, we will describe the “average dynamics” of cells in the

As most experimental procedures only allow for the assessment of a few chemical species, we introduce a measurement model,

Assuming that the communication across and transitions between subpopulations can be neglected for the process of interest, the dynamics of the overall population are captured by the weighted dynamics of its subpopulations. This idea is exploited by ODE-MMs, and will in the following be illustrated for mixtures of normal distributions and more general mixture distributions.
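The idea of a weighted sum of subpopulation dynamics can be sketched in a few lines. The following is a minimal illustration, not the implementation used in this work: it assumes a hypothetical one-state model dx/dt = k_on (1 - x) - k_off x for each subpopulation, normal mixture components whose means are the ODE solutions, and made-up snapshot values. All function names, parameter names and numbers are assumptions introduced for illustration.

```python
import math


def simulate_mean(k_on, k_off, t_end, x0=0.0, n_steps=1000):
    """Integrate dx/dt = k_on * (1 - x) - k_off * x with classical RK4.

    The one-state model is a hypothetical stand-in for a pathway model.
    """
    def rhs(x):
        return k_on * (1.0 - x) - k_off * x

    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        s1 = rhs(x)
        s2 = rhs(x + 0.5 * h * s1)
        s3 = rhs(x + 0.5 * h * s2)
        s4 = rhs(x + h * s3)
        x += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return x


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def ode_mm_log_likelihood(snapshots, weights, rates, sds):
    """Log-likelihood of snapshot data under an ODE constrained normal mixture.

    snapshots: dict mapping a time point to a list of single-cell measurements.
    weights:   subpopulation fractions (should sum to one).
    rates:     one (k_on, k_off) pair per subpopulation.
    sds:       one standard deviation per subpopulation.
    """
    total = 0.0
    for t, cells in snapshots.items():
        # The same kinetic parameters link all time points of a subpopulation.
        means = [simulate_mean(k_on, k_off, t) for k_on, k_off in rates]
        for y in cells:
            density = sum(w * normal_pdf(y, m, sd)
                          for w, m, sd in zip(weights, means, sds))
            total += math.log(density)
    return total


# Made-up snapshot data at two time points (illustration only).
snapshots = {0.5: [0.10, 0.35, 0.12], 2.0: [0.15, 0.60, 0.58]}
ll = ode_mm_log_likelihood(
    snapshots,
    weights=[0.7, 0.3],
    rates=[(0.1, 0.5), (1.0, 0.5)],
    sds=[0.05, 0.05],
)
```

Because the component means at all time points are generated by the same kinetic parameters, the snapshots are analysed jointly rather than one at a time.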

The most commonly used mixture models are mixtures of normal distributions,

In contrast to conventional mixture models (1), ODE-MMs (5) describe the distribution of the observed variables at discrete points and the temporal evolution of subpopulations in response to stimuli. Hence, ODE-MMs establish a mechanistic link between different experimental conditions and time points based on pathway models and differences between subpopulations. This renders error-prone matching of distributions across conditions unnecessary (see

The combination of normal mixture models and RRE models yields simple ODE-MMs. More flexible ODE-MMs are obtained by considering other distributions

The analysis of measurement data

Optimisation problem (6) belongs to the class of ODE constrained optimisation problems. In general this problem is non-convex and possesses local maxima. To determine the parameter vector

As the measurement data are limited, the parameters can often not be determined uniquely. In particular the kinetic rates,

The source of the cell-to-cell variability, namely the set of parameters which differ between subpopulations, is often unknown. ODE-MMs can be used to assess the plausibility of different potential sources of cell-to-cell variability by means of model selection. Models corresponding to different hypotheses can be formulated and fitted to the data. The comparison of these models using model selection criteria such as the Akaike information criterion (AIC)

The proposed ODE-MMs will be used to analyse NGF-induced Erk1/2 phosphorylation. The respective measurement data for NGF-induced Erk1/2 phosphorylation were acquired using quantitative automated microscopy (QuAM)

In short, primary sensory neurones derived from L1-L6 DRGs were prepared from male Sprague Dawley rats. Dissociated cells were cultured for 15–20 h before being stimulated with NGF. After treatment, cells were fixed with paraformaldehyde and permeabilised with Triton X-100. Nonspecific binding sites were blocked and cultures were probed with primary antibodies (anti-phospho-Erk (Thr-202/Tyr-204) (1∶200) and anti-Erk (1∶500)) against target proteins, washed three times, and incubated with secondary antibodies. Cells were quantified with a Zeiss Axioplan 2 microscope controlled by the software Metacyte (Metasystems). As selection marker of sensory neurones, cell identification was performed on immunofluorescently-labelled (Erk staining) cells. The fluorescence intensities derived from pErk antibody and Erk antibody were quantified. To compensate for differences in the mean fluorescence intensity between experimental replicates, the data are normalised.

More detailed information, e.g., information about cell culture conditions as well as the detailed immunofluorescence protocol is provided in

In the following, we will illustrate how ODE-MMs can be used, how the results can be interpreted and what kind of insights can be gained using them. For this purpose, we study a simulation example for which the ground truth is known as well as an application example for which new biological insights are gained using ODE-MMs.

To illustrate the properties of ODE-MMs and to assess their performance, we consider the conversion process

For the conversion process sketched in (A) the cases of homogeneous, non-overlapping subpopulations (B,C,D) and heterogeneous, highly-overlapping subpopulations (E,F,G) are studied. (B,E) Histograms of artificial data for the reversible conversion process (6 time points, 1,000 cells), the best fit achieved using ODE-MM and the distribution predicted for the subpopulations. Artificial data have been generated by sampling single cell parameters from parameter distributions, simulating the single cell model and extracting the concentration of B. ODE-MM was fitted using multi-start local optimisation. (C,F) Representative samples of single cell trajectories for the two subpopulations, the means of the samples and the means for the subpopulations predicted by ODE-MM. (D,G) True parameter distributions (grey shaded area) from which single cell parameters are drawn (purple: subpopulation 1; green: subpopulation 2) and ODE-MM derived parameter estimates including the confidence intervals. Vertical lines mark the maximum likelihood estimates and the horizontal bars represent the confidence intervals corresponding to different confidence levels (80%, 90%, 95% and 99%) computed using profile likelihoods. For the population fraction

Artificial data for the conversion process are generated using an ensemble cell population model

Given the artificial data sets, we first asked whether ODE-MMs can detect the presence of two subpopulations and unravel the differences between them. To address this, we considered four competing hypotheses:

H1 No subpopulations.

H2 Two subpopulations with significantly different stimulus dependent conversion rates

H3 Two subpopulations with significantly different stimulus independent (basal) conversion rates

H4 Two subpopulations with significantly different conversion rates

These four scenarios were described using RRE constrained mixture models. To ensure robustness with respect to the distribution assumption, we considered normal distributions and log-normal distributions with the mean parameterised by the RRE as well as log-normal distributions with the median parameterised by the RRE.
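The difference between the two log-normal variants can be made concrete with a short sketch. It relies only on standard properties of the log-normal distribution with log-scale parameters mu and sigma: the median is exp(mu) and the mean is exp(mu + sigma^2 / 2). The name `x_rre`, standing for the RRE output that is matched to the mean or the median, is a hypothetical placeholder.

```python
import math


def lognormal_mu_from_mean(x_rre, sigma):
    """Log-scale location such that the log-normal mean equals x_rre."""
    return math.log(x_rre) - 0.5 * sigma ** 2


def lognormal_mu_from_median(x_rre):
    """Log-scale location such that the log-normal median equals x_rre."""
    return math.log(x_rre)


def lognormal_pdf(y, mu, sigma):
    """Density of a log-normal distribution at y > 0."""
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z ** 2) / (y * sigma * math.sqrt(2.0 * math.pi))


# Same RRE output, two parameterisations (illustrative numbers only).
x_rre, sigma = 2.0, 0.5
mu_mean = lognormal_mu_from_mean(x_rre, sigma)
mu_median = lognormal_mu_from_median(x_rre)
```

For the same RRE output, the mean-constrained component is shifted towards smaller values by sigma^2 / 2 on the log scale, so the two variants only coincide in the limit of small sigma.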

The combination of the 4 hypotheses and the 3 distribution assumptions yields 12 models. These 12 models were fitted to the artificial measurement data using multi-start local optimisation. Component weights were constrained to the interval

Scenario 1: homogeneous, non-overlapping subpopulations after stimulation

| # subpop. | distribution | ODE const. | variability | # par. | ℓ (10^{4}) | BIC (10^{4}) | rank | Δ_{BIC} | decision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | normal | mean | - | 9 | 0.9806 | −1.9534 | 10 | >10 | rejected |
| 1 | log-normal | mean | - | 9 | 0.9785 | −1.9493 | 11 | >10 | rejected |
| 1 | log-normal | median | - | 9 | 0.9785 | −1.9492 | 12 | >10 | rejected |
| 2 | normal | mean | _{1} | 17 | 1.0998 | −2.1848 | 3 | 6.215 | not rejected |
| 2 | log-normal | mean | _{1} | 17 | 1.1001 | −2.1854 | 1 | 0 | optimal |
| 2 | log-normal | median | _{1} | 17 | 1.1001 | −2.1854 | 2 | 0.429 | not rejected |
| 2 | normal | mean | _{2} | 17 | 0.9911 | −1.9673 | 9 | >10 | rejected |
| 2 | log-normal | mean | _{2} | 17 | 1.0013 | −1.9878 | 7 | >10 | rejected |
| 2 | log-normal | median | _{2} | 17 | 0.9949 | −1.9750 | 8 | >10 | rejected |
| 2 | normal | mean | _{3} | 17 | 1.0087 | −2.0026 | 4 | >10 | rejected |
| 2 | log-normal | mean | _{3} | 17 | 1.0077 | −2.0005 | 5 | >10 | rejected |
| 2 | log-normal | median | _{3} | 17 | 1.0032 | −1.9916 | 6 | >10 | rejected |

Scenario 2: heterogeneous, highly-overlapping subpopulations

| # subpop. | distribution | ODE const. | variability | # par. | ℓ (10^{3}) | BIC (10^{3}) | rank | Δ_{BIC} | decision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | normal | mean | - | 9 | 6.955 | −13.831 | 12 | >10 | rejected |
| 1 | log-normal | mean | - | 9 | 6.923 | −13.768 | 10 | >10 | rejected |
| 1 | log-normal | median | - | 9 | 6.922 | −13.766 | 11 | >10 | rejected |
| 2 | normal | mean | _{1} | 17 | 7.059 | −13.970 | 3 | >10 | rejected |
| 2 | log-normal | mean | _{1} | 17 | 7.069 | −13.991 | 1 | 0 | optimal |
| 2 | log-normal | median | _{1} | 17 | 7.068 | −13.988 | 2 | 2.928 | not rejected |
| 2 | normal | mean | _{2} | 17 | 6.990 | −13.833 | 9 | >10 | rejected |
| 2 | log-normal | mean | _{2} | 17 | 7.003 | −13.858 | 7 | >10 | rejected |
| 2 | log-normal | median | _{2} | 17 | 6.997 | −13.846 | 8 | >10 | rejected |
| 2 | normal | mean | _{3} | 17 | 7.025 | −13.901 | 5 | >10 | rejected |
| 2 | log-normal | mean | _{3} | 17 | 7.027 | −13.906 | 4 | >10 | rejected |
| 2 | log-normal | median | _{3} | 17 | 7.021 | −13.894 | 6 | >10 | rejected |
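The tabulated BIC values are consistent with the standard definition BIC = -2 ℓ + k ln(n), where ℓ is the maximised log-likelihood, k the number of parameters and n the number of data points. The sketch below recomputes a few values for scenario 1 under two assumptions that are not stated explicitly in the tables: the unlabelled numeric column is read as the log-likelihood, and n is taken as 6 time points × 1,000 cells = 6,000. The rejection threshold of 10 is the one suggested by the decision column. Model labels are hypothetical, and small deviations from the tabulated values are expected because the inputs are rounded.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; smaller values indicate better models."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


N_OBS = 6 * 1000  # assumption: 6 snapshots with 1,000 cells each

# (label, log-likelihood, number of parameters); values copied from the
# log-normal/mean rows of scenario 1, labels are hypothetical.
candidates = [
    ("no subpopulations", 0.9785e4, 9),
    ("variability in parameter 1", 1.1001e4, 17),
    ("variability in parameter 2", 1.0013e4, 17),
    ("variability in parameter 3", 1.0077e4, 17),
]

scores = {name: bic(ll, k, N_OBS) for name, ll, k in candidates}
best_score = min(scores.values())

decisions = {}
for name, score in scores.items():
    delta = score - best_score
    if delta == 0.0:
        decisions[name] = "optimal"
    elif delta > 10.0:
        decisions[name] = "rejected"
    else:
        decisions[name] = "not rejected"

for name in sorted(scores, key=scores.get):
    print(f"{name}: BIC = {scores[name]:.1f}, decision = {decisions[name]}")
```

The penalty term k ln(n) is what allows the 17-parameter models to be compared fairly with the 9-parameter model without subpopulations.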

Following the hypothesis testing, the best models were analysed in greater detail, starting with comparisons of model predictions with the data. This comparison revealed that the measured means (

Regarding the parameters, we found for scenario 1 that the ODE-MM estimates of the parameters

To assess the uncertainty of the parameters, we computed the profile likelihoods. The confidence intervals derived from the profile likelihoods are relatively tight. This indicates that even for cell populations consisting of heterogeneous subpopulations, population snapshots provide information about the dynamical parameters and the subpopulation statistics. Furthermore, for this artificial example, the average parameters in the subpopulation are always within the confidence intervals for the parameters of the ODE-MM. This suggests that the ODE-MM parameters can be interpreted as average parameters of the subpopulations.
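How a confidence interval is read off a profile likelihood can be illustrated on a toy problem. The sketch below does not use an ODE-MM: it assumes normally distributed data with unknown mean (the parameter of interest) and unknown variance (the nuisance parameter, which can be maximised out analytically). The interval collects all parameter values whose profile log-likelihood lies within half the chi-square quantile of the maximum. Data values, grid and function names are made up for illustration.

```python
import math

CHI2_95_1DOF = 3.841  # 95% quantile of the chi-square distribution, 1 degree of freedom


def profile_log_likelihood(theta, data):
    """Profile log-likelihood of the mean theta of normal data.

    For fixed theta the variance estimate is maximised analytically,
    which stands in for the re-optimisation of all other parameters.
    """
    n = len(data)
    var_hat = sum((y - theta) ** 2 for y in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var_hat) + 1.0)


def profile_confidence_interval(data, grid, threshold=CHI2_95_1DOF):
    """Return the smallest and largest grid value inside the likelihood-ratio bound."""
    profile = [profile_log_likelihood(theta, data) for theta in grid]
    best = max(profile)
    inside = [theta for theta, value in zip(grid, profile)
              if 2.0 * (best - value) <= threshold]
    return min(inside), max(inside)


# Made-up measurements and a parameter grid (illustration only).
data = [1.9, 2.3, 2.1, 1.7, 2.4, 2.0, 2.2, 1.8]
grid = [1.0 + 0.001 * i for i in range(2001)]
low, high = profile_confidence_interval(data, grid)
```

Other confidence levels are obtained by replacing the threshold with the corresponding chi-square quantile. For an ODE-MM, the analytic maximisation step would be replaced by a numerical re-optimisation of all remaining parameters at each grid value.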

To conclude the simulation example, we found that ODE-MMs facilitate the simultaneous analysis of several snapshot data sets. Furthermore, ODE-MMs can be used for hypothesis testing, and the states of the RREs accurately describe the subpopulations while their parameters provide estimates for the means of the underlying biological quantities.

In this section, we use ODE-MMs to perform a data-driven study of NGF-induced Erk1/2 phosphorylation in primary sensory neurones. Primary sensory neurones are commonly used as a cellular model for investigating signalling components mediating pain sensitisation. NGF is known to induce a strong pain sensitisation during inflammation, but also to support neuronal repair during neuropathic pain. Studies showed that NGF binds and activates the receptor tyrosine kinase TrkA

(A) Schematic of model for NGF-induced Erk1/2 signalling. Arrows represent conversion reactions and regulatory interactions. (B) Mean and standard deviation of measured pErk levels (kinetic:

Beyond the importance of NGF-induced Erk1/2 phosphorylation in pain research, primary sensory neurones are well suited for the evaluation of ODE-MMs as they exhibit a significant degree of cell-to-cell variability. This variability is no nuisance but relevant for their biological function

The quantitative assessment of signalling in primary and heterogeneous cells is challenging compared to cell lines as many experimental methods are not applicable. To study the dynamics of the MAPK/Erk pathway we previously introduced a quantitative automated microscopy technique

In the literature, it is described that NGF binds to TrkA, yielding the active signalling complex TrkA:NGF. TrkA:NGF induces the activation of the Ras kinase, which phosphorylates the Raf kinase. The active Raf kinase phosphorylates Mek, which phosphorylates Erk1 and Erk2. In principle the consideration of all these steps is possible, but experimentally the activity of the signalling intermediates Ras, Raf and Mek is difficult to measure in primary sensory neurones as appropriate antibodies are not available. Therefore, we mainly consider a simple pathway model which merely accounts for NGF-TrkA interaction and Erk1/2 phosphorylation. We do not distinguish between Erk1 and Erk2, as their biochemical properties have been demonstrated to be nearly identical (see

In the remainder, all plots depict the scaled TrkA:NGF and pErk concentrations,

We employed the dynamical pathway model A to assess the population dynamics and to compare the three hypotheses:

H1 No subpopulations.

H2 Two subpopulations with significantly different Erk levels (

H3 Two subpopulations with significantly different TrkA levels (

We only regarded altered abundance of signalling molecules as potential differences between subpopulations. Differences in elementary reaction rates would require mutations or differential post-translational modifications which we consider unlikely. As in the simulation example, the scenarios were described using RRE constrained mixture models. For each scenario we considered normal and log-normal mixture components with means parameterised by the RRE as well as log-normal mixture components with medians parameterised by the RRE. This yielded in total 9 ODE-MMs, which have been fitted using multi-start local optimisation. Properties of models, goodness of fit statistics and obtained BIC values are listed in

| # subpop. | distribution | ODE const. | variability | # par. | ℓ (10^{4}) | BIC (10^{4}) | rank | Δ_{BIC} | decision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | normal | mean | - | 17 | −5.2890 | 10.5955 | 9 | >10 | rejected |
| 1 | log-normal | mean | - | 17 | −3.7659 | 7.5495 | 6 | >10 | rejected |
| 1 | log-normal | median | - | 17 | −3.7556 | 7.5288 | 5 | >10 | rejected |
| 2 | normal | mean | [Erk]_{0} | 30 | −4.0348 | 8.1006 | 8 | >10 | rejected |
| 2 | log-normal | mean | [Erk]_{0} | 30 | −3.6482 | 7.3274 | 4 | >10 | rejected |
| 2 | log-normal | median | [Erk]_{0} | 30 | −3.6262 | 7.2835 | 3 | >10 | rejected |
| 2 | normal | mean | [TrkA]_{0} | 30 | −3.9846 | 8.0002 | 7 | >10 | rejected |
| 2 | log-normal | mean | [TrkA]_{0} | 30 | −3.5847 | 7.2003 | 2 | 2.189 | not rejected |
| 2 | log-normal | median | [TrkA]_{0} | 30 | −3.5846 | 7.2001 | 1 | 0 | optimal |

We note that the rejection of hypotheses H1 and H2 requires information about the distribution of pErk levels. Even models for the simplest hypothesis, H1, describe the kinetic and dose response of the mean pErk level (

The selected population structure, H3, assumes different concentrations of the NGF receptor TrkA for the subpopulations. This results in different concentrations of TrkA-NGF complexes and ultimately in different Erk phosphorylation levels. The overall Erk concentration, [Erk] + [pErk], is the same for the subpopulations. An illustration of the models and signalling is provided in

(A) Schematic of model for NGF-induced Erk1/2 signalling. Arrows represent conversion reactions and regulatory interactions. The frequency of an object is used to illustrate its abundance. (B) Mean and standard deviation of measured pErk levels (kinetic:

The ODE-MMs representing H3 explain the kinetic and dose response measurements of the mean pErk concentration as well as the pErk distribution. Measurement data and fits for the best two models,

The maximum likelihood estimation of the model parameters provides estimates for the relative size of the subpopulations and their pErk levels. Roughly 70% of the cells belong to the subpopulation with low TrkA levels (subpopulation 1) and 30% of the cells possess high TrkA levels (subpopulation 2). Subpopulation 1 hardly responds to NGF, while subpopulation 2 responds with a 4-fold increase in pErk levels for a 1 nM NGF stimulation. The maximal response is reached after 10 minutes and the response amplitude saturates for NGF concentration

Beyond subpopulation differences in observed pErk levels, ODE-MMs rendered quantities accessible which could not be measured. In particular the Erk dephosphorylation rate and the NGF-TrkA affinities could be inferred. Furthermore, we found a 30-fold difference between TrkA levels in the two subpopulations. This information is valuable as TrkA antibodies with high sensitivity and specificity are not available for immunofluorescence based experiments in cultures of primary sensory neurones. A practical identifiability analysis using profile likelihood showed that all estimated parameters – kinetic parameters, subpopulation sizes and standard deviations – are identifiable (

The ODE-MMs

Pathway model A (

(A) Schematics of three models for NGF-induced Erk1/2 activation. Pathway model A is a simple two component model, while pathway models B and C contain a detailed description of the signalling cascade. Pathway model C also accounts for a negative feedback from pErk to Ras activation. (B) Comparison of different pathway models (colour-coded), hypotheses about the cell-to-cell variability (H1, H2 and H3) and distribution assumptions (distribution: normal vs. log-normal; ODE-constrained: mean or median). BIC values indicate that differences between the pathway models are small compared to differences arising from different variability hypotheses and distribution assumptions. (C) Maximum likelihood estimates of the subpopulation sizes found for each pathway model.

To evaluate the robustness of our predictions with respect to the choice of the pathway description, we considered two additional pathway models. Pathway models B and C (

As for pathway model A, we carried out the parameter estimation and model selection for pathway models B and C (

The hypothesis testing using different pathway models supported our prediction that TrkA is the key source of cell-to-cell variability. Moreover, the maximum likelihood estimates for the size of the responsive subpopulations (

To validate the ODE-MM derived prediction that subpopulations do not possess different Erk levels (H2) but different TrkA levels (H3), co-labelling experiments have been performed. In addition to Erk phosphorylation also total Erk is quantified using a second antibody. As both measurements provide only relative information the scales are not comparable. For details regarding the experiments, we refer to the section

Joint distribution of pErk levels and total Erk levels under (A) control conditions and after (B) stimulation with 1 nM NGF for 30 minutes, along with the corresponding histograms (pooled data of

As subpopulations 1 and 2 have similar total Erk but different pErk distributions, total Erk is not the cause of the different activation potentials of the subpopulations. This verifies the rejection of hypothesis H2, which assumed a predominant role of the total Erk. The different activation potentials have to be caused by a further network compound such as TrkA. This partially validates hypothesis H3. However, the differences could in principle also be due to intermediate signalling components, such as Raf and Mek, which are not considered in the model. While a conclusive proof of H3 would require a simultaneous labelling of pErk and TrkA, which is currently infeasible due to the lack of appropriate TrkA antibodies, there are three – in our opinion convincing – indications that TrkA causes the population split. First of all, the available measurement data can be described by assuming different TrkA levels. Secondly, the estimate for the fraction of cells with high TrkA levels (

To conclude, in this section we proved the applicability of ODE-MMs to practically relevant biological problems. We used ODE-MMs to study data from primary sensory neurones and to determine subpopulation characteristics and kinetic rates. Furthermore, we provided a data-driven explanation for the observed cell-to-cell variability and validated this explanation partially using new experimental data.

Most multicellular organisms and microbial colonies consist of subpopulations with distinct biological functions. A study of mechanistic differences between these subpopulations and their functions is crucial for a holistic understanding of such complex biological systems. In this work, we introduced ODE constrained mixture models, a novel class of data analysis tools which can help to detect subpopulations and to analyse differences between them using population snapshot data. A simulation example illustrates that ODE-MMs possess a higher sensitivity than classical mixture models and ODE models, which originates from the simultaneous exploitation of distribution information and dependencies between experimental conditions. Furthermore, ODE-MMs provide mechanistic insights, e.g., estimates for kinetic parameters and abundance differences between subpopulations. In contrast to population models relying on a stochastic description of the individual cell

To assess and illustrate the properties of ODE-MMs, we studied the response of primary sensory neurones to NGF stimulation. Therefore, we considered single-cell data for Erk1/2 phosphorylation levels collected by quantitative automated microscopy (QuAM)

Beyond insights into subpopulation substructures, ODE-MMs can improve estimates of kinetic parameters. This has been revealed by a profile likelihood based uncertainty analysis of ODE-MMs for NGF-induced Erk1/2 phosphorylation. We found that kinetic parameters of ODE-MMs with two subpopulations are better identifiable than kinetic parameters of ODE-MMs without subpopulation structure. In many situations, additional model complexity and an increased number of parameters result in increased parameter uncertainty. This is, however, not the case if the more complex model can exploit additional features of the data. In this case the data are effectively more informative for the more complex model, resulting in reduced parameter uncertainty. We are not aware of papers which reported this generic observation.

For our analysis of NGF-induced Erk1/2 phosphorylation we considered three pathway models. While these models consider key network motifs, such as an amplification cascade and a negative feedback loop, they are simple compared to the most detailed models (see

In this study we employed reaction rate equation models to constrain means and medians of mixture components. A further improvement of the sensitivity of ODE-MMs might be achieved by using ODE models which capture the cell-to-cell variability within subpopulations. Possible choices are linear noise approximations

Consistent with our studied biological applications, we considered the special case of constant population sizes. There are however many situations in which spontaneous

In our studies, ODE-MM parameters have been estimated by solving the maximum likelihood problem using multi-start local optimisation. The computational efficiency of this approach could probably be improved by using expectation maximisation (EM) algorithms

The availability of pathway information in databases like KEGG

The authors are grateful for helpful comments and proof-reading by Fabian Fröhlich and Donna Ankerst.