
AI-powered simulation-based inference of a genuinely spatial-stochastic gene regulation model of early mouse embryogenesis

  • Michael Alexander Ramirez Sierra ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    ramirez-sierra@fias.uni-frankfurt.de

    Affiliations Frankfurt Institute for Advanced Studies (FIAS), Frankfurt am Main, Germany, Faculty of Computer Science and Mathematics, Goethe-Universität Frankfurt am Main, Frankfurt am Main, Germany

  • Thomas R. Sokolowski

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Affiliation Frankfurt Institute for Advanced Studies (FIAS), Frankfurt am Main, Germany


This is an uncorrected proof.

Abstract

Understanding how multicellular organisms reliably orchestrate cell-fate decisions is a central challenge in developmental biology, particularly in early mammalian development, where tissue-level differentiation arises from seemingly cell-autonomous mechanisms. In this study, we present a multi-scale, spatial-stochastic simulation framework for mouse embryogenesis, focusing on inner cell mass (ICM) differentiation into epiblast (EPI) and primitive endoderm (PRE) at the blastocyst stage. Our framework models key regulatory and tissue-scale interactions in a biophysically realistic fashion, capturing the inherent stochasticity of intracellular gene expression and intercellular signaling, while efficiently simulating these processes by advancing event-driven simulation techniques. Leveraging the power of Simulation-Based Inference (SBI) through the AI-driven Sequential Neural Posterior Estimation (SNPE) algorithm, we conduct a large-scale Bayesian inferential analysis to identify parameter sets that faithfully reproduce experimentally observed features of ICM specification. Our results reveal mechanistic insights into how the combined action of autocrine and paracrine FGF4 signaling coordinates stochastic gene expression at the cellular scale to achieve robust and reproducible ICM patterning at the tissue scale. We further demonstrate that the ICM exhibits a specific time window of sensitivity to exogenous FGF4, enabling lineage proportions to be adjusted based on timing and dosage, thereby extending current experimental findings and providing quantitative predictions for both mutant and wild-type ICM systems. Notably, FGF4 signaling not only ensures correct EPI-PRE lineage proportions but also enhances ICM resilience to perturbations, reducing fate-proportioning errors by 10-20% compared to a purely cell-autonomous system. 
Additionally, we uncover a surprising role for variability in intracellular initial conditions, showing that high gene-expression heterogeneity can improve both the accuracy and precision of cell-fate proportioning, which remains robust when fewer than 25% of the ICM population experiences perturbed initial conditions. Our work offers a comprehensive, spatial-stochastic description of the biochemical processes driving ICM differentiation and identifies the necessary conditions for its robust unfolding. It also provides a framework for future exploration of similar spatial-stochastic systems in developmental biology.

Author summary

Our study presents a spatial-stochastic model for the gene regulatory network (GRN) and the signaling pathway governing cell-fate differentiation during early mouse embryogenesis, specifically at the blastocyst stage. Departing from biophysics-based models of gene regulation, we perform stochastic simulations of the biochemical processes driving early mouse embryogenesis both at the cell and tissue level. Combining these simulations with state-of-the-art AI-aided inference techniques, we successfully parameterize our model, replicating key experimental observations and providing mechanistic insights into the biochemical interactions giving rise to them. Thanks to the stochastic nature of our approach, we quantify the high robustness of ICM specification to various kinds of noise, and provide quantitative predictions for the effects of diverse experimentally testable perturbations. Altogether, we provide a deeper understanding of the intricate mechanisms driving early cell-fate decisions in mouse embryogenesis, highlighting the synergy of local cellular and broader tissue-scale interactions that shape development.

Introduction

The process of maintaining cellular plasticity while concurrently directing functional differentiation of individual cells is a cornerstone paradigm in early biological development. A fundamental question connected to this is how a complex multicellular organism can robustly emerge from a single cell, despite the inherent stochasticity or noise in the biochemical processes driving cell specification. In early mammalian development, the cell signaling and fate specification dynamics during the preimplantation stage of mouse embryogenesis act as key catalysts for understanding this core paradigm, and serve as an intriguing example of a self-organizing system [1–3].

As such, mouse experimental models have become essential for characterizing genome plasticity, cellular potency, and cell diversity [2, 4, 5], especially considering the rapid advancement of regenerative medicine [6, 7]. Improving stem-cell therapy now calls for unraveling cellular differentiation processes through highly detailed mechanistic representations of intracellular gene regulation and intercellular signaling, extrapolating from studies of the preimplantation mouse embryo to disentangle human diseases [5–7]. For this, it is particularly important to understand the biochemical mechanisms granting high robustness and reproducibility to early mammalian embryo development [1–3, 7]. It is fascinating that, despite key distinctions in gene-regulatory components among mammal species, the spatio-temporal patterning of early embryos unfolds in a seemingly deterministic fashion at tissue level, while its progression is subject to fundamentally stochastic biochemical processes at single-cell level [1]. In particular, highly robust and reproducible lineage proportioning of the vitally significant cellular-fate transition from inner cell mass (ICM) to epiblast (EPI) and primitive endoderm (PRE) populations relies on neither maternal cues nor other positional information determinants [2]. This process thus correctly proceeds regardless of dissimilar experimental or environmental conditions (in vivo, ex vivo, in vitro, and organoid settings), even while being subject to diverse mechanical or geometrical constraints [2, 8–10].

For the early mouse embryo, these orchestrated processes have been investigated through both experimental and theoretical approaches [1, 11–26], and found to rely on dynamic cross-interactions between gene expression, molecular signaling at the tissue level, plus mechanical cues essential for tissue remodeling and proper cell positioning before implantation [2, 3, 5, 8, 27–37]. The preimplantation stage in mouse embryos involves two pivotal events transforming the zygote into a blastocyst composed of three distinct cell types [1–3, 29, 30, 38, 39]. Initially, cell segregation results in the formation of the trophectoderm (TE), which contributes to the embryonic part of the placenta, and the ICM, an uncommitted bipotent progenitor tissue which is the source of embryonic stem (ES) cells [40]. Subsequently, a second cell-fate decision within the ICM leads to the differentiation of EPI and PRE tissues. The EPI, marked primarily by NANOG expression, is pluripotent and gives rise to all embryo-proper tissues. In contrast, the PRE, marked primarily by GATA6 expression, contributes to the formation of extraembryonic supporting tissues. Notably, EPI cells secrete FGF4, a signaling ligand that promotes PRE cell specification. The FGF signaling pathway, which relies on FGF receptors and ERK for downstream transmission, therefore plays a crucial role in regulating the balance between EPI and PRE cells. Interestingly, FGF4 acts within an external feedback loop that ultimately inhibits its own production, downregulating it in PRE cells [8, 21, 41]. The mechanisms underlying the precise and timely distribution of FGF4 across the ICM, critical for appropriate cell lineage specification, remain elusive [9, 42].

For successful embryogenesis, the TE, EPI, and PRE lineages in the blastocyst must be segregated in specific proportions and within a narrow developmental time window [21, 30, 43]. The final specification step, from ICM to EPI and PRE, occurs over approximately 48 hours between embryonic day (E) 2.5 and E4.5 (i.e., 2.5–4.5 days post fertilization) [8, 21]. By E3.0, when blastocyst formation begins, ICM cells co-express NANOG and GATA6. Subsequently, EPI and PRE cells emerge asynchronously from ICM cells in a spatially stochastic manner, influenced by the short-range FGF4 signal and regulated by auto- and paracrine feedback loops [9, 30, 44]. A mutually exclusive gene expression profile emerges, with EPI cells exhibiting high NANOG and low GATA6 levels, and vice versa in PRE cells.

This process is driven by a gene-regulatory network characterized by self-activation and mutual repression of NANOG and GATA6, following a common motif for generating multistable expression states [45–49], and is refined by the external FGF4 feedback. The spatio-temporal segregation of EPI and PRE lineages is subsequently coordinated through a mechanical cell-sorting process [37]. The final spatially ordered pattern thus is an emergent property of the differentiating tissue, in stark contrast to systems which employ morphogen gradients for high-precision system-wide coordination of developmental trajectories [50–57], although layers of interacting genes downstream of the gradients can acquire self-organizing and scaling capabilities after their position-dependent activation [49, 58–64].

Leveraging high-throughput single-cell RNA sequencing (scRNA-seq) and similar technologies for quantitative immunofluorescence and genetic profiling [9, 65–70], experimental studies have sought to characterize the dynamics and biophysical parameters of ICM specification, focusing on internal and external developmental cues, molecular mechanisms underpinning gene transcription and mRNA translation, and key signaling pathway components [17, 44, 71–73]. Nevertheless, this remains challenging, due to the typically high sparsity and low granularity of scRNA-seq and similar data [74–77], as well as the low mRNA and protein abundance of relevant biochemical species, leading to significant biological noise. Here we categorize biological noise as either intrinsic (related to the discrete and stochastic nature of biochemical reactions and molecular transport within cells) or extrinsic (referring to external fluctuations affecting all cells). This noise strongly impacts the reliability of developmental dynamics, particularly during early development [78–84], and at the same time hampers experimental measurements.

For these reasons, mathematical and computational models have established themselves as potent tools capable of providing mechanistic insights into biochemical noise control strategies. However, existing models of EPI-PRE specification employ primarily deterministic methods, treating noise as a secondary feature [11, 15, 22, 23, 38]. These models do not fully capture the stochastic nature of transcription, translation, and signaling, among other processes, and therefore may not accurately reflect the impact of biochemical noise on early embryo development.

More recently, deep-learning-based approaches have been successfully used for both pattern recognition and reconstruction of ICM organoid data, employing experimental and synthetic datasets for training [85]. Although these approaches have good predictive power for both cell-fate spatial composition and determination, they do not reveal full mechanistic insights into the gene regulatory interactions that orchestrate ICM patterning.

In this study, we investigate cell specification during blastocyst formation in the early mouse embryo under truly stochastic conditions. In order to elucidate mechanisms that enable robust and replicable cell-type proportioning in the ICM, we have constructed a spatial model that allows for exact simulation of the stochastic dynamics of key biochemical species in the developing ICM. This model incorporates three distinct diffusive-signaling modes between neighboring cells: autocrine, paracrine, and intermembrane ligand-exchange (akin to juxtacrine) signaling via FGF4. As such, our focus is the gene regulatory and cell-cell signaling processes coordinating the differentiation and proportioning of mouse ICM derived lineages (EPI-PRE). We bypass any representation of force interactions between cells (i.e., exclude cell division, proliferation, motility, and sorting), for properly quantifying the synergistic effects of stochastic biochemical processes working at multiple scales without the influence of extrinsic variability.

Departing from traditional event-driven algorithms for simulating the Reaction-Diffusion Master Equation (RDME) in a spatial context, we devised a scheme for simulating the stochastic evolution of the core biochemical species governing early ICM development at the cellular and tissue scales. Our approach leverages the Simulation-Based Inference (SBI) framework, combining our spatial-stochastic simulator with an advanced AI-based inference technique: the Sequential Neural Posterior Estimation (SNPE) algorithm [86, 87]. This allowed us to perform millions of individual stochastic simulations of our spatial system and to use these data to infer parameter sets that align with desired system behavior.

By integrating our biochemically realistic spatial simulations with SBI, we learned parameters that accurately replicate key experimental observations of mouse ICM cell-lineage differentiation, including its temporal dynamics based on reported lifetimes of the involved biochemical species. Successful parametrization of our explicitly stochastic model enabled us to explore the effects of various sources and degrees of system perturbations, providing insights into the potential role of noise in the functional development of mouse blastocyst cell populations, and predicting system properties that so far remained unaddressed.

Our findings predict point and interval estimates of biophysical model parameters, while suggesting that the ICM system, being naturally biased towards a default “raw” state, requires tissue-level coordination for fate differentiation. Specifically, we recapitulate that: (1) early emergence of the EPI lineage is essential for stimulating the PRE fate; (2) the target cell-lineage proportions arise independently of the number of cells in the system; (3) ICM plasticity and its sensitivity to exogenous FGF4 are coordinated in time. In addition, we predict that: (4) intercellular communication enhances the robustness of ICM fate differentiation compared to a purely cell-autonomous process; (5) moderate variability in cellular initial conditions can enhance the accuracy of establishing the proper cell-fate ratio. These recapitulations and predictions provide quantitative bases for experimentally testing the hypotheses and insights contributed by our computational approach.

Our study not only quantitatively explains how the early mouse embryo’s resilience to biological noise arises from the FGF4-driven tissue-level coordination mechanism, but also underscores the existence of critical windows in developmental timing that dictate cell plasticity and responsiveness to neighboring signaling molecules. On the technical and conceptual side, it exemplifies that AI-driven simulation-based inference can be instrumental in uncovering mechanistic details of system-wide coordination of noise control in highly stochastic biophysical systems.

Results

Multiscale spatial-stochastic model of mouse blastocyst development: from ICM progeny to EPI and PRE lineages

We constructed a biophysics-based model of mouse embryo blastocyst formation that features a stochastic-mechanistic description of its intracellular gene-regulatory network and its intercellular diffusion-based signaling processes. It focuses on the differentiation of epiblast (EPI) and primitive endoderm (PRE) lineages from the inner cell-mass (ICM) progenitor population. This differentiation is driven by mutual regulatory interactions between the NANOG and GATA6 genes which act as primary markers of the EPI and PRE fates, respectively. It also accounts for their interactions with FGF4 via FGF receptors and the ERK signaling cascade, which implements a sensing mechanism for external FGF4; the FGF4 proteins are secreted by cells that commit to the EPI fate. We describe our model and simulation methods in detail in section Computational model of mouse blastocyst (ICM cell differentiation) of the Methods part.

For computational feasibility, our model represents the developing tissue as a static two-dimensional (2D) lattice, mimicking a monolayer or 2D cellular culture [9]. This structural representation preserves the essential characteristic of the mouse blastocyst as a densely packed assembly of pluripotent cells that can spatially communicate among each other. Similar geometric assumptions have been successfully utilized in previous ICM models [15, 22, 23]. Each voxel of the lattice symbolizes an individual embryonic cell, providing an accessible foundation to analyze cell-cell communication modes (for more details, see Model at tissue scale).

Derived from established Reaction-Diffusion Master Equation (RDME) simulation schemes, our approach computes realistic molecular count time series using accurate lifetimes for essential biochemical species (detailed in Model at cell scale). This results in an authentic reproduction of the temporal dynamics of ICM fate specification, as seen in Fig 1. We opted for an event-driven scheme in order to maximize computational efficiency of our simulations, which is a necessary precondition for applying the Simulation-Based Inference (SBI) framework to it, as described next.
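To make the notion of an event-driven scheme concrete, the following is a minimal sketch of the classic Gillespie direct method for a single species that is produced at a constant rate and degraded with a fixed mean lifetime. It is not the simulator used in this study: the function name, the rate parameters `k_prod` and `k_deg`, and the single-compartment setting are illustrative assumptions. The mean lifetime of a molecule in this toy model is `1 / k_deg`, and the stationary mean copy number is `k_prod / k_deg`.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Direct-method SSA for 0 -> X (propensity k_prod) and X -> 0 (propensity k_deg * n).

    Hypothetical toy model: one species in one well-mixed compartment.
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # production propensity
        a_deg = k_deg * n        # degradation propensity (zero when n == 0)
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                # no reaction can fire any more
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=10.0, k_deg=0.1, n0=0, t_end=200.0)
    print(len(times), counts[-1])
```

Because time jumps directly from one reaction event to the next, no computation is spent on intervals in which nothing happens, which is the efficiency argument made above. A spatial RDME scheme would add diffusion events between neighboring voxels to the same event queue.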

Fig 1. Workflow summary: GRN motif, cell signaling model, and inference framework.

[A] The developing ICM is represented by a static spatial lattice of biochemical reaction volumes (cells) coupled via FGF4, which mimics a monolayer or 2D cellular culture. Each cell contains a core gene-regulatory network (GRN) featuring mutual repression between the genes Nanog (N) and Gata6 (G), and their self-activation. Both N and G regulate the expression of internal FGF4 (FI). External FGF4 (FE) can either diffuse to a neighboring cell (paracrine signaling), bind to the membrane of the origin cell (autocrine signaling), or be exchanged between neighboring cell membranes. FGF receptors transmit the sensed FGF4 signal back to the core GRN by activating ERK (EI ↔ EA). [B] Pipeline of data generation and analysis. Key stages of parameter inference (columns 1 through 5). Initially, all cells display the undifferentiated (UND) fate (row 2 column 3). Rows 1 and 3 show simulations without and with FGF4 signaling activated, respectively. All generated stochastic trajectories are processed by the same steps: (I) resampling data onto a regular time grid and calculating relevant system observables at cell scale (total NANOG, GATA6, and FGF4 levels); (II) determining the lineage for each cell at every time point (EPI, PRE, or UND); (III) summing up the corresponding total cell count for each fate at tissue scale; (IV) constructing the (joint) pattern score time series. The map between simulation parameters and resultant score time series is used for training a deep neural density estimator via the sequential neural posterior estimation (SNPE) algorithm (row 5 column 5), which directly estimates the parameter posterior distribution (row 5 column 3) conditioned on a target observation (row 4 column 5). Multiple posterior estimates are produced with the same training set, selecting the best learned distribution conditional on the target observation by analyzing a “meta score” distribution (row 4 column 3). 
This summary/meta score is calculated per posterior, and it relies on the maximum a posteriori (MAP) estimate of all the model parameters. In general, the next-round prior does not need to be the current-round posterior: it is plausible to obtain a well-informed next-round mixture distribution (row 5 column 1). Several iterations of the workflow are performed until the meta score surpasses an arbitrarily prescribed level. For additional information about this data-processing pipeline, please see Model parameter inference framework.

https://doi.org/10.1371/journal.pcbi.1012473.g001

Exploring model parameter space via SBI

In computational modeling of complex biological systems, inferring parameter sets that reproduce experimental observations is a key challenge. This usually requires specific domain knowledge, as the selection of an appropriate parameter search method is chiefly influenced by the particular properties of the problem at hand [88]. One significant hurdle in developmental biology is the lack of a versatile, general-purpose inference method suitable for high-dimensional stochastic models, which typically require large and comprehensive datasets with fine-grained resolution. This requirement applies both to the features measured (e.g., gene expression levels at the single-cell scale rather than bulk measurements) and to the temporal resolution (e.g., time-series data capturing dynamics at all relevant scales). However, most experimental studies are unable to simultaneously measure all system variables required for such in-depth mechanistic representation. To bridge this gap, Biologically-Informed Neural Networks (BINNs) and Simulation-Based Inference (SBI) frameworks have emerged as powerful strategies addressing several of these modeling and inferential challenges [89–92]. A particular advantage of SBI is that it allows learning multidimensional parameter sets that comply with a prescribed behavior formulated in terms of lower-dimensional utility functions, thus mitigating the limitations posed by the scarcity of detailed quantitative data.

Here we use the SBI framework [93–95] to establish a comprehensive parameter exploration workflow for our spatial-stochastic model of mouse ICM fate decisions. Notably, our approach does not depend directly on fitting quantitative experimental data. Instead, it primarily uses qualitative observations to reconstruct system behavior and infer parameter distributions complying with it in a holistic manner. At heart, our approach fuses a novel AI-based technique, specifically the Sequential Neural Posterior Estimation (SNPE) algorithm [86, 87], with concepts from classical inference and optimization strategies [96, 97]. The key steps of our approach are summarized by the following workflow:

  1. Construct an objective function that quantitatively represents the system dynamics as a time-varying “score”, indicating the progression towards a desired or ideal state. For the ICM model studied here, this function traces the deviation from the desired cell-fate ratio (see Constructing the pattern score (objective) function for details).
  2. Simulate the model many times, each with a different parameter value vector sampled from a suitable prior distribution.
  3. Evaluate the objective function using the data obtained from these simulations.
  4. Train an artificial neural network (ANN) to learn the posterior distribution of the model parameters, effectively using the ANN as a surrogate for the simulator to approximate the mapping between parameter values and objective function scores.
  5. Define a target score time series that represents the ideal system behavior, using it as a synthetic observation.
  6. Estimate the maximum-a-posteriori (MAP) parameter value that aligns with the desired system behavior represented by the target score time series, using the posterior distribution predicted by the ANN conditioned on the given target score.
  7. Conduct multiple iterations starting from step (2) until the resulting score time series aligns with the target score time series.

For a more detailed description of these steps, see Fig 1 and Model parameter inference framework. In a companion study [98], we compare this SNPE-guided methodology against a strategy inspired by a classical optimization algorithm, specifically simulated annealing; we find that the AI-based approach surpasses its classical counterpart in predictive performance, assuming equal allocation of computational resources. This superiority can be traced back to the innate flexibility of the ANN and the absence of a native parameter interpolation procedure in simulated annealing.
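The control flow of the workflow steps listed above can be illustrated with a deliberately simplified sketch. It does not implement SNPE: the neural density estimator of step (4) is replaced here by a crude stand-in that keeps the best-scoring simulations and refits a Gaussian proposal to them (closer in spirit to a rejection or cross-entropy scheme). The toy simulator, the two parameters (a plateau and a rate), and all numerical settings are assumptions made purely for illustration.

```python
import numpy as np


def simulate_score(theta, n_steps, rng):
    """Hypothetical toy simulator: a noisy score that relaxes to theta[0] at rate theta[1]."""
    t = np.linspace(0.0, 1.0, n_steps)
    plateau, rate = theta
    clean = plateau * (1.0 - np.exp(-rate * t))
    return clean + rng.normal(0.0, 0.02, size=n_steps)


def infer(target, low, high, n_rounds=4, n_sims=2000, keep=0.05, seed=0):
    """Sequential inference loop with a Gaussian refit standing in for the ANN posterior."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    # Step 2 (first round): sample parameter vectors from a uniform prior.
    thetas = rng.uniform(low, high, size=(n_sims, low.size))
    for _ in range(n_rounds):
        # Steps 2-3: simulate and evaluate the score time series.
        scores = np.array([simulate_score(th, target.size, rng) for th in thetas])
        dist = np.sqrt(np.mean((scores - target) ** 2, axis=1))
        # Stand-in for step 4: keep the closest simulations, refit a Gaussian.
        best = thetas[np.argsort(dist)[: int(keep * n_sims)]]
        mean, std = best.mean(axis=0), best.std(axis=0) + 1e-6
        # Step 7: the refit Gaussian (clipped to the prior box) proposes the next round.
        thetas = np.clip(rng.normal(mean, std, size=(n_sims, low.size)), low, high)
    # Stand-in for step 6: the Gaussian mean plays the role of the MAP estimate.
    return mean, std


if __name__ == "__main__":
    # Step 5: a synthetic target score time series (plateau 1, rate 10).
    t = np.linspace(0.0, 1.0, 50)
    target = 1.0 - np.exp(-10.0 * t)
    estimate, spread = infer(target, low=[0.0, 1.0], high=[2.0, 20.0])
    print(estimate, spread)
```

Unlike this stand-in, a neural posterior estimator learns a full conditional density over parameters given the score time series, so it can be conditioned on a new target without rerunning the selection step.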

Reproduction of tissue-level features defines a score function.

We explored ICM differentiation guided by three pivotal characteristics which informed and constrained our parameter inference workflow.

The first key feature is the high reproducibility of the EPI and PRE cell-fate proportions in the blastocyst [3, 29, 38]. Despite varying reports on its precise ratio, here we adopted a typically reported value of 2 : 3 [99]. Note that minor deviations from this ratio are unlikely to significantly change the inferred posterior distributions or other findings of our model.

Secondly, blastocyst formation occurs within approximately 1.5–2 days of development, starting around E2.75 [2, 3, 30]. The EPI and PRE populations must reach and sustain the required fate proportions 8 to 12 hours before the end of the preimplantation period (around E4.75). This requirement serves as another optimization criterion for our model, regardless of the actual underlying cell-differentiation mechanism.

The third key feature is the role of spatial coupling via FGF4 in fate determination. In its absence, almost the entire ICM population assumes the EPI fate, which impedes the exit from naive cellular pluripotency [21, 73]. Correspondingly, only few cells adopt the PRE fate, a phenomenon attributable to intrinsic stochasticity at the gene expression level. This highlights the critical importance of local cell-cell signaling for correct pattern formation, and the inherent bias of the ICM population towards a naive pluripotent state.

All these key features together were incorporated into the design of the “score function” used in our SBI approach, in order to infer parameter distributions complying with them. The mathematical definition of the score function is described in Methods section Constructing the pattern score (objective) function.
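As a rough illustration of how such a score could be computed at a single time point, the sketch below classifies each cell from its NANOG and GATA6 levels, counts the fates, and maps the deviation from the target proportions onto the interval [0, 1]. This is not the score function defined in the Methods: the fold-change classification rule, the L1 deviation measure, and the reading of the 2 : 3 ratio as an EPI fraction of 0.4 are all assumptions made for this example.

```python
import numpy as np


def classify_fates(nanog, gata6, fold=2.0):
    """Label each cell 'EPI', 'PRE', or 'UND' (hypothetical fold-change rule)."""
    nanog = np.asarray(nanog, float)
    gata6 = np.asarray(gata6, float)
    fates = np.full(nanog.shape, "UND", dtype="<U3")
    fates[nanog > fold * gata6] = "EPI"   # NANOG clearly dominates
    fates[gata6 > fold * nanog] = "PRE"   # GATA6 clearly dominates
    return fates


def pattern_score(fates, target_epi=0.4):
    """Score in [0, 1]; 1 means the EPI and PRE fractions match the target exactly."""
    n = fates.size
    epi = np.count_nonzero(fates == "EPI") / n
    pre = np.count_nonzero(fates == "PRE") / n
    target_pre = 1.0 - target_epi
    # L1 deviation of the observed (EPI, PRE) fractions from the target fractions.
    deviation = abs(epi - target_epi) + abs(pre - target_pre)
    # Largest possible deviation occurs when all cells adopt the minority-target fate.
    worst = 2.0 * max(target_epi, target_pre)
    return 1.0 - deviation / worst


if __name__ == "__main__":
    nanog = [10, 10, 10, 10, 1, 1, 1, 1, 1, 1]
    gata6 = [1, 1, 1, 1, 10, 10, 10, 10, 10, 10]
    print(pattern_score(classify_fates(nanog, gata6)))
```

Evaluating this function on every point of a regular time grid would yield a score time series of the kind used as the inference target, with undifferentiated cells lowering the score until they commit.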

Comparative analysis of parameter sets for complementary wild-type and mutant models via SBI.

Exploiting our SBI workflow, we successfully identified two distinct sets of model parameters that shed light on crucial aspects of the underlying biological problem. These two parameter sets correspond to two complementary models: one representing a wild-type system with functional cell-cell communication via FGF4/ERK, referred to as “Inferred-Theoretical Wild-Type” or ITWT, and another mimicking a mutant system without FGF/ERK signaling (more precisely, without cell-cell FGF4 signaling), referred to as “Reinferred-Theoretical Mutant” or RTM. The RTM is a direct adaptation from the ITWT, but with reinferred core GRN interaction parameters; see also Tables 1 and 2.

Fig 2 summarizes the key differences between the ITWT and RTM systems, contrasting the respective inferred parameter values, i.e., the maximum-a-posteriori-probability (MAP) estimates, for both the central intracellular GRN components (top row) and the extracellular signaling topology (middle row). The most notable difference between the ITWT and RTM models is the reversal of the relationship between the self-activation thresholds of Nanog and Gata6. For the ITWT, the half-saturation threshold of Nanog self-activation (Nanog_NANOG) is lower/stronger than the half-saturation threshold of Gata6 self-activation (Gata6_GATA6). However, both models exhibit a well-balanced mutual repression between Nanog and Gata6, albeit with varying interaction strengths. We present a complete overview of the full model parameter distributions inferred in this study, and a first approximation of the model parameter sensitivities, in S1 Appendix as well as S1 and S2 Figs.

Fig 2. Summary of inferred central GRN and signaling model parameters (ITWT versus RTM).

[A-D] Two parameter sets were learned for distinct systems capable of recapitulating the correct final ratio between the emerging ICM lineages (EPI and PRE): the wild-type-like system ITWT (inferred-theoretical wild-type; left column) with functional cell-cell signaling via FGF4 (green color), and the mutant-like system RTM (reinferred-theoretical mutant; right column) for which FGF4 signaling was inhibited. Red- and cyan-colored numbers show the inferred values (MAP estimates) of the relevant parameters defining the central intracellular GRN (top row) and the extracellular signaling topology (middle row) components. Red- and cyan-colored percentage values indicate relative interaction strengths calculated via the formula ν = 100(ω − θ)/ω. Here ω is the length of the respective parameter range used for the inference scheme, and θ is the MAP estimate of the particular parameter. Cyan and red colors refer to gene activation and repression, respectively. [E, F] Conditional model parameter correlation matrices (ITWT versus RTM; bottom row). Pearson’s correlation coefficients were computed after conditioning posterior distributions on their own MAP estimates. Note the high (absolute) linear correlations between Nanog_NANOG, Gata6_GATA6, Gata6_NANOG, and Nanog_GATA6, suggesting potential compensation mechanisms among parameter-value pairs.

https://doi.org/10.1371/journal.pcbi.1012473.g002
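The relative interaction strength quoted in the Fig 2 caption can be evaluated with a few lines of code. The sketch below assumes that the formula reads ν = 100(ω − θ)/ω and that each parameter range starts at zero, so that the MAP estimate θ lies between 0 and the range length ω; the function name and the example numbers are hypothetical.

```python
def relative_strength(theta, omega):
    """Relative interaction strength in percent, nu = 100 * (omega - theta) / omega.

    theta: MAP estimate of a half-saturation threshold (assumed to lie in [0, omega]).
    omega: length of the parameter range used for inference (assumed to start at 0).
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    if not 0 <= theta <= omega:
        raise ValueError("theta must lie within [0, omega]")
    return 100.0 * (omega - theta) / omega


if __name__ == "__main__":
    # Hypothetical numbers: a threshold of 250 within a range of length 1000.
    print(relative_strength(250, 1000))
```

Under this reading, a lower threshold yields a larger percentage, which matches the "lower/stronger" convention used in the main text for half-saturation thresholds.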

Both models correctly recapitulate the final ratio (of cell counts) between the two emerging ICM lineages (EPI and PRE) of the fully-formed mouse blastocyst. However, the ITWT relies on a cell non-autonomous mechanism in which spatial coupling via FGF4 is essential for accurate and precise lineage specification. In contrast, the RTM relies on a cell autonomous mechanism, which is purely probabilistic. We explore this mechanistic difference and its consequences in more detail below (sections Intercellular communication via FGF4 functionally improves ICM differentiation robustness by 10–20% compared to a purely-binomial baseline scenario and Robust cell-fate proportions are independent of cell-grid size). In the following, however, we focus on the ITWT, as it incorporates the necessary FGF4 signaling confirmed by experiments.
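To see why a purely probabilistic mechanism limits precision, consider a simple reference calculation in which every cell independently adopts the PRE fate with the same probability. The PRE count is then binomially distributed, and the spread of the PRE fraction shrinks only as the inverse square root of the number of cells. The sketch below assumes this independent-cell picture and a PRE probability of 0.6 (reading the 2 : 3 ratio as EPI : PRE); whether it coincides with the baseline scenario analyzed later in the paper is an assumption.

```python
import math


def binomial_fraction_stats(n_cells, p_pre=0.6):
    """Mean and standard deviation of the PRE fraction for independent cell decisions.

    With X ~ Binomial(n_cells, p_pre), the fraction X / n_cells has
    mean p_pre and variance p_pre * (1 - p_pre) / n_cells.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    mean = p_pre
    std = math.sqrt(p_pre * (1.0 - p_pre) / n_cells)
    return mean, std


if __name__ == "__main__":
    for n in (25, 100, 400):
        mean, std = binomial_fraction_stats(n)
        print(n, mean, round(std, 4))
```

Any mechanism that couples cells, such as signaling between neighbors, can in principle push the spread below this independent-cell value, which is what makes it a natural reference for comparison.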

Posterior distributions approximated via SBI reveal correlations and potential compensatory mechanisms between model parameters.

To illustrate a key advantage of our AI-powered inference method, which extends beyond simply providing point and interval estimates of model parameters aligned with a postulated target behavior, here we exemplify the generative capabilities of an ANN trained via the SNPE algorithm. Specifically, we compute the correlation matrix of parameter interactions for the two inferred models, ITWT and RTM (see Fig 2E and 2F). The trained ANN serves as a direct surrogate for the posterior distribution consistent with the target behavior, and can be easily conditioned on the optimal model parameter set represented by its MAP estimate (see Fig 2A–2D).

This conditional posterior is particularly useful for exploring parameter space structures, as it can generate linear correlation coefficients between any pair of model parameters, thereby uncovering compensation mechanisms that preserve the desired system behavior. More importantly, since the trained ANN functions as an implicit surrogate for our spatial-stochastic simulator, this analysis is computationally efficient and feasible, unlike traditional parameter sensitivity or perturbation analyses that typically depend on brute-force techniques that require extensive additional simulations [89].
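The idea of a conditional correlation can be illustrated without a neural network by approximating the posterior samples with a multivariate Gaussian. For a Gaussian, conditioning a parameter pair on fixed values of all remaining parameters gives a covariance equal to the Schur complement of the full covariance matrix, independent of the conditioning values. The sketch below rests on that Gaussian assumption and is a stand-in for, not a reproduction of, conditioning the learned posterior on its MAP estimate; the function name is hypothetical.

```python
import numpy as np


def conditional_corr_gaussian(samples):
    """Correlation of each parameter pair given all other parameters held fixed.

    samples: array of shape (n_samples, n_params), e.g. draws from a posterior.
    Assumes a Gaussian approximation, so the conditional covariance of a pair (a)
    given the rest (b) is  C_aa - C_ab C_bb^{-1} C_ba  (Schur complement).
    """
    cov = np.cov(np.asarray(samples, float), rowvar=False)
    d = cov.shape[0]
    out = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            a = [i, j]
            b = [k for k in range(d) if k not in a]
            c_aa = cov[np.ix_(a, a)]
            if b:
                c_ab = cov[np.ix_(a, b)]
                c_bb = cov[np.ix_(b, b)]
                c_aa = c_aa - c_ab @ np.linalg.solve(c_bb, c_ab.T)
            out[i, j] = out[j, i] = c_aa[0, 1] / np.sqrt(c_aa[0, 0] * c_aa[1, 1])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    draws = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
    print(np.round(conditional_corr_gaussian(draws), 2))
```

A strongly negative entry indicates that, with everything else fixed, increasing one parameter must be offset by decreasing the other to stay in the high-probability region, which is the kind of compensation discussed above. For a non-Gaussian posterior the conditional correlations do depend on the conditioning point, which is why the MAP estimate matters in the actual analysis.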

For brevity, we focus on the core gene regulatory network (GRN) interactions for both ITWT and RTM models, as depicted by the size of their respective conditional correlation matrices (Fig 2E and 2F). A comprehensive analysis of the entire parameter space structure is presented in our companion study [98]. Here, we highlight the most significant predictions derived from our approach using the estimated full model parameter posterior distributions. These predictions reveal potential compensatory mechanisms among parameter pairs that ensure the robustness and reproducibility of the target patterning behavior.

For the ITWT model (Fig 2E), we highlight the strong flexibility of the simulated biophysical system in response to perturbations of core GRN parameters, specifically Nanog_NANOG, Gata6_GATA6, Gata6_NANOG, and Nanog_GATA6. This flexibility is evidenced by their pairwise high absolute conditional correlation coefficients, which is notable given their highly nonlinear interactions. The trained ANN has successfully learned the complementary relationships among these parameters. For instance, weakening an auto-activation threshold can be counterbalanced by strengthening its associated mutual repression threshold to preserve overall system behavior, as shown in Fig 2E by their anti-correlated interactions. Furthermore, the trained ANN predicts the existence of linear hyperplanes where different parameter values can effectively compensate for each other, creating a large population of behavior-conserving models. This underscores the strong perturbation resilience and adaptability characteristic of real biological systems [95, 100, 101]. These resilience and adaptability traits emerge in the presence of not only model parameter variations but also intrinsic and extrinsic sources of biochemical noise.

The ITWT conditional correlation matrix also predicts a high interdependence of parameter value choices for the secondary core GRN parameters, particularly Fgf4_NANOG, Fgf4_GATA6, and Nanog_A-ERK, but excluding Gata6_A-ERK. The activation and repression thresholds for Fgf4 must move in tandem, as they are positively correlated. However, these regulatory thresholds of Fgf4 are correlated with opposite signs with the auto-activation thresholds of either Nanog or Gata6 (compare first and second entries in columns Fgf4_NANOG and Fgf4_GATA6, respectively).

Note that while we purposefully included the repression of Gata6 expression by active ERK protein, motivated by recent in vitro experiments [102], our correlation analysis predicts that this interaction may be redundant or unnecessary for achieving and replicating the target system behavior. Indeed, the true role of this interaction remains a topic of debate in several experimental and theoretical studies [9, 15, 38, 69]. This observation underscores the potential utility of our approach for discovering and validating topological structures of similar GRNs.

For the hypothetical RTM model (Fig 2F), similar observations can be made. Here, the linear coefficients are, as expected, higher than in the ITWT case, as the recapitulation of the target behavior relies solely on the proportional compensation of four parameter values. Since adjusting cell-cell signaling is not possible in the RTM model, retaining proper cell-fate proportioning requires focusing entirely on the activation and repression thresholds. As in the ITWT model, the trained ANN predicts clear complementary linear relationships among primary core GRN parameter pairs.

EPI and PRE fates emerge robustly at tissue scale in spite of high single-cell variability

Our simulations were initiated with all cells in an undifferentiated (UND) state with a balanced distribution of cellular resources. At cell scale, the first part of a typical stochastic trajectory reveals the dynamic interplay between NANOG and GATA6 proteins, with FGF4 protein expression being adjusted in response to the levels of these two pivotal regulators. After about 12 hours, on average, the initial symmetry between NANOG and GATA6 is broken, as their proteins embark on divergent expression paths; this is exemplified for the case of high GATA6 and low NANOG final expression in Fig 3A and 3B. As time progresses, the individual cells predominantly commit to a PRE or an EPI fate, accompanied by a decrease or increase in FGF4 expression, respectively. This differentiation is clearly depicted at the tissue scale, with cells categorically aligning into one of three fates: UND, EPI, or PRE (Fig 3B).

Fig 3. Stochastic trajectories at cell and tissue scale.

The inferred-theoretical wild-type (ITWT) system shows excellent agreement with various experimentally observed characteristics of ICM cell specification. All simulations start with all cells in the undifferentiated (UND) state, having well-balanced cellular resources, before committing exclusively to EPI or PRE fates. [A] Example cell-level stochastic trajectory, randomly taken from a 100-cell (10×10 grid) simulation. Within the first 24 h, the cell remains in the UND fate while the FGF4 protein level increases. At around 24 h, a symmetry-breaking event occurs, as the NANOG and GATA6 protein levels take divergent expression paths. Within the last 24 h, the cell acquires the PRE fate while FGF4 levels decrease again. [B] Example tissue-level stochastic trajectory of cell fate counts in a 10×10 cell-grid system. Each cell is categorized into one of three possible fates: UND, EPI, or PRE. With progressing time, UND cells reduce in number while EPI and PRE cell counts settle close to prescribed constant target levels (dotted lines); see Methods section Computational experiments for details of cell-fate classification. [C] Typical cell-level behavior of NANOG (blue), GATA6 (orange), and FGF4 (green) total protein levels. [D] Typical tissue-level behavior of EPI (blue), PRE (orange), and UND (olive) cell-fate counts. Solid lines represent means, dashed lines represent standard deviations around means. Statistics are computed from a batch of 1000 simulations of a 100-cell tissue system.

https://doi.org/10.1371/journal.pcbi.1012473.g003

The variability in the expression profiles of NANOG, GATA6, and FGF4 at the cell scale is significant, as shown by the large standard deviation in molecular counts across different cells and simulations (Fig 3C). However, this cell-scale variability does not translate into high variability of the cell fate ratio at the tissue scale, which instead exhibits remarkable robustness, with a significantly lower standard deviation (Fig 3D). This observation alone underscores the system’s ability to integrate and manage cellular variability, ensuring consistent and reliable outcomes in the differentiation process across the tissue.

EPI cells precede and are necessary for specification of PRE cells

Transitioning from this foundation of robust differentiation, the model predicts an early and critical onset of the EPI lineage, occurring around 2 hours into the simulated trajectories, with the PRE lineage emerging only about 12 hours later. The emergence of the first EPI cells is tightly clustered within a narrow time frame between 2 and 3 hours of development (gray region in Fig 4A); this temporal behavior is consistent across a wide range of initial conditions (detailed in Computational experiments). Following this timely commitment, the EPI population expands rapidly, attaining, on average, 75% of its target proportion within the initial 4 hours (see Fig 3D). This leads to elevated FGF4 levels among the newly specified EPI cells, enabling the distribution of FGF4 across the ICM. Only then do PRE cells begin to appear in significant numbers within a broader time window, ranging from 8 to 17 hours (refer to the gray region of Fig 4B), implying an approximate 7-hour delay between the emergence of the first EPI and PRE cells, in line with recent experiments [30]. Subsequently, the PRE cell population gradually increases, reaching its target proportion at around the 40-hour mark (Fig 3D).

Fig 4. EPI cells precede and are necessary for specification of PRE cells.

Expression profiles reveal strong linear correlations among key proteins. Upper row shows typical cell-level protein count time evolution for each cell-fate category. Cell fates were identified at 48 h. For each fate, cell-level dynamics were traced from last to initial simulation data point. [A-C] Solid lines represent mean behavior, dashed lines represent standard deviations around means. Gray regions accentuate time intervals of first differentiated cell appearance for the given fate (vertical dash-dotted line = mean time). Statistics are computed from a batch of 1000 simulations. Only the most relevant proteins are shown: FGF4 (green), NANOG (blue), GATA6 (orange). [D-F] Pairwise relationships among the most important proteins at 48 h. Inner panels show different colors representing the protein-pair relationship for the respective cell fate: UND (olive), EPI (blue), PRE (orange). Outer panels show protein count histograms. Data points come from a batch of 1000 simulations. rxy = Pearson’s product-moment correlation coefficient. For details of cell-lineage classification, see Methods section Computational experiments.

https://doi.org/10.1371/journal.pcbi.1012473.g004

Nascent PRE cells are coordinated by EPI cells via differential expression of Fgf4, and the expression profile of Gata6 is a principal indicator of the onset of PRE cells. This coordination is controlled by nuances in fate-specific FGF4 distributions. Such nuances not only control the emergence of PRE lineage, but they are also key for EPI- and PRE-fate maintenance [9, 21]. Tight control of Nanog expression in EPI cells is a requirement for escaping naive pluripotency during the implantation stage [103–105].

Note that in our model the observed precedence of EPI emergence over PRE cells arises naturally as a predictive result, without any explicit incorporation into the modeling or inference procedures. The mechanism driving the delayed emergence of the opposing cell fates can be summarized as follows: first, stochastic self-activation of Nanog triggers the EPI fate specification program in a subset of the ICM cells. This, in turn, promotes the progressive differentiation of other unspecified cells into the PRE fate when they sense FGF4, which is only released by cells where Nanog reached substantial expression levels. A crucial precondition of this mechanism is the lower self-activation threshold of Nanog compared to Gata6. Despite the inherent stochasticity of the differentiation process at the single-cell level, coordination via FGF4 makes it appear deterministic at the tissue level.

Expression profiles reveal strong linear correlations among key regulatory proteins

Our simulations show that the copy number distributions of the three key proteins NANOG, GATA6, and FGF4 are clearly bimodal by the 48-hour mark, as depicted in Fig 4D–4F. At the beginning of every simulation, we use a well-defined initial condition distribution (ICD) which restricts the protein and mRNA expression levels to a region where all the cells start with the undifferentiated (UND) fate. This guarantees symmetric splitting of initial resources on average and prevents any systematic fate bias at the simulation start. Despite the innate randomness of the ICDs employed in the simulations, early variability does not have adverse effects on the final lineage proportions. Instead, contrasting expression profiles slowly emerge and are clearly visible by the last simulation time point (48 hours).

A notable observation from our simulation data is the presence of strong linear correlations among these key regulatory proteins, which emerge in spite of the nonlinear regulatory interactions between them. Specifically, we identify a pronounced negative linear correlation between NANOG and GATA6, as well as between GATA6 and FGF4 (refer to Fig 4D and 4E). Conversely, a strong positive linear correlation is observed between NANOG and FGF4 (see Fig 4F).

Affirming the validity of our simulation approach, these robust linear relationships mirror experimental findings: Fig 5D in [65] shows similar trends among Nanog, Gata6, and Fgf4 mRNA levels at ∼ 64-cell stage, using sc-qPCR data; Fig 2B and Fig 3B in [66] show correlated mRNA levels of EPI-PRE markers at ∼ E4.5 and the anticorrelation between NANOG-GATA6 protein levels at ∼ E3.5, respectively, employing sc-qPCR data; Fig 4B in [68] shows negative correlation between Gata6 and Sox2 (proxy for Nanog) mRNA levels at ∼ E4.5, utilizing scRNA-seq data; Fig 2B in [9] shows negative correlation between GATA4-mCherry (proxy for GATA6) protein and Fgf4 mRNA levels at multiple induction times, using qHCR imaging. We remark that our simulation data go beyond these available experimental measurements, because we provide a more granular and completely quantitative perspective on the correlations among these key regulatory elements.

Intercellular communication via FGF4 functionally improves ICM differentiation robustness by 10–20% compared to a purely-binomial baseline scenario

In order to investigate the robustness to noise in ICM specification, we assessed whether and to what extent tissue-level coupling via FGF4 is capable of reducing variability in the acquired cell fates. To this end, we compared the variability observed in our Inferred-Theoretical Wild-Type (ITWT) model simulations to an entirely cell-autonomous fate decision-making scenario. In such a hypothetical “Purely Binomial” (PB) scenario, the cell lineage distribution is supposed to follow a binomial pattern, as each cell’s fate, either EPI or PRE, is determined independently of others. The comparison was carried out by analyzing the coefficient of variation (CV) of 48-hour fate proportions across 13 different system sizes (η = [5, 10, 15, 25, 35, 50, 65, 75, 85, 100, 150, 225, 400] cells) for two cases.

For the first case, we calculated the coefficient of variation (CV1) as the ratio of sample standard deviation to sample mean. For the second case, the hypothetical PB scenario, the corresponding measure (CV0) is simply the coefficient of variation of the binomial distribution, with standard deviation being a function of mean fate numbers and total cell count. Interestingly, for the EPI and PRE fates (excluding the UND category) CV1 is consistently lower than CV0, indicating noise reduction due to FGF4 signaling (see Fig 5A). The ratio CV0/CV1 suggests that the ITWT model outperforms the PB model by 10–20%, implying fewer incorrectly specified cells (inset of Fig 5A).
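The two measures compared above can be illustrated with a minimal sketch. It assumes that the fate counts of one lineage are available as a plain list with one entry per simulation; the function names and the example counts are hypothetical and are not taken from the study's code or data. For a count following a binomial distribution with n cells and fate probability p, the coefficient of variation is sqrt((1 − p) / (n p)), with p estimated here as the mean count divided by n.

```python
import math
import statistics


def sample_cv(counts):
    """CV1: sample standard deviation divided by sample mean."""
    return statistics.stdev(counts) / statistics.mean(counts)


def binomial_cv(n_cells, p_fate):
    """CV0: coefficient of variation of a Binomial(n_cells, p_fate) count."""
    return math.sqrt((1.0 - p_fate) / (n_cells * p_fate))


def cv_ratio(counts, n_cells):
    """Ratio CV0/CV1, with the fate probability estimated from the mean count."""
    p_hat = statistics.mean(counts) / n_cells
    return binomial_cv(n_cells, p_hat) / sample_cv(counts)


# Hypothetical fate counts from a handful of 100-cell runs (illustration only).
example_counts = [38, 41, 40, 43, 39, 40, 42, 37]
ratio = cv_ratio(example_counts, n_cells=100)
```

A ratio above 1 means that the observed fate counts fluctuate less than independent binomial draws with the same mean would.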

Fig 5. Noise at tissue level: intercellular communication improves ICM differentiation robustness by 10–20% compared to purely-binomial baseline scenario.

[A] Comparison between inferred-theoretical wild-type (ITWT ∼ CV1) and purely-binomial (PB ∼ CV0) systems. The main plot shows the coefficient of variation (CV) as a function of the system size for each cell fate (colors as in Fig 4). The inset shows the ratio between CV0 and CV1, highlighting systematically lower fate specification error in the ITWT compared to the PB system. [C] Comparison between reinferred-theoretical mutant (RTM ∼ CV1) and purely-binomial (PB ∼ CV0) systems. Colors and symbols as in [A]. The inset plot shows that fate specification errors in the spatially uncoupled RTM system are of the same magnitude as in the PB scenario. Axes use logarithmic (base 10) scale. For each system size, the data point is computed from a batch of 1000 simulations. [B, D] Typical time evolution of normalized cell fate counts for three example system sizes (η ∈ [25, 100, 400] cells). Solid lines represent means, dashed lines represent standard deviations around means.

https://doi.org/10.1371/journal.pcbi.1012473.g005

To further validate the role of cell-cell communication in enhancing patterning robustness, we also compared the ITWT model to the Reinferred-Theoretical Mutant (RTM). The RTM lacks cell-cell signaling (see Fig 2 and S1 Fig, plus Methods section), reproducing the prescribed cell-fate ratio (on average) with a purely cell-autonomous patterning mechanism. We asked whether binomial noise emerges naturally in this system. Indeed, we found that the CV of the RTM model is comparable to that of a PB model (Fig 5C). Both systems adhere to the same power law, with negligible differences across system sizes (inset of Fig 5C).

In conclusion, our findings demonstrate that cell-to-cell communication via FGF4 diffusion, encoding local environmental variations, enhances ICM fate differentiation robustness by approximately 10–20% compared to a purely cell-autonomous scenario. This finding corroborates the notion that ICM differentiation is a tissue-level process, where tissue-scale signaling feedback via FGF4 plays a functional role in mitigating cell-fate decision noise. At the same time, it is in line with previous studies highlighting the benefit of spatial coupling for noise reduction in developing tissues [46, 47, 54, 55, 106–108].

Robust cell-fate proportions are independent of cell-grid size

Recent experiments suggest that robust control of the EPI-to-PRE lineage ratio does not depend on the absolute size of these populations [29, 99]. Resilience of the mouse embryo to variations in ICM size, as reflected by alterations in total cell number, was found both in vivo and in silico [1, 8]. Nonetheless, there remains a debate on whether a critical embryo size is essential for proper blastocyst lineage segregation [7, 13, 22, 109]. To assess the impact of absolute tissue size (cell number) on ICM specification, we analyzed cell-fate proportions and associated noise levels across various tissue sizes, hypothesizing that smaller cell numbers might correlate with increased noise in fate decisions.

We conducted 1000 simulations for each of 13 distinct cell grid sizes, ranging from 5 to 400 cells in total, for both the ITWT and the RTM models. We find that noise intensity scales with system size according to a power law in both models, as illustrated in Fig 5A and 5C. This indicates that noise diminishes predictably as system size increases.
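As an illustration of how such a scaling relation can be quantified, the following sketch fits CV ≈ a · η^b by ordinary least squares on log-transformed values. This is an assumed, generic fitting procedure and not necessarily the one used to produce Fig 5; the function name and the reference fate probability of 0.4 are hypothetical. For a purely binomial system, the fitted exponent is exactly −1/2, which provides a convenient reference.

```python
import math


def fit_power_law(sizes, cvs):
    """Least-squares fit of log10(cv) = b * log10(size) + log10(a).

    Returns the exponent b and the prefactor a.
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(c) for c in cvs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = 10.0 ** (mean_y - exponent * mean_x)
    return exponent, prefactor


# System sizes as listed in the text; the CV values below are the binomial
# reference for a hypothetical fate probability p = 0.4 (not simulation data).
sizes = [5, 10, 15, 25, 35, 50, 65, 75, 85, 100, 150, 225, 400]
p = 0.4
binomial_cvs = [math.sqrt((1.0 - p) / (n * p)) for n in sizes]
exponent, prefactor = fit_power_law(sizes, binomial_cvs)
```

Applied to simulated CV values, an exponent close to −1/2 with a smaller prefactor than the binomial reference would correspond to a constant relative noise reduction across system sizes.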

Moreover, our simulations reveal a universal mean value (μ) for cell-fate proportions, consistent across all different cell grid sizes for both the ITWT and RTM. Despite this, the two models differ in their commitment times, standard-deviation magnitudes, and independence of fate choice among cells (see Fig 5B and 5D).

The commitment time discrepancies between the ITWT and RTM models can be attributed to their distinct mechanisms. The ITWT augments probabilistic differentiation with tight control via FGF4 signaling, which makes it more resilient against perturbations. This mechanism requires initial random emergence of a portion of the EPI population, which subsequently coordinates other undifferentiated cells towards specific fates based on local neighborhood information, thus globally regulating EPI-PRE proportions. Early commitment to the EPI fate and subsequent emission of FGF4 is crucial to this process. In contrast, the RTM relies purely on stochastic differentiation, and is more sensitive to perturbations. This system lacks regulation beyond the inherent genetic program at the cellular level and does not integrate tissue-neighborhood information, with fate commitment timing primarily dictated by target protein levels for EPI and PRE markers.

The computationally predicted scaling rule in Fig 5 could potentially be tested in vitro using current experimental techniques. In a wild-type-like system, one could measure the mean and standard deviation of specified cell numbers as a function of the overall population size at predefined time points. This could also be done in a mutant-like system with cell-autonomous EPI-PRE differentiation (lacking cell-cell signaling); however, here the key challenge would be the creation of such a system in the first place. This could be facilitated by recent advances in optogenetic expression control for altering the signaling network in a targeted fashion [110, 111]. For both systems, the predicted scaling could be assessed by artificially splitting the original cellular population into subsets with distinct cell numbers, e.g. by creating grids with varying numbers of communicating cells (for the WT system), or uniformly randomly sampled cellular neighborhoods of varying size (for the mutant system). This experimental strategy would mimic modern bootstrapping techniques [112].

We emphasize that, unlike the purely cell-autonomous case where binomial noise is expected, the predicted scaling in the spatially coupled case is non-trivial. Intercellular communication in the ICM is a nonlinear phenomenon influenced not only by fluctuating copy numbers of key intracellular molecular players but also by response delays arising from the production and transport of signaling agents. This necessitates integration of stochastically fluctuating neighboring information.

In sum, our findings suggest no critical cell number for accurate ICM fate specification; however, its precision increases with the (square root of the) cell number, while spatial coupling via FGF4 can reduce the noise magnitude by ∼20% compared to a purely cell-autonomous mechanism.

Autocrine- and paracrine-signaling modes play reciprocal roles in robust cell-cell communication

One key characteristic of the mouse blastocyst is the overwhelming dominance of EPI cell fates when FGF4 production is inhibited or related loss-of-function mutations are applied. In such cases, almost all cells commit to the EPI fate by the time of implantation, as they are unable to exit naive pluripotency due to the absence of mechanisms controlling precise Nanog expression, leading to adverse developmental outcomes [21, 73, 105, 113, 114].

In agreement with this (and as demanded by the imposed score function), when FGF4 signaling is impeded in our simulations, cells initially co-express EPI and PRE fate-specific markers, but eventually only a small subset adopts the PRE fate [9, 21]. This leads to NANOG upregulation and commitment to the EPI fate in the majority of ICM cells [8, 30].

To dissect the roles of different communication modes in ICM specification, we modified the ITWT model to interrupt specific components of the signaling pathway. This model incorporates three distinct communication modes between neighboring cells: autocrine, paracrine, and intermembrane ligand-exchange (akin to juxtacrine) signaling via FGF4. Autocrine communication occurs when a cell secretes signaling molecules that bind to receptors on its own membrane, thereby regulating its own activity. Paracrine communication, in contrast, involves the release of signaling molecules that act on nearby cells within the local environment, facilitating coordinated responses among neighboring cells. Intermembrane ligand-exchange occurs between adjacent cells when signaling molecules unbinding from the membrane of one cell are caught by receptors on the membrane of the neighboring cell.

In this way, we created three “theoretical mutants” implementing the following signaling scenarios: complete absence of FGF4 (TM-APM), lack of autocrine signaling (TM-A), and absence of paracrine signaling and membrane-to-membrane exchange (TM-PM). Each modification affects cell fate determination differently, as shown in Fig 6A–6C.

Fig 6. Autocrine- and paracrine-signaling modes play reciprocal roles in robust cell-cell communication.

The figure shows the effects of perturbing autocrine- and paracrine-signaling modes in theoretical mutant (TM) systems. [A] The TM-APM (inhibition of autocrine, paracrine, and membrane-exchange signaling) represents the loss-of-function phenotype of experimental FGF/ERK pathway mutants. [B, C] The complementary TM-A and TM-PM portray the importance of individual feedback modes (paracrine plus membrane exchange and autocrine signaling only, respectively; see also inset text). Solid lines represent mean and standard deviation values (μ and σ) for a given TM; dashed lines represent μ and σ for the inferred-theoretical wild-type (ITWT). [D] Cell-lineage allocation at the 48 h time point. All TMs are compared against the ITWT. For all compared systems, statistics were calculated from 1000 independent simulation runs. Colors represent different cell fates: UND (olive); EPI (blue); PRE (orange).

https://doi.org/10.1371/journal.pcbi.1012473.g006

As expected, the TM-APM model exhibits a strong bias towards the EPI fate (Fig 6A and 6D). Initially, its dynamics parallel those of the ITWT system, but as the simulation progresses, the EPI fate predominates, with only a minor fraction of cells adopting the PRE fate.

The TM-A model displays a decreased accuracy of the ICM specification mechanism due to the absence of self-regulation, though the overall precision of the system remains unaffected (Fig 6B and 6D). Adjustments in other signaling components could potentially correct for this, but these also would likely reduce the system’s ability to buffer against dynamic signaling perturbations.

In the TM-PM model, the critical role of paracrine communication and, to a lesser extent, FGF4 membrane exchange becomes evident (Fig 6C and 6D). Eliminating these communication modes disrupts the maintenance of target cell-lineage ratios, even when autocrine signaling is preserved. During initial simulation phases, differentiation seems normal, but over time, a significant fraction of cells remains undifferentiated, and the EPI lineage fails to sustain its population. Exchanging FGF4 with neighboring cells therefore is crucial for correct cell-fate specification, once again underpinning the importance of its tissue-scale coordination.

Although cells can only indirectly distinguish between autocrine and paracrine molecules (via receptor specificity-tuning or localization, and spatio-temporal patterns), it is plausible to quantify the particular roles they play in proper EPI-PRE specification by adapting recent bioengineering and programmable synthetic biology tools that emulate cell-cell communication [111, 115, 116].

Our findings predict that, as shown in Fig 6B and 6C, the absence of autocrine mode affects accuracy of cell-type proportioning, decreasing it by approximately 20% (while precision or noise remains unchanged). In contrast, the absence of the paracrine mode impacts both accuracy and precision, deteriorating homeostatic capabilities by approximately 30%. These magnitudes consider the collective distortions across all target cell-lineage populations (EPI-PRE-UND) at the final simulation time point.

In summary, both autocrine and paracrine signaling are integral to ICM differentiation and maintenance. Autocrine signaling ensures the accuracy of fate specification, while paracrine signaling, along with membrane exchange, maintains lineage proportions, enhances precision, and promotes cellular homeostasis.

ICM cells produce similar local and global neighborhood features: lineage ratios are preserved at both scales

We next asked whether the tissue-level spatial coupling via FGF4 leads to specific signatures in the emerging spatial distribution of cell fates.

A central challenge in developmental biology is the precise characterization of spatial patterns, such as the “salt-and-pepper” arrangement reported for the ICM, which has often been described informally in the literature [11, 16, 38, 117, 118]. The term typically implies a random distribution, yet randomness in a mathematical context can take various forms. Recent studies have endeavored to rigorously define this pattern using experimental and theoretical approaches [10, 25, 35, 85, 119]. Here we understand the “salt-and-pepper” pattern as an archetype in which each individual cell-fate decision is independent of the cell fates of its neighbors, which implies a multinomial distribution of cell fates in every tissue neighborhood.

The dynamically growing ICM is also shaped by cellular division and intercellular forces, which can lead to local fate clustering and compositional variability, as reported in prior studies [11, 35, 37]. In such a scenario, a multinomial or “salt-and-pepper” distribution is not expected in the first place. However, here our static cell arrangement isolates the problem from these factors, and allows for an analysis that focuses solely on the influence of cell-cell communication on the spatial distribution of cell fates. We therefore asked to what extent the spatial cell distribution in our simulated ICM system agrees with or deviates from a multinomial baseline.

To this end, we first determined the neighborhood composition in our Inferred-Theoretical Wild-Type (ITWT) model, focusing on the three cell fate categories: EPI, PRE, and UND (Fig 7A–7C and 7E–7G). For each cell within a specific category, we included neighbors up to a predetermined degree (first or second-degree neighbors, as seen in Fig 7A–7C and 7E–7G; see Fig 7D and Model at tissue scale for the details of neighborhood stratification). We then compared the resulting cell-fate arrangements to the distributions resulting from artificially generated systems, in which the cell fates were sampled from a multinomial distribution with the same proportions as extracted from the simulated data. This comparison was carried out for all simulated times.
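The comparison described above can be sketched as follows. This is an illustrative reconstruction under explicit assumptions, not the study's implementation: cells are assumed to sit on a square lattice, a neighbor of degree k is assumed to be any cell within Chebyshev distance k of the reference cell, and all function names are hypothetical. The actual neighborhood stratification is described in Model at tissue scale and may differ.

```python
import random
from collections import Counter

FATES = ("EPI", "PRE", "UND")


def neighborhood_composition(grid, ref_fate, max_degree=1):
    """Mean fate fractions among neighbors (up to max_degree) of ref_fate cells.

    Assumes a square lattice and Chebyshev-distance neighborhoods (hypothetical).
    Returns None if no cell of ref_fate has any neighbor.
    """
    rows, cols = len(grid), len(grid[0])
    per_cell = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != ref_fate:
                continue
            counts = Counter()
            for dr in range(-max_degree, max_degree + 1):
                for dc in range(-max_degree, max_degree + 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                        counts[grid[rr][cc]] += 1
            total = sum(counts.values())
            if total > 0:
                per_cell.append({f: counts[f] / total for f in FATES})
    if not per_cell:
        return None
    return {f: sum(d[f] for d in per_cell) / len(per_cell) for f in FATES}


def sample_multinomial_grid(grid, rng):
    """Baseline grid: fates drawn independently with the observed proportions."""
    counts = Counter(f for row in grid for f in row)
    weights = [counts[f] for f in FATES]
    return [[rng.choices(FATES, weights=weights)[0] for _ in row] for row in grid]


# Toy 2x2 example (illustration only, not simulation output).
toy_grid = [["EPI", "PRE"], ["PRE", "EPI"]]
observed = neighborhood_composition(toy_grid, "EPI", max_degree=1)
baseline_grid = sample_multinomial_grid(toy_grid, random.Random(0))
baseline = neighborhood_composition(baseline_grid, "EPI", max_degree=1)
```

Averaging the baseline composition over many sampled grids, and repeating the comparison at every time point, would give quantities analogous to ρ0 and ρ1.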

Fig 7. Exploring self-similarity and spatial dynamics of simulated ICM neighborhoods.

[A-C, E-G] Temporal evolution of spatial correlations among acquired cell fates. Panels [A-C] (upper row) and [E-G] (lower row) show cell-neighborhood degrees 1 and 2, respectively (see also panel [D]). Here, ρ1 and ρ0 represent mean cell-neighborhood composition values split by fate: ρ1 is directly computed from the spatio-temporal distribution of cell fates in inferred-theoretical wild-type (ITWT) simulations; in contrast, ρ0 represents a multinomial baseline model assuming that cell fates emerge in a spatially uncoupled fashion from independent trials leading to the acquisition of one of three cell fates with fixed occurrence probability. The latter is estimated by averaging cell fate proportions at every time point across simulations, discarding any spatial information. [D] Example of a cell neighborhood at the 48-h time point. Numbers indicate the neighborhood degrees (1 and 2) relative to the reference cell (0). For details on the neighborhood stratification methodology, see Model at tissue scale. [H] Mean and standard deviation of FGF4 protein count as functions of maximum cell-neighborhood degree. FGF4 levels quickly become uncorrelated beyond the nearest-neighbor cell distance. Colors represent different cell fates. ρ0: EPI (turquoise); PRE (deep pink); UND (forest green). ρ1: EPI (blue); PRE (orange); UND (olive).

https://doi.org/10.1371/journal.pcbi.1012473.g007

A subtle discrepancy, especially at the first-degree neighborhood level, emerged between the non-cell-autonomous system model (ρ1) and the cell-autonomous, multinomial model (ρ0) after 24 hours of simulation (inset plots of Fig 7A–7C). This difference, about ±3% in average neighborhood composition, is significant when considering their sampling distributions (standard errors are around ±0.5%). However, when comparing to the variation between simulations (with a ±10% standard deviation), this discrepancy becomes less significant.

Furthermore, we analyzed the FGF4 distribution across five independent neighborhood degrees ([0, 1, 2, 3, 4]; Fig 7H). After 48 hours, FGF4 predominantly remained concentrated near its source (EPI cells) but also spread to more distant neighboring cells at a notably lower molecular count. This finding aligns with experiments reporting that, in artificial systems, FGF4 signaling proteins stabilize around their source cells at a single cell-distance length scale [9]. It suggests that diffusive coupling balances the overall FGF4 level across the system, essentially acting as a quorum signal that reflects the proportion of FGF4-producing cells at the tissue scale.

Our findings indicate that both at the local and global scales, the spatial distribution of cell fates is similar, irrespective of whether cell differentiation is coordinated at the tissue level, as in our ITWT system, or purely cell-autonomous. This observation seems counter-intuitive, given the necessity of cell-cell signaling for proper lineage establishment. However, this may be part of a strategy to withstand strong perturbations (like drastic changes in cell population) for which cell fate proportions must be maintained locally but coordinated globally, such that neighborhood characteristics are preserved at both scales. This results in a seemingly irregular pattern that nevertheless preserves cell-fate ratios in a spatially homogeneous fashion.

Increased variability in initial conditions enhances developmental accuracy while sustaining its precision

We next assessed how variability in initial condition distributions (ICDs) affects the accuracy and precision of cell-fate specification in our ITWT model; here, “accuracy” is defined as the closeness of the average simulated EPI-PRE proportions to the target ratios, and “precision” refers to the variability of these proportions among simulation ensembles.
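As a minimal sketch of these two notions, accuracy can be summarized as the bias of the ensemble mean relative to a target proportion, and precision as the ensemble standard deviation (or the coefficient of variation, CV = σ/μ). The EPI fractions and the target value below are hypothetical numbers chosen for illustration, not simulation results, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def accuracy_and_precision(fractions, target):
    """Summarize an ensemble of fate proportions against a target proportion.

    bias: ensemble mean minus target (accuracy; closer to 0 is better)
    sd:   sample standard deviation across simulations (precision)
    cv:   coefficient of variation, sd / mean
    """
    mu = mean(fractions)
    sigma = stdev(fractions)
    return {"bias": mu - target, "sd": sigma, "cv": sigma / mu}


# Hypothetical EPI proportions from five simulations and a hypothetical target.
epi_fractions = [0.38, 0.42, 0.40, 0.44, 0.36]
summary = accuracy_and_precision(epi_fractions, target=0.40)
print(summary)
```

An ensemble can thus be accurate (bias near zero) yet imprecise (large sd), or the reverse, which is why the two quantities are reported separately.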

Our approach was guided by two aspects: first, the broad inferred posterior distributions of the parameters governing the ICDs, and their moderate sensitivity to value changes (S2D and S2H Fig); second, similar assessments carried out in previous models of the mouse blastocyst, in which mainly the ICD variance was modulated [15, 22]. Taking this into account, we generated 1000 stochastic trajectories for each of 10 distinct ICDs, modifying their variance from 0% to 200% compared with the baseline value (see Fig 8A for examples); the details of ICD modulation are described in Computational experiments. We analyzed both the mean cell count (accuracy) and the corresponding standard deviation (precision) for each cell fate (Fig 8C and 8D).

Fig 8. Initial-condition variations enhance patterning accuracy and simultaneously sustain its precision.

Robustness to initial-condition perturbations (ICPs): uniform perturbation of protein (NANOG, GATA6) and mRNA (Nanog, Gata6) initial resources. [A] Example initial-condition distributions (ICDs) with increasing variability (see legend) for NANOG and GATA6 proteins. Interior scatter plots show sampling intervals; exterior plots show sampling distributions. The 0% distribution defines the unperturbed baseline case. For each ICD, data points come from 1000 independent simulations. [B] Mean cell-fate composition (bars) with standard deviation (error bars) for all tested variability intensities. Bar height ranges from 50 to 100 cells. [C] Time trajectories of mean cell-fate counts for selected ICD variability intensities. Dotted lines specify target proportions for EPI and PRE fates, respectively. [D] Time trajectories of cell-fate count standard deviations for selected perturbation intensities. Inset plot displays temporal coefficient of variation (CV = σ/μ) for the last 24 h. Colors represent different cell fates. Statistics were calculated from batches of 1000 independent simulations per ICD.

https://doi.org/10.1371/journal.pcbi.1012473.g008

Remarkably, increasing the variance of the ICD can, in some cases, positively influence cell-fate specification in the ITWT system. For example, in the 100% initial condition perturbation (ICP) scenario, the accuracy of EPI/PRE specification improves notably compared to the baseline (contrast dashed to solid lines in Fig 8C), while its precision remains unaffected (Fig 8D and inset).

Between the 25% and 75% ICPs we observe a systematic reduction of the PRE populations in favor of the EPI populations (Fig 8B). This can be attributed to the fact that with increasing ICP strength a larger subset of cells is initially biased towards the EPI fate. This trend is inverted as we proceed to stronger ICPs (150% and 200%), since now the initially available Nanog abundance quickly induces FGF4 production, which promotes the PRE fate. Notably, while different ICPs thus alter the cell-specification accuracy, the corresponding precision remains similar for all levels of ICP strength (error bars in Fig 8B and inset plot of Fig 8D).

Our findings indicate that when the ICD is perturbed within normal ranges expected from full protein induction, cell-specification accuracy can be improved without compromising its precision. However, if the ICD is perturbed beyond this, the excess initial resources negatively affect the accuracy, while precision remains unchanged. These observations align with various previous studies which highlighted that stochasticity can play a constructive role in biological systems [79, 81, 82, 120–127].

To test how initial-condition variability affects developmental accuracy and precision in vitro (or even ex vivo), one could utilize photosensitive reagents to control mRNA and protein activity in living cells (i.e., optogenetics) [110, 126]. This approach would allow systematic control of cellular initial conditions, for example, via overexpression of competing molecular species.

Cell-fate assignment remains robust when less than 25% of cells start with perturbed initial conditions

Having established that moderate increases in the variability of initial condition distributions (ICDs) can be beneficial for robust cell-fate specification, we now turn our attention to understanding the limits of this robustness by examining the system’s response to different formats of ICD perturbations. In these tests, while maintaining a constant tissue size of 100 cells, a varying number of cells (ranging from 1 to 100) are randomly selected for ICD modifications.

The first perturbation scheme linearly modifies both NANOG- and GATA6-related resources in tandem, with the new mean ICD values ranging from 0% to 200% of the typical initial resources (Fig 9A–9D). For clarity, we have labeled these scenarios based on their deviation from the standard ICD, such as −100%, 0% (reference), and 100%.

Fig 9. Resilience to perturbations is mainly determined by number of affected cells: Patterning remains robust when less than 25% of cell population is perturbed.

Robustness to linear initial condition perturbations (ICPs) of protein and mRNA initial resources (NANOG-GATA6 and Nanog-Gata6). Upper row [A-D] shows the effect of equally perturbing resources of both genes. Lower row [E-H] shows adversarial perturbation of the resources, i.e. linearly progressing perturbation magnitude from 200% NANOG and 0% GATA6 (−100% ICP scenario) to 0% NANOG and 200% GATA6 (100% ICP scenario). [A, E] Initial condition distributions (ICDs) for NANOG and GATA6 proteins. Interior (scatter) plots show sampling intervals. Exterior plots show sampling distributions. For each ICD, data points come from 1000 independent simulations. [B-D, F-H] Quantifying the effect of equal [B-D] and adversarial [F-H] ICPs for different intensities (molecular count percentage with respect to baseline scenario: ICP = 0%), and varying number of perturbed cells for all different cell-fate categories (EPI, PRE, UND). For each fate category, color saturation indicates the fate error, i.e. the mean difference of cell-fate counts between the perturbed and unperturbed baseline systems. Each matrix entry (perturbation pair) is calculated from 1000 independent simulations.

https://doi.org/10.1371/journal.pcbi.1012473.g009

The second perturbation scheme involves a negative linear correlation between NANOG- and GATA6-related resources, creating scenarios where one of these resources is initially dominant (Fig 9E–9H). The range of adjustments spans from 200% NANOG and 0% GATA6 to 0% NANOG and 200% GATA6, again compared to the unperturbed initial conditions.

We quantified the deviations from the typical ICD behavior using the relative Fate Error (FE), which measures the discrepancy in ICM specification accuracy for each cell fate (UND/EPI/PRE) between the perturbed scenarios and the reference (0% ICP) scenario at 48 hours; see also caption of Fig 9 for details about FE.
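A minimal sketch of the FE computation follows, based on the caption of Fig 9 (mean difference of cell-fate counts between perturbed and unperturbed systems). The optional normalization by tissue size is an assumption about what "relative" means here, and the cell counts are hypothetical values for a 100-cell tissue, not simulation output.

```python
from statistics import mean


def fate_error(perturbed, baseline, tissue_size=None):
    """Per-fate difference of mean cell counts (perturbed minus baseline).

    perturbed, baseline: dicts mapping fate name -> list of final cell counts,
    one entry per simulation. If tissue_size is given, the difference is
    expressed as a fraction of the tissue (an assumed normalization).
    """
    errors = {}
    for fate in baseline:
        diff = mean(perturbed[fate]) - mean(baseline[fate])
        errors[fate] = diff / tissue_size if tissue_size else diff
    return errors


# Hypothetical final counts at 48 h from three simulations each (100 cells).
baseline_counts = {"EPI": [40, 42, 38], "PRE": [58, 56, 60], "UND": [2, 2, 2]}
perturbed_counts = {"EPI": [50, 52, 48], "PRE": [45, 47, 43], "UND": [5, 1, 9]}

fe_counts = fate_error(perturbed_counts, baseline_counts)
fe_relative = fate_error(perturbed_counts, baseline_counts, tissue_size=100)
print(fe_counts)
print(fe_relative)
```

Because the tissue size is fixed, the per-fate errors expressed in counts sum to zero: a gain in one fate is necessarily compensated by losses in the others.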

Our findings from both schemes indicate that when more than 25% of the cell population is affected by the ICD disturbances, the system experiences significant deviations in lineage distributions, regardless of perturbation strength. This suggests the existence of a critical threshold ratio of perturbed cells, beyond which the system’s resilience is notably compromised. When this threshold is exceeded, profound gene expression imbalances emerge across the ICM population.

Both types of perturbation, whether implying scarcity or over-abundance of initial resources, result in similar FE values, suggesting a potential correction mechanism that can overcome highly irregular cellular initial conditions. Notably, no significant deviation is observed when less than 25% of the cell population undergoes perturbation. Here the system demonstrates remarkable robustness, underscoring the tissue-level coordination inherent in the ICM specification process.

Looking ahead, the impact of restricting gene expression profiles in multiple cellular subpopulations of different sizes could also be tested via optogenetics [110, 111, 126]. Our approach predicts only minor deviations in final cell-lineage distributions, in terms of their accuracy and precision, when less than 25% of the initial cellular population is perturbed accordingly; see Fig 9. Moreover, we predict that these deviations depend solely on the cumulative number of perturbed cells, rather than the specific locations of perturbation.

Cell plasticity and FGF4-sensitivity are time-window dependent

Temporal modulation of FGF4 concentration and the corresponding shift in cell plasticity are key aspects of ICM cell differentiation, with numerous studies documenting their influence [7, 9, 21, 30, 128]. To examine whether our model can replicate the experimentally observed time-dependent responsiveness to FGF4 level changes, we introduced controlled perturbations of FGF4 in our simulations. This involved adding extra FGF4 molecules to each simulated cell at specific time points, mimicking the effect of exogenous FGF4.

We assessed the response to FGF4 perturbations in two models (ITWT and TM-FGF4) and at two distinct final simulation times, 48 and 96 hours. The second time point, while biologically irrelevant, was introduced to test whether manipulation of FGF4 levels can alter the typical time scale on which cell-fate commitment converges. Exogenous FGF4 addition was carried out at predetermined time points: simulation time = [0, 4, 8, 12, 16, 24, 32, 40] hours, with the amount of added FGF4 molecules determined by a mix of Poissonian and binomial distributions (details in Computational experiments). Note that the TM-FGF4 system represents a theoretical mutant with blocked FGF4 production, mimicking a full loss-of-function phenotype for the Fgf4 gene; this means that any FGF4 originates in these systems from the external addition.

We quantified the ICM patterning robustness to varying amounts of exogenous FGF4 using the Standardized Absolute Fate Deviation (SAFD), which measures the distance, in terms of mean cell-fate proportions, from the null configuration (no exogenous FGF4 at 0 hours) to every alternative perturbation-magnitude-and-addition-time configuration; see also caption of Fig 10 for details about SAFD.
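The SAFD definition given in the caption of Fig 10 (absolute difference between the mean reference and mean perturbed proportions, rescaled by the standard deviation of the reference proportion) can be sketched as follows. The use of the sample standard deviation is an assumption, and the proportions are hypothetical illustration values.

```python
from statistics import mean, stdev


def safd(reference, perturbed):
    """Standardized Absolute Fate Deviation for one fate category.

    reference: fate proportions from the null configuration
               (no exogenous FGF4 at 0 h), one value per simulation
    perturbed: fate proportions from one perturbation-magnitude and
               addition-time configuration
    """
    return abs(mean(reference) - mean(perturbed)) / stdev(reference)


# Hypothetical PRE proportions (not simulation output).
reference_pre = [0.38, 0.42, 0.40, 0.44, 0.36]
weakly_perturbed = [0.41, 0.39, 0.42]
strongly_perturbed = [0.50, 0.52, 0.48]

print(safd(reference_pre, weakly_perturbed) <= 1)    # True: within one reference SD
print(safd(reference_pre, strongly_perturbed) <= 1)  # False: beyond one reference SD
```

A value of 1 corresponds to a shift in the mean proportion equal to one standard deviation of the reference ensemble, which is the boundary drawn in Fig 10.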

Fig 10. Robustness to varying amounts of exogenous FGF4 added at distinct time points quantified by SAFD at 48 h: cell plasticity and FGF4 sensitivity are time-window-dependent.

[A] Final cell-fate composition (at 48 h) of the inferred-theoretical wild-type (ITWT) system, for all tested FGF4 protein perturbation magnitudes and FGF4 addition at simulation start (0 h). Bar height ranges from 50 to 100 cells. [B-D] Effect of different FGF4 perturbation magnitudes and FGF4 addition times on cell-fate composition at 48 h, quantified by Standardized Absolute Fate Deviation (SAFD) for all three fate categories (EPI, PRE, UND) in the ITWT system. SAFD (color represents amplitude) refers to the distance from the reference (or null) configuration (no exogenous FGF4 at 0 h) to every alternative perturbation magnitude and addition time. SAFD is defined as the absolute difference between the mean of reference and perturbed proportions, rescaling this absolute difference by the standard deviation of the reference proportion. The separating hyperplane indicates the boundary between the regions with distance <= 1 and distance > 1. [E] Final cell-fate proportions (at 48 h) of the theoretical mutant system lacking FGF4 production (referred to as TM-FGF4), for all tested FGF4 protein perturbation magnitudes and FGF4 addition at simulation start (0 h). Bar height ranges from 0 to 100 cells. [F-H] Effect of different FGF4 perturbation magnitudes and FGF4 addition times on cell-fate composition at 48 h (quantified by SAFD) in the TM-FGF4 system. All bars and data points are calculated from separate batches of 1000 independent simulations for each perturbation. The following list represents a one-to-one correspondence between molecular concentrations and counts for mean added exogenous FGF4 protein, assuming a cellular culture medium with 100 compactly placed cells (concentrations are given in [ng/ml]): γ = [0.0, 0.12, 0.25, 0.37, 0.5, 0.99, 1.49, 1.98, 2.48, 2.97, 3.47, 3.96, 4.46, 4.96]. Conversion formula: γ = (x · M)/(NA · Vmed). 
Where: γ is the resulting mass concentration of FGF4 [ng/ml]; x is the FGF4 copy number; M is the molecular weight of FGF4 (25 kDa); NA is the Avogadro constant; Vmed is the medium’s volume, 100 ⋅ 4200 μm3, as shown in Table 3.

https://doi.org/10.1371/journal.pcbi.1012473.g010

Our analysis at the 48-hour mark shows that the TM-FGF4 model is highly responsive to exogenous FGF4 added at the start of the simulation (addition time = 0 hours). Depending on the FGF4 count, it is possible to manipulate the lineage ratios and rescue the PRE fate (Fig 10E), which is in line with recent in vitro experiments [9]. However, we identify an end-of-plasticity time point around 24 hours, after which the system becomes insensitive to additional FGF4 and locks into its pre-existing cell lineage proportions (Fig 10F–10H). In the ITWT system, cell fate ratios can also be manipulated by varying the FGF4 amount, although no distinct end-of-plasticity time point is observed in this model (Fig 10A–10D).

The quantitative predictions in Fig 10A–10D and 10E–10H motivate potential in-vitro experiments. These predictions not only formulate a priori expectations for the robustness of the real system, but also for its response to varying amounts of exogenously applied FGF4. To our knowledge, while similar experiments with limited scope have been performed for an in-vitro mutant system lacking FGF4 production [9], there is currently no comparable experiment akin to Fig 10A–10D for a wild-type-like ICM system.

To enhance the translational value of our computational FGF4-based perturbations, we supply a set of quantities that directly translate the numbers in Fig 10 to wet-lab experimental settings; see also caption of Fig 10. The following list represents a one-to-one correspondence between molecular concentrations and counts (as seen in Fig 10) for mean added exogenous FGF4 protein. For simplicity, we assume a cellular culture medium with 100 compactly placed cells, and the concentrations are given in [ng/ml]:

γ = [0.0, 0.12, 0.25, 0.37, 0.5, 0.99, 1.49, 1.98, 2.48, 2.97, 3.47, 3.96, 4.46, 4.96]

The conversion formula employed for these predictions is γ = (x · M)/(NA · Vmed), where γ is the resulting mass concentration of FGF4 [ng/ml]; x is the FGF4 copy number, as shown in Fig 10; M is the molecular weight of FGF4 (25 kDa); NA is the Avogadro constant; and Vmed is the medium’s volume, 100 ⋅ 4200 μm3, as shown in Table 3.
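The unit handling in this conversion can be made explicit with a short sketch. It assumes that x denotes the total FGF4 copy number in the 100-cell medium and uses the rounded per-cell volume of 4200 μm3 quoted above; the example input of 100 molecules per cell on average is hypothetical. Small differences from the listed γ values may arise from rounding of the volume.

```python
AVOGADRO = 6.02214076e23   # molecules per mol
M_FGF4_G_PER_MOL = 25_000  # 25 kDa expressed in g/mol
CELL_VOLUME_UM3 = 4200.0   # per-cell volume in cubic micrometers
N_CELLS = 100
UM3_TO_ML = 1e-12          # 1 um^3 = 1e-12 cm^3 = 1e-12 ml
G_TO_NG = 1e9


def copies_to_ng_per_ml(x_total, n_cells=N_CELLS, cell_volume_um3=CELL_VOLUME_UM3):
    """Convert a total FGF4 copy number into a mass concentration in ng/ml."""
    mass_ng = x_total * M_FGF4_G_PER_MOL / AVOGADRO * G_TO_NG
    volume_ml = n_cells * cell_volume_um3 * UM3_TO_ML
    return mass_ng / volume_ml


# Hypothetical example: 100 added molecules per cell on average, 100 cells.
gamma = copies_to_ng_per_ml(100 * N_CELLS)
print(round(gamma, 2))  # about 0.99 ng/ml
```

Because both the copy number and the volume scale with the number of cells, the resulting concentration depends only on the mean number of added molecules per cell.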

We remark that Fig 10E recapitulates the experimental observation shown in Fig 2F of [9]. Notably, our simulation data match the order of magnitude for their experimental measurements. While our values are slightly smaller than the ones reported in [9], this can be explained by the fact that we consider a closed system from which FGF4 cannot leak out by diffusion.

Extending the simulation to 96 hours, the TM-FGF4 system behaves much as in the 48-hour case (Fig 11E–11H). However, the ITWT system now displays a gradual loss of cell plasticity; beyond the 32-hour mark, additional FGF4 does not significantly alter final fate ratios, provided the average amount of added FGF4 remains below 250 molecules per cell (Fig 11A–11D).

Fig 11. Robustness to varying amounts of exogenous FGF4 added at distinct time points quantified by SAFD at 96 h: Cell-fate composition perturbed by non-endogenous FGF4 addition remains largely unchanged beyond 48 h.

[A] Final cell-fate composition (at 96 h) of the inferred-theoretical wild-type (ITWT) system, for all tested FGF4 protein perturbation magnitudes and FGF4 addition at simulation start (0 h). Bar height ranges from 50 to 100 cells. [B-D] Effect of different FGF4 perturbation magnitudes and FGF4 addition times on cell-fate composition at 96 h (quantified by SAFD) in the ITWT system. [E] Final cell-fate proportions (at 96 h) of the theoretical mutant system lacking FGF4 production (referred to as TM-FGF4), for all tested FGF4 protein perturbation magnitudes and FGF4 addition at simulation start (0 h). Bar height ranges from 0 to 100 cells. [F-H] Effect of different FGF4 perturbation magnitudes and FGF4 addition times on cell-fate composition at 96 h (quantified by SAFD) in the TM-FGF4 system. See caption of Fig 10 for definition of SAFD.

https://doi.org/10.1371/journal.pcbi.1012473.g011

In summary, our simulations with exogenously administered FGF4 underscore that ICM cell plasticity is confined to a specific time window. The ICM population’s transient sensitivity to external FGF4 allows for the maintenance of EPI and PRE lineage proportions under normal conditions. Nevertheless, the balance between these two fates can be influenced by the timing and concentration of exogenous FGF4, showcasing the nuanced interplay between external factors and intrinsic developmental processes.

Discussion

The specification from the inner cell mass (ICM) to epiblast (EPI) and primitive endoderm (PRE) lineages is a pivotal process in preimplantation blastocyst formation, representing a key paradigm in mammalian tissue development. This process exemplifies self-organization, balancing high plasticity with strong robustness of fate proportioning [2]. Its reproducibility and adaptability to experimental perturbations are conserved features across mammalian species [5], highlighting its importance for studying the emergence of pluripotency and homeostasis, especially within the context of embryonic stem cell (ESC) cultures [7].

ICM-derived ESCs possess the remarkable ability to form any germ layer. But, despite their self-renewal, regular developmental timing, and seemingly deterministic tissue patterning capabilities, ESC populations display significant plastic heterogeneity [23, 29, 30, 72]. This property reflects the intrinsically stochastic expression of key gene regulatory factors, the relatively small number of blastocyst cells, as well as the spatial variability of intercellular signaling sources [5, 7, 129]. With the advent of sophisticated genome manipulation tools and high-throughput screening techniques such as scRNA-seq [11, 130], mouse experimental models have advanced our mechanistic understanding of diseases such as cancer, guiding both drug design and disease etiology, thereby substantiating their high relevance to human medicine [6, 7]. Although extrapolating findings from animal studies to human biology remains challenging, theoretical models facilitate the investigation of patterning principles through unifying mathematical themes, elucidating universal properties, and predicting novel experiments across paradigmatic developmental biology systems [88, 131–134].

However, existing theoretical models of mouse ICM fate specification exhibit a fundamental conceptual gap due to the lack of detailed mechanistic understanding of the dynamical landscape of cellular potency and plasticity, as they are largely phenomenological and primarily deterministic in nature [8, 15, 22, 38]. As such, these approaches fail to rigorously quantify the implications of noise emerging from the basic biochemical processes driving this developmental process.

To correctly capture the inherent randomness in ICM differentiation, we constructed a biophysics-rooted spatial-stochastic gene-regulation model. This model is simulated via the Reaction-Diffusion Master Equation (RDME) formalism, and we embedded it into a Simulation-Based Inference (SBI) framework, capitalizing on recent advancements in Machine Learning (ML). This combined workflow enables a mechanistic description of the ICM patterning dynamics in multi-cellular settings, using realistic lifetimes for the involved molecular species. It also offers a biophysics-grounded implementation of the mesoscopic processes generating biochemical noise at both cell and tissue scales.

Leveraging this combined workflow, we developed the inferred-theoretical wild-type (ITWT) model, which collectively recapitulates key experimental findings: (1) indispensability of FGF4-mediated signaling for proper ICM patterning [30, 99], evidenced through the FGF4-coordinated stimulation of the PRE fate that requires prior emergence of the FGF4-producing EPI lineage; (2) high reproducibility and robustness of EPI-PRE lineage proportions, irrespective of the cell number (system size) and despite the narrow timescale of blastocyst formation (1.5–2 days of embryonic development) [38, 99]; (3) significant ICM sensitivity to exogenously applied FGF4 in mutant-like conditions [9], with ICM plasticity being adjustable based on application time and dosage strength. Importantly, our computational approach produces quantitative predictions that could inform future experiments: (4) intercellular communication via FGF4 can functionally improve ICM differentiation robustness, reducing fate proportioning error by 10–20% compared to a purely cell-autonomous system; (5) increased variability in the initial conditions of key cellular resources can enhance the accuracy while sustaining precision for EPI-PRE cell-type proportioning, with fate distributions remaining robust when less than 25% of the cell population starts with perturbed initial conditions.

In the absence of FGF4 coupling, the simulated blastocyst fails to establish correct cell fate proportions, displaying a strong bias towards the EPI fate. We thus argue that, given a default naive pluripotent state in ICM-like systems, successful cell-fate specification necessitates a tissue-level mechanism that orchestrates the emergence of distinct cellular fates by providing coordinating feedback between cells with distinct plasticity potentials.

Previous experimental and simulation results also underscore the importance of cell-cell signaling in maintaining reproducible lineage proportions globally while facilitating correct pattern formation locally [8, 9, 35]. Interestingly, our analysis of cell neighborhood composition reveals that the communication range of FGF4, though essentially limited to nearest neighbors, suffices for ensuring effective signaling. This observation aligns with recent findings from both in-vitro and in-silico studies [9, 85], highlighting the nuanced roles of local signaling dynamics in complex tissue patterning.

We successfully recover the temporal sensitivity of the ICM to exogenous FGF4 for a mutant system lacking FGF4 production. We observe a specific time window during which the ICM can respond to external FGF4, with the ability to adjust lineage proportions depending on the timing and dosage of the addition. This finding aligns with recent experimental observations [9] and highlights the importance of timing in developmental processes. In addition, we go beyond the current experimental data, providing quantitative predictions for multiple application times and dosage strengths of exogenous FGF4 signaling molecules, not only for mutant-like conditions but also for our wild-type-like system (ITWT).

We find that system size (number of cells) does not significantly influence the accuracy of attaining the correct lineage proportions. This is in line with previous studies reporting that the mouse blastocyst exhibits resilience to ICM size variations, maintaining consistent patterning irrespective of cell number [1, 8]. However, we demonstrate that the precision of cell specification is system-size dependent. Our results predict that cell-fate misspecification can be reduced down to ∼ 10% when the system size surpasses ∼ 50 cells, which is comparable to the typical number of cells in the ICM around E3.5.

Notably, our simulations show that increased variability in initial conditions at the cellular level does not necessarily constitute a detriment for the tissue-level dynamics. Instead, increased variability in key initial molecular resources, if not excessive, can enhance the accuracy while sustaining the precision of cell-fate specification. Moreover, our simulations predict no significant deviations for final cell-lineage distributions, with respect to accuracy and precision, when less than 25% of the cellular population experiences initial condition perturbations, regardless of perturbation strength and biases.

A paramount challenge in biophysical mechanistic modeling is the estimation of parameter values allowing the constructed model to faithfully recapitulate the characteristics of the considered biological system. This task becomes particularly complex for spatial-stochastic and mechanistic models due to the need for analyzing behavior across numerous independent simulation samples, significantly increasing the computational demands for navigating their vast parameter spaces [92, 135, 136]. In response to this challenge, our approach integrates an AI-powered Simulation-Based Inference (SBI) method with traditional ML techniques, specifically employing the Sequential Neural Posterior Estimation (SNPE) algorithm [86, 87, 89]. This strategy leverages simulation data to efficiently traverse parameter space, incorporating both direct analysis of simulation outcomes and qualitative observations to identify parameter distributions that align with the expected behaviors of the system, encoded in high-level, low-dimensional utility functions (“target scores”).

We utilized a state-of-the-art SBI toolbox [137], which facilitated the integration of the SNPE algorithm into our workflow. This allowed us to train artificial neural networks (ANNs) for predicting model parameter sets capable of reproducing the targeted ICM patterning behavior of several model variants, corresponding to both wild-type and mutant systems. These predictions are point and interval estimates of biophysical model parameters, which could be experimentally tested or used as guidance in future approaches. For example, Table 4 and S2 Fig show full predictive ranges for the effective lifetimes of both cytoplasmic FGF4 protein and the membrane-bound FGFR1-FGF4 (receptor-ligand) complex.

We find that the lifetime of intracellular FGF4 does not need to be tightly regulated, as its functional role is part of the extracellular feedback mechanism. To test the implications of manipulating FGF4 lifetime for the regulation of cell-fate decisions, it is conceivable to employ pharmacological tools such as protein therapeutics or targeted enzymatic actions [138, 139]. In contrast, the lifetime of the FGFR1-FGF4 complex needs to be tightly regulated to achieve ideal cell-type proportioning behavior. This prediction could potentially be tested using FRET (Förster Resonance Energy Transfer) experiments [140], to measure or estimate the required fine-tuning of the stability of the complex.

Our methodology imposes minimal constraints on the inference problem by leveraging only essential experimental observations. This strategy prevents model overfitting, allowing for the extrapolation of system behaviors spanning multiple spatial and temporal scales not directly observed in experimental data. With this approach, we underscore the distinctions between ML models, which prioritize universal prediction at the expense of modeling interpretability, and mechanistic models, which focus on exploratory hypotheses to uncover causal relationships at the expense of modeling fidelity [88, 141]. By merging these paradigms, we demonstrate that despite the scarcity of detailed quantitative experimental measurements, the flexibility and predictive capabilities of ANNs can aid in generating full-featured quantitative predictions by imposing key empirical qualitative observations on mechanistic biophysical models. To our knowledge, our work constitutes the first application of an AI-powered SBI framework to spatial-stochastic predictive modeling in developmental biology.

While here we focused on a minimal spatial geometry for targeted assessment of the interplay between biochemical stochasticity and spatial coupling, ICM development is influenced by important additional factors, such as cell divisions and force interactions among cells. Future elaborations of our framework will incorporate suitable tissue-scale dynamics, which will integrate the stochastic dynamics of single-cell gene expression and inter-cellular signaling with the constant remodeling of the tissue geometry. Several approaches addressing tissue remodeling problems have already been proposed; thus, we will focus on exploiting several of these computational platforms, such as the ones described in [142] and [25], suitable for exploring shape homeostasis emergence in simple 3D systems. This will enable the study of how cell neighborhoods varying both in time and space influence ICM lineage differentiation, while leveraging recently recorded tissue structural data [10, 35, 37, 119].

The extended framework will also feature the other two important constitutive elements of the developing mouse blastocyst, namely the blastocoel (blastocyst cavity) and the trophectoderm (TE). This opens an interesting research direction, as recent experimental evidence suggests that the expansion of the mouse blastocyst lumen could play a role in stimulating ICM fate differentiation. This is thought to occur through an interplay of mechanical cues and position-specific induction of gene expression, possibly mediated by FGF4 molecules deposited in the blastocoel [28, 143].

Looking ahead, our AI-powered approach provides a promising basis for establishing virtual replicas, or digital twins, of early embryonic development. The generative model capability of our approach facilitates the discovery of complex parameter space structures and compensation mechanisms, akin to recent advancements in fields such as structural biology [144, 145] and neuroscience [146, 147]. This has potential benefits for the study of human in-vitro fertilization and treatment of prevalent gestational complications [5], as it could circumvent the need to establish associated experimental systems, which may come with ethical restrictions. The highly detailed synthetic trajectories produced by our simulation framework could be coupled with future experiments combining, for example, both scRNA-seq and smFISH (single-molecule fluorescence in situ hybridization) technologies [76], in order to disentangle the noisy dynamics at transcription and translation scales for individual genes by exploiting computer-assisted insights. This outlook also underlines the importance of identifying the essential noise-control mechanisms that maintain a well-balanced ratio between EPI and PRE populations, given that breaking this balance can have significant physiological ramifications for the postimplantation embryo [3, 99].

In conclusion, our AI-parameterized model underscores the complexity and robustness of the EPI-PRE lineage specification, generating unique insights into the interplay of stochasticity, tissue-level signaling, feedback mechanisms, and system size (number of cells) in ICM development. These findings not only deepen our understanding of the developing early mouse embryo under genuine biochemical noise conditions but also provide a comprehensive framework for exploring similar controlled stochastic processes in related biological systems. Potential examples include, but are not limited to, the human blastocyst formation [2, 6, 7, 148], the Bacillus subtilis competence circuit [149–151], and the Dictyostelium discoideum cell-type proportioning [152].

Materials and methods

Computational model of mouse blastocyst (ICM cell differentiation)

The model comprises two fundamental building blocks. The first submodel (cell level) consists of the GRN (NANOG-GATA6-FGF4) coordinating the ICM cell specification process. The second submodel (tissue level) describes the cell-cell signaling dynamics. Unlike other existing models [11, 15, 22, 23], our modeling approach does not integrate the notion of noise as a purely extrinsic component. Instead of an arbitrary noise source, we employ a mesoscale description which incorporates noise as an intrinsic component. Thus, noise plays an essential role for faithfully simulating the temporal evolution of our biological system model.

Indeed, the presence of noise in biophysical models is deemed central for discerning the main features of gene regulatory processes [82, 153, 154]. Conventionally, noise is separated into intrinsic and extrinsic categories [121, 122]. While it is problematic to give a clear delimitation of these two categories, here we provide a general interpretation of their scope within the context of our study.

Intrinsic noise arises from the nature of biochemical reaction and diffusion events; i.e., discrete molecules randomly diffuse and randomly react when a collision occurs between each other. As such, intrinsic noise commonly refers to local fluctuations within basic gene regulation mechanisms; e.g., transcription and translation. Extrinsic noise originates from cellular environment variations or changes. Hence, extrinsic noise typically alludes to global factors systematically affecting all cells but irregularly propagating across cellular mechanisms; for example, cell cycle timing and cellular resource partitioning. Nevertheless, recent experimental and theoretical works argue for treating both noise categories as inseparable entities [155–158].

When there is a large number of molecules at play, a biochemical dynamics model typically follows a deterministic formulation: reaction rates are represented by constant functions, species amounts are represented by concentrations (continuous-time functions), and it primarily follows an ordinary differential equation (ODE) scheme. By contrast, when there is a small number of molecules at play, a stochastic formalism takes precedence.

Generally, stochastic biochemical dynamics models are formulated as continuous-time Markov chains (CTMCs); i.e., continuous-time discrete state-space Markov processes. Numerous mathematical and computational methods have been developed for analysis and simulation of such stochastic formulations [159–166]. These techniques methodically incorporate stochasticity, which is relevant for understanding the effects of noise on cell-cell variability.

A biochemical reaction network involves multiple reactions (edges) and species (vertices or nodes); a CTMC is the most common model of such a network. Particularly, biophysical systems can be abstracted using the Chemical Master Equation (CME) formalism; Eq (1) [166, 167].

$$\frac{\partial P(\mathbf{x}, t \mid \mathbf{x}_0, t_0)}{\partial t} = \sum_{j=1}^{M} \left[ a_j(\mathbf{x} - \mathbf{v}_j, t)\, P(\mathbf{x} - \mathbf{v}_j, t \mid \mathbf{x}_0, t_0) - a_j(\mathbf{x}, t)\, P(\mathbf{x}, t \mid \mathbf{x}_0, t_0) \right] \tag{1}$$

Where: x is the state vector of the system X (CTMC); x = X(t) = [X1(t), …, XN(t)]; there are N biochemical species (i ∈ {1, …, N}); each entry of x represents the copy number of a given biochemical species Si; P(x, t|x0, t0) is the time-dependent probability density function of x; x0 is the initial state vector; t0 is the initial time; there are M reactions (j ∈ {1, …, M}); aj is the nominal rate of reaction Rj; vj is the state-change vector (set of stoichiometric coefficients) of reaction Rj; aj(x, t) is the propensity function (effective rate) of reaction Rj when the system X is in state x at time t.

The full GRN implemented by our simulator includes all the molecular species relevant for the developmental system dynamics, together with several auxiliary (computational) species; this procedure facilitates the inclusion of all important molecular relationships and the tracking of crucial model variables. An extensive list of species, relations, and nomenclature guidelines is available from the corresponding simulator scripts. For ease of exposition, Table 1 presents only the actual biochemical species considered for our model.

Within the CME framework, a well-mixed or reaction-limited system is the main assumption; i.e., molecular diffusion is relatively fast compared to the speed of any biochemical reaction. The most popular method to simulate models following the CME formulation is the Stochastic Simulation Algorithm (SSA), a scheme introduced and rigorously proven to be physically relevant by the late Daniel T. Gillespie [159, 168].
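As a minimal illustration of the SSA (direct method), the following sketch simulates a single well-mixed species with constant synthesis and first-order degradation. The function name, the rate values, and the choice of a birth-death system are assumptions made for exposition only; they are not taken from our simulator scripts.

```python
import random

def gillespie_birth_death(k_syn, k_deg, x0, t_end, seed=0):
    """Direct-method SSA for 0 -> X (propensity k_syn) and X -> 0 (propensity k_deg * x)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_syn = k_syn            # propensity of the synthesis reaction
        a_deg = k_deg * x        # propensity of the degradation reaction
        a_total = a_syn + a_deg
        if a_total <= 0.0:       # no reaction can fire any more
            break
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_syn:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts

# Hypothetical rates: the stationary mean of this process is k_syn / k_deg = 100 copies.
times, counts = gillespie_birth_death(k_syn=10.0, k_deg=0.1, x0=0, t_end=50.0)
```

Each iteration draws one exponential waiting time and one reaction index, so the trajectory is an exact sample path of the corresponding CTMC under the well-mixed assumption.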

Correspondingly, molecular diffusion speed can guide the choice of a biochemical dynamics representation. Fast diffusion is synonymous with a spatially uniform distribution of resources, that is, a well-mixed or homogeneous environment. Slow diffusion is synonymous with spatial correlation and other spatial factors, which create a heterogeneous environment.

While there exist multiple techniques tackling different spatial and temporal scales [163, 169–172], we aimed for balance between computational efficiency and biophysical realism. Consequently, we have followed the formalism of the Reaction-Diffusion Master Equation (RDME); Eq (2) [173, 174]. (2) Where: and are the reaction and diffusion components of the equation, respectively; xk (or xh) is the state vector of the voxel Ωk (or Ωh); xik (or xih) is the copy number of Si for Ωk (or Ωh); there are L voxels (k, h ∈ {1, …, L}); bikh is the nominal diffusion rate of Si from Ωk to Ωh; wikh is the state-change vector (set of stoichiometric coefficients) for the diffusion of Si from Ωk to Ωh.

The RDME framework works at the mesoscopic level, and its simulation schemes are based on custom versions of the SSA tailored to incorporate reaction-diffusion processes. Here, we start from one such scheme, the Next Subvolume Method (NSM) [165, 175], which separates events into two distinct kinds: reaction firings inside every cell, and diffusive jumps between cells. For our system model, each cell is treated as a well-mixed voxel/environment, and tissue communication materializes by representing signaling-molecule diffusion as a morphogen-exchange process between neighboring cells; as such, we will commonly refer to this process simply as “diffusive jump”, “jump diffuse”, or “jump diffusion”.

To put it briefly, our event-driven simulator is congruent with the NSM because it involves the SSA, the computational spatial domain is partitioned into artificially well-mixed compartments where only molecules belonging to the same compartment can react, diffusive jumps transport molecules between neighboring voxels, and there are well-defined event queues. Outside these shared features, our simulator allows for complex interactions among voxels or cells, which facilitates the presence of multiple tissue types and the corresponding relabeling of molecules once they undergo jump-diffuse steps. Likewise, the nominal diffusive-jump rates are calculated based on an arbitrary system model geometry and the principle of conservation of (molecular) flow, whereas the NSM calculates the nominal jump-diffuse rates based on a regular cubic geometry and the voxel size.
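The shared NSM-like features listed above (per-voxel SSA, diffusive jumps between neighboring voxels, and an event queue) can be sketched as follows. This is a simplified illustration and not our simulator: it tracks a single species, uses one hypothetical uniform jump rate `k_jump` per neighbor instead of geometry-derived rates, omits molecule relabeling, and redraws the next-event time of every affected voxel (which is statistically valid because exponential waiting times are memoryless). All names are assumptions.

```python
import heapq
import random

def simulate_voxels(neighbors, counts, k_syn, k_deg, k_jump, t_end, seed=0):
    """Event-queue simulation of one species in well-mixed voxels with diffusive jumps."""
    rng = random.Random(seed)
    counts = list(counts)

    def total_rate(k):
        # Sum of reaction and jump propensities in voxel k.
        return k_syn + (k_deg + k_jump * len(neighbors[k])) * counts[k]

    def draw_time(k, now):
        rate = total_rate(k)
        return now + rng.expovariate(rate) if rate > 0.0 else float("inf")

    next_time = [draw_time(k, 0.0) for k in range(len(counts))]
    queue = [(t, k) for k, t in enumerate(next_time)]
    heapq.heapify(queue)

    while queue:
        t, k = heapq.heappop(queue)
        if t != next_time[k]:      # stale queue entry, superseded by a later redraw
            continue
        if t > t_end:
            break
        u = rng.random() * total_rate(k)
        touched = [k]
        if u < k_syn:                              # reaction: synthesis in voxel k
            counts[k] += 1
        elif u < k_syn + k_deg * counts[k]:        # reaction: degradation in voxel k
            counts[k] -= 1
        else:                                      # diffusive jump to a random neighbor
            h = rng.choice(neighbors[k])
            counts[k] -= 1
            counts[h] += 1
            touched.append(h)
        for v in touched:                          # reschedule only the affected voxels
            next_time[v] = draw_time(v, t)
            heapq.heappush(queue, (next_time[v], v))
    return counts

# Three voxels in a row; pure diffusion conserves the total molecule number.
chain = [[1], [0, 2], [1]]
final_counts = simulate_voxels(chain, [30, 0, 0], k_syn=0.0, k_deg=0.0, k_jump=1.0, t_end=5.0)
```

Only the voxels touched by an event are rescheduled, which is the property that makes queue-based schemes efficient when the number of voxels is large.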

Key stages of mouse embryo preimplantation development.

To guide our model construction and in silico analysis, we relied on wet-lab experimental descriptions of core phases in early mouse development. The mouse preimplantation period encompasses a series of morphological and molecular changes which transform the zygote (one totipotent cell) into an approximately 256-cell (7–8 cleavages) embryo at around E4.5; at this point, the embryo comprises three spatially segregated cell types: TE, EPI, and PRE. For a complete recap of the mouse embryo preimplantation development, please see [2, 3, 29].

The first cell-fate decision happens between E2.5 and E3.0 (from 8- to 32-cell stage): cells acquire TE or ICM identities. The second cell-fate decision happens at the ICM between E3.0 and E4.0 (from 32- to 128-cell stage). From E4.0 to E4.5, EPI and PRE populations spatially separate. While it is customary to define the blastocyst-formation period between E3.0 and E4.5 [1, 30], these boundaries are ultimately arbitrary as development occurs in a continuum and diverse experimental arrangements/conditions are in use between distinct labs. Moreover, ICM cells adopt their next identities asynchronously as the blastocyst forms [21, 29, 99]. Together with these aspects, it is also commonly accepted that cells are already coexpressing Nanog- and Gata6-related factors at around E2.75 [2, 176], and that Fgf4 expression is already perceptible at around E3.25 [1, 30]. For these reasons, our standard model simulations target a time-window of 48 hours (E2.75-E4.75); this range allows us to circumvent potential discrepancies among timing annotations and keep a temporally faithful description of the biological system under study.

Fundamental interactions among central GRN components.

Many processes coexist during blastocyst development. These processes materialize at multiple temporal/spatial scales and embody the relationships of numerous components operating simultaneously. A vast number of elements conjointly orchestrate developmental progress scaling from cell-level adaptable gene expression mechanisms to tissue-level mechanical/signaling coordination structures. Particularly, the GRN controlling the ICM specification process has a rich collection of components and interactions. Here, we model this GRN by accounting for the key interactions among its main components, as reported by recent experimental studies.

To start, we suppose that our core GRN motif consists of the species and interactions primarily governing the Nanog- and Gata6-gene expression dynamics. This collection of ingredients only includes Nanog mRNA, Gata6 mRNA, NANOG protein, and GATA6 protein, naturally. As transcription factors (TFs), both NANOG and GATA6 proteins exhibit self-activation and mutual repression [3, 73].

The remainder of the complete GRN encompasses all the species and interactions secondarily governing the Nanog- and Gata6-gene expression dynamics. This group includes Fgf4 mRNA, FGF4 protein, and ERK protein. Among these explicit elements, we also implicitly include two FGF receptor (FGFR) complexes, which concertedly facilitate biochemical signal transduction during blastocyst formation [67, 177]. Recently, a comprehensive experimental study demonstrated that NANOG and GATA6 proteins are capable of jointly binding to both EPI and PRE cis-regulatory modules [73]. This concrete evidence supports the previously proposed direct NANOG activation plus GATA6 repression of the Fgf4 gene, both in vitro and in vivo [15, 117]. Likewise, ERK has been indicated to play a crucial role for this GRN [7, 178]. At transcriptional level, ERK is capable of recruiting diverse repressor TFs to Nanog-gene loci [105]. For antisymmetry and simplicity, we assumed that ERK is capable of recruiting diverse activator TFs to Gata6-gene loci; however, there is indeed some experimental evidence indicating such a motif [102]. At post-translational level, NANOG phosphorylation by ERK promotes its instability, which consequently reduces its lifetime [105, 113, 179]. Contrastively, it has been reported that GATA6 phosphorylation by ERK enhances its stability; nevertheless, the implications of this motif are not completely clear and we exclude it [102].

Finally, the upstream release of FGF4 induces the FGFR-FGF4 monomer complex formation, which successively induces the FGFR-FGF4 dimer complex formation. This FGFR-FGF4 dimer complex ultimately triggers the pathways downstream of ERK [7, 114, 128, 178].

Model at cell scale.

The core GRN motif is composed exclusively of Nanog- and Gata6-related elements, that is, all their directly related species and interactions. The rest of the full GRN is built around the core motif, thus consolidating the remaining elements and their collective effects on the dynamics of the two main players. Importantly, we have arranged all cell-scale reaction events into several groups as follows: summary of gene expression dynamics; promoter binding and unbinding; mRNA synthesis and degradation; protein synthesis and degradation; FGFR activation and inactivation; ERK activation and inactivation; NANOG phosphorylation and dephosphorylation. In that regard, we report all the particular interactions implemented by our simulator and their respective literature sources.

Summary of gene expression dynamics.

The only three genes with an explicit mRNA step are Nanog, Gata6, and Fgf4; Table 2 summarizes their relationships. The expression of the other two genes (Fgfr and Erk) is only visible either in an implicit form or at the protein level. FGFR also does not have an explicit protein count as it is available rather uniformly on the cell membrane [67, 140, 177, 180, 181]; instead, FGFR appears as an implicit component of the auxiliary protein variables/species M-FGFR-FGF4 (FGFR-FGF4 monomer complex) and D-FGFR-FGF4 (FGFR-FGF4 dimer complex), which helps reduce the number of reactions and the computational cost. ERK itself has two different protein forms: I-ERK (inactive ERK) is abundant in the cell cytoplasm [128, 182], and it is already present at the start of all the simulations; A-ERK (active ERK) is always inversely proportional to I-ERK; thus, it is a product of the action of D-FGFR-FGF4 on I-ERK.

Promoter binding and unbinding.

Each of the three genes Nanog, Gata6, and Fgf4 has a respective promoter with multiple independent binding sites for each of its TFs; see Table 3. Both gene activation and repression are cooperative: exactly q TF copies must be simultaneously bound to their particular promoter sites for activation or repression of expression; by default, repression takes precedence over activation. For a given TF A, the (time-dependent) effective promoter binding rate is calculated via the formula $k_b A_t = 4\pi d D A_t / V$ (diffusion-limited regime). Here, d = 10 nm is a typical binding-site diameter [46], D = 10 μm2s−1 is the cytoplasmic/nuclear TF diffusion coefficient, At is the TF copy number at time t, and V = 4200 μm3 is a typical mouse blastocyst cell volume [183–185]. For our system model, we do not know the diffusion coefficients of all the biochemical species, so we simply made an educated guess and assumed the same value for all the TFs based on other representative biological systems [46, 167, 186, 187]. Concisely, the nominal promoter binding rate is determined by the equation $k_b = 4\pi d D / V$.
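For orientation, the nominal binding rate can be evaluated numerically from the stated values of d, D, and V. The sketch below assumes the diffusion-limited form k_b = 4πdD/V, which is the same expression given later in the text for k_dime (there with D/30); the function name and the example TF copy number are hypothetical.

```python
import math

def diffusion_limited_rate(d_um, D_um2_per_s, V_um3):
    """Nominal diffusion-limited binding rate 4*pi*d*D/V, in 1/s per molecule."""
    return 4.0 * math.pi * d_um * D_um2_per_s / V_um3

d = 0.01      # binding-site diameter: 10 nm expressed in micrometers
D = 10.0      # TF diffusion coefficient in um^2/s
V = 4200.0    # cell volume in um^3

k_b = diffusion_limited_rate(d, D, V)   # roughly 3.0e-4 per second per TF copy
A_t = 100                               # hypothetical TF copy number
effective_rate = k_b * A_t              # time-dependent effective binding rate
```

With these values a single TF copy finds a given site roughly once per hour on average, so the effective rate is governed mainly by the TF copy number.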

We model TF cooperativity by expressly tuning the promoter unbinding rates. This rate tuning influences the promoter regulation model to mimic a Hill-function-like (nonlinear) transcriptional response. The usage of the Hill function is a staple of phenomenological modelling; however, it is incompatible with mechanistic modelling: directly using a Hill equation as a reaction propensity function ignores the non-instantaneous (stochastic) nature of delays between biochemical events and introduces several other simulation artifacts [188–190]. We use the concepts of a half-saturation constant and a cooperativity coefficient to perform promoter unbinding rate tuning; accordingly, both quantities are incorporated into elementary reactions to describe promoter unbinding dynamics. This half-saturation constant (hact or hrep depending on TF role) is a free model parameter which dictates the threshold of TF copies needed for reaching 50% of negative or positive gene transcriptional control; consequently, each gene-TF pair requires its own separate half-saturation threshold. This cooperativity coefficient (kcoop) is an auxiliary variable which adjusts the strength of mutual influence among TF copies; we arbitrarily defined it as kcoop = 5 to increase TF-cooperativity potency (i.e., ↑ kcoop ⇒ ↓ ku). Specifically, we calculate the nominal promoter unbinding rates via the formula . Where, for a given gene-TF pair: hsat ∈ {hrep, hact}; kb is its nominal promoter binding rate; q is its maximal occupancy.

To summarize, Eq (3) illustrates the most basic reaction set of the TF promoter binding/unbinding dynamics, plus Fig 12 shows the elementary promoter architecture. (3) Where: A is a given TF; BQ is the current occupancy of a given gene promoter B by A; Q ∈ {0, …, q}; q is the maximum number of binding sites for A at B; kb is the nominal promoter binding rate; ku is the nominal promoter unbinding rate.

Fig 12. Architecture of the modeled gene regulatory region.

Transcription factor (TF) A binds to regulatory sites B at an effective rate proportional to kb and unbinds from them at an effective rate proportional to ku. Unlike kb, ku is directly related to q (which is the maximal occupancy of B by A); i.e., kb,1 = ⋯ = kb,q, where q is the number of TF binding sites (TFBSs) for A at B. [A] Markov chain transition diagram of TF binding/unbinding dynamics. [B] Every gene has a respective regulatory region with multiple independent binding sites for each of its regulating TFs. Gene activation and repression are both cooperative: exactly q TF copies must be simultaneously bound to their particular sites for activation (cyan triangles) or repression (red squares) of expression. By default, repression takes precedence over activation.

https://doi.org/10.1371/journal.pcbi.1012473.g012

Synthesis and degradation of mRNA.

For the transcription model, we assume that mRNA synthesis occurs as a single-step reaction but it is only possible when the gene promoter is not under control of a repressor TF; recall that we follow the “all-or-nothing” gene activation/repression configuration. We have as well accounted for two concomitant transcription modes: basal and full-induction production. Basal transcription contributes 20% of the maximal average steady-state mRNA copy number. The remaining 80% of the maximum mean steady-state mRNA copy number is contributed by full-induction transcription; which is only possible when an activator TF is occupying all of its binding sites at a given promoter. Accordingly, the mRNA synthesis rate is calculated via the formula . Where: ; ; cbasal and cfind are the basal and full-induction relative contributions (i.e. cbasal + cfind = 1), respectively. The symbol is a shorthand for the maximum mean steady-state value of mRNA copies for a given gene at full activation (Mt is the number of mRNA molecules at time t). The symbol τm denotes the lifetime (or half-life ) for a molecule of mRNA.

For Nanog, this mean mRNA value has been indicated to reach the order of hundreds of copies; approximately 100–400 molecules [72, 104, 191, 192]. For Gata6 and Fgf4, there are no concrete mean mRNA values reported, but it seems they are similar to the average Nanog-expression level [66]. Analogously, the Nanog-mRNA lifetime has been reported to be around 4–5 hours [71, 103, 191, 193], the Gata6-mRNA lifetime has been reported to be around 3–4 hours [194, 195], and we have not found concrete reports about the Fgf4-mRNA lifetime.

For simplicity, we have considered the mean mRNA values for Nanog and Gata6 to be the same, which classifies them as fixed parameter values (we chose copies). In the case of Fgf4, its mean mRNA value is deemed to be identical to the case of the other two genes and it is also considered a fixed parameter value. However, we additionally impose that any Fgf4 expression must be entirely regulated by NANOG and GATA6 levels; in other words, Fgf4 has no basal mRNA production ( copies). Likewise, the mRNA half-lives for Nanog, Gata6, and Fgf4 are determined to be the same (we chose τm,Nanog = τm,Gata6 = τm,Fgf4 = 4 hours).

For the mRNA degradation mechanism, we assumed that it is a first-order process: the nominal degradation rate is simply the multiplicative inverse of the lifetime; $k_{m,d} = 1/\tau_m$.
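To make the relation between lifetimes, relative contributions, and synthesis rates concrete, the following sketch uses the standard birth-death result that, for zero-order synthesis and first-order degradation, the mean steady-state copy number equals the synthesis rate multiplied by the lifetime. This relation is an assumption about how the rates could be obtained, not a restatement of our exact formula; the function name and the mean copy number of 250 are hypothetical, whereas the 20%/80% split and the 4-hour lifetime are the values stated in the text.

```python
def mrna_rates(mean_max_copies, tau_m_hours, c_basal=0.2, c_find=0.8):
    """Rates (per hour) giving a steady-state mean of mean_max_copies at full activation."""
    k_deg = 1.0 / tau_m_hours                  # first-order degradation rate
    k_basal = c_basal * mean_max_copies * k_deg  # basal synthesis rate
    k_find = c_find * mean_max_copies * k_deg    # extra synthesis at full induction
    return k_deg, k_basal, k_find

# Hypothetical maximum mean of 250 mRNA copies with a 4-hour lifetime.
k_m_deg, k_m_basal, k_m_find = mrna_rates(250.0, 4.0)
# Mean copy number at full activation implied by these rates.
mean_full = (k_m_basal + k_m_find) / k_m_deg
```

Under this assumption, a gene without basal production (as imposed for Fgf4) would correspond to setting the basal contribution to zero.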

In a nutshell, Eq (4) illustrates the reaction set of the mRNA synthesis and degradation dynamics. (4) Where: G is a particular gene; M is the mRNA of G; Grep and Gact are indicator random variables representing the current state of G. Grep indicates a repressed gene (full promoter occupancy by repressor TF) and Gact indicates an activated gene (full promoter occupancy by activator TF), respectively.

Synthesis and degradation of protein.

The translation model assumes that, once mRNA is available, protein synthesis occurs as a single-step reaction. This assumption holds for NANOG, GATA6, and FGF4 proteins. For ERK, as there is no mRNA step, spontaneous activity produces its inactive protein form (I-ERK), which can undergo phosphorylation and become active (A-ERK). There are several indications for high abundance of ERK during the developing blastocyst, so it is not a limiting factor for the cell signaling process [114, 178, 196–198]. It has also been reported that ERK has a long lifetime (48–72 hours) [182, 199, 200]. To reflect this strong presence of ERK across the cellular reaction domain, every standard model simulation assigns to each cell an initial high amount of ERK. In addition, ERK can be synthesized and degraded at a rate directly proportional to its chosen half-life (τp,ERK = 48 hours) and its chosen maximum mean steady-state protein copy number ( copies). Here: is the rate of ERK synthesis; is the rate of ERK degradation.

For the remaining protein species, NANOG-GATA6-FGF4, an extra assumption was made to incorporate an additional sense of bursty production: every mRNA molecule is capable of synthesizing on average 4 protein molecules before it decays naturally; i.e., the theoretical maximum average steady-state value of protein molecules per cell can be calculated to be , , and copies. Bursty gene expression has been shown to significantly increase cell-cell variability of mRNA and protein levels, which itself has been suggested to enable enhanced adaptation to environmental changes and constraints [72, 201]. Thus, the nominal synthesis rates for these proteins can be calculated via the formula . For the cases of NANOG and GATA6, the protein half-lives have been reported to be 2–3 hours [202–204] and 1–3 hours [194, 195], respectively; we chose, for simplicity, τp,NANOG = 2 hours and τp,GATA6 = 2 hours. Just like the previous case, the symbol is a shorthand for the maximum mean steady-state value of protein copies for a given gene at full activation (Pt is the number of protein molecules at time t).
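The bursty-production assumption can be illustrated with a short calculation. The sketch below assumes that "on average 4 proteins per mRNA lifetime" translates into a per-mRNA synthesis rate of 4/τm, and that protein degradation is first order with rate 1/τp; both are interpretive assumptions, and the mean mRNA copy number of 250 is hypothetical. The burst size, the mRNA lifetime (4 hours), and the protein half-life (2 hours) are the values stated in the text.

```python
def protein_rates(burst_size, tau_m_hours, tau_p_hours, mean_mrna):
    """Per-mRNA synthesis rate, degradation rate (per hour), and implied mean protein level."""
    k_syn = burst_size / tau_m_hours          # proteins made per mRNA per hour
    k_deg = 1.0 / tau_p_hours                 # first-order protein degradation rate
    mean_protein = k_syn * mean_mrna / k_deg  # steady-state mean for a fixed mean mRNA level
    return k_syn, k_deg, mean_protein

# Hypothetical mean of 250 mRNA copies at full activation.
k_p_syn, k_p_deg, mean_protein = protein_rates(4.0, 4.0, 2.0, 250.0)
```

Under these assumptions the mean protein level is twice the mean mRNA level, because each mRNA yields one protein per hour and each protein persists for two hours on average.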

We assumed that any protein degradation process simply follows first-order dynamics. This condition means that, for a particular protein, its nominal degradation rate is the multiplicative inverse of its half-life: $k_{p,d} = 1/\tau_p$. This assumption not only holds for these primary proteins but also applies to all the derivative molecules: P-NANOG, M-FGFR-FGF4, and D-FGFR-FGF4. For NANOG, phosphorylation by ERK reduces its stability, which in turn accelerates its degradation and essentially halves its lifetime [179]. Hence, the P-NANOG half-life has been categorized as a fixed parameter value (τp,P-NANOG = 1 hour). For D-FGFR-FGF4, we did not find any concrete information about its lifetime. However, we determined that any practical decay should occur after monomerization (indirectly via transitions between dimer and monomer configurations); as such, the value for the D-FGFR-FGF4 half-life is set artificially high and is a fixed parameter value (τp,D-FGFR-FGF4 = 240 hours). The lifetime of M-FGFR-FGF4 (τp,M-FGFR-FGF4) is therefore a free model parameter controlling the actual extracellular stability of FGF4-related resources. A similar challenge arises for FGF4 itself: as there are contrasting reports about its lifetime [205, 206], we made its half-life a free model parameter (τp,FGF4).

In summary, maximal average steady-state protein levels for NANOG, GATA6, FGF4, and ERK are fixed model-parameter values. Protein half-lives for NANOG, GATA6, ERK, P-NANOG, and D-FGFR-FGF4 are also fixed model-parameter values. The lifetimes of FGF4 and M-FGFR-FGF4 are free model parameters. Eq (5) illustrates the reaction set of the protein synthesis and degradation dynamics. (5) Where: M is a particular mRNA; P is the protein of M; kp,s and kp,d are the protein synthesis and degradation rates, respectively.

FGFR activation and inactivation.

Multiple experimental studies have demonstrated that FGF signaling is a fundamental coordinator of the ICM specification into EPI and PRE populations [44, 67, 177, 180, 181]. They have also indicated that FGF4 binds to two receptors: FGFR1 and FGFR2. However, FGFR1 plays the main role in the ICM cell-fate establishment and FGFR2 has a supporting/redundant character in the PRE-lineage regulation [67, 177]. Additionally, FGFR1 is expressed abundantly all over the ICM [140], plus FGF4 signaling via FGFR1 is critical for maintaining physiological levels of NANOG in EPI cells to help them reach primed pluripotency [67]. Nonetheless, we do not model explicitly any FGF receptor; instead, for our model simulations, once FGF4 is available at the plasma membrane, we simply consider it to be the receptor-ligand complex in its inactive (monomer) form. In other words, whenever FGF4 undergoes a diffusive-jump event we relabel it as the monomer complex M-FGFR-FGF4. Furthermore, FGFR activation requires ligand-receptor dimer assembly, which in turn makes possible biochemical transduction of FGF signaling [140, 207]. As such, the D-FGFR-FGF4 (dimer) complex represents this active form triggering the FGF/ERK pathway in our system model.

The nominal rates of FGFR dimerization (activation) and monomerization (inactivation) are fixed model-parameter values. For the dimerization case, we follow the theory of diffusion-controlled reactions in a similar manner to the gene promoter scenario [185]: we use the same formula as for kb but we assume a receptor-ligand complex diffusion constant 30 times slower than the typical TF diffusion constant. For the monomerization case, we assume that FGFR1 phosphorylation and dephosphorylation follow similar kinetics, thus we take the average time (approximately 360 seconds) for reaching the half-saturation point based on some experimental receptor-ligand (FGFR-FGF4) response curves [140, 207].

In brief, Eq (6) recaps the reaction set of FGFR dimerization and monomerization. (6) Where: kdime and kmono are both fixed parameter values; kdime = 4πd(D/30)/V; kmono = 1/360 s−1.

ERK activation and inactivation.

Proper stimulation of the FGF/ERK signalling pathway is a requisite for ICM cell differentiation during mouse blastocyst formation [208, 209]. This stimulation is also essential for escaping naive pluripotency in mouse embryonic stem cells (ESCs) [114, 208]. It has been shown experimentally that activation of ERK occurs via phosphorylation; this mechanism is present at ICM progenitors as well as PRE and EPI tissues [209]. After all, ERK is the main FGF-signaling effector, which relays FGF4 fluctuations downstream of FGFR1 and FGFR2. This signaling pathway is a basic component for the regulation of cellular differentiation and homeostasis [200].

Previous reports have suggested that ERK experiences highly heterogeneous dynamics [128, 178]. This high variability has been regarded as an additional layer of plasticity, which could enhance the cell-fate decision process by augmenting cellular heterogeneity during ICM/blastocyst development [7]. As such, ERK activity has been extensively examined in recent studies [7, 114, 128, 178]. Nevertheless, quantitative information about important reaction rates, relevant molecular concentrations, and many other kinetic parameters remains mostly elusive. Here, we use current in vitro and in silico reports on oscillations of ERK nuclear translocation for related biological systems, as well as experimental descriptions of ERK activity dynamics when exogenous FGF4 is present in mouse ESC systems [9, 196]. We gathered all this information in order to establish generous bounds for the free model parameters representing the ERK phosphorylation (activation) and dephosphorylation (inactivation) reaction kinetic rates.

Fortunately, there are several succinct studies indicating that ERK has a long lifetime (48–72 hours) [199, 200]. They also report that there is no concrete evidence for feedback mechanisms (or stimulus-induced changes) regulating its protein expression, plus they indicate that ERK displays high physiological protein levels [198]. These observations allow us to treat the collection of ERK molecules as a pool of ready-to-use kinases; in other words, the availability of ERK is not a limiting factor for cell-cell communication, rather its influence on the ICM specification process is controlled upstream of the release of FGF4 into the extracellular environment. Therefore, there is a need for tuning the reaction rates of ERK activation and deactivation.

To recap: we chose τp,ERK = τp,I-ERK = τp,A-ERK = 48 hours; kpho,A-ERK (phosphorylation) and kdoh,I-ERK (dephosphorylation) are free parameters. Eq (7) encapsulates the ERK activation and inactivation reaction dynamics. (7)

NANOG phosphorylation and dephosphorylation.

The Nanog gene plays a key role in the regulation of pluripotency, self-renewal, and differentiation potential in human/mouse ESCs as well as early-embryo development [105, 113, 179]. Thus, tight control of Nanog expression is of relevance for the correct progression of such developmental systems. This control is partially executed by ERK via two complementary mechanisms working at distinct scales. Firstly, A-ERK can recruit other proteins to the Nanog locus and repress its transcriptional activity [105]. Along with this inhibition of Nanog transcription, A-ERK phosphorylates NANOG, which directly leads to a reduction of its protein stability and an increase of its degradation rate [105, 179]. All together, these observations suggest that this NANOG-regulation motif is indirectly orchestrated by NANOG itself. In this sense, ERK-mediated control of NANOG protein levels can be seen as a negative autocrine feedback loop (indirect autorepression), which might emerge following high FGF signaling. As we mentioned before, NANOG phosphorylation by A-ERK decreases/halves its lifetime, which in consequence implicitly decreases/halves its average steady-state protein copy number () [113]. However, we do not have access to any other concrete data about the kinetic parameters related to this process. For this reason, the transition rates between NANOG and P-NANOG must enter our model as inferable parameters.

In short, τp,P-NANOG = 1 hour is a fixed value, while kpho,P-NANOG and kdoh,NANOG are both free parameters. Eq (8) portrays the NANOG phosphorylation and dephosphorylation reaction set. (8)

Model at tissue scale.

Technically, all events (reactions or diffusive jumps) share the same essential characteristics under the SSA formulation. However, we make an explicit distinction between reaction events and jump-diffuse events for two reasons: first, it is a convenient abstraction for distinguishing multiple temporal and spatial scales; second, their respective rates are calculated or estimated in contrasting manners because they depend on distinct features of our system model. Consequently, we have separated the focal bulk of the signaling model from the other reaction groups, and we treat it as a cohesive submodel at tissue level.

The most fundamental idea surrounding our signaling-model approach concerns the concept of conservation of molecular flux. In simple words, once the signaling molecule undergoes a jump-diffusion event, the probability of arrival over each of the cell neighbors (which includes the origin cell itself) must be a conserved feature. This conserved feature depends on the neighborhood configuration and has to be calculated for each cell. For our system model, the full ICM neighborhood representation is a rectangular voxel grid with a one-cell thickness, which mimics a monolayer or 2D cellular culture; each voxel emulates an embryonic cell.

In this sense, two cells (voxels) are categorized as first-degree neighbors only when they share a complete face; they can communicate directly with each other. In other words, if they only share a single edge (2 adjacent vertices), then they are not categorized as first-degree neighbors (any communication occurs indirectly between them); however, they are categorized as second-degree neighbors. This requisite implies that a particular cell must have strictly 2, 3, or 4 members within its first-degree neighborhood (which does not include itself). Thus, by recursion, it is easy to construct high-order neighborhoods; analogously, it is easy to identify low-degree neighborhood decompositions. By definition, each cell by itself forms a zero-degree neighborhood.
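The face-sharing rule above can be sketched in a few lines. This is a minimal illustration, assuming cells are addressed by hypothetical `(row, col)` indices on a rectangular grid with at least two rows and two columns; the function name is ours and not part of the published simulator.

```python
def first_degree_neighbors(row, col, n_rows, n_cols):
    """Return the grid cells that share a complete face with cell (row, col).

    Assumption: the monolayer is a rectangular (row, col) grid, so a shared
    face corresponds to a horizontal or vertical contact, never a diagonal one.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < n_rows and 0 <= c < n_cols]


# Neighbor counts on the 25-cell (5 x 5) grid used for inference.
counts = [
    len(first_degree_neighbors(r, c, 5, 5))
    for r in range(5)
    for c in range(5)
]
# Corners have 2 neighbors, border cells 3, interior cells 4.
print({k: counts.count(k) for k in sorted(set(counts))})
```

On a 5 x 5 grid this yields 4 corner cells with two neighbors, 12 border cells with three, and 9 interior cells with four, which matches the "strictly 2, 3, or 4" requirement.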

We model the release of FGF4 as a jump-diffuse event. When FGF4 molecules experience diffusive jumps, they cooperatively act as a biochemical signal, which is transduced by the respective FGFRs into an intracellular response involving multifold messenger molecules. This signal ultimately stimulates gene expression changes which consequently promote cellular adaptability [139]. For our system model, FGF4 signaling manifests via autocrine and paracrine feedback loops. We also have included, for completeness, a third FGF4-signaling mode: membrane-level exchange of ligand molecules. Although, to the best of our knowledge, there is no experimental study of this auxiliary signaling mode yet, it is natural to think in terms of our modeling framework that once a molecule of FGF4 is available at the cellular membrane, it can experience ligand internalization together with its receptor, or it can sustain diffusive jumps between neighboring membranes. Accordingly, these two premises enter our final reaction set.

We assume that there is a primary flux transporting/releasing FGF4 molecules into the extracellular domain. For our implementation, this molecular flux is modeled as a first-degree process whose reaction rate depends on factors such as cellular geometry, spatial distribution of molecular escape channels, intracellular diffusion constant of FGF4, and many other features. We decided to treat this transport/release rate as a free model parameter because we only have access to some well-informed estimates for its bounds. These bounds are based on several studies of first-passage time distributions for related theoretical problems [210, 211].

Subsequently, the FGF4 stream is split into two secondary fluxes. This split is controlled through the free-parameter relationship $\chi_{\text{auto}} + \chi_{\text{para}} = 1$, which dictates the probability of ligand binding to its origin cell ($\chi_{\text{auto}}$) or to one of its neighboring cells ($\chi_{\text{para}}$). In short, the exit rate of FGF4 molecules ($k_{\text{escape}}$) eventually gives rise to two complementary signaling channels: autocrine and paracrine loops.

In a similar fashion, the exchange rate of FGF4 between cell membranes ($k_{\text{exchange}}$) is assumed to be limited by ligand-receptor affinity, just like for the case of FGFR monomerization [207]. As such, we placed generous bounds on this rate and treat its concrete value as a free parameter.

Autocrine signaling.

The autocrine feedback loop is thought to play an essential role in Nanog self-regulation. In that regard, the collateral Nanog auto-repression is a self-perpetuating process maintaining physiologically-relevant NANOG levels. This process is fundamental because it allows an EPI cell to enter a state of primed pluripotency which supports the correct developmental progression [105]. Within our model, the rate of autocrine signaling is denoted by $k_{\text{escape,auto}}$, and it is calculated via the formula $k_{\text{escape,auto}} = \chi_{\text{auto}} \rho_{\text{auto}} k_{\text{escape}}$. Here, the factor $\rho_{\text{auto}} = (1 - \chi_{\text{para}} \rho_{\text{meme}})/\chi_{\text{auto}}$ involves the variable $\rho_{\text{meme}}$, which quantifies the fraction of surface area shared between a particular cell and its first-degree neighborhood. With this choice, the autocrine and paracrine rates always sum to $k_{\text{escape}}$, in line with the conservation of molecular flux.

Paracrine signaling.

The paracrine feedback loop is deemed to have a critical role in inducing the PRE fate. The dose-dependent upregulation of Fgf4 by NANOG prompts FGF4 paracrine communication, which triggers a reaction cascade concurrently downregulating NANOG and upregulating GATA6 levels [102, 105]. Within our model, the rate of paracrine signaling is denoted by $k_{\text{escape,para}}$, and it is calculated via the formula $k_{\text{escape,para}} = \chi_{\text{para}} \rho_{\text{meme}} k_{\text{escape}}$.

Membrane exchange.

We denote the exchange rate of FGF4 molecules between cellular membranes as $k_{\text{exchange}}$. This FGF4 exchange rate is independent of the primary FGF4 secretion rate ($k_{\text{escape}}$); however, it must be adjusted relative to the actual number of first-degree neighbors for each particular cell. Thus, it is preferable to think of $k_{\text{exchange}}$ as the maximal membrane-level FGF4-exchange rate, which is only attained when a given cell shares all of its faces with neighbors ($\rho_{\text{meme}} = 1$). In other words, we assume that the practical FGF4-exchange rate $k_{\text{meme}}$ is directly proportional to the maximum FGF4-exchange rate $k_{\text{exchange}}$, where $0 \le k_{\text{meme}} \le k_{\text{exchange}}$. This constraint enters our model via the formula $k_{\text{meme}} = \rho_{\text{meme}} k_{\text{exchange}}$.
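As a rough numerical check of the three signaling rates, the sketch below evaluates them for one cell. It assumes that the autocrine and paracrine channels must add up to the total escape rate (flux conservation), which fixes the autocrine factor as (1 - chi_para * rho_meme) / chi_auto. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
def signaling_rates(k_escape, k_exchange, chi_auto, rho_meme):
    """Per-cell autocrine, paracrine, and membrane-exchange rates.

    Assumptions: chi_auto + chi_para = 1, and rho_meme in [0, 1] is the
    fraction of surface area shared with first-degree neighbors.
    """
    if not 0.0 < chi_auto <= 1.0:
        raise ValueError("chi_auto must lie in (0, 1]")
    if not 0.0 <= rho_meme <= 1.0:
        raise ValueError("rho_meme must lie in [0, 1]")
    chi_para = 1.0 - chi_auto
    # Flux conservation: the autocrine channel absorbs whatever the
    # paracrine channel cannot deliver to (missing) neighbors.
    rho_auto = (1.0 - chi_para * rho_meme) / chi_auto
    return {
        "k_escape_auto": chi_auto * rho_auto * k_escape,
        "k_escape_para": chi_para * rho_meme * k_escape,
        "k_meme": rho_meme * k_exchange,
    }


# Hypothetical values: a corner cell sharing half of its lateral surface.
rates = signaling_rates(k_escape=2.0, k_exchange=1.0, chi_auto=0.25, rho_meme=0.5)
print(rates)
print(rates["k_escape_auto"] + rates["k_escape_para"])  # equals k_escape
```

For an interior cell (rho_meme = 1) the autocrine factor reduces to 1, so the split between the two channels is given by chi_auto and chi_para alone.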

To finalize, Eq (9) illustrates the reaction set of FGF4 autocrine, paracrine, and membrane-exchange communication modes. (9)

Inventory of model parameter values.

We provide here two complementary tables summarizing the most significant model parameters presented so far. Table 3 recaps compactly all the fixed values we have either chosen based on well-informed estimates or taken from our literature sources. Table 4 presents a quick view of the model parameters we have categorized as free values and their respective ranges. These parameter ranges were derived from data found across all our literature sources, and they generally represent educated guesses made by collecting information about closely related biological systems. Nonetheless, we largely assumed generous bounds for all these ranges, delegating the search for biophysically-relevant values to our parameter inference scheme.

Model parameter inference framework

The exploration of the immense parameter space of a mechanistic model is a demanding computational task, especially for biophysical problems dealing with spatial-stochastic system representations [135]. Several classical inference/optimization methods such as heuristic tuning, stochastic gradient descent, simulated annealing, and approximate Bayesian computation (ABC) have been traditionally used for finding reasonable parameter sets of biophysical mechanistic models [89, 95–97, 212]. Nonetheless, these methods are often computationally inefficient because they require many expensive simulations for properly scanning the model parameter space, and they commonly lack powerful parameter-space interpolation strategies. As a consequence, these inference methods become prohibitively ineffective when they are applied to high-dimensional spatial-stochastic models, which are nowadays usually employed to represent complex biological systems.

Here, by leveraging our access to high-performance computing (HPC) resources, we take advantage of the sequential neural posterior estimation (SNPE) algorithm and combine it with classical ML concepts, in order to implement a comprehensive model parameter inference framework. The SNPE algorithm is part of a ground-breaking family of likelihood-free inference techniques [86, 87, 89]. These state-of-the-art simulation-based inference (SBI) algorithms fundamentally rely on artificial neural networks (ANNs) to approximate model parameter probability distributions; the respective ANNs are trained with simulation/synthetic data, but are conditioned on target/experimental observations [89, 95].

In our case, this original framework was applied to infer two individual parameter sets for two separate models, which are capable of recapitulating the most fundamental characteristics of the fully-formed mouse blastocyst. The first model represents the wild-type variant of the underlying biophysical system, and it is our principal system model; in other relevant contexts, we also refer to it as the “inferred-theoretical wild-type” or ITWT. This system has a functional cell-cell communication. The second model is an auxiliary system; all the parameter values of the ITWT are reused for this supplementary system, except for the core GRN components which are reinferred in a loss-of-function mutant setting: this system model has a nonfunctional cell-cell communication. We consequently refer to it as the “reinferred-theoretical mutant” or RTM.

To illustrate the key stages of our exploratory Bayesian inferential framework, we present a concise list of the most important ideas of our workflow. For additional information concerning some general considerations of our model parameter inference framework, please see Fig 1 and S1 Appendix.

Constructing the prior distribution.

A key stage of our workflow is the construction of the model parameter prior distribution. Once every suitable model parameter has an appropriate fixed value (see Tables 2 and 3), it is necessary to condense every belief/intuition about the remaining free model parameters into a reasonable prior distribution. In this work, this prior is a multivariate uniform distribution with 19 dimensions (vector components). Every such vector component has a predefined range; those ranges are derived from educated guesses informed by data on closely related biological systems or found across all our literature sources. We largely assume rich but realistic bounds for all of those ranges (see Table 4), as ultimately the aim is to delegate the search for biophysically-relevant values to our parameter inference scheme. But while the general idea applies to both system models (ITWT and RTM), the RTM prior distribution has only 4 dimensions corresponding to the core GRN motif.

Two of the freely-varying model parameters fulfill a particular role, namely “Mean Initial mRNA Count” and “Mean Initial PROTEIN Count”. These parameters are used to define meaningful initial condition distributions (ICDs), as there is no detailed experimental information currently available about them. In the typical setting of our simulation procedure, they operate together as the mean-value vector of an ICD which is itself a composition of Poissonian and binomial distributions (see Computational experiments); as such, they also dictate the variance matrix of this ICD. From that ICD, we sample the starting Nanog-Gata6 mRNA and NANOG-GATA6 (protein) copy numbers; accordingly, this sampling is performed per simulated cell. In this regard, the only stipulated hard molecular constraint follows from imposing the maximal mean mRNA and protein copy numbers reached at full induction: 250 and 1000 molecules, respectively.

Simulation data generation.

At this stage, we explicitly incorporate into the simulation procedure itself two of the basic empirical observations we aim to recapitulate: first, the highly reproducible ratio of 2:3 for EPI-PRE lineages [38, 99]; second, the fact that the absence of FGF4-mediated signaling (no spatial coupling) forces the ICM to almost exclusively adopt the EPI fate [21, 73]. See also S1 Appendix. Via parallel computing, our simulator generates after each run a composite trajectory for two complementary configurations of our system model: functioning cell signaling, and nonfunctioning cell signaling. When cell signaling is functioning, the targeted proportions are 40% for the EPI population and 60% for the PRE population. In contrast, when cell signaling is nonfunctioning, the targeted proportions are 100% for the EPI population and 0% for the PRE population. While these two configurations are necessary for correctly deriving the ITWT system, the valid derivation of the RTM system requires only one configuration: as cell-cell communication is always turned off for the RTM, here the targeted proportions are 40% for the EPI lineage and 60% for the PRE lineage.

Furthermore, each run should produce at least 48 hours of simulated time to sufficiently capture the pertinent model dynamics; for additional information, see Key stages of mouse embryo preimplantation development. By design, the simulation data generation can be executed independently of any other stage, which allows the production of a sufficiently large trajectory batch per run. This simulation batch size is arbitrary and depends entirely on the available computational resources; in our case, the simulation batch size is generally 100 thousand (composite) trajectories per ANN training round.

Training data generation.

The raw simulation data is ineffective for training the ANN, because of the high dimensionality and the multiscale character of the generated trajectories. Via sequential data transformations, we create features that can be used for successfully training the ANN. In practical terms, these features are low-dimensional projections of high-dimensional temporal-spatial stochastic dynamics data.

Our simulations are genuinely event-driven and therefore produce trajectories irregularly spaced in time. As a preparative step, we accordingly first resample the simulated data onto a regular time grid. For simplicity, we always generate a resampled time series with a sampling period equal to 0.25 hours (or 15 minutes); we deemed that sampling period to be sufficient for fully characterizing the underlying temporal dynamics on the tissue scale.
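The resampling step can be illustrated as follows. The text does not specify the interpolation rule, so this sketch assumes a zero-order hold (the state stays constant between consecutive events, which is how an event-driven trajectory evolves); the function name and the toy trajectory are hypothetical.

```python
from bisect import bisect_right


def resample_zero_order_hold(event_times, counts, t_end, dt=0.25):
    """Resample a piecewise-constant trajectory onto a regular time grid.

    event_times: ascending event times in hours, starting at 0.
    counts: counts[i] is the molecular count from event_times[i] onward.
    Assumption: the last recorded state is held until the next event.
    """
    if not event_times or event_times[0] > 0.0:
        raise ValueError("the trajectory must start at time 0")
    n_steps = int(round(t_end / dt))
    grid = [i * dt for i in range(n_steps + 1)]
    resampled = []
    for t in grid:
        # Index of the most recent event at or before grid time t.
        idx = bisect_right(event_times, t) - 1
        resampled.append(counts[idx])
    return grid, resampled


# Toy trajectory with irregular event times (hours).
grid, values = resample_zero_order_hold(
    event_times=[0.0, 0.1, 0.3, 0.9], counts=[5, 6, 4, 7], t_end=1.0
)
print(grid)
print(values)
```

With a 0.25-hour period, a 48-hour run maps onto 193 regularly spaced grid points per species and cell.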

All subsequent steps are performed on that regularized time series. Per simulated cell, the resampled time series represents the dynamics of all the involved biochemical species. In order to avoid tracking all of the many biochemical species in our model, we focus on three system observables that characterize the specification process at cell scale: total NANOG, which is the aggregate sum of the available unphosphorylated NANOG and phosphorylated NANOG protein molecular counts; total GATA6, which is simply the GATA6 protein molecular count; total FGF4, which is the aggregate sum of all the available FGF4 proteins at both cytoplasm and membrane levels (including receptor-bound FGF4 molecules). While total FGF4 is only useful for analyzing our system under perturbed conditions, total NANOG and total GATA6 are the guiding drivers of our training data generation: these are the main markers determining the lineage for every cell at each simulated time point; for additional details, check Computational experiments.

We next further group the cell-level observables into a tissue-level one. The key tissue-scale observable variable is the total count for each of the three possible cell fates at each simulated time point. As such, this tissue-level observable is used to define a “pattern score” that meaningfully discriminates the targeted/idealized patterning behavior from undesired patterning behaviors of our system. This constitutes the most important part of our data-transformation pipeline, and it is fully described in the next part.

Constructing the pattern score (objective) function.

To generate the actual ANN training dataset, two closely connected steps are required. First, at every time point for each possible fate, the total cell count is compared against the target cell count. There is a specific target cell count for each prescribed configuration of the respective system: for the ITWT, the EPI-PRE-UND lineage target proportions are 40-60-0 percent with functioning spatial coupling, and 100-0-0 percent when spatial coupling is inactivated; the RTM (in which spatial coupling is always inactivated) has the EPI-PRE-UND lineage target proportions of 40-60-0 percent. The result is a set of marginal scores, one per system configuration; i.e., two marginal scores for the ITWT, and one for the RTM. Second, all these resulting (marginal) configuration scores must be combined into a discriminatory pattern score, which measures the distance between a particular simulation score and the idealized behavior score.

The (joint) pattern score time series performs as the outcome of a vector-valued objective function whose main inputs are the total cell count and the target cell count per system configuration, for each possible fate at every time point. Therefore, our essential goal is to intelligently optimize (maximize) this objective function. As such, the joint pattern score is the foundational element of the ANN training dataset.

For the first step, in formal terms, let $Z_t = (Z_{t,0}, Z_{t,1}, Z_{t,2})$ be a discrete random vector taking values in $\mathbb{N}_0^3$ (naturals or nonnegative integers), which represents the total cell count for each lineage at a given discrete time point $t$. For simplicity, the three possible cell lineages are arbitrarily indexed as 0 (EPI), 1 (PRE), and 2 (UND). Also, let $w_m = (w_{m,0}, w_{m,1}, w_{m,2})$ be a discrete vector representing the target cell count for each lineage, for a given system configuration which is arbitrarily indexed by $m \in \{0, 1\}$. The configuration score is thus the continuous random variable $S_{t,m}$ taking values in $\mathbb{R}$ (reals), which is a nonlinear transformation mapping from an absolute-difference vector $|z_t - w_m| = (|z_{t,0} - w_{m,0}|, |z_{t,1} - w_{m,1}|, |z_{t,2} - w_{m,2}|)$ to a point $s_{t,m}$ in the closed interval $[0, 1]$; i.e.: (10) Here, $\|w_m\|_1 / \|Z_t\|_1 = 1$, and the expression $\|\cdot\|_1$ indicates the $\ell_1$ vector norm.
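To make the inputs and the range of the configuration score concrete, the sketch below compares fate counts against a target on the 25-cell grid. It does not reproduce Eq (10): as a loudly labeled stand-in, it uses a simple linear rescaling of the $\ell_1$ distance, which is our assumption and not the nonlinear transformation used in the paper. The function names are hypothetical.

```python
def count_fates(fates):
    """Count cells per lineage, ordered as (EPI, PRE, UND)."""
    return tuple(fates.count(label) for label in ("EPI", "PRE", "UND"))


def configuration_score(cell_counts, target_counts):
    """Stand-in score in [0, 1]; 1 means the target proportions are met.

    ASSUMPTION: linear rescaling of the l1 distance, NOT Eq (10).
    Two count vectors with the same total N differ by at most 2N in
    l1 distance, so the result always lies in [0, 1].
    """
    n_cells = sum(target_counts)
    if sum(cell_counts) != n_cells:
        raise ValueError("cell counts and target counts must have equal totals")
    l1_distance = sum(abs(z - w) for z, w in zip(cell_counts, target_counts))
    return 1.0 - l1_distance / (2.0 * n_cells)


# Targets for a 25-cell tissue: 40-60-0 percent and 100-0-0 percent.
target_coupled = (10, 15, 0)
target_uncoupled = (25, 0, 0)

tissue = ["EPI"] * 10 + ["PRE"] * 15
print(configuration_score(count_fates(tissue), target_coupled))
print(configuration_score(count_fates(tissue), target_uncoupled))
```

Any monotone transformation of this distance would preserve the ordering of simulations; the specific nonlinearity matters for how strongly near-misses are rewarded.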

For the second step, in formal terms, let $S_{t,0}$ and $S_{t,1}$ be the marginal scores for the prescribed configurations of the ITWT; clearly, this step is not applicable to the RTM, as it has only one configuration. While this step is far from being a trivial process, to combine these two marginal scores into one pattern score, we employed an $\ell_1$-inspired penalty method. This penalty directly affects the sum of the configuration scores, and it is intended to increase the discriminatory power of the joint score by favouring similarly high marginal scores, as described by the following formula: (11) Here, is the penalty term.

By applying St, we construct a time series spanning the last 12-hour window of the predetermined simulation period (from 0 hours to 48 hours), and by producing many such time series, we generate the effective ANN training dataset.

Training the ANN.

The key idea of training an ANN is to use a relatively small number of simulations. The exact number of trajectories generated by the simulator should be determined by aiming at meaningfully balancing both the algorithm’s computational feasibility and the ANN training reliability. This balance enables an effective exploration of the massive parameter space for the proposed system model via the trained ANN [135].

To this end, we exploited a state-of-the-art SBI toolbox which allowed us to easily integrate these novel AI-powered algorithms into our workflow [137]. Specifically, we employed the SNPE procedure to train a deep neural density estimator which directly estimates the model parameter posterior distribution conditional on a goal observation. As our training dataset is already a convenient latent representation of the simulation data, this stage was relatively straightforward. Furthermore, there was no need to tune the algorithm hyperparameters; the applied SBI toolbox has reasonable preset values for them.

Altogether, we trained ANNs to successfully predict model parameter sets capable of recapitulating the targeted patterning behaviors of the systems under study.

Constructing a synthetic target/goal observation.

At heart, our data-transformation pipeline generates a suitable latent (feature) space representation of the simulated dataset for training the ANN, which strongly facilitates predicting inferred parameter distributions that agree with a prescribed effective behavior in that latent space.

Ultimately, our (synthetic) goal observation is simply a score time series whose support spans the last 12-hour window of the prescribed simulation period; in agreement with one of the basic empirical observations: EPI and PRE populations should reach their expected fate proportions roughly 8 or 12 hours before the end of the preimplantation period [2, 3, 30, 37]. See also S1 Appendix. Moreover, each entry/value of this time series is a “1”: the ideal score of the target patterning behavior.

Selecting a posterior distribution.

To account for the inherent stochasticity of the ANN training algorithm (mini-batch stochastic gradient descent) and the implicit randomness of the pattern score trajectories, we train multiple ANNs with the same dataset. Each trained ANN produces a posterior distribution, and from it we compute the maximum-a-posteriori (MAP) estimate of all the model parameters. Using each distinct MAP estimate, we generate an extra simulation batch with many fresh pattern score time series. Moreover, we construct a pattern score distribution for each of these extra batches. The idea is not only to maximize the accuracy of the underlying system behavior, but also to increase its precision, optimizing its robustness. As such, we measure the quality of each separate MAP estimate and summarize it into a single number, in order to select the best learned posterior distribution of the model parameters conditional on the target observation.

This quality measure of the MAP estimate is described by Eq (12), and it is accordingly referred to as the “meta score” (see also Fig 1). (12) Where ( is the realization of the random variable ), $\alpha_{50}$ denotes the 50th-percentile (or median) vector of a given pattern score distribution at each simulated time point, $\beta = ((\alpha_{95} - \alpha_{5})/2)^2$ is an element-wise penalty vector favouring high accuracy together with high precision, $\mathbf{0}$ denotes the corresponding zero vector, and the maximum operator is applied to each component of the input vector pair. Thus, the selected model parameter set has the highest associated (MAP estimate) meta score among the available ones.

Performing multiple rounds of inference.

In general terms, single-round inference is not enough for learning truly-applicable model parameter values, even with a significantly high simulation budget. The problem of amortized inference is the wasteful use of the simulated trajectories: the ANN effectively estimates the posterior distribution for all the possible goal observations across the entire prior space of the model parameters.

If there is only one prominent target observation (as in our case), then it is advantageous to perform multiple sequential rounds of (non-amortized) inference. The model parameter search will then be focused on this single target observation. This process can be repeated arbitrarily many times until the current-round score time series matches the target score time series as closely as possible. In this study, we performed 8 consecutive rounds of inference producing 800 thousand (composite) simulations in total for the ITWT system, and we performed 4 consecutive rounds of inference producing 400 thousand simulations in total for the RTM system.

In this context, it is also worth mentioning that the next-round prior does not necessarily need to be the current-round posterior. The proposal distribution can be iteratively adapted to fully leverage the power of multi-round inference, which is all based on the earliest prior and the latest posterior; as such, it is possible to obtain a well-informed next-round mixture distribution [87]. While we do not exploit this technique, it can be easily integrated into our workflow.

Computational experiments

The initial gene expression profile per cell is commonly sampled from a composition of two distributions: the (first) Poissonian part whose mean-value vector construction is dictated by the two parameters “Mean Initial mRNA Count” and “Mean Initial PROTEIN Count”; the (second) binomial part which fairly splits the respective cellular resources. While it is easy to change the goal biochemical species, we generally target only the ICDs of the two main players: Nanog-Gata6 mRNA and NANOG-GATA6 (protein) copy numbers. As such, every typical simulation starts with reasonably balanced cell resources, which guarantees the UND state across the tissue despite the inherent randomness of the initial conditions.

For the starting test of robustness to initial condition perturbations (ICPs), one extra layer of variability is necessary. Instead of using only a two-element composition, the original ICD now integrates an additional layer which performs uniform sampling of mRNA and protein copy numbers per cell. This uniform perturbation of cellular resources is thus the ICD root component for this testing case, whereas the original layers are the other two remaining elements. Simply put, firstly a random number is uniformly sampled from a particular discrete range ([0, 250] for mRNA and [0, 1000] for protein species), secondly this preceding number is employed as the mean value of a Poisson distribution which is sampled accordingly, and thirdly the eventual molecular count for a particular biochemical species is sampled from a fair binomial distribution whose (independent) trial-number parameter is dictated by the precursory Poissonian layer. It is hence easy to see that this ICP greatly increases the early variance of the goal cell resources.
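The layered sampling of initial copy numbers can be sketched with the standard library alone. This is an illustration under stated assumptions: the `mean` argument is used directly as the Poisson mean, the Poisson draw counts arrivals of a unit-time process, and all function names are hypothetical rather than taken from the simulator.

```python
import random


def sample_poisson(mean, rng):
    """Poisson draw: number of rate-`mean` arrivals within one time unit."""
    if mean <= 0:
        return 0
    count = 0
    t = rng.expovariate(mean)
    while t < 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count


def sample_fair_binomial(n_trials, rng):
    """Binomial draw with success probability 0.5 (fair split)."""
    return sum(rng.random() < 0.5 for _ in range(n_trials))


def sample_initial_count(mean, rng, upper=None):
    """Initial copy number for one species in one cell.

    Typical setting: Poisson layer followed by a fair binomial layer.
    Perturbed setting (upper given): the Poisson mean is first drawn
    uniformly from the discrete range [0, upper], e.g. 250 for mRNA
    and 1000 for protein species.
    """
    if upper is not None:
        mean = rng.randint(0, upper)  # uniform root layer, both ends inclusive
    return sample_fair_binomial(sample_poisson(mean, rng), rng)


rng = random.Random(0)
typical = [sample_initial_count(100, rng) for _ in range(5)]
perturbed = [sample_initial_count(None, rng, upper=1000) for _ in range(5)]
print(typical)
print(perturbed)
```

In the typical setting the expected count is half the Poisson mean, whereas the uniform root layer spreads the counts over a much wider range, which is the increased early variance described above.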

The cell-fate classification thresholds were purposefully chosen to ease the ICM cellular categorization based on a proper evaluation of the most important GRN relationships. This cell-lineage categorization is performed by comparing NANOG and GATA6 (protein) levels against constant threshold levels. Assuming Poissonian noise, a low NANOG/GATA6 cell classification occurs when the respective protein level is below a threshold of approximately 329 copies (mean basal protein level plus five times its standard deviation), a high NANOG cell classification occurs when the respective protein level is above a threshold of approximately 388 copies (mean full-induction-phosphorylation protein level minus five times its standard deviation), and a high GATA6 cell classification occurs when the respective protein level is above a threshold of approximately 842 copies (mean full-induction protein level minus five times its standard deviation). Thus, the EPI categorization occurs when a cell simultaneously displays the low-GATA6 alongside high-NANOG states, the PRE classification occurs when a cell simultaneously displays the low-NANOG alongside high-GATA6 states, and the UND categorization occurs when a cell displays any other combination of states.
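The quoted thresholds can be reproduced numerically under Poissonian noise, where the standard deviation equals the square root of the mean. The mean levels used below (250 basal, 500 for fully induced and phosphorylated NANOG, 1000 at full induction) are assumptions back-computed from the stated thresholds and from the halving of NANOG levels by phosphorylation; the function names are hypothetical.

```python
import math


def poisson_threshold(mean, n_sd):
    """Mean shifted by n_sd Poisson standard deviations (sd = sqrt(mean))."""
    return mean + n_sd * math.sqrt(mean)


# Assumed mean protein levels (see lead-in); thresholds follow the text.
LOW = poisson_threshold(250, +5)          # about 329 copies
HIGH_NANOG = poisson_threshold(500, -5)   # about 388 copies
HIGH_GATA6 = poisson_threshold(1000, -5)  # about 842 copies


def classify_cell(total_nanog, total_gata6):
    """Assign EPI, PRE, or UND from total NANOG and GATA6 protein counts."""
    nanog_low = total_nanog < LOW
    nanog_high = total_nanog > HIGH_NANOG
    gata6_low = total_gata6 < LOW
    gata6_high = total_gata6 > HIGH_GATA6
    if nanog_high and gata6_low:
        return "EPI"
    if nanog_low and gata6_high:
        return "PRE"
    return "UND"  # any other combination of states


print(round(LOW), round(HIGH_NANOG), round(HIGH_GATA6))
print(classify_cell(500, 100), classify_cell(100, 900), classify_cell(350, 350))
```

Using a five-standard-deviation margin on both sides keeps purely Poissonian fluctuations around a basal or induced level from flipping the classification.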

For the actual model parameter inference procedure, a 25-cell grid size was employed consistently. The 100-cell tissue size was used for all the test simulations. Any other cell-grid size was exclusively employed to study noise at tissue level (Fig 5). To see a graphical comparison of every tissue size, please check Fig 13.

Fig 13. Overview of cell-grid sizes employed for inference and simulations.

The 25-cell grid size (purple highlighting) was used invariably for the actual model parameter inference scheme. All test simulations (assessment of system properties such as robustness in wild-type-like and mutated conditions) employed the 100-cell tissue size (dark-blue highlighting). The remaining cell-grid sizes (gray) were solely used to study tissue-level noise (see also Fig 5).

https://doi.org/10.1371/journal.pcbi.1012473.g013

Computational simulation details.

The simulator was fully implemented in the Python programming language. Multiple software packages support the simulator design: NumPy, Numba, SciPy, PyTorch, SBI, Matplotlib, and Seaborn, among many others. All the computational simulations were performed on the general-purpose HPC cluster Goethe-HLR of the Goethe-Universität Frankfurt am Main.

Supporting information

S1 Appendix. AI-powered model parameter inference: analysis, considerations, and future directions.

Summary of inferred model parameter interactions (posterior distribution). General considerations for model parameter inference. Reflection and outlook.

https://doi.org/10.1371/journal.pcbi.1012473.s001

(PDF)

S1 Fig. Summary of inferred core GRN motif interactions (ITWT versus RTM).

The central component of our inference scheme is the sequential neural posterior estimation (SNPE) algorithm. Both unconditional (top row [A, B]) and conditional (bottom row [C, D]) posterior parameter distributions were obtained following 8 consecutive rounds of inference. 800 thousand composite simulations were performed for the inferred-theoretical wild-type (ITWT) system (left column [A, C]). For the reinferred-theoretical mutant (RTM) system (right column [B, D]), 4 consecutive rounds of inference were performed producing 400 thousand simulations. For complete details of the model parameter inference procedure, see Model parameter inference framework. [A, B] Model parameter posterior distribution. For ease of visualization, we only show the one-dimensional projection of all posterior components representing the core GRN motif interactions. [C, D] First assessment of model parameter sensitivity. These panels show the same components as in [A, B] but the posterior is now conditioned on the maximum-a-posteriori probability (MAP) estimate of the model parameters.

https://doi.org/10.1371/journal.pcbi.1012473.s002

(PDF)

S2 Fig. Summary of inferred signaling and other model parameter interactions (ITWT only).

The central component of our inference scheme is the sequential neural posterior estimation (SNPE) algorithm. Both unconditional (top rows [A-D]) and conditional (bottom rows [E-H]) posterior parameter distributions were obtained following 8 consecutive rounds of inference. 800 thousand composite simulations were performed for the ITWT system. For complete details of the model parameter inference procedure, see Model parameter inference framework. [A-D] Model parameter posterior distribution. For ease of visualization, the posterior was arbitrarily partitioned into four distinctive groups. We emphasize the top-right group [B, F], which displays the most important signaling model parameter interactions. [E-H] First assessment of model parameter sensitivity. These panels show the same components as in [A-D] but the posterior is now conditioned on the maximum-a-posteriori probability (MAP) estimate of the model parameters.

https://doi.org/10.1371/journal.pcbi.1012473.s003

(PDF)

Acknowledgments

The successful completion of this research project owes much to the collaboration and support of esteemed colleagues and collaborators. We extend our deepest gratitude to Sabine Fischer, Tim Liebisch, Franziska Matthäus, and Simon Schardt for their pivotal roles in fostering insightful discussions and providing constructive feedback.

We also express our appreciation to Roberto Covino and his lab for their invaluable guidance and expertise throughout the project, especially for pointing us in the direction of the powerful SBI framework. Their thoughtful insights and constructive critiques have enriched the quality of our work.

Furthermore, we would like to acknowledge the Center for Scientific Computing (CSC) at Goethe University Frankfurt for granting us access to the Goethe-HLR cluster, which has been instrumental in facilitating the progress of this research.

References

  1. Nissen SB, Perera M, Gonzalez JM, Morgani SM, Jensen MH, Sneppen K, et al. Four simple rules that are sufficient to generate the mammalian blastocyst. PLOS Biology. 2017;15(7):e2000737. pmid:28700688
  2. Płusa B, Piliszek A. Common principles of early mammalian embryo self-organisation. Development. 2020;147(dev183079). pmid:32699138
  3. Saiz N, Hadjantonakis AK. Coordination between patterning and morphogenesis ensures robustness during mouse development. Philosophical Transactions of the Royal Society B. 2020. pmid:32829684
  4. Iturbide A, Torres-Padilla ME. A cell in hand is worth two in the embryo: recent advances in 2-cell like cell reprogramming. Current Opinion in Genetics & Development. 2020;64:26–30. pmid:32599301
  5. Zhu M, Zernicka-Goetz M. Principles of Self-Organization of the Mammalian Embryo. Cell. 2020;183(6):1467–1478. pmid:33306953
  6. Kim J, Koo BK, Knoblich JA. Human organoids: model systems for human biology and medicine. Nature Reviews Molecular Cell Biology. 2020;21(10):571–584. pmid:32636524
  7. Yeh CY, Huang WH, Chen HC, Meir YJJ. Capturing Pluripotency and Beyond. Cells. 2021;10(12):3558. pmid:34944066
  8. Saiz N, Mora-Bitria L, Rahman S, George H, Herder JP, Garcia-Ojalvo J, et al. Growth-factor-mediated coupling between lineage size and cell fate choice underlies robustness of mammalian development. eLife. 2020;9. pmid:32720894
  9. Raina D, Bahadori A, Stanoev A, Protzek M, Koseska A, Schröter C. Cell-cell communication through FGF4 generates and maintains robust proportions of differentiated cell types in embryonic stem cells. Development. 2021;148(21):dev199926. pmid:34651174
  10. Fischer SC, Schardt S, Lilao-Garzón J, Muñoz-Descalzo S. The salt-and-pepper pattern in mouse blastocysts is compatible with signaling beyond the nearest neighbors. iScience. 2023;26(11). pmid:37915595
  11. Cang Z, Wang Y, Wang Q, Cho KWY, Holmes W, Nie Q. A multiscale model via single-cell transcriptomics reveals robust patterning mechanisms during early mammalian embryo development. PLOS Computational Biology. 2021;17(3):e1008571. pmid:33684098
  12. Krupinski P, Chickarmane V, Peterson C. Simulating the Mammalian Blastocyst—Molecular and Mechanical Interactions Pattern the Embryo. PLOS Computational Biology. 2011;7(5):e1001128. pmid:21573197
  13. 13. Rossant J, Tam PPL. Blastocyst lineage formation, early embryonic asymmetries and axis patterning in the mouse. Development. 2009;136(5):701–713. pmid:19201946
  14. 14. Li L, Zheng P, Dean J. Maternal control of early mouse development. Development (Cambridge, England). 2010;137(6):859–870. pmid:20179092
  15. 15. Tosenberger A, Gonze D, Bessonnard S, Cohen-Tannoudji M, Chazaud C, Dupont G. A multiscale model of early cell lineage specification including cell division. npj Systems Biology and Applications. 2017;3(1):1–11. pmid:28649443
  16. 16. Tosenberger A, Gonze D, Chazaud C, Dupont G. Computational models for the dynamics of early mouse embryogenesis. International Journal of Developmental Biology. 2019;63(3-4-5):131–142. pmid:31058292
  17. 17. Habibi E, Stunnenberg HG. Transcriptional and epigenetic control in mouse pluripotency: lessons from in vivo and in vitro studies. Current Opinion in Genetics & Development. 2017;46:114–122. pmid:28763675
  18. 18. Arias AM, Nichols J, Schröter C. A molecular basis for developmental plasticity in early mammalian embryos. Development. 2013;140(17):3499–3510.
  19. 19. Herberg M, Roeder I. Computational modelling of embryonic stem-cell fate control. Development. 2015;142(13):2250–2260. pmid:26130756
  20. 20. Miyamoto T, Furusawa C, Kaneko K. Pluripotency, Differentiation, and Reprogramming: A Gene Expression Dynamics Model with Epigenetic Feedback Regulation. PLOS Computational Biology. 2015;11(8):e1004476. pmid:26308610
  21. 21. Bessonnard S, Coqueran S, Vandormael-Pournin S, Dufour A, Artus J, Cohen-Tannoudji M. ICM conversion to epiblast by FGF/ERK inhibition is limited in time and requires transcription and protein degradation. Scientific Reports. 2017;7(1):12285. pmid:28947813
  22. 22. Stanoev A, Schröter C, Koseska A. Robustness and timing of cellular differentiation through population-based symmetry breaking. Development. 2021;148(3):dev197608. pmid:33472845
  23. 23. Robert C, Prista von Bonhorst F, De Decker Y, Dupont G, Gonze D. Initial source of heterogeneity in a model for cell fate decision in the early mammalian embryo. Interface Focus. 2022;12(4):20220010. pmid:35865503
  24. 24. Mathew B, Muñoz-Descalzo S, Corujo-Simon E, Schröter C, Stelzer EHK, Fischer SC. Mouse ICM Organoids Reveal Three-Dimensional Cell Fate Clustering. Biophysical Journal. 2019;116(1):127–141. pmid:30514631
  25. 25. Liebisch T, Drusko A, Mathew B, Stelzer EHK, Fischer SC, Matthäus F. Cell fate clusters in ICM organoids arise from cell fate heredity and division: a modelling approach. Scientific Reports. 2020;10(1):22405. pmid:33376253
  26. 26. Zhu R, del Rio-Salgado JM, Garcia-Ojalvo J, Elowitz MB. Synthetic multistability in mammalian cells. Science. 2022;375(6578):eabg9765. pmid:35050677
  27. 27. Zernicka-Goetz M. Cleavage pattern and emerging asymmetry of the mouse embryo. Nature Reviews Molecular Cell Biology. 2005;6(12):919–928. pmid:16341078
  28. 28. Ryan AQ, Chan CJ, Graner F, Hiiragi T. Lumen Expansion Facilitates Epiblast-Primitive Endoderm Fate Specification during Mouse Blastocyst Formation. Developmental Cell. 2019;51(6):684–697.e4. pmid:31735667
  29. 29. Simon CS, Hadjantonakis AK, Schröter C. Making lineage decisions with biological noise: Lessons from the early mouse embryo. WIREs Developmental Biology. 2018;7(4):e319. pmid:29709110
  30. 30. Allègre N, Chauveau S, Dennis C, Renaud Y, Meistermann D, Estrella LV, et al. NANOG initiates epiblast fate through the coordination of pluripotency genes expression. Nature Communications. 2022;13(1):3550. pmid:35729116
  31. 31. Meilhac SM, Adams RJ, Morris SA, Danckaert A, Le Garrec JF, Zernicka-Goetz M. Active cell movements coupled to positional induction are involved in lineage segregation in the mouse blastocyst. Developmental Biology. 2009;331(2):210–221. pmid:19422818
  32. 32. Chen Q, Shi J, Tao Y, Zernicka-Goetz M. Tracing the origin of heterogeneity and symmetry breaking in the early mammalian embryo. Nature Communications. 2018;9(1):1819. pmid:29739935
  33. 33. Schultz RM, Stein P, Svoboda P. The oocyte-to-embryo transition in mouse: past, present, and future. Biology of Reproduction. 2018;99(1):160–174. pmid:29462259
  34. 34. Niwayama R, Moghe P, Liu YJ, Fabrèges D, Buchholz F, Piel M, et al. A Tug-of-War between Cell Shape and Polarity Controls Division Orientation to Ensure Robust Patterning in the Mouse Blastocyst. Developmental Cell. 2019;51(5):564–574.e6. pmid:31735668
  35. 35. Fischer SC, Corujo-Simon E, Lilao-Garzon J, Stelzer EHK, Muñoz-Descalzo S. The transition from local to global patterns governs the differentiation of mouse blastocysts. PLoS ONE. 2020;15(5). pmid:32413083
  36. 36. Chan CJ, Hiiragi T. Integration of luminal pressure and signalling in tissue self-organization. Development. 2020;147(5):dev181297. pmid:32122910
  37. 37. Yanagida A, Corujo-Simon E, Revell CK, Sahu P, Stirparo GG, Aspalter IM, et al. Cell surface fluctuations regulate early embryonic lineage sorting. Cell. 2022;185(5):777–793.e20. pmid:35196500
  38. 38. Bessonnard S, De Mot L, Gonze D, Barriol M, Dennis C, Goldbeter A, et al. Gata6, Nanog and Erk signaling control cell fate in the inner cell mass through a tristable regulatory network. Development. 2014;141(19):3637–3648. pmid:25209243
  39. 39. Schrode N. Regulation of cell fate choice in the mouse blastocyst stage embryo [PhD Thesis]. Ludwig-Maximilians-Universität München; 2015. Available from: https://edoc.ub.uni-muenchen.de/18938/.
  40. 40. Boroviak T, Nichols J. The birth of embryonic pluripotency. Philosophical Transactions of the Royal Society B: Biological Sciences. 2014;369 (1657). pmid:25349450
  41. 41. Schröter C, Rué P, Mackenzie JP, Martinez Arias A. FGF/MAPK signaling sets the switching threshold of a bistable circuit controlling cell fate decisions in embryonic stem cells. Development. 2015;142(24):4205–4216. pmid:26511924
  42. 42. Raina D, Fabris F, Morelli LG, Schröter C. Intermittent ERK oscillations downstream of FGF in mouse embryonic stem cells. Development. 2022;149(4):dev199710. pmid:35175328
  43. 43. Plachta N, Bollenbach T, Pease S, Fraser SE, Pantazis P. Oct4 kinetics predict cell lineage patterning in the early mammalian embryo. Nature Cell Biology. 2011;13(2):117–123. pmid:21258368
  44. 44. Krawczyk K, Wilczak K, Szczepańska K, Maleszewski M, Suwińska A. Paracrine interactions through FGFR1 and FGFR2 receptors regulate the development of preimplantation mouse chimaeric embryo. Open Biology. 2022;12(11):220193. pmid:36382369
  45. 45. Burda Z, Krzywicki A, Martin OC, Zagorski M. Motifs emerge from function in model gene regulatory networks. Proceedings of the National Academy of Sciences. 2011;108(42):17263–17268. pmid:21960444
  46. 46. Sokolowski TR, Erdmann T, Wolde PRt. Mutual Repression Enhances the Steepness and Precision of Gene Expression Boundaries. PLOS Computational Biology. 2012;8(8):e1002654. pmid:22956897
  47. 47. Sokolowski TR, Gregor T, Bialek W, Tkačik G. Deriving a genetic regulatory network from an optimization principle; 2023. Available from: http://arxiv.org/abs/2302.05680.
  48. 48. Majka M, Ho RDJG, Zagorski M. Stability of pattern formation in systems with dynamic source regions. Physical Review Letters. 2023;130(9):098402. pmid:36930916
  49. 49. Majka M, Becker NB, Wolde PRt, Zagorski M, Sokolowski TR. Stable developmental patterns of gene expression without morphogen gradients; 2023. Available from: http://arxiv.org/abs/2306.00537.
  50. 50. Gregor T, Wieschaus EF, McGregor AP, Bialek W, Tank DW. Stability and Nuclear Dynamics of the Bicoid Morphogen Gradient. Cell. 2007;130(1):141–152. pmid:17632061
  51. 51. Bollenbach T, Pantazis P, Kicheva A, Bökel C, González-Gaitán M, Jülicher F. Precision of the Dpp gradient. Development. 2008;135(6):1137–1146. pmid:18296653
  52. 52. Little SC, Tkačik G, Kneeland TB, Wieschaus EF, Gregor T. The Formation of the Bicoid Morphogen Gradient Requires Protein Movement from Anteriorly Localized mRNA. PLOS Biology. 2011;9(3):e1000596. pmid:21390295
  53. 53. Richards DM, Saunders TE. Spatiotemporal Analysis of Different Mechanisms for Interpreting Morphogen Gradients. Biophysical Journal. 2015;108(8):2061–2073. pmid:25902445
  54. 54. Smith T, Fancher S, Levchenko A, Nemenman I, Mugler A. Role of spatial averaging in multicellular gradient sensing. Physical Biology. 2016;13(3):035004. pmid:27203129
  55. 55. Ellison D, Mugler A, Brennan MD, Lee SH, Huebner RJ, Shamir ER, et al. Cell–cell communication enhances the capacity of cell ensembles to sense shallow gradients during morphogenesis. Proceedings of the National Academy of Sciences. 2016;113(6):E679–E688. pmid:26792522
  56. 56. Zagorski M, Tabata Y, Brandenberg N, Lutolf MP, Tkačik G, Bollenbach T, et al. Decoding of position in the developing neural tube from antiparallel morphogen gradients. Science. 2017;356(6345):1379–1383. pmid:28663499
  57. 57. Verd B, Crombach A, Jaeger J. Dynamic Maternal Gradients Control Timing and Shift-Rates for Drosophila Gap Gene Expression. PLOS Computational Biology. 2017;13(2):e1005285. pmid:28158178
  58. 58. Vakulenko S, Manu , Reinitz J, Radulescu O. Size Regulation in the Segmentation of Drosophila: Interacting Interfaces between Localized Domains of Gene Expression Ensure Robust Spatial Patterning. Physical Review Letters. 2009;103(16):168102. pmid:19911861
  59. 59. Kicheva A, Bollenbach T, Ribeiro A, Valle HP, Lovell-Badge R, Episkopou V, et al. Coordination of progenitor specification and growth in mouse and chick spinal cord. Science. 2014;345(6204):1254927. pmid:25258086
  60. 60. Raspopovic J, Marcon L, Russo L, Sharpe J. Digit patterning is controlled by a Bmp-Sox9-Wnt Turing network modulated by morphogen gradients. Science. 2014;345(6196):566–570. pmid:25082703
  61. 61. Almuedo-Castillo M, Bläßle A, Mörsdorf D, Marcon L, Soh GH, Rogers KW, et al. Scale-invariant patterning by size-dependent inhibition of Nodal signalling. Nature Cell Biology. 2018;20(9):1032–1042. pmid:30061678
  62. 62. Verd B, Clark E, Wotton KR, Janssens H, Jiménez-Guri E, Crombach A, et al. A damped oscillator imposes temporal order on posterior gap gene expression in Drosophila. PLOS Biology. 2018;16(2):e2003174. pmid:29451884
  63. 63. Morales JS, Raspopovic J, Marcon L. From embryos to embryoids: How external signals and self-organization drive embryonic development. Stem Cell Reports. 2021;16(5):1039–1050. pmid:33979592
  64. 64. Nikolić M, Antonetti V, Liu F, Muhaxheri G, Petkova MD, Scheeler M, et al. Scale invariance in early embryonic development; 2023. Available from: http://arxiv.org/abs/2312.17684.
  65. 65. Guo G, Huss M, Tong GQ, Wang C, Sun LL, Clarke ND, et al. Resolution of Cell Fate Decisions Revealed by Single-Cell Gene Expression Analysis from Zygote to Blastocyst. Developmental Cell. 2010;18(4):675–685. pmid:20412781
  66. 66. Ohnishi Y, Huber W, Tsumura A, Kang M, Xenopoulos P, Kurimoto K, et al. Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Nature cell biology. 2014;16(1):27–37. pmid:24292013
  67. 67. Kang M, Garg V, Hadjantonakis AK. Lineage Establishment and Progression within the Inner Cell Mass of the Mouse Blastocyst Requires FGFR1 and FGFR2. Developmental Cell. 2017;41(5):496–510.e5. pmid:28552559
  68. 68. Mohammed H, Hernando-Herraez I, Savino A, Scialdone A, Macaulay I, Mulas C, et al. Single-Cell Landscape of Transcriptional Heterogeneity and Cell Fate Decisions during Mouse Early Gastrulation. Cell Reports. 2017;20(5):1215–1228. pmid:28768204
  69. 69. Morgani SM, Saiz N, Garg V, Raina D, Simon CS, Kang M, et al. A Sprouty4 reporter to monitor FGF/ERK signaling activity in ESCs and mice. Developmental Biology. 2018;441(1):104–126. pmid:29964027
  70. 70. Garg V, Yang Y, Nowotschin S, Setty M, Kuo YY, Sharma R, et al. Single-cell analysis of bidirectional reprogramming between early embryonic states reveals mechanisms of differential lineage plasticities; 2023. Available from: https://www.biorxiv.org/content/10.1101/2023.03.28.534648v1.
  71. 71. Ochiai H, Sugawara T, Sakuma T, Yamamoto T. Stochastic promoter activation affects Nanog expression variability in mouse embryonic stem cells. Scientific Reports. 2014;4(1):7125. pmid:25410303
  72. 72. Ochiai H, Hayashi T, Umeda M, Yoshimura M, Harada A, Shimizu Y, et al. Genome-wide kinetic properties of transcriptional bursting in mouse embryonic stem cells. Science Advances. 2020;6(25):eaaz6699. pmid:32596448
  73. 73. Thompson JJ, Lee DJ, Mitra A, Frail S, Dale RK, Rocha PP. Extensive co-binding and rapid redistribution of NANOG and GATA6 during emergence of divergent lineages. Nature Communications. 2022;13(1):4257. pmid:35871075
  74. 74. Cembrowski MS. Single-cell transcriptomics as a framework and roadmap for understanding the brain. Journal of Neuroscience Methods. 2019;326:108353. pmid:31351971
  75. 75. Jashnsaz H, Fox ZR, Hughes JJ, Li G, Munsky B, Neuert G. Diverse Cell Stimulation Kinetics Identify Predictive Signal Transduction Models. iScience. 2020;23(10). pmid:33083733
  76. 76. Calia GP, Chen X, Zuckerman B, Weinberger LS. Comparative analysis between single-cell RNA-seq and single-molecule RNA FISH indicates that the pyrimidine nucleobase idoxuridine (IdU) globally amplifies transcriptional noise; 2023. Available from: https://www.biorxiv.org/content/10.1101/2023.03.14.532632v1.
  77. 77. Qiu C, Martin BK, Welsh IC, Daza RM, Le TM, Huang X, et al. A single-cell time-lapse of mouse prenatal development from gastrula to birth. Nature. 2024;626(8001):1084–1093. pmid:38355799
  78. 78. Gonze D, Gérard C, Wacquier B, Woller A, Tosenberger A, Goldbeter A, et al. Modeling-Based Investigation of the Effect of Noise in Cellular Systems. Frontiers in Molecular Biosciences. 2018;5. pmid:29707543
  79. 79. Tkačik G, Gregor T. The many bits of positional information. Development. 2021;148(dev176065). pmid:33526425
  80. 80. Lin YT, Hufton PG, Lee EJ, Potoyan DA. A stochastic and dynamical view of pluripotency in mouse embryonic stem cells. PLoS Computational Biology. 2018;14(2). pmid:29451874
  81. 81. Lin Y, Elowitz M. Central Dogma Goes Digital. Molecular Cell. 2016;61(6):791–792. pmid:26990983
  82. 82. Vandevenne M, Delmarcelle M, Galleni M. RNA Regulatory Networks as a Control of Stochasticity in Biological Systems. Frontiers in Genetics. 2019;10:403. pmid:31134128
  83. 83. Pantazis P, Bollenbach T. Transcription factor kinetics and the emerging asymmetry in the early mammalian embryo. Cell Cycle. 2012;11(11):2055–2058. pmid:22580473
  84. 84. Bezeljak U, Loya H, Kaczmarek B, Saunders TE, Loose M. Stochastic activation and bistability in a Rab GTPase regulatory network. Proceedings of the National Academy of Sciences. 2020;117(12):6540–6549.
  85. 85. Dirk R, Fischer JL, Schardt S, Ankenbrand MJ, Fischer SC. Recognition and reconstruction of cell differentiation patterns with deep learning. PLOS Computational Biology. 2023;19(10):e1011582. pmid:37889897
  86. 86. Greenberg DS, Nonnenmacher M, Macke JH. Automatic Posterior Transformation for Likelihood-Free Inference; 2019. Available from: http://arxiv.org/abs/1905.07488.
  87. 87. Deistler M, Goncalves PJ, Macke JH. Truncated proposals for scalable and hassle-free simulation-based inference; 2022. Available from: http://arxiv.org/abs/2210.04815.
  88. 88. Baker RE, Peña JM, Jayamohan J, Jérusalem A. Mechanistic models versus machine learning, a fight worth fighting for the biological community? Biology Letters. 2018;14(5):20170660. pmid:29769297
  89. 89. Cranmer K, Brehmer J, Louppe G. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences. 2020;117(48):30055–30062. pmid:32471948
  90. 90. Lagergren JH, Nardini JT, Baker RE, Simpson MJ, Flores KB. Biologically-informed neural networks guide mechanistic modeling from sparse experimental data. PLOS Computational Biology. 2020;16(12):e1008462. pmid:33259472
  91. 91. Seyboldt R, Lavoie J, Henry A, Vanaret J, Petkova MD, Gregor T, et al. Latent space of a small genetic network: Geometry of dynamics and information. Proceedings of the National Academy of Sciences. 2022;119(26):e2113651119. pmid:35737842
  92. 92. Perez SM, Sailem H, Baker RE. Efficient Bayesian inference for mechanistic modelling with high-throughput data. PLOS Computational Biology. 2022;18(6):e1010191.
  93. 93. Tolley N, Rodrigues PLC, Gramfort A, Jones SR. Methods and considerations for estimating parameters in biophysically detailed neural models with simulation based inference. PLOS Computational Biology. 2024;20(2):e1011108. pmid:38408099
  94. 94. Hashemi M, Vattikonda AN, Jha J, Sip V, Woodman MM, Bartolomei F, et al. Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks. 2023. pmid:37060871
  95. 95. Stillman NR, Mayor R. Generative models of morphogenesis in developmental biology. Seminars in Cell & Developmental Biology. 2023;147:83–90. pmid:36754751
  96. 96. Schnoerr D, Sanguinetti G, Grima R. Approximation and inference methods for stochastic biochemical kinetics—a tutorial review. Journal of Physics A: Mathematical and Theoretical. 2017;50(9):093001.
  97. 97. Franzin A, Stützle T. A landscape-based analysis of fixed temperature and simulated annealing. European Journal of Operational Research. 2023;304(2):395–410.
  98. 98. Ramirez-Sierra MA, Sokolowski TR. Comparing AI versus Optimization Workflows for Simulation-Based Inference of Spatial-Stochastic Systems; 2024. Available from: http://arxiv.org/abs/2407.10938.
  99. 99. Saiz N, Williams KM, Seshan VE, Hadjantonakis AK. Asynchronous fate decisions by single cells collectively ensure consistent lineage composition in the mouse blastocyst. Nature Communications. 2016;7. pmid:27857135
  100. 100. Gonçalves PJ, Lueckmann JM, Deistler M, Nonnenmacher M, Öcal K, Bassetto G, et al. Training deep neural density estimators to identify mechanistic models of neural dynamics. eLife. 2020;9:e56261. pmid:32940606
  101. 101. Boelts J, Lueckmann JM, Gao R, Macke JH. Flexible and efficient simulation-based inference for models of decision-making. eLife. 2022;11:e77220. pmid:35894305
  102. 102. Meng Y, Moore R, Tao W, Smith ER, Tse JD, Caslini C, et al. GATA6 phosphorylation by Erk1/2 propels exit from pluripotency and commitment to primitive endoderm. Developmental Biology. 2018;436(1):55–65. pmid:29454706
  103. 103. Abranches E, Guedes AMV, Moravec M, Maamar H, Svoboda P, Raj A, et al. Stochastic NANOG fluctuations allow mouse embryonic stem cells to explore pluripotency. Development. 2014;141(14):2770–2779. pmid:25005472
  104. 104. Xenopoulos P, Kang M, Puliafito A, Di Talia S, Hadjantonakis AK. Heterogeneities in Nanog Expression Drive Stable Commitment to Pluripotency in the Mouse Blastocyst. Cell Reports. 2015;10(9):1508–1520. pmid:25753417
  105. 105. Kale HT, Rajpurohit RS, Jana D, Vishnu VV, Srivastava M, Mourya PR, et al. A NANOG-pERK reciprocal regulatory circuit regulates Nanog autoregulation and ERK signaling dynamics. EMBO reports. 2022;23(11):e54421. pmid:36066347
  106. 106. Erdmann T, Howard M, ten Wolde PR. Role of Spatial Averaging in the Precision of Gene Expression Patterns. Physical Review Letters. 2009;103(25):258101. pmid:20366291
  107. 107. Sokolowski TR, Tkačik G. Optimizing information flow in small genetic networks. IV. Spatial coupling. Physical Review E. 2015;91(6):062710. pmid:26172739
  108. 108. Fancher S, Mugler A. Fundamental Limits to Collective Concentration Sensing in Cell Populations. Physical Review Letters. 2017;118(7):078101. pmid:28256844
  109. 109. Stanoev A, Koseska A. Robust cell identity specifications through transitions in the collective state of growing developmental systems. Current Opinion in Systems Biology. 2022;31:100437.
  110. 110. Gautier A, Gauron C, Volovitch M, Bensimon D, Jullien L, Vriz S. How to control proteins with light in living systems. Nature Chemical Biology. 2014;10(7):533–541. pmid:24937071
  111. 111. Perkins ML, Benzinger D, Arcak M, Khammash M. Cell-in-the-loop pattern formation with optogenetically emulated cell-to-cell signaling. Nature Communications. 2020;11(1):1355. pmid:32170129
  112. 112. Mitra ED, Hlavacek WS. Parameter estimation and uncertainty quantification for systems biology models. Current Opinion in Systems Biology. 2019;18:9–18. pmid:32719822
  113. 113. Kim SH, Kim MO, Cho YY, Yao K, Kim DJ, Jeong CH, et al. ERK1 phosphorylates Nanog to regulate protein stability and stem cell self-renewal. Stem Cell Research. 2014;13(1):1–11. pmid:24793005
  114. 114. Deathridge J, Antolović V, Parsons M, Chubb JR. Live imaging of ERK signalling dynamics in differentiating mouse embryonic stem cells. Development. 2019;146(12):dev172940. pmid:31064783
  115. 115. Müller E, Wang W, Qiao W, Bornhäuser M, Zandstra PW, Werner C, et al. Distinguishing autocrine and paracrine signals in hematopoietic stem cell culture using a biofunctional microcavity platform. Scientific Reports. 2016;6(1):31951. pmid:27535453
  116. 116. Rojas V, Larrondo LF. Coupling Cell Communication and Optogenetics: Implementation of a Light-Inducible Intercellular System in Yeast. ACS Synthetic Biology. 2023;12(1):71–82. pmid:36534043
  117. 117. De Mot L, Gonze D, Bessonnard S, Chazaud C, Goldbeter A, Dupont G. Cell Fate Specification Based on Tristability in the Inner Cell Mass of Mouse Blastocysts. Biophysical Journal. 2016;110(3):710–722. pmid:26840735
  118. 118. Schardt S, Fischer SC. Adjusting the range of cell–cell communication enables fine-tuning of cell fate patterns from checkerboard to engulfing. Journal of Mathematical Biology. 2023;87(4):54. pmid:37679573
  119. 119. Forsyth JE, Al-Anbaki AH, Fuente Rdl, Modare N, Perez-Cortes D, Rivera I, et al. IVEN: A quantitative tool to describe 3D cell position and neighbourhood reveals architectural changes in FGF4-treated preimplantation embryos. PLOS Biology. 2021;19(7):e3001345. pmid:34310594
  120. 120. Paulsson J, Berg OG, Ehrenberg M. Stochastic focusing: Fluctuation-enhanced sensitivity of intracellular regulation. Proceedings of the National Academy of Sciences. 2000;97(13):7148–7153. pmid:10852944
  121. 121. Swain PS, Elowitz MB, Siggia ED. Intrinsic and extrinsic contributions to stochasticity in gene expression. Proceedings of the National Academy of Sciences. 2002;99(20):12795–12800. pmid:12237400
  122. 122. Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010;467(7312):167–173. pmid:20829787
  123. 123. Tsimring LS. Noise in Biology. Reports on progress in physics Physical Society (Great Britain). 2014;77(2):026601. pmid:24444693
  124. 124. Cuesta FA, Guerberoff G, Rojo AL. Bernoulli and binomial proliferation on evolutionary graphs. Journal of Theoretical Biology. 2022;534:110942. pmid:34717934
  125. 125. Benzinger D, Ovinnikov S, Khammash M. Synthetic gene networks recapitulate dynamic signal decoding and differential gene expression. Cell Systems. 2022;13(5):353–364.e6. pmid:35298924
  126. 126. Frei T, Chang CH, Filo M, Arampatzis A, Khammash M. A genetic mammalian proportional–integral feedback control circuit for robust and precise gene regulation. Proceedings of the National Academy of Sciences. 2022;119(24):e2122132119. pmid:35687671
  127. 127. Briat C, Khammash M. Noise in Biomolecular Systems: Modeling, Analysis, and Control Implications. Annual Review of Control, Robotics, and Autonomous Systems. 2023;6(1):283–311.
  128. 128. Raina D. FGF4 drives intermittent oscillations of ERK activity in mouse embryonic stem cells [PhD Thesis]. Technische Universität Dortmund; 2021. Available from: https://eldorado.tu-dortmund.de/handle/2003/40484.
  129. 129. Munsky B, Neuert G, van Oudenaarden A. Using Gene Expression Noise to Understand Gene Regulation. Science. 2012;336(6078):183–187. pmid:22499939
  130. 130. Aguilera LU, Raymond W, Fox ZR, May M, Djokic E, Morisaki T, et al. Computational design and interpretation of single-RNA translation experiments. PLOS Computational Biology. 2019;15(10):e1007425. pmid:31618265
  131. 131. Maini PK, Baker RE. Developmental Biology: Mathematical Modelling of Development. In: John Wiley & Sons, Ltd, editor. eLS. Chichester, UK: John Wiley & Sons, Ltd; 2012. p. a0001067. Available from: http://doi.wiley.com/10.1002/9780470015902.a0001067.
  132. 132. Massonis G, Villaverde AF, Banga JR. Distilling identifiable and interpretable dynamic models from biological data; 2023. Available from: https://www.biorxiv.org/content/10.1101/2023.03.13.532340v2.
  133. 133. Pessoa P, Schweiger M, Sgouralis I, Pressé S. Accelerating likelihood calculations for biochemical network discovery. Biophysical Journal. 2023;122(3):539a.
  134. 134. Pang Y, Liang J. Probability landscape of a stochastic model of gene expression in single cells through exact solution of chemical master equation. Biophysical Journal. 2023;122(3):539a.
  135. 135. Wang S, Fan K, Luo N, Cao Y, Wu F, Zhang C, et al. Massive computational acceleration by using neural networks to emulate mechanism-based biological models. Nature Communications. 2019;10(1):4354. pmid:31554788
  136. 136. Bonnaffoux A, Herbach U, Richard A, Guillemin A, Gonin-Giraud S, Gros PA, et al. WASABI: a dynamic iterative framework for gene regulatory network inference. BMC Bioinformatics. 2019;20(1):220. pmid:31046682
  137. 137. Tejero-Cantero A, Boelts J, Deistler M, Lueckmann JM, Durkan C, Gonçalves PJ, et al. sbi: A toolkit for simulation-based inference. Journal of Open Source Software. 2020;5(52):2505.
  138. 138. Urban DJ, Roth BL. DREADDs (Designer Receptors Exclusively Activated by Designer Drugs): Chemogenetic Tools with Therapeutic Utility. Annual Review of Pharmacology and Toxicology. 2015;55(Volume 55, 2015):399–417. pmid:25292433
  139. 139. Nies VJM, Sancar G, Liu W, van Zutphen T, Struik D, Yu RT, et al. Fibroblast Growth Factor Signaling in Metabolic Regulation. Frontiers in Endocrinology. 2016;6. pmid:26834701
  140. 140. Karl K, Del Piccolo N, Light T, Roy T, Dudeja P, Ursachi VC, et al. Ligand bias underlies differential signaling of multiple FGFs via FGFR1. eLife. 2024;12:RP88144. pmid:38568193
  141. 141. Torregrosa G, Garcia-Ojalvo J. Mechanistic models of cell-fate transitions from single-cell data. Current Opinion in Systems Biology. 2021;26:79–86.
  142. 142. Andersen T, Newman R, Otter T. Shape Homeostasis in Virtual Embryos. Artificial Life. 2009;15(2):161–183. pmid:19199386
  143. 143. Shahbazi MN. Mechanisms of human embryo development: from cell fate to tissue shape and back. Development. 2020;147(14):dev190629. pmid:32680920
  144. 144. Dingeldein L, Silva-Sánchez D, Dimprima E, Grigorieff N, Covino R, Cossio P. Amortized identification of biomolecular conformations in Cryo-EM using simulation-based inference. Biophysical Journal. 2024;123(3):282a.
  145. 145. Beck M, Covino R, Hänelt I, Müller-McNicoll M. Understanding the cell: Future views of structural biology. Cell. 2024;187(3):545–562. pmid:38306981
  146. 146. Deistler M, Macke JH, Gonçalves PJ. Energy-efficient network activity from disparate circuit parameters. Proceedings of the National Academy of Sciences. 2022;119(44):e2207632119. pmid:36279461
  147. 147. Kaiser J, Stock R, Müller E, Schemmel J, Schmitt S. Simulation-based Inference for Model Parameterization on Analog Neuromorphic Hardware; 2023. Available from: http://arxiv.org/abs/2303.16056.
  148. 148. Cockburn K, Rossant J. Making the blastocyst: lessons from the mouse. The Journal of Clinical Investigation. 2010;120(4):995–1003. pmid:20364097
  149. 149. Maamar H, Raj A, Dubnau D. Noise in Gene Expression Determines Cell Fate in Bacillus subtilis. Science. 2007;317(5837):526–529. pmid:17569828
  150. 150. Mugler A, Kittisopikul M, Hayden L, Liu J, Wiggins CH, Süel GM, et al. Noise Expands the Response Range of the Bacillus subtilis Competence Circuit. PLOS Computational Biology. 2016;12(3):e1004793. pmid:27003682
  151. 151. Wang Q, Zhang X, Yang Y. Effects of noise and harmonic excitation on the growth of Bacillus subtilis biofilm. Biosystems. 2021;201:104329. pmid:33359276
  152. 152. Gruenheit N, Parkinson K, Brimson CA, Kuwana S, Johnson EJ, Nagayama K, et al. Cell Cycle Heterogeneity Can Generate Robust Cell Type Proportioning. Developmental Cell. 2018;47(4):494–508.e4. pmid:30473004
  153. 153. Kar S, Baumann WT, Paul MR, Tyson JJ. Exploring the roles of noise in the eukaryotic cell cycle. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(16):6471–6476. pmid:19246388
  154. 154. Ng KK, Yui MA, Mehta A, Siu S, Irwin B, Pease S, et al. A stochastic epigenetic switch controls the dynamics of T-cell lineage commitment. eLife. 2018;7:e37851. pmid:30457103
  155. 155. Sherman MS, Lorenz K, Lanier MH, Cohen BA. Cell-to-Cell Variability in the Propensity to Transcribe Explains Correlated Fluctuations in Gene Expression. Cell Systems. 2015;1(5):315–325. pmid:26623441
  156. 156. Justman QA. An Explicit Source for Extrinsic Noise. Cell Systems. 2015;1(5):308–309. pmid:27136238
  157. 157. de Jong TV, Moshkin YM, Guryev V. Gene expression variability: the other dimension in transcriptome analysis. Physiological Genomics. 2019;51(5):145–158. pmid:30875273
  158. 158. Bartz J, Jung H, Wasiluk K, Zhang L, Dong X. Progress in Discovering Transcriptional Noise in Aging. International Journal of Molecular Sciences. 2023;24(4):3701. pmid:36835113
  159. 159. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry. 1977;81(25):2340–2361.
  160. 160. Elf J, Ehrenberg M. Fast Evaluation of Fluctuations in Biochemical Networks With the Linear Noise Approximation. Genome Research. 2003;13(11):2475–2484. pmid:14597656
  161. Munsky B, Khammash M. A multiple time interval finite state projection algorithm for the solution to the chemical master equation. Journal of Computational Physics. 2007;226(1):818–835.
  162. Anderson DF. A modified next reaction method for simulating chemical systems with time dependent propensities and delays. The Journal of Chemical Physics. 2007;127(21):214107. pmid:18067349
  163. Gillespie DT, Hellander A, Petzold LR. Perspective: Stochastic algorithms for chemical kinetics. The Journal of Chemical Physics. 2013;138(17):170901. pmid:23656106
  164. Simoni G, Reali F, Priami C, Marchetti L. Stochastic simulation algorithms for computational systems biology: Exact, approximate, and hybrid methods. Wiley Interdisciplinary Reviews Systems Biology and Medicine. 2019;11(6):e1459. pmid:31260191
  165. Erban R, Chapman SJ. Stochastic Modelling of Reaction–Diffusion Processes. Cambridge Texts in Applied Mathematics. Cambridge: Cambridge University Press; 2020. Available from: https://www.cambridge.org/core/books/stochastic-modelling-of-reactiondiffusion-processes/9BB8B46DE0B898FC019AFBEA95608FAE.
  166. Gupta A, Schwab C, Khammash M. DeepCME: A deep learning framework for computing solution statistics of the chemical master equation. PLOS Computational Biology. 2021;17(12):e1009623. pmid:34879062
  167. Coulier A, Hellander S, Hellander A. A multiscale compartment-based model of stochastic gene regulatory networks using hitting-time analysis. The Journal of Chemical Physics. 2021;154(18):184105. pmid:34241042
  168. Gillespie DT. A rigorous derivation of the chemical master equation. Physica A: Statistical Mechanics and its Applications. 1992;188(1):404–425.
  169. Andrews SS, Addy NJ, Brent R, Arkin AP. Detailed Simulations of Cell Biology with Smoldyn 2.1. PLOS Computational Biology. 2010;6(3):e1000705. pmid:20300644
  170. Gupta S, Czech J, Kuczewski R, Bartol TM, Sejnowski TJ, Lee REC, et al. Spatial Stochastic Modeling with MCell and CellBlender; 2018. Available from: http://arxiv.org/abs/1810.00499.
  171. Engblom S. Stochastic Simulation of Pattern Formation in Growing Tissue: A Multilevel Approach. Bulletin of Mathematical Biology. 2019;81(8):3010–3023. pmid:29926381
  172. Sokolowski TR, Paijmans J, Bossen L, Miedema T, Wehrens M, Becker NB, et al. eGFRD in all dimensions. The Journal of Chemical Physics. 2019;150(5):054108. pmid:30736681
  173. Hellander S, Hellander A, Petzold L. Reaction-diffusion master equation in the microscopic limit. Physical Review E. 2012;85(4):042901. pmid:22680526
  174. Barrows D, Ilie S. Parameter estimation for the reaction–diffusion master equation. AIP Advances. 2023;13(6):065318.
  175. Fange D, Berg OG, Sjöberg P, Elf J. Stochastic reaction-diffusion kinetics in the microscopic limit. Proceedings of the National Academy of Sciences. 2010;107(46):19820–19825. pmid:21041672
  176. Plusa B, Piliszek A, Frankenberg S, Artus J, Hadjantonakis AK. Distinct sequential cell behaviours direct primitive endoderm formation in the mouse blastocyst. Development. 2008;135(18):3081–3091. pmid:18725515
  177. Molotkov A, Soriano P. Distinct mechanisms for PDGF and FGF signaling in primitive endoderm development. Developmental Biology. 2018;442(1):155–161. pmid:30026121
  178. Simon CS, Rahman S, Raina D, Schröter C, Hadjantonakis AK. Live Visualization of ERK Activity in the Mouse Blastocyst Reveals Lineage-Specific Signaling Dynamics. Developmental Cell. 2020;55(3):341–353.e5. pmid:33091370
  179. Liu X, Yao Y, Ding H, Han C, Chen Y, Zhang Y, et al. USP21 deubiquitylates Nanog to regulate protein stability and stem cell pluripotency. Signal Transduction and Targeted Therapy. 2016;1(1):1–10. pmid:29263902
  180. Ornitz DM, Itoh N. The Fibroblast Growth Factor signaling pathway. WIREs Developmental Biology. 2015;4(3):215–266. pmid:25772309
  181. Ornitz DM, Itoh N. New developments in the biology of fibroblast growth factors. WIREs Mechanisms of Disease. 2022;14(4):e1549. pmid:35142107
  182. Lavoie H, Gagnon J, Therrien M. ERK signalling: a master regulator of cell behaviour, life and fate. Nature Reviews Molecular Cell Biology. 2020;21(10):607–632. pmid:32576977
  183. Aiken CEM, Swoboda PPL, Skepper JN, Johnson MH. The direct measurement of embryogenic volume and nucleo-cytoplasmic ratio during mouse pre-implantation development. Reproduction. 2004;128(5):527–535. pmid:15509698
  184. van Zon JS, Morelli MJ, Tănase-Nicola S, ten Wolde PR. Diffusion of Transcription Factors Can Drastically Enhance the Noise in Gene Expression. Biophysical Journal. 2006;91(12):4350–4367. pmid:17012327
  185. Vijaykumar A, Bolhuis PG, ten Wolde PR. The intrinsic rate constants in diffusion-influenced reactions. Faraday Discussions. 2017;195(0):421–441.
  186. Milo R, Jorgensen P, Moran U, Weber G, Springer M. BioNumbers—the database of key numbers in molecular and cell biology. Nucleic Acids Research. 2010;38(suppl_1):D750–D753. pmid:19854939
  187. Coulier A, Singh P, Sturrock M, Hellander A. Systematic comparison of modeling fidelity levels and parameter inference settings applied to negative feedback gene regulation. PLOS Computational Biology. 2022;18(12):e1010683. pmid:36520957
  188. Frank SA. Input-output relations in biological systems: measurement, information and the Hill equation. Biology Direct. 2013;8(1):31. pmid:24308849
  189. Chen M, Li F, Wang S, Cao Y. Stochastic modeling and simulation of reaction-diffusion system with Hill function dynamics. BMC Systems Biology. 2017;11(3):21. pmid:28361679
  190. Bottani S, Veitia RA. Hill function-based models of transcriptional switches: impact of specific, nonspecific, functional and nonfunctional binding. Biological Reviews. 2017;92(2):953–963. pmid:27061969
  191. Feigelman J. Stochastic and deterministic methods for the analysis of Nanog dynamics in mouse embryonic stem cells [PhD Thesis]. Technische Universität München; 2016. Available from: https://mediatum.ub.tum.de/1279519.
  192. Skinner SO, Xu H, Nagarkar-Jaiswal S, Freire PR, Zwaka TP, Golding I. Single-cell analysis of transcription kinetics across the cell cycle. eLife. 2016;5:e12175. pmid:26824388
  193. Tan FE, Elowitz MB. Brf1 posttranscriptionally regulates pluripotency and differentiation responses downstream of Erk MAP kinase. Proceedings of the National Academy of Sciences. 2014;111(17):E1740–E1748. pmid:24733888
  194. Elatmani H, Dormoy-Raclet V, Dubus P, Dautry F, Chazaud C, Jacquemin-Sablon H. The RNA-Binding Protein Unr Prevents Mouse Embryonic Stem Cells Differentiation Toward the Primitive Endoderm Lineage. Stem Cells. 2011;29(10):1504–1516. pmid:21954113
  195. Chitforoushzadeh Z, Ye Z, Sheng Z, LaRue S, Fry RC, Lauffenburger DA, et al. TNF-insulin crosstalk at the transcription factor GATA6 is revealed by a model that links signaling and transcriptomic data tensors. Science Signaling. 2016;9(431):ra59. pmid:27273097
  196. Fujioka A, Terai K, Itoh RE, Aoki K, Nakamura T, Kuroda S, et al. Dynamics of the Ras/ERK MAPK Cascade as Monitored by Fluorescent Probes. Journal of Biological Chemistry. 2006;281(13):8917–8926. pmid:16418172
  197. Tian T, Song J. Mathematical Modelling of the MAP Kinase Pathway Using Proteomic Datasets. PLOS ONE. 2012;7(8):e42230. pmid:22905119
  198. Aoki K, Takahashi K, Kaizu K, Matsuda M. A Quantitative Model of ERK MAP Kinase Phosphorylation in Crowded Media. Scientific Reports. 2013;3(1):1541. pmid:23528948
  199. Buscà R, Pouysségur J, Lenormand P. ERK1 and ERK2 Map Kinases: Specific Roles or Functional Redundancy? Frontiers in Cell and Developmental Biology. 2016;4. pmid:27376062
  200. Saba-El-Leil MK, Frémin C, Meloche S. Redundancy in the World of MAP Kinases: All for One. Frontiers in Cell and Developmental Biology. 2016;4. pmid:27446918
  201. Zoller B, Little SC, Gregor T. Diverse Spatial Expression Patterns Emerge from Unified Kinetics of Transcriptional Bursting. Cell. 2018;175(3):835–847.e25. pmid:30340044
  202. Abranches E, Bekman E, Henrique D. Generation and Characterization of a Novel Mouse Embryonic Stem Cell Line with a Dynamic Reporter of Nanog Expression. PLOS ONE. 2013;8(3):e59928. pmid:23527287
  203. Wu J, Tzanakakis ES. Distinct Allelic Patterns of Nanog Expression Impart Embryonic Stem Cell Population Heterogeneity. PLOS Computational Biology. 2013;9(7):e1003140. pmid:23874182
  204. Bates LE, Alves MRP, Silva JCR. Auxin-degron system identifies immediate mechanisms of OCT4. Stem Cell Reports. 2021;16(7):1818–1831. pmid:34143975
  205. Ding I, Peterson AM. Half-life modeling of basic fibroblast growth factor released from growth factor-eluting polyelectrolyte multilayers. Scientific Reports. 2021;11(1):9808. pmid:33963247
  206. Daneshpour H, van den Bersselaar P, Chao CH, Fazzio TG, Youk H. Macroscopic quorum sensing sustains differentiating embryonic stem cells. Nature Chemical Biology. 2023;19(5):596–606. pmid:36635563
  207. Sarabipour S, Hristova K. Mechanism of FGF receptor dimerization and activation. Nature Communications. 2016;7(1):10262. pmid:26725515
  208. Lanner F, Rossant J. The role of FGF/Erk signaling in pluripotent cells. Development. 2010;137(20):3351–3360. pmid:20876656
  209. Azami T, Bassalert C, Allègre N, Valverde Estrella L, Pouchin P, Ema M, et al. Regulation of the ERK signalling pathway in the developing mouse blastocyst. Development. 2019;146(14):dev177139. pmid:31320324
  210. Grebenkov DS, Metzler R, Oshanin G. Full distribution of first exit times in the narrow escape problem. New Journal of Physics. 2019;21(12):122001.
  211. Grebenkov DS, Metzler R, Oshanin G. Distribution of first-reaction times with target regions on boundaries of shell-like domains. New Journal of Physics. 2021;23(12):123049.
  212. Prescott TP, Warne DJ, Baker RE. Efficient multifidelity likelihood-free Bayesian inference with adaptive computational resource allocation. Journal of Computational Physics. 2024;496:112577.