Stochasticity and the Molecular Mechanisms of Induced Pluripotency

The generation of induced pluripotent stem cells from adult somatic cells by ectopic expression of key transcription factors holds significant medical promise. However, current techniques for inducing pluripotency rely on viral infection and are therefore not, at present, viable within a clinical setting. Thus, there is now a need to better understand the molecular basis of stem cell pluripotency and lineage specification in order to investigate alternative methods to induce pluripotency for clinical application. However, the complexity of the underlying molecular circuitry makes this a conceptually difficult task. In order to address these issues, we considered a computational model of transcriptional control of cell fate specification. The model comprises two mutually interacting sub-circuits: a central pluripotency circuit consisting of interactions between stem-cell specific transcription factors OCT4, SOX2 and NANOG coupled to a differentiation circuit consisting of interactions between lineage-specifying master genes. The molecular switches which arise from feedback loops within these circuits give rise to a well-defined sequence of successive gene restrictions corresponding to a controlled differentiation cascade in response to environmental stimuli. Furthermore, we found that this differentiation cascade is strongly unidirectional: once silenced, core transcription factors cannot easily be reactivated. In the context of induced pluripotency, this indicates that differentiated cells are robustly resistant to reprogramming to a more primitive state. However, our model suggests that under certain circumstances, amplification of low-level fluctuations in transcriptional status (transcriptional “noise”) may be sufficient to trigger reactivation of the core pluripotency switch and reprogramming to a pluripotent state. This interpretation offers an explanation of a number of experimental observations concerning the molecular mechanisms of cellular reprogramming by defined factors and suggests a role for stochasticity in reprogramming of somatic cells to pluripotency.

The TGF-β signaling pathway is central to control of chondrogenesis and adipogenesis via SOX9 and PPAR-γ.
For example, TGF-β stimulates chondrogenesis through SMAD3 upregulation of SOX9 [23] while simultaneously inhibiting adipogenesis by repressing C/EBP transactivation function also via SMAD3 [24]. However full details of how SOX9 and PPAR-γ expression are balanced during chondrogenesis and adipogenesis are currently lacking.
Therefore, in order to complete the network we reasoned that SOX9 increases sensitivity to TGF-β signaling, thus introducing autoactivation of SOX9 via TGF-β and SMAD3 and cross-repression of PPAR-γ by SOX9 via TGF-β, SMAD3 and C/EBP.
Taken together these interactions form the regulatory network given in Fig. 1 of the main text. This network may be coarse-grained by observing that each master-gene either directly or indirectly autoactivates its own expression, while cross-repressing those of the other two. We note here that since positive feedback loops can be destabilizing, it is likely that alternative negative feedback mechanisms are also present to balance the autoactivation loops. So, for example, both in mice and rats RUNX2 directly down-regulates its own promoter activity [25] indicating that there is a competition between auto-upregulation (via BMP2 and TAZ ) and direct auto-downregulation of RUNX2.
We anticipate that analogous negative feedback controls are present during chondrogenesis and adipogenesis and act to stabilize SOX9 and PPAR-γ expression (for more details see model derivation below).

Derivation of Model Equations
Here we derive the computational model based upon the integrated transcriptional logic given in Fig. 2 in the main text. The core of our model is derived from the following stoichiometric equations which describe transcription factor binding to gene promoters: In total there are 32 stoichiometric equations, and we have used an index notation for simplicity of nomenclature: P i for i = 1, 2, 3 represent the three pluripotency genes (PGs), with P 1 = OCT4, P 2 = SOX2 and P 3 = NANOG.
Similarly, L i , for i = 1, 2, 3 represent the three lineage-specifying master genes (LSMGs) with L 1 = RUNX2, L 2 = SOX9 and L 3 = PPAR-γ. We assume that OCT4, SOX2 and NANOG proteins associate to form an OCT4 -SOX2 complex, represented by C 1 , and an OCT4 -SOX2 -NANOG complex, represented by C 2 . Free binding sites in the promoters of the PGs are denoted F Pi , while free binding sites in the promoters of the LSMGs are denoted F Li .
Bound sites in the promoters of the PGs are denoted B PiY , where Y ∈ {C j , L j } depending on whether the site is bound by one of the pluripotency complexes or a LSMG respectively. Similarly, bound sites in the promoters of the LSMGs are denoted B LiY . The parameters k C1 , k C2 and are the equilibrium dissociation constants for the respective reactions. Here we have assumed that the LSMGs bind co-operatively to each others promoters with binding site affinity 2. This assumption has been taken in other computational models of transcription factor binding [26], and ensures that the resulting feedback loops are nonlinear, yet also keeps the resulting mathematics transparent. In order to determine the effect of binding site affinity on model solutions, we conducted extensive numerical simulations using a range of other binding affinities.
We found that the qualitative conclusions of the model are not significantly affected by variations in transcriptional binding site affinity.
Assuming the reactions are in quasi-steady state gives the equations: We can now determine T Pi the total number of binding sites in the promoter regions of each of the PGs: Total Sites = Free Sites + Bound Sites (1) and T Li the total number of binding sites in the promoter regions of each of the LSMGs: Total Sites = Free Sites + Bound Sites (5) These equations may be rearranged to give the total number of free sites in the promoters of the PGs and LSMGs: The extended transcriptional network we have derived suggests that competition between signaling pathways ensures that generically each master-gene regulates its own promoter in a biphasic manner: at low nuclear concentrations of transcription factor gene expression is activated upon binding; while at high nuclear concentrations, gene expression is repressed by transcriptional binding. This biphasic relationship between transcription factor concentration and gene expression ensures that autoactivating feedback loops are tightly controlled, and has been experimentally observed in mesenchymal systems. For example, the trans-acting factor protein AP2 regulates both cartilagederived retinoic acid-sensitive protein (CD-RAP ) and insulin-like growth factor binding protein-5 Gene (IGFBP5 ) expression in a biphasic manner [27,28], possibly by transcriptional self-interference in which AP2 protein molecules interact with putative co-factors to interfere with AP2 regulation of gene expression [29].
In order to capture the essential features of biphasic regulation in a tractable manner, we assumed that the protein product of each LSMG is inactivated by binding to putative co-factor(s), and that binding to co-factors is increased at high concentrations possibly due to co-regulation of cofactor expression or post-transcriptional modification of LSMG products. Letting [L iA ] denote the nuclear concentration of the activating form of L i we make the phenomenological hypothesis: where m is a constant in the range 0 ≤ m ≤ 1. Note that this law ensures that We assume that the rate of production of the protein product of each gene is proportional to the number of binding sites in its promoter bound by an activator. Thus, Rate of Production of Here we have assumed that the OCT4 -SOX2 and OCT4 -SOX2 -NANOG complexes are always in an activating form. Substituting Eqs. (9-10) into the above equations, and assuming that the protein products decay at constant rates b Pi and b Li we obtain the following set of nonlinear ordinary differential equations to describe the concentrations of the various species Here A i and B i are constants of proportionality which incorporate T Pi and T Li . Thus we have adopted Michaelis-Menten type kinetics with generalized Hill coefficients [30].
In order to clarify analysis, we make the following biologically reasonable simplifying assumptions: (1) Binding of the OCT4 -SOX2 and OCT4 -SOX2 -NANOG complexes to the promoters of the PGs and LSMGs occurs at the same rate. Thus, we take (2) Binding of the protein-products of all LSMGs to the promoters of the PGs occurs at the same rate for all PGs. Thus, we Binding of the protein product of all LSMGs to their own promoter occurs at the same rate. Thus, we take K LiLi = K L for all i. (4) Binding of the protein product of LSMG i to the promoter of LSMG j = i occurs at a base rate independent of i and j (we shall consider how external stimuli affect this base binding rate in more detail below). Thus, we take K LiLj = K i LL for all i, j = i. (5) The constants of proportionality relating rate of production of the protein product to base rate of activator binding to the promoter is the same for all PGs and the same for all LSMGs. Thus, we take A i = A and B i K L =k 2i for all i (we shall consider how external stimuli affect these base binding rates in more detail below). (6) All protein products decay at the same constant rate. Thus, we take b Pi and b Li = β for all i.
Additionally, we nondimensionalise by taking the following scalings: where hatted variables represent nondimensional quantities.
Thus, we get where , and we have dropped hats for notational simplicity.
In order to examine cellular differentiation, we assume that the external environment contains specific elements Similarly, since RA suppresses OCT4, SOX2 and NANOG expression [32], the three lineage-specific stimuli all also suppress expression of the core PGs.
Thus, we assumed that the rate of binding of activators to PG promoters is inversely proportional to the external concentration of nonspecific differentiation stimuli; while the rate of binding activators to LSMG promoters is proportional to the external concentration of specific and nonspecific differentiation stimuli. Consequently, we allowedk 1i ,k 2i andk i LL to depend upon external stimuli and assumed that all other model parameters were constant. For the sake of simplicity, we assumed that stimulatory factors are not produced or degraded, but rather are present in the extracellular environment at a known concentration, as would be the case during an in vitro experiment. Specifically, we tookk Here k 0 , k 1i , k 2 , k 3 and k LL are constants and s i represents the combination of growth factors which stimulates differentiation along the ith lineage. For simplicity we have assumed that stimulation of the ith lineage by the jth stimulus occurs at the same rate for all i = j, and that the PGs are equally suppressed by all three stimuli.
The model has the following 11 free parameters: k 1i for i = 1, 2, 3 (k 11 = k 12 ) represent the relative base rates of binding of the OCT4 -SOX2 and OCT4 -SOX2 -NANOG complexes to the PG promoters; k 0 represents the sensitivity the PGs to exogenous inhibition with RA; K P L represents the relative rate of binding of the LSMGs to the promoters of the PGs; k 2 represents the sensitivity the LSMGs to exogenous activation by lineage-specific stimulus; k 3 represents the relative sensitivity of the LSMGs to nonspecific stimulus; k LC1 represents the relative rate of binding of the OCT4 -SOX2 complex to the promoters of the LSMGs; k LC2 represents the relative rate of binding of the OCT4 -SOX2 -NANOG complex to the promoters of the LSMGs; K LL represents the relative rate of binding of LSMG i to the promoters of LSMG j = i; b represents the relative half-life of the protein products of the LSMGs and PGs; and 0 < m ≤ 1 gives the relative degree of transcriptional self-interference for each of the LSMGs and characterizes the biphasic form of the autoactivating feedback loops.

Mathematical Details
We include here a mathematical derivation of the existence and stability of steady-state solutions to a simplified system which captures the essential secondary bifurcation behavior of Eqs. (21)- (22). In particular, we assume that the pluripotency switch is off ([P i ] = 0 for all i) and consider the system: where a is a generic differentiation stimulus. We shall consider the general case of N distinct lineages (i = 1 . . . N ) and we have simplified notation by denoting [L i ] by x i . In general 0 ≤ m ≤ 1. We shall discuss solutions in the limiting case m = 1 and indicate how this limiting case illuminates the more general solution structure.
The equations of interest are: In general Eqs. (24) may possess steady-state solutions in which genes are transcribed at various unequal (but constant) rates. Let α i be the steady-state expression level of gene i where α i is a solution to Upon rearrangement this gives α i = 0 or for all i = j. Thus α i and α j satisfy the same quadratic equation. Therefore, the set of active genes can be partitioned into at most 2 distinct blocks, with transcriptional levels α and β respectively, where from Eq. (25) Thus generic steady-state solutions to Eqs. (24) have the form: x * = (α, α, . . . , α where p + q + r = N . The Jacobian J N (x * ) of the system at x * is the N × N matrix with block-diagonal form: where J pq is the (p + q) × (p + q) Jacobian of the system projected onto X ∼ = R p+q , the subspace spanned by {e i : i = 1, 2, . . . , p + q}, and J r is the r × r Jacobian of the system projected onto Y ∼ = R r , the subspace spanned by {e i : i = p + q + 1, p + q + 2, . . . , N }, where {e i } is the standard basis of R N . The eigenvalues of J N (x * ) are therefore the eigenvalues of J pq along with those of J r .
Since x = 0 for all x ∈ Y , the Jacobian J r is diagonal: where p + r = N .
Notice that the Jacobian at x * still has the diagonal form given in Eq. (27) with q ≡ 0. Thus the stability of all remaining steady-state solutions in the N dimensional system can be determined by consideration of homogeneous all-on solutions projected onto X for varying dimension p. In other words, once a gene has been switched off we can (as far as stability calculations are concerned) consider it removed from the system, and examine the behavior of the system in N − 1 dimensions.
Consider now the general case in which p genes are expressed at the same level. Steady-state solutions then have the form x * = (α, α . . . , α), where α is a solution to: Thus, For p ≥ 2, both branches of Eq. (32) are non-negative and real when .
Since a > 0, we also require that b ≤ 1 2 for existence. For p = 1, we require only b ≤ 1 2 for existence.
We turn our attention now to the stability of the homogeneous steady-state solution x * which is (linearly) stable if Re(λ) < 0 for all eigenvalues λ of the Jacobian J p (x * ). Since matrix representation of elements of the symmetric group S p commute with the Jacobian of the system projected onto X, the steady-state x * = (α, α, . . . , α) has only 2 distinct eigenvalues: λ 1 and λ 2 , where λ 2 has multiplicity p − 1, and Upon differentiating and simplifying using Eq. (31) we get: For a, b, α and p positive, λ 1 < λ 2 so x * is stable for λ 2 < 0. Thus, both branches of solutions in Eq. (32) are unstable for a ≥ 1.
For a < 1, x * is stable when Substituting the lower branch of solutions from Eq. (32) into Eq. (37) and rearranging gives stability for the lower branch when So the lower branch of solutions is unstable for all real positive a. Similarly, the upper branch of solutions is stable with bifurcation occurring at equality, when The positive root of this equation is always negative. Since a is necessarily positive there is only one critical value at which the upper branch of solutions loses stability: Solutions of the form given in Eq. (30) are therefore stable for 0 ≤ a < a p , and unstable for a ≥ a p .
Notice that these solutions come in 'groups': 3 solutions have only one active gene (p = 1); 3 solutions have 2 active genes (p = 2), and one solution has all three genes active (p = 3). This grouping of equilibria results from the symmetry of Eqs. (24). These 'groups' of equilibria are formally known as orbits. In this system, the orbit of a fixed-point solution x * p ∈ R N is the set Thus, fixed-point solutions to Eqs. (24) come in S N -orbits. The symmetry structure of Eqs. (24) ensures that all fixed-point solutions in the same orbit have the same stability. It is therefore convenient to consider orbits of steady-state solutions rather than individual solutions per se. We say that the orbit ∆(x * p ) is stable when x * p is stable. In the three dimensional case, the system given by Eqs. (24) has 3 distinct orbits of steady-state solutions: The first orbit represents the set of steady-state solutions in which one gene is active; the second orbit, the set of steady-state solutions in which 2 genes are active; and the third orbit the steady-state solution in which all three genes are active. Notice that these orbits order the steady-state solutions into a well-defined hierarchy: ∆ i is the set of solutions in which i genes are active.
Consider a cell in which all N master-genes are concurrently homogeneously expressed (that is, a steady-state solution in ∆ N ). This configuration is stable as long as 0 ≤ a < a N . However, as a is increased to a N , this all-on configuration loses stability and the cell is forced to adopt an alternative genetic state. Extensive numerical simulations suggest that this system does not admit any other more exotic solutions such as limit cycles, or chaotic trajectories. Thus, the cell has to adopt an alternative steady-state genetic configuration. S N -equivariant bifurcation theory suggests that heterogeneous configurations do not bifurcate continuously from this homogeneous state, but rather that the system exhibits a 'jump' bifurcation, and transitions rapidly to a disparate configuration [33,34]. Since a i < a i−1 for all 3 ≤ i ≤ N , at the bifurcation point a = a N , all orbits with less than N genes active are still stable. Thus, at bifurcation the system jumps to a configuration in an orbit with at most N − 1 genes active, and the cell is forced to move down a level in the orbit hierarchy. Heteroclinic connections between steady-state solutions in different orbits may be thought of as lineage-restricting differentiation events. Numerical simulations suggest that at the bifurcation point, the transition from one orbit to the next occurs rapidly and heteroclinic connections to all lower orbits exist. However we find that generically for a cell in orbit ∆ i a connection from ∆ i to ∆ i−1 is established and the cell moves only one step down the orbit hierarchy (applying a perturbation which respects an appropriate subgroup of S N can induce a connection from ∆ i to ∆ i−j for all 1 < j ≤ i − 1, however we view this as a degenerate situation). Thus, generically, at a = a N the cell moves from a steady-state in which N genes are active to a steady-state in which N − 1 genes are active and the last gene is switched off.
For a < a N −1 this state is stable, however as a is increased further to a N −1 this state too loses stability and the cell again moves to a lower point in orbit hierarchy exactly as before. Since the sequence of bifurcation points is monotonic decreasing, movement back up the orbit hierarchy (that is, bifurcation from ∆ i to ∆ j for j > i) is not possible. Thus, deterministic bifurcation through the orbit hierarchy is rigid and one-way.
We may view the orbit hierarchy in phase-space as corresponding to a well-defined biological hierarchy as follows.
While cell types may be associated with attractors of genetic regulatory systems [35], stable orbits of steady-states naturally associate with stages of differentiation. As an example, consider differentiation along the mesenchymal lineages under the control of RUNX2, SOX9 and PPAR-γ. When ∆ 3 is stable it represents stable genetic configurations in which all 3 transcription factors are active and bifurcation to orbits ∆ 2 or ∆ 1 is possible. Thus, this orbit may be associated with a tripotent tissue-specific stem cell. Similarly, as a is increased bifurcation from ∆ 2 to ∆ 1 is possible (but not back to ∆ 3 ) and stable steady-states in ∆ 2 represent progenitor cells with bipotent differentiation potential. No bifurcation from ∆ 1 back up the hierarchy is possible as a is increased, and stable solutions in ∆ 1 therefore represent terminally differentiated cells in which only one master gene is active.
In the context of equivariant bifurcation theory, the process by which a system moves from a more to a less symmetric state is known as spontaneous symmetry breaking [34]. Therefore, we interpret cellular differentiation from stem-through progenitor-to differentiated cell-types to occur through a cascade of spontaneous symmetrybreaking bifurcations.