Multiplexing information flow through dynamic signalling systems

We consider how a signalling system can act as an information hub by multiplexing information arising from multiple signals. We formally define multiplexing and mathematically characterise which systems can multiplex and how well they can do so. While the results of this paper are theoretical, to motivate the idea of multiplexing we provide experimental evidence tentatively suggesting that the NF-κB transcription factor can multiplex information about changes in multiple signals. We believe that our theoretical results may resolve the apparent paradox of how a system like NF-κB, which regulates cell fate and inflammatory signalling in response to diverse stimuli, can appear to have the low information-carrying capacity suggested by recent studies of scalar signals. In carrying out our study, we introduce new methods for the analysis of large, nonlinear stochastic dynamic models, develop computational algorithms that facilitate the calculation of fundamental constructs of information theory such as Kullback–Leibler divergences and sensitivity matrices, and link these methods to a new theory about multiplexing information. We show that many current models, such as those of the NF-κB system, cannot multiplex effectively, and we provide models that overcome this limitation using post-transcriptional modifications.


In the following, we provide technical details and various supplementary illustrations to those provided in the main paper, which hereafter is referred to as I.

1 Mathematical details and notation

Transpose of a vector or matrix and inner product
We denote by a^T and A^T the transpose of the vector a ∈ R^n and the matrix A ∈ R^{n×n}, respectively. We use the convention of writing vectors as columns, i.e. a ∈ R^{n×1}, so the transpose a^T is a row vector, a^T ∈ R^{1×n}. The inner or dot product of two vectors a, b ∈ R^n is a^T b = a_1 b_1 + · · · + a_n b_n.
The Kronecker product of an m × n matrix A = (a ij ) and an s × t matrix B is the ms × nt matrix A ⊗ B = (a ij B) ij . Then we have the following very useful equality: vec(ABC) = (C T ⊗ A)vec(B) (see [7]).
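As a quick numerical sanity check, the vec identity can be verified with NumPy (a small illustrative sketch, not part of the paper's code). Note that the vec operator stacks columns, which corresponds to Fortran-order reshaping:

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(M):
    # Column-stacking vectorisation, matching the convention used in the text.
    return M.reshape(-1, order="F")

A = rng.standard_normal((3, 4))   # A is m x n
B = rng.standard_normal((4, 2))   # B is n x p
C = rng.standard_normal((2, 5))   # C is p x q

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
assert np.allclose(lhs, rhs)      # vec(ABC) = (C^T kron A) vec(B)
```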
Probability density functions
For a stochastic process {X(t), t ≥ 0} with continuous state space, we write the probability density function (pdf) at time t ≥ 0 as P(X(t)). The corresponding conditional pdf of X(t), t > t_0, given a fixed initial state X(t_0) = x_0 or an initial distribution X(t_0) ∼ P_0 is denoted by P(X(t)|X(t_0) = x_0) or P(X(t)|X(t_0) ∼ P_0), respectively. The indicator LNA, as in P_LNA, is used in probability functions such as the above to denote that these are derived under the linear noise approximation (LNA).
Multivariate Normal distribution
We write ζ ∼ MVN(m, V), or say that P is MVN(m, V), to mean that the random variable ζ or the probability distribution P, respectively, is multivariate normal with mean vector m and covariance matrix V.
Singular Value Decomposition (SVD)
The Singular Value Decomposition (SVD) gives a decomposition of a matrix A ∈ R^{m×k} into a product of the form A = U D V^T, where U is an m × k column-orthogonal matrix (U^T U = I_k), V is a k × k orthogonal matrix (V^T V = V V^T = I_k) and D = diag(σ_1, ..., σ_k) is a diagonal matrix. This is the version of the SVD that is often called the thin SVD (see [14] for more details). The SVD applies to real and complex matrices, but henceforth we only consider real ones. The ordered elements σ_1 ≥ · · · ≥ σ_k are the singular values of A; we write σ_i = σ_i(A) for the ith singular value. We note that the columns V_j, j = 1, ..., k, of V form an orthonormal basis for R^k. If the last n of the singular values are zero then the last n columns of V are an orthonormal basis for the kernel of A. If m < k then n ≥ k − m.
In what follows we will refer to the column vectors V_j = (V_{1j}, ..., V_{kj})^T and U_j = (U_{1j}, ..., U_{mj})^T, j = 1, 2, ..., k, as the right and left singular vectors respectively, for which A V_j = σ_j U_j and U_j^T A = σ_j V_j^T, with V_j^T the jth row of V^T.
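The properties of the thin SVD listed above can be illustrated numerically (an illustrative NumPy sketch on a random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 6, 4
A = rng.standard_normal((m, k))

# Thin SVD: U is m x k column-orthogonal, D diagonal, V is k x k orthogonal.
U, sing, Vt = np.linalg.svd(A, full_matrices=False)
D = np.diag(sing)
V = Vt.T

assert np.allclose(A, U @ D @ Vt)             # A = U D V^T
assert np.allclose(U.T @ U, np.eye(k))        # U^T U = I_k
assert np.allclose(V.T @ V, np.eye(k))        # V orthogonal
for j in range(k):
    # Right/left singular vectors: A V_j = sigma_j U_j
    assert np.allclose(A @ V[:, j], sing[j] * U[:, j])
```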

QR-decomposition (QR)
The QR-decomposition with column pivoting produces a decomposition of any m × k matrix M as MP = QR, where P is a k × k permutation matrix, which is orthogonal (P^T P = P P^T = I), Q is an m × m orthogonal matrix with orthonormal columns Q_j, j = 1, ..., m, i.e. Q_j^T Q_i = 0 if j ≠ i and Q_i^T Q_i = 1, and R is an upper-triangular m × k matrix. If the rank of M is q then only the first q diagonal entries of R are nonzero.
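The following is a minimal Gram–Schmidt sketch of column-pivoted QR (illustrative only; library routines use Householder reflections), showing how the pivoting makes the diagonal of R reveal the rank:

```python
import numpy as np

def qr_column_pivot(M):
    """Greedy column-pivoted QR, M P = Q R (thin, Gram-Schmidt sketch)."""
    m, k = M.shape
    Mw = M.astype(float).copy()
    perm = list(range(k))
    Q = np.zeros((m, k))
    R = np.zeros((k, k))
    for j in range(k):
        # Pivot: bring the column with largest remaining norm to position j.
        norms = np.linalg.norm(Mw[:, j:], axis=0)
        p = j + int(np.argmax(norms))
        Mw[:, [j, p]] = Mw[:, [p, j]]
        R[:, [j, p]] = R[:, [p, j]]
        perm[j], perm[p] = perm[p], perm[j]
        # Orthogonalise against the columns already in Q.
        v = Mw[:, j] - Q[:, :j] @ (Q[:, :j].T @ Mw[:, j])
        R[j, j] = np.linalg.norm(v)
        if R[j, j] > 1e-12:
            Q[:, j] = v / R[j, j]
        R[j, j + 1:] = Q[:, j] @ Mw[:, j + 1:]
    P = np.eye(k)[:, perm]
    return Q, R, P

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # 5 x 4, rank 2
Q, R, P = qr_column_pivot(M)
assert np.allclose(M @ P, Q @ R)
d = np.abs(np.diag(R))
assert np.all(d[:-1] >= d[1:] - 1e-9)   # decreasing diagonal
assert np.all(d[2:] < 1e-8)             # rank revealed: only first 2 nonzero
```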

Fisher Information, Sensitivity Matrix & Multiplexing Capacity
We next outline an approach to sensitivity analysis and multiplexing. The key point will be that the sensitivity matrix s, previously introduced in [13], can be used to characterise the ability of the system to multiplex. We consider the stochastic response R = (R_1, ..., R_n)^T, which has probability distribution P_S depending on the signal S = (S_1, S_2, ..., S_s)^T ∈ S.

Fisher Information Matrix & KL Divergence
Let ℓ(S; r) = log p_S(r) be the log-likelihood function of P_S, which has pdf p_S(r), r ∈ R. The Fisher Information Matrix (FIM), I(S), of the probability distribution P_S at S ∈ S is a symmetric positive-(semi)definite s × s matrix with entries

I(S)_{ij} = E_{P_S}[ ∂_i ℓ(S; r) ∂_j ℓ(S; r) ] = −E_{P_S}[ ∂²_{ij} ℓ(S; r) ].

Here E_{P_S} denotes the expectation under the distribution P_S, ∂_i the partial derivative with respect to the ith component S_i evaluated at S ∈ S, and ∂²_{ij} the corresponding second order derivative.
If the signals S_i, i = 1, ..., s, act through changing parameters θ_j of the system (i.e. so that θ = θ(S)) and I_θ is the FIM with respect to θ = (θ_1, ..., θ_m), we have that

I(S) = dθ(S)^T I_θ dθ(S),   (1)

where dθ(S) is the derivative of θ with respect to S at S ∈ S. This well-known property of the FIM follows from the chain rule for derivatives. The Kullback–Leibler (KL) divergence of two probability distributions P and Q with density functions p(x) and q(x), x ∈ X, is given by

D_KL(P‖Q) = ∫_X p(x) log( p(x)/q(x) ) dx.

That is, the KL divergence D_KL(P‖Q) is the expected value of the log-likelihood ratio λ(x) = log p(x)/q(x), with the expectation taken with respect to P and the usual conventions when p(x) = 0 or q(x) = 0. If P = P_{S+δS} and Q = P_S, then (see [3]),

2 D_KL(P_{S+δS} ‖ P_S) = δS^T I(S) δS + O(‖δS‖³).   (2)

That is, the FIM is the Hessian matrix of the above KL divergence at S ∈ S.
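The second-order relation between the KL divergence and the FIM can be checked numerically on a simple example; here a one-dimensional Gaussian family with signal S = (μ, σ), for which both quantities are available in closed form (an illustrative sketch, not the paper's models):

```python
import numpy as np

def kl_gauss(mu1, s1, mu0, s0):
    # KL( N(mu1, s1^2) || N(mu0, s0^2) ), closed form.
    return np.log(s0 / s1) + (s1**2 + (mu1 - mu0)**2) / (2 * s0**2) - 0.5

mu, s = 1.0, 0.7
I = np.diag([1 / s**2, 2 / s**2])   # FIM of N(mu, s^2) in (mu, s) coordinates

dS = np.array([1e-3, -2e-3])
kl = kl_gauss(mu + dS[0], s + dS[1], mu, s)
quad = 0.5 * dS @ I @ dS
assert abs(kl - quad) < 1e-7        # agreement up to third-order terms
```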

Information geometry
If the Fisher information matrix I(S_0), S_0 ∈ S, is positive definite, it can also be used to define a Riemannian metric over the space of probability distributions. Firstly, we use it to define a metric on the signal space S at S_0 by

⟨δS, δS′⟩_{S_0} = δS^T I(S_0) δS′.

This inner product is interesting to us because of (2). Therefore, assuming that I(S) depends in a C¹ fashion on S, if S and S′ are perturbations of S_0,

2 D_KL(P_S ‖ P_{S′}) = ⟨S − S′, S − S′⟩_{S_0} + O(‖S − S_0‖³ + ‖S′ − S_0‖³).

If this is the case we say D_KL(P_S‖P_{S′}) = ½ ⟨S − S′, S − S′⟩_{S_0} up to third order.

The multiplexing sensitivity matrix s
Since the FIM I = I(S 0 ) is symmetric and positive semi-definite its SVD is of the form V DV T where V is orthogonal and D is diagonal with entries σ 2 1 ≥ · · · ≥ σ 2 s ≥ 0. The sensitivity matrix s associated with I is s = D 1/2 V T . We explain the reason for this name below.

Multiplexing Capacity
Recall that the definition of D_KL depends upon the set of signals (S_1, ..., S_k), k ≤ s, being considered. If we need to stress this we will write D_KL^{(i,S_0)}. Given any set of integers 1 ≤ i_1 < · · · < i_r ≤ s distinct from i, consider the columns s_{i_j} of s. Denote by n(s_i | s_{i_1}, ..., s_{i_r}) = n(i | i_1, ..., i_r) the unique vector n normal to the vectors s_{i_1}, ..., s_{i_r} such that s_i − n lies in the span of s_{i_1}, ..., s_{i_r}. That is,

n^T s_{i_j} = 0, j = 1, ..., r, and s_i − n = Σ_j c_j s_{i_j} for some coefficients c_j.   (3)

In the case where i_1 < · · · < i_r consists of all j with j ≠ i we use the notation n(s_i | s_j, j ≠ i). We now prove that, up to third order terms,

2 D_KL^{(i,S_0)} = ‖n(s_i | s_j, j ≠ i)‖².   (4)

We first prove (4) for the case where i = s. Let s = QR be the QR decomposition of s. By definition n = n(s_s | s_j, j ≠ s) = R_ss Q_s and therefore ‖n‖ = |R_ss|. To check this, one can write n = Σ_i c_i Q_i and use (3) to derive the coefficients c_j = 0, j < s, and c_s = R_ss.
Then, note that I(S_0) = s^T s = R^T Q^T Q R = R^T R, since Q^T Q is the identity matrix. But, by (2), up to third order,

2 D_KL(P_S ‖ P_{S′}) = δS^T I(S_0) δS = ‖R δS‖²,

where δS = S − S′. Because R is upper triangular, the last entry of the vector R δS is R_ss δS_s. It follows that, up to third order terms,

2 D_KL = Σ_j (R δS)_j² ≥ R_ss² δS_s²,

where (R δS)_j is the jth entry of the vector R δS, and the minimum is achieved when the δS_j are such that (R δS)_j = 0 for j = 1, ..., s − 1. These δS_j can be found if we let R′ be the (s−1) × (s−1) upper triangular matrix obtained by deleting the last row and column from R and let R_s be the (s−1)-dimensional vector obtained by deleting the last element from the last column of R. Then, if δS′ is the (s−1)-dimensional vector solving R′ δS′ = −R_s δS_s and δS is the s-dimensional vector obtained by adding δS_s to the end of δS′, we have that ‖R δS‖² = R_ss² |δS_s|². Therefore, for a unit perturbation δS_s = 1, up to third order terms, 2 D_KL^{(s,S_0)} = R_ss². In the case i < s, we can simply use a permutation matrix P such that the last column of the matrix sP is s_i, and follow the same steps as above to show that, up to third order terms, 2 D_KL^{(i,S_0)} = R_ss², where R_ss is the bottom right entry of the R matrix that satisfies sP = QR. This proves (4).
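The identification of the squared normal component with the squared corner entry of R after pivoting the target column last can be verified numerically (illustrative sketch on a random sensitivity matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
s_mat = rng.standard_normal((4, 4))   # a generic 4 x 4 sensitivity matrix

def normal_component(s_mat, i):
    """n(s_i | s_j, j != i): component of column i orthogonal to the span
    of the remaining columns (computed by least squares)."""
    others = np.delete(s_mat, i, axis=1)
    target = s_mat[:, i]
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    return target - others @ coef

for i in range(4):
    # Permute column i last and take QR, as in the proof of (4).
    order = [j for j in range(4) if j != i] + [i]
    R = np.linalg.qr(s_mat[:, order], mode="r")
    n = normal_component(s_mat, i)
    # |R_ss| = ||n(s_i | s_j, j != i)||
    assert np.isclose(abs(R[-1, -1]), np.linalg.norm(n))
```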

Finding optimal multiplexing sets
In view of the above result we ask the following question: given a set of signals S_1, ..., S_s, for which subsets S_{i_1}, ..., S_{i_r}, r ≤ s, can we effectively multiplex, in the sense that all the multiplexing capacities for S_{i_1}, ..., S_{i_r} are above a given threshold?
We first choose the column pivoting matrix P* so that the diagonal entries R_jj of the R matrix in the QR decomposition sP* = [s_{i_1} · · · s_{i_s}] = QR give the minimum multiplexing capacities MX(S_{i_1}, ..., S_{i_k}). That is, up to third order,

2 · MX(S_{i_1}, ..., S_{i_k}) = R_kk², k = 1, ..., s.   (6)

This QR decomposition can be easily derived by an algorithm such as the one provided in Table A. The idea is to start with the full set of signals and use QR to find and remove from the current set of signals, say S_k = {S_{i_1}, ..., S_{i_k}}, k = s, s−1, ..., 2, the signal that gives the minimum multiplexing capacity min_{s_i ∈ S_k} ‖n(s_i | s_{i_l} ∈ S_k \ {s_i})‖². These multiplexing capacities can be derived by the following steps: (i) pivoting through an appropriate P the s_{i_l} columns, l = 1, ..., k, to get sP with last column equal to s_i, (ii) performing the QR decomposition sP = QR and (iii) using that R_kk² = ‖n(s_i | s_{i_l} ∈ S_k \ {s_i})‖² to compute the minimum D_KL (see Table A). Note that this is a backward reduction QR decomposition, as opposed to the standard (forward addition) column pivoting QR decomposition provided by software such as MATLAB [11].
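A minimal sketch of the backward reduction idea is given below. It is a hypothetical reimplementation of the procedure just described (the actual algorithm of Table A is not reproduced here), computing the normal components directly by least squares rather than via repeated QR decompositions:

```python
import numpy as np

def backward_reduction_order(s_mat):
    """Greedy backward reduction: repeatedly remove the signal whose column
    has the smallest squared normal component relative to the remaining
    columns. Returns indices i_1, ..., i_s and the capacities 2*MX."""
    k = s_mat.shape[1]
    remaining = list(range(k))
    order, caps = [], []
    while len(remaining) > 1:
        res2 = []
        for i in remaining:
            others = np.delete(s_mat[:, remaining], remaining.index(i), axis=1)
            coef, *_ = np.linalg.lstsq(others, s_mat[:, i], rcond=None)
            res2.append(np.sum((s_mat[:, i] - others @ coef) ** 2))
        j = int(np.argmin(res2))
        caps.append(res2[j])              # 2*MX of the current subset
        order.append(remaining.pop(j))    # remove the weakest signal
    order.append(remaining[0])
    caps.append(np.sum(s_mat[:, order[-1]] ** 2))
    # Reverse so that order[0] = i_1 (kept longest) and caps decrease.
    return order[::-1], caps[::-1]

rng = np.random.default_rng(4)
s_mat = rng.standard_normal((5, 5))
order, caps = backward_reduction_order(s_mat)
# Once the ordering is found, an unpivoted QR of s* = [s_{i_1} ... s_{i_s}]
# recovers the same capacities on the diagonal of R, as in (6).
R = np.linalg.qr(s_mat[:, order], mode="r")
assert np.allclose(np.abs(np.diag(R)) ** 2, caps)
assert all(caps[i] >= caps[i + 1] - 1e-9 for i in range(len(caps) - 1))
```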
It is easy to see that, as in the standard column pivoting QR decomposition, the diagonal entries R_ll of R in sP* = QR are in decreasing order of magnitude,

|R_11| ≥ |R_22| ≥ · · · ≥ |R_ss|.

This is because, for l ≤ s,

|R_ll| = ‖n(s_{i_l} | s_{i_1}, ..., s_{i_{l−1}})‖ ≤ ‖n(s_{i_{l−1}} | s_{i_1}, ..., s_{i_{l−2}}, s_{i_l})‖ ≤ ‖n(s_{i_{l−1}} | s_{i_1}, ..., s_{i_{l−2}})‖ = |R_{l−1,l−1}|.
However, the backward reduction QR, unlike the forward addition QR, ensures that the diagonal entries of R can be used to derive the minimum multiplexing capacities as in (6).
Once the appropriate ordering of signals S i 1 , . . . , S is is identified by the backward reduction QR, we can perform a QR decomposition without pivoting to the sensitivity matrix s * = [s i 1 . . . s is ] = QR and by examining the diagonal entries of R, we can see which signals S i 1 , S i 2 , . . . , S i k , k = 1, . . . , s can multiplex before the multiplexing capacity MX(S i 1 , S i 2 , . . . , S i k ) = R 2 kk /2 falls below a threshold. Equivalently, one can start from the full set of signals S 1 , . . . , S s and find the minimum subset of signals, S i s−r+1 , . . . , S is , r ≥ 0, that needs to be removed (i.e. keep unperturbed) to derive a multiplexing subset, i.e. a subset that has MX(S i 1 , S i 2 , . . . , S i s−r ) = R 2 s−r,s−r /2 larger than a threshold.

Sensitivity analysis for multivariate normal distributions
In this section we assume that P_S is MVN(μ, Σ), with mean μ = μ(S) and covariance Σ = Σ(S). Then the entries of the FIM are

I(S)_{ij} = ∂_i μ^T Σ^{−1} ∂_j μ + ½ tr( Σ^{−1} (∂_i Σ) Σ^{−1} (∂_j Σ) ),   (7)

where all derivatives are taken at S = S_0. This can also be written using vec notation as

I(S)_{ij} = ∂_i μ^T Σ^{−1} ∂_j μ + ½ vec(∂_i Σ)^T (Σ^{−1} ⊗ Σ^{−1}) vec(∂_j Σ).

Now consider the N × s matrix (N = n + n²) L obtained by stacking dμ(S)/dS on top of d vec Σ(S)/dS, which is the derivative (linearisation matrix) of the mapping S → Θ(S) = (μ(S), vec Σ(S)) at S_0. That is, if we let δμ = (δμ_1, ..., δμ_n)^T and δvec(Σ) = (δΣ_11, ..., δΣ_nn)^T, with δμ_i = μ_i(S + δS) − μ_i(S) and δΣ_ij = Σ_ij(S + δS) − Σ_ij(S), then

(δμ, δvec(Σ))^T = L δS + O(‖δS‖²).   (10)

If we also define the N × N matrix F as the Cholesky factor of the positive-definite block diagonal matrix

F^T F = diag( Σ^{−1}, ½ Σ^{−1} ⊗ Σ^{−1} ),

then we can write the Fisher information in (7) as

I(S_0) = (FL)^T (FL).

Therefore, FL is a linear map from S to R^N which sends the ⟨·,·⟩_{S_0} metric to the standard one in R^N:

⟨δS, δS′⟩_{S_0} = δS^T I(S_0) δS′ = (FL δS)^T (FL δS′),

and so relates the FI metric in S to the standard one in R^{n+n²}.
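Formula (7) can be checked against the closed-form KL divergence between multivariate normal distributions; the family μ(S), Σ(S) below is a hypothetical illustrative choice, not one of the paper's models:

```python
import numpy as np

def mvn_kl(m1, C1, m0, C0):
    # KL( MVN(m1, C1) || MVN(m0, C0) ), closed form.
    n = len(m0)
    C0inv = np.linalg.inv(C0)
    d = m0 - m1
    return 0.5 * (np.trace(C0inv @ C1) + d @ C0inv @ d - n
                  + np.log(np.linalg.det(C0) / np.linalg.det(C1)))

# Hypothetical 2-signal MVN family.
def mu(S):    return np.array([S[0], S[0] + S[1]])
def Sigma(S): return np.array([[1 + S[1]**2, 0.2], [0.2, 1.0]])

S0 = np.array([0.5, 0.3])
Sinv = np.linalg.inv(Sigma(S0))
dmu = np.array([[1.0, 0.0], [1.0, 1.0]])   # columns: d mu / d S_i
dSig = [np.zeros((2, 2)), np.array([[2 * S0[1], 0.0], [0.0, 0.0]])]

# FIM entries as in (7)
I = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        I[i, j] = (dmu[:, i] @ Sinv @ dmu[:, j]
                   + 0.5 * np.trace(Sinv @ dSig[i] @ Sinv @ dSig[j]))

dS = np.array([1e-3, -1e-3])
kl = mvn_kl(mu(S0 + dS), Sigma(S0 + dS), mu(S0), Sigma(S0))
assert abs(2 * kl - dS @ I @ dS) < 1e-7   # (2) with the FIM from (7)
```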
We now consider the (thin) SVD FL = W D^{1/2} V^T (as described in section 1), with D^{1/2} = diag(σ_1, ..., σ_s), to deduce some useful facts about the vector FL δS. Firstly, note that the N × 1 orthonormal column vectors W_i of W and the s × 1 column vectors V_i of V satisfy FL V_i = σ_i W_i. We also define the vectors U_i = F^{−1} W_i, where the N = (n + n²)-dimensional vectors U_i can be written as U_i = (U_i^μ, U_i^Σ) to reflect the correspondence of the first n entries to the mean vector μ and the last n² entries to the covariance matrix Σ.
By (10), up to terms that are O(‖δS‖²),

δΘ = L δS = Σ_{j=1}^s σ_j (V_j^T δS) U_j,

and therefore, since the U_i are orthonormal in the ⟨·,·⟩_{S_0} metric and the coefficient of U_j in this sum is σ_j V_j^T δS = (s δS)_j,

‖δΘ‖²_{S_0} = Σ_{j=1}^s (σ_j V_j^T δS)² = ‖s δS‖².

These equations are the reason we call s = D^{1/2} V^T the sensitivity matrix. It describes how the distribution changes as the signal S is changed.
Similarly, up to terms that are O(‖δS‖²),

FL δS = Σ_{j=1}^s σ_j (V_j^T δS) W_j = W s δS,   (12)

i.e. FL = W · s. We can now make a few useful observations:
1. Since the FIM is I(S_0) = s^T s, the eigenvalues of I(S_0) are the squares of the singular values σ_i of s and the right singular vectors V_i of s are the eigenvectors of I. Further, since the column vectors V_i of V are unit vectors, the squared norm of the ith row of s is σ_i².
2. Equation (12) shows that the change in the probability distribution MVN(μ(S), Σ(S)) produced by a change S → S + δS is, according to the Fisher information metric, a weighted sum of the vectors W_j = F (U_j^μ, U_j^Σ)^T with weight-coefficients σ_j V_j^T δS. The change in the probability distribution is reflected in the mean through the U_j^μ directions and in the covariance matrix through the U_j^Σ directions.
3. The coefficients σ_j V_j^T δS are proportional to the singular values σ_j and the inner products V_j^T δS. The latter are the coordinates of δS in the orthonormal basis of S defined by the columns of the matrix V. Therefore, if the singular values are ordered so that σ_1 ≥ · · · ≥ σ_s, the largest change in the probability distribution, subject to fixed ‖δS‖, occurs when the change δS is parallel to the vector V_1 that corresponds to the largest singular value σ_1. If the singular values decay fast, there are only a few directions of the signal space that can produce a relatively large change in the MVN(μ(S), Σ(S)) distribution (subject to fixed ‖δS‖).
4. Furthermore, the overall contribution of each coordinate δS_i of δS to the change of the probability distribution is measured, according to the Fisher information metric, by ‖s_i‖, with s_i the ith column of s. That is, if δS = e_i, with e_i ∈ R^s the usual unit vector whose only non-zero entry is e_ii = 1, then the corresponding change in the probability distribution MVN(μ(S), Σ(S)) according to the Fisher information metric is

FL e_i = Σ_{j=1}^s s_{ji} W_j = W s_i, which has norm ‖s_i‖.

Optimality of s
The above sensitivity matrix is optimal for capturing as much sensitivity as possible in the low order principal components of δΘ = (δμ, δvec(Σ))^T. That is, for any (sensitivity) matrix s′ which for some orthogonal matrix U′ satisfies FL = U′ s′, the sensitivity matrix s satisfies the inequalities

Σ_{i=1}^ℓ ‖s_{i·}‖² ≥ Σ_{i=1}^ℓ ‖s′_{i·}‖²

for all ℓ < k, where s_{i·} and s′_{i·} denote the ith rows; i.e. among all such sensitivity matrices, s squeezes as much of the sensitivity effect as possible into the lower components.
For the above reasons, we call s ij , for j = 1, . . . , k, the principal coefficients of sensitivity of MVN(µ(S), Σ(S)) to changes in the ith component of the signal S i , i = 1, . . . , k.

The multiplexing capacities of a model
So far as modelling is concerned, the typical situation is where the signals S_i change some of the parameters θ_j, j = 1, ..., m, of the model so that θ = θ(S). Assume that the parameters that are changed by the signals have been reordered so that the ν_j² = 2 · MX(θ_1, ..., θ_j) satisfy ν_1 ≥ ν_2 ≥ · · · ≥ ν_s, as can be done using the results of Sect. 2.4.1. Assume also that the signals have been reordered so that the ν′_j, where ν′_j² = 2 · MX(S_1, ..., S_j), similarly decrease. We show that then

σ_min(dθ) ν_j ≤ ν′_j ≤ σ_max(dθ) ν_j,   (15)

where dθ = dθ(S) is the derivative of θ with respect to S at S ∈ S, as in Sect. 2.4.1, and σ_min(dθ), σ_max(dθ) are its extreme singular values. It follows from (15) that if these parameters have a large multiplexing capacity then signals for which dθ is well-conditioned will also multiplex well. To show eq. (15) we note that in this case the FIM of the parameters, I_θ, and of the signals, I(S), are related by equation (1). Consider the corresponding sensitivity matrices s_θ and s(S), respectively. By equation (1),

s(S)^T s(S) = I(S) = dθ^T I_θ dθ = (s_θ dθ)^T (s_θ dθ).

Therefore, the matrices s(S) and s_θ dθ have the same singular values and the same determinant up to sign.
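The fact that s(S) and s_θ dθ share singular values follows directly from them having the same Gram matrix I(S); a quick numerical illustration with random matrices:

```python
import numpy as np

rng = np.random.default_rng(5)
m, s_dim = 5, 3
s_theta = rng.standard_normal((m, m))      # parameter-level sensitivity matrix
dtheta = rng.standard_normal((m, s_dim))   # derivative of theta w.r.t. S

# I(S) = dtheta^T I_theta dtheta, with I_theta = s_theta^T s_theta, as in (1)
I_S = dtheta.T @ (s_theta.T @ s_theta) @ dtheta

# Signal-level sensitivity matrix from the eigendecomposition of I(S)
evals, V = np.linalg.eigh(I_S)
s_S = np.diag(np.sqrt(np.maximum(evals, 0))) @ V.T

sv1 = np.sort(np.linalg.svd(s_S, compute_uv=False))
sv2 = np.sort(np.linalg.svd(s_theta @ dtheta, compute_uv=False))
assert np.allclose(sv1, sv2)   # same Gram matrix, hence same singular values
```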

Reaction Networks & Stochastic NF-κB model
Stochastic models of cellular processes in signalling and regulatory systems are usually described in terms of reaction networks. A system of multiple different molecular subpopulations has state vector Y(t) = (Y_1(t), ..., Y_n(t))^T, where Y_i(t), i = 1, ..., n, denotes the number of molecules of each species at time t. These molecules undergo a number of possible reactions (e.g. transcription, translation, degradation), where the reaction of index j changes Y(t) to Y(t) + ν_j, ν_j ∈ R^n. The vectors ν_j are called stoichiometric vectors. Each reaction occurs randomly at a rate w_j(Y(t)) (often called the intensity of the reaction), which is a function of Y(t) but may also depend on t. Therefore the state of the system at some time t is

Y(t) = Y(0) + Σ_j ν_j Z_j( ∫_0^t w_j(Y(s)) ds ),   (16)

with Z_j independent unit Poisson processes corresponding to the jth reaction channel. The Stochastic Simulation Algorithm (SSA) [4] uses the properties of the above stochastic process to generate stochastic trajectories. These have exactly the same distribution as Y(t).
More specifically, the SSA, often referred to as the Gillespie algorithm, generates at a given time t the time to the next reaction, Δt, which has an exponential distribution with rate w_0(Y(t)) = Σ_j w_j(Y(t)), and the type of the next reaction, which is j with probability w_j(Y(t))/w_0(Y(t)). Then the time t is updated to t + Δt and the state is updated to Y(t) + ν_j. The same steps are repeated until t > T, for some time T > 0.
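A minimal sketch of the SSA in Python (illustrative only; the reactions and rates below are a hypothetical birth–death example, not the NF-κB model):

```python
import numpy as np

def ssa(x0, stoich, rates, t_max, rng):
    """Minimal Gillespie SSA. stoich: list of state-change vectors nu_j;
    rates: list of functions w_j(x) giving reaction intensities."""
    t, x = 0.0, np.array(x0, dtype=float)
    ts, xs = [t], [x.copy()]
    while True:
        w = np.array([f(x) for f in rates])
        w0 = w.sum()
        if w0 <= 0:
            break
        t += rng.exponential(1.0 / w0)        # time to next reaction
        if t > t_max:
            break
        j = rng.choice(len(rates), p=w / w0)  # which reaction fires
        x = x + stoich[j]
        ts.append(t); xs.append(x.copy())
    return np.array(ts), np.array(xs)

# Hypothetical birth-death example: 0 -> X at rate c1, X -> 0 at rate c2*N.
c1, c2 = 50.0, 1.0
rng = np.random.default_rng(6)
ts, xs = ssa([0], [np.array([1]), np.array([-1])],
             [lambda x: c1, lambda x: c2 * x[0]], t_max=50.0, rng=rng)
assert np.all(xs >= 0)
assert 20 < xs[-1, 0] < 90   # stationary mean c1/c2 = 50, Poisson fluctuations
```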

System size parameter
It is common in studying stochastic systems to introduce a system size Ω, a parameter that occurs in the intensities of the reactions w_j(Y(t)). The precise description of this parameter depends on the system. In population models it might be considered to be of the same order of magnitude as the total population size, while in chemical systems and cellular biology a natural choice is to use molar concentrations and therefore regard Ω as Avogadro's number in the appropriate molar units (e.g. nM^−1) multiplied by the volume of the reacting solution (e.g. the cell) in appropriate units (e.g. in litres (L)). In the NF-κB system, we consider it to have units L/µM. The system size governs the size of the state fluctuations and therefore the size of the jumps: larger system sizes generally imply smaller fluctuations and vice versa. In a certain sense the system size parameter is just a mathematical convenience to enable the study of the dependence of stochastic fluctuations upon system size. Indeed, the methods we develop in this paper can and should be applied to systems that do not involve a system size parameter, but then it will be necessary to ensure that the population sizes achieved in the given system are large enough. Besides allowing one to study the dependence of stochastic fluctuations upon system size, Ω is used to calculate the deterministic equations that describe the evolution of the concentration vector X(t) = Y(t)/Ω in the limit Ω → ∞ (see next section). A sufficient condition to derive this limit is that the rates w_j(Y(t)) depend upon Ω (cf. [8,9,10]) as

w_j(Y(t)) = Ω u_j( Y(t)/Ω ),   (17)

where the u_j(x) are the macroscopic (Ω → ∞) rates, which generally depend on the concentration vector x = x(t). For example, linear degradation rates have the form w_deg(N) = cN, where N stands for the relevant population number.
The rate w_deg(N) is equal to Ω u_deg(n), where u_deg(n) = cn is the macroscopic rate and n = N/Ω the corresponding concentration level. Similarly, binding reaction rates have the form w_bind(N_1, N_2) = cN_1N_2 = Ω u_bind(n_1, n_2) with u_bind(n_1, n_2) = c̃ n_1 n_2, where c̃ = cΩ and n_1 = N_1/Ω, n_2 = N_2/Ω.

The classical deterministic approximation
Assuming condition (17), the equation (16) describing the time evolution of Y(t) can be written in terms of X = Y/Ω as

X(t) = X(0) + Σ_j (ν_j/Ω) Z_j( Ω ∫_0^t u_j(X(s)) ds ).   (18)

Using the law of large numbers, as Ω → ∞, we have that Ω^{−1} Z_j(Ωt) → t, where here x(t) is the limit of convergence in probability for X(t) as Ω → ∞. Consequently, x(t) satisfies the ordinary differential equation (ODE)

ẋ = F(x) = M u(x),   (19)

which is the classical macroscopic-deterministic approximation of the time-evolution of the concentration levels of the system. In matrix form, M = [ν_1 ν_2 · · · ν_r] is the n × r stoichiometry matrix and u(x(t)) = (u_1(x(t)), u_2(x(t)), ..., u_r(x(t)))^T the r × 1 vector of macroscopic rates. Herein we use ẋ to denote the time-derivative dx/dt. Throughout we consider autonomous systems (i.e. F(t) = F(x(t))), but the method applies to non-autonomous systems with an appropriate simple re-parameterisation (see for example [6]).
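As an illustration of the macroscopic equation ẋ = M u(x), the hypothetical birth–death example (production at rate c1, degradation at rate c2·x) can be integrated numerically and relaxes to its equilibrium c1/c2:

```python
import numpy as np

# Macroscopic ODE xdot = M u(x) for the birth-death example:
# M = [1, -1], u(x) = (c1, c2*x), so xdot = c1 - c2*x, equilibrium c1/c2.
c1, c2 = 50.0, 1.0
M = np.array([[1.0, -1.0]])                 # n x r stoichiometry matrix
u = lambda x: np.array([c1, c2 * x[0]])     # macroscopic rates vector

def rk4(F, x0, t_max, h=1e-3):
    # Classical fourth-order Runge-Kutta integrator.
    x = np.array(x0, dtype=float)
    t = 0.0
    while t < t_max:
        k1 = F(x); k2 = F(x + h / 2 * k1)
        k3 = F(x + h / 2 * k2); k4 = F(x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

x_end = rk4(lambda x: M @ u(x), [0.0], t_max=20.0)
assert abs(x_end[0] - c1 / c2) < 1e-6       # relaxes to the equilibrium c1/c2
```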

Stochastic NF-κB model
The model of the NF-κB system used in this SI describes the oscillatory response of the system following stimulation by tumor necrosis factor alpha (TNFα). The system is initiated at the stable equilibrium state reached under the parameter settings of no stimulation (see Table B). This is followed by a period of TNFα stimulation, either continuous or in the form of repeated pulses (e.g. 5 min stimulation every 100 minutes). The stimulation causes a transient oscillation, which in the case of pulses is repeated after every pulse, and in the case of continuous stimulation relaxes to a stable limit cycle.
The model of NF-κB considered here is a slight modification of that in [2]. In particular, the redundant (no feedback) variable of the phosphorylated IκBα is removed, as well as the phosphorylated complex of NF-κB and IκBα and the inactive IKK, which are also redundant because the total amounts of IKK and of NF-κB are fixed in the model in [2]. These changes ensure that there are no rank-deficiencies in the covariance matrices of the LNA of the system.
We also write the system in a form where all concentrations refer to the same volume. The original system in [2] is written in cytoplasmic concentrations for all species except the nuclear NF-κB, IκBα and their nuclear complex, which are written in nuclear concentrations and are set to be 3.3 times larger than the corresponding cytoplasmic concentrations. Here, the necessary adjustments to the system equations are made so that all concentrations are on the same scale and no further conversions are necessary.
The ODE model for the NF-κB system is therefore given in Table C. The solution of the ODE system is provided in Fig A. The reactions and their rates used for the SSA are provided in Table D.

[Parameter table excerpt: NF-κB–IκBα association; half-max A20 inhibition concentration, 0.0018 µM/L; TNFα, tumor necrosis factor alpha level, 10 ng/mL; T_NF-κB, total NF-κB concentration, 0.08 µM; T_IKK, total IKK concentration, 0.08 µM.]

The mNF-κB model
In addition to the variables and parameters in the base model, the mNF-κB model has the extra variables and parameters provided in Tables F and K, respectively.

[Table excerpt: free complex of modified NF-κB and phosphorylated IκBα, 0.0001.]

Another model for NF-κB modification
We now consider another model for NF-κB modification, in which the NF-κB molecules that are bound by IκBα molecules are transformed to another complex of modified NF-κB molecules bound by IκBα molecules. For the modification to happen, a transition to an intermediate state needs to happen first. This transition is controlled by an independent signal S_2. A similar mechanism using an intermediate state is used for the reverse modification. The diagram of the model, which we will call the amNF-κB model, is depicted in Fig D(a), while the solutions of the ODE system provided in Table J are shown in Fig D. As we can see in Fig E(b), the singular values of the FIM of amNF-κB are larger than the singular values of the base and mNF-κB models. The multiplexing capacities are also increased, with parameters related to this modification taking some of the higher values (see Fig 3.5(c)). The sensitivity coefficients are generally increased, with those parameters related to the modification having high sensitivity (see Fig 3.5(d)).

LNA, pcLNA and calculating the FIM
We next describe the LNA [1] and pcLNA [13] as well as how to derive the FIM using pcLNA.

The linear noise approximation (LNA)
The LNA ansatz describes the state X(t) at some time t as the sum of the deterministic solution x(t) of (19) and the noise, ξ(t), scaled by Ω^{−1/2}, i.e.

X(t) = x(t) + Ω^{−1/2} ξ(t).   (20)
The noise satisfies the linear stochastic differential equation (sde)

dξ(t) = J(x(t)) ξ(t) dt + M U(x(t))^{1/2} dW_t,

with W_t an r × 1 Wiener process (i.e. W_t is a random vector of independent one-dimensional Wiener processes). Here J(x(t)) is the Jacobian matrix of (19), i.e. J_{ij}(x) = ∂F_i(x)/∂x_j, M the stoichiometry matrix, and U(x) the diagonal matrix with main diagonal the rates vector u(x). The noise sde can be solved and the solution can be written as

ξ(t) = C(t_0, t) ξ(t_0) + ∫_{t_0}^t C(s, t) M U(x(s))^{1/2} dW_s,

where C(t_0, t) is the fundamental matrix of (19), i.e. C(t_0, t) is the solution of the initial value problem Ċ = J C, C(t_0, t_0) = I. Hence, given ξ(t_0) = ξ_0, the noise ξ(t) ∼ MVN(m_t, V_t), where

m_t = C(t_0, t) ξ_0,   V_t = V(t_0, t),   (26)

and the symmetric positive-definite matrix V(t_0, t) is the solution of the initial value problem

V̇ = J(x(t)) V + V J(x(t))^T + M U(x(t)) M^T,   V(t_0, t_0) = 0.

It follows from (26) and the LNA ansatz in (20) that P_LNA(X(t) | ξ(t_0) = ξ_0) is MVN(x(t) + Ω^{−1/2} m_t, Ω^{−1} V_t). It is worth noting that the LNA can be derived by applying the Central Limit Theorem, as Ω → ∞, to the Poisson random variables in (18) (see [1], [8], [9]). It was also derived by van Kampen (see [10]) as an expansion, in powers of Ω, of Kolmogorov's forward equations (also called the Master equation) of the Poisson process Y(t) in (16), ignoring terms of order O(Ω^{−1/2}).
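For the same hypothetical birth–death example (production at rate c1, degradation at rate c2·N), the LNA moment equations can be integrated directly; the stationary LNA variance equals c1/c2, matching the Poisson stationary distribution of this system:

```python
import numpy as np

# LNA moment equations for the birth-death example:
# xdot = c1 - c2*x, J(x) = -c2, Cdot = J C, and
# Vdot = J V + V J^T + M U(x) M^T = -2*c2*V + (c1 + c2*x).
c1, c2 = 50.0, 1.0
h, t_max = 1e-3, 20.0
x, V, C = 0.0, 0.0, 1.0       # x(0) = 0, V(t0, t0) = 0, C(t0, t0) = I
t = 0.0
while t < t_max:              # simple Euler integration of the coupled ODEs
    dx = c1 - c2 * x
    dV = -2 * c2 * V + (c1 + c2 * x)
    dC = -c2 * C
    x, V, C = x + h * dx, V + h * dV, C + h * dC
    t += h

assert abs(x - c1 / c2) < 1e-3
assert abs(V - c1 / c2) < 1e-3    # LNA stationary variance = c1/c2 (Poisson-like)
assert abs(C) < 1e-6              # fundamental matrix decays: memory of xi_0 fades
```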

Phase corrected LNA (pcLNA)
The pcLNA method described below is very similar to that in [13]. However, for clarity, we provide a description more appropriate to the current context. Unlike [13], we consider solutions γ, given by x = g(t), t ≥ 0, of (19) without assuming that γ is a limit cycle. However, especially for low Ω, that is, where the fluctuations are large, the method provides a better approximation when the underlying deterministic solutions are such that transversal sections are well defined (e.g. oscillatory systems). We first provide several definitions relevant to the theory developed herein and in I.
Transversal sections
Consider the solution γ of the ode in (19) given by x = g(t), t > 0. By a transversal section through γ at a point x we mean an (n − 1)-dimensional linear hyperplane S_x containing x and transversal to the tangent vector F(x) to γ at x. A particular example is the hyperplane normal to γ at x. A transversal system is a family S_g(t) of transversal sections that vary smoothly with t, in the sense that the unit normal vector to S_g(t) varies smoothly with t.
A transversal system defines a mapping G of a neighbourhood of γ onto γ, where if X ∈ S_x then G(X) = x ∈ γ. In cases where X = X(t) lies in more than one transversal section, G(X(t)) = x(s) with s = arg min_{s′: X(t) ∈ S_g(s′)} |s′ − t| the closest time to t. We denote this mapping for the normal transversal system by G_N.
Computation of the normal mapping G_N(X(t))
To compute the mapping G_N(X) = x ∈ γ of X = X(t), for a given solution x = g(t), t > 0, of the ode in (19), we wish to minimise the squared distance ‖X − g(s)‖² with respect to s > 0. Note that the minimum s is achieved when

h(s) = F(g(s))^T (X − g(s)) = 0,

i.e. when the tangent vector F(g(s)) is orthogonal to X − g(s), and thus X lies on the normal transversal section S_g(s). To find the appropriate s, we first use Newton's optimisation method to derive the approximate limit of the sequence

s_{n+1} = s_n − h(s_n)/ḣ(s_n), n = 0, 1, ...,

for different starting points s_0, stopping the search when the square of the step h(s_n)/ḣ(s_n) is appropriately small. If different limits are achieved for different s_0, we choose the one that is closest to t.
Note that other methods of optimisation could be used here.
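For instance, the Newton iteration above can be sketched on a hypothetical circular solution g(s) = (cos s, sin s), where the normal mapping is known in closed form:

```python
import numpy as np

# Newton search for the normal mapping G_N on the circle g(s) = (cos s, sin s).
g   = lambda s: np.array([np.cos(s), np.sin(s)])
dg  = lambda s: np.array([-np.sin(s), np.cos(s)])    # tangent F(g(s))
ddg = lambda s: np.array([-np.cos(s), -np.sin(s)])

X = np.array([1.2, 0.9])                             # point off the curve

def h(s):    return dg(s) @ (X - g(s))               # orthogonality condition
def hdot(s): return ddg(s) @ (X - g(s)) - dg(s) @ dg(s)

s = 0.5                                              # initial guess s_0
for _ in range(50):
    step = h(s) / hdot(s)
    s -= step
    if step**2 < 1e-24:                              # stop when step is tiny
        break

s_exact = np.arctan2(X[1], X[0])                     # closest point on the circle
assert abs(s - s_exact) < 1e-8
assert abs(h(s)) < 1e-10                             # tangent orthogonal to X - g(s)
```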
Adapted coordinate systems
An adapted coordinate system C_g(t) at a point g(t) on γ is one determined by a set of orthonormal basis vectors e_1(t), ..., e_n(t), with e_1(t) the unit normal vector to S_g(t) and the vectors e_2(t), ..., e_n(t) forming an orthonormal basis of S_g(t). If these are defined for t in some interval in R_+ then we always assume that the e_i(t) have smooth (i.e. C²) dependence upon t. It is important that the coordinates are defined by an orthonormal basis in the original coordinates because this effectively preserves the covariance matrix, in the sense that a covariance matrix V in the adapted coordinates is RV R^T in the original coordinates with R a real orthogonal matrix. In particular, the eigenvalues are preserved.
The state X of the process (written in original coordinates corresponding to concentrations of each reacting molecule) relates to the adapted coordinates (X^(1), X^(2)) at x = g(t) through the orthonormal basis e_1(t), ..., e_n(t), with X^(1) the coordinate along the unit normal e_1(t) and X^(2) the coordinates within S_g(t).

Transversal points
Consider a stochastic trajectory X(t_i), i = 0, 1, 2, ..., and let g(t), t ≥ 0, be a solution of (19) with g(t_0) = g_0. Suppose that the stochastic trajectory has initial condition X(0) in the neighbourhood of g_0. For the SSA trajectories presented in I, this is satisfied by taking X_0 = g_0, but this is not necessary. Find the transversal section S_{g(s_0)} for which G_N(X(t_0)) = g(s_0) and s_0 = arg min_{s′: G_N(X(t_0)) = g(s′)} |s′ − t_0|. Then, for i = 1, 2, ..., define the transversal sections S_{g(s_i)} such that G_N(X(t_i)) = g(s_i). Suppose that we wish to find the point Q_x at which the stochastic trajectory intersects the transversal section S_{x=g(s)}, where s ∈ (s_i, s_{i+1}) for some i > 0. Then some form of interpolation between X(t_i) and X(t_{i+1}) needs to be used. For the SSA trajectories presented in I, we first use the fact that there are no jumps between t_i and t_{i+1}, then write X(t_i) in coordinates (r_i, q_i) adapted at g(s), i.e. use the C_{g(s)} coordinates, and take the transversal point Q_x = q_i.
Note that backward moves, and therefore multiple intersections with the same transversal section, are possible. However, as we have shown in [13], the distribution of the number of intersections decreases exponentially and the distributions of the transversal points at different intersections are indistinguishable. Therefore, hereafter, we assume that in each trajectory there is only a single intersection with each transversal section S_x, x ∈ γ, called the transversal point Q_x.

The pcLNA distribution
Consider a solution g(t), t > 0 of (19) that satisfies g 0 = g(0) and a point x = g(t) of this solution. Let S x=g(t) be the corresponding transversal section at x = g(t). We wish to approximate the distribution of the transversal point Q x=g(t) as defined in section 4.2. We use the adapted coordinates C x and write the state of the system at time t as X(t) = (X (1) (t), X (2) (t)) T , with X (2) (t) ∈ S x . Then the transversal distribution P (Q x |X(0)) is estimated in pcLNA by the distribution P LNA (X (2) (t)|X(0)). This is the LNA distribution of the state of the system on the transversal section S x=g(t) at time t.
To compute this distribution, henceforth called the pcLNA distribution, first write the LNA distribution

P_LNA(X(t) | X(0)) is MVN( x + Ω^{−1/2} m_t, Ω^{−1} V_t )   (27)

in the original coordinates (that correspond to concentration levels of each of the reacting molecules). Then, find an orthogonal matrix R_x = [R_x^(1) R_x^(2)] such that

R_x^T (X(t) − x) = (X^(1)(t), X^(2)(t))^T,   (28)

and the moments of the noise process on the transversal are

m_t^(2) = (R_x^(2))^T m_t,   V_t^(2,2) = (R_x^(2))^T V_t R_x^(2),

with m_t, V_t given in (26). This pcLNA distribution is compared with the empirical distribution of transversal points derived from SSA simulations, as explained in the previous section. The results are presented in I, Figure 4, and demonstrate that the two distributions are hardly distinguishable. Note that the above pcLNA distributions are slightly simpler than those presented in [13]. Here the marginal LNA distributions of X^(2) are used instead of the conditional distributions (X^(2) | X^(1) = 0). The performance of the two distributions compared to the SSA simulations is very similar, with differences identified only during the initial transient response of NF-κB to TNFα signals. The simpler marginal distributions provided here give a slightly better fit during this period due to their extra variability, which better matches the SSA simulations. The properties of pcLNA distributions, in terms of accuracy compared to SSA, robustness as t → ∞, and analytical tractability, are thoroughly discussed in [13].

The pcLNA joint distribution of multiple transversals
We now wish to approximate the joint distribution of multiple transversal points
$$P(Q_{1:k}|X(0) \sim MVN(\mu_0, \Sigma_0)) = P(Q_{x_1}, Q_{x_2}, \ldots, Q_{x_k}|X(0) \sim MVN(\mu_0, \Sigma_0)),$$
where $x_i = g(t_i)$ and $0 \le t_1 \le \cdots \le t_k$. This can be written as a product of transition probabilities,
$$P(Q_{1:k}|X(0)) = P(Q_{x_1}|X(0)) \prod_{i=1}^{k-1} P(Q_{x_{i+1}}|Q_{x_i}).$$
The first transition probability on the rhs of the above equation, as explained in the previous section, is approximated by the $MVN(\mu_{x_1}, \Sigma_{x_1})$ distribution, by taking $\mu_0 = x_0 + \Omega^{-1/2} m_0$, $\Sigma_0 = S_0/\Omega$ and $x = x_1 = g(t_1)$. To derive the approximation of the transition probabilities from $Q_{x_i}$ to $Q_{x_{i+1}}$, note that, by (28), $X(t_i) = x_i + R_{x_i}(X^{(1)}(t_i), X^{(2)}(t_i))^T$.
We assume that the term describing the transition through $C(t_i, t_{i+1})$ of the tangential fluctuations at time $t_i$, $R^{(1)}_{x_i} X^{(1)}(t_i)$, to fluctuations on the transversal coordinate, $R^{(2)}_{x_{i+1}}$, at time $t_{i+1}$, i.e.
$$\left(R^{(2)}_{x_{i+1}}\right)^T C(t_i, t_{i+1})\, R^{(1)}_{x_i} X^{(1)}(t_i),$$
is small compared to the transition of the transversal fluctuations, and we ignore it. More generally, the pcLNA transition probability is
$$P_{pcLNA}(Q_{x_{i+1}}|Q_{x_i}) = MVN\!\left(C^{(2,2)}(t_i, t_{i+1})\, Q_{x_i},\ V^{(2,2)}(t_i, t_{i+1})\right),$$
where (again ignoring the tangential fluctuations transduced on the transversal)
$$C^{(2,2)}(t_i, t_{i+1}) = \left(R^{(2)}_{x_{i+1}}\right)^T C(t_i, t_{i+1})\, R^{(2)}_{x_i}, \qquad V^{(2,2)}(t_i, t_{i+1}) = \left(R^{(2)}_{x_{i+1}}\right)^T V(t_i, t_{i+1})\, R^{(2)}_{x_{i+1}}.$$
Then the pcLNA joint distribution is the product of these transition probabilities and, by adding and subtracting the term $C^{(2,2)}(t_1, t_2)\mu_{x_1}$ in the second term of the rhs above, it can be written as the kernel of a $MVN$ distribution with mean vector $\mu_{x_{1:k}}$ and precision matrix (inverse of covariance matrix) $A_{x_{1:k}}$ with a block tridiagonal form, where the diagonal blocks, $A^{(ii)}_{x_{1:k}}$, $i = 1, 2, \ldots, k-1$, and lower diagonal blocks, $A^{(i+1,i)}_{x_{1:k}}$, are given by
$$A^{(ii)}_{x_{1:k}} = \Sigma_i^{-1} + C^{(2,2)}(t_i, t_{i+1})^T\, V^{(2,2)}(t_i, t_{i+1})^{-1}\, C^{(2,2)}(t_i, t_{i+1}), \qquad A^{(i+1,i)}_{x_{1:k}} = -V^{(2,2)}(t_i, t_{i+1})^{-1}\, C^{(2,2)}(t_i, t_{i+1}),$$
with $\Sigma_1 = \Sigma_{x_1}$, $\Sigma_i = V^{(2,2)}(t_{i-1}, t_i)$ for $i > 1$, and $A^{(kk)}_{x_{1:k}} = V^{(2,2)}(t_{k-1}, t_k)^{-1}$. To write the pcLNA joint distribution in the original coordinates we can use the transformation $X_{1:k} = x_{1:k} + R_{1:k} Q_{1:k}$, where $x_{1:k} = (x_1, x_2, \ldots, x_k)^T$ and $R_{1:k} = \mathrm{Diag}(R^{(2)}_{t_1}, \ldots, R^{(2)}_{t_k})$, to get the mean vector
$$\mu_{1:k} = x_{1:k} + R_{1:k}\, \mu_{x_{1:k}} \tag{29}$$
and precision matrix
$$A_{1:k} = R_{1:k}\, A_{x_{1:k}}\, R_{1:k}^T. \tag{30}$$
As discussed above, the pcLNA joint distributions have been shown to be hardly distinguishable from the empirical transversal distributions derived by SSA in the NF-κB system (see Figure 4 in I and [13]).
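The block tridiagonal form is the standard precision structure of a Gaussian Markov chain. The sketch below (generic transition blocks $C_i$ and noise covariances $V_i$ standing in for $C^{(2,2)}$ and $V^{(2,2)}$; not the paper's code) builds the precision matrix and verifies it against the covariance obtained by direct propagation:

```python
import numpy as np

def chain_precision(S1, Cs, Vs):
    """Precision matrix of the Gaussian Markov chain q_{i+1} = C_i q_i + e_i,
    e_i ~ N(0, V_i), q_1 ~ N(mu_1, S1). Block tridiagonal by construction."""
    k, d = len(Cs) + 1, S1.shape[0]
    A = np.zeros((k*d, k*d))
    Vinv = [np.linalg.inv(V) for V in Vs]
    A[:d, :d] = np.linalg.inv(S1)
    for i, (C, Vi) in enumerate(zip(Cs, Vinv)):
        A[i*d:(i+1)*d, i*d:(i+1)*d] += C.T @ Vi @ C
        A[(i+1)*d:(i+2)*d, (i+1)*d:(i+2)*d] += Vi
        A[(i+1)*d:(i+2)*d, i*d:(i+1)*d] = -Vi @ C
        A[i*d:(i+1)*d, (i+1)*d:(i+2)*d] = -C.T @ Vi
    return A

def chain_covariance(S1, Cs, Vs):
    """Full joint covariance by direct propagation, for checking."""
    k, d = len(Cs) + 1, S1.shape[0]
    Sig = np.zeros((k*d, k*d))
    Sig[:d, :d] = S1
    for i, (C, V) in enumerate(zip(Cs, Vs)):
        for j in range(i + 1):  # cross-covariances with earlier blocks
            blk = C @ Sig[i*d:(i+1)*d, j*d:(j+1)*d]
            Sig[(i+1)*d:(i+2)*d, j*d:(j+1)*d] = blk
            Sig[j*d:(j+1)*d, (i+1)*d:(i+2)*d] = blk.T
        Sig[(i+1)*d:(i+2)*d, (i+1)*d:(i+2)*d] = (
            C @ Sig[i*d:(i+1)*d, i*d:(i+1)*d] @ C.T + V)
    return Sig
```

The check that `chain_precision(...)` is exactly the inverse of `chain_covariance(...)` confirms the block formulas.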

pcLNA simulation
The approach is to amend the LNA Ansatz $X(t) = g(t) + \Omega^{-1/2}\xi(t)$ to $X(t) = g(s) + \Omega^{-1/2}\kappa(s)$, where $g(s) = G_N(X(t))$, and to reset $t$ to $s$ to cope with the growth in the variance of phase deviations, keeping the LNA fluctuation $\kappa(s)$ normal to $\gamma$. While for free-running oscillators the variance of $\xi(t)$ grows without bound as $t$ increases, $\kappa(s)$ has uniformly bounded variance.
The pcLNA simulation algorithm iteratively uses standard LNA steps of length $\Delta\tau$ to move from a state $X(s_{i-1})$ to a new state $X(s_{i-1} + \Delta\tau) = X_i$, $i = 1, 2, \ldots$. After each LNA step, the phase of the system is reset or "corrected" so that $g(s_i) = G_N(X_i)$, and the (global) fluctuations $\xi(s_{i-1} + \Delta\tau) = \Omega^{1/2}(X_i - g(s_{i-1} + \Delta\tau))$ are replaced by the transversal fluctuations $\kappa(s_i) = \Omega^{1/2}(X_i - g(s_i))$, which are MVN distributed and, as we showed in the previous section, approximate well the transversal fluctuations under the exact Markov jump process.
The steps of the pcLNA simulation algorithm are described next in more detail (see also Fig 7).

For iteration $i = 1, 2, \ldots$:
(a) solve (24) and (25) over $[s_{i-1}, s_{i-1} + \Delta\tau]$ to obtain $C_i$ and $V_i$;
(b) sample $X_i = X(s_{i-1} + \Delta\tau)$ from the corresponding LNA transition distribution;
(c) set $s_i$ to be such that $G_N(X_i) = g(s_i)$ and $\kappa_i = \Omega^{1/2}(X_i - g(s_i))$.
In the for loop of step 3, $C_i = C(s_{i-1}, s_{i-1} + \Delta\tau)$ and $V_i = V(s_{i-1}, s_{i-1} + \Delta\tau)$, which are derived as solutions of (24) and (25). The simulated sample $X_i$ corresponds to time $t_i = t_0 + i\Delta\tau$, $i = 1, 2, \ldots$, where $t_0$ is the initial time. The time $t_i$ is not necessarily equal to the phase $s_i$, defined by the relation $g(s_i) = G_N(X_i)$, which is stochastic and has variance increasing linearly with the time step $\Delta\tau$.
If one wants to record simulated trajectories at a finer time-scale than $\Delta\tau$, one can run the algorithm with $\Delta\tau$ replaced by $\Delta\tau/M$ for some integer $M > 1$ and carry out the phase correction in step 3(c) only every $M$th step, at all other steps proceeding as in the standard LNA (ignoring step 3(c)). This gives the same distribution as if the intermediate points had not been calculated, because of the transitive nature of the LNA, i.e. the distribution $P_{LNA}(X(s+t)|X(0))$ is equal to the distribution $P_{LNA}(X(t)|X(s) \sim P_{LNA}(X(s)|X(0)))$. We have observed that roughly 3-5 corrections per peak-to-peak time typically give a good balance between speed and precision. Having calculated $\kappa(s_{i-1})$, one uses $\kappa(s_{i-1})$ as the initial state and updates it using the LNA and a time-step $\Delta\tau$ to obtain $\xi$ at $s_{i-1} + \Delta\tau$. Then $\xi(s_{i-1} + \Delta\tau)$ is replaced by $\kappa(s_i)$, so that $g(s_{i-1} + \Delta\tau) + \Omega^{-1/2}\xi(s_{i-1} + \Delta\tau) = g(s_i) + \Omega^{-1/2}\kappa(s_i)$, where $\kappa(s_i)$ is normal to the limit cycle. Therefore $s_i$ gives the phase of $\kappa(s_i)$, and the corresponding time is $t_i = t_0 + i\Delta\tau$.
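The loop structure (not the NF-κB model itself) can be sketched on a toy system whose limit cycle is the unit circle, with radial projection as the phase map $G_N$; the drift, noise level and step size below are illustrative placeholders:

```python
import numpy as np

def pclna_simulate(n_steps, dt=0.1, omega=1000.0, sigma=0.1, seed=1):
    """Structural sketch of the pcLNA simulation loop on a toy circular
    limit cycle g(t) = (cos t, sin t). The fluctuation dynamics are
    trivialised (no Jacobian term); only the step/reset logic is shown."""
    rng = np.random.default_rng(seed)
    s = 0.0                      # current phase
    kappa = np.zeros(2)          # transversal fluctuation (scaled)
    out = []
    for _ in range(n_steps):
        # LNA step of length dt starting from X(s) on the cycle:
        # move along the cycle and propagate the fluctuation with noise
        g_new = np.array([np.cos(s + dt), np.sin(s + dt)])
        xi = kappa + sigma*np.sqrt(dt)*rng.standard_normal(2)
        X = g_new + xi/np.sqrt(omega)
        # phase correction: reset s so that g(s) = G_N(X), here the
        # nearest point on the limit cycle (radial projection)
        s = np.arctan2(X[1], X[0])
        g_s = X/np.linalg.norm(X)
        kappa = np.sqrt(omega)*(X - g_s)   # transversal fluctuation
        out.append((s, kappa.copy()))
    return out
```

After each reset, the recorded fluctuation is orthogonal to the tangent of the cycle at the corrected phase, mirroring the $\kappa(s_i) \perp \gamma$ property in the text.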

Channel capacity estimation
In our comparisons to the experimental results of the NF-κB system response to TNFα stimulus in the section "Mutual Information" and Figure 4(d) of I, we allowed the total number of NF-κB molecules to vary between different simulated cells (trajectories). Specifically, at the beginning of the simulation of each trajectory, we generated the total number of NF-κB molecules from a log-Normal distribution previously used in [16]. This has pdf
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, \exp\!\left(-\frac{(\log(x/A) - \mu)^2}{2\sigma^2}\right), \qquad x > 0,$$
and the parameters $A = 10^5$, $\sigma = 1/\sqrt{2}$, $\mu = -1/4$ are chosen to give a mean number of molecules equal to $10^5$, which agrees with the fixed value assumed in the deterministic model used in [2] as well as in the rest of the models used here. The log-Normal is chosen because it has positive support (i.e. only positive values of the total number of molecules are possible), a single mode and tunable variance.
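For concreteness, a quick numerical check (assuming this parametrisation, under which $X = A\,e^{\mu + \sigma Z}$ with $Z \sim N(0,1)$ and mean $A\,e^{\mu + \sigma^2/2}$) that the stated parameters give the intended mean:

```python
import numpy as np

# With A = 1e5, sigma = 1/sqrt(2), mu = -1/4 the log-Normal mean
# A*exp(mu + sigma^2/2) collapses to A, i.e. 1e5 molecules on average.
A, sigma, mu = 1e5, 1/np.sqrt(2), -0.25
analytic_mean = A*np.exp(mu + sigma**2/2)

# Sample total NF-kB copy numbers for a population of simulated cells
rng = np.random.default_rng(0)
samples = A*np.exp(mu + sigma*rng.standard_normal(200_000))
```

The sample mean of a large population agrees with the analytic value to within Monte Carlo error.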
After generating the number of NF-κB molecules, we used the pcLNA simulation algorithm to obtain a stochastic trajectory. We initiated the stochastic simulation from the equilibrium point under no dose and immediately applied a continuous TNFα dose. We then obtained the nuclear NF-κB concentration at t = 30min.
The dose was assumed to have a discrete distribution with weights obtained using a constrained optimisation performed by the fmincon function in MATLAB [11].
We show the results assuming that the TNFα dose has only two levels (high and low); including medium doses did not change the estimate substantially.
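The same optimisation can be carried out without fmincon: for a discrete input and a discretised (histogrammed) output, the capacity follows from the classical Blahut-Arimoto iteration. A self-contained sketch, checked on a binary symmetric channel (in our setting, the simulated NF-κB response histograms per dose would supply the rows of P):

```python
import numpy as np

def blahut_arimoto(P, tol=1e-12, max_iter=10_000):
    """Channel capacity (in bits) of the conditional distribution
    P[x, y] = P(y|x), and the optimal input weights, via the
    Blahut-Arimoto iteration."""
    m = P.shape[0]
    p = np.full(m, 1.0/m)                       # input weights
    for _ in range(max_iter):
        q = p @ P                               # induced output distribution
        # per-input KL divergence D(P(.|x) || q), in bits
        with np.errstate(divide='ignore', invalid='ignore'):
            D = np.where(P > 0, P*np.log2(P/q), 0.0).sum(axis=1)
        p_new = p*np.exp2(D)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    q = p @ P
    D = np.where(P > 0, P*np.log2(P/q), 0.0).sum(axis=1)
    return float(p @ D), p
```

For a binary symmetric channel with crossover probability $\varepsilon$ this recovers the known capacity $1 - H(\varepsilon)$ at uniform input weights.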

4.5 Channel capacity under the stochastic model of NF-κB in Tay et al. [16]

We used the code provided by [16] to run 500 simulations at a high (10 ng/mL) and a low (0.5 ng/mL) TNFα dose. As above, we collected the nuclear concentration of NF-κB at t = 30 min after the initiation of the continuous TNFα stimulus. The histograms are provided in Fig G. The channel capacity was estimated to be 0.72, assuming that the dose has a discrete distribution. The optimal weights of the dose distribution were derived using a constrained optimisation performed by the fmincon function in MATLAB [11].

Fig G: Histograms of the n = 500 stochastic simulations computed using the code provided in [16], at time t = 30 min after the beginning of continuous TNFα stimulation.

Simulation of EGR1 and COX-2 genes.
Here we explain the stochastic simulation of the numbers of mRNA molecules of the EGR1 and COX-2 genes as targets of NF-κB. As explained in I, we use a model similar to the one described in SI section 3.4, the difference being that the modification is now promoted by the active state of IKK. More specifically, in equation 12 in table G the term
$$p_{a1}\, S_2\, \frac{N_c}{N_c + k_{pnf}}$$
is replaced with
$$p_{a1} \times K_a \times S_2\, \frac{N_c}{N_c + k_{pnf}},$$
with parameters $p_{a1} = 0.0025$, $S_2 = 1$, $k_{pnf} = 0.01$. The reverse modification parameters are also slightly changed, to $k_{pnfp} = 0.01$, $p_{d1} = 1 \times 10^{-5}$. Because the model does not include any feedback from these genes to the NF-κB system, they can be simulated separately from the base system, and this makes the hybrid algorithm explained below somewhat simpler than hybrid algorithms such as those discussed, for example, in [17]. We therefore used the pcLNA simulation algorithm to derive stochastic trajectories of the base system and then applied the SSA to generate stochastic simulations for the EGR1 and COX-2 genes. The reason for using SSA in the latter simulation, rather than LNA or pcLNA, is that EGR1 and COX-2 activation is highly stochastic and not oscillatory. This method proved fairly efficient for the purposes of our demonstration: we exploited the computational advantages of pcLNA in the base system, while SSA simulations of only two genes expressed at similar scales are also fairly cheap.
Let $Y_F = (Y_F(t_0), \ldots, Y_F(t_N))$ be a trajectory of the TF (in molecule numbers) derived using an appropriate stochastic simulation algorithm (e.g. pcLNA). Define the continuous-time function $Y_F(t)$, $t \ge 0$, from the values of $Y_F$ observed at discrete times, using an appropriate (smooth) interpolation method. Also define the degradation, $u_{degr}(\delta, Y_g)$, and transcription, $u_{tran}(\tau, Y_F)$, rate functions, respectively. The algorithm proceeds as follows:
1. initialise the gene expression $Y_g := Y_g(0)$ and set $t := 0$;
2. generate the next time of a degradation event, $t_1 \sim t + \mathrm{Exp}(u_{degr}(\delta, Y_g))$;
3. generate $u \sim U([0,1])$ and integrate the transcription rate function up to the last time $t_2$ for which
$$I = \int_t^{t_2} u_{tran}(\tau, Y_F(s))\, ds + \log(u) \le 0;$$
4. if $t_1 < t_2$, set $t := t_1$ and $Y_g := Y_g - 1$; otherwise set $t := t_2$ and $Y_g := Y_g + 1$; then repeat from step 2.
Note that a Gibson-Bruck version of this algorithm (see [17,18]) could be used instead, but it would make little difference for the purposes of our demonstration. In our simulation, the degradation reaction rates of EGR1 and COX-2 are proportional to the current mRNA copy number, $u_{degr}(\delta, Y_g) = \delta\, Y_g$, and the transcription rates are Hill-type functions of $Y_A$ and $Y_R$, where $Y_A$ is the state of the activator TF and $Y_R$ the state of the inhibitor TF. As shown in I Fig 4(a), the activator of EGR1 is the (unmodified) nuclear NF-κB, $N_n$, and the inhibitor its modified form, $(N_m)_n$; for COX-2 it is vice versa. The parameter values used for EGR1 are $\delta_1 = 1 \times 10^{-3}$, $\tau = 8 \times 10^{-7}$, $k_A = 0.03$, $k_R = 0.03$, and for COX-2 are $\delta_2 = 5 \times 10^{-6}$, $\tau = 2 \times 10^{-7}$, $k_A = 0.015$, $k_R = 0.005$, $k_{g_2} = 1 \times 10^{-3}$; also $h = 5$.
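A minimal sketch of steps 1-4, with generic rate forms standing in for the Hill-type transcription rates and parameter values above (the hazard `u_tran(t)` would be built from the interpolated TF trajectory):

```python
import numpy as np

def simulate_gene(u_tran, delta, T, y0=0, dt=0.01, seed=0):
    """Hybrid simulation of one target gene driven by a pre-simulated,
    interpolated TF trajectory entering through the time-varying
    transcription hazard u_tran(t). Degradation has rate delta*Y_g.
    Rate forms and parameters are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    t, Yg = 0.0, y0
    times, counts = [0.0], [y0]
    while t < T:
        # step 2: next degradation time (no event possible if Y_g = 0)
        t1 = t + rng.exponential(1.0/(delta*Yg)) if Yg > 0 else np.inf
        # step 3: next transcription time, found by integrating the
        # hazard on a grid until I = int_t^{t2} u_tran ds + log(u) >= 0
        target = -np.log(rng.uniform())
        I, t2 = 0.0, t
        while I < target and t2 <= T:
            I += u_tran(t2)*dt
            t2 += dt
        if min(t1, t2) > T:
            break
        # step 4: apply whichever event fires first
        if t1 < t2:
            t, Yg = t1, Yg - 1
        else:
            t, Yg = t2, Yg + 1
        times.append(t); counts.append(Yg)
    return np.array(times), np.array(counts)
```

For example, `simulate_gene(lambda t: 5.0, delta=0.5, T=100.0)` simulates a simple birth-death process with stationary mean 10.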

Fisher Information Matrix (FIM) under the pcLNA
To compute the FIM under the pcLNA, we need to compute derivatives of the solution $x = g(t)$ and of the matrices $C(t_0, t)$, $V(t_0, t)$ with respect to the parameter vector $\theta$.
The Jacobian matrix $J(x) = J(x; \theta)$ at $x = \eta(t; t^*, x^*, \theta)$ has entries $J_{il}(x) = \partial F_i(x; \theta)/\partial x_l$. Define the first and second derivatives of the solution $x = \eta(t; t^*, x^*, \theta)$ with respect to the initial conditions and parameters,
$$u_k(t) = \frac{\partial \eta}{\partial x^*_k}, \qquad v_j(t) = \frac{\partial \eta}{\partial \theta_j}, \qquad w_{jk}(t) = \frac{\partial^2 \eta}{\partial \theta_j\, \partial x^*_k}. \tag{31}$$
As we demonstrate below, the first derivative provides the columns of the fundamental matrix $C(t^*, t)$, the second the derivatives of the ODE solution with respect to the parameters, and the third the derivatives of the entries of the fundamental matrix with respect to the $j$-th parameter. All three are needed in computing the derivatives required for the FIM computation. The derivative $u_k$ in (31) is the solution of the initial value problem
$$\dot u_k = J(\eta)\, u_k, \qquad u_k(t^*) = e_k,$$
where the vector $e_k$ has entries $e_l = 0$ for $l \ne k$ and $e_l = 1$ for $l = k$.
Note that the principal fundamental matrix $C$ is the solution of the latter initial value problem taken over all $k$; the matrix can be written as $C(t^*, t) = (u_1(t), u_2(t), \ldots, u_n(t))$.
Furthermore, $v_j$ in (31) is the solution of the initial value problem
$$\dot v_j = J(\eta)\, v_j + h_j(t), \qquad v_j(t^*) = 0, \tag{32}$$
where $h_j(t) = \partial F/\partial \theta_j$ (note: this is the partial derivative of $F$ with respect to $\theta_j$, i.e. $x$ is held constant with respect to $\theta$).
Furthermore, $w_{jk}$ in (31) can be derived as the solution of the initial value problem
$$\dot w_{jk} = J(\eta)\, w_{jk} + \frac{dJ(\eta)}{d\theta_j}\, u_k, \qquad w_{jk}(t^*) = 0. \tag{33}$$
Also,
$$\frac{dJ_{il}(\eta)}{d\theta_j} = \sum_m \frac{\partial J_{il}}{\partial \eta_m}\, (v_j)_m + \frac{\partial J_{il}}{\partial \theta_j}. \tag{34}$$
Finally, if $V = V(t^*, t)$ in (25), the $n \times n$ matrix $X_j = dV/d\theta_j$ can be derived as the solution of
$$\dot X_j = J X_j + X_j J^T + \frac{dJ}{d\theta_j}\, V + V \left(\frac{dJ}{d\theta_j}\right)^T + \frac{\partial}{\partial \theta_j} U(\eta, \theta), \qquad X_j(t^*) = 0, \tag{35}$$
obtained by differentiating (25) with respect to $\theta_j$. The derivative $dJ/d\theta_j$ is derived in (34), while $\frac{\partial}{\partial \theta_j} U(\eta, \theta)$ is a diagonal matrix with main diagonal entries
$$\frac{\partial r_l}{\partial \eta}^T v_j + \frac{\partial r_l}{\partial \theta_j}.$$
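These variational equations can be checked on a toy model where everything is known in closed form. Assuming the scalar system $\dot x = F(x;\theta) = -\theta x$ (so $J = -\theta$ and $\partial F/\partial\theta = -x$), we have $u(t) = e^{-\theta t} = C(0,t)$ and $v(t) = -x_0\, t\, e^{-\theta t}$; a plain RK4 sketch (names are illustrative):

```python
import numpy as np

def rk4(f, y0, t0, t1, n=2000):
    """Classical 4th-order Runge-Kutta integrator for dy/dt = f(t, y)."""
    h = (t1 - t0)/n
    t, y = t0, np.array(y0, dtype=float)
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h/2, y + h/2*k1)
        k3 = f(t + h/2, y + h/2*k2)
        k4 = f(t + h, y + h*k3)
        y = y + h/6*(k1 + 2*k2 + 2*k3 + k4)
        t += h
    return y

def sensitivities(theta, x0, T):
    """Integrate x' = -theta*x together with its variational equations:
       u' = J u              (u = dx/dx0, so C(0,T) = u(T))
       v' = J v + dF/dtheta  (v = dx/dtheta)
    with J = -theta and dF/dtheta = -x."""
    def f(t, y):
        x, u, v = y
        return np.array([-theta*x, -theta*u, -theta*v - x])
    return rk4(f, [x0, 1.0, 0.0], 0.0, T)
```

The numerically integrated sensitivities match the closed-form expressions to integrator accuracy.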

FIM computation for pcLNA joint distributions
The multivariate normality of the pcLNA joint distributions $P_{pcLNA}(Q_{x_1}, Q_{x_2}, \ldots, Q_{x_k}|X(0) \sim MVN(\mu_0, \Sigma_0))$, for $k = 2, \ldots$, allows us to compute the corresponding FIM as in (7) by computing the derivatives of the mean vector $\mu$ and precision matrix $A$ in (29) and (30), respectively. As part of this, the derivatives $\partial x(t_i)/\partial\theta$, $\partial C(t_i, t_{i+1})/\partial\theta$, and $\partial V(t_i, t_{i+1})/\partial\theta$ are derived by solving the initial value problems in (32), (33) and (35), respectively. The derivatives $\partial R^{(2)}_t/\partial\theta$ are obtained using the fsolve function of MATLAB to solve the linear problem with constraints defined by differentiating, with respect to $\theta$, the equations that determine $R^{(2)}_t$, starting from $F(x(t))^T R^{(2)}_t = 0$.
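The FIM in (7) is then assembled from these derivatives using the standard Gaussian identity $I_{ij} = \partial_i\mu^T\, \Sigma^{-1}\, \partial_j\mu + \tfrac{1}{2}\mathrm{tr}\!\left(\Sigma^{-1}\partial_i\Sigma\, \Sigma^{-1}\partial_j\Sigma\right)$; a minimal sketch (checked against the textbook $N(\theta, \sigma^2)$ case; names are illustrative):

```python
import numpy as np

def mvn_fim(mu_d, Sigma, Sigma_d):
    """FIM of an MVN with parameter-dependent mean and covariance.
    mu_d[j]    : derivative of the mean vector wrt theta_j
    Sigma      : covariance matrix
    Sigma_d[j] : derivative of Sigma wrt theta_j
    I_ij = dmu_i^T S^-1 dmu_j + 0.5 tr(S^-1 dS_i S^-1 dS_j)."""
    Sinv = np.linalg.inv(Sigma)
    p = len(mu_d)
    I = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            I[i, j] = (mu_d[i] @ Sinv @ mu_d[j]
                       + 0.5*np.trace(Sinv @ Sigma_d[i] @ Sinv @ Sigma_d[j]))
    return I
```

For $N(\theta, \sigma^2)$ with parameters $(\theta, \sigma)$ this recovers the classical $\mathrm{diag}(1/\sigma^2,\ 2/\sigma^2)$.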

NF-κB target genes
We used the list of NF-κB target genes provided in [5]. This comprises the list of target genes taken from the Boston University web site at http://www.bu.edu/nf-kb/gene-resources/targetgenes/, combined with target genes found via an extensive ChIP-seq search performed in [15]. There are a total of 1361 genes in this list.
The profile data of the 8 target genes are displayed in Figs I and J. Fig K illustrates how these 8 genes can be used to distinguish between 14 different experimental conditions. An approximate expression-profile alphabet is provided, which includes profiles that can be described as no response, or early/medium/late, sustained or not sustained response. These profiles are identified by clustering [19]. The table then shows that each of the 14 experimental conditions gave a different response combination over these 8 genes.